Teaming Up with Artificial Intelligence

Creating a good artificial intelligence (AI) user experience is not easy. Everyone who uses autocorrect while writing knows that although the system usually does a pretty good job of catching and correcting errors, it sometimes makes bizarre mistakes. The same is true for the autopilot in a Tesla, but unfortunately the stakes are much higher on the road than when sitting behind a computer.

Daniel S. Weld of the University of Washington in Seattle has done a lot of research on human-AI teams. Last year, he and a group of colleagues from Microsoft proposed 18 generally applicable design guidelines for human-AI interaction, which were validated through multiple rounds of evaluation.

Bennie Mols interviewed Weld about the challenges of building a human-AI dream team:

What makes a human-AI team different from a team of a human and a digital system without AI?

First of all, AI systems are probabilistic: sometimes they get it right, but sometimes they err. Unfortunately, their mistakes are often unpredictable. In contrast, classical computer programs, like spreadsheets, work in a much more predictable way.

Second, AI can behave differently in subtly different contexts. Sometimes the change in context isn't even clear to the human. Google Search might give different auto-suggest results to different people, based on differences in their previous behavior.

The third important difference is that AI systems can change over time, for example through learning.

How did your research team arrive at the guidelines for human-AI interaction?

We started by analyzing 20 years of research on human-AI interaction. We did a user evaluation with 20 AI products and 50 practitioners, and we also did expert reviews. This led to 18 guidelines divided across four phases of the human-AI interaction process: the initial phase, before the interaction has started; the phase during interaction; the phase after the interaction, in case the AI system made a mistake; and finally, over time. During the last phase, the system might get updates, while humans might evolve their interaction with the system.

Can you offer three guidelines that you consider particularly important, which are needed to build a human-AI dream team?

From the initial phase, I'd choose the guideline that the AI must set expectations correctly: explain very well what the system can do and what it can't do.

From the phase 'when wrong', I would pick the guideline that the AI should support efficient correction; make it easy to edit, refine, or recover when the AI system makes a mistake. A simple example is my e-mail program; it automatically suggests a couple of responses to every e-mail that I receive. If the suggestion is completely wrong, I can ignore it. If the suggestion is only slightly wrong, I can still select one of the suggestions and then edit it into what I want. This might seem obvious, but it took people many years to invent this design pattern for supporting efficient correction.

Microsoft Clippy, introduced in Microsoft Office 97, is a famous example of an AI that was intended to help users but did so in a very annoying way; if Clippy made a mistake, it was hard to correct or dismiss it.

The third important guideline I would select is that over time, the AI system should learn from user behavior. Nothing is more frustrating to me than a program that doesn't learn from my interaction in order to make itself better. For example, whenever I paste text in PowerPoint, I select 'Paste Special > Keep Only Text' rather than the default, which pastes formatting. It's an extra 3-4 clicks per paste. Why hasn't PowerPoint learned that after a decade of interaction?

What is an example of a situation in which the human-AI team performs much better than either the human or the AI alone?

By combining human and AI in identifying metastatic breast cancer in a 2016 test, the error rate went down by about 85%, as compared to the best human doctor. In our studies of human-AI teams, we have seen many more examples of this. We get the most benefit if the AI on average is much better than the human, and every year we see this in more and more domains.

How is it possible that sometimes human-AI performance goes down while the AI improves?

Let's take the example of the Tesla autopilot, which sometimes updates itself. Let's say that before the update, it worked with 100% accuracy on the highway and with 70% accuracy on winding roads. Let's now imagine that the update improves the accuracy on winding roads to 95%, but at the same time the accuracy on the highway drops slightly to 95%. Suddenly the team could have accidents on the highway, where the human was already used to trusting the AI completely. And the extra benefit on the winding roads doesn't really matter, because humans were already used to monitoring and overriding the autopilot on such roads. In a number of our studies we have seen similar cases happening in human-AI interaction.
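
The arithmetic behind this hypothetical can be sketched in a few lines of Python. This is an illustrative toy model, not something from Weld's studies: the accuracy figures are the ones quoted above, while the 90% rate at which a monitoring human catches an AI error (catch_rate), the 50/50 split between road types (road_share), and all function names are assumptions made up for the example.

```python
# Toy model of human-AI team accuracy before and after an AI update.
# Accuracy figures come from the interview's hypothetical; catch_rate and
# road_share are illustrative assumptions, not measured values.

def team_accuracy(ai_accuracy, human_monitors, catch_rate):
    """Accuracy of the team on one road type.

    If the human fully trusts the AI, the team is exactly as accurate as the AI.
    If the human monitors, a share `catch_rate` of the AI's errors is corrected.
    """
    if not human_monitors:
        return ai_accuracy
    ai_error = 1.0 - ai_accuracy
    return 1.0 - ai_error * (1.0 - catch_rate)


def overall_accuracy(ai_accuracies, monitoring, road_share, catch_rate):
    """Average team accuracy, weighted by how much driving each road type gets."""
    return sum(
        road_share[road]
        * team_accuracy(ai_accuracies[road], monitoring[road], catch_rate)
        for road in ai_accuracies
    )


# The human's habits were formed before the update and do not change after it.
monitoring = {"highway": False, "winding": True}
road_share = {"highway": 0.5, "winding": 0.5}   # assumed mix of driving
catch_rate = 0.9                                 # assumed human catch rate

before = {"highway": 1.00, "winding": 0.70}
after = {"highway": 0.95, "winding": 0.95}

team_before = overall_accuracy(before, monitoring, road_share, catch_rate)
team_after = overall_accuracy(after, monitoring, road_share, catch_rate)

ai_before = sum(road_share[r] * before[r] for r in before)
ai_after = sum(road_share[r] * after[r] for r in after)

print(f"AI alone:   {ai_before:.4f} -> {ai_after:.4f}")
print(f"Human + AI: {team_before:.4f} -> {team_after:.4f}")
```

Under these assumptions the AI on its own improves from 85% to 95%, while the team slips from about 98.5% to about 97.3%, because the new highway errors land where the human is not watching. Different assumed values for catch_rate or road_share change the size of the effect, and the direction can flip if the human monitors poorly or drives mostly on winding roads.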

Does it improve the human-AI team performance if the AI can explain its outcomes?

The literature suggests that it does, in those cases where the accuracy of the AI is much higher than the accuracy of the human. However, we have studied cases in which the accuracies of the human and the AI are about the same, and here the situation is more complex. Overall, explanations tend to make the human trust the AI, regardless of whether the AI is right or wrong, so explanations help the team when the AI is correct, but can be harmful when the AI makes a mistake. Therefore it's particularly important that we seek explanation techniques that better show the human when not to trust the AI system.

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.