Embodied conversational agents (ECAs) have recently begun to model the kind of rapport that builds up between human friends, a major goal of David Novick at the University of Texas at El Paso (UTEP) and his Advanced aGent ENgagement Team (AGENT).
"David Novick is an innovative researcher who is advancing the field of artificial intelligence, one virtual agent at a time," said Stefan Scherer of the University of Southern California. "As demo chair of the ACM International Conference of Multimodal Interaction (ICMI 2015), I saw great merit in his research and awarded "Survival on Jungle Island" with the Outstanding Demonstration Award for its compelling storytelling and multimodal interactive capabilities."
Novick's group combined speech recognition with body language to make the human-agent interaction responsive and realistic. In order to perfect the engagement and rapport between human and agent, the team used a multi-modal interactive experience, requiring the coordination of many moving parts.
The team that achieved that goal included:
- post-doctoral researcher Ivan Gris, whose dissertation used the game "Survival on Jungle Island" for a study of human-agent gesture interaction;
- Jacqueline Brixey, whose master’s thesis used "Survival on Jungle Island" to study the effect on rapport of introverted versus extroverted language;
- doctoral candidate and National Science Foundation Graduate Research Fellow Adriana Camacho;
- master’s candidates Alex Rayon and Laura Rodriguez;
- undergraduates Alfonso Peralta, Victoria Bravo, Yahaira Reyes, Paola Gallardo, Timothy Gonzale, and Chelsey Jurado;
- now-graduated baccalaureates Diego Rivera, Joel Quintana, and Anuar Jauregui;
- French Air Force Academy cadets Guillaume Adoneth and David Manuel;
- and El Paso high school students Brynne Blaugrund and Nick Farber.
Said Novick, "The point of the underlying research is primarily to study paralinguistics in human-agent interaction—the role that behaviors such as turn-taking, gaze, gesture, and prosody [language cadence and intonation] play in coordinating and building meaning in conversations."
In other words, what sets the agents built by Novick's group apart from run-of-the-mill agents is that they go beyond linguistics (words and their meaning) in their human-agent interactions, trying to mimic interactions between real people, especially those trying to build rapport, such as co-workers, club members, or platoon-mates. To go beyond words, the agents use three-dimensional (3-D) vision to analyze posture, gestures, and other body-language cues, along with advanced speech recognition to determine prosody, from which personality traits can be inferred, all aimed at building positive rapport between the human and the agent.
The game begins with a prerecorded sequence of a shipwreck, which gives players the context for how they have arrived on "Adriana's island." Two "tutorial" scenes explain how player and agent can interact with each other within the immersive experience. By the end of scene three, the player and agent have gotten to know each other, with the player receiving the impression that Adriana is listening, watching, and responding to the player’s words and body language; Adriana has also explained how she survived in the jungle before the player’s arrival. The story then continues based on what the player decides to do next, using speech input, speech output, gesture input, gesture output, scenery, triggers, and decision points. The game has 23 scenes in all and can run for as long as 60 minutes. Personalized interactions include "high fives," as well as learning tasks like spear fishing. In the final scene, the pair are rescued by a helicopter.
This multi-modal approach enables the agent to perceive the human in a manner similar to how the human perceives the agent. To achieve that, Novick's group used off-the-shelf components, including Microsoft's Kinect 3-D camera and the Windows Speech software development kit, synchronized to give the agent the information it needs from the human to respond with appropriately complex behaviors. They also used the Unity 4 game engine and its Mecanim system to provide the foundation for the agent's extensive animations.
The real magic was the "middleware," as Novick calls it, which interprets input from the hardware, then provides commands to the rendering software controlling the animations.
"We wrote our own middleware to interpret the human's gestures and to produce the agent’s gestures, mostly dealing with upper body motions," said Novick. "We then built a knowledge base and authored scenes creating the interactive experience using XML [extensible markup language] interpreted by our middleware."
Novick has been working on agents for 28 years, but began working on them in earnest about four years ago. Before "Survival on Jungle Island," Novick's group built "Escape from the Castle of the Vampire King." Next, he plans to up the ante again by using an agent that can betray the user in "Gods of the Neon City," a game in which humans must build a positive rapport with the agent to avoid being betrayed.
"This project is imaginative and engaging, bringing together state-of-the-art sensing and graphics technologies with storytelling," said Louis-Philippe Morency at Carnegie Mellon University, program chair of ICMI 2014.
Novick offers a look at the group and its work in this video clip.
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.