
Unspoken Rules of Spoken Interaction

By Timothy W. Bickmore

Communications of the ACM, Vol. 47 No. 4, Pages 38-44


Our face-to-face interactions with other people are governed by a complex set of rules, of which we are mostly unaware. For decades now, social scientists have been unraveling the threads of face-to-face interaction, investigating everything from descriptions of body posture used to indicate interest in starting a conversation, to eye gaze dynamics used to convey liking or disliking, to the myriad ways that language can convey attitude, social status, relationship status, and affective state. Even though we are not always aware of them, these rules underpin how we make sense of and navigate in our social world. These rules may seem uninteresting and irrelevant to many computer scientists, but to the extent that a given interaction rule is universally followed within a user population, it can be profitably incorporated into a human-machine interface in order to make the interface more natural and intuitive to use. Computers without anthropomorphic faces and bodies can (and already do) make use of a limited range of such rules (for example, rules for conversational turn-taking in existing interfaces), but one kind of interface has the potential to make explicit, maximal use of these rules: embodied conversational agents (ECAs).

ECAs are animated humanoid computer characters that emulate face-to-face conversation through the use of hand gestures, facial display, head motion, gaze behavior, body posture, and speech intonation, in addition to speech content [5]. The use of verbal and nonverbal modalities gives ECAs the potential to fully employ the rules of etiquette observed in human face-to-face interaction. ECAs have been developed for research purposes, but there are also a growing number of commercial ECAs, such as those developed by Extempo, Artificial Life, and the Ananova newscaster. These systems vary greatly in their linguistic capabilities, input modalities (most are mouse/text/speech input only), and task domains, but all share the common feature of attempting to engage the user in natural, full-bodied (in some sense) conversation.

Social scientists have long recognized the utility of making a distinction between conversational behaviors (surface form, such as head nodding) and conversational function (the role played by the behavior, such as acknowledgement). This distinction is important if general rules of interaction are to be induced that capture the underlying regularities in conversation, enabling us to build ECA architectures that have manageable complexity, and that have the potential of working across languages and cultures. This distinction is particularly important given that there is usually a many-to-many mapping between functions and behaviors (for example, head nodding can also be used for emphasis and acknowledgment can also be indicated verbally).
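The function/behavior split can be made concrete with a small lookup table. The following is a minimal Python sketch, not the architecture of any system named in this article. The behavior and function names are hypothetical. Choosing the alphabetically first free behavior is a deliberately simple stand-in for a real behavior planner.

The sketch works in three steps:

1. It records which functions each behavior can serve.
2. It inverts that record into a function-to-behavior index.
3. It realizes a requested function through whichever capable behavior is free.

As a result, acknowledgment falls back to a verbal backchannel when the head is already busy.

```python
from collections import defaultdict

# Hypothetical inventory: behavior -> conversational functions it can serve.
# The pairs follow the examples in the text (a head nod can acknowledge or
# emphasize; acknowledgment can also be verbal); the names are illustrative.
BEHAVIOR_FUNCTIONS = {
    "head_nod": {"acknowledgment", "emphasis"},
    "verbal_backchannel": {"acknowledgment"},  # e.g. saying "uh-huh"
    "beat_gesture": {"emphasis"},
    "pitch_accent": {"emphasis"},
}


def behaviors_by_function(inventory):
    """Invert the behavior -> functions table into function -> behaviors."""
    index = defaultdict(set)
    for behavior, functions in inventory.items():
        for function in functions:
            index[function].add(behavior)
    return dict(index)


def realize(function, index, busy=()):
    """Pick one behavior that can carry `function`, skipping busy channels.

    Returns None when no free behavior can carry the function.
    """
    candidates = sorted(index.get(function, set()) - set(busy))
    return candidates[0] if candidates else None


index = behaviors_by_function(BEHAVIOR_FUNCTIONS)
print(realize("acknowledgment", index))                     # head_nod
print(realize("acknowledgment", index, busy={"head_nod"}))  # verbal_backchannel
```

In this design, the dialogue logic only ever asks for functions such as "acknowledgment." Adapting the agent to a different culture or embodiment would then mean editing the table rather than rewriting the dialogue logic.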

Although classical linguistics has traditionally focused on the conveying of propositional information, there are actually many different kinds of conversational function. The following list reviews the conversational functions most commonly implemented in ECAs, along with their associated behaviors:

Propositional functions of conversational behavior involve representing a thought to be conveyed to a listener. In addition to the role played by speech, hand gestures are used extensively to convey propositional information either redundant with, or complementary to, the information delivered in speech. In ECA systems developed to date, the most common kind of hand gesture implemented is the deictic, or pointing gesture. Steve [10], the DFKI Persona [1], and the pedagogical agents developed by Lester et al. [7] use pointing gestures that can reference objects in the agent's immediate (virtual or real) environment.

Interactional functions are those that serve to regulate some aspect of the flow of conversation (also called "envelope" functions). Examples include turn-taking functions, such as signaling intent to take or give up a speaking turn, and conversation initiation and termination functions, such as greetings and farewells (used in REA; see previous page). Other examples are "engagement" functions, which serve to continually verify that one's conversational partner is still engaged in and attending to the conversation, as implemented in the MEL robotic ECA [11]. Framing functions (enacted through behaviors called "contextualization cues") serve to signal changes in the kind of interaction taking place, such as problem-solving talk versus small talk versus joke-telling, and are used in the FitTrack Laura ECA (see "Managing Long-Term Relationships with Laura").

Attitudinal functions signal liking, disliking, or other attitudes directed toward one's conversational partner (as one researcher put it, "you can barely utter a word without indicating how you feel about the other"). One of the most consistent findings in this area is that the use of nonverbal immediacy behaviors (close conversational distance, direct body and facial orientation, forward lean, increased and direct gaze, smiling, pleasant facial expressions and facial animation in general, head nodding, frequent gesturing, and postural openness) projects liking for the other and engagement in the interaction, and is correlated with increased solidarity [2]. Attitudinal functions were built into the FitTrack ECA so it could signal liking when attempting to establish and maintain working relationships with users, and into the Cosmo pedagogical agent to express admiration or disappointment when students experienced success or difficulties [7].


Affective display functions. In addition to communicating attitudes about their conversational partners, people also communicate their overall affective state to each other using a wide range of verbal and nonverbal behaviors. Although researchers have widely differing opinions about the function of affective display in conversation, it seems clear it is the result of both spontaneous readouts of internal state and deliberate communicative action. Most ECA work in implementing affective display functions has focused on the use of facial display, such as the work by Poggi and Pelachaud [8].

Relational functions are those that either indicate a speaker's current assessment of his or her social relationship to the listener ("social deixis"), or serve to move an existing relationship along a desired trajectory (for example, increasing trust, decreasing intimacy, among others). Explicit management of the ECA-user relationship is important in applications in which the purpose of the ECA is to help the user undergo a significant change in behavior or cognitive or emotional state, such as in learning, psychotherapy, or health behavior change [3]. Both REA and Laura were developed to explore the implementation and utility of relational functions in ECA interactions.
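To return to the attitudinal functions described above, one way to picture how an agent might signal liking through immediacy behaviors is to drive several nonverbal parameters from a single liking value. This Python sketch is hypothetical throughout. The parameter names, the linear interpolation, and every numeric range are placeholders chosen for illustration, not values from the research literature or from FitTrack. What it does show is that one attitudinal function is realized by many coordinated behaviors at once.

```python
from dataclasses import dataclass


def lerp(low, high, t):
    """Linearly interpolate from `low` (t = 0) to `high` (t = 1)."""
    return low + (high - low) * t


@dataclass
class ImmediacyCues:
    distance_m: float       # conversational distance; smaller is more immediate
    gaze_fraction: float    # share of time spent gazing at the user
    forward_lean_deg: float
    smile_intensity: float  # 0 = neutral, 1 = broad smile
    gestures_per_min: float


def immediacy_cues(liking):
    """Map a hypothetical liking level in [0, 1] to immediacy cues.

    Out-of-range inputs are clamped. All ranges are illustrative placeholders.
    """
    t = min(max(liking, 0.0), 1.0)
    return ImmediacyCues(
        distance_m=lerp(1.5, 0.8, t),
        gaze_fraction=lerp(0.3, 0.7, t),
        forward_lean_deg=lerp(0.0, 10.0, t),
        smile_intensity=lerp(0.0, 0.8, t),
        gestures_per_min=lerp(4.0, 12.0, t),
    )
```

Driving all the cues from one value keeps them consistent with each other. Setting each cue independently would make a mixed signal (say, a broad smile at a cold distance) easy to produce by accident.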

While it is easiest to think of the occurrence (versus non-occurrence) of a conversational behavior as achieving a given function, conversational functions are often achieved by the manner in which a given behavior is performed. For example, a gentle rhythmic gesture communicates a very different affective state or interpersonal attitude compared to a sharp exaggerated gesture. Further, while a given conversational behavior may be used primarily to affect a single function, it can usually be seen to achieve functions from several (if not all) of the categories listed here. A well-told conversational story can communicate information, transition a conversation into a new topic, convey liking and appreciation of the listener, explicate the speaker's current emotional state, and serve to increase trust between the speaker and listener.
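The point that manner, not just occurrence, carries function can be sketched by letting the same behavior request vary only in how it is performed. In this hypothetical Python example, a single arousal value in [0, 1] stands in for the agent's affective state. The manner parameters (amplitude, stroke duration, smoothness) and their ranges are illustrative assumptions rather than an established animation model. Low arousal yields the small, slow, smooth strokes of a "gentle rhythmic" gesture. High arousal yields the large, fast, abrupt strokes of a "sharp exaggerated" one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manner:
    amplitude: float   # fraction of the agent's full reach, 0..1
    duration_s: float  # time taken by one gesture stroke
    smoothness: float  # 0 = abrupt onset and offset, 1 = fully eased


@dataclass(frozen=True)
class BehaviorRequest:
    behavior: str
    manner: Manner


def beat_gesture(arousal):
    """Request the same beat gesture, performed according to arousal in [0, 1].

    The behavior name never changes; only its manner does. Ranges are
    hypothetical placeholders.
    """
    a = min(max(arousal, 0.0), 1.0)
    return BehaviorRequest(
        behavior="beat_gesture",
        manner=Manner(
            amplitude=0.2 + 0.7 * a,
            duration_s=0.8 - 0.5 * a,
            smoothness=1.0 - a,
        ),
    )


gentle = beat_gesture(0.1)
sharp = beat_gesture(0.9)
```

A system that logged only which behaviors occurred would record these two requests as identical. This is one argument for representing manner explicitly.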


The Rules of Etiquette

Within this framework, rules of etiquette can be seen as those conversational behaviors that fulfill certain conversational functions. Emily Post would have us believe the primary purpose of etiquette is the explicit signaling of "consideration for the other," that is, that one's conversational partner is important and valued [9]. This suggests that these behaviors enact a certain type of attitudinal function. Etiquette rules often serve as coordination devices (for example, ceremonial protocols) and can be seen as enacting an interactional function. They can also be used to explicitly signal group membership or to indicate a desire to move a relationship in a given direction, in which case they are fulfilling a relational function. Each of these functions has been (partially) explored in existing ECA systems.

Is etiquette, especially as enacted in nonverbal behavior, important in all kinds of human-computer interactions? Probably not. However, for tasks more fundamentally social in nature, the rules of etiquette and the affordances of nonverbal behavior can certainly have an impact. Several studies of mediated human-human interaction have found that the additional nonverbal cues provided by video-mediated communication do not affect performance in task-oriented interactions, but in interactions of a more relational nature, such as getting acquainted, video is superior [12]. These studies have found that for social tasks, interactions were more personalized, less argumentative, and more polite when conducted via video-mediated communication, that participants believed video-mediated (and face-to-face) communication was superior, and that groups conversing using video-mediated communication tended to like each other more, compared to audio-only interactions. The importance of nonverbal behavior is also supported by the intuition of business people who still conduct important meetings face-to-face rather than on the phone. It would seem that when a user is performing these kinds of social tasks with a computer, an ECA would have a distinct advantage over non-embodied interfaces.




Will users willingly engage in a social chat with an animated real estate agent or tell their troubles to a virtual coach? Evidence to date indicates the answer is, for the most part, yes. In the commercial arena, people have shown willingness to engage artifacts such as Tamagotchis, Furbies, and robotic baby dolls in ever more sophisticated and encompassing social interactions. Experience in the laboratory also indicates that users will not only readily engage in a wide range of social behavior appropriate to the task context, but that the computer's behavior will have the same effect on them as if they had been interacting with another person [3–5]. This trend seems to indicate a human readiness, or even need, to engage computational artifacts in deeper and more substantive social interactions.

Unfortunately, there is no cookbook defining all of the rules for human face-to-face interaction that human-computer interface practitioners can simply implement. However, many of the most fundamental rules have been codified in work by linguists, sociolinguists, and social psychologists (for example, [2]), and exploration that makes explicit use of these rules in work with ECAs and robotic interfaces has begun. By at least being cognizant of these rules, and at most by giving them explicit representation in system design, developers can build systems that are not only more natural, intuitive, and flexible to use, but that also result in better outcomes for many different kinds of tasks.



1. Andre, E., Muller, J. and Rist, T. The PPP persona: A multipurpose animated presentation agent. Advanced Visual Interfaces, (1996).

2. Argyle, M. Bodily Communication. Methuen, New York, 1988.

3. Bickmore, T. Relational agents: Effecting change through human-computer relationships. Media Arts & Sciences, MIT, Cambridge, MA, 2003.

4. Cassell, J. and Bickmore, T. Negotiated collusion: Modeling social language and its relationship effects in intelligent agents. User Modeling and Adaptive Interfaces 13 (1–2), 89–132.

5. Cassell, J., Sullivan, J., Prevost, S. and Churchill, E., Eds. Embodied Conversational Agents. The MIT Press, Cambridge, MA, 2000.

6. Cassell, J., Vilhjálmsson, H. and Bickmore, T. BEAT: The Behavior Expression Animation Toolkit. In Proceedings of ACM SIGGRAPH, (Los Angeles, CA, 2001), 477–486.

7. Lester, J., Towns, S., Callaway, C., Voerman, J. and Fitzgerald, P. Deictic and emotive communication in animated pedagogical agents. Embodied Conversational Agents. J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, Eds. MIT Press, Cambridge, MA, 2000.

8. Poggi, I. and Pelachaud, C. Performative facial expressions in animated faces. Embodied Conversational Agents. J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, Eds. MIT Press, Cambridge, MA, 2000, 155–188.

9. Post, E. Etiquette in Society, in Business, in Politics and at Home. Funk and Wagnalls, New York, 1922.

10. Rickel, J. and Johnson, W.L. Animated agents for procedural training in virtual reality: Perception, cognition and motor control. Applied Artificial Intelligence 13 (1999), 343–382.

11. Sidner, C., Lee, C. and Lesh, N. Engagement rules for human-computer collaborative interactions. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, (2003), 3957–3962.

12. Whittaker, S. and O'Conaill, B. The role of vision in face-to-face and mediated communication. Video-Mediated Communication. K. Finn, A. Sellen, and S. Wilbur, Eds. Lawrence Erlbaum, 1997, 23–49.



Timothy W. Bickmore is an assistant professor in the Medical Information Systems Unit of Boston University School of Medicine, Boston, MA.


Figure. REA interviewing a buyer.


Figure. BEAT annotated parse tree and its performance.


Figure. Laura and the MIT FitTrack system.

©2004 ACM  0002-0782/04/0400  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc.

