A nurse asks a patient to describe her symptoms. A fast-food worker greets a customer and asks for his order. A tourist asks a police officer for directions to a local point of interest.
For those with all of their physical faculties intact, each of these scenarios can be viewed as a routine occurrence of everyday life, as they are able to easily and efficiently interact without any assistance. However, each of these interactions is significantly more difficult when a person is deaf and must rely on sign language to communicate.
In a perfect world, a person who is well-versed in sign language would be available at all times and in all places to communicate with a deaf person, particularly in settings where there is a safety, convenience, or legal imperative to ensure real-time, accurate communication. However, it is exceptionally challenging, from both a logistical and cost perspective, to have a signer available at all times and in all places.
That's why, in many cases, sign language interpreting services are provided by Video Remote Interpreting, which uses a live interpreter that is connected to the person needing sign language services via a videoconferencing link. Institutions such as hospitals, clinics, and courts often prefer to use these services, because they can save money (interpreters not only bill for the actual translation service, but for the time and expenses incurred traveling to and from a job).
However, video interpreters sometimes do not match the accuracy of in-person interpreters, says Howard Rosenblum, CEO of the National Association of the Deaf (NAD), the self-described "premier civil rights organization of, by, and for deaf and hard of hearing individuals in the United States of America."
"This technology has failed too often to provide effective communications, and the stakes are higher in hospital and court settings," Rosenblum says, noting that "for in-person communications, sometimes technology is more of an impediment than a solution." Indeed, technical issues such as slow or intermittent network bandwidth often make the interpreting experience choppy, resulting in confusion or misunderstanding between the interpreter and the deaf person.
That's why researchers have been seeking a more effective technological tool to convert sign language to speech or text, which would let a deaf person communicate with a person who does not understand sign language. Similarly, there is a desire to deliver real-time speech or text to a person who is deaf, often through sign language, via a portable device that can be carried and used at any time.
However, sign languages, such as the commonly used American Sign Language (ASL), convey words, phrases, and sentences through a complex combination of hand movements and positions, which are then augmented by facial expressions and body gestures. The result is a complex communication system that requires a combination of sensors, natural language processing, speech recognition technology, and machine learning technology in order to capture and process words or phrases.
One system designed to allow people fluent in ASL to communicate with non-signers is SignAloud, which was developed in 2016 by a pair of University of Washington undergraduate students. The system consists of a pair of gloves that are designed to recognize the hand gestures that correspond to words and phrases used in ASL.
Worn by the signer, each glove is fitted with motion-capture sensors that record the position and movements of the hand wearing it, then send that data to a central computer via a wireless Bluetooth link. The data is fed through various sequential statistical regressions, which are similar to a neural network, for processing. When the data matches an ASL gesture, the associated word or phrase is spoken through a speaker. The idea is to allow for real-time translation of ASL into spoken English.
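The general flow described above (sensor readings in, a matched gesture out, a word spoken) can be illustrated with a minimal sketch. This is not SignAloud's method: it swaps the statistical regressions for simple nearest-template matching, and the four-channel sensor vectors, the `TEMPLATES` table, the `threshold` value, and the function names are all hypothetical.

```python
import math

# Hypothetical reference readings, one per gesture. A real glove would
# report many more channels (finger flex, acceleration, orientation).
TEMPLATES = {
    "hello": [0.9, 0.1, 0.1, 0.8],
    "thank you": [0.2, 0.7, 0.9, 0.1],
}


def match_gesture(frame, templates=TEMPLATES, threshold=0.3):
    """Return the nearest gesture label, or None if nothing is close enough."""
    label = min(templates, key=lambda name: math.dist(frame, templates[name]))
    return label if math.dist(frame, templates[label]) <= threshold else None


def translate(frames):
    """Turn a stream of sensor frames into words, ignoring a held gesture."""
    words, previous = [], None
    for frame in frames:
        label = match_gesture(frame)
        if label is not None and label != previous:
            # A real system would hand this word to a speech synthesizer.
            words.append(label)
        previous = label
    return words


stream = [
    [0.88, 0.12, 0.10, 0.79],  # close to "hello"
    [0.90, 0.10, 0.10, 0.80],  # same gesture still held
    [0.50, 0.50, 0.50, 0.50],  # transition, matches nothing
    [0.20, 0.72, 0.88, 0.10],  # close to "thank you"
]
print(translate(stream))
```

The threshold is what keeps transitional hand positions from being spoken as words; setting it too loosely would produce spurious output, and setting it too tightly would miss gestures signed with individual variation.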
Despite the promise of SignAloud, whose inventors received the $10,000 Lemelson-MIT Student Prize, there was significant criticism of the product from the deaf community, who complained that SignAloud did not capture the nuances of sign language, which relies on secondary signals such as eyebrow movements, shifts in the signer's body, and motions of the mouth to fully convey meaning and intent. Furthermore, strict word-for-word translations of ASL, as with other languages, often result in an inaccurate translation, as each language requires sentence structure and context in order to make sense.
That has not stopped other companies from developing similar products, such as the BrightSign Glove, developed by Hadeel Ayoub as a relatively inexpensive (pricing is expected to be in the hundreds-of-dollars range) way to allow two-way communication between those who sign and those who do not. BrightSign's technology is slightly different from SignAloud's; users record and name their own gestures to correspond with specific words or phrases, thereby ensuring that the lack of facial cues or body motions will not impact meaning. As a result, BrightSign users can take advantage of a 97% accuracy rate when using the gloves.
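The idea of users recording and naming their own gestures can be sketched as a small enrollment step followed by recognition. The following is an illustrative assumption, not BrightSign's implementation: the `PersonalGestureBook` class, the three-channel readings, and the `accuracy` helper are all hypothetical, and averaging recordings into one template is only one of many possible approaches.

```python
import math
from statistics import mean


class PersonalGestureBook:
    """Holds one averaged template per user-named gesture (hypothetical design)."""

    def __init__(self):
        self.templates = {}

    def enroll(self, name, samples):
        """Average several recordings of one gesture into a single template."""
        self.templates[name] = [mean(channel) for channel in zip(*samples)]

    def recognize(self, reading):
        """Return the enrolled name nearest to the reading, or None if empty."""
        if not self.templates:
            return None
        return min(
            self.templates,
            key=lambda name: math.dist(reading, self.templates[name]),
        )


def accuracy(book, labeled_readings):
    """Fraction of (reading, expected_name) pairs the book recognizes correctly."""
    correct = sum(book.recognize(r) == label for r, label in labeled_readings)
    return correct / len(labeled_readings)


book = PersonalGestureBook()
book.enroll("water", [[1.0, 0.0, 0.2], [0.8, 0.2, 0.2]])
book.enroll("help", [[0.0, 1.0, 0.9], [0.2, 0.8, 0.7]])

checks = [([0.85, 0.10, 0.30], "water"), ([0.10, 0.95, 0.80], "help")]
print(book.recognize([0.85, 0.10, 0.30]), accuracy(book, checks))
```

Because each template comes from the wearer's own recordings, the matching is tuned to that person's signing rather than to a generic model, which is the design choice the paragraph above describes.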
BrightSign is readying several versions of the glove for commercialization, including a version aimed at children, with a substantial wristband with its own embedded screen and audio output. Another version, targeted at the adult deaf community, can send translations directly to the wearer's smartphone, which can then enunciate the words or phrases.
The company says it has about 700 customers on its preorder list, and is trying to secure about $1.4 million in capital from investors, which would allow the company to fulfill all existing preorders.
Other tools are being developed to translate ASL to speech, although the complexity of ASL and other sign languages makes it difficult to handle these tasks in real time, which is needed to ensure smooth communication.
"There are several companies that are developing software and databases, including with the use of AI and machine learning, to create programs on computers that can 'read' a person that is signing in ASL," Rosenblum says, noting that these tools not only read hand-signing, but also capture facial cues and other visible markers. Using cameras to capture these signs and cues, the systems then use machine learning to identify and recognize specific movements and gestures, and then match them to specific words or phrases which can then be sent to a speech or text generator that can be read or heard by a non-signing individual.
"However, the challenge is that every person signs with their own flair and nuance, just like every person has a different sound or inflection on how they pronounce certain words," Rosenblum says. To manage the variances in the way people sign, videos of people signing must be input and processed by a machine learning algorithm to train the system to account for these stylistic variances. As such, the systems need lots of time and data in order to improve accuracy.
Another major issue is allowing people who don't sign to communicate in real time with those who do sign. One application that appears to be functioning well enough for some users to utilize today is Hand Talk. This app allows a non-signer to input words and phrases by speaking to the app located on a deaf person's phone. The app engine translates the words in real time into Libras, the sign language used in Brazil. Then, an animated avatar known as Hugo will begin signing on the deaf person's smartphone screen.
Unlike other apps that are using machine learning to train an algorithm, Hand Talk's founder Ronaldo Tenorio and his team program thousands of example sentences every month and match them with three-dimensional (3D) animations of sign language, including Hugo's facial expressions, which carry meaning in sign language. Improvements to the application are pushed out through regular app updates.
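A hand-curated approach of this kind can be pictured as a lookup from known phrases to prepared animations. The sketch below is an assumption for illustration only and does not describe Hand Talk's engine: the `ANIMATIONS` table, the clip identifiers, and the `to_clips` function are hypothetical, and matching in spoken word order is a simplification, since sign languages have their own sentence structure.

```python
# Hypothetical curated table: phrase (as a tuple of words) -> animation clip ID.
ANIMATIONS = {
    ("good", "morning"): "clip_good_morning",
    ("good",): "clip_good",
    ("how", "are", "you"): "clip_how_are_you",
}


def to_clips(sentence, table=ANIMATIONS):
    """Greedily map the longest known phrases in a sentence to animation clips."""
    words = sentence.lower().split()
    longest = max(len(key) for key in table)
    clips, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in table:
                clips.append(table[key])
                i += size
                break
        else:
            # No curated entry: flag the word so a human can add it later.
            clips.append(f"unknown:{words[i]}")
            i += 1
    return clips


print(to_clips("Good morning how are you"))
print(to_clips("good day"))
```

Preferring the longest match lets a curated phrase such as "good morning" play as one animation rather than as two unrelated signs, and the flagged unknown words suggest where new entries would be needed.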
According to the company, Hand Talk handles six million translations per month and has reached one million downloads, a figure equal to approximately one-sixth of Brazil's deaf population.
Still, for applications that will be useful across a wide range of languages, cultures, and situations, developers likely will need to use machine learning algorithms to learn all the possible variations, nuances, and cadences of conversational sign language. Further, ASL and other sign languages are very complex, with signs bleeding into one another, anticipating the shape or location of the following sign, which is similar to how some spoken sounds take on the characteristics of adjacent sounds. As such, Rosenblum says, "the capacity or development of computers being able to 'read' the zillions of variations of rendering ASL is extremely difficult and probably will take a decade to accomplish."
A key reason why even advanced technologies that use machine learning to train on and ingest the many variations of sign language do not work as seamlessly as a live signer is the lack of participation of deaf or hard-of-hearing people in the development process, which means key linguistic, stylistic, and usability concerns of signers are missed.
"That's a huge problem," Rosenblum says. "Several companies do have deaf and hard-of-hearing engineers, scientists, or other highly trained professionals, but this is more of an exception than the rule."
Perhaps the biggest reason why technology for the deaf is not as functional as it could be is the lack of regulatory requirements covering non-signer to signer communications, and vice versa. Improvements in accessibility within the television and video industries were driven by regulation, and may serve as an example of how real-time communications may eventually be regulated.
"For individuals with hearing loss, videos need captioning or a transcript of what is verbally communicated," says Nancy Kastl, Testing Practice Director at the digital technology consulting firm, SPR. "For individuals with vision loss, the captioning or transcript (readable by a screen reader) should include a description of the scenes or actions, if there are segments with music only or no dialogue."
Likewise, Rosenblum says that "many of the best advances in technology for deaf and hard of hearing people have been because laws demanded them," noting that the text and video relay systems provided by telecommunications companies were very basic and voluntary prior to the adoption of the Americans with Disabilities Act (ADA) of 1990.
Furthermore, the closed captioning of television content for the hearing impaired "in the original analog format was mandated by the Telecommunications Act of 1996, and expanded to digital access online through the 21st Century Communications and Video Accessibility Act of 2010, as well as by the lawsuit of NAD v. Netflix in 2012," Rosenblum says, noting that the suit required Netflix to ensure that 100% of its streaming content is made available with closed captions for the hearing impaired.
Cooper, H., Holt, B., and Bowden, R. Sign Language Recognition, Visual Analysis of Humans, 2011, http://info.ee.surrey.ac.Uk/Personal/H.M/research/papers/SLR-LAP.pdf
Why Sign Language Gloves Don't Help Deaf People, The Atlantic, November 9, 2017, https://www.theatlantic.com/technology/archive/2017/11/why-sign-language-gloves-dont-help-deaf-people/545441/
25 Basic ASL Signs For Beginners, American Sign Language Institute, Oct. 22, 2016, https://www.youtube.com/watch?v=Raa0vBXA8OQ
©2018 ACM 0001-0782/18/12