Over the last decade, smartphones have ushered in radical and disruptive changes to society and business. They have introduced entirely new ways to interact, consume media, shop, and even order and pay for food and beverages.
For all their sophisticated capabilities and features, however, today's smartphones really aren't all that smart. They ring, buzz, or beep while we're attending a classical music concert or sitting in church. They are unable to warn us when we exercise too strenuously. They send distracting or unwanted traffic alerts while we are driving or away on vacation.
The next step in the evolution of the smartphone is to make these devices more situationally and contextually aware. This means adding new and improved sensors, and pushing the boundaries further on artificial intelligence (AI).
"The goal is to develop smartphones that can sense the environment and make autonomous decisions using AI," says Yusuf Aytar, a post-doctoral research associate at the Massachusetts Institute of Technology (MIT).
Although image recognition, speech processing, motion detection, and other technologies continue to advance—and alter the way people use apps and devices—getting to the point where a phone or other device can sense the surrounding environment like a human is no simple task. "Right now, a lot of sensing functions take place individually. We are not yet at the point where phones can combine data sources and make good decisions," Aytar says.
All of this is likely to change over the next decade. As manufacturers load smartphones with sensors that gauge velocity and direction of movement, ambient temperature, lighting levels, images, sound, and more, the devices will gain greater sensing capabilities. Combined with algorithms that can transform data points into perceptions, these sensors will make phones more human-like in their decision making.
"These sensing capabilities will assist people in their daily lives by providing appropriate contextual information or situational awareness, thus changing the device output behavior based on particular events and user status," says Frank A. Shemansky, Jr., chief technology officer of the MEMS & Sensors Industry Group, a SEMI Strategic Association Partner.
At a basic level, such sensing capabilities could allow a phone to switch off its ringer automatically when a person boards a train or is running to catch a flight. It might even auto-dial 9-1-1 if a person falls off a ladder. More advanced sensing might also provide an early warning for the onset of diminished cognitive function resulting from dehydration or fatigue, which would be particularly valuable in high-risk occupations, as well as for fighter pilots and other military applications.
"The key," Shemansky says, "is to figure out which inputs correlate to something of value."
One of the challenges in building smarter sensing systems is packing enough processing power on board. "Every time you hit the cloud, there is latency. For many real-time events, latency cannot be tolerated," says Vivienne Sze, an assistant professor in the Department of Electrical Engineering and Computer Science at MIT.
Sze and a research team at MIT are taking aim at the issue. They have developed a chip called Eyeriss that accommodates numerous types of convolutional neural networks on board. These neural nets can be used for applications such as object recognition, speech processing, and facial detection. The system reduces data movement by placing the computational inputs closer to the multiplier. Eyeriss is about 10 times more efficient than a mobile graphics processing unit (GPU) at running AI algorithms, and it consumes less power through highly efficient data flow, including skipping unnecessary calculations. Local processing also increases privacy, Sze adds.
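To make the idea of "skipping unnecessary calculations" concrete, here is a minimal software sketch. It is not the Eyeriss design, which is implemented in hardware; it assumes that an unnecessary calculation is a multiplication in which one operand is zero, and it uses a hypothetical function name, conv1d_skip_zeros, and made-up input values purely for illustration.

```python
def conv1d_skip_zeros(inputs, weights):
    """Sliding-window multiply-accumulate that skips products with a zero operand.

    Illustrative only: the assumption here is that a multiplication by zero
    is an "unnecessary calculation" because it cannot change the sum.
    """
    n_out = len(inputs) - len(weights) + 1
    outputs = [0.0] * n_out
    performed = 0  # multiplications actually carried out
    skipped = 0    # multiplications avoided because an operand was zero

    for i in range(n_out):
        acc = 0.0
        for k, w in enumerate(weights):
            x = inputs[i + k]
            if x == 0 or w == 0:
                skipped += 1
                continue
            acc += x * w
            performed += 1
        outputs[i] = acc

    return outputs, performed, skipped


# Made-up activations with several zeros, as is common after a ReLU layer.
activations = [0.0, 2.0, 0.0, 0.0, 1.0, 3.0]
kernel = [1.0, -1.0, 0.5]

outputs, performed, skipped = conv1d_skip_zeros(activations, kernel)
print(outputs, performed, skipped)
```

In this toy input, more than half of the 12 possible multiplications are avoided without changing the result. The sketch only counts skipped work; it does not model data movement or energy use.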
Others are pushing the boundaries on digital sensory perception. For example, Aytar and his team in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) are using AI to identify a given situation based on sound; this includes everything from ocean waves and thunder to sirens and crying babies. In 2016, the group fed 26 terabytes of data from Flickr into a neural network and added audio clips to build a database that has achieved up to a 93% recognition rate compared to humans.
The system, called SoundNet, could ultimately be used to automatically route calls to voicemail if a person is in a movie theater or attending a live theatrical event (or, it could push calls or messages through if there's a sense of urgency in the caller's voice or choice of words). The same technology—combined with image recognition and Global Positioning System (GPS) capabilities—could also enhance robots, drones, and autonomous vehicles as they attempt to sense nearby events and adjust their behaviors accordingly.
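The call-routing behavior described above can be sketched as a small decision rule. This is an assumption-laden illustration, not SoundNet's interface: the scene labels, the confidence and urgency scores, the thresholds, and the names Context and route_call are all hypothetical, standing in for whatever a sound-recognition model and a voice-analysis step would actually report.

```python
from dataclasses import dataclass

# Hypothetical scene labels in which a ringing phone would be unwelcome.
QUIET_SCENES = {"movie_theater", "live_theater"}


@dataclass
class Context:
    scene: str               # label from a (hypothetical) sound classifier
    scene_confidence: float  # classifier confidence, 0.0 to 1.0
    caller_urgency: float    # estimated urgency of the caller, 0.0 to 1.0


def route_call(ctx, min_confidence=0.8, urgency_threshold=0.7):
    """Return "ring" or "voicemail" for an incoming call.

    The thresholds are illustrative assumptions, not measured values.
    """
    # An urgent call is pushed through regardless of the setting.
    if ctx.caller_urgency >= urgency_threshold:
        return "ring"
    # Otherwise, silence the call only when the quiet scene is detected
    # with enough confidence.
    if ctx.scene in QUIET_SCENES and ctx.scene_confidence >= min_confidence:
        return "voicemail"
    return "ring"


print(route_call(Context("movie_theater", 0.9, 0.1)))
print(route_call(Context("movie_theater", 0.9, 0.95)))
print(route_call(Context("street", 0.9, 0.1)))
```

Checking urgency first means a confident "movie theater" detection never blocks an urgent call, which mirrors the exception the article describes.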
Further out, Aytar says the goal is to develop systems that use multiple modalities to sense things in broader and deeper ways. A device might detect a person's emotional state through pulse, perspiration, or breathing and, based on external factors such as light, sound, or temperature, adjust decisions accordingly, Aytar explains.
"The end goal is to have an alignment of data across different modalities. That's when we can build systems that approximate or even extend beyond human senses."
Samuel Greengard is an author and journalist based in West Linn, OR.