Facial recognition is no longer a challenge for computer vision systems. The technology is now widely used in daily activities such as looking at your phone to unlock it or checking your identity at airports. However, identifying family members from their faces is a much more difficult task for artificial intelligence (AI).
"It's a type of face recognition problem where the faces are not actually that similar," explains Yosi Keller, associate professor and co-founder of the Deep Learning Lab at Bar-Ilan University in Ramat Gan, Israel. "The similarity is there, but I would say it is very implicit and well-hidden."
An AI system that can pick out a person's relatives from images could have many potential uses. Kinship recognition systems could help identify children being exploited online who may not be pictured in image databases, but whose family members might be. They could also help reunite refugee families that have been separated. "Using low-cost security cameras at different camps, we could connect families back together," says Joseph Robinson, a kinship recognition researcher at Northeastern University in Boston who presented his work at the Conference on Computer Vision and Pattern Recognition (CVPR 2019) in Long Beach, CA, last June.
Social media apps and photo-sharing sites could also use the technology to sort photographs containing family members. It could also help scholars studying historical lineages, or be used to identify long-lost relatives on ancestry sites.
Computer vision researchers have been working on the problem for over 20 years. Initially, researchers manually selected the facial features to compare, such as eye color or nose shape. However, it was difficult to get accurate results, since common characteristics are not always obvious and vary from family to family.
Now, deep learning using convolutional neural networks (CNNs) can be harnessed to do a better job.
"We are asking the machine to find the best features which discriminate family members," says Abdenour Hadid, an adjunct professor of computer vision at the University of Oulu in Finland. "We give our machine a lot of examples of family members and non-family members so that it can learn."
Compared to standard facial recognition, there are many more confounding factors in identifying relatives. For instance, picking out relatives of a different gender adds an extra layer of complexity. A child is also easier to identify if images of both parents are available, rather than just one. According to Robinson, age difference seems to be the biggest obstacle. "We've noticed that little children tend to be failure cases because as people get older, they look more like their parents did," he says.
Robinson and his colleagues found their system performs best when they have photos of grown-up children and images of their parents at the age at which they had children. The researchers found that in some cases, they could mitigate age gaps by including images of additional family members (for example, a grandfather and his grandchild are easier to match when an image of the child's parent is also provided).
Kinship recognition systems are improving, and can now perform well when good-quality images are provided. However, CNNs learn their own features, so researchers don't know exactly what the networks rely on to achieve the task. "It's basically a black box," says Hadid.
In early kinship recognition work, not understanding how the algorithms worked proved to be problematic. Early family image databases used group photos of all family members. A recent paper showed that CNNs seemed to be able to identify family members with over 90% accuracy, but that they were cheating to complete the task. "Algorithms were just learning to identify [face] crops coming from the same image," says Keller. "It meant that our algorithm was biased, and the same goes for all the other papers that used old school data sets."
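One safeguard this finding points to is discarding any pair whose two face crops were cut from the same photograph, so that a model cannot score well simply by recognizing that two crops share a source image. The Python sketch below is illustrative only; the `FaceCrop` record, its fields, and the `cross_photo_pairs` helper are hypothetical names, not taken from any of the papers or data sets mentioned here.

```python
from typing import List, NamedTuple, Tuple


class FaceCrop(NamedTuple):
    """Hypothetical record describing one cropped face."""
    person_id: str  # who is pictured
    photo_id: str   # which source photograph the crop was cut from


Pair = Tuple[FaceCrop, FaceCrop]


def cross_photo_pairs(pairs: List[Pair]) -> List[Pair]:
    """Keep only pairs whose two crops come from different photographs."""
    return [(a, b) for a, b in pairs if a.photo_id != b.photo_id]


# Toy data: a father and daughter cropped from one group photo,
# plus a separate portrait of the daughter.
father_group = FaceCrop("father", "group_photo_1")
daughter_group = FaceCrop("daughter", "group_photo_1")
daughter_solo = FaceCrop("daughter", "portrait_7")

candidate_pairs = [
    (father_group, daughter_group),  # same photo: dropped
    (father_group, daughter_solo),   # different photos: kept
]
clean_pairs = cross_photo_pairs(candidate_pairs)
```

With the toy data above, only the pair built from two different photographs survives the filter.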
A dataset called Families in the Wild (FIW), created by Robinson and his team, is now the gold standard for automatic kinship recognition research. It doesn't suffer from the group photo issue and is the largest, most comprehensive family image dataset available, with the latest release containing about 50,000 faces. The images are arranged into over a million face pairs representing different relationships, such as father-daughter and sister-sister. CNNs use these pairs to determine whether the two people pictured are blood relatives or not, a task called kinship verification.
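To make the verification task concrete, here is a minimal Python sketch of one common way to frame it. It assumes a CNN has already turned each face into a numeric feature vector (an embedding), then compares two vectors and applies a threshold. The cosine-similarity comparison, the threshold of 0.5, and the toy vectors are all assumptions made for illustration; this is not the method used by Robinson's team.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_kinship(emb_a: Sequence[float], emb_b: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Declare 'kin' when the embeddings are similar enough.

    The threshold is an assumed value; in practice it would be
    tuned on held-out face pairs.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Made-up three-dimensional embeddings, standing in for CNN output.
father = [0.9, 0.1, 0.3]
daughter = [0.8, 0.2, 0.4]
stranger = [-0.2, 0.9, -0.5]

is_kin_pair = verify_kinship(father, daughter)      # True
is_stranger_kin = verify_kinship(father, stranger)  # False
```

Real systems learn the embedding so that relatives land close together; the comparison step itself can stay this simple.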
The FIW dataset is key to improving the performance of kinship recognition systems. The collection of images contains photos of families from different countries, from America to China, so it's a good representation of real families worldwide. It's also large enough to fuel data-driven models. "Now we're at the point where we're going to start framing more real-world problems, to aid in more practical applications such as large-scale search and retrieval," says Robinson. "Three years ago, it would be unthinkable."
In a paper published last year, Robinson and his colleagues showed that their fine-tuned CNN algorithm could identify family members better than people can. The work used an older version of FIW, with over 13,000 family photos making up 1,000 family trees, and their system surpassed human ability to recognize kin by about 15%.
Kinship recognition has typically focused on still images, but it is now being extended by using different types of data, such as video and voice recordings. Hadid and his colleagues showed that family relationships can be identified more easily from videos, since they provide information about how a person moves, which is often similar to how their relatives move. "They also show gestures, the way you are looking, the way you are smiling, the way you are moving your eyes," says Hadid. "So this gives you additional information and video works better than images."
Hadid and his team recently started exploring whether audio recordings of family members' voices could be used for kin recognition. There are usually similarities in the way parents and children speak, for example, and the researchers are trying to learn to detect the common attributes. In preliminary tests, voice information seems to give similar results to static images of faces.
Combining different sources of data, such as faces and voices, should pinpoint kin relationships most effectively, however. Having multiple types of data can help compensate if one source is not reliable. And certain types of data may be preferable in different cases; for example, facial images would provide better-quality data in a noisy train station, while voices would be a better source of information in a dark, quiet place.
Some families may have similar faces but not voices, while the opposite may be the case for other families. With that in mind, greater consideration should be given to certain sources of data, depending on the situation. "We have to find the best way to adapt our combination to every case," says Hadid. "This is the challenge."
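One simple way to picture this kind of case-by-case combination is a weighted average of a face-based score and a voice-based score, where each weight reflects how trustworthy that source is in the current setting. The Python sketch below is only an illustration under that assumption; the `fuse_scores` function, the reliability weights, and all the numbers are hypothetical and do not describe the approach Hadid's team is developing.

```python
def fuse_scores(face_score: float, voice_score: float,
                face_reliability: float, voice_reliability: float) -> float:
    """Weighted average of two kinship scores in the range 0 to 1.

    The reliability weights are assumed inputs, for example
    estimated from image sharpness or background noise level.
    """
    total = face_reliability + voice_reliability
    if total <= 0:
        raise ValueError("at least one source must have positive reliability")
    return (face_reliability * face_score
            + voice_reliability * voice_score) / total


# Noisy train station: trust the face, mostly ignore the voice.
station_score = fuse_scores(face_score=0.8, voice_score=0.3,
                            face_reliability=0.9, voice_reliability=0.1)

# Dark, quiet room: trust the voice, mostly ignore the face.
dark_room_score = fuse_scores(face_score=0.2, voice_score=0.9,
                              face_reliability=0.1, voice_reliability=0.9)
```

In both toy cases the fused score stays close to the more reliable source, which is the behavior the combination is meant to capture.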
Sandrine Ceurstemont is a freelance science writer based in London, U.K.