The ability to examine data in deep and revealing ways has fundamentally changed the world. Yet, despite enormous advances in analytics and machine learning, researchers, businesses, and governments remain hamstrung by a basic but nagging problem: much of that data is sensitive, and sharing it presents security and privacy risks.
Although numerous technologies, such as data tokenization, k-anonymity, homomorphic encryption, and Trusted Execution Environments (TEEs), have helped tame the beast, they do not address the fundamental problem that sensitive data is crossing undesired boundaries. Consequently, many organizations hesitate to share data that could yield remarkable insights. The problem is especially acute when organizations have trade secrets, customer data, or personally identifiable information (PII) embedded in data.
An emerging artificial intelligence (AI) framework called Federated AI Learning could change all of this while ushering in broader changes in the way data is used and owned. The technology has major repercussions for machine learning, security, and privacy. "It represents the future of secure machine learning," says Eliano Marques, vice president of data and AI at data security firm Protegrity.
Rethinking the Model
Federated AI takes the conventional idea of machine learning and turns it on its head. Instead of multiple groups sending data to a central cloud where the machine learning takes place, the algorithm travels to the computing device. All training is performed on the client or device, and when training is finished, the algorithm exits the device, taking only the results (the updated model parameters) with it. "There is no data sharing," says Jigar Mody, head of AI services at Oracle.
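To make the round trip concrete, here is a minimal sketch of federated averaging (FedAvg), one common way to combine client results. The article does not name a specific aggregation algorithm, so treat this as an illustration under stated assumptions: the linear model, learning rate, round counts, and the helper names `local_train`, `federated_average`, `run_rounds`, and `make_client` are all hypothetical. Each client trains on its own private (x, y) pairs and returns only its updated parameters and a sample count; the server never sees the raw records.

```python
import random

# Hypothetical setup: each "client" holds private (x, y) pairs that never leave it.
# The shared model is an illustrative 1-D linear model y ~ w * x + b.


def local_train(weights, data, lr=0.05, epochs=20):
    """Run plain gradient descent on one client's private data.

    Only the updated parameters and the sample count are returned;
    the raw data stays inside this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Average client parameters, weighted by each client's sample count (FedAvg-style)."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)


def run_rounds(clients, rounds=30):
    weights = (0.0, 0.0)  # global model the server sends out each round
    for _ in range(rounds):
        # The model travels to each client; training happens locally.
        updates = [local_train(weights, data) for data in clients]
        # Only parameters come back; the server never sees (x, y) pairs.
        weights = federated_average(updates)
    return weights


def make_client(rng, size):
    """Generate one hypothetical client's private samples of y = 2x + 1 plus noise."""
    xs = [rng.uniform(-1, 1) for _ in range(size)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical organizations with different amounts of private data.
    clients = [make_client(rng, size) for size in (50, 80, 30)]
    w, b = run_rounds(clients)
    print(f"learned w={w:.2f}, b={b:.2f}")  # should land close to w=2, b=1
```

The weighting by sample count keeps a small client from pulling the shared model as hard as a large one. Production frameworks typically layer further protections on top of this basic exchange, which this sketch omits.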
In fact, after the machine learning process takes place, the output is the only common asset. This makes it possible for a group of companies, research institutes, or government agencies to combine the value of their data and extract results securely, without pooling the raw records. As a result, it is easier to protect trade secrets and PII. It also is possible to push machine learning beyond conventional boundaries and open up new opportunities. For example, because each dataset is processed where it resides, the technology can perform computations on huge datasets independently, minimizing problems associated with bandwidth and slow connections.
Although the framework is only a few years old (it was conceived by a group of researchers in 2017), Federated AI Learning is advancing rapidly. Google has introduced TensorFlow Federated, an open source framework that supports federated machine learning. Several other open source libraries have emerged, including PySyft, PyTorch, and Federated AI Technology Enabler (FATE). Meanwhile, startups such as S20.ai, Owkin, and Snips have introduced commercial solutions to support Federated AI.
The technology is gaining traction, particularly in the healthcare arena. For example, researchers at the New York City-based Mount Sinai Health System recently used the technique to analyze electronic health records and better predict how COVID-19 patients will progress, without compromising patient privacy. In the U.K., King's College London and the National Health Service (NHS) are using the technology to study patient data about cancer, heart disease, and neurological problems.
Federated AI Learning is particularly valuable for researchers attempting to study rare conditions, says Nicola Rieke, senior deep learning solution architect for Healthcare at NVIDIA. "Federated Learning could revolutionize how AI models are trained."
Eye on Privacy
The value of Federated AI Learning extends far beyond healthcare. It can help pinpoint fraud and study black holes using data from multiple telescopes. Apple already uses it to improve Siri.
Federated AI Learning could fundamentally change law enforcement and address areas such as fraud, human trafficking, and terrorism, says Marques. By establishing a framework that simplifies data collection and analysis—and removing the need to transmit highly sensitive data—a group of organizations or agencies could connect data points and solve crimes. "The technology would identify suspicious patterns and identify people when there's a red flag," he explains.
Oracle's Mody says the technology could fundamentally reshape consumer privacy. With the right Federated AI software running on a PC or smartphone, an individual could consent to sharing data with a company—or not. What's more, as data regulations such as Europe's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) expand to other countries and states, the technology may provide a way to simplify data security by making it easier for consumers to own their data.
For now, a few obstacles remain. Federated AI algorithms and training models are still maturing, models and methodologies are not yet consistent, and the limited processing power of many edge devices can constrain training and inference. Nevertheless, the concept of decentralizing AI and machine learning is gaining momentum. Says Mody: "This is an extremely promising technology that will shape the future of AI."
Samuel Greengard is an author and journalist based in West Linn, OR, USA.