Swarm Learning – Shared Ethically without Privacy Breach

by | May 31, 2022 | Machine Learning / Artificial Intelligence

MIT Technology Review Insights, in association with Hewlett Packard Enterprise, presented a podcast on August 16, 2021, in which host Laurel Ruma, director of MIT Insights, interviews Dr. Eng Lim Goh, VP and CTO of AI at Hewlett Packard Enterprise. Their conversation, “A New Age of Data Means Embracing the Edge,” is available on the MIT Technology Review website.[1]

Dr. Goh opens with the thought, “The world will shift from where we have centralized data to where we are comfortable with data being everywhere.” This is based on the statistic that by 2022, there will be over 50 billion devices connected at the edge, generating massive amounts of data. The question: how do we keep all that collected data secure but still share what is learned from it? Dr. Goh's answer: swarm learning.

Dr. Goh explains the shift of AI's center of gravity from centralized to decentralized data. Applying this to hospitals, he frames the sharing of medical information in terms of a neural network model: hospitals share the insights learned from patterns of illness without sharing the private patient data.

Sharing of findings is an important part of advancing science. … Some organizations started with natural language processing (NLP) tools, asking specific questions rather than searching by keyword to get answers from the corpus of documents. NLP scans documents for specific areas of critical information without personal data. … A Nature publication presented Swarm Learning for Decentralized and Confidential Clinical Machine Learning. … There are two steps: first, each hospital uses machine learning to learn from its own patient data; then blockchain is brought in to collect what each hospital has learned, without the patient data, average it, and send the updated, globally combined average learnings back to the hospitals.
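The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation discussed in the podcast: the "model" is a toy linear fit, the function names (`local_update`, `average_weights`, `swarm_round`) and the synthetic hospital data are hypothetical, and the blockchain layer is left out entirely. Only the learned parameters are passed around; the patient records never leave the hospital that holds them.

```python
def local_update(weights, data, lr=0.1, epochs=50):
    """Step 1: one hospital fits y = w*x + b on its own private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def average_weights(all_weights):
    """Step 2: combine the learnings by averaging each parameter."""
    n = len(all_weights)
    dim = len(all_weights[0])
    return tuple(sum(ws[i] for ws in all_weights) / n for i in range(dim))


def swarm_round(global_weights, hospital_datasets):
    """One cycle: every hospital trains locally, then only weights are averaged."""
    local = [local_update(global_weights, data) for data in hospital_datasets]
    return average_weights(local)


def mse(weights, data):
    """Mean squared error of the linear model on a dataset."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def make_data(xs):
    # Synthetic stand-in for private records; the true relation is y = 2x + 1.
    return [(x, 2.0 * x + 1.0) for x in xs]


# Each hospital sees a different slice of the input range (hypothetical data).
hospital_a = make_data([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
hospital_b = make_data([0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

weights = (0.0, 0.0)
for _ in range(20):
    weights = swarm_round(weights, [hospital_a, hospital_b])
```

After the rounds finish, `mse(weights, hospital_a + hospital_b)` can be compared with the error of the starting weights to check that the shared model improved, even though neither hospital ever saw the other's data.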

The purpose of blockchain is to guard the security of data collection results. Dr. Goh explains the two reasons why blockchain is used in swarm learning:

We keep that information private because, in a private blockchain, only participants, main participants, or certified participants are allowed in this blockchain. Even if the blockchain is compromised, all that is seen are the weights or the parameters of the learnings, not the private patient data, because the private patient data are not in the blockchain.

The second reason is to avoid having a central custodian who collects the parameters of learning. To ensure equitable sharing, we use blockchain to randomly appoint one of the participants as the collector, or leader, who averages the parameters to be sent out. In each cycle, a new leader is appointed.
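The rotating-leader idea can be sketched as follows. This is a simplified illustration, assuming a seeded pseudo-random choice as a stand-in for the blockchain's appointment mechanism, which the podcast does not describe in detail. The names `pick_leader` and `run_cycles` are hypothetical, and the local retraining that would normally happen between cycles is omitted.

```python
import random


def pick_leader(participants, rng):
    """Choose this cycle's collector uniformly at random.

    Stand-in for the blockchain's appointment step (an assumption).
    """
    return rng.choice(participants)


def run_cycles(local_weights, cycles, seed=0):
    """Run several cycles; each cycle a randomly chosen leader averages the weights."""
    rng = random.Random(seed)
    # Work on a copy so the caller's dictionary is not modified.
    state = {name: list(ws) for name, ws in local_weights.items()}
    participants = sorted(state)
    history = []
    for _ in range(cycles):
        leader = pick_leader(participants, rng)
        dim = len(state[leader])
        # The leader averages every participant's parameters...
        averaged = [
            sum(state[p][i] for p in participants) / len(participants)
            for i in range(dim)
        ]
        # ...and sends the combined result back to all participants.
        for p in participants:
            state[p] = list(averaged)
        history.append(leader)
    return history, state
```

Because the leader is drawn afresh in every cycle, no single participant permanently holds the collector role, which is the property the second reason is concerned with.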

What shared learning does through blockchain is reduce bias that might be found from one organization to another.  Dr. Goh uses the example of lung disease – one hospital might have biases in its report on tuberculosis while a second hospital will have biases in its reports of lung collapse.  Swarm learning will average the two reports without the biases. 

Dr. Goh explains how the growing number of devices is forcing the use of edge computing.

We’re coming to a point where we have an average of about 10 connected devices collecting data per person. Given that situation, the center of gravity will be at the edge in terms of where data is generated. This will change dynamics tremendously for enterprises. We will see that these devices can’t afford to bring their data to and from the cloud. Data growth will far exceed the growth in bandwidth, so intelligence will have to move to the edge to decide what data is moved to the cloud and what is not. It’s going to be a new age. The world will shift from data being centralized to data being everywhere. That’s when communication, collaboration, and learning will happen peer-to-peer.

Fundamentally, we want to learn from each other. What swarm learning does is avoid the sharing of data by sharing the insights of learning instead, which is more secure; the neural network weights are the shared learning. To do this, we encrypt the weights and average them so that anyone trying to hack the system will not be able to deduce any personal data from the learning.
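The podcast summary does not say which encryption scheme is used, so the sketch below swaps in a different, simpler technique to illustrate the same goal: pairwise additive masking, a basic form of secure aggregation. Each pair of participants adds and subtracts the same random mask, so an individual masked update reveals little on its own, yet the masks cancel when everything is averaged. The function names and the single shared random generator are simplifying assumptions; in practice each pair would derive its mask from a secret known only to the two of them.

```python
import random


def masked_updates(weights_by_party, seed=0):
    """Hide each party's weights behind pairwise masks that cancel in the sum."""
    rng = random.Random(seed)  # assumption: stands in for pairwise shared secrets
    parties = sorted(weights_by_party)
    dim = len(weights_by_party[parties[0]])
    masked = {p: list(weights_by_party[p]) for p in parties}
    for i, a in enumerate(parties):
        for b in parties[i + 1:]:
            mask = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for k in range(dim):
                masked[a][k] += mask[k]  # party a adds the mask
                masked[b][k] -= mask[k]  # party b subtracts the same mask
    return masked


def average(updates):
    """Average the updates parameter by parameter."""
    values = list(updates.values())
    return [sum(column) / len(values) for column in zip(*values)]
```

Averaging the output of `masked_updates` gives the same result as averaging the raw weights, up to floating-point rounding, while the collector only ever handles the masked values.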

This lets organizations grow the pool of data they learn from while safely sharing their insights.


[1] https://www.technologyreview.com/2021/08/16/1031738/a-new-age-of-data-means-embracing-the-edge/