Federated Learning: Sharing Knowledge Instead of Data

by DarwinAI

Dr. Javad Shafiee

Federated learning is a relatively new machine learning technique in which an AI model is trained across several independent devices, each holding its own local data samples, without those devices sharing data with other participants. In other words, training data is neither collected from nor exchanged between the devices. The model benefits from what each device learns without anyone having to gather the data itself, which preserves the privacy of each party. Through our work with enterprises using AI, we’ve seen that federated learning shows promise in a number of fields, and it is gaining particular traction in manufacturing and healthcare.

High value for practical applications

In a manufacturing context, if a single inspection unit detects a new kind of flaw, the ability to detect that flaw can be shared with each unit in the factory or across factories without having to manage the flow and maintenance of the data. This helps keep any one site from lacking sufficient training data. Additionally, a unit or facility that has not yet seen a specific flaw can still be ready to act the first time it appears. Importantly, in today’s regulated industries around the world, global manufacturers can address concerns about privacy, confidentiality and competitive data that must be safeguarded.

For healthcare applications, the ability to detect a disease can be shared between units or hospitals without compromising patients’ private data; it also improves access to the best possible diagnostic tools and expertise in different geographical locations. With emerging governance policies and regulations, this is a top priority for industry today.

How it works

Each device trains a local version of the model on its own data. It then shares just the model information (not the data) with a central server, which aggregates the model information from all participating devices to update a global model. This updated global model is then shared with all devices and the process repeats.

The process starts by registering all devices provided by the parties willing to participate in the learning and establishing secure communication between the server and each participant. The server then shares the latest available model with the clients and waits to receive newly tuned models from them. Each client fine-tunes the shared model on the data available to it and sends the new model back to the server. The key insight of federated learning lies in how the server then aggregates the knowledge learned by all clients, in the form of the received models, and generates a new model with higher performance. This process repeats until it reaches a predefined criterion or converges.
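The round-based loop described above can be sketched in a few lines of Python. This is a minimal illustration, not DarwinAI's implementation: it assumes a tiny linear model with two parameters, plain gradient descent on each client, and a weighted average of client models on the server (the approach commonly called federated averaging). The names local_update, aggregate and federated_training, the learning rate and the stopping tolerance are all hypothetical choices made for the example, and device registration and secure communication are left out.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Client side: fine-tune a copy of the shared model [slope, intercept]."""
    w, b = weights
    for _ in range(epochs):
        # Gradients of the mean squared error of y ~ w * x + b on local data.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def aggregate(client_weights: List[List[float]],
              client_sizes: List[int]) -> List[float]:
    """Server side: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def federated_training(clients: List[Dataset], rounds: int = 50,
                       tol: float = 1e-6) -> List[float]:
    """Repeat share -> local training -> aggregation until the model settles."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only model parameters travel to the server, never the raw data.
        updates = [local_update(global_weights, data) for data in clients]
        new_weights = aggregate(updates, [len(d) for d in clients])
        change = max(abs(a - b) for a, b in zip(new_weights, global_weights))
        global_weights = new_weights
        if change < tol:  # hypothetical convergence criterion
            break
    return global_weights
```

For example, calling federated_training with two clients whose private points all lie on the line y = 2x + 1 returns weights close to [2.0, 1.0], even though the server never sees a single data point.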

Go lean on resource management

You can get all the benefits of each device’s experience without having to spend resources transferring and storing all of the data on every other device. In addition to the data privacy benefits, this approach removes the need to store data from all clients on the server, which can reduce storage requirements drastically. Moreover, because the data never has to be transferred to the server, the need for high-bandwidth transmission and communication is significantly lessened.

Ensure essential security and privacy

When only the learning (and not the data) is transferred between devices, many new data-driven applications become possible, because they can use what was learned from an individual’s data without that data ever being shared.

This allows companies that do not wish to share sensitive data explicitly, but do want to use it to assist open source projects, to do so with limited risk.

However, if the project involves other actors (e.g. an open source project), you should do your due diligence to make sure everyone is participating in good faith. It is possible that someone could intentionally share a faulty model that could be damaging. Several risk mitigation measures have been proposed in this area, and they can be incorporated to guard against this type of attack and make sure the collaboration does not bring any harm to your pipeline.
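To make the risk concrete, the sketch below compares a plain average of client models with a coordinate-wise median, one commonly discussed robust aggregation approach. The article does not name a specific mitigation, so the choice of the median is an assumption made for illustration, as are the function names mean_aggregate and median_aggregate and the sample numbers.

```python
from statistics import median
from typing import List


def mean_aggregate(client_weights: List[List[float]]) -> List[float]:
    """Plain average of each parameter; a single bad model can skew it."""
    n = len(client_weights)
    return [sum(column) / n for column in zip(*client_weights)]


def median_aggregate(client_weights: List[List[float]]) -> List[float]:
    """Coordinate-wise median; far less sensitive to a few outlier models."""
    return [median(column) for column in zip(*client_weights)]


# Hypothetical round: three honest clients and one deliberately faulty model.
honest = [[2.0, 1.0], [2.1, 0.9], [1.9, 1.1]]
poisoned = honest + [[100.0, -100.0]]

print(mean_aggregate(poisoned))    # dragged far away from the honest models
print(median_aggregate(poisoned))  # stays close to the honest models
```

The median only helps while faulty participants are a minority, so it complements due diligence on who joins the collaboration rather than replacing it.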

If you are debating what kind of technique you want to use in your facility, federated learning offers a relatively lightweight option that yields excellent results and provides a level of security and privacy of data for you and your clients. 
