A Systems Approach to Tackling Fairness in Federated Learning
Due to the growing computational power of end-user devices (e.g., mobile phones), coupled with concerns over transmitting private information, it is increasingly attractive to keep data on the devices that generate it and push the model training task to those devices, a machine learning setting commonly known as Federated Learning (FL).
In this FL setting, clients that own different sets of data sharing a common structure collaboratively learn a global model without ever transmitting the data itself. Although there is no single unified architecture, the training process typically involves the following steps, or a variant of them (a minimal sketch follows the list):
- The clients pull a copy of the current up-to-date global model from a centralized FL server.
- The clients perform a number of local optimization steps.
- The clients protect all or a subset of the updated local model parameters using encryption, differential privacy, or a secret-sharing technique.
- The clients push the protected model update to the central server hosting the global model.
- The central server performs secure aggregation of the local models pushed by the clients.
- The central server updates the current state of the global model.
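As a minimal illustration of one communication round, the Python sketch below simulates these steps. The toy linear objective, the constants (`DIM`, `NUM_CLIENTS`, `LOCAL_STEPS`, `LR`), and the pairwise additive masks (a simple stand-in for a real secure-aggregation protocol) are all illustrative assumptions, not part of this project.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10           # toy model size (illustrative)
NUM_CLIENTS = 4    # illustrative
LOCAL_STEPS = 5    # illustrative
LR = 0.1           # illustrative

def local_update(global_model, x, y):
    """Steps 1-2: pull the global model and run a few local SGD steps
    on a toy linear least-squares objective."""
    w = global_model.copy()
    for _ in range(LOCAL_STEPS):
        grad = x.T @ (x @ w - y) / len(y)
        w -= LR * grad
    return w

# Per-client datasets: a common structure, different local distributions.
datasets = [(rng.normal(size=(20, DIM)), rng.normal(size=20))
            for _ in range(NUM_CLIENTS)]
global_model = np.zeros(DIM)

# --- One training round ---
local_models = [local_update(global_model, x, y) for x, y in datasets]

# Step 3: protect the updates with pairwise additive masks that cancel
# in the sum -- a toy stand-in for a real secure-aggregation protocol.
protected = [m.copy() for m in local_models]
for i in range(NUM_CLIENTS):
    for j in range(i + 1, NUM_CLIENTS):
        mask = rng.normal(size=DIM)  # secret shared by clients i and j
        protected[i] += mask
        protected[j] -= mask

# Steps 4-6: clients push the protected updates; the server aggregates
# them (the masks cancel, revealing only the average) and updates the
# global model.
global_model = np.mean(protected, axis=0)
assert np.allclose(global_model, np.mean(local_models, axis=0))
```

Note that the masks cancel only if every client that added a mask also reports its update; handling dropouts is exactly what makes real secure-aggregation protocols more involved than this sketch.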
In this project, we explore the inherent problem of bias (or unfairness) that results from the heterogeneous infrastructure of Federated Learning. In the FL setting, the computation, network, and storage resources of each client are heterogeneous, because client devices employ different hardware (CPU, memory), accelerators (GPU, AI chips), and network connectivity (WiFi, 5G). Moreover, system-level constraints (e.g., network size, resource utilization, or power consumption) can prevent all clients from participating, so only a small fraction of the clients may be active in any training round. In addition, the system cannot expect every client to complete its round successfully, as any active client may drop out due to unpredictable events such as loss of connectivity or power.
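To make partial participation and dropout concrete, here is a small Python sketch of client selection for one round. The `PARTICIPATION_FRACTION` and `DROPOUT_PROB` values and the `run_round` helper are hypothetical, chosen only for illustration.

```python
import random

random.seed(0)
NUM_CLIENTS = 100
PARTICIPATION_FRACTION = 0.1  # illustrative: ~10% of clients active per round
DROPOUT_PROB = 0.2            # illustrative: chance an active client fails

def run_round(clients):
    # System-level constraints limit each round to a small sample of clients.
    active = random.sample(clients, int(PARTICIPATION_FRACTION * len(clients)))
    # Any active client may drop out (connectivity or power loss), so the
    # server can only aggregate the updates that actually arrive.
    completed = [c for c in active if random.random() > DROPOUT_PROB]
    return active, completed

active, completed = run_round(list(range(NUM_CLIENTS)))
print(f"selected {len(active)} clients, {len(completed)} completed the round")
```

If the same clients are systematically excluded or dropped round after round (e.g., those with slow hardware or poor connectivity), their data is underrepresented in the aggregate, which is precisely the source of bias studied here.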
These system-level constraints and this resource heterogeneity can bias the model towards certain clients over others. The challenge, therefore, is to design FL methods that tolerate heterogeneous hardware resources, cope with low levels of client participation, and remain resilient to the sudden loss of clients during training.