CoLExT

Collaborative Learning Experimentation Testbed

CoLExT is a Collaborative Learning Testbed built for machine learning researchers to realistically execute and profile Federated Learning (FL) algorithms on real edge devices and smartphones. To facilitate experiments, CoLExT comes with a software library that seamlessly deploys and monitors code compatible with the Flower Framework.

Although there is a vast body of research on FL, most of it is conducted in simulated environments rather than on real devices. Simulations can overlook the limitations of real devices, which may make some algorithms unusable in a real deployment. That’s where this project comes in: our aim is to provide one of the first Collaborative Learning Testbeds where researchers and practitioners can easily deploy, execute and profile their cross-device FL algorithms on real devices.

At the moment, the testbed is only accessible at KAUST. However, we plan to open-source our system design and setup so that others can build their own testbeds, or even connect their testbeds to ours and expand the device pool.

Current CoLExT devices

Current setup

  • 28 Single Board Computers (SBCs)
    • Orange Pi 5B
    • LattePanda Delta 3
    • NVIDIA Jetson - AGX Orin, Orin Nano, Xavier NX, Nano
  • 20 Smartphones
    • Samsung - XCover 6 Pro, Galaxy M54
    • Xiaomi - 12, Poco X5 Pro
    • Google Pixel 7
    • ASUS ROG Phone 6
    • OnePlus Nord 2T 5G

Interacting with CoLExT

We’ve designed the system with simplicity in mind. Our current setup assumes that researchers have code that runs on the Flower framework.

Below we describe the steps to interact with the testbed.

  1. Install our Python package (in development)

    $ python3 -m pip install colext
    
  2. Add the CoLExT decorators to your Flower client and strategy classes

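    import flwr
    from colext import MonitorFlwrClient, MonitorFlwrStrategy  # assumed import path

    @MonitorFlwrClient
    class YourFlowerClient(flwr.client.NumPyClient):
        ...  # your existing client implementation

    @MonitorFlwrStrategy
    class YourFlowerStrategy(flwr.server.strategy.Strategy):
        ...  # your existing strategy implementation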
    

    Note: The CoLExT decorators have no effect outside the CoLExT environment.

  3. Write a configuration file

     job_name: "SOTA FL experiment"
    
     code: 
       # Path can be omitted if colext_config.yaml is in the root of the project
       path: "/home/fl_researcher/sota_fl_algorithm/"
       client:
         entrypoint: "client.py" 
         args: "--flserver_address=${COLEXT_SERVER_ADDRESS}"
       server: 
         entrypoint: "server.py"
         args: "--num_clients=${COLEXT_N_CLIENTS} --num_rounds=2"
    
     devices:
       - { device_type: JetsonOrinNano, count: 4 }
       - { device_type: OrangePi5B,     count: 6 }
    
  4. Place a pip requirements.txt file in the root of the code path

  5. Launch the experiment using a command from the CoLExT package

     $ colext_launch_job --config <path-to-config>
    
  6. Monitor performance metrics and logs on a Grafana dashboard

  7. Retrieve the metrics once the job finishes

     $ colext_get_metrics --job_id <job-id>
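Step 2 notes that the decorators do nothing outside the CoLExT environment. One way a class decorator can behave like this is sketched below under stated assumptions: a hypothetical `COLEXT_ENV` environment variable signals that the code runs on the testbed, only `fit` is timed, and durations go to an in-memory list. `monitor_client_sketch` is an illustrative name, not CoLExT's `MonitorFlwrClient`, and a stand-in class replaces `flwr.client.NumPyClient` so the sketch runs without Flower.

```python
import functools
import os
import time

# Hypothetical in-memory sink; a real testbed would ship samples to a metrics store.
FIT_DURATIONS = []


def monitor_client_sketch(cls):
    """Hypothetical stand-in for a CoLExT-style client decorator."""
    # Outside the testbed (hypothetical COLEXT_ENV variable unset), do nothing.
    if "COLEXT_ENV" not in os.environ:
        return cls

    original_fit = cls.fit

    @functools.wraps(original_fit)
    def timed_fit(self, parameters, config):
        start = time.perf_counter()
        result = original_fit(self, parameters, config)
        # Record the wall-clock time spent in local training.
        FIT_DURATIONS.append(time.perf_counter() - start)
        return result

    cls.fit = timed_fit
    return cls


class DummyClient:
    """Stand-in for a flwr.client.NumPyClient subclass."""

    def fit(self, parameters, config):
        # NumPyClient.fit returns (parameters, num_examples, metrics).
        return parameters, len(parameters), {}


# Outside the testbed, the decorator returns the class unchanged.
os.environ.pop("COLEXT_ENV", None)
LocalClient = monitor_client_sketch(DummyClient)

# Inside the testbed, fit() is wrapped so every call is timed.
os.environ["COLEXT_ENV"] = "1"


@monitor_client_sketch
class MonitoredClient(DummyClient):
    pass


MonitoredClient().fit([1.0, 2.0], {})
print(len(FIT_DURATIONS))  # → 1
```

In this sketch the environment is checked when the class is defined, so the same client file can run unchanged on a laptop and on the testbed.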
    
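The `devices` list and the `${...}` placeholders from step 3 can be illustrated with a short sketch. It assumes, hypothetically, that one client runs per requested device, so `COLEXT_N_CLIENTS` equals the summed `count` values, and it fills `COLEXT_SERVER_ADDRESS` with a made-up address. The YAML is mirrored as a Python dict to avoid a YAML dependency; `expand_clients` and `render_args` are illustrative helpers, not part of the CoLExT package.

```python
from string import Template

# The step 3 configuration, mirrored as a dict (a real tool would parse the YAML).
config = {
    "job_name": "SOTA FL experiment",
    "code": {
        "path": "/home/fl_researcher/sota_fl_algorithm/",
        "client": {"entrypoint": "client.py",
                   "args": "--flserver_address=${COLEXT_SERVER_ADDRESS}"},
        "server": {"entrypoint": "server.py",
                   "args": "--num_clients=${COLEXT_N_CLIENTS} --num_rounds=2"},
    },
    "devices": [
        {"device_type": "JetsonOrinNano", "count": 4},
        {"device_type": "OrangePi5B", "count": 6},
    ],
}


def expand_clients(devices):
    """Assumption: one client per requested device; returns (client_id, device_type)."""
    clients = []
    for entry in devices:
        for _ in range(entry["count"]):
            clients.append((len(clients), entry["device_type"]))
    return clients


def render_args(args, env):
    """Substitute ${VAR} placeholders; raises KeyError if a variable is missing."""
    return Template(args).substitute(env)


clients = expand_clients(config["devices"])
env = {
    "COLEXT_N_CLIENTS": str(len(clients)),
    "COLEXT_SERVER_ADDRESS": "10.0.0.1:8080",  # hypothetical address
}
server_args = render_args(config["code"]["server"]["args"], env)
client_args = render_args(config["code"]["client"]["args"], env)
print(server_args)  # → --num_clients=10 --num_rounds=2
```

Using `Template.substitute` (rather than `safe_substitute`) means an unknown placeholder raises `KeyError`, surfacing configuration typos before any device is touched.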

Behind the scenes

When interacting with the testbed, the following happens behind the scenes:

  • Automatic containerization
    • Each experiment runs in its own isolated environment
    • Containerization is automatic, hiding the differences between hardware architectures and operating systems
  • Deploying and orchestrating experiments
    • For SBCs:
      • The testbed forms a Kubernetes cluster
      • Kubernetes orchestrates the deployment of clients and server
    • For Smartphones:
      • Deployment is managed using Android Debug Bridge
      • Orchestration is handled directly from the colext package
  • Automatic performance metric collection
    • User code does not need to handle performance metric collection
    • Metrics are collected by hooking into the Flower API and are tagged with the round number
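Tagging metrics with the round number is possible because, in Flower 1.x, `Strategy.configure_fit` and `Strategy.aggregate_fit` both receive the current `server_round`. The sketch below wraps those two methods to log start and end timestamps per round. `tag_rounds_sketch` and `ROUND_EVENTS` are hypothetical names, this is not CoLExT's actual `MonitorFlwrStrategy`, and a stand-in class replaces a real Flower strategy so the sketch runs without Flower installed.

```python
import functools
import time

# Hypothetical in-memory sink of (server_round, event, timestamp) tuples.
ROUND_EVENTS = []


def _round_wrapper(method, event):
    """Log the start and end of a Strategy method, tagged with its server_round."""

    @functools.wraps(method)
    def wrapper(self, server_round, *args, **kwargs):
        ROUND_EVENTS.append((server_round, event + "_start", time.time()))
        result = method(self, server_round, *args, **kwargs)
        ROUND_EVENTS.append((server_round, event + "_end", time.time()))
        return result

    return wrapper


def tag_rounds_sketch(cls):
    """Hypothetical strategy decorator: wrap the round-aware fit hooks in place."""
    for name in ("configure_fit", "aggregate_fit"):
        setattr(cls, name, _round_wrapper(getattr(cls, name), name))
    return cls


@tag_rounds_sketch
class DummyStrategy:
    """Stand-in for a flwr.server.strategy.Strategy subclass."""

    def configure_fit(self, server_round, parameters, client_manager):
        return []  # a real strategy returns (ClientProxy, FitIns) pairs

    def aggregate_fit(self, server_round, results, failures):
        return None, {}  # a real strategy returns (Parameters or None, metrics)


strategy = DummyStrategy()
for rnd in (1, 2):  # Flower numbers server rounds from 1
    strategy.configure_fit(rnd, None, None)
    strategy.aggregate_fit(rnd, [], [])

print([(r, e) for r, e, _ in ROUND_EVENTS[:2]])
# → [(1, 'configure_fit_start'), (1, 'configure_fit_end')]
```

Because the wrapper names `server_round` as its first argument after `self`, it works whether the caller passes the round positionally or as a keyword.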

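Deployment through the Android Debug Bridge can be pictured as a sequence of `adb` calls per phone, each targeting one device with `-s <serial>`. The sketch below only builds the command lines and executes them when `dry_run=False`; the serials, APK path, package and activity names are hypothetical placeholders, and CoLExT's actual deployment steps for its Kotlin client may differ.

```python
import subprocess


def adb_deploy_commands(serial, apk_path, package, activity):
    """Build the adb invocations that install and launch an app on one phone."""
    base = ["adb", "-s", serial]  # -s targets a single device by serial number
    return [
        base + ["install", "-r", apk_path],  # -r replaces an existing install
        base + ["shell", "am", "force-stop", package],  # stop any previous run
        base + ["shell", "am", "start", "-n", f"{package}/{activity}"],  # launch
    ]


def deploy(serials, apk_path, package, activity, dry_run=True):
    """Return every planned command; execute them only when dry_run is False."""
    planned = []
    for serial in serials:
        for cmd in adb_deploy_commands(serial, apk_path, package, activity):
            planned.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop on the first failing call
    return planned


# Hypothetical serials, APK and component names.
plan = deploy(["PHONE01", "PHONE02"], "fl_client.apk",
              "org.example.flclient", ".MainActivity")
for cmd in plan:
    print(" ".join(cmd))
```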
Current limitations

  • No automatic code containerization for smartphones
    • Custom Kotlin code is required
    • On smartphones, the only supported ML framework is TensorFlow Lite
  • Smartphones and SBCs cannot be used in the same experiment.
    • The two device categories use different serialization, which prevents this
  • Dataset management
    • Currently, datasets must always be downloaded and partitioned
    • One way to do this is with Flower Datasets
    • We plan to remove this limitation shortly
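Since datasets currently have to be downloaded and partitioned within each experiment, every client needs a reproducible way to pick its own shard. Flower Datasets provides partitioners for this; the stdlib-only sketch below just illustrates the idea behind a deterministic IID split, in which all clients shuffle the sample indices with the same seed and keep disjoint slices. `iid_partition` and the seed value are illustrative assumptions.

```python
import random


def iid_partition(num_samples, num_clients, client_id, seed=42):
    """Return the sample indices assigned to client_id under an IID split."""
    if not 0 <= client_id < num_clients:
        raise ValueError("client_id out of range")
    indices = list(range(num_samples))
    # Same seed on every client -> same permutation everywhere, so slices never overlap.
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first clients so shard sizes differ by at most one.
    base, extra = divmod(num_samples, num_clients)
    start = client_id * base + min(client_id, extra)
    size = base + (1 if client_id < extra else 0)
    return indices[start:start + size]


# Example: 10 clients (as in the step 3 configuration) sharing 1003 samples.
shards = [iid_partition(1003, 10, cid) for cid in range(10)]
print([len(s) for s in shards])  # → [101, 101, 101, 100, 100, 100, 100, 100, 100, 100]
```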

Future work

The testbed development has only just begun. We currently have a total of 48 devices, but we envision scaling the FL testbed to over 100 devices. This number could grow even further if we merge our FL testbed with similar testbeds at other institutions.

This project benefited tremendously from the contributions by: Janez Bozic, Amandio Faustino, Veljko Pejovic, Marco Canini, Boris Radovic, Suliman Alharbi, Abdullah Alamoudi, Rasheed Alhaddad.

Marco Canini
Associate Professor of Computer Science

My current interest is in designing better systems support for AI/ML and providing practical implementations deployable in the real world.