FEDn Projects

A FEDn project is a convention for packaging machine learning code to be used for federated learning with FEDn. At its core, a project is a directory of files (often a Git repository) containing your machine learning code, FEDn entry points, and a specification of the runtime environment (a Python environment or a Docker image). The FEDn API and command-line tools provide functionality that helps a user automate the deployment and management of a project that follows these conventions.

Overview

We recommend that projects have roughly the following folder and file structure:

project
├ client
│ ├ fedn.yaml
│ ├ python_env.yaml
│ ├ data.py
│ ├ model.py
│ ├ train.py
│ └ validate.py
├ data
│ └ mnist.npz
├ README.md
├ scripts / notebooks
└ Dockerfile / docker-compose.yaml

The “client” folder is referred to as the compute package. The file fedn.yaml is the FEDn Project File. It tells the FEDn client which code entry points to execute when computing model updates (local training) and, optionally, when validating models. When the project is deployed to FEDn, the client folder is compressed into a .tgz bundle and uploaded to the FEDn controller. FEDn then manages the distribution of the compute package to each client/data provider when it connects. Upon receipt of the bundle, the client unpacks it and stages it locally.

Compute package overview

The figure above provides a logical view of how FEDn uses the compute package (client folder). When the FEDn client (fedn.network.clients) receives a model update request, it calls a Dispatcher that looks up the entry point definitions for the compute package in the FEDn Project File.

FEDn Project File (fedn.yaml)

FEDn uses a project file named ‘fedn.yaml’ to specify which entry points to execute when the client receives a training or validation request, and which environment to execute those entry points in.

python_env: python_env.yaml

entry_points:
    startup:
        command: python data.py
    train:
        command: python train.py
    validate:
        command: python validate.py
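
As a rough illustration of how such a file can drive execution, the sketch below resolves an entry point name to an argument list. It is not FEDn's Dispatcher: the PROJECT dictionary stands in for the parsed fedn.yaml, and build_command is a hypothetical helper introduced here for illustration only.

```python
import shlex

# Stand-in for the parsed contents of fedn.yaml (same keys as the file above).
PROJECT = {
    "python_env": "python_env.yaml",
    "entry_points": {
        "startup": {"command": "python data.py"},
        "train": {"command": "python train.py"},
        "validate": {"command": "python validate.py"},
    },
}


def build_command(project, name, *args):
    """Return the argument list for entry point `name` (hypothetical helper)."""
    entry_points = project.get("entry_points", {})
    if name not in entry_points:
        raise KeyError(f"entry point '{name}' is not defined in the project file")
    # Split the configured command and append the run-time arguments.
    return shlex.split(entry_points[name]["command"]) + [str(a) for a in args]


if __name__ == "__main__":
    print(build_command(PROJECT, "train", "model_in", "model_out"))
```

The resulting list could be handed to subprocess.run with the unpacked compute package as the working directory. How FEDn actually does this internally is not shown here.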

Environment

The software environment used to execute the entry points. It should specify all client-side dependencies of the project. FEDn currently supports Virtualenv environments, with packages from PyPI. When a project specifies a python_env, the FEDn client creates an isolated virtual environment and installs the project dependencies into it before starting up the client.

Entry Points

Up to four entry points can be specified.

Build Entrypoint (build, optional):

This entry point is usually called once to build artifacts such as initial seed models. However, it is not limited to artifacts and can be used for any kind of setup that needs to be done before the client starts up.

Startup Entrypoint (startup, optional):

This entry point is called once, immediately after the client starts up and the environment has been initialized. It can be used for runtime configuration of the local execution environment. For example, in the quickstart tutorial, the startup entry point invokes a script that downloads the MNIST dataset and creates a partition to be used by that client. This is a convenience for automating experiments, and not all clients will specify such a script.

Training Entrypoint (train, mandatory):

This entry point is invoked every time the client receives a new model update request. The training entry point must be a single-input single-output (SISO) program. FEDn invokes it as follows:

python train.py model_in model_out

where ‘model_in’ is the file containing the current global model to be updated, and ‘model_out’ is the path to write the new model update to. Download and upload of these files are handled automatically by the FEDn client. The user only specifies how to read and parse the data contained in them (see examples).

Validation Entrypoint (validate, optional):

The validation entry point works in a similar way to the training entry point. It can be used to specify how a client should validate the current global model on local test/validation data. It should read a model update from file, validate it (in any way suitable to the user), and write a JSON file containing validation data:

   python validate.py model_in validations.json

The validate entrypoint is optional.
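
A minimal sketch of that file contract, using only the standard library, might look as follows. The evaluate function and its metric names are placeholders made up for illustration. A real project would load the model and score it on local data, and may prefer a FEDn helper for writing the report.

```python
import json
import sys


def evaluate(model_path):
    """Placeholder for project-specific evaluation (hypothetical).

    A real project would load the model from model_path and score it on
    local test/validation data. Fixed dummy values are returned here so
    that the file handling can be shown in isolation.
    """
    return {"test_loss": 0.0, "test_accuracy": 0.0}


def validate(in_model_path, out_json_path):
    """Evaluate the model at in_model_path and write the metrics as JSON."""
    report = evaluate(in_model_path)
    with open(out_json_path, "w") as fh:
        json.dump(report, fh)
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Invoked as: python validate.py model_in validations.json
    validate(sys.argv[1], sys.argv[2])
```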

Example train entry point

Below is an example training entry point taken from the PyTorch getting started project.

import math
import os
import sys

import torch
from data import load_data
from model import load_parameters, save_parameters

from fedn.utils.helpers.helpers import save_metadata

dir_path = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.abspath(dir_path))


def train(in_model_path, out_model_path, data_path=None, batch_size=32, epochs=1, lr=0.01):
    """ Complete a model update.

    Load model parameters from in_model_path (managed by the FEDn client),
    perform a model update, and write updated parameters
    to out_model_path (picked up by the FEDn client).

    :param in_model_path: The path to the input model.
    :type in_model_path: str
    :param out_model_path: The path to save the output model to.
    :type out_model_path: str
    :param data_path: The path to the data file.
    :type data_path: str
    :param batch_size: The batch size to use.
    :type batch_size: int
    :param epochs: The number of epochs to train.
    :type epochs: int
    :param lr: The learning rate to use.
    :type lr: float
    """
    # Load data
    x_train, y_train = load_data(data_path)

    # Load parameters and initialize model
    model = load_parameters(in_model_path)

    # Train
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    n_batches = int(math.ceil(len(x_train) / batch_size))
    criterion = torch.nn.NLLLoss()
    for e in range(epochs):  # epoch loop
        for b in range(n_batches):  # batch loop
            # Retrieve current batch
            batch_x = x_train[b * batch_size:(b + 1) * batch_size]
            batch_y = y_train[b * batch_size:(b + 1) * batch_size]
            # Train on batch
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            # Log
            if b % 100 == 0:
                print(
                    f"Epoch {e}/{epochs-1} | Batch: {b}/{n_batches-1} | Loss: {loss.item()}")

    # Metadata needed for aggregation server side
    metadata = {
        # num_examples are mandatory
        'num_examples': len(x_train),
        'batch_size': batch_size,
        'epochs': epochs,
        'lr': lr
    }

    # Save JSON metadata file (mandatory)
    save_metadata(metadata, out_model_path)

    # Save model update (mandatory)
    save_parameters(model, out_model_path)


if __name__ == "__main__":
    train(sys.argv[1], sys.argv[2])

The input and output files (model updates) are stored as numpy ndarrays. A helper instance, fedn.utils.helpers.plugins.numpyhelper, handles the serialization and deserialization of the model updates. The first function (_compile_model) defines the model architecture and creates an initial model (which is then used by _init_seed). The second function (_load_data) reads the data (train and test) from disk. The third function (_save_model) saves the model to disk using the numpy helper module fedn.utils.helpers.plugins.numpyhelper. The fourth function (_load_model) loads the model from disk, again using the numpy helper module. The fifth function (_init_seed) initializes the seed model. The sixth function (_train) trains the model. Note its first two arguments, which are set by the FEDn client. The seventh function (_validate) validates the model. Again, note the first two arguments, which are set by the FEDn client.

Packaging for distribution

To deploy a project to FEDn (Studio or pseudo-local), we simply compress the client folder as a .tgz file, either with the fedn command-line tool or manually:

fedn package create --path client
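
For the manual route, one possible approach is Python's standard tarfile module, sketched below. The helper names create_package and list_package are illustrative, and the archive layout (files stored under the folder's own name) is an assumption. Compare against an archive produced by the command above before relying on it.

```python
import tarfile
from pathlib import Path


def create_package(client_dir, output="package.tgz"):
    """Compress client_dir into a gzip-compressed tar archive."""
    client_dir = Path(client_dir)
    if not client_dir.is_dir():
        raise NotADirectoryError(f"{client_dir} is not a directory")
    with tarfile.open(str(output), "w:gz") as tar:
        # Assumption: members are stored under the folder's own name,
        # e.g. client/fedn.yaml. Adjust arcname if FEDn expects otherwise.
        tar.add(str(client_dir), arcname=client_dir.name)
    return Path(output)


def list_package(package):
    """Return the sorted member names of an existing package archive."""
    with tarfile.open(str(package), "r:gz") as tar:
        return sorted(tar.getnames())
```

Calling list_package on the result is a quick way to confirm that fedn.yaml and the entry point scripts made it into the bundle.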

The created file, package.tgz, can then be uploaded to the FEDn network using fedn.network.api.client.APIClient.set_package().

More on local data access

There are many possible ways to interact with the local dataset. In principle, the only requirement is that the train and validate entry points are able to correctly read and use the data. In practice, it is then necessary to make some assumptions about the local environment when writing the entry point scripts. This is best explained by looking at the code above. Here we assume that the dataset is present in a file called “mnist.npz” in a folder “data” one level up in the file hierarchy relative to where the entry point scripts are executed. Then, regardless of the preferred way to run the client (native, Docker, K8s etc.), this structure needs to be maintained for this particular compute package. Note, however, that there are many ways to accomplish this on a local operational level.
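
One way to make that assumption explicit is to resolve the data path in a single place, as in the sketch below. The resolve_data_path helper and the PROJECT_DATA_PATH environment variable are hypothetical names chosen for illustration, not FEDn configuration. The fallback mirrors the folder layout described above.

```python
import os
from pathlib import Path

# Hypothetical environment variable for overriding the default location.
DATA_ENV_VAR = "PROJECT_DATA_PATH"


def resolve_data_path(script_file, environ=None):
    """Return the dataset path for an entry point script.

    An explicit override in the environment wins. Otherwise the dataset is
    expected at ../data/mnist.npz relative to the script's own folder.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(DATA_ENV_VAR)
    if override:
        return Path(override)
    return Path(script_file).parent.parent / "data" / "mnist.npz"
```

An entry point could call resolve_data_path(__file__) and pass the result on to its data loading code, so that native, Docker, and Kubernetes deployments differ only in how the folder or the variable is provided.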

Testing the entry points before deploying the package to FEDn

We recommend that you test your code before deploying it to FEDn for distribution to clients. You can conveniently test train and validate by running:

python train.py ../seed.npz ../model_update.npz --data_path ../data/mnist.npz
python validate.py ../model_update.npz ../validation.json --data_path ../data/mnist.npz
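
The commands above pass a --data_path option, whereas the example training entry point shown earlier forwards only its two positional arguments. One possible way to accept the option is sketched below with the standard argparse module. The parse_args helper is an illustrative addition and not part of FEDn.

```python
import argparse


def parse_args(argv=None):
    """Parse the SISO arguments plus an optional --data_path override."""
    parser = argparse.ArgumentParser(description="FEDn entry point (sketch)")
    parser.add_argument("model_in", help="path of the model to read")
    parser.add_argument("model_out", help="path to write the result to")
    parser.add_argument("--data_path", default=None,
                        help="optional path to the local dataset")
    return parser.parse_args(argv)


# Possible use at the bottom of train.py:
#     args = parse_args()
#     train(args.model_in, args.model_out, data_path=args.data_path)
```

Because --data_path defaults to None, an invocation with only model_in and model_out still parses without it.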

Once everything works as expected you can start the federated network, upload the .tgz compute package and the initial model (use fedn.network.api.client.APIClient.set_initial_model() for uploading an initial model).