User Guide

This is a comprehensive guide to deploying ML projects to Kubernetes using Bodywork. It assumes that you understand the key concepts that Bodywork is built upon and that you have worked through the Quickstart Tutorials.

Deployment Project Structure

Bodywork-compatible ML projects need to be structured in a specific way. All the files necessary for defining a stage must be contained within a directory dedicated to that stage. The directory name defines the name of the stage. This enables the Bodywork workflow-controller to identify the stages and run them in the desired order. Consider the following example directory structure,

 |-- prepare-data/
     |-- requirements.txt
     |-- config.ini
 |-- train-svm/
     |-- requirements.txt
     |-- config.ini
 |-- train-random-forest/
     |-- requirements.txt
     |-- config.ini
 |-- choose-model/
     |-- requirements.txt
     |-- config.ini
 |-- model-scoring-service/
     |-- requirements.txt
     |-- config.ini
 |-- bodywork.ini

Here we have five directories given names that relate to the ML tasks contained within them. There is also a single workflow configuration file, bodywork.ini. Each directory must contain the following files:

  • An executable Python module that contains all the code required for the stage. For example, the module in prepare-data should be capable of performing all data preparation steps when executed from the command line using python.
  • requirements.txt: lists the 3rd party Python packages required by the executable Python module. This must follow the format required by Pip.
  • config.ini: contains the stage configuration, which is discussed in more detail below.
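
The layout rules above can be checked before pushing a project to Git. The following is a minimal sketch, not part of Bodywork: the function name find_structure_problems is hypothetical, and it assumes that every non-hidden directory in the project root is a stage and that each stage's executable module is a .py file.

```python
from pathlib import Path

# Files that every stage directory must contain, per the rules above.
REQUIRED_STAGE_FILES = ("requirements.txt", "config.ini")


def find_structure_problems(project_root):
    """Return a list of layout problems; an empty list means none were found.

    Hypothetical helper: assumes every non-hidden directory in the project
    root is a stage directory.
    """
    project_root = Path(project_root)
    problems = []
    if not (project_root / "bodywork.ini").is_file():
        problems.append("missing bodywork.ini in project root")
    stage_dirs = sorted(
        path for path in project_root.iterdir()
        if path.is_dir() and not path.name.startswith(".")
    )
    for stage_dir in stage_dirs:
        for filename in REQUIRED_STAGE_FILES:
            if not (stage_dir / filename).is_file():
                problems.append(f"{stage_dir.name}: missing {filename}")
        # Assumption: the executable module is a .py file in the stage directory.
        if not any(stage_dir.glob("*.py")):
            problems.append(f"{stage_dir.name}: no Python module found")
    return problems


# Example usage (path is illustrative):
#   for problem in find_structure_problems("/path/to/my-classification-product"):
#       print(problem)
```

Returning a list of problems, rather than raising on the first one, makes it possible to report every issue in a single pass.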

Running Tasks in Remote Python Environments

Bodywork ML pipeline

Bodywork projects must be packaged as Git repositories (e.g. on GitHub), which will be cloned by Bodywork when executing workflows. When the Bodywork workflow-controller executes a stage, it starts a new Python-enabled container in your Kubernetes cluster and instructs it to pull the required directory from your project's Git repository. It then installs any 3rd party Python package requirements, before running the executable Python module.

Configuring Workflows

All configuration for a workflow is contained within the bodywork.ini file, which must exist in the root directory of your project's Git repository. An example bodywork.ini file for the project structure in the example above could be,


DAG=prepare-data >> train-svm, train-random-forest >> choose-model >> model-scoring-service


Each configuration parameter is used as follows:

This will be used to identify all Kubernetes resources deployed for this project.
The container image to use for remote execution of Bodywork workflows and stages. This should be set to bodyworkml/bodywork-core:latest, which will be pulled from DockerHub.
DAG: a description of the workflow structure - the stages to include in each step of the workflow - this will be discussed in more detail below.
LOG_LEVEL: must be one of: DEBUG, INFO, WARNING, ERROR or CRITICAL. Manages the types of log message to stream to the workflow-controller's standard output stream (stdout).
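
To illustrate how such a file could be read and sanity-checked, here is a minimal sketch using Python's standard configparser module. The section names [workflow] and [logging] and the function name read_workflow_settings are assumptions made for this example, not confirmed by this guide; only the DAG and LOG_LEVEL parameters and the permitted log levels are taken from the text above.

```python
import configparser

# Permitted values for LOG_LEVEL, as listed above.
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}

# Assumption: the section names below are hypothetical placeholders.
EXAMPLE_INI = """
[workflow]
DAG=prepare-data >> train-svm, train-random-forest >> choose-model >> model-scoring-service

[logging]
LOG_LEVEL=INFO
"""


def read_workflow_settings(ini_text):
    """Return (dag, log_level) from ini text, validating the log level."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    dag = parser.get("workflow", "DAG").strip()
    # Tolerate optional surrounding quotes and lower-case values.
    log_level = parser.get("logging", "LOG_LEVEL").strip().strip('"').upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {sorted(VALID_LOG_LEVELS)}")
    return dag, log_level


dag, log_level = read_workflow_settings(EXAMPLE_INI)
print(dag)
print(log_level)
```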

Defining Workflow DAGs

The DAG string is used to control the execution of stages by assigning them to different steps of the workflow. Steps are separated using the >> operator and commas are used to delimit multiple stages within a single step (if this is required). Steps are executed from left to right. In the example above,

DAG=prepare-data >> train-svm, train-random-forest >> choose-model >> model-scoring-service

The workflow will be interpreted as follows:

  • step 1: run prepare-data; then,
  • step 2: run train-svm and train-random-forest in separate containers, in parallel; then,
  • step 3: run choose-model; and finally,
  • step 4: run model-scoring-service.
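
The interpretation above can be sketched in a few lines of Python. This is an illustration of the DAG string syntax only, not Bodywork's own parser; the function name parse_dag is hypothetical.

```python
def parse_dag(dag):
    """Split a DAG string into steps, each a list of stage names.

    Steps are separated by '>>' and stages within a step by commas.
    """
    steps = []
    for raw_step in dag.split(">>"):
        stages = [name.strip() for name in raw_step.split(",")]
        if any(not name for name in stages):
            raise ValueError(f"empty stage name in step: {raw_step!r}")
        steps.append(stages)
    return steps


example_dag = (
    "prepare-data >> train-svm, train-random-forest "
    ">> choose-model >> model-scoring-service"
)
for number, stages in enumerate(parse_dag(example_dag), start=1):
    print(f"step {number}: {', '.join(stages)}")
```

Each inner list holds the stages that would be run in parallel within one step, and the outer list preserves the left-to-right execution order.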

Configuring Stages

The behavior of each stage is controlled by the configuration parameters in the config.ini file. For the model-scoring-service stage in our example project this could be,




The [default] section is common to all types of stage and the [secrets] section is optional. The remaining section must be one of [batch] or [service].

Each [default] configuration parameter is to be used as follows:

One of batch or service. If batch is selected, then the executable script will be run as a discrete job (with a start and an end), and will be managed as a Kubernetes job. If service is selected, then the executable script will be run as part of a Kubernetes deployment and will expose a Kubernetes cluster-ip service to enable access over HTTP, within the cluster.
The name of the executable Python module to run, which must exist within the stage's directory. Executable means that executing python from the CLI would cause the module (or script) to run.
The compute resources to request from the cluster in order to run the stage. For more information on the units used in these parameters refer here.

Batch Stages

An example [batch] configuration for the prepare-data stage could be as follows,



Time to wait for the given task to run, before retrying or raising a workflow execution error.
Number of times to retry executing a failed stage, before raising a workflow execution error.
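
The interaction between the time limit and the retry count can be illustrated locally. The sketch below is not how Bodywork manages Kubernetes jobs: it runs a command as a local subprocess, the function and argument names are hypothetical, and it assumes that the retry count means additional attempts after the first one.

```python
import subprocess
import sys


def run_with_retries(command, max_completion_time_seconds, retries):
    """Run a command, retrying on failure or timeout.

    Returns the number of the attempt that succeeded. Assumption: 'retries'
    counts extra attempts after the first.
    """
    attempts = retries + 1
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # check=True turns a non-zero exit code into CalledProcessError.
            subprocess.run(command, check=True, timeout=max_completion_time_seconds)
            return attempt
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as error:
            last_error = error
    raise RuntimeError(f"stage failed after {attempts} attempt(s)") from last_error


# A stand-in for a batch stage that completes successfully on the first attempt.
successful_attempt = run_with_retries(
    [sys.executable, "-c", "print('preparing data')"],
    max_completion_time_seconds=30,
    retries=2,
)
```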

Service Deployment Stages

An example [service] configuration for the model-scoring-service stage could be as follows,



Time to wait for the service to be 'ready' without any errors having occurred. When the service reaches the time limit without raising errors, then it will be marked as 'successful'. If a service deployment stage fails to be successful, then the deployment will be automatically rolled-back to the previous version.
Number of independent containers running the service started by the stage's Python executable module - The service endpoint will automatically route requests to each replica at random.
The port to expose on the container - e.g. Flask-based services usually send and receive HTTP requests on port 5000.
Whether or not to create a route (or path) from the cluster's externally-facing ingress controller, to this service. If set to True, it will enable external requests to reach the service via the ingress controller (acting as an API gateway), with the following URL,
See Configuring Ingress for more information on exposing services to external HTTP requests.

Injecting Secrets

Credentials will be required whenever you wish to pull data or persist models to cloud storage, access private APIs, etc. We provide a secure mechanism for dynamically injecting credentials as environment variables within the container running a stage.

The first step in this process is to store your project's secret credentials securely within its namespace - see Managing Secrets below for instructions on how to achieve this using Bodywork.

The second step is to configure the use of this secret in the [secrets] section of the stage's config.ini file. For example, a [secrets] section that references the keys USERNAME and PASSWORD in the Kubernetes secret named my-classification-product-cloud-storage-credentials will instruct Bodywork to look for the values assigned to those keys within that secret. Bodywork will then assign these secrets to environment variables within the container, called USERNAME and PASSWORD, respectively. These can then be accessed from within the stage's executable Python module - for example,

import os

if __name__ == '__main__':
    username = os.environ['USERNAME']
    password = os.environ['PASSWORD']
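
If a secret has not been injected, indexing os.environ directly raises a KeyError that names only the first missing variable. A slightly more defensive variant is sketched below; the helper name read_required_env is hypothetical and not part of Bodywork.

```python
import os


def read_required_env(*names):
    """Return a dict of the named environment variables.

    Raises RuntimeError listing every missing variable, so that a
    misconfigured stage fails with a single clear message.
    """
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}


# Example usage inside a stage's executable module:
#   credentials = read_required_env('USERNAME', 'PASSWORD')
#   username = credentials['USERNAME']
```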

Configuring Namespaces

Each Bodywork project should operate within its own namespace in your Kubernetes cluster. To set up a Bodywork-compatible namespace, issue the following command from the CLI,

$ bodywork setup-namespace my-classification-product

Which will yield the following output,

creating namespace=my-classification-product
creating service-account=bodywork-workflow-controller in namespace=my-classification-product
creating cluster-role-binding=bodywork-workflow-controller--my-classification-product
creating service-account=bodywork-jobs-and-deployments in namespace=my-classification-product

We can see that in addition to creating the namespace, two service-accounts will also be created. This will grant containers in my-classification-product the appropriate authorisation to run workflows, batch jobs and deployments within the newly created namespace. Additionally, a binding to a cluster-role is also created. This will enable containers in the new namespace to list all available namespaces on the cluster. The cluster-role will be created if it does not yet exist.

Managing Secrets

Credentials will be required whenever you wish to pull data or persist models to cloud storage, or access private APIs from within a stage. We provide a secure mechanism for dynamically injecting secret credentials as environment variables into the container running a stage. Before a stage can be configured to inject a secret into its host container, the secret has to be placed within the Kubernetes namespace that the workflow will be deployed to. This can be achieved from the command line - for example,

$ bodywork secret create \
    --namespace=my-classification-product \
    --name=my-classification-product-cloud-storage-credentials \
    --data USERNAME=bodywork PASSWORD=bodywork123!

Will store USERNAME and PASSWORD within a Kubernetes secret resource called my-classification-product-cloud-storage-credentials in the my-classification-product namespace. To inject USERNAME and PASSWORD as environment variables within a stage, see Injecting Secrets above.

Working with Private Git Repositories using SSH

When working with remote Git repositories that are private, Bodywork will attempt to access them via SSH. For example, to set up SSH access for use with GitHub, see this article. This process will result in the creation of a private and public key-pair to use for authenticating with GitHub. The private key must be stored as a Kubernetes secret in the project's namespace, using the following naming convention for the secret name and secret data key,

$ bodywork secret create \
    --namespace=my-classification-product \
    --name=ssh-github-private-key \
    --data BODYWORK_GITHUB_SSH_PRIVATE_KEY=paste_your_private_key_here

When executing a workflow defined in a private Git repository, make sure to use the SSH protocol when specifying the git-repo-url - i.e. use the repository's SSH URL, as opposed to its HTTPS URL.

Testing Workflows Locally

Workflows can be triggered locally from the command line, with the workflow-controller logs streamed to your terminal. In this mode of operation, the workflow-controller is operating on your local machine, but it is still orchestrating containers on Kubernetes remotely. It will still clone your project from the specified branch of the Bodywork project's Git repository, and delete it when finished.

For the example project used throughout this user guide, the CLI command for triggering the workflow locally using the master branch of the remote Git repository, would be as follows,

$ bodywork workflow \
    --namespace=my-classification-product \ \

It is also possible to specify a branch from a local Git repository. A local version of the above example - this time using the dev branch - could be as follows,

$ bodywork workflow \
    --namespace=my-classification-product \
    file:///absolute/path/to/my-classification-product \
    dev

Testing Service Deployments

A brief summary of all service-related information can be retrieved by issuing,

$ bodywork service display \

Which will yield output like,

|- GIT_URL     
|- GIT_BRANCH            master
|- INGRESS_CREATED       true
|- INGRESS_ROUTE         /my-classification-product/my-classification-product--model-scoring-service

Service deployments are accessible via HTTP from within the cluster - they cannot be exposed to the public internet, unless you have installed an ingress controller in your cluster. The simplest way to test a service from your local machine, is by using a local proxy server to enable access to your cluster. This can be achieved by issuing the following command,

$ kubectl proxy

Then in a new shell, you can use the curl tool to test the service. For example, issuing,

$ curl http://localhost:8001/api/v1/namespaces/my-classification-product/services/my-classification-product--model-scoring-service/proxy \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"x": 5.1, "y": 3.5}'

Should return the payload according to how you've defined your service in the executable Python module - e.g. in the file found within the model-scoring-service stage's directory.
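
The same test request can be sent from Python using only the standard library. The sketch below builds the request without sending it; the function name build_scoring_request is hypothetical, and the URL and payload are copied from the curl example above. Sending the request assumes that kubectl proxy is running.

```python
import json
import urllib.request


def build_scoring_request(url, payload):
    """Build a JSON POST request equivalent to the curl example above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


PROXY_URL = (
    "http://localhost:8001/api/v1/namespaces/my-classification-product"
    "/services/my-classification-product--model-scoring-service/proxy"
)
request = build_scoring_request(PROXY_URL, {"x": 5.1, "y": 3.5})

# To send the request (requires 'kubectl proxy' to be running):
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(response.read().decode("utf-8"))
```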

If you have installed an ingress controller in your cluster, and if the INGRESS configuration parameter has been set to True in the service stage's config.ini file, then the service can be tested via the public internet using,

$ curl http://YOUR_CLUSTERS_EXTERNAL_IP/my-classification-product/my-classification-product--model-scoring-service/ \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"x": 5.1, "y": 3.5}'

See here for instructions on how to retrieve YOUR_CLUSTERS_EXTERNAL_IP.

Deleting Service Deployments

Once you have finished testing, you may want to delete any service deployments that have been created. To list all active service deployments within a namespace, issue the command,

$ bodywork service display \

Then to delete a service deployment use,

$ bodywork service delete

Workflow-Controller Logs

All logs should start in the same way,

2020-11-24 20:04:12,648 - INFO - workflow.run_workflow - attempting to run workflow for project= on branch=master in kubernetes namespace=my-classification-product
git version 2.24.3 (Apple Git-128)
Cloning into 'bodywork_project'...
remote: Enumerating objects: 92, done.
remote: Counting objects: 100% (92/92), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 92 (delta 49), reused 70 (delta 27), pack-reused 0
Receiving objects: 100% (92/92), 20.51 KiB | 1.58 MiB/s, done.
Resolving deltas: 100% (49/49), done.
2020-11-24 20:04:15,579 - INFO - workflow.run_workflow - attempting to execute DAG step=['prepare-data']
2020-11-24 20:04:15,580 - INFO - workflow.run_workflow - creating job=my-classification-product--prepare-data in namespace=my-classification-product

After a stage completes, you will notice that the logs from within the container are streamed into the workflow-controller logs. For example,

---- pod logs for my-classification-product--prepare-data
2020-11-24 20:04:18,917 - INFO - stage.run_stage - attempting to run stage=prepare-data from master branch of repo at
git version 2.20.1
Cloning into 'bodywork_project'...
Collecting boto3==1.16.15
  Downloading boto3-1.16.15-py2.py3-none-any.whl (129 kB)

The aim of this log structure is to provide a useful way of debugging workflows out-of-the-box, without forcing you to integrate a complete logging solution. This is not a replacement for a complete logging solution - e.g. one based on Elasticsearch. It is intended as a temporary solution to get your ML projects operational, as quickly as possible.

Deploying Workflows

Workflows can be executed remotely using,

$ bodywork deployment create \
    --namespace=my-classification-product \
    --name=initial-deployment \
    --git-repo-url= \
    --git-repo-branch=master \

You can check on the status of the deployment using,

$ bodywork deployment display \

Which will yield output like,

JOB_NAME              START_TIME                    COMPLETION_TIME               ACTIVE      SUCCEEDED       FAILED
initial-deployment    2020-12-11 20:21:04+00:00     2020-12-11 20:23:12+00:00     0           1               0

And retrieve the logs using,

$ bodywork deployment logs \
    --namespace=my-classification-product \

Which will stream logs directly to your terminal. This output stream could also be redirected to a local file by using a shell redirection command such as,

$ bodywork deployment logs ... > log.txt

To overwrite the existing contents of log.txt, or,

$ bodywork deployment logs ... >> log.txt

To append to the existing contents of log.txt.

Scheduling Workflows

If your workflows are executing successfully, then you can schedule the workflow-controller to operate remotely on the cluster as a Kubernetes cronjob. For example, issuing the following command from the CLI,

$ bodywork cronjob create \
    --namespace=my-classification-product \
    --name=my-classification-product \
    --schedule="0,15,30,45 * * * *" \
    --git-repo-url= \
    --git-repo-branch=master \

Would schedule our example project to run every 15 minutes. The cronjob's execution history can be retrieved from the cluster using,
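
To see why this schedule fires every 15 minutes, it helps to expand the first (minute) field of the cron expression. The following is a minimal sketch of standard cron minute-field syntax, covering only lists, simple ranges, * and */n; it is not a complete cron parser and the function name is hypothetical.

```python
def expand_minute_field(field):
    """Expand a cron minute field into the sorted list of minutes it matches.

    Supports comma-separated values, simple ranges (a-b), '*' and '*/n'.
    Ranges combined with steps (e.g. 0-30/5) are not handled.
    """
    minutes = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            minutes.update(range(0, 60, int(part[2:])))
        elif "-" in part:
            start, end = part.split("-")
            minutes.update(range(int(start), int(end) + 1))
        else:
            minutes.add(int(part))
    return sorted(minutes)


print(expand_minute_field("0,15,30,45"))
print(expand_minute_field("*/15"))
```

Both calls yield the same minutes, so under standard cron syntax the schedule above could equally be written as "*/15 * * * *".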

$ bodywork cronjob history \
    --namespace=my-classification-product \

Which will yield output along the lines of,

JOB_NAME                                START_TIME                    COMPLETION_TIME               ACTIVE      SUCCEEDED       FAILED
my-classification-product-1605214260    2020-11-12 20:51:04+00:00     2020-11-12 20:52:34+00:00     0           1               0
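
The numeric suffix in JOB_NAME appears to encode the time at which the job was scheduled. The sketch below assumes that the suffix is a Unix timestamp in seconds, which is consistent with the START_TIME shown above but is not confirmed by this guide, and may differ between Kubernetes versions. The function name is hypothetical.

```python
from datetime import datetime, timezone


def scheduled_time_from_job_name(job_name):
    """Decode the trailing number of a job name as a UTC datetime.

    Assumption: the suffix is a Unix timestamp in seconds.
    """
    suffix = job_name.rsplit("-", 1)[1]
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


scheduled = scheduled_time_from_job_name("my-classification-product-1605214260")
print(scheduled.isoformat())
```

For the example job this gives 20:51 UTC on 2020-11-12, a few seconds before the START_TIME reported in the history.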

Accessing Historic Logs

The logs for each job executed by the cronjob are contained within the remote workflow-controller. The logs for a single workflow execution attempt can be retrieved by issuing the bodywork cronjob logs command on the CLI - for example,

$ bodywork cronjob logs \
    --namespace=my-classification-product \
    --name=my-classification-product-1605214260

Would stream logs directly to your terminal, from the workflow execution attempt labelled my-classification-product-1605214260, in precisely the same way as described above for the bodywork deployment logs command.