
Stages, Steps and Workflows

[Figure: deployment workflow]


Each task you want to run, such as training a model, scoring data or starting a model-scoring service, needs to be defined within an executable Python module. Each module defines a single stage. Bodywork will run each stage in its own pre-built Bodywork container, on Kubernetes.

There are two different types of stage that can be created:

Batch Stages
For executing code that performs a discrete task - e.g. preparing features, training a model or scoring a dataset. Batch stages have a well-defined end and are automatically shut down after they have completed successfully.
Service Stages
For executing code that starts a service - e.g. a Flask application that loads a model and then exposes a REST API for model-scoring. Service stages are long-running processes with no end, which are kept up and running until they are deleted.
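As a rough illustration of what an executable Python module for a batch stage might look like, the sketch below is a plain script with one entry point that does its work and then returns. The function names and the toy scoring rule are hypothetical and are not part of Bodywork.

```python
"""Hypothetical batch stage: score a small dataset, then finish."""


def score(rows):
    """Toy stand-in for a model: flag each value that is above the batch mean."""
    mean = sum(rows) / len(rows)
    return [value > mean for value in rows]


def main():
    # A real stage would typically load its data and model from external storage.
    data = [1.0, 2.0, 6.0]
    predictions = score(data)
    print(f"scored {len(predictions)} rows")
    # Returning normally ends the process with exit code 0; an unhandled
    # exception would end it with a non-zero exit code instead.


if __name__ == "__main__":
    main()
```

A service stage would differ mainly in that its entry point starts a server and never returns.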


A step is a collection of one or more stages that can run at the same time (concurrently) - for example, when training multiple models in parallel or starting multiple services at once. Stages that should only be executed after another stage has finished should be placed in different steps, in the correct order.


A workflow is an ordered collection of one or more steps that are executed sequentially: the next step is only executed after all of the stages in the previous step have completed successfully. A workflow can be represented as a Directed Acyclic Graph (DAG).
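The relationship between stages, steps and workflows can be sketched in a few lines of Python. This is only a model of the ordering rules described above, using threads in one process. It is not how Bodywork itself runs stages (each stage runs in its own container), and `run_workflow` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(steps):
    """Run steps in order; run the stages inside each step concurrently.

    `steps` is a list of steps; each step is a list of (name, callable) stages.
    Returns the names of the completed stages, in step order.
    """
    completed = []
    for step in steps:
        with ThreadPoolExecutor(max_workers=len(step)) as pool:
            futures = [(name, pool.submit(task)) for name, task in step]
        # Leaving the `with` block waits for every stage in this step.
        for name, future in futures:
            future.result()  # re-raises a stage's exception, halting the workflow
            completed.append(name)
    return completed


log = []
workflow = [
    # Step 1: two stages that may run at the same time.
    [("train_model_a", lambda: log.append("a")),
     ("train_model_b", lambda: log.append("b"))],
    # Step 2: starts only after both stages in step 1 have succeeded.
    [("deploy_service", lambda: log.append("serve"))],
]
print(run_workflow(workflow))
```

If any stage raises an exception, `run_workflow` re-raises it and later steps never start, which mirrors the rule that a step runs only after the previous step has completed successfully.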

Example: Batch Job

[Figure: batch stage]

Workflows need not be complex and often all that's required is for a simple batch job to be executed - for example, to score a dataset using a pre-trained model. Bodywork handles this scenario as a workflow consisting of a single batch stage, running within a single step.

Example: Deploy Service

[Figure: service stage]

Sometimes models are trained off-line, or on external platforms, and all that's required is to deploy a service that exposes them. Bodywork handles this scenario as a workflow consisting of a single service stage, running within a single step.

Example: Train-and-Serve Pipeline

[Figure: train-and-serve ML pipeline]

Most ML projects can be described by one model-training stage and one service deployment stage. The training stage is executed in the first step and the serving stage in the second. This workflow can be used to automate the process of re-training models as new data becomes available, and to automatically re-deploy the model-scoring service with the newly-trained model.
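The hand-off between the two stages can be sketched as follows. This is a minimal, hypothetical example: a temporary directory stands in for whatever storage both stages can reach (an assumption, since each stage runs in its own container), and the least-squares fit is only a placeholder for real model training.

```python
import json
import tempfile
from pathlib import Path


def train(xs, ys, model_path):
    """Step 1 (batch stage): fit y = slope * x + intercept and persist the model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    model_path.write_text(json.dumps({"slope": slope, "intercept": intercept}))


def load_scorer(model_path):
    """Step 2 (service stage): load the persisted model and return a scoring function."""
    model = json.loads(model_path.read_text())
    return lambda x: model["slope"] * x + model["intercept"]


with tempfile.TemporaryDirectory() as storage:
    path = Path(storage) / "model.json"
    train([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], path)  # step 1: train and save
    scorer = load_scorer(path)                      # step 2: load for serving
    prediction = scorer(3.0)
    print(prediction)
```

Re-running the workflow repeats both steps, so the service is always rebuilt around the most recently trained model.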

Deployment from Git Repos

Bodywork requires projects to be stored and distributed as Git repositories, hosted on GitHub, GitLab, Azure DevOps or Bitbucket. When a deployment is triggered, Bodywork starts a workflow-controller that clones the repository, analyses the configuration data provided in a bodywork.yaml file and then manages the execution of the workflow, creating a new container for each stage.

At no point is there any need to build Docker images and push them to a container registry. This simplifies the CI/CD pipeline for your project, so that you can focus on the aspects (e.g. tests) that are more relevant to your machine learning task.

[Figure: ML pipeline deployment]

Bodywork does not impact how you choose to structure and engineer your projects. The only requirement for deploying a project with Bodywork is to add a single bodywork.yaml file to your project's root directory. This file contains all of the configuration data required by Bodywork to deploy your project to Kubernetes.

For the train-and-serve scenario discussed above, the project can be structured like any conventional Python project.

[Figure: Git project structure]


Executable Python modules that contain the code required for a stage.
Bodywork configuration data - for example, which Python module to use for each stage, external Python packages that need to be installed, arguments to pass to modules, secret credentials, the workflow DAG, etc. These are covered in detail in the user guide.
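To make the relationship between the stage modules and the workflow definition concrete, here is a sketch of the kind of consistency check a controller might run before starting anything. The dictionary layout, key names and file names are hypothetical and do not reflect the bodywork.yaml schema; consult the user guide for the real format.

```python
def check_config(config):
    """Return a list of problems found in a hypothetical project config."""
    problems = []
    defined = config["stages"]
    referenced = [name for step in config["workflow"] for name in step]
    for name in referenced:
        if name not in defined:
            problems.append(f"workflow references undefined stage: {name}")
        elif not defined[name].get("module"):
            problems.append(f"stage has no Python module: {name}")
    for name in defined:
        if name not in referenced:
            problems.append(f"stage is never run by the workflow: {name}")
    return problems


# Hypothetical config for the train-and-serve scenario: two steps, one stage each.
config = {
    "workflow": [["train_model"], ["scoring_service"]],
    "stages": {
        "train_model": {"module": "train_model.py", "packages": ["scikit-learn"]},
        "scoring_service": {"module": "serve_model.py", "packages": ["flask"]},
    },
}
print(check_config(config))
```

Checking the whole configuration up front means a misnamed stage is reported before any stage has started, instead of part-way through a workflow.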

This project can then be configured to run on a schedule with a single command.

[Figure: schedule ML deployments]

Working with private Git repositories

The example above assumes the Git repository is public - for more information on working with private repositories, please see here.