Bodywork is distributed as a Python 3 package that exposes a CLI for interacting with your Kubernetes cluster. Using the Bodywork CLI you can deploy Bodywork-compatible ML projects, packaged as Git repositories hosted on GitHub, GitLab, Azure DevOps or Bitbucket. This page is a reference for all Bodywork CLI commands.
$ bodywork --version
Prints the Bodywork package version to stdout.
Validate Configuration File
The bodywork.yaml file can be checked for errors by issuing the following command from the CLI,
$ bodywork validate --check-files
The --check-files flag checks that every executable_module_path maps to a file that exists and can be reached by Bodywork from the root directory where bodywork.yaml is located. This command assumes that bodywork.yaml is in the current working directory - if this is not the case, use the --file option to specify the path to bodywork.yaml. Validation errors are printed to stdout.
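The file check described above can be pictured with a minimal Python sketch. This is not Bodywork's implementation: the function name `missing_module_paths` and the example paths are hypothetical, and the sketch only assumes that each executable_module_path is resolved relative to the directory containing bodywork.yaml.

```python
from pathlib import Path
from typing import Iterable, List


def missing_module_paths(project_root: Path, module_paths: Iterable[str]) -> List[str]:
    """Return the module paths that do not resolve to a file under project_root.

    Hypothetical helper: it mimics the idea behind --check-files, not the
    actual Bodywork code.
    """
    return [p for p in module_paths if not (project_root / p).is_file()]


# Example call with made-up stage paths, resolved from the current directory:
# missing_module_paths(Path("."), ["stage_1/train.py", "stage_2/serve.py"])
```

An empty list means every path resolved to a file; anything returned would correspond to a validation error.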
$ bodywork setup-namespace YOUR_NAMESPACE
Create and prepare a Kubernetes namespace for running Bodywork workflows - see Preparing a Namespace for use with Bodywork for more information. This command also works with namespaces created by other means - e.g. kubectl create ns YOUR_NAMESPACE - in which case it will not try to recreate the existing namespace, only ensure that it is correctly configured.
$ bodywork workflow \
    --namespace=YOUR_NAMESPACE \
    REMOTE_GIT_REPO_URL \
    REMOTE_GIT_REPO_BRANCH
Clones the chosen branch of a Git repository containing a Bodywork ML project and then executes the workflow configured within it. This will start a Bodywork workflow-controller wherever the command is called. If you are working with private remote repositories you will need to use the SSH protocol and ensure that the appropriate private-key is available within a secret - see Working with Private Git Repositories using SSH for more information.
$ bodywork stage \
    REMOTE_GIT_REPO_URL \
    REMOTE_GIT_REPO_BRANCH \
    STAGE_NAME
Clones the chosen branch of a Git repository containing a Bodywork ML project and then executes the named stage. This is equivalent to installing all the 3rd party Python package requirements specified in the stage's requirements.txt file, and then executing python NAME_OF_EXECUTABLE_PYTHON_MODULE.py as defined for the stage in bodywork.yaml. See Configuring Stages for more information. The Bodywork stage-runner will be started wherever the command is called.
This command is intended for use by Bodywork containers and it is not recommended for use during Bodywork project development on your local machine.
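The equivalence described for the stage command (install the stage's requirements, then run its executable module) can be sketched in Python as follows. The function name `stage_commands` and the example file paths are hypothetical, and this is only an illustration of the two steps, not how the Bodywork stage-runner is implemented.

```python
import sys
from typing import List


def stage_commands(requirements_file: str, executable_module: str) -> List[List[str]]:
    """Build the two commands a stage run is described as being equivalent to.

    Hypothetical helper: returns argument lists suitable for subprocess.run,
    using the current Python interpreter for both steps.
    """
    return [
        # Step 1: install the stage's 3rd party package requirements.
        [sys.executable, "-m", "pip", "install", "-r", requirements_file],
        # Step 2: execute the stage's Python module.
        [sys.executable, executable_module],
    ]


# To actually run the steps (paths are made up for illustration):
# import subprocess
# for cmd in stage_commands("stage_1/requirements.txt", "stage_1/train.py"):
#     subprocess.run(cmd, check=True)
```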
A deployment is defined as a workflow-controller running as a job within the cluster (as opposed to a workflow-controller running locally). The workflow-controller deploys projects by executing the workflow defined in the project's DAG.
$ bodywork deployment display \
    --namespace=YOUR_NAMESPACE
Lists all workflow-controller jobs that have run within YOUR_NAMESPACE, whether or not they were successful. All workflow-controller jobs are deleted after they have been in a completed state for 15 minutes.
$ bodywork deployment create \
    --namespace=YOUR_NAMESPACE \
    --name=DEPLOYMENT_NAME \
    --git-repo-url=REMOTE_GIT_REPO_URL \
    --git-repo-branch=REMOTE_GIT_REPO_BRANCH \
    --retries=NUMBER_OF_TIMES_TO_RETRY_ON_FAILURE \
    --local-workflow-controller
Immediately deploys your project by starting a workflow-controller job in your cluster, unless the --local-workflow-controller flag is used, in which case this command becomes an alias for the Run Workflow command and runs the workflow-controller locally for easy testing.
$ bodywork deployment delete_job \
    --namespace=YOUR_NAMESPACE \
    --name=DEPLOYMENT_NAME
When a deployment is created, a workflow-controller job is started in your cluster. Not all clusters are configured to clean up these jobs automatically, in which case you may have to delete them manually.
Get Deployment Workflow Logs
$ bodywork deployment logs \
    --namespace=YOUR_NAMESPACE \
    --name=DEPLOYMENT_NAME
Stream the workflow logs from the workflow-controller job to your terminal's standard output stream.
Secrets are used to pass credentials to containers running workflow stages that require authentication with 3rd party services (e.g. cloud storage providers). See Managing Credentials and Other Secrets and Injecting Secrets into Stage Containers for more information.
$ bodywork secret create \
    --namespace=YOUR_NAMESPACE \
    --name=SECRET_NAME \
    --data SECRET_KEY_1=secret-value-1 SECRET_KEY_2=secret-value-2
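The --data option above takes a list of KEY=value pairs. As a rough illustration of that format, the following Python sketch splits such pairs into a dictionary. The function `parse_secret_data` is hypothetical and is not Bodywork's parser; in particular, splitting on the first "=" only (so that values may themselves contain "=") is an assumption of this sketch.

```python
from typing import Dict, Iterable


def parse_secret_data(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ['KEY=value', ...] into {'KEY': 'value', ...}.

    Hypothetical helper: splits each pair on the first '=' and rejects
    pairs that have no '=' or an empty key.
    """
    data: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=value, got {pair!r}")
        data[key] = value
    return data


secret = parse_secret_data(["SECRET_KEY_1=secret-value-1", "SECRET_KEY_2=secret-value-2"])
```

Here `secret` maps each key to its value, which mirrors how the two pairs in the command end up as two entries in a single secret.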
$ bodywork secret delete \
    --namespace=YOUR_NAMESPACE \
    --name=SECRET_NAME
$ bodywork secret display \
    --namespace=YOUR_NAMESPACE
Will print all secrets in YOUR_NAMESPACE to stdout.
$ bodywork secret display \
    --namespace=YOUR_NAMESPACE \
    --name=SECRET_NAME
Will only print SECRET_NAME to stdout.
Unlike batch stages that have a discrete lifetime, service deployments live indefinitely and may need to be managed as your project develops.
$ bodywork service display \
    --namespace=YOUR_NAMESPACE
Will list information on all active service deployments available in YOUR_NAMESPACE, including their internal cluster URLs.
$ bodywork service delete \
    --namespace=YOUR_NAMESPACE \
    --name=SERVICE_NAME
Delete an active service deployment - e.g. one that is no longer required for a project.
Workflows can be executed on a schedule using Bodywork cronjobs. Scheduled workflows will be managed by workflow-controller jobs that Bodywork starts automatically on your cluster.
$ bodywork cronjob display \
    --namespace=YOUR_NAMESPACE
Will list all active cronjobs within YOUR_NAMESPACE.
$ bodywork cronjob create \
    --namespace=YOUR_NAMESPACE \
    --name=CRONJOB_NAME \
    --schedule=CRON_SCHEDULE \
    --git-repo-url=REMOTE_GIT_REPO_URL \
    --git-repo-branch=REMOTE_GIT_REPO_BRANCH \
    --retries=NUMBER_OF_TIMES_TO_RETRY_ON_FAILURE \
    --history-limit=MIN_NUMBER_OF_WORKFLOW_CONTROLLER_JOBS_TO_RETAIN
Will create a cronjob whose schedule must be a valid cron expression - e.g. 0 * * * * will run the workflow every hour. Use the --history-limit option (MIN_NUMBER_OF_WORKFLOW_CONTROLLER_JOBS_TO_RETAIN) to set the minimum number of historical workflow-controller jobs that are retained at any given moment in time.
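Since the schedule must be a valid cron expression, it can help to sanity-check one before creating the cronjob. The following Python sketch checks the standard five-field layout (minute, hour, day of month, month, day of week). It is a simplified, hypothetical check and not the validation that Bodywork or Kubernetes performs: it only accepts numeric values, `*`, ranges, lists and steps, and rejects names such as MON or JAN.

```python
from typing import List, Tuple

# (field name, lowest allowed value, highest allowed value)
CRON_FIELDS: List[Tuple[str, int, int]] = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]


def _valid_term(term: str, low: int, high: int) -> bool:
    """Check one comma-separated term: '*', 'N' or 'A-B', optionally with '/STEP'."""
    base, slash, step = term.partition("/")
    if slash and not (step.isdecimal() and int(step) > 0):
        return False
    if base == "*":
        return True
    start, dash, end = base.partition("-")
    if not start.isdecimal() or (dash and not end.isdecimal()):
        return False
    first = int(start)
    last = int(end) if dash else first
    return low <= first <= last <= high


def is_valid_cron(expression: str) -> bool:
    """Return True if the expression has five fields that are all in range."""
    fields = expression.split()
    if len(fields) != len(CRON_FIELDS):
        return False
    return all(
        _valid_term(term, low, high)
        for field, (_, low, high) in zip(fields, CRON_FIELDS)
        for term in field.split(",")
    )
```

With this sketch, `is_valid_cron("0 * * * *")` accepts the hourly schedule from the example above, while an expression with only four fields is rejected.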
$ bodywork cronjob delete \
    --namespace=YOUR_NAMESPACE \
    --name=CRONJOB_NAME
Will also delete all historic workflow-controller jobs associated with this cronjob.
Get Cronjob History
$ bodywork cronjob history \
    --namespace=YOUR_NAMESPACE \
    --name=CRONJOB_NAME
Display all workflow-controller jobs that were created by a cronjob.
Get Cronjob Workflow Logs
$ bodywork cronjob logs \
    --namespace=YOUR_NAMESPACE \
    --name=HISTORICAL_CRONJOB_WORKFLOW_EXECUTION_JOB_NAME
Stream the workflow logs from a historical workflow-controller job to your terminal's standard output stream.
$ bodywork debug SECONDS
Runs the Python time.sleep function for SECONDS. This is intended for use with the Bodywork image and kubectl - for deploying a container on which to open shell access for advanced debugging. For example, issuing the following command,
$ kubectl create deployment DEBUG_DEPLOYMENT_NAME \
    -n YOUR_NAMESPACE \
    --image=bodyworkml/bodywork-core:latest \
    -- bodywork debug SECONDS
Will deploy the Bodywork container and run the bodywork debug SECONDS command within it. While the container is sleeping, a shell can be started on the container in this deployment. To achieve this, first find the pod's name, using,
$ kubectl get pods -n YOUR_NAMESPACE | grep DEBUG_DEPLOYMENT_NAME
And then open a shell to the container within this pod using,
$ kubectl exec DEBUG_DEPLOYMENT_POD_NAME -n YOUR_NAMESPACE -it -- /bin/bash
Once you're finished debugging, the deployment can be shut down using,
$ kubectl delete deployment DEBUG_DEPLOYMENT_NAME -n YOUR_NAMESPACE