Setting-up Kubernetes¶

If you already have access to a Kubernetes cluster, then skip to Configuring Ingress. If you are new to Kubernetes, then skip to Getting Started with Kubernetes. Otherwise, please read on.

Deploying Locally¶

If you want to test deployments locally, then you can run Kubernetes on your local machine with the help of one of the following tools:

We currently recommend Minikube, as it comes packaged with useful add-ons (e.g., ingress and the Kubernetes dashboard) that make life easy for those starting out with Kubernetes. It is also well documented and supported by a large community.

Managed Kubernetes Services¶

When you are ready to deploy to the cloud, then the easiest path is via a managed Kubernetes service from one of the following cloud infrastructure providers:

Required Kubernetes Version¶

Bodywork relies on the official Kubernetes Python client, whose latest version (17.17.0) has full compatibility with Kubernetes 1.17. We recommend that you also use Kubernetes 1.17, but in practice Bodywork will work with other versions - more information can be found here. Bodywork is tested against Kubernetes 1.17 running on Amazon Elastic Kubernetes Service (EKS).

Installing the Kubectl Tool¶

Kubectl is the command-line tool that lets you control your Kubernetes cluster - see here for an overview. Bodywork does not use Kubectl (it talks directly to the Kubernetes API instead), and so it is not a requirement. Regardless, Kubectl is an essential tool to have access to, so we strongly recommend that you install it - see here for instructions.

Configuring Ingress¶

If you want to expose Bodywork-deployed services to requests from outside your cluster, then you need to install the NGINX Ingress Controller within your cluster. This acts like an API gateway for your cluster, routing external HTTP requests to internal services.

The NGINX Ingress Controller is an official Kubernetes project and can be installed with a single command - for a local Minikube cluster you would use,

$ minikube addons enable ingress

Or for EKS on AWS you would use,

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/aws/deploy.yaml

The precise details for every potential Kubernetes deployment option are listed here.

Managed Kubernetes services will also provision an external load balancer to manage the flow of traffic to the ingress controller (and hence the cluster). Note, this will have an associated cost that is additional to that of the Kubernetes cluster.
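After installation, it can take a minute or so for the ingress controller's pod to start. The following is a minimal sketch of how you might block until it is ready. It assumes the controller was installed into the ingress-nginx namespace with its standard app.kubernetes.io/component=controller label (older Minikube releases may place it elsewhere), and the function name wait_for_ingress and its default timeout are illustrative choices.

```shell
# Block until the ingress controller pod reports Ready, or the timeout expires.
# Assumes the ingress-nginx namespace and the controller's standard label.
# Argument: [TIMEOUT] (default: 120s).
wait_for_ingress() {
  kubectl wait --namespace ingress-nginx \
    --for=condition=ready pod \
    --selector=app.kubernetes.io/component=controller \
    --timeout="${1:-120s}"
}
```

Calling wait_for_ingress with no arguments waits for up to two minutes and returns a non-zero exit code if the controller is still not ready.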

Connecting to the Cluster¶

Get the public-facing IP address for your ingress controller with the following command,

$ kubectl -n ingress-nginx get service ingress-nginx-controller

And make a note of the EXTERNAL-IP field, which must be used for all requests to Bodywork-deployed services that originate from outside the cluster. Services within the cluster can communicate with one another using the cluster's internal network.
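If you need the external address in a script, rather than reading it from the table, Kubectl's jsonpath output format can extract it. This is a sketch only: the function name ingress_address is hypothetical, and it assumes the load balancer has already been assigned an address (the command will report an error while the address is still pending).

```shell
# Print the external address of the ingress controller's load balancer.
# Some providers report an IP address and others (e.g., AWS) a hostname,
# so the template prints whichever of the two fields is populated.
ingress_address() {
  kubectl -n ingress-nginx get service ingress-nginx-controller \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}
```

The result can then be captured in a variable, for example with ADDRESS="$(ingress_address)".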

Getting Started with Kubernetes¶

An easy way to get started with Kubernetes is with Minikube. Minikube enables you to easily create and manage single-node Kubernetes clusters on your local machine, via the command line. It comes with everything you need to use Bodywork and prepare for deploying to remote clusters.

If you are running on macOS with the Homebrew package manager available, then installing Minikube is as simple as running,

$ brew install minikube

If you’re running on Windows or Linux, then follow the appropriate installation instructions.

Creating your first Cluster¶

Once you have Minikube installed, start a cluster using the latest version of Kubernetes that Bodywork supports,

$ minikube start --kubernetes-version=v1.17.17

And then enable ingress, so we can route HTTP requests to services deployed using Bodywork.

$ minikube addons enable ingress

You’ll also need the cluster’s IP address, which you can get using,

$ minikube profile list

|----------|-----------|---------|--------------|------|----------|---------|-------|
| Profile  | VM Driver | Runtime |      IP      | Port | Version  | Status  | Nodes |
|----------|-----------|---------|--------------|------|----------|---------|-------|
| minikube | hyperkit  | docker  | 192.168.64.5 | 8443 | v1.17.17 | Running |     1 |
|----------|-----------|---------|--------------|------|----------|---------|-------|
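If you only need the IP address itself, for example within a script, the minikube ip command prints it without the surrounding table. The sketch below wraps it in a small helper. The function name cluster_base_url is hypothetical, and it assumes your services are reached over plain HTTP.

```shell
# Print the base URL for HTTP requests to a local Minikube cluster.
# Argument: [PROFILE] the Minikube profile name (default: minikube).
cluster_base_url() {
  printf 'http://%s\n' "$(minikube ip --profile "${1:-minikube}")"
}
```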

When you’re done with this tutorial, the cluster can be powered down using,

$ minikube stop

Basic Concepts¶

Here is a brief introduction to the most common types of Kubernetes resources, with a guide to how Bodywork uses them to deploy your projects:

namespace

You can think of a namespace as a virtual cluster (within the cluster), where related resources can be grouped together. Bodywork creates and manages namespaces on your behalf.

pod

A pod can be thought of as a collection of one or more containers, running on a single machine. Bodywork will create pods in which to run your batch jobs and services.

deployment

A high-level resource for managing applications running in pods. It can ensure that a minimum number of pods are always operational (by restarting failed pods), manage rolling-updates and (where necessary) rollbacks. Bodywork uses deployments for managing your services.

service

A service is a single constant IP address, through which clients can connect to services running in pods. Bodywork will create an internal cluster service for every service that you want to deploy. This enables any other client within the cluster to access it at this IP address, or via a domain name following the convention,

http://SERVICE_NAME.NAMESPACE.svc.cluster.local

ingress

If you have enabled ingress for your cluster, then it will be running the NGINX ingress-controller. This will route requests from clients external to the cluster, to your services within the cluster, using the URL to locate the desired service. Bodywork can create and manage ingress rules for your services, so that they're accessible by clients external to the cluster.

secret

A mechanism for storing sensitive information in an encrypted format and securely distributing it to the pods that need it. Bodywork uses secrets to store any credentials that your projects may need access to - e.g., SSH keys for private Git repositories or API credentials.
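The domain name convention for internal cluster services, described above, can be sketched as a small shell helper. The function name internal_service_url is hypothetical, and the URL assumes the service is listening on the default HTTP port.

```shell
# Build the in-cluster URL for a service, following the convention
# http://SERVICE_NAME.NAMESPACE.svc.cluster.local
# Arguments: SERVICE_NAME NAMESPACE
internal_service_url() {
  printf 'http://%s.%s.svc.cluster.local\n' "$1" "$2"
}
```

For example, internal_service_url my-service my-namespace prints http://my-service.my-namespace.svc.cluster.local, which will only resolve for clients running inside the cluster.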

Accessing the Dashboard¶

The Kubernetes dashboard allows you to view all resources that have been deployed to your cluster and also provides some basic resource management functionality. You can access it by issuing the following command,

$ minikube dashboard

Which will open the dashboard in your default web browser. By default, it will only show you resources deployed to the default namespace. Use the namespace selector drop-down box at the top of the dashboard to switch to other namespaces - e.g., those created for your Bodywork deployments.

The Kubectl Tool¶

Kubectl is the command-line tool that lets you control your Kubernetes cluster. Minikube comes packaged with a version of Kubectl that you can use via the Minikube CLI. For example, to get basic cluster information you would use,

$ minikube kubectl -- cluster-info

Kubernetes master is running at https://192.168.64.5:8443
KubeDNS is running at https://192.168.64.5:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

i.e., you add the Kubectl command you want to run, after minikube kubectl --. If you get tired of prepending this to each Kubectl command, you can follow our instructions for installing the official version of Kubectl.
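As a lighter-weight alternative, you could define a shell function that forwards every call to the bundled version. This is a sketch that assumes a POSIX-compatible shell and that no separate kubectl binary should take precedence. Adding it to your shell's start-up file would make it permanent.

```shell
# Forward every kubectl invocation to the version bundled with Minikube.
kubectl() {
  minikube kubectl -- "$@"
}
```

With this in place, kubectl cluster-info behaves like minikube kubectl -- cluster-info.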

Some useful Kubectl commands to get started with:

Listing all Namespaces¶

$ kubectl get ns

Deleting all resources within a Namespace¶

If your Bodywork deployment has come to an end, then you can delete its namespace, together with every resource within it, using,

$ kubectl delete ns MY_NAMESPACE

Listing resources within a Namespace¶

To get a list of every resource within a namespace,

$ kubectl -n MY_NAMESPACE get all

To focus on a specific resource type, for example pods, you would instead use,

$ kubectl -n MY_NAMESPACE get pods

Getting resource Information¶

$ kubectl -n MY_NAMESPACE describe RESOURCE_TYPE RESOURCE_NAME

For example, to get high-level info for a pod named foo-7dd8975899-57hj6, you would use,

$ kubectl -n MY_NAMESPACE describe pod foo-7dd8975899-57hj6

This will list all events associated with the pod, as well as a lot more information about how it has been configured by Bodywork.

Retrieving Pod Logs¶

To print a pod's stdout and stderr to your local shell (add the --follow flag to stream new output as it arrives),

$ kubectl -n MY_NAMESPACE logs MY_POD_NAME

Which can be useful for debugging.
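For long-running services the full log can be large, so it often helps to limit the output to the most recent lines and keep streaming from there. A minimal sketch, where the function name follow_logs and the default of 100 lines are illustrative choices:

```shell
# Print the most recent log lines for a pod and keep streaming new ones.
# Arguments: NAMESPACE POD_NAME [NUMBER_OF_LINES] (default: 100).
follow_logs() {
  kubectl -n "$1" logs "$2" --follow --tail="${3:-100}"
}
```

If a container has crashed and been restarted, adding the --previous flag to kubectl logs retrieves the logs from the earlier instance.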

Starting a shell within a running Container¶

For a running pod named foo-7dd8975899-57hj6, you can start a shell in its container using,

$ kubectl exec -n MY_NAMESPACE foo-7dd8975899-57hj6 -it -- /bin/bash

This is also useful for debugging - for example, you can open a Python REPL or run env to list all environment variables (e.g. secrets) that have made it onto the container.
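When you only need the output of a single command, you can pass it to kubectl exec directly, without opening an interactive shell. The sketch below lists a container's environment variables in sorted order. The function name pod_env is hypothetical, and it assumes the container image provides the env command.

```shell
# List the environment variables visible inside a pod's container, sorted by name.
# Arguments: NAMESPACE POD_NAME
pod_env() {
  kubectl exec -n "$1" "$2" -- env | sort
}
```

Bear in mind that the output may include secret values, so take care when running this in a shared terminal or when saving the output.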

Starting an HTTP proxy server to the Kubernetes API¶

Issuing the following command,

$ kubectl proxy --port 8001

Starts a local proxy server that acts as a gateway to the Kubernetes API. Among other things, this allows you to access services on the cluster that are not exposed to the public internet. For example, with the proxy server operational, browsing to,

http://localhost:8001/api/v1/namespaces/NAMESPACE/services/SERVICE_NAME/proxy/

Will take you to service SERVICE_NAME, in the namespace NAMESPACE.
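The proxy URL pattern above can also be used from the command line. A minimal sketch, assuming the proxy server is running on your local machine (the function name proxy_url is hypothetical):

```shell
# Build the URL for reaching a service through a local `kubectl proxy`.
# Arguments: NAMESPACE SERVICE_NAME [PROXY_PORT] (default: 8001).
proxy_url() {
  printf 'http://localhost:%s/api/v1/namespaces/%s/services/%s/proxy/\n' \
    "${3:-8001}" "$1" "$2"
}
```

For example, curl "$(proxy_url my-namespace my-service)" would send a GET request to the service through the proxy.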

Monitoring Deployments¶

An effective way of monitoring a Bodywork deployment is via the Kubernetes dashboard. Before you trigger a new deployment, open the dashboard and browse to Workloads, for the namespace in which the deployment is to be made. Leave your browser open while you trigger the deployment using the Bodywork CLI. The dashboard will update automatically, showing you the resources that have been created as they are deployed.

An alternative to the Kubernetes dashboard is to use the watch command from within a shell, to monitor the results of a Kubectl command. For example,

$ watch --interval 1 kubectl -n default get all

Will display a list of all resources in the default namespace, updating with an interval of 1 second.
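If you would rather wait for one specific service to become available than watch every resource, Kubectl's rollout status command blocks until a deployment has finished rolling out. This is a sketch only: the function name wait_for_rollout is hypothetical, and you would need to look up the name of the deployment that was created for your service.

```shell
# Wait for a deployment's rollout to complete, or fail after a timeout.
# Arguments: NAMESPACE DEPLOYMENT_NAME [TIMEOUT] (default: 120s).
wait_for_rollout() {
  kubectl -n "$1" rollout status "deployment/$2" --timeout="${3:-120s}"
}
```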

Working with remote Clusters¶

There are many options for creating managed Kubernetes clusters in the cloud. Setting these up is beyond the scope of this introduction to Kubernetes. Once your remote cluster is operational, deploying to it is as easy as changing the cluster that Kubectl is targeting. To see which clusters Kubectl has been set up to use, run,

$ kubectl config get-contexts

CURRENT   NAME                                         CLUSTER                            AUTHINFO                                     NAMESPACE
          aws_admin@my-cluster.eu-west-2.eksctl.io     my-cluster.eu-west-2.eksctl.io     aws_admin@my-cluster.eu-west-2.eksctl.io
*         minikube                                     minikube                           minikube                                     default

To switch from the minikube context to the aws_admin@my-cluster.eu-west-2.eksctl.io context, you would run,

$ kubectl config use-context aws_admin@my-cluster.eu-west-2.eksctl.io

And then Kubectl and Bodywork will automatically target your chosen cluster.
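Because the active context silently determines where deployments end up, it can be worth checking it before you deploy. A minimal sketch of such a guard, where the function name require_context is hypothetical:

```shell
# Succeed only if Kubectl is currently targeting the expected context.
# Argument: EXPECTED_CONTEXT_NAME
require_context() {
  current="$(kubectl config current-context)"
  if [ "$current" != "$1" ]; then
    echo "expected context '$1' but Kubectl is targeting '$current'" >&2
    return 1
  fi
}
```

For example, running require_context minikube at the top of a script that ends with set -e semantics, or chaining it with &&, would stop a deployment intended for a local cluster from reaching a remote one.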

Learning More¶

Familiarity with basic Kubernetes concepts and some exposure to the Kubectl command-line tool make life easier, but are not essential for using Bodywork. If you would like to learn a bit more about Kubernetes, then we recommend the first two introductory sections of Marko Lukša's excellent book Kubernetes in Action, or the introductory article we wrote on Deploying Python ML Models with Flask, Docker and Kubernetes.

Getting Help¶

If you need help with Kubernetes, then please don't hesitate to post questions and ask for help on our discussion board. You are not alone and we'll do our best to get you up-and-running quickly.