Workloads
Introduction
A workload is an application running on Kubernetes. Whether your workload is a single component or several that work together, on Kubernetes you run it inside a set of Pods. In Kubernetes, a Pod represents a set of running containers on your cluster.
A Pod has a defined lifecycle. For example, once a Pod is running in your cluster then a critical failure on the node where that Pod is running means that all the Pods on that node fail. Kubernetes treats that level of failure as final: you would need to create a new Pod even if the node later recovers.
However, to make life considerably easier, you don't need to manage each Pod directly. Instead, you can use workload resources that manage a set of Pods on your behalf. These resources configure controllers that make sure the right number of the right kind of Pod are running, to match the state you specified.
Those workload resources include:
- Deployment and ReplicaSet (replacing the legacy resource ReplicationController)
- StatefulSet
- DaemonSet for running Pods that provide node-local facilities, such as a storage driver or network plugin
- Job and CronJob for tasks that run to completion
ReplicationController
A ReplicationController ensures that a specified number of pod replicas are running at any one time. In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.
A Deployment that configures a ReplicaSet is now the recommended way to set up replication.
If there are too many pods, the ReplicationController terminates the extra pods. If there are too few, the ReplicationController starts more pods. Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted, or are terminated.
This example ReplicationController config runs three copies of the nginx web server.
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
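After saving the manifest (the file name replication.yaml below is only an example), you can create the ReplicationController and inspect it with standard kubectl commands:
$ kubectl apply -f replication.yaml
$ kubectl get rc nginx
$ kubectl describe rc/nginx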
ReplicaSet
A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.
The main difference between a ReplicationController and a ReplicaSet is the kind of selector they support: a ReplicaSet supports set-based selectors, while a ReplicationController only supports equality-based selectors.
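For illustration, a set-based selector in a ReplicaSet spec can use matchExpressions in addition to matchLabels (the label key and values here are only an example):
selector:
  matchExpressions:
  - key: tier
    operator: In
    values:
    - frontend
    - cache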
This example ReplicaSet runs three copies of the php-redis pod.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # modify replicas according to your case
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3
Deployment
A Deployment provides declarative updates for Pods and ReplicaSets.
You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate. You can define Deployments to create new ReplicaSets, or to remove existing Deployments and adopt all their resources with new Deployments.
Do not manage ReplicaSets owned by a Deployment. Consider opening an issue in the main Kubernetes repository if your use case is not covered below.
Use Cases
The following are typical use cases for Deployments:
- Create a Deployment to rollout a ReplicaSet. The ReplicaSet creates Pods in the background. Check the status of the rollout to see if it succeeds or not.
- Declare the new state of the Pods by updating the PodTemplateSpec of the Deployment. A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate. Each new ReplicaSet updates the revision of the Deployment.
- Rollback to an earlier Deployment revision if the current state of the Deployment is not stable. Each rollback updates the revision of the Deployment.
- Scale up the Deployment to facilitate more load.
- Pause the Deployment to apply multiple fixes to its PodTemplateSpec and then resume it to start a new rollout (see the example commands after this list).
- Use the status of the Deployment as an indicator that a rollout has stuck.
- Clean up older ReplicaSets that you don't need anymore.
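For example, pausing and resuming a rollout is done with kubectl rollout (the Deployment name used here is the one from the example that follows):
$ kubectl rollout pause deployment/nginx-deployment
$ kubectl rollout resume deployment/nginx-deployment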
The following is an example of a Deployment. It creates a ReplicaSet to bring up three nginx Pods:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
In this example:
- A Deployment named nginx-deployment is created, indicated by the .metadata.name field.
- The Deployment creates three replicated Pods, indicated by the .spec.replicas field.
- The .spec.selector field defines how the Deployment finds which Pods to manage. In this case, you simply select a label that is defined in the Pod template (app: nginx). However, more sophisticated selection rules are possible, as long as the Pod template itself satisfies the rule.
- The template field contains the following sub-fields:
- The Pods are labeled app: nginx using the .metadata.labels field.
- The Pod template's specification, or .template.spec field, indicates that the Pods run one container, nginx, which runs the nginx Docker Hub image at version 1.14.2.
- Create one container and name it nginx using the .spec.template.spec.containers[0].name field.
Updating
A Deployment’s rollout is triggered if and only if the Deployment’s Pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
Example
Update the nodejs-app Pods to use the hellojs:2.0.0 image instead of the hellojs:1.0.0 image:
$ kubectl set image deployment.apps/nodejs-app nodejs=hellojs:2.0.0 --record
deployment.apps/nodejs-app image updated
$ kubectl rollout status deployment.apps/nodejs-app
Waiting for deployment "nodejs-app" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nodejs-app" rollout to finish: 1 old replicas are pending termination...
deployment "nodejs-app" successfully rolled out
Sometimes, you may want to rollback a Deployment; for example, when the Deployment is not stable, such as crash looping.
Example: rolling back after setting an image tag with a typo (1.0.1):
$ kubectl set image deployment.apps/nodejs-app nodejs=hellojs:1.0.1 --record
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
nodejs-app-78fbc49f85 3 3 3 28m
nodejs-app-7fbd5bcc78 0 0 0 22m
nodejs-app-97fc9fcf6 1 1 0 23s
$ kubectl rollout history deployment.apps/nodejs-app
REVISION CHANGE-CAUSE
3 kubectl.exe deployment.apps/nodejs-app set image nodejs=hellojs:1.0.0 --record=true
4 kubectl.exe deployment.apps/nodejs-app set image nodejs=hellojs:1.0.1 --record=true
$ kubectl rollout undo deployment.apps/nodejs-app
deployment.apps/nodejs-app rolled back
$ kubectl get deployment.apps/nodejs-app
NAME READY UP-TO-DATE AVAILABLE AGE
nodejs-app 3/3 3 3 30m
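You can also roll back to a specific revision from the rollout history by passing --to-revision; revision 3 here is the hellojs:1.0.0 revision shown in the history above:
$ kubectl rollout undo deployment.apps/nodejs-app --to-revision=3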
Scaling
You can scale a Deployment by using the following command:
$ kubectl scale deployment.apps/nodejs-app --replicas=5
deployment.apps/nodejs-app scaled
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nodejs-app-78fbc49f85-6v8kw 1/1 Running 0 49s
nodejs-app-78fbc49f85-7dbwt 1/1 Running 0 48s
nodejs-app-78fbc49f85-7hq9k 1/1 Running 0 25m
nodejs-app-78fbc49f85-8cdzt 1/1 Running 0 25m
nodejs-app-78fbc49f85-t89rm 1/1 Running 0 35m
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
nodejs-app-78fbc49f85 5 5 5 40m
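If horizontal Pod autoscaling is available in your cluster, you can also let Kubernetes adjust the replica count for you (the thresholds below are only an example):
$ kubectl autoscale deployment.apps/nodejs-app --min=3 --max=10 --cpu-percent=80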
Failed Deployment
Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing. This can occur due to some of the following factors:
- Insufficient quota
- Readiness probe failures
- Image pull errors
- Insufficient permissions
- Limit ranges
- Application runtime misconfiguration
One way you can detect this condition is to specify a deadline parameter in your Deployment spec: .spec.progressDeadlineSeconds denotes the number of seconds the Deployment controller waits before indicating (in the Deployment status) that the Deployment progress has stalled.
The following kubectl command sets the spec with progressDeadlineSeconds to make the controller report lack of progress for a Deployment after 10 minutes:
$ kubectl patch deployment.v1.apps/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":600}}'
deployment.apps/nginx-deployment patched
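If the rollout then fails to progress within that deadline, kubectl rollout status reports the failure and exits with a non-zero code, which you can use in scripts:
$ kubectl rollout status deployment.apps/nginx-deployment
$ echo $?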
StatefulSet
StatefulSet is the workload API object used to manage stateful applications.
It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These Pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
If you want to use storage volumes to provide persistence for your workload, you can use a StatefulSet as part of the solution. Although individual Pods in a StatefulSet are susceptible to failure, the persistent Pod identifiers make it easier to match existing volumes to the new Pods that replace any that have failed.
Using StatefulSets
StatefulSets are valuable for applications that require one or more of the following.
- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, automated rolling updates.
In the above, stable is synonymous with persistence across Pod (re)scheduling. If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas. Deployment or ReplicaSet may be better suited to your stateless needs.
The example below demonstrates the definition of a StatefulSet.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
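If the Pods need their own persistent storage, you would extend the manifest above with a volumeMounts entry on the container and a volumeClaimTemplates list on the StatefulSet spec, roughly like this (the storage class name my-storage-class is an assumption and must exist in your cluster):
        # added inside .spec.template.spec.containers[0]
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  # added at the same level as .spec.template
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi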
DaemonSet
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are:
- running a cluster storage daemon on every node
- running a logs collection daemon on every node
- running a node monitoring daemon on every node
In a simple case, one DaemonSet, covering all nodes, would be used for each type of daemon. A more complex setup might use multiple DaemonSets for a single type of daemon, but with different flags and/or different memory and cpu requests for different hardware types.
The example below demonstrates the definition of a DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      # this toleration is to have the daemonset runnable on master nodes
      # remove it if your masters can't run pods
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
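After applying the manifest you can verify that a Pod was scheduled on each eligible node:
$ kubectl get daemonset fluentd-elasticsearch -n kube-system
$ kubectl get pods -n kube-system -l name=fluentd-elasticsearch -o wide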
Jobs
A Job creates one or more Pods and ensures that a specified number of them successfully terminate. As Pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (i.e., the Job) is complete. Deleting a Job will clean up the Pods it created.
A simple case is to create one Job object in order to reliably run one Pod to completion. The Job object will start a new Pod if the first Pod fails or is deleted (for example due to a node hardware failure or a node reboot).
You can also use a Job to run multiple Pods in parallel.
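For a parallel Job, .spec.completions sets how many successful Pod completions are required and .spec.parallelism sets how many Pods may run at once; a minimal sketch with example numbers:
spec:
  completions: 5
  parallelism: 2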
Here is an example Job config. It computes π to 2000 places and prints it out. It takes around 10s to complete.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
backoffLimit specifies the number of retries before the Job is considered failed.
If a Pod of the Job fails and restartPolicy = OnFailure, the Pod stays on the node and the failed container is re-run in place. With restartPolicy = Never, the container is not restarted in place; instead the Job controller starts a new Pod, following the normal Pod lifecycle.
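To try the pi example, apply the manifest, wait for the Job to complete, and read the result from the Pod log (the file name job.yaml is only an example):
$ kubectl apply -f job.yaml
$ kubectl wait --for=condition=complete job/pi
$ kubectl logs job/pi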
CronJob
A CronJob creates Jobs on a repeating schedule.
One CronJob object is like one line of a crontab (cron table) file. It runs a job periodically on a given schedule, written in Cron format.
All CronJob schedule: times are based on the timezone of the kube-controller-manager.
CronJobs are useful for creating periodic and recurring tasks, like running backups or sending emails. CronJobs can also schedule individual tasks for a specific time, such as scheduling a Job for when your cluster is likely to be idle.
A cron job creates a job object about once per execution time of its schedule. Here, "about" means that there are certain circumstances where two jobs might be created, or no job might be created. Kubernetes attempts to make these rare, but cannot completely prevent them. Therefore, jobs should be idempotent.
This example CronJob manifest prints the current time and a hello message every minute:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
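After applying the manifest you can check the CronJob and watch it spawn a Job roughly once per minute:
$ kubectl get cronjob hello
$ kubectl get jobs --watch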
Exercise
Create a deployment using "nodejs-deployment.yaml"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-app
  labels:
    run: nodejs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nodejs-app
  template:
    metadata:
      labels:
        run: nodejs-app
    spec:
      containers:
      - name: nodejs
        image: hellojs
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
Create a service using "nodejs-svc.yaml"
apiVersion: v1
kind: Service
metadata:
  name: nodejs-svc
spec:
  selector:
    run: nodejs-app
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
$ kubectl apply -f nodejs-deployment.yaml
$ kubectl apply -f nodejs-svc.yaml
$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/nodejs-app-7bdb96c8db-5hl4w 1/1 Running 0 5m19s
pod/nodejs-app-7bdb96c8db-gxng6 1/1 Running 0 5m19s
pod/nodejs-app-7bdb96c8db-qrfq6 1/1 Running 0 5m19s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7d
service/nodejs-svc ClusterIP 10.108.104.238 <none> 8080/TCP 4s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nodejs-app 3/3 3 3 5m20s
NAME DESIRED CURRENT READY AGE
replicaset.apps/nodejs-app-7bdb96c8db …
$ kubectl proxy
http://localhost:8001/api/v1/namespaces/default/services/nodejs-svc/proxy/
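With kubectl proxy running in one terminal, you can call the service from another terminal through that URL, for example:
$ curl http://localhost:8001/api/v1/namespaces/default/services/nodejs-svc/proxy/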