Writing a Controller for Pod Labels

Authors: Arthur Busser (Padok)

Operators are proving to be an excellent solution to running stateful distributed applications in Kubernetes. Open source tools like the Operator SDK provide ways to build reliable and maintainable operators, making it easier to extend Kubernetes and implement custom scheduling.

Kubernetes operators run complex software inside your cluster. The open source community has already built many operators for distributed applications like Prometheus, Elasticsearch, or Argo CD. Even outside of open source, operators can help to bring new functionality to your Kubernetes cluster.

An operator is a set of custom resources and a set of controllers. A controller watches for changes to specific resources in the Kubernetes API and reacts by creating, updating, or deleting resources.

The Operator SDK is best suited for building fully-featured operators. Nonetheless, you can use it to write a single controller. This post will walk you through writing a Kubernetes controller in Go that will add a pod-name label to pods that have a specific annotation.

Why do we need a controller for this?

I recently worked on a project where we needed to create a Service that routed traffic to a specific Pod in a ReplicaSet. The problem is that a Service can only select pods by label, and all pods in a ReplicaSet have the same labels. There are two ways to solve this problem:

  1. Create a Service without a selector and manage the Endpoints or EndpointSlices for that Service directly. We would need to write a custom controller to insert our Pod's IP address into those resources.
  2. Add a label to the Pod with a unique value. We could then use this label in our Service's selector. Again, we would need to write a custom controller to add this label.

A controller is a control loop that tracks one or more Kubernetes resource types. The controller from option n°2 above only needs to track pods, which makes it simpler to implement. This is the option we are going to walk through by writing a Kubernetes controller that adds a pod-name label to our pods.

StatefulSets do this natively by adding a pod-name label to each Pod in the set. But what if we don't want to or can't use StatefulSets?

We rarely create pods directly; most often, we use a Deployment, ReplicaSet, or another high-level resource. We can specify labels to add to each Pod in the PodSpec, but not with dynamic values, so no way to replicate a StatefulSet's pod-name label.

We tried using a mutating admission webhook. When anyone creates a Pod, the webhook patches the Pod with a label containing the Pod's name. Disappointingly, this does not work: not all pods have a name before being created. For instance, when the ReplicaSet controller creates a Pod, it sends a namePrefix to the Kubernetes API server and not a name. The API server generates a unique name before persisting the new Pod to etcd, but only after calling our admission webhook. So in most cases, we can't know a Pod's name with a mutating webhook.

Once a Pod exists in the Kubernetes API, it is mostly immutable, but we can still add a label. We can even do so from the command line:

kubectl label my-pod my-label-key=my-label-value

We need to watch for changes to any pods in the Kubernetes API and add the label we want. Rather than do this manually, we are going to write a controller that does it for us.

Bootstrapping a controller with the Operator SDK

A controller is a reconciliation loop that reads the desired state of a resource from the Kubernetes API and takes action to bring the cluster's actual state closer to the desired state.

In order to write this controller as quickly as possible, we are going to use the Operator SDK. If you don't have it installed, follow the official documentation.

$ operator-sdk version
operator-sdk version: "v1.4.2", commit: "4b083393be65589358b3e0416573df04f4ae8d9b", kubernetes version: "v1.19.4", go version: "go1.15.8", GOOS: "darwin", GOARCH: "amd64"

Let's create a new directory to write our controller in:

mkdir label-operator && cd label-operator

Next, let's initialize a new operator, to which we will add a single controller. To do this, you will need to specify a domain and a repository. The domain serves as a prefix for the group your custom Kubernetes resources will belong to. Because we are not going to be defining custom resources, the domain does not matter. The repository is going to be the name of the Go module we are going to write. By convention, this is the repository where you will be storing your code.

As an example, here is the command I ran:

# Feel free to change the domain and repo values.
operator-sdk init --domain=padok.fr --repo=github.com/busser/label-operator

Next, we need a create a new controller. This controller will handle pods and not a custom resource, so no need to generate the resource code. Let's run this command to scaffold the code we need:

operator-sdk create api --group=core --version=v1 --kind=Pod --controller=true --resource=false

We now have a new file: controllers/pod_controller.go. This file contains a PodReconciler type with two methods that we need to implement. The first is Reconcile, and it looks like this for now:

func (r *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    _ = r.Log.WithValues("pod", req.NamespacedName)

    // your logic here

    return ctrl.Result{}, nil
}

The Reconcile method is called whenever a Pod is created, updated, or deleted. The name and namespace of the Pod are in the ctrl.Request the method receives as a parameter.

The second method is SetupWithManager and for now it looks like this:

func (r *PodReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        // Uncomment the following line adding a pointer to an instance of the controlled resource as an argument
        // For().
        Complete(r)
}

The SetupWithManager method is called when the operator starts. It serves to tell the operator framework what types our PodReconciler needs to watch. To use the same Pod type used by Kubernetes internally, we need to import some of its code. All of the Kubernetes source code is open source, so you can import any part you like in your own Go code. You can find a complete list of available packages in the Kubernetes source code or here on pkg.go.dev. To use pods, we need the k8s.io/api/core/v1 package.

package controllers

import (
    // other imports...
    corev1 "k8s.io/api/core/v1"
    // other imports...
)

Lets use the Pod type in SetupWithManager to tell the operator framework we want to watch pods:

func (r *PodReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&corev1.Pod{}).
        Complete(r)
}

Before moving on, we should set the RBAC permissions our controller needs. Above the Reconcile method, we have some default permissions:

// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=pods/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=core,resources=pods/finalizers,verbs=update

We don't need all of those. Our controller will never interact with a Pod's status or its finalizers. It only needs to read and update pods. Lets remove the unnecessary permissions and keep only what we need:

// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch;update;patch

We are now ready to write our controller's reconciliation logic.

Implementing reconciliation

Here is what we want our Reconcile method to do:

  1. Use the Pod's name and namespace from the ctrl.Request to fetch the Pod from the Kubernetes API.
  2. If the Pod has an add-pod-name-label annotation, add a pod-name label to the Pod; if the annotation is missing, don't add the label.
  3. Update the Pod in the Kubernetes API to persist the changes made.

Lets define some constants for the annotation and label:

const (
    addPodNameLabelAnnotation = "padok.fr/add-pod-name-label"
    podNameLabel              = "padok.fr/pod-name"
)

The first step in our reconciliation function is to fetch the Pod we are working on from the Kubernetes API:

// Reconcile handles a reconciliation request for a Pod.
// If the Pod has the addPodNameLabelAnnotation annotation, then Reconcile
// will make sure the podNameLabel label is present with the correct value.
// If the annotation is absent, then Reconcile will make sure the label is too.
func (r *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := r.Log.WithValues("pod", req.NamespacedName)

    /*
        Step 0: Fetch the Pod from the Kubernetes API.
    */

    var pod corev1.Pod
    if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
        log.Error(err, "unable to fetch Pod")
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

Our Reconcile method will be called when a Pod is created, updated, or deleted. In the deletion case, our call to r.Get will return a specific error. Let's import the package that defines this error:

package controllers

import (
    // other imports...
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    // other imports...
)

We can now handle this specific error and — since our controller does not care about deleted pods — explicitly ignore it:

    /*
        Step 0: Fetch the Pod from the Kubernetes API.
    */

    var pod corev1.Pod
    if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
        if apierrors.IsNotFound(err) {
            // we'll ignore not-found errors, since we can get them on deleted requests.
            return ctrl.Result{}, nil
        }
        log.Error(err, "unable to fetch Pod")
        return ctrl.Result{}, err
    }

Next, lets edit our Pod so that our dynamic label is present if and only if our annotation is present:

    /*
        Step 1: Add or remove the label.
    */

    labelShouldBePresent := pod.Annotations[addPodNameLabelAnnotation] == "true"
    labelIsPresent := pod.Labels[podNameLabel] == pod.Name

    if labelShouldBePresent == labelIsPresent {
        // The desired state and actual state of the Pod are the same.
        // No further action is required by the operator at this moment.
        log.Info("no update required")
        return ctrl.Result{}, nil
    }

    if labelShouldBePresent {
        // If the label should be set but is not, set it.
        if pod.Labels == nil {
            pod.Labels = make(map[string]string)
        }
        pod.Labels[podNameLabel] = pod.Name
        log.Info("adding label")
    } else {
        // If the label should not be set but is, remove it.
        delete(pod.Labels, podNameLabel)
        log.Info("removing label")
    }

Finally, let's push our updated Pod to the Kubernetes API:

    /*
        Step 2: Update the Pod in the Kubernetes API.
    */

    if err := r.Update(ctx, &pod); err != nil {
        log.Error(err, "unable to update Pod")
        return ctrl.Result{}, err
    }

When writing our updated Pod to the Kubernetes API, there is a risk that the Pod has been updated or deleted since we first read it. When writing a Kubernetes controller, we should keep in mind that we are not the only actors in the cluster. When this happens, the best thing to do is start the reconciliation from scratch, by requeuing the event. Lets do exactly that:

    /*
        Step 2: Update the Pod in the Kubernetes API.
    */

    if err := r.Update(ctx, &pod); err != nil {
        if apierrors.IsConflict(err) {
            // The Pod has been updated since we read it.
            // Requeue the Pod to try to reconciliate again.
            return ctrl.Result{Requeue: true}, nil
        }
        if apierrors.IsNotFound(err) {
            // The Pod has been deleted since we read it.
            // Requeue the Pod to try to reconciliate again.
            return ctrl.Result{Requeue: true}, nil
        }
        log.Error(err, "unable to update Pod")
        return ctrl.Result{}, err
    }

Let's remember to return successfully at the end of the method:

    return ctrl.Result{}, nil
}

And that's it! We are now ready to run the controller on our cluster.

Run the controller on your cluster

To run our controller on your cluster, we need to run the operator. For that, all you will need is kubectl. If you don't have a Kubernetes cluster at hand, I recommend you start one locally with KinD (Kubernetes in Docker).

All it takes to run the operator from your machine is this command:

make run

After a few seconds, you should see the operator's logs. Notice that our controller's Reconcile method was called for all pods already running in the cluster.

Let's keep the operator running and, in another terminal, create a new Pod:

kubectl run --image=nginx my-nginx

The operator should quickly print some logs, indicating that it reacted to the Pod's creation and subsequent changes in status:

INFO    controllers.Pod no update required  {"pod": "default/my-nginx"}
INFO    controllers.Pod no update required  {"pod": "default/my-nginx"}
INFO    controllers.Pod no update required  {"pod": "default/my-nginx"}
INFO    controllers.Pod no update required  {"pod": "default/my-nginx"}

Lets check the Pod's labels:

$ kubectl get pod my-nginx --show-labels
NAME       READY   STATUS    RESTARTS   AGE   LABELS
my-nginx   1/1     Running   0          11m   run=my-nginx

Let's add an annotation to the Pod so that our controller knows to add our dynamic label to it:

kubectl annotate pod my-nginx padok.fr/add-pod-name-label=true

Notice that the controller immediately reacted and produced a new line in its logs:

INFO    controllers.Pod adding label    {"pod": "default/my-nginx"}
$ kubectl get pod my-nginx --show-labels
NAME       READY   STATUS    RESTARTS   AGE   LABELS
my-nginx   1/1     Running   0          13m   padok.fr/pod-name=my-nginx,run=my-nginx

Bravo! You just successfully wrote a Kubernetes controller capable of adding labels with dynamic values to resources in your cluster.

Controllers and operators, both big and small, can be an important part of your Kubernetes journey. Writing operators is easier now than it has ever been. The possibilities are endless.

What next?

If you want to go further, I recommend starting by deploying your controller or operator inside a cluster. The Makefile generated by the Operator SDK will do most of the work.

When deploying an operator to production, it is always a good idea to implement robust testing. The first step in that direction is to write unit tests. This documentation will guide you in writing tests for your operator. I wrote tests for the operator we just wrote; you can find all of my code in this GitHub repository.

How to learn more?

The Operator SDK documentation goes into detail on how you can go further and implement more complex operators.

When modeling a more complex use-case, a single controller acting on built-in Kubernetes types may not be enough. You may need to build a more complex operator with Custom Resource Definitions (CRDs) and multiple controllers. The Operator SDK is a great tool to help you do this.

If you want to discuss building an operator, join the #kubernetes-operator channel in the Kubernetes Slack workspace!