June 2, 2021

Update the OS with the system upgrade controller

My main goal is to make my home environment running in kubernetes as maintenance free as possible, and one major task is updating the underlying operating system. I’m using openSUSE Leap 15.2 as I write this, but these instructions might work on other operating systems with some tweaks and changes. Rancher has developed a system upgrade controller that helps orchestrate an upgrade by draining a node, doing the upgrade and then putting the node back to work. This is what I’m currently using to patch/upgrade my kubernetes nodes.

Prerequisite

You’ll need a working kubernetes cluster, and I’m using k3s in my environment. I’ve previously written a post on how you could set up a k3s cluster with kube-vip if you need some help with that; you’ll find it here. You also need the cluster to be running on openSUSE Leap 15.2 to follow along and “copy&paste” the instructions.

Installing the system upgrade controller

Just apply the system upgrade controller manifest with kubectl

kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml
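
The controller and its resources live in the system-upgrade namespace, so you can check that the controller pod came up with:

kubectl -n system-upgrade get pods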

Labeling the nodes

We will later create a Plan that dictates how the upgrades are done, and these Plans target a specific node label. In this example all nodes are running openSUSE Leap 15.2, so I’ll just label all nodes in the cluster to be targeted by my upgrade plan.

kubectl label node --all plan.upgrade.cattle.io/leap152=true

The value of this label will hold the hash of the Plan, and if the hash of the Plan and the node label don’t match, the system upgrade controller will pick this up and schedule an upgrade for that node. I’ll just set it to true initially.
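
If you want to see the current value of the label on each node (after an upgrade it will hold the Plan hash instead of true), you can list it with:

kubectl get nodes -L plan.upgrade.cattle.io/leap152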

System upgrade plan

The upgrade plan consists of two manifests: a Secret that contains the actual script that is executed for the update, and the Plan itself that specifies the node label matching, concurrency, the container image the upgrade job uses, the name of the Plan, etc. Save this yaml to leap152.yaml

---
apiVersion: v1
kind: Secret
metadata:
  name: leap152
  namespace: system-upgrade
type: Opaque
stringData:
  upgrade.sh: |
    #!/bin/sh
    # don't abort on non-zero exit codes, we inspect zypper's exit codes ourselves
    set +e
    zypper refresh
    EC=$?
    if [ $EC -ne 0 ]; then
        exit $EC
    fi
    # zypper exit code 103 means that only the upgrade framework has been updated
    # loop zypper and do patch until zypper framework is updated
    EC=103
    while [ $EC -eq 103 ]; do
      # --non-interactive : In this mode zypper doesn’t ask user to type answers to various prompts, but uses default answers automatically.
      # patch : Install all available needed patches
      # --with-interactive : Avoid skipping of interactive patches when in non-interactive mode.
      zypper --non-interactive patch --with-interactive
      EC=$?
    done
    # zypper exit code 102 means that reboot is required
    if [ $EC -eq 102 ]; then
      echo "Rebooting...."
      reboot
    fi  
    # exit the script with zypper exit code
    exit $EC 

---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: leap152
  namespace: system-upgrade
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: plan.upgrade.cattle.io/leap152, operator: Exists}
  serviceAccountName: system-upgrade
  secrets:
    - name: leap152
      path: /host/run/system-upgrade/secrets/leap152
  drain:
    force: true
  version: v1
  upgrade:
    image: opensuse/leap:15.2
    command: ["chroot", "/host"]
    args: ["sh", "/run/system-upgrade/secrets/leap152/upgrade.sh"]

and apply it to the cluster

kubectl apply -f leap152.yaml
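
You can follow the progress with, for example:

kubectl -n system-upgrade get plans,jobs,pods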

The cluster should immediately pick up this Plan, because the nodes match the label and the value of the label doesn’t match the hash of the Plan. And because the concurrency is set to ‘1’ it will just update one node at a time and not continue with the next node until the first one has finished. It’s possible to create more complex setups with the system upgrade controller and make Plans depend on each other. Let’s say you have a 10 node cluster and you first want to update the master/server nodes one by one, and when they have been updated you want to upgrade the workers/agents two by two. That is quite possible, see the sketch below.
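
As a rough sketch of what that could look like, here is a second Plan for the workers. The leap152-agents name and label are made up for this example, and ordering the two Plans (servers first, then agents) would additionally need a prepare step in the Plan spec, which I haven’t covered here:

---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: leap152-agents
  namespace: system-upgrade
spec:
  concurrency: 2
  nodeSelector:
    matchExpressions:
      # a hypothetical label you would put on the worker/agent nodes instead
      - {key: plan.upgrade.cattle.io/leap152-agents, operator: Exists}
  serviceAccountName: system-upgrade
  secrets:
    - name: leap152
      path: /host/run/system-upgrade/secrets/leap152
  drain:
    force: true
  version: v1
  upgrade:
    image: opensuse/leap:15.2
    command: ["chroot", "/host"]
    args: ["sh", "/run/system-upgrade/secrets/leap152/upgrade.sh"]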

Automating the patching

As soon as the upgrades have been completed, all the node labels will contain the hash of the Plan, and the jobs will not run again until you change the Plan. To automate this I’ve created a cron job on one of my master/server nodes that patches the leap152 Plan and sets the version to a date/time stamp, thus triggering the upgrade controller to execute the upgrade plan again because the hash has changed. Just create a file /etc/cron.d/system-upgrade-leap152 on one of the master/server nodes with the following content:

SHELL=/bin/bash

# m h dom mon dow user  command
0 3 * * * 	root	/usr/local/bin/k3s kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml -n system-upgrade patch plan leap152 --patch "{ \"spec\": {  \"version\": \"v$(date +"\%Y\%m\%d_\%H\%M\%S")\" }}" --type=merge

This will issue a system upgrade job every night at 3:00 a.m.
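
To verify that the version bump went through, you can for example print the Plan’s version field:

kubectl -n system-upgrade get plan leap152 -o jsonpath='{.spec.version}'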

It’s not optimal, and my goal is to create a kubernetes CronJob that does this Plan patching instead. As it is implemented right now I’m dependent on the master/server node that holds the cron job to actually trigger the patching of the OS. It’s good enough for now though.
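As a starting point, here is a minimal, untested sketch of what such a CronJob could look like. The leap152-version-bump name is made up, the bitnami/kubectl image is not part of this setup, and it assumes the system-upgrade service account is allowed to patch Plans (you might need extra RBAC for that):

---
apiVersion: batch/v1beta1   # batch/v1 on newer clusters
kind: CronJob
metadata:
  name: leap152-version-bump
  namespace: system-upgrade
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: system-upgrade
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # bump spec.version to a fresh timestamp, changing the Plan hash
                  kubectl -n system-upgrade patch plan leap152 --type=merge \
                    --patch "{\"spec\": {\"version\": \"v$(date +%Y%m%d_%H%M%S)\"}}"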

