---
date: 2019-03-28
title: Replace Failed Kubernetes Etcd Member
category: devops
featured_image: /images/kubernetes.png
---
I had a pretty knotty problem in my homelab. I am running a Kubernetes cluster with 3 masters and an embedded Etcd cluster. That means that the Etcd cluster runs on the same nodes as the K8s API and scheduler pods. Like them, it runs as Pods controlled directly by Kubelet (magic! except it isn't). The data on one of those members (node3) got corrupted, so naturally it would no longer join the cluster.

What you need to do is remove that (etcd) node from the cluster and recreate it. This is pretty simple, but it needs a bit of under-the-bonnet knowledge. So how is this Pod configured?

I hinted at a bit of magic earlier. These pods are running in K8s, and visible in the `kube-system` namespace, but they are not actually managed by the Kubernetes scheduler. They are managed by the Kubelet itself. Kubelet on each master watches `/etc/kubernetes/manifests` and will action any valid manifest file you place in that folder. When I installed the cluster with `kubeadm` it did the following:
```
$ ls /etc/kubernetes/manifests/
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
```
The part which interests me is in the `spec.volumes` key of `etcd.yaml`:
```
spec:
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
```
This tells me 2 things:
1. The actual cluster data is stored in `/var/lib/etcd` on my physical node
2. The certificates for cluster comms are in `/etc/kubernetes/pki/etcd`
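A quick sanity check on the node confirms both paths (these are the `kubeadm` defaults, so yours may differ):
```
# Both locations come from the hostPath volumes in etcd.yaml
ls /etc/kubernetes/pki/etcd
ls /var/lib/etcd/member
```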
So now I need an `etcdctl` which can reach both the kube masters and those certificates. I actually had it on another machine in the lab, so I copied the `pki/etcd` contents to that machine, but you could just as easily put `etcdctl` on the broken master, it is just a binary.
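For reference, copying the certificates across looked something like this (the hostname and destination reflect my lab layout, adjust to taste):
```
# Copy the kubeadm-generated etcd certs from one of the masters, keeping the same
# path so the etcdctl flags below work unchanged (run as root, or adjust the
# destination and the --cert/--key/--cacert paths accordingly)
mkdir -p /etc/kubernetes/pki
scp -r root@<node1>:/etc/kubernetes/pki/etcd /etc/kubernetes/pki/
```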
You will need the UUID for your failed node:
```
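# Note: with etcdctl older than 3.4 the v3 API is not the default, so depending
# on your etcd version (an assumption about your setup) you may need:
export ETCDCTL_API=3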
export ETCDCTL="etcdctl --endpoints=https://<node1>:2379,https://<node2>:2379,https://<node3>:2379 \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt"
${ETCDCTL} member list
```
Remove the failed node from the Etcd cluster:
```
${ETCDCTL} member remove <uuid-of-failed-node>
```
Then simply move the `etcd.yaml` to one side:
```
mv /etc/kubernetes/manifests/etcd.yaml .
```
The kubelet will then stop the Etcd pod and you can clean up its corrupted data dir:
```
rm -rf /var/lib/etcd/member
```
Re-start the pod:
```
mv etcd.yaml /etc/kubernetes/manifests/
```
That will restart the pod, but you still need to add it to the cluster:
```
${ETCDCTL} member add --peer-urls=https://<node3>:2380 <node3>
```
It will probably take a couple of restarts before it is properly healthy, but Kubelet will take care of that.
Before long you can run `${ETCDCTL} endpoint health` and all endpoints will report healthy.
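If you want to watch the recovery from the Kubernetes side as well, something along these lines works (kubeadm labels its static etcd Pods with `component=etcd`):
```
# Watch the static etcd Pods restart until they settle into Running
kubectl -n kube-system get pods -l component=etcd -o wide -w
```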
## Conclusion
Nothing was actually that complex, but I needed to know a couple of things about how K8s does things:
1. Where `kubeadm` put the certificates
2. That Kubelet watches `/etc/kubernetes/manifests` for static Pods (defined by `staticPodPath`).
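On a kubeadm-provisioned master you can confirm that second point in the kubelet's own configuration; the path below is the kubeadm default, so it may differ on other setups:
```
# kubeadm writes the kubelet configuration here, including staticPodPath
grep staticPodPath /var/lib/kubelet/config.yaml
```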