### Setting Up `etcd`

We set up `etcd` as a backing database for our `sslip.io` webserver.

#### Generate Certificates

We need to generate certificates for our etcd cluster (our cluster will
communicate over TLS, but our clients won't).

- `ca-config.json`. We set the certificates it issues to expire in 30
  years (262800 hours) because we don't want to go through a certificate
  rotation. Trust me on this one.
- `ca-csr.json`. Again, 30 years.

```shell
cfssl gencert -initca ca-csr.json | cfssljson -bare etcd-ca
```

The key is saved in LastPass as `etcd-ca-key.pem`.

Let's use our newly-created CA to generate the etcd certificates. Note
that we throw almost every IP address/hostname we can think of into the
SANs field (why not?):

```shell
GKE_NODE_PUBLIC_IPv4=$(gcloud compute instances list --format=json |
  jq -r '[.[].networkInterfaces[0].accessConfigs[0].natIP] | join(",")')
PUBLIC_HOSTNAMES=ns-aws.sslip.io,ns-azure.sslip.io,ns-gce.sslip.io
HOSTNAMES=ns-aws,ns-azure,ns-gce
IPv4=127.0.0.1,52.0.56.137,52.187.42.158,104.155.144.4,$GKE_NODE_PUBLIC_IPv4
IPv6=::1,2600:1f18:aaf:6900::a
cfssl gencert \
  -ca=ca.pem \
  -ca-key=etcd-ca-key.pem \
  -config=ca-config.json \
  -hostname=${PUBLIC_HOSTNAMES},${HOSTNAMES},${IPv4},${IPv6} \
  -profile=etcd \
  etcd-csr.json | cfssljson -bare etcd
```

The key is saved in LastPass as `etcd-key.pem`.

#### Generating a New Cert for a New etcd Node

Let's say you've introduced _new_ IPv4 addresses, or that you've recreated your
GKE clusters, and all the addresses have changed, then you'll need to
regenerate the certificates:

```
lpass show --note etcd-ca-key.pem > etcd-ca-key.pem
lpass show --note etcd-key.pem > etcd-key.pem
GKE_NODE_PUBLIC_IPv4=$(gcloud compute instances list --format=json |
  jq -r '[.[].networkInterfaces[0].accessConfigs[0].natIP] | join(",")')
PUBLIC_HOSTNAMES=ns-aws.sslip.io,ns-azure.sslip.io,ns-gce.sslip.io
HOSTNAMES=ns-aws,ns-azure,ns-gce
IPv4=127.0.0.1,52.0.56.137,52.187.42.158,104.155.144.4,$GKE_NODE_PUBLIC_IPv4
IPv6=::1,2600:1f18:aaf:6900::a

cfssl gencsr \
  -key=etcd-key.pem \
  -hostname=${PUBLIC_HOSTNAMES},${HOSTNAMES},${IPv4},${IPv6} \
  -cert=etcd.pem | cfssljson -bare etcd
cfssl sign \
  -ca=ca.pem \
  -ca-key=etcd-ca-key.pem \
  -config=ca-config.json \
  -profile=etcd \
  etcd.csr | cfssljson -bare etcd
```

#### Configure ns-aws.sslip.io & ns-azure.sslip.io

Now let's set up etcd on either ns-aws or ns-azure:

```shell
sudo mkdir /etc/etcd # default's okay: root:root 755
IAAS=${HOST/ns-/}
cd /etc/etcd
sudo curl -OL https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/ca.pem
sudo curl -OL https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/etcd.pem
sudo curl -o /etc/default/etcd -L https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/etcd-$IAAS.conf
lpass login brian.cunnie@gmail.com --trust
lpass show --note etcd-key.pem | sudo tee etcd-key.pem
sudo chmod 400 *key*
sudo chown etcd:etcd *key*
```

Let's fire up etcd:

```shell
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl stop etcd
sudo systemctl start etcd
sudo journalctl -xefu etcd # look for any errors on startup
sudo systemctl restart sslip.io-dns
dig @localhost metrics.status.sslip.io txt +short | grep "Key-value store:" # should be "etcd"
```

If the messages look innocuous (ignore "serving client traffic insecurely; this
is strongly discouraged!").

Check the cluster:

```shell
export ETCDCTL_API=3
etcdctl member list # first time: "8e9e05c52164694d, started, default, http://localhost:2380, http://localhost:2379, false"
  # existing cluster:
  660f0ebfd9c21a95: name=ns-aws peerURLs=https://ns-aws.sslip.io:2380 clientURLs=http://localhost:2379 isLeader=true
  6e7e4616e1032417: name=ns-azure peerURLs=https://ns-azure.sslip.io:2380 clientURLs=http://localhost:2379 isLeader=false
  b77b5c23840fa42b: name=ns-gce peerURLs=https://ns-gce.sslip.io:2380 clientURLs= isLeader=false
```

### Wiping old data

ns-aws & ns-azure:

```
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default/member
sudo systemctl start etcd
```

### Deleting and Re-adding ns-azure

This needs to be done when, for example, ns-azure is rebuilt from scratch.

```bash
ssh ns-aws
export ETCDCTL_API=3
etcdctl member list
 # 6e7e4616e1032417: name=ns-azure peerURLs=https://ns-azure.sslip.io:2380 clientURLs=http://localhost:2379 isLeader=false
etcdctl member remove 6e7e4616e1032417
etcdctl member add ns-azure --peer-urls=https://ns-azure.sslip.io:2380
exit
ssh ns-azure
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default/member
sudo -E nvim /etc/default/etcd
 # ETCD_INITIAL_CLUSTER_STATE="existing"
sudo systemctl start etcd
etcdctl member list
sudo du -sH /var/lib/etcd/default/member
```

### Updating the GKE PEM

This needs to be done every darn time the nodes are upgraded (there _must_ be a better way)

```bash
kubectl delete secret etcd-peer-tls
kubectl create secret generic etcd-peer-tls \
  --from-file=ca.pem=<(curl -L https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/ca.pem) \
  --from-file=etcd.pem=<(curl -L https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/etcd.pem) \
  --from-file=etcd-key.pem=<(lpass show --note etcd-key.pem)
kubectl rollout restart deployment/k-v.io
```

### Troubleshooting

If `sudo journalctl -xefu etcd` errors with `member xxx has already been
bootstrapped`, then edit `/etc/default/etcd` and set
`ETCD_INITIAL_CLUSTER_STATE="existing"` (previously was `"new"`).