
ns-gce is unable to join the cluster because its source IP address is the node on which its running, 34.72.45.206, and that's not included in the SANs. This commit updates the etcd certificate to one which includes the three GKE nodes' IP addresses in its SANs. This commit also includes instruction to update the certificates in the event of an IP address change. Fixes: ``` Apr 16 14:15:34 ns-aws etcd[500]: rejected connection from "34.72.45.206:43080" (error "tls: \"34.72.45.206\" does not match any of DNSNames [\"ns-aws.sslip.io\" \"ns-azure.sslip.io\" \"ns-gce.sslip.io\" \"ns-aws\" \"ns-azure\" \"ns-gce\"] (lookup ns-gce: Temporary failure in name resolution)", ServerName "ns-aws.sslip.io", IPAddresses ["127.0.0.1" "52.0.56.137" "52.187.42.158" "104.155.144.4" "::1" "2600:1f18:aaf:6900::a"], DNSNames ["ns-aws.sslip.io" "ns-azure.sslip.io" "ns-gce.sslip.io" "ns-aws" "ns-azure" "ns-gce"]) ```
Setting Up etcd
We set up etcd
as a backing database for our sslip.io
webserver.
Generate Certificates
We need to generate certificates for our etcd cluster (our cluster will communicate over TLS, but our clients won't).
ca-config.json
. We set the certificates it issues to expire in 30 years (262800 hours) because we don't want to go through a certificate rotation. Trust me on this one.ca-csr.json
. Again, 30 years.
cfssl gencert -initca ca-csr.json | cfssljson -bare etcd-ca
The key is saved in LastPass as etcd-ca-key.pem
.
Let's use our newly-created CA to generate the etcd certificates. Note that we throw almost every IP address/hostname we can think of into the SANs field (why not?):
GKE_NODE_PUBLIC_IPv4=$(gcloud compute instances list --format=json |
jq -r '[.[].networkInterfaces[0].accessConfigs[0].natIP] | join(",")')
PUBLIC_HOSTNAMES=ns-aws.sslip.io,ns-azure.sslip.io,ns-gce.sslip.io
HOSTNAMES=ns-aws,ns-azure,ns-gce
IPv4=127.0.0.1,52.0.56.137,52.187.42.158,104.155.144.4,$GKE_NODE_PUBLIC_IPv4
IPv6=::1,2600:1f18:aaf:6900::a
cfssl gencert \
-ca=ca.pem \
-ca-key=etcd-ca-key.pem \
-config=ca-config.json \
-hostname=${PUBLIC_HOSTNAMES},${HOSTNAMES},${IPv4},${IPv6} \
-profile=etcd \
etcd-csr.json | cfssljson -bare etcd
The key is saved in LastPass as etcd-key.pem
.
Generating a New Cert for a New etcd Node
Let's say you've introduced new IPv4 addresses, or that you've recreated your GKE clusters, and all the addresses have changed, then you'll need to regenerate the certificates:
lpass show --note etcd-ca-key.pem > etcd-ca-key.pem
lpass show --note etcd-key.pem > etcd-key.pem
GKE_NODE_PUBLIC_IPv4=$(gcloud compute instances list --format=json |
jq -r '[.[].networkInterfaces[0].accessConfigs[0].natIP] | join(",")')
PUBLIC_HOSTNAMES=ns-aws.sslip.io,ns-azure.sslip.io,ns-gce.sslip.io
HOSTNAMES=ns-aws,ns-azure,ns-gce
IPv4=127.0.0.1,52.0.56.137,52.187.42.158,104.155.144.4,$GKE_NODE_PUBLIC_IPv4
IPv6=::1,2600:1f18:aaf:6900::a
cfssl gencsr \
-key=etcd-key.pem \
-hostname=${PUBLIC_HOSTNAMES},${HOSTNAMES},${IPv4},${IPv6} \
-cert=etcd.pem | cfssljson -bare etcd
cfssl sign \
-ca=ca.pem \
-ca-key=etcd-ca-key.pem \
-config=ca-config.json \
-profile=etcd \
etcd.csr | cfssljson -bare etcd
Configure ns-aws.sslip.io & ns-azure.sslip.io
Now let's set up etcd on either ns-aws or ns-azure:
sudo mkdir /etc/etcd # default's okay: root:root 755
IAAS=${HOST/ns-/}
cd /etc/etcd
sudo curl -OL https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/ca.pem
sudo curl -OL https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/etcd.pem
sudo curl -o /etc/default/etcd -L https://raw.githubusercontent.com/cunnie/sslip.io/main/etcd/etcd-$IAAS.conf
lpass login brian.cunnie@gmail.com --trust
lpass show --note etcd-key.pem | sudo tee etcd-key.pem
sudo chmod 400 *key*
sudo chown etcd:etcd *key*
Let's fire up etcd:
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl stop etcd
sudo systemctl start etcd
sudo journalctl -xefu etcd # look for any errors on startup
sudo systemctl restart sslip.io-dns
dig @localhost metrics.status.sslip.io txt +short | grep "Key-value store:" # should be "etcd"
If the messages look innocuous (ignore "serving client traffic insecurely; this is strongly discouraged!").
Check the cluster:
etcdctl member list # first time: "8e9e05c52164694d, started, default, http://localhost:2380, http://localhost:2379, false"
# existing cluster:
660f0ebfd9c21a95: name=ns-aws peerURLs=https://ns-aws.sslip.io:2380 clientURLs=http://localhost:2379 isLeader=true
6e7e4616e1032417: name=ns-azure peerURLs=https://ns-azure.sslip.io:2380 clientURLs=http://localhost:2379 isLeader=false
b77b5c23840fa42b: name=ns-gce peerURLs=https://ns-gce.sslip.io:2380 clientURLs= isLeader=false
Wiping old data
ns-aws & ns-azure:
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default/member
sudo systemctl start etcd
Troubleshooting
If sudo journalctl -xefu etcd
errors with member xxx has already been bootstrapped
, then edit /etc/default/etcd
and set
ETCD_INITIAL_CLUSTER_STATE="existing"
(previously was "new"
).