The TXT response to the query `metrics.status.sslip.io` was doomed to
exceed the UDP 512-byte limit, which would have forced the client to
re-attempt via TCP, and our server doesn't yet bind to TCP.
This commit fixes that by squeezing the packet. We haven't dropped any
information, but we made it more succinct.
Per [Infoblox](https://www.infoblox.com/dns-security-resource-center/dns-security-faq/is-dns-tcp-or-udp-port-53/):
> when the message size exceeds 512 bytes, it will trigger the ‘TC’ bit
(Truncation) in DNS to be set, informing the client that the message
length has exceeded the allowed size. In these situations, the client
needs to re-transmit over TCP
We implement PTR records for IPv6, for example:
2.a.b.b.4.0.2.9.a.e.e.6.e.c.4.1.0.f.9.6.0.0.1.0.6.4.6.0.1.0.6.2.ip6.arpa →
2601-646-100-69f0-14ce-6eea-9204-bba2.sslip.io.
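The mapping above can be sketched in Go as follows. This is a minimal illustration, not the server's actual implementation: `ipv6PtrToHostname` is a hypothetical name, and the sketch only handles complete 32-nibble names.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/netip"
	"strings"
)

// ipv6PtrToHostname converts a complete ip6.arpa PTR query name into a
// dashed sslip.io-style hostname. It reports false for anything that is
// not a full 32-nibble ip6.arpa name.
func ipv6PtrToHostname(ptrName string) (string, bool) {
	name := strings.ToLower(strings.TrimSuffix(ptrName, "."))
	if !strings.HasSuffix(name, ".ip6.arpa") {
		return "", false
	}
	nibbles := strings.Split(strings.TrimSuffix(name, ".ip6.arpa"), ".")
	if len(nibbles) != 32 {
		return "", false
	}
	// PTR names list the nibbles least-significant first, so reverse them.
	var hexDigits strings.Builder
	for i := len(nibbles) - 1; i >= 0; i-- {
		if len(nibbles[i]) != 1 {
			return "", false
		}
		hexDigits.WriteString(nibbles[i])
	}
	raw, err := hex.DecodeString(hexDigits.String())
	if err != nil || len(raw) != 16 {
		return "", false
	}
	var addr [16]byte
	copy(addr[:], raw)
	// Render the address in its usual textual form, then swap ":" for "-".
	dashed := strings.ReplaceAll(netip.AddrFrom16(addr).String(), ":", "-")
	return dashed + ".sslip.io.", true
}

func main() {
	fmt.Println(ipv6PtrToHostname(
		"2.a.b.b.4.0.2.9.a.e.e.6.e.c.4.1.0.f.9.6.0.0.1.0.6.4.6.0.1.0.6.2.ip6.arpa."))
}
```

Addresses containing a run of zeros render with `::`, which becomes `--` after the substitution.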
We implement PTR records for IPv4.
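A rough Go sketch of the IPv4 lookup, assuming it mirrors the dashed-hostname convention used for IPv6. `ipv4PtrToHostname` is a hypothetical name and the addresses are illustrative. Incomplete names such as "127.in-addr.arpa" are reported as not found.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// ipv4PtrToHostname converts a complete in-addr.arpa PTR query name into
// a dashed hostname. It reports false when the name does not carry
// exactly four valid octets.
func ipv4PtrToHostname(ptrName string) (string, bool) {
	name := strings.ToLower(strings.TrimSuffix(ptrName, "."))
	if !strings.HasSuffix(name, ".in-addr.arpa") {
		return "", false
	}
	octets := strings.Split(strings.TrimSuffix(name, ".in-addr.arpa"), ".")
	if len(octets) != 4 {
		return "", false
	}
	// PTR names list the octets in reverse order.
	reversed := octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
	addr, err := netip.ParseAddr(reversed)
	if err != nil || !addr.Is4() {
		return "", false
	}
	return strings.ReplaceAll(addr.String(), ".", "-") + ".sslip.io.", true
}

func main() {
	fmt.Println(ipv4PtrToHostname("4.3.2.1.in-addr.arpa."))
	fmt.Println(ipv4PtrToHostname("127.in-addr.arpa."))
}
```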
When a PTR record is not found (e.g. "127.in-addr.arpa"), the server
returns the SOA record, but, unlike other record lookups (e.g. "MX"),
the SOA's mname is locked to "sslip.io" because setting the mname to
"127.in-addr.arpa" doesn't make sense.
To be done:
- Implement IPv6
- Implement Metrics
- Update README
- Deploy new version
Note: the two biggest users are Cypriot IP addresses:
```
2 106.52.50.235 <- Tencent
1 223.71.46.114 <- China Mobile
157 31.153.14.207 <- Cypriot
110 62.228.164.123 <- Cypriot
4 73.189.219.4 <- My home IP
```
Prohibit setting DNS-01 challenge TXT record `_acme-challenge.k-v.io`
Although it may appear the TXT record can be set or deleted, it's
hardcoded to the string, "Please don't try to procure a k-v.io cert via
DNS-01 challenge". Setting a custom value was easier than writing a
special code path.
Special thanks to [Alan Liang](http://symb.olic.link/):
> ... one could easily add (and modify) a TXT record at
_acme-challenge.k-v.io, which I believe is used for verifying domain
ownership at various cert providers, so anyone could in theory obtain
valid SSL certs for k-v.io and *.k-v.io
I've chosen to add the website to GKE, not Hetzner, because I get fewer
strident abuse messages from GKE.
I'm dismayed that when I make a small change to the DNS, I need to go
through the laborious release process for it to take effect. Sigh. Maybe
that's something I'll fix another day.
We now have a Dockerfile to serve the upcoming https://k-v.io.
The Dockerfile is patterned after the sslip.io nginx Dockerfile.
Note: the content isn't ready; the HTML needs fleshing out.
Also includes a gratuitous change to the HTML in order to trigger a
build.
Fixes <https://ci.nono.io/teams/main/pipelines/dockerfiles/jobs/build-and-push-sslip.io-nginx/builds/33>:
```
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c dnf install -y bind-utils iproute less lsof neovim net-tools nginx nmap-ncat procps-ng RUN mv /usr/share/nginx/html /usr/share/nginx/html-orig]: exit code: 1
```
We make sure that each of the three nameservers
(ns-{aws,azure,gce}.sslip.io) can set a key-value, that the value
propagates to the remaining nameservers, that a nameserver can delete a
key, and that the deletion propagates to the remaining nameservers.
ns-gce is unable to join the cluster because its source IP address is
that of the node on which it's running, 34.72.45.206, and that address
is not included in the certificate's SANs.
This commit updates the etcd certificate to one which includes the three
GKE nodes' IP addresses in its SANs.
This commit also includes instructions for updating the certificates in
the event of an IP address change.
Fixes:
```
Apr 16 14:15:34 ns-aws etcd[500]: rejected connection from "34.72.45.206:43080" (error "tls: \"34.72.45.206\" does not match any of DNSNames [\"ns-aws.sslip.io\" \"ns-azure.sslip.io\" \"ns-gce.sslip.io\" \"ns-aws\" \"ns-azure\" \"ns-gce\"] (lookup ns-gce: Temporary failure in name resolution)", ServerName "ns-aws.sslip.io", IPAddresses ["127.0.0.1" "52.0.56.137" "52.187.42.158" "104.155.144.4" "::1" "2600:1f18:aaf:6900::a"], DNSNames ["ns-aws.sslip.io" "ns-azure.sslip.io" "ns-gce.sslip.io" "ns-aws" "ns-azure" "ns-gce"])
```
The original behavior was to return the deleted record, which
inadvertently prolonged the lifetime (in DNS cache) of the record which
was meant to expire as soon as possible.
- Removed the instructions to create a BOSH release. We are no longer
creating a BOSH release because we needed to colocate an etcd release
alongside the BOSH release, and we couldn't find an etcd BOSH release.
- Updated the instructions to run a quick test against the sslip.io DNS
server locally (sanity check) instead of deploying a VM with the BOSH
release & testing against that.
- Updated the instructions for updating ns-azure's DNS server. ns-azure
is no longer a BOSH-deployed VM.
When we check the production servers, we now expect, when we delete a
key, to NOT receive the key's old value as a response, lest we
inadvertently extend the lifetime of the key that we want to expire.
We don't return the deleted value because doing that would have the
unintended consequence of postponing the deletion: downstream caching
servers would cache the deleted value for up to three more minutes. We'd
rather have the key deleted sooner rather than later.
Some APIs, e.g. etcd's, return the deleted values in their response:
those APIs can afford to do so because they don't need to worry about
DNS propagation.
We also lengthen the timeout of an `etcd` API call from 500 msec to 1928
msec. 500 msec left too little margin: some calls routinely took 480
msec to complete, and we wanted more headroom.
We also no longer do two `etcd` operations when we delete a value.
Previously we would do a GET followed by a DELETE, but since we're not
returning the value deleted, there's no point to the GET. Furthermore,
the GET was never necessary, for the `etcd` DELETE API call returned the
values deleted.
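The single-operation delete with the longer timeout might look roughly like this in Go. The `deleter` interface and `mapStore` are stand-ins invented for this sketch so that it runs without an etcd cluster; they are not the etcd client API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// etcdTimeout is the per-call budget described above.
const etcdTimeout = 1928 * time.Millisecond

// deleter is a hypothetical stand-in for the part of a key-value
// client that this sketch needs.
type deleter interface {
	Delete(ctx context.Context, key string) error
}

// deleteKey issues one DELETE (no preceding GET) and deliberately
// returns no TXT values, so that resolvers have nothing to cache.
func deleteKey(store deleter, key string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), etcdTimeout)
	defer cancel()
	if err := store.Delete(ctx, key); err != nil {
		return nil, err
	}
	return []string{}, nil
}

// mapStore is an in-memory fake used only to exercise the sketch.
type mapStore map[string]string

func (m mapStore) Delete(ctx context.Context, key string) error {
	if err := ctx.Err(); err != nil {
		return err // the deadline passed or the call was cancelled
	}
	delete(m, key)
	return nil
}

func main() {
	store := mapStore{"my-key": "my-value"}
	txts, err := deleteKey(store, "my-key")
	fmt.Println(len(txts), err, len(store))
}
```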
Drive-by:
- README: install Ginkgo the proper way, with `go install`
[fixes #17]
Now that we no longer create BOSH releases, we don't need to bury the
`src/` directory under `bosh-release`; we can now place it under the
repo root, and we no longer need to fiddle with symbolic links.
We stopped creating BOSH releases because, once we decided to implement
a key-value store, we would have had to create an `etcd` BOSH release,
and we didn't want to invest the time.
- You can select the port to bind to
- The NS record returned for `_acme-challenge` domains is special
Also, I removed the periods at the ends of bullets to be consistent.
We want to allow users to bind to ports other than 53. A big reason is
that port 53 is a privileged port, and often requires root privileges.
We don't want to force our users to use root privileges in order to run
the tests.
This isn't a problem on macOS, but is on Linux.
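As an illustration, a minimal Go sketch of binding to a configurable UDP port. The `-port` flag name and the helper functions are assumptions made for this example rather than a description of the server's actual flags.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
)

// listenUDP binds a UDP socket on the given address and port.
// Port 0 asks the kernel for any free unprivileged port.
func listenUDP(ip string, port int) (*net.UDPConn, error) {
	return net.ListenUDP("udp", &net.UDPAddr{IP: net.ParseIP(ip), Port: port})
}

// boundPort reports the port the socket actually ended up on.
func boundPort(conn *net.UDPConn) int {
	return conn.LocalAddr().(*net.UDPAddr).Port
}

func main() {
	// "-port" is a hypothetical flag name; 53 is the standard DNS port.
	port := flag.Int("port", 53, "UDP port to bind to")
	flag.Parse()

	conn, err := listenUDP("127.0.0.1", *port)
	if err != nil {
		// On Linux, ports below 1024 typically require elevated privileges.
		fmt.Fprintf(os.Stderr, "cannot bind to port %d (try a port above 1023): %v\n", *port, err)
		return
	}
	defer conn.Close()
	fmt.Printf("listening on UDP port %d\n", boundPort(conn))
}
```

Tests can then pass an unprivileged port and run without `sudo`.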
Previously we would download the blocklist every hour for every address
we've bound to, which, on Linux machines, could easily amount to 8
addresses (loopback, IPv4, several IPv6). Linux, if you recall, has a
systemd nameserver bound to 127.0.0.53, forcing us to bind to each
address individually. Downloading multiple identical versions of the
blocklist was inefficient.
With this commit, it downloads the blocklist only once per hour,
regardless of the number of individual IP addresses listened on.
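The idea can be sketched in Go as one shared blocklist that a single goroutine refreshes and every listener reads. All names here (`blocklist`, `refresh`, `blocks`, `refreshEvery`) and the sample entries are hypothetical; the real code may be organized differently.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// blocklist is shared by every listener, so it is downloaded once
// no matter how many addresses the server binds to.
type blocklist struct {
	mu      sync.RWMutex
	entries []string
}

// refresh replaces the entries with a fresh download. On failure the
// previous entries are kept.
func (b *blocklist) refresh(fetch func() ([]string, error)) error {
	entries, err := fetch()
	if err != nil {
		return err
	}
	b.mu.Lock()
	b.entries = entries
	b.mu.Unlock()
	return nil
}

// blocks reports whether the hostname contains any blocklisted string.
func (b *blocklist) blocks(hostname string) bool {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, entry := range b.entries {
		if strings.Contains(hostname, entry) {
			return true
		}
	}
	return false
}

// refreshEvery re-downloads on a timer until stop is closed.
func (b *blocklist) refreshEvery(interval time.Duration, fetch func() ([]string, error), stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = b.refresh(fetch)
		case <-stop:
			return
		}
	}
}

func main() {
	// A stand-in for the HTTP download; the entry is made up.
	fetch := func() ([]string, error) { return []string{"phish"}, nil }

	shared := &blocklist{}
	if err := shared.refresh(fetch); err != nil {
		fmt.Println("initial download failed:", err)
	}
	stop := make(chan struct{})
	defer close(stop)
	go shared.refreshEvery(time.Hour, fetch, stop)

	// Every bound address consults the same shared blocklist.
	for _, addr := range []string{"127.0.0.1", "::1", "192.0.2.10"} {
		fmt.Println(addr, shared.blocks("phish.192-0-2-1.sslip.io"))
	}
}
```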
But what really excites me about this commit is that I've moved much of
the initialization of the `xip.Xip` struct out of `main()` and into
`xip.NewXip()`. This makes `main()` lean again, and `xip.Xip` has gotten
complex enough that it warrants its own constructor.
This repo has been forked 36 times, and yet I've done a great disservice
to my would-be developers by not describing how to run/test my code.
This commit addresses that shortcoming by having a _Quick Start_ section
very near the top.
- includes new Ginkgo v2
- includes required `sudo` for Linux
- removed the now-wrong comment about TXT records (there's now a
plethora of TXT records such as `ip.sslip.io`)
- minor formatting tweaks