The Nameservers test (in GitHub Actions) fails ~25% of the time, almost
invariably on ns-do-sg.sslip.io (I don't know whether it's Digital Ocean's
fault or the large distance between my GitHub Actions runner and
Singapore).
The failures are noisy, typically one day, and have led me to stop
checking the status of my nameservers, which defeats the purpose.
This commit attempts to reduce the failures by increasing both the
timeout and the retries. We are nothing if not persistent.
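A minimal sketch of the two knobs, not the actual spec code: `dig_command` and `with_retries` are hypothetical helper names, and the timeout, try, and attempt counts are placeholder values. `+timeout=` and `+tries=` are standard `dig` options.

```ruby
# Hypothetical helper: build a dig command with an explicit per-query
# timeout (seconds) and number of tries. The default values are assumptions.
def dig_command(nameserver, domain, timeout: 5, tries: 4)
  "dig @#{nameserver} ns #{domain} +short +timeout=#{timeout} +tries=#{tries}"
end

# Hypothetical helper: re-run a block until it returns a non-empty result
# or the attempts are used up. Returns the last result either way.
def with_retries(attempts: 3, pause: 0)
  result = nil
  attempts.times do |i|
    result = yield(i + 1)
    return result unless result.nil? || result.empty?
    sleep(pause) if pause.positive? && i + 1 < attempts
  end
  result
end

puts dig_command('ns-do-sg.sslip.io.', 'sslip.io')
```

In a spec, the block passed to `with_retries` would shell out to the command string; an empty answer from a slow nameserver then triggers another attempt instead of an immediate failure.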
Slight tweak: I want every WHOIS nameserver to be reflected in the NS
records, but I also want to allow for additional NS records.
Specifically, I've paid the Google Cloud Platform (GCP) "Committed Use
Discounts" for `ns-gce.sslip.io`, but it attracts *lots* of traffic, and
that can easily incur $100+ in bandwidth charges per month. To tamp down
on traffic, I don't include `ns-gce` in the whois nameservers, but I do
include it in the NS records.
But then my tests fail, so this commit tweaks the tests so that as long
as the NS records are a superset of the whois records, I'm fine
(previously they had to match).
Fixes, when running `DOMAIN=sslip.io rspec --format documentation
--color spec/`:
```
rspec './spec/check-dns_spec.rb[1:3]' # sslip.io nameserver ns-ovh.sslip.io.'s NS records match whois's ["ns-ovh.sslip.io.", "ns-hetzner.sslip.io.", "ns-do-sg.sslip.io."], `dig @ns-ovh.sslip.io. ns sslip.io +short`
rspec './spec/check-dns_spec.rb[1:18]' # sslip.io nameserver ns-hetzner.sslip.io.'s NS records match whois's ["ns-ovh.sslip.io.", "ns-hetzner.sslip.io.", "ns-do-sg.sslip.io."], `dig @ns-hetzner.sslip.io. ns sslip.io +short`
rspec './spec/check-dns_spec.rb[1:33]' # sslip.io nameserver ns-do-sg.sslip.io.'s NS records match whois's ["ns-ovh.sslip.io.", "ns-hetzner.sslip.io.", "ns-do-sg.sslip.io."], `dig @ns-do-sg.sslip.io. ns sslip.io +short`
```
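The superset rule can be sketched as follows. This is an illustration rather than the spec itself; `ns_covers_whois?` is a hypothetical name, and the nameserver lists are taken from the failure messages above plus `ns-gce`.

```ruby
require 'set'

# Hypothetical helper: true when every whois nameserver also appears in the
# NS records. Extra NS records (e.g. ns-gce) are allowed.
def ns_covers_whois?(ns_records, whois_nameservers)
  ns_records.to_set.superset?(whois_nameservers.to_set)
end

whois = ['ns-ovh.sslip.io.', 'ns-hetzner.sslip.io.', 'ns-do-sg.sslip.io.']
ns    = whois + ['ns-gce.sslip.io.']

puts ns_covers_whois?(ns, whois)       # NS records have one extra entry
puts ns_covers_whois?(whois.take(2), whois) # NS records are missing ns-do-sg
```

Inside an rspec example, the same check could be written with the built-in matcher, `expect(ns).to include(*whois)`.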
We replace `ns-ovh-sg` with `ns-do-sg`; this is a purely financial
decision: `ns-ovh-sg` costs $60/month, $720/year.
`ns-do-sg` (Digital Ocean) is also a Singapore-based DNS server. It's a
Basic Regular instance (2 vCPUs, 4 GiB RAM, 80 GB SSD, 4 TiB bandwidth)
for $24/month, $288/year.
That's a yearly savings of $432.
I had originally overspec'ed the Singapore server because I suspected
that there was a ton of traffic in Asia; I was wrong. It's not even 20%
of the traffic of Europe or North America. I am confident the Digital
Ocean server will be able to handle it.
I also reintroduce `ns-gce` as the second server in North America, backing
up `ns-hetzner`. My hope is that `ns-hetzner` carries most of the load,
and `ns-gce` carries the rest, but not so much as to trigger Google
Cloud Platform's (GCP's) expensive bandwidth billing.
| DNS server | Queries / second |
|:-----------|-----------------:|
| ns-hetzner | 10706.4 |
| ns-ovh | 10802.0 |
| ns-ovh-sg | 1677.7 |
I'm worried the traffic to my GCP server will cost me a hundred dollars
in bandwidth fees. It has a volume similar to my late AWS server which,
in its last month, racked up ~$130 in bandwidth fees!
I'm also trying to balance the servers more geographically: instead of
having two servers in the US and none in Asia, I'll have one server in
the US and one in Asia (Singapore).
The OVH server in Asia is expensive — $60/month instead of $20/month for
the OVH server in Warsaw. Also there's a monthly bandwidth cap in
Singapore in addition to the 300 Mbps cap.
I went with a dedicated server, similar to the one in Warsaw, but I took
the opportunity to upgrade it (same price):
- ns-ovh: KS-4: Intel Xeon-E3 1230 v6
- ns-ovh-sg: KS-5: Intel Xeon-E3 1270 v6
I'm hoping that by adding this server in Singapore, the traffic to
ns-ovh, the Warsaw server, will lessen, and I won't get those "Anti-DDoS
protection enabled for IP address 51.75.53.19" emails every few days.
Current Queries per second:
- 4,087 ns-gce
- 1,131 ns-hetzner
- 7,183 ns-ovh
We are no longer doing key-value-over-DNS.
Fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/dns-servers/builds/1097>
```
rspec './spec/check-dns_spec.rb[1:17:1]' # sslip.io k-v.io tested on the ns-aws.sslip.io. nameserver sets a value, 1678804743, on the key sslipio-spec.k-v.io
rspec './spec/check-dns_spec.rb[1:17:2]' # sslip.io k-v.io tested on the ns-aws.sslip.io. nameserver gets the newly-set value, 1678804743, from the key, sslipio-spec.k-v.io
rspec './spec/check-dns_spec.rb[1:33:1]' # sslip.io k-v.io tested on the ns-azure.sslip.io. nameserver sets a value, 1678804743, on the key sslipio-spec.k-v.io
rspec './spec/check-dns_spec.rb[1:33:2]' # sslip.io k-v.io tested on the ns-azure.sslip.io. nameserver gets the newly-set value, 1678804743, from the key, sslipio-spec.k-v.io
rspec './spec/check-dns_spec.rb[1:49:1]' # sslip.io k-v.io tested on the ns-gce.sslip.io. nameserver sets a value, 1678804743, on the key sslipio-spec.k-v.io
rspec './spec/check-dns_spec.rb[1:49:2]' # sslip.io k-v.io tested on the ns-gce.sslip.io. nameserver gets the newly-set value, 1678804743, from the key, sslipio-spec.k-v.io
```
`[0-9]` → `\d`, `[0-9a-f]` → `[[:xdigit:]]`
A follow-on to the previous commit, which did the same for Golang.
Ruby supports the above matchers like Golang does:
<https://ruby-doc.org/core-3.1.2/Regexp.html>
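A small sketch of the substitution, with made-up constant names and sample strings. One difference to keep in mind: `[[:xdigit:]]` also matches uppercase `A`-`F`, so it is slightly more permissive than `[0-9a-f]`.

```ruby
# Old and new spellings side by side (constant names are illustrative only).
OLD_DIGITS = /\A[0-9]+\z/
NEW_DIGITS = /\A\d+\z/
OLD_HEX    = /\A[0-9a-f]+\z/
NEW_HEX    = /\A[[:xdigit:]]+\z/

puts '1678804743'.match?(OLD_DIGITS) && '1678804743'.match?(NEW_DIGITS)
puts 'deadbeef'.match?(OLD_HEX) && 'deadbeef'.match?(NEW_HEX)

# The one behavioral difference: uppercase hex digits.
puts 'DEADBEEF'.match?(OLD_HEX)
puts 'DEADBEEF'.match?(NEW_HEX)
```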
We make sure that each of the three nameservers
(ns-{aws,azure,gce}.sslip.io) can set a key-value, that the value
propagates to the remaining nameservers, that a nameserver can delete a
key, and that the deletion propagates to the remaining nameservers.
The original behavior was to return the deleted record, which
inadvertently prolonged the lifetime (in DNS cache) of the record which
was meant to expire as soon as possible.
- Removed the instructions to create a BOSH release. We are no longer
creating a BOSH release because we needed to colocate an etcd release
alongside the BOSH release, and we couldn't find an etcd BOSH release.
- Updated the instructions to run a quick test against the sslip.io DNS
server locally (sanity check) instead of deploying a VM with the BOSH
release & testing against that.
- Updated the instructions for updating ns-azure's DNS server. ns-azure
is no longer a BOSH-deployed VM.
When we check the production servers, we now expect, when we delete a
key, to NOT receive the key's old value as a response, lest we
inadvertently extend the lifetime of the key that we want to expire.
We made a mistake: we blindly invoked a function that was sometimes
`nil`. Specifically, if we had a customized domain (e.g. `ns.sslip.io`)
that didn't have a TXT record (a function), we'd try to invoke it
anyway. Bad move.
Now we ensure the function is there before we try to invoke it.
This is a curious affirmation of installing metrics: if we hadn't seen
that the server had been restarted because uptime was too low, we
wouldn't have caught this bug.
Drive-by: we made the lengths of TXT records of `version.status.sslip.io`
exactly match what we replace them with during the linking phase. We
hope that this fixes the wrong-line-numbers we see in the `panic()`
messages.
[fixes #14]
Also, I moved the "version" endpoint: `version.sslip.io` →
`version.status.sslip.io`. It seemed to make more sense to corral the
special endpoints under `status`.
Also, change the order of `dig` arguments so that the server being
queried is first (e.g. `@#{whois_nameserver}`) and the options (e.g.
`+short`) are last.
- The impetus? I deployed a custom webserver but forgot to add the
A & AAAA records for sslip.io, so the website disappeared.
- I now check for the A & AAAA records (to be present, but not of any
particular value because that gives me the latitude to migrate to
other machines).
- I also check that the website is responsive.
- drive by: removed hard-coding of `sslip.io` in many tests; instead we
now query the domain that the env var `DOMAIN` is set to.
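A sketch of the "present, but not any particular value" check, under a few assumptions: `addresses_of` is a hypothetical helper, the fallback domain is only there to make the example runnable, and the sample `dig +short` outputs use documentation addresses rather than the real records.

```ruby
require 'ipaddr'

# The domain under test comes from the DOMAIN env var; the fallback value
# is an assumption for the sake of a runnable example.
domain = ENV.fetch('DOMAIN', 'sslip.io')

# Hypothetical helper: keep only the lines of `dig +short` output that parse
# as addresses of the requested family (:ipv4 or :ipv6).
def addresses_of(dig_short_output, family)
  dig_short_output.each_line.map(&:strip).reject(&:empty?).select do |line|
    begin
      addr = IPAddr.new(line)
      family == :ipv4 ? addr.ipv4? : addr.ipv6?
    rescue IPAddr::Error
      false
    end
  end
end

# Sample outputs (documentation addresses, not the real records).
a_output    = "192.0.2.1\n"
aaaa_output = "2001:db8::1\n"

puts "#{domain} has an A record"    unless addresses_of(a_output, :ipv4).empty?
puts "#{domain} has an AAAA record" unless addresses_of(aaaa_output, :ipv6).empty?
```

Because the check only asks for at least one well-formed address per family, the records can move to another machine without the test changing.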
On macOS, `whois` returns _two_ results for the domain `sslip.io` from
two different whois servers:
- whois.nic.io
- whois.namecheap.com
This means that every nameserver is double-counted. To fix, we remove
the duplicates.
fixes:
```
Failure/Error: expect(dig_nameservers.sort).to eq(whois_nameservers.sort)
expected: ["ns-aws.nono.io.", "ns-aws.nono.io.", "ns-azure.nono.io.", "ns-azure.nono.io.", "ns-gce.nono.io.", "ns-gce.nono.io."]
got: ["ns-aws.nono.io.", "ns-azure.nono.io.", "ns-gce.nono.io."]
(compared using ==)
# ./spec/check-dns_spec.rb:44:in `block (3 levels) in <top (required)>'
```
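The de-duplication amounts to a `uniq` on the parsed whois list. A sketch using the two arrays from the failure above (the variable names mirror the spec's, but the surrounding code is illustrative):

```ruby
# What whois yields on macOS: each nameserver reported by two whois servers.
whois_nameservers = [
  'ns-aws.nono.io.', 'ns-aws.nono.io.',
  'ns-azure.nono.io.', 'ns-azure.nono.io.',
  'ns-gce.nono.io.', 'ns-gce.nono.io.'
]
dig_nameservers = ['ns-aws.nono.io.', 'ns-azure.nono.io.', 'ns-gce.nono.io.']

# Dropping the duplicates before comparing makes the two lists agree.
deduped = whois_nameservers.uniq
puts deduped.sort == dig_nameservers.sort
```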
DiG 9.10.6 no longer has the `+noidnin` option, and `dig` will error if we
try to use it.
fixes:
```
dig +short +noidnin ns sslip.io @ns-azure.nono.io.
Invalid option: +noidnin
```
And this previously-invalid dig query now works, so we don't need the
option anyway:
```
dig +short AAAA api.--.sslip.io
::
```
This reverts commit a2564c12d3.
Yes, according to the RFC it shouldn't begin with a hyphen. And, since
we're on the topic, underscores were supposed to be off the table, too,
but Microsoft used them anyway, and you know what? We're gonna use the
"forbidden hyphen". And we're gonna instruct `dig` to not be so
persnickety.
fixes:
```
dig +short AAAA api.--.sslip.io
dig: idn2_lookup_ul failed: string start/ends with forbidden hyphen
```
I had to make it work for the old-style `dig` (e.g. macOS's, which is
version "DiG 9.8.3-P1") as well as for the new version ("DiG
9.11.3-RedHat-9.11.3-6.fc28"), which uses this new
[library](https://www.gnu.org/software/libidn/libidn2/reference/libidn2-idn2.html)
which does the following:
> Perform IDNA2008 lookup string conversion on domain name src , as described in section 5 of RFC 5891
- previously Name Server lines began with "NS"
- now they begin with "Name Server"
- fixed typo
fixes:
```
1) sslip.io should have at least 2 nameservers
Failure/Error: expect(whois_nameservers.size).to be > 1
expected: > 1
got: 0
# ./sslip.io/spec/check-dns_spec.rb:37:in `block (2 levels) in <top (required)>'
```
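A sketch of parsing the new-style lines. The helper name, the sample whois text, and the normalization (lowercasing, trailing dot) are assumptions for illustration; only the "Name Server" prefix comes from the change described above.

```ruby
# Hypothetical helper: pull nameservers out of whois output whose lines
# begin with "Name Server", normalized to lowercase with a trailing dot.
def parse_whois_nameservers(whois_output)
  whois_output.each_line
              .map { |line| line[/^\s*Name Server:\s*(\S+)/i, 1] }
              .compact
              .map do |ns|
                ns = ns.downcase
                ns.end_with?('.') ? ns : "#{ns}."
              end
              .uniq
end

# Made-up sample in the usual whois layout (not captured output).
sample = <<~WHOIS
  Domain Name: SSLIP.IO
  Name Server: NS-AWS.NONO.IO
  Name Server: NS-GCE.NONO.IO
WHOIS

nameservers = parse_whois_nameservers(sample)
puts nameservers.size > 1
```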