Integration tests would fail approximately 11% of the time (4/35) when
run in parallel (on my 8-core MacBook Air). The fix was to lengthen the
amount of time (1ms → 2ms) a port was held to make sure it was really,
truly free. After the change, the tests ran 32 times without a failure.
Fixes, during `ginkgo -r -p --until-it-fails .`
```
I couldn't bind to any IPs on port 1974, so I'm exiting
...
Waiting for:
Ready to answer queries
In [JustBeforeEach] at: /Volumes/workspace/sslip.io/src/sslip.io-dns-server/integration_flags_test.go:28 @ 11/11/22 10:38:02.045
```
Previously, when binding to individual IP addresses, if the bind to the
last address failed, the code still attempted to read from the
resulting non-functional `*UDPConn`.
This commit fixes that by only attempting to read after a successful
bind.
Fixes, during start-up:
```
2022/11/11 07:32:44 I couldn't bind to "0.0.0.0:53" (INADDR_ANY, all interfaces), so I'll try to bind to each address individually.
2022/11/11 07:32:44 I bound to the following IPs: "127.0.0.1:53", "[::1]:53", "10.11.0.4:53", "[fc00:11::4]:53"
2022/11/11 07:32:44 I couldn't bind to the following IPs: "fe80::20d:3aff:fec7:4a3"
2022/11/11 07:32:44 Ready to answer queries
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x52e1bd]
goroutine 1 [running]:
net.(*UDPConn).readFromUDP(0x104a880?, {0xc000364800?, 0x0?, 0x4eff80?}, 0xc000080050?)
/opt/homebrew/Cellar/go/1.19.3/libexec/src/net/udpsock.go:146 +0x1d
net.(*UDPConn).ReadFromUDP(...)
/opt/homebrew/Cellar/go/1.19.3/libexec/src/net/udpsock.go:141
main.readFrom(0x0, 0x17?, 0xc00023c200)
/Users/cunnie/workspace/sslip.io/src/sslip.io-dns-server/main.go:94 +0x9f
main.main()
/Users/cunnie/workspace/sslip.io/src/sslip.io-dns-server/main.go:86 +0x645
```
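A minimal sketch of the idea behind the fix, not the actual `main.go` code: bind each address, and keep only the connections whose bind succeeded, so a reader is never handed a nil `*UDPConn`. The helper name `bindAll` and the sample addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// bindAll tries to bind a UDP socket to each address. It returns only the
// connections that were bound successfully, plus the addresses that failed.
// (bindAll is a hypothetical helper for illustration.)
func bindAll(addrs []string) (conns []*net.UDPConn, failed []string) {
	for _, a := range addrs {
		udpAddr, err := net.ResolveUDPAddr("udp", a)
		if err != nil {
			failed = append(failed, a)
			continue
		}
		conn, err := net.ListenUDP("udp", udpAddr)
		if err != nil {
			// conn is nil here, so it must never reach a reader.
			failed = append(failed, a)
			continue
		}
		conns = append(conns, conn)
	}
	return conns, failed
}

func main() {
	// Port 0 asks the OS for any free port; the second address is
	// deliberately malformed so that it fails.
	conns, failed := bindAll([]string{"127.0.0.1:0", "not-an-address"})
	for _, c := range conns {
		fmt.Println("bound to", c.LocalAddr())
		c.Close() // a real server would start reading from c here
	}
	fmt.Println("couldn't bind to", failed)
}
```

Only the entries in `conns` would be passed on to the read loop; the failed addresses are merely logged.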
Previously, if you wanted to customize the IP addresses, you had to
modify the code. This commit allows you to pass the IP addresses on the
command line as a comma-separated list of "host=ip" pairs, e.g.
```
go run main.go -addresses ns-aws.sslip.io.=52.0.56.137,ns-aws.sslip.io.=2600:1f18:aaf:6900::a,ns-gce.sslip.io.=104.155.144.4
```
This works well in conjunction with the `-nameservers` flag. Indeed, you
could say that this is a pre-requisite of the `-nameservers` flag, for
what is the point of setting a nameserver if you can't set its IP
address?
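For illustration, here is one way the comma-separated "host=ip" format could be parsed. This is a sketch under my own assumptions, not the code in this commit: `parseAddresses` is a hypothetical name, and collecting the result into a map of hostname to IP addresses is just one possible representation.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseAddresses turns "host=ip,host=ip,..." into a map from hostname to
// the IP addresses given for it. (Hypothetical helper for illustration.)
func parseAddresses(flagValue string) (map[string][]net.IP, error) {
	result := map[string][]net.IP{}
	for _, pair := range strings.Split(flagValue, ",") {
		// Split on the first "=" only; IPv6 addresses contain ":" but no "=".
		parts := strings.SplitN(pair, "=", 2)
		if len(parts) != 2 || parts[0] == "" {
			return nil, fmt.Errorf("expected \"host=ip\", got %q", pair)
		}
		ip := net.ParseIP(parts[1])
		if ip == nil {
			return nil, fmt.Errorf("%q is not a valid IP address", parts[1])
		}
		result[parts[0]] = append(result[parts[0]], ip)
	}
	return result, nil
}

func main() {
	addrs, err := parseAddresses("ns-aws.sslip.io.=52.0.56.137,ns-aws.sslip.io.=2600:1f18:aaf:6900::a,ns-gce.sslip.io.=104.155.144.4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for host, ips := range addrs {
		fmt.Println(host, ips)
	}
}
```

A host may appear more than once, as `ns-aws.sslip.io.` does above with one IPv4 and one IPv6 address.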
- The default values for `-addresses` are the originally-hardcoded
values (e.g. sslip.io, ns-aws.sslip.io, k-v.io, etc.)
- Both IPv4 & IPv6 addresses work
- This is mostly tested through integration tests rather than unit
tests; I prefer integration tests in general. They are very reassuring.
- A few of the unit tests depended on the hard-coded addresses; I have
removed/modified the tests to accommodate the new behavior
- Today I learned that, in Go's 16-byte `net.IP` representation, IPv4
addresses are in the last 4 bytes, not the first four. Also that IPv4
addresses qualify as IPv6 addresses (`To16()` succeeds on them), so
adjust your code accordingly.
- bump dependencies
- finer-grained testing:
  - unit tests are always run against main branch's HEAD
  - Docker files, DNS server test are run against the latest tagged
    release (which is what's deployed on the production servers)
```shell
go get -u -t
```
Helps address
<https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/39>:
```
Ginkgo detected a version mismatch between the Ginkgo CLI and the version of Ginkgo imported by your packages:
Ginkgo CLI Version:
2.4.0
Mismatched package versions found:
2.1.4 used by sslip.io-dns-server, xip
...
Output from proc 1:
flag provided but not defined: -ginkgo.grace-period
```
...because, hey, I have a Mac, and native is about 10x faster than amd64
emulation. Also because it's cool.
I had to compile my own version of Concourse's
[`registry-image`](https://github.com/concourse/registry-image-resource)
container image because the one shipped with Concourse 7.8.3 is old and
doesn't have the multi-platform feature:
```
docker build --build-arg base_image=ubuntu -t cunnie/registry-image -f dockerfiles/ubuntu/Dockerfile .
```
Switch Alpine → Fedora to address weird connection issue:
```
> [linux/arm64 3/3] RUN wget https://github.com/cunnie/sslip.io/releases/download/2.6.0/sslip.io-dns-server-linux-arm64 -O /usr/sbin/sslip.io-dns-server; chmod 755 /usr/sbin/sslip.io-dns-server:
Connecting to github.com (192.30.255.113:443)
wget: error getting response: Connection reset by peer
```
[#21]
If the commit is tagged, then the release is solid, and we can build our
Dockerfiles.
Previously the Dockerfiles were built with every change. Now that I'm
making Dockerfiles first-class citizens ("Official Docker Images"), we
need discipline when building them.
[#21]
This is an admittedly gratuitous commit. I like being on the latest, and
instead of using the builtin Ruby 2.6.8 on macOS, I'd prefer to use the
much newer & sexier Ruby 3.1.2 available from `chruby`.
People may not want my name servers (`ns-gce.sslip.io` et al.), esp. in
an internetless environment where my name servers are unreachable.
This commit addresses this shortcoming by allowing the nameservers to be
set via a new command-line flag (`-nameservers`). We no longer hardcode
our name servers; instead, we make them the default value for the new
flag.
Drive-by: removed an errant `fmt.Println()` in the IPv6 `ip6.arpa` PTR
records.
Finding a free port to bind has always been a thorny problem,
particularly when running parallel tests: parallelism introduces a race
condition where two processes think the same port is free.
This commit improves the behavior by picking the port based on the
millisecond, and furthermore binds to that port for a millisecond to
make sure it's really, truly available.
It also allows the 50-millisecond sleep to be reduced to 1
millisecond.
I'm quite proud of this algorithm.
Confession: I have no idea why I didn't use the global variable `port`
instead of deciding to thread `port` as a parameter.
But for some reason I felt that it was a good idea. Oh well. Committing
these changes before they're lost.
Parallelizable tests (`ginkgo -r -p .`) were failing on my 20-core
(`-nodes=20`) Mac Studio. We narrowed this down to two causes:
1. The servers sometimes took longer than the hard-coded 3-second delay
to become ready to answer queries.
2. The blocklist was downloaded asynchronously, and sometimes wasn't
ready by the time the queries were run.
To address these, we did the following:
1. Rather than hard-code a 3-second delay, we modified the server to
signal that it's ready to answer queries (by printing "Ready to
answer queries" to the log). We now wait for that string to appear
before we begin testing the server. IMHO, this is a much better
solution than a hard-coded delay.
2. The initial download of the blocklist occurs synchronously, and
subsequent downloads, asynchronously.
Drive-bys:
- If the server can't bind to even one address, it exits.
- Refactored the blocklist code; the nested if-then-else blocks were too deep
Fixes:
```
Expected
<string>: 43.134.66.67
to match regular expression
<string>: \A52.0.56.137\n\z
In [It] at: /Users/cunnie/workspace/sslip.io/src/sslip.io-dns-server/integration_test.go:421
```
We'd like to parallelize the tests to lay the foundation for the
upcoming expansion of flags passed to the executable (e.g.
`-nameservers`). Testing those flags will spawn a series of
executables, each of which takes 3 seconds to spin up; running them
sequentially would make testing tiresome.
- We've migrated away from `serverSession.Err).Should(Say())`
to `serverSession.Err.Contents())).Should(MatchRegexp())`. `Say()`
depends on ordering, `MatchRegexp()` doesn't.
- We introduce a short, 50-millisecond `Sleep()` in `isPortFree()` to
eliminate a race condition introduced by parallelization where the
same port is returned twice.
- Some of our `DescribeTable` tests were order-dependent; we moved them
outside the table.
- We parallelize our pipeline's unit tests.
- For the `k-v.io` tests, we used different keys for each `It()` block
to avoid pollution. We are also more careful about waiting for the
setup to complete before running the actual test.
As a side-effect of parallelizing the tests, we no longer require `sudo`
on Linux to run the tests, for we no longer attempt to bind to port 53;
instead, we bind to a series of available unprivileged ports.
Previously our integration tests bound to port 53, and, if that failed,
fell back to binding to port 3553.
This commit introduces code to scan for an open port and uses that,
which lays the foundation for potentially parallelizing the integration
tests.
The massive 80+ line `Customizations` variable is a hard-coded
monstrosity, and I've fallen out of love with it.
I'd like the customizations to be passed in from the caller, in this
case, `main.go`.
To that end, I've created a `default.json`, which should contain all the
customizations with the exception of the key-value functionality, which
I don't have a good way to deal with just yet.
`[0-9]` → `\d`, `[0-9a-f]` → `[[:xdigit:]]`
A follow-on to the previous commit, which did the same for Golang.
Ruby supports the above matchers like Golang does:
<https://ruby-doc.org/core-3.1.2/Regexp.html>
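For reference, a small Go sketch of the two rewrites using the standard `regexp` package (the variable names and sample strings are made up for illustration). One caveat worth knowing: `[[:xdigit:]]` also matches uppercase `A`-`F`, so it is slightly more permissive than `[0-9a-f]`.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	oldDigits = regexp.MustCompile(`^[0-9]+$`)
	newDigits = regexp.MustCompile(`^\d+$`)
	oldHex    = regexp.MustCompile(`^[0-9a-f]+$`)
	newHex    = regexp.MustCompile(`^[[:xdigit:]]+$`)
)

func main() {
	// The digit classes agree on ASCII digits.
	fmt.Println(oldDigits.MatchString("1974"), newDigits.MatchString("1974")) // true true

	// The hex classes agree on lowercase hex...
	fmt.Println(oldHex.MatchString("69f0"), newHex.MatchString("69f0")) // true true

	// ...but [[:xdigit:]] also accepts uppercase hex digits.
	fmt.Println(oldHex.MatchString("69F0"), newHex.MatchString("69F0")) // false true
}
```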
Some of them are simple, e.g. `[0-9]` → `\d`, `[0-9a-f]` →
`[[:xdigit:]]`
Others I deliberately chose to ignore, such as the complaint that
`defer x.Close()` doesn't handle the error.
There are dogmatic users on the internet, such as
[Joe Shaw](https://www.joeshaw.org/dont-defer-close-on-writable-files/)
in his screed, who insist that all errors should be handled, and who
provide contorted & unnatural solutions that detract from the
readability of the program. I think they're wrong, at least for my
purposes: I don't care if the `Close()` errors.
The TXT response to the query `metrics.status.sslip.io` was doomed to
exceed the UDP 512-byte limit, which would have forced the client to
re-attempt via TCP, and our server doesn't yet bind to TCP.
This commit fixes that by squeezing the packet. We haven't dropped any
information, but we made it more succinct.
Per [Infoblox](https://www.infoblox.com/dns-security-resource-center/dns-security-faq/is-dns-tcp-or-udp-port-53/):
> when the message size exceeds 512 bytes, it will trigger the ‘TC’ bit
(Truncation) in DNS to be set, informing the client that the message
length has exceeded the allowed size. In these situations, the client
needs to re-transmit over TCP
We implement PTR records for IPv6, for example:
2.a.b.b.4.0.2.9.a.e.e.6.e.c.4.1.0.f.9.6.0.0.1.0.6.4.6.0.1.0.6.2.ip6.arpa →
2601-646-100-69f0-14ce-6eea-9204-bba2.sslip.io.
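A sketch of the mapping in the example above, assuming a simple scheme: reverse the 32 nibbles, group them in fours, strip leading zeros from each group, and join the groups with dashes. The function name `ptrToHostname` is hypothetical, and the real server may format addresses differently (for example, by compressing runs of zeros), which this sketch doesn't attempt.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const hexDigits = "0123456789abcdef"

// ptrToHostname converts a 32-nibble "ip6.arpa" name into a dashed
// hostname under domain. (Hypothetical helper for illustration.)
func ptrToHostname(ptrName, domain string) (string, error) {
	name := strings.TrimSuffix(strings.ToLower(ptrName), ".")
	if !strings.HasSuffix(name, ".ip6.arpa") {
		return "", errors.New("not an ip6.arpa name")
	}
	nibbles := strings.Split(strings.TrimSuffix(name, ".ip6.arpa"), ".")
	if len(nibbles) != 32 {
		return "", errors.New("expected 32 nibbles")
	}
	groups := make([]string, 0, 8)
	var group strings.Builder
	// The PTR name lists nibbles least-significant first, so walk backwards.
	for i := len(nibbles) - 1; i >= 0; i-- {
		if len(nibbles[i]) != 1 || !strings.Contains(hexDigits, nibbles[i]) {
			return "", fmt.Errorf("invalid nibble %q", nibbles[i])
		}
		group.WriteString(nibbles[i])
		if group.Len() == 4 {
			trimmed := strings.TrimLeft(group.String(), "0")
			if trimmed == "" {
				trimmed = "0" // an all-zero group stays a single "0"
			}
			groups = append(groups, trimmed)
			group.Reset()
		}
	}
	return strings.Join(groups, "-") + "." + domain, nil
}

func main() {
	host, err := ptrToHostname("2.a.b.b.4.0.2.9.a.e.e.6.e.c.4.1.0.f.9.6.0.0.1.0.6.4.6.0.1.0.6.2.ip6.arpa", "sslip.io.")
	fmt.Println(host, err) // 2601-646-100-69f0-14ce-6eea-9204-bba2.sslip.io. <nil>
}
```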
We implement PTR records for IPv4.
When a PTR record is not found (e.g. "127.in-addr.arpa"), it returns the
SOA record, but, unlike other record lookups (e.g. "MX"), the SOA's
mname is locked to "sslip.io" because setting the mname to
"127.in-addr.arpa" doesn't make sense.
To be done:
- Implement IPv6
- Implement Metrics
- Update README
- Deploy new version
Note: the two biggest users are Cypriot IP addresses:
```
2 106.52.50.235 <- Tencent
1 223.71.46.114 <- China Mobile
157 31.153.14.207 <- Cypriot
110 62.228.164.123 <- Cypriot
4 73.189.219.4 <- My home IP
```
Prohibit setting DNS-01 challenge TXT record `_acme-challenge.k-v.io`
Although it may appear the TXT record can be set or deleted, it's
hardcoded to the string, "Please don't try to procure a k-v.io cert via
DNS-01 challenge". Setting a custom value was easier than writing a
special code path.
Special thanks to [Alan Liang](http://symb.olic.link/):
> ... one could easily add (and modify) a TXT record at
_acme-challenge.k-v.io, which I believe is used for verifying domain
ownership at various cert providers, so anyone could in theory obtain
valid SSL certs for k-v.io and *.k-v.io
I've chosen to add the website to GKE, not Hetzner, because I get fewer
strident abuse messages from GKE.
I'm dismayed that when I make a small change to the DNS, I need to go
through the laborious release process for it to take effect. Sigh. Maybe
that's something I'll fix another day.