Commit Graph

674 Commits

Author SHA1 Message Date
Brian Cunnie
369ac1140d Improve test parallelization w/ nodes > 8
Parallelizable tests (`ginkgo -r -p .`) were failing on my 20-core
(`-nodes=20`) Mac Studio. We narrowed this down to two causes:

1. The servers sometimes took longer than the hard-coded 3-second delay
   to become ready to answer queries.
2. The blocklist was downloaded asynchronously, and sometimes wasn't
   ready by the time the queries were run.

To address these, we did the following:

1. Rather than hard-code a 3-second delay, we modified the server to
   signal that it's ready to answer queries (by printing "Ready to
   answer queries" to the log). We now wait for that string to appear
   before we begin testing the server. IMHO, this is a much better
   solution than a hard-coded delay.
2. The initial download of the blocklist occurs synchronously, and
   subsequent downloads, asynchronously.

Drive-bys:
- If the server can't bind to even one address, it exits.
- Refactored the blocklist code; the nested if-then-else statements were too deep.

Fixes:
```
  Expected
      <string>: 43.134.66.67

  to match regular expression
      <string>: \A52.0.56.137\n\z
  In [It] at: /Users/cunnie/workspace/sslip.io/src/sslip.io-dns-server/integration_test.go:421
```
2022-08-07 07:51:53 -07:00
Brian Cunnie
56924923d3 Ginkgo tests are parallelizable (ginkgo -p)
We'd like to parallelize the tests to lay the foundation for the
upcoming expansion of flags passed to the executable (e.g.
`-nameservers`), which will spawn a series of executables, each of which
takes 3 seconds to spin up, and running that sequentially would make
testing tiresome.

- We've migrated away from `serverSession.Err).Should(Say())`
  to `serverSession.Err.Contents())).Should(MatchRegexp())`. `Say()`
  depends on ordering, `MatchRegexp()` doesn't.
- We introduce a short, 50-millisecond `Sleep()` in `isPortFree()` to
  eliminate a race condition introduced by parallelization where the
  same port is returned twice.
- Some of our `DescribeTable` tests were order-dependent; we moved them
  outside the table.
- We parallelize our pipeline's unit tests.
- For the `k-v.io` tests, we used different keys for each `It()` block
  to avoid pollution. We are also more careful about waiting for the
  setup to complete before running the actual test.

As a side-effect of parallelizing the tests, we no longer require `sudo`
on Linux to run the tests, for we no longer attempt to bind to port 53;
instead, we bind to a series of available unprivileged ports.
2022-08-04 09:11:06 -07:00
Brian Cunnie
ee02e0badc Integration tests: scan for open port
Previously our integration tests bound to port 53, and, if that failed,
fell back to binding to port 3553.

This commit introduces code to scan for an open port and uses that,
which lays the foundation for potentially parallelizing the integration
tests.
2022-08-03 16:31:24 -07:00
Brian Cunnie
25271bf612 Golang Linter: comments atop exported elements
According to [Comment Sentences at
golang.org](https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences),
the convention is to begin a comment with the name of the exported
element.
2022-07-24 17:42:03 -04:00
Brian Cunnie
583ab609ea Laying the groundwork for passed-in configuration
The massive 80+ line `Customizations` variable is a hard-coded
monstrosity, and I've fallen out of love with it.

I'd like the customizations to be passed in from the caller, in this
case, `main.go`.

To that end, I've created a `default.json`, which should contain all the
customizations with the exception of the key-value functionality, which
I don't have a good way to deal with just yet.
2022-07-23 12:50:55 -04:00
Brian Cunnie
6363636c21 Hygiene: Ruby: Use shorter regexps
`[0-9]` → `\d`, `[0-9a-f]` → `[[:xdigit:]]`

A follow on to the previous commit, which did the same for Golang.

Ruby supports the above matchers, as Golang does:
<https://ruby-doc.org/core-3.1.2/Regexp.html>
2022-07-22 12:47:14 -04:00
Brian Cunnie
7fe64f75ab Hygiene: Address Golang inspection warnings
Some of them are simple, e.g. `[0-9]` → `\d`, `[0-9a-f]` →
`[[:xdigit:]]`
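
A quick sketch of the two substitutions using Go's `regexp` package; the variable names are illustrative. One caveat worth knowing: `[[:xdigit:]]` also matches uppercase `A-F`, so it is slightly broader than `[0-9a-f]`.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	longDigits  = regexp.MustCompile(`^[0-9]+$`)
	shortDigits = regexp.MustCompile(`^\d+$`) // \d is ASCII-only in Go
	longHex     = regexp.MustCompile(`^[0-9a-f]+$`)
	shortHex    = regexp.MustCompile(`^[[:xdigit:]]+$`) // also accepts A-F
)

func main() {
	for _, s := range []string{"2022", "bba2", "BBA2", "xyz"} {
		fmt.Printf("%-5s [0-9]=%-5t \\d=%-5t [0-9a-f]=%-5t [[:xdigit:]]=%t\n",
			s,
			longDigits.MatchString(s), shortDigits.MatchString(s),
			longHex.MatchString(s), shortHex.MatchString(s))
	}
}
```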

Others I deliberately chose to ignore, like `defer x.Close()` doesn't
handle the error.

There are dogmatic users on the internet, such as [Joe
Shaw](https://www.joeshaw.org/dont-defer-close-on-writable-files/) in his
screed, who insist that all errors should be handled, and provide
contorted & unnatural solutions that detract from the readability of the
program. I think they're wrong, at least for my purposes: I don't care
if the `Close()` errors.
2022-07-20 08:52:16 -07:00
Brian Cunnie
3e83a104cd Warning: our nameservers don't replace 8.8.8.8
Some people may think that these are public recursive name servers;
they're not. We warn them.

Drive-by: "nameserver" → "name server"
2022-07-17 18:51:53 -07:00
Brian Cunnie
22613bac91 Updated README's description of metrics 2022-07-15 21:46:43 -07:00
Brian Cunnie
f598bb52c7 Version 2.6.0: PTR records for IPv4 & IPv6 2.6.0 2022-07-14 18:34:40 -07:00
Brian Cunnie
314ce692f2 Update SOA to Bastille Day (7/14)
I love Bastille Day. And I love bumping the SOA right before a new
release.
2022-07-14 09:06:35 -07:00
Brian Cunnie
61f56fea14 Compress TXT metrics.status.sslip.io: 508 → 431
The TXT response to the query `metrics.status.sslip.io` was doomed to
exceed the UDP 512-byte limit, which would have forced the client to
re-attempt via TCP, and our server doesn't yet bind to TCP.

This commit fixes that by squeezing the packet. We haven't dropped any
information, but we made it more succinct.
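
For a rough sense of the arithmetic, here is a sketch that estimates the wire size of a TXT response. It assumes one TXT record per string, each string at most 255 bytes, a 2-byte compression pointer for every answer name, and no EDNS, authority, or additional records. The function names and the sample metrics strings are hypothetical, not taken from the server.

```go
package main

import (
	"fmt"
	"strings"
)

// maxUDPPayload is the classic DNS-over-UDP message limit.
const maxUDPPayload = 512

// encodedNameLen returns the uncompressed wire length of a domain name:
// one length byte per label, the label bytes, and the terminating root byte.
func encodedNameLen(name string) int {
	name = strings.TrimSuffix(name, ".")
	if name == "" {
		return 1
	}
	n := 1 // root label
	for _, label := range strings.Split(name, ".") {
		n += 1 + len(label)
	}
	return n
}

// txtResponseSize estimates the size of a response carrying one TXT
// record per string in txts (see the assumptions in the lead-in).
func txtResponseSize(qname string, txts []string) int {
	size := 12                            // fixed DNS header
	size += encodedNameLen(qname) + 2 + 2 // question: name, type, class
	for _, s := range txts {
		size += 2 + 2 + 2 + 4 + 2 // name pointer, type, class, TTL, rdlength
		size += 1 + len(s)        // length-prefixed character-string
	}
	return size
}

func main() {
	// Hypothetical metrics lines, for illustration only.
	txts := []string{"Uptime: 1000", "Queries: 42 (0.0/s)"}
	size := txtResponseSize("metrics.status.sslip.io.", txts)
	fmt.Printf("estimated %d bytes, fits in UDP: %t\n", size, size <= maxUDPPayload)
}
```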

Per [Infoblox](https://www.infoblox.com/dns-security-resource-center/dns-security-faq/is-dns-tcp-or-udp-port-53/):

> when the message size exceeds 512 bytes, it will trigger the ‘TC’ bit
(Truncation) in DNS to be set, informing the client that the message
length has exceeded the allowed size. In these situations, the client
needs to re-transmit over TCP
2022-07-14 08:57:54 -07:00
Brian Cunnie
0be2cabb08 Implement long-overdue metrics for k-v.io
Watch out: the msg size is 504 bytes, so we need to compact the metrics
before rolling this into production.
2022-07-13 13:35:59 -07:00
Brian Cunnie
d9e9f37f18 Fix typo: AnsweredXTVersionQueries → AnsweredTXTVersionQueries
It was bothering me.
2022-07-12 08:41:42 -07:00
Brian Cunnie
57976fcfb5 PTR for IPv4 is hyphen-, not dot-, separated
I prefer "192-168-0-1.sslip.io" over "192.168.0.1.sslip.io". It's
marginally faster, and it follows the convention set for IPv6 addrs.
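
A minimal sketch of the hyphen-separated mapping, assuming the PTR target is built straight from the reversed octets and returned as a fully qualified name. `ptrTargetIPv4` is a hypothetical name, not the server's function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ptrTargetIPv4 maps "1.0.168.192.in-addr.arpa." to "192-168-0-1.sslip.io.".
// It returns false for names that aren't a full four-octet reverse name.
func ptrTargetIPv4(qname string) (string, bool) {
	name := strings.TrimSuffix(strings.ToLower(qname), ".")
	const suffix = ".in-addr.arpa"
	if !strings.HasSuffix(name, suffix) {
		return "", false
	}
	octets := strings.Split(strings.TrimSuffix(name, suffix), ".")
	if len(octets) != 4 {
		return "", false
	}
	for _, o := range octets {
		n, err := strconv.Atoi(o)
		// Reject out-of-range values and non-canonical forms such as "01".
		if err != nil || n < 0 || n > 255 || strconv.Itoa(n) != o {
			return "", false
		}
	}
	// The octets arrive in reverse order; emit them most-significant first.
	return fmt.Sprintf("%s-%s-%s-%s.sslip.io.",
		octets[3], octets[2], octets[1], octets[0]), true
}

func main() {
	for _, q := range []string{"1.0.168.192.in-addr.arpa.", "127.in-addr.arpa."} {
		target, ok := ptrTargetIPv4(q)
		fmt.Printf("%-28s ok=%-5t target=%q\n", q, ok, target)
	}
}
```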
2022-07-12 06:30:01 -07:00
Brian Cunnie
9454203f16 PTR records now have metrics
...both for IPv4 and IPv6.
2022-07-12 06:25:54 -07:00
Brian Cunnie
dc53bbccc8 IPv6 PTR (ip6.arpa)
We implement PTR records for IPv6, for example:

2.a.b.b.4.0.2.9.a.e.e.6.e.c.4.1.0.f.9.6.0.0.1.0.6.4.6.0.1.0.6.2.ip6.arpa →
2601-646-100-69f0-14ce-6eea-9204-bba2.sslip.io.
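
The mapping above can be sketched as follows: reverse the 32 nibbles, decode them into a 16-byte address, and swap colons for hyphens. `ptrTargetIPv6` is a hypothetical name. Relying on `netip.Addr.String()` for formatting is an assumption; it compresses zero runs to `::`, which becomes `--` here and may not match what the server actually emits.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/netip"
	"strings"
)

// ptrTargetIPv6 converts a 32-nibble ip6.arpa name into a hyphenated
// sslip.io hostname.
func ptrTargetIPv6(qname string) (string, error) {
	name := strings.TrimSuffix(strings.ToLower(qname), ".")
	const suffix = ".ip6.arpa"
	if !strings.HasSuffix(name, suffix) {
		return "", fmt.Errorf("%q is not under ip6.arpa", qname)
	}
	nibbles := strings.Split(strings.TrimSuffix(name, suffix), ".")
	if len(nibbles) != 32 {
		return "", fmt.Errorf("want 32 nibbles, got %d", len(nibbles))
	}
	// ip6.arpa lists nibbles least-significant first, so walk backwards.
	var sb strings.Builder
	for i := len(nibbles) - 1; i >= 0; i-- {
		if len(nibbles[i]) != 1 {
			return "", fmt.Errorf("label %q is not a single nibble", nibbles[i])
		}
		sb.WriteString(nibbles[i])
	}
	raw, err := hex.DecodeString(sb.String())
	if err != nil {
		return "", err
	}
	var b [16]byte
	copy(b[:], raw)
	addr := netip.AddrFrom16(b)
	return strings.ReplaceAll(addr.String(), ":", "-") + ".sslip.io.", nil
}

func main() {
	q := "2.a.b.b.4.0.2.9.a.e.e.6.e.c.4.1.0.f.9.6.0.0.1.0.6.4.6.0.1.0.6.2.ip6.arpa"
	target, err := ptrTargetIPv6(q)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target)
}
```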
2022-07-11 20:57:55 -07:00
Brian Cunnie
db763e071c PTR: 1.0.0.127.in-addr.arpa → 127.0.0.1.sslip.io
We implement PTR records for IPv4.

When a PTR record is not found (e.g. "127.in-addr.arpa"), it returns the
SOA record, but, unlike other record lookups (e.g. "MX"), the SOA's
mname is locked to "sslip.io" because setting the mname to
"127.in-addr.arpa" doesn't make sense.

To be done:
- Implement IPv6
- Implement Metrics
- Update README
- Deploy new version
2022-07-10 08:08:58 -07:00
Brian Cunnie
110e214cd6 Bump Go dependencies
```shell
go get -u -t
```
2022-06-25 11:47:51 -07:00
Brian Cunnie
359cf6b7df Docs for self: who's using ip.sslip.io?
Note: the two biggest users are Cypriot IP addresses:

```
      2 106.52.50.235  <- Tencent
      1 223.71.46.114  <- China Mobile
    157 31.153.14.207  <- Cypriot
    110 62.228.164.123 <- Cypriot
      4 73.189.219.4   <- My home IP
```
2022-05-18 11:44:56 -07:00
Brian Cunnie
50d843a16a Version 2.5.4: .acme_challenge.k-v.io isn't settable 2.5.4 2022-04-30 16:42:35 -07:00
Brian Cunnie
30c72cc5d4 etcd README: use API 3 2022-04-30 16:26:18 -07:00
Brian Cunnie
623ecc4390 Docs: Update install scripts when bumping version 2022-04-28 05:27:31 -07:00
Brian Cunnie
490f0fcd35 etcd instructions: rebuilding a node 2022-04-27 17:40:52 -07:00
Brian Cunnie
03972dc565 Ensure _acme-challenge can't be set on k-v.io subdomains
The integration tests confirm that a user can't set the TXT record of,
say, `_acme-challenge.random-subdomain.k-v.io`
2022-04-27 16:41:51 -07:00
Brian Cunnie
3e98b9215e Bump SOA Serial → 2022042500
For some reason I like to keep the serial updated. Really.
2022-04-25 19:36:13 -07:00
Brian Cunnie
036f70f633 Bump Go dependencies
```shell
go get -u -t
```
2022-04-25 19:34:44 -07:00
Brian Cunnie
b7d8c4d16b k-v.io: protect against scammers seeking wildcards
Prohibit setting DNS-01 challenge TXT record `_acme-challenge.k-v.io`

Although it may appear the TXT record can be set or deleted, it's
hardcoded to the string, "Please don't try to procure a k-v.io cert via
DNS-01 challenge". Setting a custom value was easier than writing a
special code path.
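
One way to picture this: a table of fixed answers that is consulted before the key-value store, so whatever a user writes under that name is never served. The sketch below is illustrative only; `customTXT`, `kvStore`, and `lookupTXT` are hypothetical names, and the real server's data structures may differ.

```go
package main

import (
	"fmt"
	"strings"
)

const acmeNotice = "Please don't try to procure a k-v.io cert via DNS-01 challenge"

// customTXT holds fixed answers that take precedence over user-set values.
var customTXT = map[string]string{
	"_acme-challenge.k-v.io.": acmeNotice,
}

// kvStore stands in for the user-writable key-value store.
type kvStore map[string]string

// lookupTXT returns the fixed answer if one exists for fqdn; otherwise it
// falls back to the key-value store.
func lookupTXT(store kvStore, fqdn string) (string, bool) {
	fqdn = strings.ToLower(fqdn)
	if fixed, ok := customTXT[fqdn]; ok {
		return fixed, true
	}
	value, ok := store[fqdn]
	return value, ok
}

func main() {
	store := kvStore{
		"_acme-challenge.k-v.io.": "attacker-supplied-token", // never served
		"my-key.k-v.io.":          "hello",
	}
	for _, name := range []string{"_acme-challenge.k-v.io.", "my-key.k-v.io."} {
		txt, _ := lookupTXT(store, name)
		fmt.Printf("%s TXT %q\n", name, txt)
	}
}
```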

Special thanks to [Alan Liang](http://symb.olic.link/):

> ... one could easily add (and modify) a TXT record at
_acme-challenge.k-v.io, which I believe is used for verifying domain
ownership at various cert providers, so anyone could in theory obtain
valid SSL certs for k-v.io and *.k-v.io
2022-04-25 19:29:31 -07:00
Brian Cunnie
6dadfd6b5b k-v.io: update the website's HTML 2022-04-23 12:01:29 -07:00
Brian Cunnie
48d6514e82 k-v.io website: delete lingering copy-paste artifact 2022-04-22 20:58:10 -07:00
Brian Cunnie
14df91e967 k-v.io website has useful information
It's a beginning, but I really wanted to get this done. I can polish it
later.
2022-04-22 20:46:19 -07:00
Brian Cunnie
9c99b954be Update release docs to 2.5.3
And revert the gratuitous change I made earlier to trigger a build.
2022-04-22 16:36:06 -07:00
Brian Cunnie
4d339cd861 Version 2.5.3: k-v.io is operational 2.5.3 2022-04-22 14:33:47 -07:00
Brian Cunnie
602ba32c7b k-v.io has an A record, a pre-requisite for a website
I've chosen to add the website to GKE, not Hetzner, because I get fewer
strident abuse messages from GKE.

I'm dismayed that when I make a small change to the DNS, I need to go
through the laborious release process for it to take effect. Sigh. Maybe
that's something I'll fix another day.
2022-04-22 13:09:58 -07:00
Brian Cunnie
cb08c5a9c3 k-v.io: has HTML assets, nginx Dockerfile to serve
We now have a Dockerfile to serve the upcoming https://k-v.io.

The Dockerfile is patterned after the sslip.io nginx Dockerfile.

Note: the content isn't ready; the HTML needs fleshing out.
2022-04-22 12:25:29 -07:00
Brian Cunnie
d76e592500 🐞 Build sslip.io's nginx Dockerfile correctly
Also includes a gratuitous change to the HTML in order to trigger a
build.

Fixes <https://ci.nono.io/teams/main/pipelines/dockerfiles/jobs/build-and-push-sslip.io-nginx/builds/33>:
```
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c dnf install -y     bind-utils     iproute     less     lsof     neovim     net-tools     nginx     nmap-ncat     procps-ng RUN mv /usr/share/nginx/html /usr/share/nginx/html-orig]: exit code: 1
```
2022-04-22 09:04:41 -07:00
Brian Cunnie
81b0d4d739 Dockerfile nginx name explicitly states service
In this case, we rename the nginx Dockerfile to
`Dockerfile-sslip.io-nginx` to make room for the upcoming
`Dockerfile-k-v.io-nginx`
2022-04-22 08:39:31 -07:00
Brian Cunnie
1fc970a87e Dockerfiles: Replace deprecated "maintainer" label
Also, do the `dnf install` in one step, not in three.
2022-04-22 08:11:21 -07:00
Brian Cunnie
8d55c534fc Make way for k-v.io HTML website
To make room for the k-v.io HTML website, we rename the `document_root`
of the sslip.io website to the more explicit `document_root_sslip.io`.
2022-04-22 07:59:10 -07:00
Brian Cunnie
19668fac7f CI: rigorously test k-v.io
We make sure that each of the three nameservers
(ns-{aws,azure,gce}.sslip.io) can set a key-value, that the value
propagates to the remaining nameservers, that a nameserver can delete a
key, and that the deletion propagates to the remaining nameservers.
2022-04-20 16:48:50 -07:00
Brian Cunnie
28ae9b4348 k8s: use the etcd cluster IP for queries
Note: this didn't work. So sad.
2022-04-17 20:17:24 -07:00
Brian Cunnie
3f3f0ee78a 🐞 TLS for etcd: add GKE Node IPs
ns-gce is unable to join the cluster because its source IP address is
that of the node on which it's running, 34.72.45.206, and that's not included in
the SANs.

This commit updates the etcd certificate to one which includes the three
GKE nodes' IP addresses in its SANs.

This commit also includes instructions to update the certificates in the
event of an IP address change.

Fixes:
```
Apr 16 14:15:34 ns-aws etcd[500]: rejected connection from "34.72.45.206:43080" (error "tls: \"34.72.45.206\" does not match any of DNSNames [\"ns-aws.sslip.io\" \"ns-azure.sslip.io\" \"ns-gce.sslip.io\" \"ns-aws\" \"ns-azure\" \"ns-gce\"] (lookup ns-gce: Temporary failure in name resolution)", ServerName "ns-aws.sslip.io", IPAddresses ["127.0.0.1" "52.0.56.137" "52.187.42.158" "104.155.144.4" "::1" "2600:1f18:aaf:6900::a"], DNSNames ["ns-aws.sslip.io" "ns-azure.sslip.io" "ns-gce.sslip.io" "ns-aws" "ns-azure" "ns-gce"])
```
2022-04-17 17:08:00 -07:00
Brian Cunnie
b33e1f6b37 Docs: tweaks to releasing new versions
This is the end of a series of changes which allowed my
[dns-servers](https://ci.nono.io/teams/main/pipelines/sslip.io) job to
finally go green
2022-04-13 20:26:30 -07:00
Brian Cunnie
294f54a79a Version 2.5.2: DELETE on k-v.io returns no TXT records
The original behavior was to return the deleted record, which
inadvertently prolonged the lifetime (in DNS cache) of the record which
was meant to expire as soon as possible.

- Removed the instructions to create a BOSH release. We are no longer
  creating a BOSH release because we needed to colocate an etcd release
  alongside the BOSH release, and we couldn't find an etcd BOSH release.
- Updated the instructions to run a quick test against the sslip.io DNS
  server locally (sanity check) instead of deploying a VM with the BOSH
  release & testing against that.
- Updated the instructions for updating ns-azure's DNS server. ns-azure
  is no longer a BOSH-deployed VM.
2.5.2
2022-04-13 12:55:34 -07:00
Brian Cunnie
2a0e6b105d Health checks conform to new key-value delete behavior
When we check the production servers, we now expect, when we delete a
key, to NOT receive the key's old value as a response, lest we
inadvertently extend the lifetime of the key that we want to expire.
2022-04-13 08:35:21 -07:00
Brian Cunnie
033cf481d7 k-v.io: on DELETE, don't return the deleted value
We don't return the deleted value because doing that would have the
unintended consequence of postponing the deletion: downstream caching
servers would cache the deleted value for up to three more minutes. We'd
rather have the key deleted sooner rather than later.

Some APIs, e.g. etcd's, return a list of deleted values: those
APIs can afford to do so because they don't need to worry about DNS
propagation.

We also lengthen the timeout of an `etcd` API call from 500 msec to 1928
msec; 500 msec was too tight: some calls routinely took 480 msec to
complete, and we wanted more headroom.

We also no longer do two `etcd` operations when we delete a value.
Previously we would do a GET followed by a DELETE, but since we're not
returning the value deleted, there's no point to the GET. Furthermore,
the GET was never necessary, for the `etcd` DELETE API call returned the
values deleted.

Drive-by:
- README: install ginkgo the proper way, with `go install`

[fixes #17]
2022-04-12 09:17:38 -07:00
Brian Cunnie
4d6b4375a3 src/ is in the repo's root
Now that we're no longer creating BOSH releases, we don't need to bury the
`src/` directory under `bosh-release`; we can now place it under the
repo root, and we no longer need to fiddle with symbolic links.

We're not creating BOSH releases because when we decided to implement a
key-value store, we'd have to create an `etcd` BOSH release, and we
didn't want to invest the time.
2022-04-10 07:48:51 -07:00
Brian Cunnie
8483e1eb1e etcd README has troubleshooting section 2022-04-10 07:40:34 -07:00
Brian Cunnie
f4863813bb ns-aws & ns-azure have consistent etcd configs
Now that both ns-aws & ns-azure are on Ubuntu Impish (previously ns-aws
was on Fedora), we can make the configuration files consistent.
2022-04-09 18:54:18 -07:00
Brian Cunnie
3de0ccc431 README: minor corrections 2022-04-09 08:59:30 -07:00