We made a mistake: we blindly invoked a function that was sometimes
`nil`. Specifically, if we had a customized domain (e.g. `ns.sslip.io`)
that didn't have a TXT record (a function), we'd try to invoke it
anyway. Bad move.
Now we ensure the function is there before we try to invoke it.
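
A minimal sketch of that guard, for illustration only: the `customization` type, its `TXT` field, and `txtFor()` are hypothetical stand-ins, not the names used in the actual code.

```go
package main

import "fmt"

// customization is a hypothetical stand-in for a per-domain override.
// TXT is a function and stays nil when the domain has no TXT record.
type customization struct {
	TXT func() []string
}

// txtFor returns the TXT strings for a customized domain, or nil when the
// domain is unknown or has no TXT function to invoke.
func txtFor(customizations map[string]customization, fqdn string) []string {
	c, ok := customizations[fqdn]
	if !ok || c.TXT == nil {
		return nil // nothing to invoke; calling c.TXT() here would panic
	}
	return c.TXT()
}

func main() {
	customizations := map[string]customization{
		"ns.sslip.io.":             {}, // customized domain without a TXT function
		"version.status.sslip.io.": {TXT: func() []string { return []string{"dev"} }},
	}
	fmt.Println(txtFor(customizations, "ns.sslip.io."))
	fmt.Println(txtFor(customizations, "version.status.sslip.io."))
}
```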
This is a curious affirmation of installing metrics: if we hadn't seen
that the server had been restarted because uptime was too low, we
wouldn't have caught this bug.
Drive-by: we made the lengths of TXT records of `version.status.sslip.io`
exactly match what we replace them with during the linking phase. We
hope that this fixes the wrong-line-numbers we see in the `panic()`
messages.
[fixes #14]
Also, I moved the "version" endpoint: `version.sslip.io` →
`version.status.sslip.io`. It seemed to make more sense to corral the
special endpoints under `status`.
Now we check first to see if etcd is running before diving in & testing
against it.
fixes:
```
Unexpected error:
<*fmt.wrapError | 0xc0003bc8e0>: {
msg: "couldn't GET \"my-key\": context deadline exceeded",
err: <context.deadlineExceededError>{},
}
```
I want the key-value changes to propagate faster, so I dropped the TTL.
Now it'll take 90 seconds on average for a stale value to
clear out, 3 minutes max.
I don't think this will affect traffic much; I suspect that there's not
much negative-caching going on (based on the queries I see), and the
lion's share of queries (I believe over 3/4 of my traffic) are for things I
don't have a record for. And I average ~5
queries-with-a-record-returned per second on ns-aws, which means that'll
bump to a little more than 8 queries a second.
- The metrics aren't fleshed out. In fact, there are only two so far:
  1. uptime
  2. number of queries
- Even though the metrics aren't complete, I'm checking it in because
this commit is already much too big.
- I moved the version information to `version.status.sslip.io`;
previously it was at `version.sslip.io`. I didn't want one endpoint
for both metrics & version (worry: DNS amplification), and I wanted a
consistent subdomain to find that information (i.e.
`status.sslip.io`).
- I'm not worried about atomic updates to the metrics; if a metric is
off by one, if I skip a count because two lookups are happening at the
exact same time, I don't care.
- The `Metrics` struct is a pointer within `Xip` because I might have
several copies of `Xip` (if I'm binding to several interfaces
individually), but I must only have one copy of `Metrics`.
- I only include the metrics I'm interested in, usually because it took
some work to implement that feature. I don't care about MX records,
but I care about IPv6 lookups, DNS-01 challenges, public IP lookups.
- I got rid of a section of unreachable code at the end of
`ProcessQuestion()`; I was tired of Goland flagging it. I had it there
mostly because I was paranoid about falling through a `switch` statement.
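
The shared-pointer arrangement described above can be sketched as follows. The field names (`BindAddr`, `Queries`) and the `countQuery()` method are illustrative assumptions, not the real struct layout.

```go
package main

import "fmt"

// Metrics holds the counters; the field name is illustrative.
type Metrics struct {
	Queries int
}

// Xip may be copied once per bound interface, but every copy points at the
// same Metrics, so there is only ever one set of counters.
type Xip struct {
	BindAddr string // hypothetical field
	Metrics  *Metrics
}

// countQuery increments the shared counter without any locking; two
// simultaneous calls may lose a count, which is accepted here.
func (x Xip) countQuery() {
	x.Metrics.Queries++
}

func main() {
	shared := &Metrics{}
	a := Xip{BindAddr: "192.0.2.1:53", Metrics: shared}
	b := Xip{BindAddr: "[2001:db8::1]:53", Metrics: shared}
	a.countQuery()
	b.countQuery()
	fmt.Println(shared.Queries)
}
```

If exact counts ever mattered, `sync/atomic` would be the usual fix; the sketch deliberately skips it to mirror the trade-off above.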
We use a simple map as a key-value store when there's no `etcd`.
We test both: when there's etcd, and when there isn't.
I'm inordinately pleased at my intuitive understanding of
`DescribeTable()`, which allowed me to define a function and use it in the
two places it's tested (with `etcd` and without). That is much better
than having duplicate copies of the same test (requiring twice the
maintenance when updating tests).
TODO: Don't run `etcd` unit tests when there's no `etcd`
We now use an interface for `Xip.Etcd` (`V3client` not
`*v3client.Client`). We've also used
[counterfeiter](https://github.com/maxbrunsfeld/counterfeiter) to create
a fake to use in testing.
Drive-by: we updated dependencies: `go get -u && go mod tidy`
The key-value portion is about to get a lot more complicated, and as a
prelude I've moved the three verbs into their own functions.
I plan to do the following:
- use an interface for `Xip.Etcd` (not `*v3client.Client`)
- use counterfeiter to create fakes for above interface
- write vanilla Golang, non-Ginkgo test for getKv, putKv, deleteKv
- add local datastructure (non-etcd) to hold kv if no etcd
We need to determine whether we have `etcd` or use a local key-value
store instead.
This is for people who want to use this DNS server but don't want to
install `etcd`. Most people probably don't care for the key-value
portion anyway.
TODO: modify xip.go to accommodate lack of etcd. Test coverage, too.
Ginkgo v2.0.0 is hot off the press, released yesterday. Let's upgrade!
- `extensions/table` no longer needs to be separately imported
- `BeforeSuite()` must be outermost
fixes:
```
It looks like you are trying to add a [BeforeSuite] node within a container
```
```
imported and not used: "github.com/onsi/ginkgo/v2/extensions/table"
```
```
Entry redeclared during import "github.com/onsi/ginkgo/extensions/table"
```
Previously `etcd` wasn't running, causing the integration tests to fail
because they require `etcd`.
We now run `etcd`.
In the future I plan to add the ability to not require `etcd`, to use a
local table of key-value pairs, but I don't plan to test that option in
CI. It'll be for the very few users who use the sslip.io code but not
the service.
fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/23>:
```
{"level":"warn","ts":"2021-12-31T01:34:28.089Z","logger":"etcd-client","caller":"v3@v3.5.1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ef340/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
2021/12/31 01:34:28 couldn't GET "dmy-key": context deadline exceeded
```
We include an invisible "d" in our keys, but we don't want to leak them
to the user (it'll only serve to confuse), so we fix our error messages
to not display them. This code doesn't have coverage, but we don't feel
it's worth the contortions to cover it.
fixes
<https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/23>
(should have been "my-key" not "dmy-key"):
```
2021/12/31 17:36:27 couldn't GET "dmy-key": context deadline exceeded
```
...that we can customize for each of our three DNS servers.
Drive-bys:
- Bumped SOA serial 2021080200 → 2021123100. There's something poetic
about it being the last day of the year.
- Deleted the old PowerDNS configuration. It's so stale there's no point
in having it. Or mentioning it in the README.
I didn't want a really long domain for the key-value store; I wanted a
short, easy-to-remember domain. And it cost $400 for ten years.
Many good domains (e.g. keyvalue.store, kv.io)
were taken, and some weren't easily registered (e.g. the Albanian
domain, keyv.al).
Browsing these domains that were never put into use is like strolling
along the Boulevard of Broken Dreams: high hopes dashed against the hard
rocks of reality.
Previously we maintained a local table of key-value pairs
(`TxtKvCustomizations`), but this had two drawbacks:
- no persistence: when the server is restarted, all key-value pairs are
wiped.
- no consistency: the key-value pairs on one server are completely
orthogonal to the key-value pairs on another.
By using `etcd` to store our KV pairs, we fix both those problems.
The addition of etcd was enough to inspire me to make a struct (`Xip`)
to hold the important information (source addr, etcd client). That way I
don't have to plumb that information through the hierarchy of function
calls.
Drive-by: fixed a bug in the random-IPv6-address-generator that would,
once in a great while, generate an IPv4 address.
When we implement the key-value store, we want new values to propagate
in a reasonable amount of time. Based on no scientific evidence
whatsoever, based solely on "gut feel", I came up with three minutes
(180 seconds).
The previous value was one week. I can't imagine anyone in their right
mind waiting a full week for their key-value to propagate.
I was uneasy: functions were returning values and mutating arguments
(specifically `response *Response`)--I was mixing meat with dairy, and
the result wasn't kosher.
Now I only return values, and don't mutate.
According to canonical [Go Code Review
Comments](https://github.com/golang/go/wiki/CodeReviewComments#pass-values):
> Don't pass pointers as function arguments just to save a few bytes. If
a function refers to its argument x only as *x throughout, then the
argument shouldn't be a pointer. Common instances of this include
passing a pointer to a string (*string) or a pointer to an interface
value (*io.Reader). In both cases the value itself is a fixed size and
can be passed directly. This advice does not apply to large structs, or
even small structs that might grow.
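
To illustrate the return-only style, here is a simplified sketch. `Response` is a stand-in with a single hypothetical field, and `withAnswer()` is not a function from the real code.

```go
package main

import "fmt"

// Response is a simplified stand-in for the real DNS response type.
type Response struct {
	Answers []string
}

// withAnswer returns a new Response with answer appended. The argument is
// passed by value and its slice is copied, so the caller's data is untouched.
func withAnswer(r Response, answer string) Response {
	answers := make([]string, 0, len(r.Answers)+1)
	answers = append(answers, r.Answers...)
	answers = append(answers, answer)
	return Response{Answers: answers}
}

func main() {
	empty := Response{}
	filled := withAnswer(empty, "192.0.2.1")
	fmt.Println(len(empty.Answers), len(filled.Answers))
}
```

Copying the slice costs an allocation; that seems a fair price for functions whose only effect is their return value.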
We set the number of replicas to 1 so that when you create a key-value
on `ns-gce.sslip.io`, you're sure of retrieving that value later from
`ns-gce.sslip.io`.
Previously it could hit the other replica, which would have a different
key-value store, which would make the value "disappear".
We enable special behavior under the `kv.sslip.io` subdomain: it can be
treated as a key-value store, the sub-subdomain being the key, and the
TXT record being the value.
For example, to write ("put") the value "12.0.1" to the key
"macos-version" on the `ns-gce.sslip.io.` nameserver, you'd use the
following `dig` command:
```shell
dig @ns-gce.sslip.io. txt put.12.0.1.macos-version.kv.sslip.io.
```
To read ("get") the value back, you'd write the following `dig` command:
```shell
dig @ns-gce.sslip.io. txt get.macos-version.kv.sslip.io.
```
Since "get" is the default behavior, you don't need to include it in the
domain name:
```shell
dig @ns-gce.sslip.io. txt macos-version.kv.sslip.io.
```
Finally, when you're done with the key-value, you can "delete" it:
```shell
dig @ns-gce.sslip.io. txt delete.macos-version.kv.sslip.io.
```
Notes:
- Keys are case-insensitive (to accommodate DNS convention). In other
words, `KEY.kv.sslip.io` and `key.kv.sslip.io` return the same TXT
record.
- Values are case-sensitive. `put.CamelCase.style.kv.sslip.io` sets the
TXT record to "CamelCase".
- `put` requests will return the TXT record being put; e.g.
`put.hello.world.kv.sslip.io` returns one TXT record of one string,
`hello`.
- `delete` requests will return the TXT record being deleted; e.g.
`delete.world.kv.sslip.io` returns one TXT record of one string,
`hello`. If the TXT record does not exist, no TXT records will be
returned.
- Values are limited to 63 bytes to mitigate using the sslip.io servers
in a [DNS amplification
attack](https://us-cert.cisa.gov/ncas/alerts/TA13-088A).
- Values are not persistent: if the server is restarted, all values
disappear. Poof.
- Values are not consistent. If a value is set in `ns-aws.sslip.io`, it
does not propagate to `ns-gce.sslip.io` nor `ns-azure.sslip.io`.
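
The naming scheme above can be sketched as a small parser. This is an illustration under assumptions, not the server's actual code: `parseKv()` is a hypothetical function, and it doesn't enforce the 63-byte value limit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const kvSuffix = ".kv.sslip.io"

// parseKv splits a name under kv.sslip.io into a verb ("get", "put", or
// "delete"), a lowercased key, and, for "put", a case-preserved value.
func parseKv(fqdn string) (verb, key, value string, err error) {
	name := strings.TrimSuffix(fqdn, ".")
	if len(name) <= len(kvSuffix) ||
		!strings.EqualFold(name[len(name)-len(kvSuffix):], kvSuffix) {
		return "", "", "", errors.New("not under kv.sslip.io")
	}
	labels := strings.Split(name[:len(name)-len(kvSuffix)], ".")
	key = strings.ToLower(labels[len(labels)-1]) // keys are case-insensitive
	if len(labels) == 1 {
		return "get", key, "", nil // "get" is the default verb
	}
	verb = strings.ToLower(labels[0])
	switch {
	case verb == "put" && len(labels) >= 3:
		// Everything between the verb and the key is the value.
		return verb, key, strings.Join(labels[1:len(labels)-1], "."), nil
	case (verb == "get" || verb == "delete") && len(labels) == 2:
		return verb, key, "", nil
	}
	return "", "", "", errors.New("unrecognized key-value request")
}

func main() {
	for _, q := range []string{
		"put.12.0.1.macos-version.kv.sslip.io.",
		"macos-version.kv.sslip.io.",
		"delete.macos-version.kv.sslip.io.",
	} {
		verb, key, value, err := parseKv(q)
		fmt.Println(verb, key, value, err)
	}
}
```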
This pipeline's only purpose was to serve as an asset in a blog post that
I wrote a couple of years ago, and it is no longer necessary.
Also, and this sounds petty, but I didn't like the RED on my CI--I'd
like to see as much green as possible. Now my CI is green (with the
exception of the many-colored "badges" pipeline).
They have been replaced by the sslip.io nameservers. I had been meaning
to do this for a long time, and there's nothing like a Thanksgiving weekend
to get long-lingering tasks done.
The Docker images are now created automatically with our pipeline.
That's right: with 80 hours of work we saved 30 seconds of work! We are
nothing if not efficient.
We currently use three nameservers in the `nono.io` domain, but that's
confusing--why not have the nameservers in the `sslip.io` domain?
This commit starts the ball rolling to convert to the `sslip.io`
nameservers. We'll have a brief period where we have _both_ `nono.io` and
`sslip.io` nameservers, at which point we'll add the `sslip.io`
nameservers to our registrar, Namecheap.com.
Once they've been added to our registrar, we'll wait a day or two to
propagate, and then we'll delete references to the `nono.io`
nameservers.
...especially since I recently switched from `master` to `main` on
sslip.io's repo.
Also I got rid of the Concourse groups, which I don't like at all. And I
added some pretty icons to the resources.
fixes:
```
error: error unmarshaling JSON: while decoding JSON: malformed task step: json: cannot unmarshal bool into Go struct field TaskRunConfig.config.run.path of type string
```
Also, change the order of `dig` arguments so that the server being
queried is first (e.g. `@#{whois_nameserver}`) and the options (e.g.
`+short`) are last.