Commit Graph

689 Commits

Author SHA1 Message Date
Brian Cunnie
c61b81c29b Server tests: update for new endpoints
fixes:
<https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/dns-servers/builds/271>
2022-01-20 09:50:58 -08:00
Brian Cunnie
b496e68423 Website explains what each metric means
Drive-by: updated publishing docs.
2022-01-20 09:29:06 -08:00
Brian Cunnie
bbf1925be4 BOSH release: 2.4.1: fewer panics
customized records w/ non-existent TXTs don't panic().
2.4.1
2022-01-20 08:10:03 -08:00
Brian Cunnie
e215c4fda4 🐞 Don't panic() invoking a customized TXT
We made a mistake: we blindly invoked a function that was sometimes
`nil`. Specifically, if we had a customized domain (e.g. `ns.sslip.io`)
that didn't have a TXT record (a function), we'd try to invoke it
anyway. Bad move.

Now we ensure the function is there before we try to invoke it.
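
A minimal sketch of that guard, assuming customizations live in a map keyed by domain name and that the TXT handler is an optional function field. The `customization` type and `txtFor` helper are illustrative names, not taken from `xip.go`:

```go
package main

import "fmt"

// customization is a hypothetical stand-in for a per-domain override.
// TXT is optional: it is nil when the domain has no TXT record.
type customization struct {
	TXT func() []string
}

// txtFor returns the customized TXT strings for name, or nil when the
// domain is unknown or has no TXT function.
func txtFor(customizations map[string]customization, name string) []string {
	c, ok := customizations[name]
	if !ok || c.TXT == nil { // guard: never call a nil func
		return nil
	}
	return c.TXT()
}

func main() {
	customizations := map[string]customization{
		"ns.sslip.io.":             {}, // customized, but no TXT record
		"version.status.sslip.io.": {TXT: func() []string { return []string{"dev"} }},
	}
	fmt.Println(txtFor(customizations, "ns.sslip.io."))
	fmt.Println(txtFor(customizations, "version.status.sslip.io."))
}
```

Calling a nil function value panics in Go, so the `c.TXT == nil` check is the whole fix in this sketch.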

This is a curious affirmation of installing metrics: if we hadn't seen
that the server had been restarted because uptime was too low, we
wouldn't have caught this bug.

Drive-by: we made the lengths of TXT records of `version.status.sslip.io`
exactly match what we replace them with during the linking phase. We
hope that this fixes the wrong-line-numbers we see in the `panic()`
messages.

[fixes #14]
2022-01-20 07:47:48 -08:00
Brian Cunnie
b119442a37 BOSH release: 2.4.0: metrics.status.sslip.io returns metrics
Also, I moved the "version" endpoint: `version.sslip.io` →
`version.status.sslip.io`. It seemed to make more sense to corral the
special endpoints under `status`.
2.4.0
2022-01-20 05:02:21 -08:00
Brian Cunnie
c0196ed617 🐞 Don't run etcd tests without etcd
Now we check first to see if etcd is running before diving in & testing
against it.

fixes:
```
Unexpected error:
    <*fmt.wrapError | 0xc0003bc8e0>: {
        msg: "couldn't GET \"my-key\": context deadline exceeded",
        err: <context.deadlineExceededError>{},
    }
```
2022-01-20 04:39:32 -08:00
Brian Cunnie
c48ca88c4f 🐞 SIGSEGV when mistakenly trying to access etcd
When running on a BOSH-deployed Bionic stemcell, we panic when there's
no `etcd`.

Curiously, the failure on macOS is entirely different
(`context.deadlineExceededError`)

```
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x93d10e]

goroutine 34 [running]:
go.etcd.io/etcd/client/v3.(*Client).Put(0x0, {0xb4cb68, 0xc0002730e0}, {0xc0000eceac, 0xf9cf40}, {0xc0000ecea4, 0x0}, {0x0, 0x0, 0x0})
        <autogenerated>:1 +0x4e
xip/xip.Xip.putKv({{0xc0000ece48, 0x4, 0x4}, {0xb4ccb8, 0x0}, 0xc0000890e0}, {0xc0000eceac, 0x6}, {0xc0000ecea4,
 0x7})
        /var/vcap/data/compile/sslip.io-dns-server/src/xip/xip.go:880 +0x136
```
2022-01-19 18:12:31 -08:00
Brian Cunnie
7d551b81db Metrics endpoint is fully fleshed-out
- integration tests (woohoo!)
2022-01-19 11:58:07 -08:00
Brian Cunnie
8155b0821e Drop TTL 300 → 180 seconds
I want the key-value changes to propagate faster, so I dropped the TTL.
Now it'll take, on average, 90 seconds for a stale value to
clear out, 3 minutes max.

I don't think this will affect traffic much; I suspect that there's not
much negative-caching going on (based on the queries I see), and the
lion's share of queries (I believe over 3/4 of my traffic) are for
things I don't have a record for. I average ~5
queries-with-a-record-returned per second on ns-aws, which means that'll
bump to a little more than 8 queries a second.
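
The arithmetic behind those estimates, as a small sketch. It assumes cached answers are re-fetched once per TTL, so the rate scales by old TTL over new TTL, and that a stale value's remaining cache lifetime is uniformly distributed; `scaledRate` is an illustrative helper:

```go
package main

import "fmt"

// scaledRate estimates the query rate after a TTL change, assuming the
// rate is inversely proportional to the TTL.
func scaledRate(rate, oldTTL, newTTL float64) float64 {
	return rate * oldTTL / newTTL
}

func main() {
	const oldTTL, newTTL = 300.0, 180.0
	// Uniformly distributed remaining lifetime: mean is half the TTL.
	fmt.Printf("average staleness: %.0f s, worst case: %.0f s\n", newTTL/2, newTTL)
	fmt.Printf("estimated rate: %.2f queries/s\n", scaledRate(5, oldTTL, newTTL))
}
```

With 5 queries per second at a 300-second TTL, the estimate is 5 × 300 / 180 ≈ 8.33 queries per second.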
2022-01-19 07:15:15 -08:00
Brian Cunnie
bf4f039001 Metrics are served via metrics.status.sslip.io
- The metrics aren't fleshed out. In fact, there's only two so far:
  1. uptime
  2. number of queries
- Even though the metrics aren't complete, I'm checking it in because
  this commit is already much too big.
- I moved the version information to `version.status.sslip.io`;
  previously it was at `version.sslip.io`. I didn't want one endpoint
  for both metrics & version (worry: DNS amplification), and I wanted a
  consistent subdomain to find that information (i.e.
  `status.sslip.io`).
- I'm not worried about atomic updates to the metrics; if a metric is
  off by one, if I skip a count because two lookups are happening at the
  exact same time, I don't care.
- The `Metrics` struct is a pointer within `Xip` because I might have
  several copies of `Xip` (if I'm binding to several interfaces
  individually), but I must only have one copy of `Metrics`
- I only include the metrics I'm interested in, usually because it took
  some work to implement that feature. I don't care about MX records,
  but I care about IPv6 lookups, DNS-01 challenges, public IP lookups.
- got rid of a section of unreachable code at the end of
  `ProcessQuestion()`; I was tired of Goland flagging it. I had it there
  mostly because I was paranoid of falling through a `switch` statement
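
A sketch of why the pointer matters, with illustrative field names (only `Xip` and `Metrics` come from the message above). Copying an `Xip` copies the pointer, so every copy updates the same counters:

```go
package main

import "fmt"

// Metrics holds counters shared by every listener. The field is
// illustrative, not the project's actual set of metrics.
type Metrics struct {
	Queries int
}

// Xip may be copied once per bound interface; the pointer keeps all
// copies pointing at a single Metrics.
type Xip struct {
	SrcAddr string
	Metrics *Metrics
}

// handleQuery records one query. The increment is deliberately not
// atomic, matching the "off by one is fine" trade-off described above.
func (x Xip) handleQuery() {
	x.Metrics.Queries++
}

func main() {
	shared := &Metrics{}
	a := Xip{SrcAddr: "192.0.2.1", Metrics: shared}
	b := a // copies the pointer, not the Metrics
	b.SrcAddr = "192.0.2.2"
	a.handleQuery()
	b.handleQuery()
	fmt.Println(shared.Queries) // both copies incremented the same counter
}
```

Had `Metrics` been embedded by value, `b := a` would have forked the counters and each listener would report its own partial totals.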
2022-01-19 06:47:21 -08:00
Brian Cunnie
20e4238037 Use builtin key-value store when no etcd
We use a simple map as a key-value store when there's no `etcd`.

We test both: when there's etcd, and when there isn't.
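
A minimal sketch of a map-backed store of that kind. The `memoryKV` type and its method names are hypothetical; the mutex is an addition for safe concurrent use and is not something the message above describes:

```go
package main

import (
	"fmt"
	"sync"
)

// memoryKV is a hypothetical in-process fallback for when no etcd is
// available. Values vanish when the process exits.
type memoryKV struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemoryKV() *memoryKV {
	return &memoryKV{data: make(map[string]string)}
}

// Get returns the value for key and whether it was present.
func (m *memoryKV) Get(key string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	return v, ok
}

// Put stores value under key, replacing any previous value.
func (m *memoryKV) Put(key, value string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

// Delete removes key; deleting a missing key is a no-op.
func (m *memoryKV) Delete(key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, key)
}

func main() {
	kv := newMemoryKV()
	kv.Put("my-key", "my-value")
	v, ok := kv.Get("my-key")
	fmt.Println(v, ok)
	kv.Delete("my-key")
	_, ok = kv.Get("my-key")
	fmt.Println(ok)
}
```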

I'm inordinately pleased at my intuitive understanding of
`DescribeTable()`, which let me define a function once and use it in
both places it's tested (with `etcd` and without). That's much better
than having duplicate copies of the same test (requiring twice the
maintenance when updating tests).

TODO: Don't run `etcd` unit tests when there's no `etcd`
2022-01-15 12:26:08 -08:00
Brian Cunnie
49efcf9868 Create counterfeit etcd for testing
We now use an interface for `Xip.Etcd` (`V3client` not
`*v3client.Client`). We've also used
[counterfeiter](https://github.com/maxbrunsfeld/counterfeiter) to create
a fake to use in testing.

Drive-by: we updated dependencies: `go get -u && go mod tidy`
2022-01-15 10:16:12 -08:00
Brian Cunnie
53bc60bc14 Introduce getKv(), putKv(), deleteKv()
The key-value portion is about to get a lot more complicated, and as a
prelude I've moved the three verbs into their own functions.

I plan to do the following:

- use an interface for Xip.Etcd (not `*v3client.Client`)
- use counterfeiter to create fakes for above interface
- write vanilla Golang, non-Ginkgo test for getKv, putKv, deleteKv
- add local datastructure (non-etcd) to hold kv if no etcd
2022-01-15 08:36:56 -08:00
Brian Cunnie
b3fc9837ad -etcdHost flag overrides default "localhost:2379"
The primary purpose for this is integration tests; I don't think it has
much applicability other than that.
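
A sketch of such a flag using Go's standard `flag` package. Only the flag name and its default come from the commit subject; the `parseEtcdHost` helper, the flag-set name, and the usage string are illustrative:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseEtcdHost parses args and returns the etcd endpoint, falling back
// to "localhost:2379" when -etcdHost is not given.
func parseEtcdHost(args []string) (string, error) {
	fs := flag.NewFlagSet("dns-server", flag.ContinueOnError)
	host := fs.String("etcdHost", "localhost:2379", "etcd client endpoint (host:port)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *host, nil
}

func main() {
	host, err := parseEtcdHost(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("etcd endpoint:", host)
}
```

An integration test could then start the server with `-etcdHost` pointing at an unused port to exercise the no-etcd path.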
2022-01-15 07:45:09 -08:00
Brian Cunnie
4eaaac5f79 Accommodate lack of etcd
We need to determine whether we have `etcd` or use a local key-value
store instead.

This is for people who want to use this DNS server but don't want to
install `etcd`. Most people probably don't care for the key-value
portion anyway.

TODO: modify xip.go to accommodate lack of etcd. Test coverage, too.
2022-01-13 08:54:29 -08:00
Brian Cunnie
c4d415887e etcd: instructions to configure on ns-aws
2021-12-31 16:20:29 -08:00
Brian Cunnie
af6c0f8326 etcd cluster configuration for ns-aws.sslip.io
- patterned after the [k8s
  configuration](https://github.com/cunnie/docs/blob/main/kubernetes.md#bootstrapping-the-etcd-cluster)
- I'm ridiculously psyched that the certificates are elliptic-curve
- clients communicate without TLS, over loopback only
- peers require TLS over public IPs
2021-12-31 15:58:38 -08:00
Brian Cunnie
71ca8e1732 etcd: generate certs for cluster communication
2021-12-31 14:51:04 -08:00
Brian Cunnie
916b501bff Bump Ginkgo v1.16.5 → v2.0.0
Ginkgo v2.0.0 is hot off the press, released yesterday. Let's upgrade!

- `extensions/table` no longer needs to be separately imported
- `BeforeSuite()` must be outermost

fixes:
```
It looks like you are trying to add a [BeforeSuite] node within a container
```
```
imported and not used: "github.com/onsi/ginkgo/v2/extensions/table"
```
```
Entry redeclared during import "github.com/onsi/ginkgo/extensions/table"
```
2021-12-31 11:19:34 -08:00
Brian Cunnie
0f3e790b15 🐞 CI unit tests require etcd
Previously `etcd` wasn't running, causing the integration tests to fail
because they require `etcd`.

We now run `etcd`.

In the future I plan to add the ability to not require `etcd`, to use a
local table of key-value pairs, but I don't plan to test that option in
CI. It'll be for the very few users who use the sslip.io code but not
the service.

fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/23>:
```
{"level":"warn","ts":"2021-12-31T01:34:28.089Z","logger":"etcd-client","caller":"v3@v3.5.1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ef340/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}

2021/12/31 01:34:28 couldn't GET "dmy-key": context deadline exceeded
```
2021-12-31 11:07:30 -08:00
Brian Cunnie
4a5032997a 🐞 Don't include invisible "d" in k-v.io error msg
We include an invisible "d" in our keys, but we don't want to leak them
to the user (it'll only serve to confuse), so we fix our error messages
to not display them. This code doesn't have coverage, but we don't feel
it's worth the contortions to cover it.

fixes
<https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/23>
(should have been "my-key" not "dmy-key"):
```
2021/12/31 17:36:27 couldn't GET "dmy-key": context deadline exceeded
```
2021-12-31 09:48:35 -08:00
Brian Cunnie
a6bf837a49 etcd: include vanilla configuration file
...that we can customize for each of our three DNS servers.

Drive-bys:

- Bumped SOA serial 2021080200 → 2021123100. There's something poetic
  about it being the last day of the year
- Deleted the old PowerDNS configuration. It's so stale there's no point
  in having it. Or mentioning it in the README.
2021-12-30 17:32:38 -08:00
Brian Cunnie
5065229a03 Key-value store domain: kv.sslip.io → k-v.io
I didn't want a really long domain for the key-value store; I wanted a
short, easy-to-remember domain. And it cost $400 for ten years.

Many good domains (e.g. keyvalue.store, kv.io)
were taken, and some weren't easily registered (e.g. the Albanian
domain, keyv.al).

Browsing these domains that were never put into use is like strolling
along the Boulevard of Broken Dreams: high hopes dashed against the hard
rocks of reality.
2021-12-29 19:56:52 -08:00
Brian Cunnie
3066a22f57 Get, Put, and Delete integrated with etcd
Previously we maintained a local table of key-value pairs
(`TxtKvCustomizations`), but this had two drawbacks:

- no persistence: when the server is restarted, all key-value pairs are
  wiped.
- no consistency: the key-value pairs on one server are completely
  orthogonal to the key-value pairs on another.

By using `etcd` to store our KV pairs, we fix both those problems.
2021-12-29 18:40:41 -08:00
Brian Cunnie
ca52c317f0 Bring in etcd support for key-value store
The addition of etcd was enough to inspire me to make a struct (`Xip`)
to hold the important information (source addr, etcd client). That way I
don't have to plumb that information through the hierarchy of function
calls.

Drive-by: fixed a bug in the random-IPv6-address-generator that would,
once in a great while, generate an IPv4 address.
2021-12-27 16:47:51 -08:00
Brian Cunnie
f76218660e TXT records have a 3-minute TTL
When we implement the key-value store, we want new values to propagate
in a reasonable amount of time. Based on no scientific evidence
whatsoever, based solely on "gut feel", I came up with three minutes
(180 seconds).

The previous value was one week. I can't imagine anyone in their right
mind waiting a full week for their key-value to propagate.
2021-12-27 16:47:51 -08:00
Brian Cunnie
33e90546d2 Bump Go dependencies
```shell
rm go.mod go.sum
go mod init xip
go mod tidy
```
2021-12-24 09:56:34 -08:00
Brian Cunnie
d43990ed50 Don't pass pointers
I was uneasy: functions were returning values and mutating arguments
(specifically `response *Response`)--I was mixing meat with dairy, and
the result wasn't kosher.

Now I only return values, and don't mutate.
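
The difference, sketched with a pared-down `Response` (its single field and both helper names are illustrative, not the project's code):

```go
package main

import "fmt"

// Response is a pared-down, illustrative stand-in for a DNS response.
type Response struct {
	Answers []string
}

// Before: the function mutates its argument and also returns a value.
func addAnswerMutating(r *Response, answer string) bool {
	r.Answers = append(r.Answers, answer)
	return true
}

// After: take a value, return the updated value, mutate nothing.
func withAnswer(r Response, answer string) Response {
	// Copy the slice so the caller's backing array is never shared.
	answers := make([]string, len(r.Answers), len(r.Answers)+1)
	copy(answers, r.Answers)
	r.Answers = append(answers, answer)
	return r
}

func main() {
	mutated := Response{}
	addAnswerMutating(&mutated, "127.0.0.1")
	fmt.Println(mutated.Answers)

	original := Response{Answers: []string{"127.0.0.1"}}
	updated := withAnswer(original, "::1")
	fmt.Println(original.Answers, updated.Answers)
}
```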

According to canonical [Go Code Review
Comments](https://github.com/golang/go/wiki/CodeReviewComments#pass-values):

> Don't pass pointers as function arguments just to save a few bytes. If
a function refers to its argument x only as *x throughout, then the
argument shouldn't be a pointer. Common instances of this include
passing a pointer to a string (*string) or a pointer to an interface
value (*io.Reader). In both cases the value itself is a fixed size and
can be passed directly. This advice does not apply to large structs, or
even small structs that might grow.
2021-12-24 09:49:24 -08:00
Brian Cunnie
dd4eb3b426 pipeline: test the servers twice, not ten times
...because I don't want the test to run for an hour when a server is
down, like ns-azure.
2021-12-23 19:37:09 -08:00
Brian Cunnie
30141f1d90 CI: Test key-value store regularly
2021-12-04 10:31:12 -08:00
Brian Cunnie
25ec87feb5 🐞 sslip.io: fix key-value store for GCE
We set the number of replicas to 1 so that when you create a key-value
on `ns-gce.sslip.io`, you're sure of retrieving that value later from
`ns-gce.sslip.io`.

Previously it could hit the other replica, which would have a different
key-value store, which would make the value "disappear".
2021-12-04 10:26:40 -08:00
Brian Cunnie
bd63421c3f BOSH release: 2.3.0: kv.sslip.io key-value store
2.3.0
2021-12-04 08:16:53 -08:00
Brian Cunnie
78722b6887 kv.sslip.io: (key-value) read/write/delete TXTs
We enable special behavior under the `kv.sslip.io` subdomain: it can be
treated as a key-value store, the sub-subdomain being the key, and the
TXT record being the value.

For example, to write ("put") the value "12.0.1" to the key
"macos-version" on the `ns-gce.sslip.io.` nameserver, you'd use the
following `dig` command:

```shell
dig @ns-gce.sslip.io. txt put.12.0.1.macos-version.kv.sslip.io.
```

To read ("get") the value back, you'd write the following `dig` command:

```shell
dig @ns-gce.sslip.io. txt get.macos-version.kv.sslip.io.
```

Since "get" is the default behavior, you don't need to include it in the
domain name:

```shell
dig @ns-gce.sslip.io. txt macos-version.kv.sslip.io.
```

Finally, when you're done with the key-value, you can "delete" it:

```shell
dig @ns-gce.sslip.io. txt delete.macos-version.kv.sslip.io.
```

Notes:

- Keys are case-insensitive (to accommodate DNS convention). In other
  words, `KEY.kv.sslip.io` and `key.kv.sslip.io` return the same TXT
  record.
- Values are case-sensitive. `put.CamelCase.style.kv.sslip.io` sets the
  TXT record to "CamelCase".
- `put` requests will return the TXT record being put; i.e.
  `put.hello.world.kv.sslip.io` returns one TXT record of one string,
  `hello`.
- `delete` requests will return the TXT record being deleted; i.e.
  `delete.world.kv.sslip.io` returns one TXT record of one string,
  `hello`. If the TXT record does not exist, no TXT records will be
  returned.
- Values are limited to 63 bytes to mitigate using the sslip.io servers
  in a [DNS amplification
  attack](https://us-cert.cisa.gov/ncas/alerts/TA13-088A).
- Values are not persistent: if the server is restarted, all values
  disappear. Poof.
- Values are not consistent. If a value is set in `ns-aws.sslip.io`, it
  does not propagate to `ns-gce.sslip.io` nor `ns-azure.sslip.io`.
2021-12-04 07:59:57 -08:00
Brian Cunnie
4ba3516834 DNS server testing: randomize case of domain names
We randomize the case of domain names (previously they were always
lowercase). We hope to surface any case-related errors, but didn't find
any.
2021-11-29 08:51:18 -08:00
Brian Cunnie
b8b4786387 Update ns-aws.sslip.io's HTML assets
i.e.: <https://52-0-56-137.sslip.io/>

Previously I didn't update `index.html` properly because it wasn't
documented, and the content had become stale.
2021-11-28 20:08:52 -08:00
Brian Cunnie
e256241394 Delete pipeline-simple.yml; it's old
This pipeline's only purpose was an asset in a blog post that I wrote a
couple of years ago, and is no longer necessary.

Also, and this sounds petty, but I didn't like the RED on my CI--I'd
like to see as much green as possible. Now my CI is green (with the
exception of the many-colored "badges" pipeline).
2021-11-28 19:50:29 -08:00
Brian Cunnie
2599def6b6 Upgrading (Developer) notes: manually trigger job
It's currently a manual job: if it were automatic, it'd trigger & fail
because the required executable isn't yet downloadable.
2021-11-28 19:45:14 -08:00
Brian Cunnie
90b94baa29 BOSH release: 2.2.4: Deprecate nono.io nameservers
2.2.4
2021-11-28 13:08:49 -08:00
Brian Cunnie
4c8e7741f1 Use @ns.sslip.io to determine your IP lookup
It makes for simpler instructions than listing the three nameservers &
which ones have IPv6.
2021-11-27 19:03:40 -08:00
Brian Cunnie
61f0ae2ae8 Remove *.nono.io nameservers
They have been replaced by the sslip.io nameservers. I had been meaning
to do this for a long time, and there's nothing like a Thanksgiving
weekend to get long-lingering tasks done.
2021-11-27 18:52:03 -08:00
Brian Cunnie
7ed2107f36 Web page: use sslip.io servers, not nono.io
2021-11-27 18:23:02 -08:00
Brian Cunnie
690e0ad618 New Release Documentation: no more manual Docker images
The Docker images are now created automatically with our pipeline.
That's right: with 80 hours of work we saved 30 seconds of work! We are
nothing if not efficient.
2021-11-27 15:53:44 -08:00
Brian Cunnie
56191a2ef7 HTML: remove the "new software" warning
It's not new after a year. I also updated the version numbers returned
because, well, it makes the website more "fresh".
2021-11-27 12:29:58 -08:00
Brian Cunnie
4e22123114 BOSH release: 2.2.3: Include sslip.io nameservers
2.2.3
2021-11-27 11:35:01 -08:00
Brian Cunnie
fda3baeaaa Add NS servers in sslip.io domain
We currently use three nameservers in the `nono.io` domain, but that's
confusing--why not have the nameservers in the `sslip.io` domain?

This commit starts the ball rolling to convert to the `sslip.io`
nameservers. We'll have a brief period where we have _both_ `nono.io`
and `sslip.io` nameservers.

At which point we'll add the `sslip.io` nameservers to our registrar,
Namecheap.com.

Once they've been added to our registrar, we'll wait a day or two to
propagate, and then we'll delete references to the `nono.io`
nameservers.
2021-11-27 10:50:04 -08:00
Brian Cunnie
992458f67c simple pipeline: use default branch, not master
...especially since I recently switched from `master` to `main` on
sslip.io's repo.

Also I got rid of the Concourse groups, which I don't like at all. And I
added some pretty icons to the resources.
2021-11-26 20:46:37 -08:00
Brian Cunnie
2c4a60e315 sslip.io pipeline: use default branch, not master
...especially since I recently switched from `master` to `main` on
sslip.io's repo.

Also I got rid of the Concourse groups, which I don't like at all. And I
added some pretty icons to the resources.
2021-11-26 20:41:39 -08:00
Brian Cunnie
ab33ada856 🐞 Simple Pipeline: don't try to run YAML booleans
fixes:
```
error: error unmarshaling JSON: while decoding JSON: malformed task step: json: cannot unmarshal bool into Go struct field TaskRunConfig.config.run.path of type string
```
2021-11-24 09:13:20 -08:00
Brian Cunnie
854d8e8c1b Spec: test ip.sslip.io
Also, change the order of `dig` arguments so that the server being
queried is first (e.g. `@#{whois_nameserver}`) and the arguments (e.g.
`+short`) are last.
2021-11-05 08:10:39 -07:00
Brian Cunnie
1d4e1af656 Production test: all servers run same version
2021-11-02 05:02:46 -07:00