Commit Graph

659 Commits

Author SHA1 Message Date
Brian Cunnie
33d76eb818 ❤️ Blocked CIDRs are downloaded and parsed...
...but not yet blocked. Stay tuned.

Sadly, this is my Valentine's day, spent coding. Le sigh.
2022-02-14 19:38:20 -08:00
Brian Cunnie
23eb99ca12 ReadBlocklist() now parses CIDRs as well as strings
My initial implementation of blocking phishers was flawed. I thought I
only needed to block by matching strings in a hostname (e.g.
"raiffeisen"), but I was recently served with a second abuse notice
(<https://nf-43-134-66-67.sslip.io/sg>), one which didn't lend itself to
blocking via a substring match. And at that moment I understood why
Roopinder of nip.io blocked by IP address.

The work is not yet complete, but at least I can parse and create an
array of CIDRs to match against.

Drive-by: I didn't realize Golang had increment ("++") (see [Why are ++
and -- statements and not expressions? And why postfix, not
prefix?](https://go.dev/doc/faq#inc_dec)), so I used the longer "+= 1"
throughout the codebase. Now that I know Golang has them, I use them.
2022-02-12 21:21:30 -08:00
Brian Cunnie
86b02cd59b Updated dependencies: go get -u && go mod tidy 2022-02-08 11:32:51 -08:00
Brian Cunnie
2ddaeeed23 Bump SOA → 2022020800
I love powers-of-two.
2022-02-08 11:27:43 -08:00
Brian Cunnie
01d68dcd8b Use more precise terminology in metrics
"Successful" is a nebulous term. "Answered" is more precise (at least
one record returned in the answer section of the DNS response).
2022-02-08 11:17:28 -08:00
Brian Cunnie
51ed47317e BOSH release: 2.5.0: block phishers 2.5.0 2022-02-06 19:51:06 -08:00
Brian Cunnie
5afb911f50 metrics.status.sslip.io includes blocked queries
I've refactored the metrics: where I previously used the term
"successful", I now use the term "answered". "Answered" means there was
at least one record in the Answer section of the response to the DNS
query. This is a more precise description.

I re-arranged the metrics integration test. Now it's sorted by type of
record queried (A, AAAA, MX, etc.). It's easier for me to follow.
2022-02-06 18:40:31 -08:00
Brian Cunnie
5398c543e7 Bump SOA → 2022020200
Mostly because I love the idea of an SOA that's only twos and zeros.
This makes me happy.
2022-02-02 12:36:14 -08:00
Brian Cunnie
b0a3b17238 Block names used in phishing attempts
When a hostname is queried with a blocked name, we return the address of
one of our servers (currently ns-aws). For example,
`raiffeisen.94.228.116.140.sslip.io` returns the IP address
`52.0.56.137` (`ns-aws.sslip.io`'s IPv4 address).

Currently we only block one name: "raiffeisen",
<https://en.wikipedia.org/wiki/Raiffeisenbank>.

- We enable the integration tests for the blocklist.
- We don't block private IP addresses; they can't be used in phishing
  attacks.
- At the beginning of the integration tests (`ginkgo -r .`), we now
  print the DNS server start-up messages. They help me debug.
- We broke out some of the code into their own methods.
  `processQuestion()` remains too big, but at least it's now smaller.
2022-02-02 12:17:04 -08:00
Brian Cunnie
2a704ba008 Blocklists: read it from the web
- We use `blocklist` rather than `blacklist`. If this modest change
  betters the black experience in America, then it was worth it.

TODO:
- Wire up the blocklist so we block the phisher domains
- Migrate the downloading of the blocklist outsid the `main()` method
- Uncomment the integration tests
2022-01-26 19:17:15 -08:00
Brian Cunnie
9f345b1f8e A list of "forbidden" names
This is the first step in our attempts to foil the evil phishers.

[#13]
2022-01-26 09:38:25 -08:00
Brian Cunnie
7b081f1c24 🐞 Ensure rate-throttling channel is empty on CI
We weren't aggressive enough to make sure our rate limiting channel was
emptied: we added only ten extra reads on the channel on our CI, which
worked perfectly on our Xeon workstation, but not on our CI.

Our MetricsBufferSize is 100, our delay is 250ms, and each query on CI
took ~25ms to complete, which meant we needed > 110 reads to exhaust the
channel, and we were on the knife's edge. So we doubled the number of
reads to 200 to make sure we had really, truly exhausted the channel's
buffers.

Fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/50>:
```
sslip.io-dns-server for more complex assertions a TXT record for an "metrics.status.sslip.io" domain is repeatedly queries [It] rate-limits the queries after some amount requests

/tmp/build/b4e0c68a/sslip.io/bosh-release/src/sslip.io-dns-server/integration_test.go:302
```
2022-01-24 07:19:14 -08:00
Brian Cunnie
d42ce54947 Notes to self: how to examine logs 2022-01-23 07:46:07 -08:00
Brian Cunnie
bcceeba858 Mitigate DNS amplification attack surface
`metrics.status.sslip.io` is a vector for a DNS amplification attack; we
mitigate it by latching a 1/4 second throttle on each query after a
certain amount of queries.

That endpoint is a 4x amplifier: 100byte request with a 400 byte reply.
2022-01-22 15:51:35 -08:00
Brian Cunnie
d35cc1faa6 Release procedure has slightly better instructions 2022-01-22 10:20:12 -08:00
Brian Cunnie
8f2890d90e BOSH release: 2.4.2: fix panic() 2.4.2 2022-01-22 09:41:49 -08:00
Brian Cunnie
3e502731d4 🐞 Fix panic: runtime error: index out of range
Previously I never checked if `net.ParseIP()` returned `nil` for an IPv4
address—I couldn't imagine my IPv4 regex was incomplete. I was wrong.

Moral of the story: always check for errors, always check for nil.

Oddly, I checked for IPv6 addresses—I guess I wasn't as confident about
the regex used.

Drive-bys:
- updated SOA with today's date
- updated dependencies `go get -u`

[fixes #15]
2022-01-22 09:12:13 -08:00
Brian Cunnie
ec649870a5 Website: consistently put record type after hostname 2022-01-20 13:32:55 -08:00
Brian Cunnie
6b2d65c778 🐞 Update links to use main branch, not master
...because the website wasn't updating
2022-01-20 12:15:00 -08:00
Brian Cunnie
d2914645e4 Website shows correct endpoint for metrics
...metrics.status.sslip.io not version.status.sslip.io
2022-01-20 12:01:12 -08:00
Brian Cunnie
c61b81c29b Server tests: update for new endpoints
fixes:
<https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/dns-servers/builds/271>
2022-01-20 09:50:58 -08:00
Brian Cunnie
b496e68423 Website explains what each metric means
Drive-by: updated publishing docs.
2022-01-20 09:29:06 -08:00
Brian Cunnie
bbf1925be4 BOSH release: 2.4.1: fewer panics
customized records w/ non-existent TXTs don't panic().
2.4.1
2022-01-20 08:10:03 -08:00
Brian Cunnie
e215c4fda4 🐞 Don't panic() invoking a customized TXT
We made a mistake: we blindly invoked a function that was sometimes
`nil`. Specifically, if we had a customized domain (e.g. `ns.sslip.io`)
that didn't have a TXT record (a function), we'd try to invoke it
anyway. Bad move.

Now we ensure the function is there before we try to invoke it.

This is a curious affirmation of installing metrics: if we hadn't seen
that the server had been restarted because uptime was too low, we
wouldn't have caught this bug.

Drive-by: we made the lengths of TXT records of `version.status.sslip.io`
exactly match what we replace them with during the linking phase. We
hope that this fixes the wrong-line-numbers we see in the `panic()`
messages.

[fixes #14]
2022-01-20 07:47:48 -08:00
Brian Cunnie
b119442a37 BOSH release: 2.4.0: metrics.status.sslip.io returns metrics
Also, I moved the "versio" endpoint: `version.sslip.io` →
`version.status.sslip.io`. It seemed to make more sense to corral the
special endpoints under `status`.
2.4.0
2022-01-20 05:02:21 -08:00
Brian Cunnie
c0196ed617 🐞 Don't run etcd tests without etcd
Now we check first to see if etcd is running before diving in & testing
against it.

fixes:
```
Unexpected error:
    <*fmt.wrapError | 0xc0003bc8e0>: {
        msg: "couldn't GET \"my-key\": context deadline exceeded",
        err: <context.deadlineExceededError>{},
    }
```
2022-01-20 04:39:32 -08:00
Brian Cunnie
c48ca88c4f 🐞 SIGSEGV when mistakenly trying to access etcd
When running on a BOSH-deployed Bionic stemmcell, we panic when there's
no `etcd`

Curiously, the failure on macOS is entirely different
(`context.deadlineExceededError`)

```
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x93d10e]

goroutine 34 [running]:
go.etcd.io/etcd/client/v3.(*Client).Put(0x0, {0xb4cb68, 0xc0002730e0}, {0xc0000eceac, 0xf9cf40}, {0xc0000ecea4, 0x0}, {0x0, 0x0, 0x0})
        <autogenerated>:1 +0x4e
xip/xip.Xip.putKv({{0xc0000ece48, 0x4, 0x4}, {0xb4ccb8, 0x0}, 0xc0000890e0}, {0xc0000eceac, 0x6}, {0xc0000ecea4,
 0x7})
        /var/vcap/data/compile/sslip.io-dns-server/src/xip/xip.go:880 +0x136
```
2022-01-19 18:12:31 -08:00
Brian Cunnie
7d551b81db Metrics endpoint is fully fleshed-out
- integration tests (woohoo!)
2022-01-19 11:58:07 -08:00
Brian Cunnie
8155b0821e Drop TTL 300 → 180 seconds
I want the key-value changes to propagate faster, so I dropped the TTL.
Now, on average, it'll take on average 90 seconds for a stale value to
clear out, 3 minutes max.

I don't think this will affect traffic much; I suspect that there's not
much negative-caching going on (based on the queries I see), and the
lion's share of queries (I believe over 3/4 of my traffic are things I
don't have a record for). And I average ~5
queries-with-a-record-returned per second on ns-aws, which means that'll
bump to a little more than 8 queries a second.
2022-01-19 07:15:15 -08:00
Brian Cunnie
bf4f039001 Metrics are served via metrics.status.sslip.io
- The metrics aren't fleshed out. In fact, there's only two so far:
  1. uptime
  2. number of queries
- Even though the metrics aren't complete, I'm checking it in because
  this commit is already much too big.
- I moved the version information to `version.status.sslip.io`;
  previously it was at `version.sslip.io`. I didn't want one endpoint
  for both metrics & version (worry: DNS amplification), and I wanted a
  consistent subdomain to find that information (i.e.
  `status.sslip.io`).
- I'm not worried about atomic updates to the metrics; if a metric is
  off by one, if I skip a count because two lookups are happening at the
  exact same time, I don't care.
- The `Metrics` struct is a pointer within `Xip` because I might have
  several copies of `Xip` (if I'm binding to several interfaces
  individually), but I must only have one copy of `Metrics`
- I only include the metrics I'm interested in, usually because it took
  some work to implement that feature. I don't care about MX records,
  but I care about IPv6 lookups, DNS-01 challenges, public IP lookups.
- got rid of a section of unreachable code at the end of
  `ProcessQuestion()`; I was tired of Goland flagging it. I had it there
  mostly because I was paranoid of falling through a `switch` statement
2022-01-19 06:47:21 -08:00
Brian Cunnie
20e4238037 Use builtin key-value store when no etcd
We use a simple map as a key-value store when there's no `etcd`.

We test both: when there's etcd, and when there isn't.

I'm inordinately pleased at my intuitive understanding of
`DescribeTable()`, allowing me to define a function and using it in the
two places it's tested (with `etcd` and without), which is much better
than having duplicate copies of the same test (requiring twice the
maintenance when updating tests).

TODO: Don't run `etcd` unit tests when there's no `etcd`
2022-01-15 12:26:08 -08:00
Brian Cunnie
49efcf9868 Create counterfeit etcd for testing
We now use an interface for `Xip.Etcd` (`V3client` not
`*v3client.Client`). We've also used
[counterfeiter](https://github.com/maxbrunsfeld/counterfeiter) to create
a fake to use in testing.

Drive-by: we updated dependencies: `go get -u && go mod tidy`
2022-01-15 10:16:12 -08:00
Brian Cunnie
53bc60bc14 Introduce getKv(), putKv(), deleteKv()
The key-value portion is about to get a lot more complicated, and as a
prelude I've moved the three verbs into their own functions.

I plan to do the following:

- use an interface for Xip.Etcd (not `*v3client.Client)
- use counterfeiter to create fakes for above interface
- write vanilla Golang, non-Ginkgo test for getKv, putKv, deleteKv
- add local datastructure (non-etcd) to hold kv if no etcd
2022-01-15 08:36:56 -08:00
Brian Cunnie
b3fc9837ad -etcdHost flog overrides default "localhost:2379"
The primary purpose for this is integration tests; I don't think it has
much applicability other than that.
2022-01-15 07:45:09 -08:00
Brian Cunnie
4eaaac5f79 Accommodate lack of etcd
We need to determine whether we have `etcd` or use a local key-value
store instead.

This is for people who want to use this DNS server but don't want to
install `etcd`. Most people probably don't care for the key-value
portion anyway.

TODO: modify xip.go to accommodate lack of etcd. Test coverage, too.
2022-01-13 08:54:29 -08:00
Brian Cunnie
c4d415887e etcd: instructions to configure on ns-aws 2021-12-31 16:20:29 -08:00
Brian Cunnie
af6c0f8326 etcd cluster configuration for ns-aws.sslip.io
- patterned after the [k8s
  configuration](https://github.com/cunnie/docs/blob/main/kubernetes.md#bootstrapping-the-etcd-cluster)
- I'm ridiculously psyched that the certificates are elliptic-curve
- clients communicate no TLS loopback only
- peers require TLS over public IPs
2021-12-31 15:58:38 -08:00
Brian Cunnie
71ca8e1732 etcd: generate certs for cluster communication 2021-12-31 14:51:04 -08:00
Brian Cunnie
916b501bff Bump Ginkgo v1.16.5 → v2.0.0
Ginkgo v2.0.0 is hot off the press, released yesterday. Let's upgrade!

- `extensions/table` no longer needs to be separately imported
- `BeforeSuite()` must be outermost

fixes:
```
It looks like you are trying to add a [BeforeSuite] node within a container
```
```
imported and not used: "github.com/onsi/ginkgo/v2/extensions/table"
```
```
Entry redeclared during import "github.com/onsi/ginkgo/extensions/table"
```
2021-12-31 11:19:34 -08:00
Brian Cunnie
0f3e790b15 🐞 CI unit tests require etcd
Previously `etcd` wasn't running, causing the integration tests to fail
because they require `etcd`.

We now run `etcd`.

In the future I plan to add the ability to not require `etcd`, to use a
local table of key-value pairs, but I don't plan to test that option in
CI. It'll be for the very few users who use the sslip.io code but not
the  service.

fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/23>:
```
{"level":"warn","ts":"2021-12-31T01:34:28.089Z","logger":"etcd-client","caller":"v3@v3.5.1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ef340/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}

2021/12/31 01:34:28 couldn't GET "dmy-key": context deadline exceeded
```
2021-12-31 11:07:30 -08:00
Brian Cunnie
4a5032997a 🐞 Don't include invisible "d" in k-v.io error msg
We include an invisible "d" in our keys, but we don't want to leak them
to the user (it'll only serve to confuse), so we fix our error messages
to not display them. This code doesn't have coverage, but we don't feel
it's worth the contortions to cover it.

fixes
<https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/23>
(should have been "my-key" not "dmy-key":
```
2021/12/31 17:36:27 couldn't GET "dmy-key": context deadline exceeded
```
2021-12-31 09:48:35 -08:00
Brian Cunnie
a6bf837a49 etcd: include vanilla configuration file
...that we can customize for each of our three DNS servers.

Drive-bys:

- Bumped SOA serial 2021080200 → 2021123100. There's something poetic
  about it being the last day of the year
- Deleted the old PowerDNS configuration. It's so stale there's no point
  in having it. Or mentioning it in the README.
2021-12-30 17:32:38 -08:00
Brian Cunnie
5065229a03 Key-value store domain: kv.sslip.io → k-v.io
I didn't want a really long domain for the key-value store; I wanted a
short, easy-to-remember domain. And it cost $400 for ten years.

Many good domains (e.g. keyvalue.store, kv.io)
were taken, and some weren't easily registered (e.g. the Albanian
domain, keyv.al).

Browsing these domains that were never put into use is like strolling
along the Boulevard of Broken Dreams: high hopes dashed against the hard
rocks of reality.
2021-12-29 19:56:52 -08:00
Brian Cunnie
3066a22f57 Get, Put, and Delete integrated with etcd
Previously we maintained a local table of key-value pairs
(`TxtKvCustomizations`), but this had two drawbacks:

- no persistence: when the server is restarted, all key-value pairs are
  wipe.
- no consistency: the key-value pairs on one server are completely
  orthogonal to the key-value pairs on another.

By using `etcd` to store our KV pairs, we fix both those problems.
2021-12-29 18:40:41 -08:00
Brian Cunnie
ca52c317f0 Bring in etcd support for key-value store
The addition of etcd was enough to inspire me to make a struct (`Xip`)
to hold the important information (source addr, etcd client). That way I
don't have to plumb that information through the hierarchy of function
calls.

Drive-by: fixed a bug in the random-IPv6-address-generator that would,
once in a great while, generate an IPv4 address.
2021-12-27 16:47:51 -08:00
Brian Cunnie
f76218660e TXT records have a 3-minute TTL
When we implement the key-value store, we want new values to propagate
in a reasonable amount of time. Based on no scientific evidence
whatsoever, based solely on "gut feel", I came up with three minutes
(180 seconds).

The previous value was one week. I can't imagine anyone in their right
mind waiting a full week for their key-value to propagate.
2021-12-27 16:47:51 -08:00
Brian Cunnie
33e90546d2 Bump Go dependencies
```shell
rm go.mod go.sum
go mod init xip
go mod tidy
```
2021-12-24 09:56:34 -08:00
Brian Cunnie
d43990ed50 Don't pass pointers
I was uneasy: functions were returning values and mutating arguments
(specifically `response &Response`)--I was mixing meat with dairy, and
the result wasn't kosher.

Now I only return values, and don't mutate.

According to canonical [Go Code Review
Comments](https://github.com/golang/go/wiki/CodeReviewComments#pass-values):

> Don't pass pointers as function arguments just to save a few bytes. If
a function refers to its argument x only as *x throughout, then the
argument shouldn't be a pointer. Common instances of this include
passing a pointer to a string (*string) or a pointer to an interface
value (*io.Reader). In both cases the value itself is a fixed size and
can be passed directly. This advice does not apply to large structs, or
even small structs that might grow.
2021-12-24 09:49:24 -08:00
Brian Cunnie
dd4eb3b426 pipeline: test the servers twice, not ten times
...because I don't want the test to run for an hour when a server is
down, like ns-azure.
2021-12-23 19:37:09 -08:00
Brian Cunnie
30141f1d90 CI: Test key-value store regularly 2021-12-04 10:31:12 -08:00