
689 Commits

Author SHA1 Message Date
Brian Cunnie
d76e592500 🐞 Build sslip.io's nginx Dockerfile correctly
Also includes a gratuitous change to the HTML in order to trigger a
build.

Fixes <https://ci.nono.io/teams/main/pipelines/dockerfiles/jobs/build-and-push-sslip.io-nginx/builds/33>:
```
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c dnf install -y     bind-utils     iproute     less     lsof     neovim     net-tools     nginx     nmap-ncat     procps-ng RUN mv /usr/share/nginx/html /usr/share/nginx/html-orig]: exit code: 1
```
2022-04-22 09:04:41 -07:00
Brian Cunnie
81b0d4d739 Dockerfile nginx name explicitly states service
In this case, we rename the nginx Dockerfile to
`Dockerfile-sslip.io-nginx` to make room for the upcoming
`Dockerfile-k-v.io-nginx`.
2022-04-22 08:39:31 -07:00
Brian Cunnie
1fc970a87e Dockerfiles: Replace deprecated "maintainer" label
Also, do the `dnf install` in one step, not in three.
2022-04-22 08:11:21 -07:00
Brian Cunnie
8d55c534fc Make way for k-v.io HTML website
To make room for the k-v.io HTML website, we rename the `document_root`
of the sslip.io website to the more explicit `document_root_sslip.io`.
2022-04-22 07:59:10 -07:00
Brian Cunnie
19668fac7f CI: rigorously test k-v.io
We make sure that each of the three nameservers
(ns-{aws,azure,gce}.sslip.io) can set a key-value, that the value
propagates to the remaining nameservers, that a nameserver can delete a
key, and that the deletion propagates to the remaining nameservers.
2022-04-20 16:48:50 -07:00
Brian Cunnie
28ae9b4348 k8s: use the etcd cluster IP for queries
Note: this didn't work. So sad.
2022-04-17 20:17:24 -07:00
Brian Cunnie
3f3f0ee78a 🐞 TLS for etcd: add GKE Node IPs
ns-gce is unable to join the cluster because its source IP address is
that of the node on which it's running, 34.72.45.206, and that address
isn't included in the SANs.

This commit updates the etcd certificate to one which includes the three
GKE nodes' IP addresses in its SANs.

This commit also includes instructions for updating the certificates in
the event of an IP address change.

Fixes:
```
Apr 16 14:15:34 ns-aws etcd[500]: rejected connection from "34.72.45.206:43080" (error "tls: \"34.72.45.206\" does not match any of DNSNames [\"ns-aws.sslip.io\" \"ns-azure.sslip.io\" \"ns-gce.sslip.io\" \"ns-aws\" \"ns-azure\" \"ns-gce\"] (lookup ns-gce: Temporary failure in name resolution)", ServerName "ns-aws.sslip.io", IPAddresses ["127.0.0.1" "52.0.56.137" "52.187.42.158" "104.155.144.4" "::1" "2600:1f18:aaf:6900::a"], DNSNames ["ns-aws.sslip.io" "ns-azure.sslip.io" "ns-gce.sslip.io" "ns-aws" "ns-azure" "ns-gce"])
```
2022-04-17 17:08:00 -07:00
Brian Cunnie
b33e1f6b37 Docs: tweaks to releasing new versions
This is the end of a series of changes which allowed my
[dns-servers](https://ci.nono.io/teams/main/pipelines/sslip.io) job to
finally go green.
2022-04-13 20:26:30 -07:00
Brian Cunnie
294f54a79a Version 2.5.2: DELETE on k-v.io returns no TXT records
The original behavior was to return the deleted record, which
inadvertently prolonged the lifetime (in DNS cache) of the record which
was meant to expire as soon as possible.

- Removed the instructions to create a BOSH release. We are no longer
  creating a BOSH release because we needed to colocate an etcd release
  alongside the BOSH release, and we couldn't find an etcd BOSH release.
- Updated the instructions to run a quick test against the sslip.io DNS
  server locally (sanity check) instead of deploying a VM with the BOSH
  release & testing against that.
- Updated the instructions for updating ns-azure's DNS server. ns-azure
  is no longer a BOSH-deployed VM.
2.5.2
2022-04-13 12:55:34 -07:00
Brian Cunnie
2a0e6b105d Health checks conform to new key-value delete behavior
When we check the production servers, we now expect, when we delete a
key, to NOT receive the key's old value as a response, lest we
inadvertently extend the lifetime of the key that we want to expire.
2022-04-13 08:35:21 -07:00
Brian Cunnie
033cf481d7 k-v.io: on DELETE, don't return the deleted value
We don't return the deleted value because doing that would have the
unintended consequence of postponing the deletion: downstream caching
servers would cache the deleted value for up to three more minutes. We'd
rather have the key deleted sooner rather than later.

Some APIs, e.g. etcd's, return a list of the deleted values: those
APIs can afford to do so because they don't need to worry about DNS
propagation.

We also lengthen the timeout of an `etcd` API call from 500 msec to 1928
msec. 500 msec was too close for comfort: some calls routinely took 480
msec to complete, and we wanted more headroom.

We also no longer do two `etcd` operations when we delete a value.
Previously we would do a GET followed by a DELETE, but since we're not
returning the value deleted, there's no point to the GET. Furthermore,
the GET was never necessary, for the `etcd` DELETE API call returned the
values deleted.

Drive-by:
- README: install Ginkgo the proper way, with `go install`

[fixes #17]
2022-04-12 09:17:38 -07:00
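The new DELETE path can be sketched as follows. This is a minimal illustration, assuming a hypothetical `kvStore` interface in place of the real etcd client (the names `kvStore`, `memStore`, and `deleteKey` are invented here). It shows the three points above: a single DELETE with no preceding GET, the 1928 msec timeout, and no TXT records in the response.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// etcdTimeout mirrors the timeout from the commit message.
const etcdTimeout = 1928 * time.Millisecond

// kvStore is a hypothetical stand-in for the etcd client; only the one
// operation needed here is modeled.
type kvStore interface {
	Delete(ctx context.Context, key string) (deleted int64, err error)
}

// memStore is an in-memory kvStore for demonstration only.
type memStore struct {
	mu sync.Mutex
	m  map[string]string
}

func (s *memStore) Delete(ctx context.Context, key string) (int64, error) {
	if err := ctx.Err(); err != nil {
		return 0, err // honor the deadline, as a real client would
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.m[key]; !ok {
		return 0, nil
	}
	delete(s.m, key)
	return 1, nil
}

// deleteKey issues a single DELETE (no preceding GET) and deliberately
// returns no TXT records, so downstream resolvers have nothing to cache.
func deleteKey(store kvStore, key string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), etcdTimeout)
	defer cancel()
	if _, err := store.Delete(ctx, key); err != nil {
		return nil, err
	}
	return nil, nil
}

func main() {
	store := &memStore{m: map[string]string{"mykey": "myvalue"}}
	txts, err := deleteKey(store, "mykey")
	fmt.Println(len(txts), err, len(store.m)) // 0 <nil> 0
}
```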
Brian Cunnie
4d6b4375a3 src/ is in the repo's root
Now that we're no longer creating BOSH releases, we don't need to bury
the `src/` directory under `bosh-release`; we can now place it in the
repo root, and we no longer need to fiddle with symbolic links.

We're no longer creating BOSH releases because implementing a key-value
store would have required an `etcd` BOSH release, and we didn't want to
invest the time.
2022-04-10 07:48:51 -07:00
Brian Cunnie
8483e1eb1e etcd README has troubleshooting section 2022-04-10 07:40:34 -07:00
Brian Cunnie
f4863813bb ns-aws & ns-azure have consistent etcd configs
Now that both ns-aws & ns-azure are on Ubuntu Impish (previously ns-aws
was on Fedora), we can make the configuration files consistent.
2022-04-09 18:54:18 -07:00
Brian Cunnie
3de0ccc431 README: minor corrections 2022-04-09 08:59:30 -07:00
Brian Cunnie
b46f09fa1f README: how to clear out etcd data 2022-03-30 14:06:17 -07:00
Brian Cunnie
9b8e3e36b1 etcd on Azure: conform to Ubuntu's defaults
...because it's different than Fedora's defaults
2022-03-24 07:21:03 -07:00
Brian Cunnie
a1117ef370 Azure has its own etcd configuration
Other than two lines, it's identical to AWS's etcd configuration.

I've also updated the instructions for configuring it.
2022-03-23 09:00:01 -07:00
Brian Cunnie
02fea91671 README now reflects new behavior
- You can select the port to bind to
- The NS record returned for `_acme-challenge` domains is special

Also, I removed the periods at the ends of bullets to be consistent.
2022-03-19 17:44:41 -07:00
Brian Cunnie
134ab1fd3a Bump Go dependencies
```shell
go get -u -t
```
2022-03-10 12:33:43 -08:00
Brian Cunnie
a38be5e771 Accept -port flag to bind to ports other than 53
We want to allow users to bind to ports other than 53. A big reason is
that port 53 is a privileged port, and often requires root privileges.
We don't want to force our users to use root privileges in order to run
the tests.

This isn't a problem on macOS, but is on Linux.
2022-03-10 10:28:42 -08:00
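A `-port` flag like this one can be sketched with the standard `flag` package. This is a hypothetical illustration, not the server's actual code: the flag name comes from the commit title, while the default of 53, the range check, and binding to the wildcard address (rather than to each address individually) are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net"
	"os"
)

// parsePort parses a -port flag; the default of 53 is an assumption.
func parsePort(args []string) (int, error) {
	fs := flag.NewFlagSet("sslip.io-dns-server", flag.ContinueOnError)
	port := fs.Int("port", 53, "UDP port to bind to; ports below 1024 usually need root on Linux")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *port < 1 || *port > 65535 {
		return 0, fmt.Errorf("port %d out of range", *port)
	}
	return *port, nil
}

func main() {
	port, err := parsePort(os.Args[1:])
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: port})
	if err != nil {
		log.Fatalf("can't bind to port %d: %v", port, err)
	}
	defer conn.Close()
	log.Printf("listening on %s", conn.LocalAddr())
}
```

Running it as `go run . -port 3553` binds an unprivileged port, so the tests can run on Linux without `sudo`.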
Brian Cunnie
f46eeeae25 Download blocklist once per hour, not hourly per bound IP
Previously we would download the blocklist every hour for every address
we've bound to, which, on Linux machines, could easily amount to 8
addresses (loopback, IPv4, several IPv6). Linux, if you recall, has a
systemd nameserver bound to 127.0.0.53, forcing us to bind to each
address individually. Downloading multiple identical versions of the
blocklist was inefficient.

With this commit, it downloads the blocklist only once per hour,
regardless of the number of individual IP addresses listened on.

But what really excites me about this commit is that I've moved much of
the initialization of the `xip.Xip` struct out of `main()` and into
`xip.NewXip()`. This makes `main()` lean again, and `xip.Xip` has gotten
complex enough that it warrants its own constructor.
2022-03-03 08:09:15 -08:00
Brian Cunnie
85a476e147 Bump dependencies: go get -u && go mod tidy 2022-03-03 06:29:31 -08:00
Brian Cunnie
26646f59a4 README is more developer-friendly with Quick Start
This repo has been forked 36 times, and yet I've done a great disservice
to my would-be developers by not describing how to run/test my code.

This commit addresses that shortcoming by having a _Quick Start_ section
very near the top.

- includes new Ginkgo v2
- includes required `sudo` for Linux
- removed the now-wrong comment about TXT records (there's now a
  plethora of TXT records such as `ip.sslip.io`)
- minor formatting tweaks
2022-03-03 06:22:20 -08:00
Brian Cunnie
cd2b14b924 BOSH release: 2.5.1: block phishers with CIDRs 2.5.1 2022-02-26 16:41:05 -08:00
Brian Cunnie
4260e752b8 Blocklist also blocks by CIDR
- `metrics.status.sslip.io` now returns information on the blocklist
2022-02-26 16:10:06 -08:00
Brian Cunnie
ae6883dd6c Include IPv6 CIDR in blocklist for testing
- updated comments in `blocklist.txt` to include references to CIDRs &
  how they're handled
- updated webpage to include description of the upcoming metrics for the
  blocklist
2022-02-26 15:57:11 -08:00
Brian Cunnie
4f3dc22e60 Singleton for metrics, etcd client, blacklist
There is now a singleton which contains global state (metrics, etcd
client, blocklist, etc.). Singleton is quite the fancy name for a global
variable, which is global by virtue of being passed around by reference.
2022-02-22 08:44:27 -08:00
Brian Cunnie
57ff7e9cb3 The Xip struct is constant, not volatile
Prior to this commit, the Xip struct served two masters: global state
(e.g. metrics) and volatile state (querier's source IP address).

This was ugly, but workable.

But with the advent of the blocklist, it became untenable. I needed the
Xip struct to be truly global so that it would download only one copy of
the blocklist, not one copy for each of the network interfaces that
sslip.io-dns-server was listening on. Hence this change.

On the downside, I had to plumb the querier's source IP address through
several layers of function calls.
2022-02-16 09:56:49 -08:00
Brian Cunnie
e8458a9dc2 "[Bb]lockList" → "[Bb]locklist"
We conform to the modern usage of "blacklist" as a single word. In Google
search, "blacklist" appears 45 million times; "black list", 7 million.

Yes, I'm aware that we're using "block", not "black", for the variable
name, but keep in mind that we're using "block" as a drop-in replacement
for "black". And the newer "blocklist" has a puny 1 million appearances
to "blacklist"'s 45 million.
2022-02-16 08:36:59 -08:00
Brian Cunnie
33d76eb818 ❤️ Blocked CIDRs are downloaded and parsed...
...but not yet blocked. Stay tuned.

Sadly, this is my Valentine's Day, spent coding. Le sigh.
2022-02-14 19:38:20 -08:00
Brian Cunnie
23eb99ca12 ReadBlocklist() now parses CIDRs as well as strings
My initial implementation of blocking phishers was flawed. I thought I
only needed to block by matching strings in a hostname (e.g.
"raiffeisen"), but I was recently served with a second abuse notice
(<https://nf-43-134-66-67.sslip.io/sg>), one which didn't lend itself to
blocking via a substring match. And at that moment I understood why
Roopinder of nip.io blocked by IP address.

The work is not yet complete, but at least I can parse and create an
array of CIDRs to match against.

Drive-by: I didn't realize Golang had increment ("++") (see [Why are ++
and -- statements and not expressions? And why postfix, not
prefix?](https://go.dev/doc/faq#inc_dec)), so I used the longer "+= 1"
throughout the codebase. Now that I know Golang has them, I use them.
2022-02-12 21:21:30 -08:00
Brian Cunnie
86b02cd59b Updated dependencies: go get -u && go mod tidy 2022-02-08 11:32:51 -08:00
Brian Cunnie
2ddaeeed23 Bump SOA → 2022020800
I love powers-of-two.
2022-02-08 11:27:43 -08:00
Brian Cunnie
01d68dcd8b Use more precise terminology in metrics
"Successful" is a nebulous term. "Answered" is more precise (at least
one record returned in the answer section of the DNS response).
2022-02-08 11:17:28 -08:00
Brian Cunnie
51ed47317e BOSH release: 2.5.0: block phishers 2.5.0 2022-02-06 19:51:06 -08:00
Brian Cunnie
5afb911f50 metrics.status.sslip.io includes blocked queries
I've refactored the metrics: where I previously used the term
"successful", I now use the term "answered". "Answered" means there was
at least one record in the Answer section of the response to the DNS
query. This is a more precise description.

I re-arranged the metrics integration test. Now it's sorted by type of
record queried (A, AAAA, MX, etc.). It's easier for me to follow.
2022-02-06 18:40:31 -08:00
Brian Cunnie
5398c543e7 Bump SOA → 2022020200
Mostly because I love the idea of an SOA that's only twos and zeros.
This makes me happy.
2022-02-02 12:36:14 -08:00
Brian Cunnie
b0a3b17238 Block names used in phishing attempts
When a hostname is queried with a blocked name, we return the address of
one of our servers (currently ns-aws). For example,
`raiffeisen.94.228.116.140.sslip.io` returns the IP address
`52.0.56.137` (`ns-aws.sslip.io`'s IPv4 address).

Currently we only block one name: "raiffeisen",
<https://en.wikipedia.org/wiki/Raiffeisenbank>.

- We enable the integration tests for the blocklist.
- We don't block private IP addresses; they can't be used in phishing
  attacks.
- At the beginning of the integration tests (`ginkgo -r .`), we now
  print the DNS server start-up messages. They help me debug.
- We broke out some of the code into its own methods.
  `processQuestion()` remains too big, but at least it's now smaller.
2022-02-02 12:17:04 -08:00
Brian Cunnie
2a704ba008 Blocklists: read it from the web
- We use `blocklist` rather than `blacklist`. If this modest change
  betters the black experience in America, then it was worth it.

TODO:
- Wire up the blocklist so we block the phisher domains
- Migrate the downloading of the blocklist outside the `main()` method
- Uncomment the integration tests
2022-01-26 19:17:15 -08:00
Brian Cunnie
9f345b1f8e A list of "forbidden" names
This is the first step in our attempts to foil the evil phishers.

[#13]
2022-01-26 09:38:25 -08:00
Brian Cunnie
7b081f1c24 🐞 Ensure rate-throttling channel is empty on CI
We weren't aggressive enough in making sure our rate-limiting channel
was emptied: we added only ten extra reads on the channel, which worked
perfectly on our Xeon workstation but not on our CI.

Our MetricsBufferSize is 100, our delay is 250ms, and each query on CI
took ~25ms to complete, which meant we needed > 110 reads to exhaust the
channel, and we were on the knife's edge. So we doubled the number of
reads to 200 to make sure we had really, truly exhausted the channel's
buffers.
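Here is a rough model of that arithmetic. It assumes that one slot frees up every 250 ms while the reads proceed at about 25 ms each, so the channel holds about $100 + 25n/250$ slots after $n$ reads. It is exhausted once

$$
n > 100 + \frac{25\,n}{250} = 100 + \frac{n}{10}
\quad\Longrightarrow\quad
n > \frac{100}{1 - \tfrac{1}{10}} = \frac{1000}{9} \approx 111.1
$$

So 110 reads (100 plus ten extra) fell just short. With 200 reads, only about $100 + 200/10 = 120$ slots are ever available, which leaves a comfortable margin.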

Fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/50>:
```
sslip.io-dns-server for more complex assertions a TXT record for an "metrics.status.sslip.io" domain is repeatedly queries [It] rate-limits the queries after some amount requests

/tmp/build/b4e0c68a/sslip.io/bosh-release/src/sslip.io-dns-server/integration_test.go:302
```
2022-01-24 07:19:14 -08:00
Brian Cunnie
d42ce54947 Notes to self: how to examine logs 2022-01-23 07:46:07 -08:00
Brian Cunnie
bcceeba858 Mitigate DNS amplification attack surface
`metrics.status.sslip.io` is a vector for a DNS amplification attack; we
mitigate it by latching a 1/4-second throttle on each query after a
certain number of queries.

That endpoint is a 4x amplifier: a 100-byte request elicits a 400-byte
reply.
2022-01-22 15:51:35 -08:00
Brian Cunnie
d35cc1faa6 Release procedure has slightly better instructions 2022-01-22 10:20:12 -08:00
Brian Cunnie
8f2890d90e BOSH release: 2.4.2: fix panic() 2.4.2 2022-01-22 09:41:49 -08:00
Brian Cunnie
3e502731d4 🐞 Fix panic: runtime error: index out of range
Previously I never checked if `net.ParseIP()` returned `nil` for an IPv4
address—I couldn't imagine my IPv4 regex was incomplete. I was wrong.

Moral of the story: always check for errors, always check for nil.

Oddly, I did check for `nil` with IPv6 addresses; I guess I wasn't as
confident about the IPv6 regex.

Drive-bys:
- updated SOA with today's date
- updated dependencies `go get -u`

[fixes #15]
2022-01-22 09:12:13 -08:00
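The lesson can be illustrated with a deliberately sloppy regex. The pattern below is hypothetical and is not sslip.io's actual IPv4 regex. It accepts octets up to 999, so `999.1.1.1` passes the regex while `net.ParseIP` returns `nil`. Indexing into that `nil` slice is exactly the kind of thing that panics with "index out of range"; checking for `nil` turns it into an ordinary error.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// looseIPv4 is a hypothetical, too-permissive regex (octets up to 999).
var looseIPv4 = regexp.MustCompile(`^\d{1,3}(\.\d{1,3}){3}$`)

// parseIPv4 never trusts the regex alone: it also checks net.ParseIP for nil.
func parseIPv4(s string) (net.IP, error) {
	if !looseIPv4.MatchString(s) {
		return nil, fmt.Errorf("%q doesn't look like an IPv4 address", s)
	}
	ip := net.ParseIP(s)
	if ip == nil {
		return nil, fmt.Errorf("%q matched the regex but isn't a valid IPv4 address", s)
	}
	return ip.To4(), nil // 4-byte form; safe to index ip[0]..ip[3]
}

func main() {
	for _, s := range []string{"10.0.0.1", "999.1.1.1"} {
		ip, err := parseIPv4(s)
		fmt.Println(ip, err)
	}
}
```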
Brian Cunnie
ec649870a5 Website: consistently put record type after hostname 2022-01-20 13:32:55 -08:00
Brian Cunnie
6b2d65c778 🐞 Update links to use main branch, not master
...because the website wasn't updating
2022-01-20 12:15:00 -08:00
Brian Cunnie
d2914645e4 Website shows correct endpoint for metrics
...metrics.status.sslip.io not version.status.sslip.io
2022-01-20 12:01:12 -08:00