When I ran the following code on the Linux CI container, I would get at
least one of the above messages:
```bash
for i in $(seq 1 32); do
dig @localhost asdf +short &
done
```
The error hasn't recurred after this change, in spite of running the
above code a dozen times.
`dig` version **9.11.25-RedHat-9.11.25-2.fc32** behaves differently from
macOS's `dig` version **9.10.6**. In other words, this test passed on my
Mac but, until now, failed on the (Linux-based) CI.
I also took the opportunity to refactor our `dig` arguments to conform with
the suggested usage:
> Usage: dig [@global-server] [domain] [q-type] [q-class] {q-opt}
fixes <https://ci.nono.io/teams/main/pipelines/sslip.io/jobs/unit/builds/145>:
```
Expected
<int>: 9
to match exit code:
<int>: 0
```
Note that for the `any` test I had to append an additional `+notcp`
argument to avoid an attempted TCP connection. I suspect a bug in `dig`:
```
dig any sslip.io @localhost
;; Connection to 127.0.0.1#53(127.0.0.1) for sslip.io failed: connection refused.
```
- it appears that Let's Encrypt requires setting at least two TXT
records; before I only allowed one to be set; now you can set as many as
you want.
- our records had a TTL of 0 seconds; I bumped it to 60: long enough to
get a cert, short enough to refresh for a second attempt if the first one
failed.
Previously we weren't returning a response when `acme.sh` updated our
TXT record, but the acme-dns endpoint specifies a
[response](https://github.com/joohoi/acme-dns#response), and acme.sh
expects [a
response](b7a3fe05a4/dnsapi/dns_acmedns.sh (L38)).
fixes:
```
[Mon Jan 18 19:09:26 UTC 2021] invalid response of acme-dns
[Mon Jan 18 19:09:26 UTC 2021] Error add txt for domain:_acme-challenge.34-83-219-164.sslip.io
```
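A minimal sketch of an update handler that answers the way the acme-dns README describes, echoing the TXT value back as `{"txt": "..."}`. The names `txtStore`, `updateRequest`, and `postUpdate` are hypothetical and are not taken from the sslip.io code, and keeping every TXT value in a slice is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// txtStore holds the TXT values set so far (hypothetical type for this sketch).
type txtStore struct {
	mu   sync.Mutex
	txts []string
}

// updateRequest mirrors the JSON body an acme-dns client POSTs to /update.
type updateRequest struct {
	Subdomain string `json:"subdomain"`
	TXT       string `json:"txt"`
}

// update stores the TXT value, logs it, and echoes it back as {"txt": "..."}.
func (s *txtStore) update(w http.ResponseWriter, r *http.Request) {
	var req updateRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.TXT == "" {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	s.mu.Lock()
	s.txts = append(s.txts, req.TXT)
	s.mu.Unlock()
	log.Printf("TXT record updated to %q", req.TXT)
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"txt": req.TXT})
}

// postUpdate runs one request through the handler without opening a socket.
func postUpdate(s *txtStore, body string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/update", strings.NewReader(body))
	s.update(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	s := &txtStore{}
	code, body := postUpdate(s, `{"subdomain": "ignored", "txt": "token-one"}`)
	fmt.Println(code, body)
}
```

The point of the sketch is only that a client such as acme.sh gets a JSON body to inspect rather than an empty reply.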
We had moved the DNS server to a sub-directory to make room for a
sibling application, a small DNS server + small HTTP server.
fixes:
```
cannot find package "main.go" in any of:
/usr/local/Cellar/go/1.15.6/libexec/src/main.go (from $GOROOT)
/Users/cunnie/go/src/main.go (from $GOPATH)
```
When querying for a record with `_acme-challenge.` and an embedded IP
address, we mistakenly responded with an answer with the
**authoritative** flag set and the **SOA** record in the **Authorities**
section. But that was wrong: we should NOT have set the
**authoritative** flag, and we should have included the **NS** record,
not the **SOA** record, in the **Authorities** section.
Created much-needed integration-level tests. The existing unit tests
were difficult to set up, were too constricting when refactoring, and
were less meaningful than the integration tests.
Hygiene:
- Eliminated the `ErrNotFound` error; it was a hack: it was a substitute
for returning an empty set, which I wasn't handling correctly. Now, when
a record isn't found (usually because it's not a customized domain), it
returns an empty set, not an error.
- Used the `dnsmessage` capitalization convention where applicable, e.g.
`SOAResource()` not `SoaResource()`
- Replaced painful low-level copying with the `dnsmessage`'s utility
functions, `NewName()` and `MustNewName()`
This change triggered a huge rewrite, for we had hard-coded the
**authoritative** flag. The newer code is more flexible, and lays the
groundwork for future changes such as including IP addresses in the
**Additionals** section.
This change is to enable wildcard certificates via DNS challenge.
My earlier attempt failed: when queried for TXT for
`_acme-challenge.127-0-0-1.sslip.io`, I returned an authoritative
response with the Authorities section containing the SOA record, which
signaled, "There is no TXT record, of that I am sure," even though I had
configured an NS record for `_acme-challenge.127-0-0-1.sslip.io` to
return `127-0-0-1.sslip.io`.
Now I return an authoritative response with an NS record, not an SOA
record, in the response.
But that's still not enough, and I'd like to make the following changes:
- When queried for a DNS-01 challenge, e.g.
`_acme-challenge.127-0-0-1.sslip.io`, I return a _non-authoritative_
response with the NS record. This is the behavior of, say, `dig ns
sslip.io @a0.nic.io.`
- When queried for any NS records, I return an Additionals section
with the IP addresses.
**This process still does not work**. We need to fix our sslip.io DNS
server code. That being said, once our DNS server code is fixed, this
process _should_ work.
As much as we'd have liked to use `joohoi/acme-dns`, it didn't work with
our setup, possibly due to our DNS server code brokenness, mentioned
above. At any rate, we have our own `acme-dns` replacement, which we
intend to use going forward.
Previously I handcoded the `dnsmessage.Name{}` structs, but this
function makes it much easier.
The only downside is that I consistently ignore any errors returned.
I scoped the change to the code, but not the tests.
This DNS/HTTP server enables the procurement of wildcard certs for
sslip.io subdomains.
Drive-by:
- Removed the apostrophe from the initialized TXT string so that
cutting-and-pasting the string is less difficult (but the backslashes
and double quotes are still a pain).
- The DNS/HTTP server logs output when the TXT record is updated. We log
most actions, and this is perhaps the most important one, so it was an
oversight that we didn't log it.
This is an [acme-dns](https://github.com/joohoi/acme-dns)-compatible
webserver that allows you to update the TXT record to verify domain
ownership to the certificate authority in order to procure a wildcard
certificate.
This small DNS server only returns one type of record, a TXT record,
meant to be a token assigned by a certificate authority (e.g. Let's
Encrypt) to verify domain ownership.
The TXT record will be updateable by an API endpoint on the webserver
(same executable as the DNS server), but I haven't yet written that
portion.
Drive-by: in our _other_ (main) sslip.io DNS server, I changed `break` →
`continue` in the main loop. Had we gotten a malformed UDP packet, we
would have exited, but now we continue to the next packet. Exiting is
not that big a deal—`monit` would have restarted the server—but moving
on to the next packet is a more robust approach.
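The difference between `break` and `continue` can be sketched as follows. This is a simplified stand-in, not the server's loop: `parseID` and `handleAll` are hypothetical names, and a slice of byte slices stands in for packets read from the UDP socket.

```go
package main

import (
	"errors"
	"fmt"
)

// dnsHeaderLen is the fixed size of a DNS message header in bytes.
const dnsHeaderLen = 12

// parseID is a stand-in for real message parsing: it only checks that the
// packet is long enough to hold a header and returns the 16-bit message ID.
func parseID(packet []byte) (uint16, error) {
	if len(packet) < dnsHeaderLen {
		return 0, errors.New("packet shorter than a DNS header")
	}
	return uint16(packet[0])<<8 | uint16(packet[1]), nil
}

// handleAll walks the packets and returns the IDs of those it could parse.
func handleAll(packets [][]byte) []uint16 {
	var ids []uint16
	for _, p := range packets {
		id, err := parseID(p)
		if err != nil {
			fmt.Println("skipping malformed packet:", err)
			continue // a `break` here would stop handling the remaining packets
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	first := make([]byte, dnsHeaderLen)
	first[1] = 1
	second := make([]byte, dnsHeaderLen)
	second[1] = 2
	malformed := []byte{0xde, 0xad}
	fmt.Println(handleAll([][]byte{first, malformed, second}))
}
```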
[#6]
Warning: these instructions do not work & are incomplete.
I had high hopes for [acme-dns](https://github.com/joohoi/acme-dns), but
it seems much too baroque for my purposes—authentication, subdomains,
CNAMEs. It seems quite clever for a use case that is much more
complicated than mine.
I've resolved to write an _acme-dns_-compatible HTTP server & DNS server
to meet my much simpler needs.
I'm going to create a simple HTTP/DNS server that has the same API as
[acme-dns](https://github.com/joohoi/acme-dns) but isn't so complicated,
and I want that code to be next to the regular DNS server code.
caveat: Although I've made changes to the packaging script, I have not
tested them, and I don't intend to, for I don't plan to ever create
another BOSH release, and that breaks my 💔.
`DEVELOPER.md` had the wrong tests (mostly missing newlines); that's
been fixed. Also, I added a new test for DNS records which contain
`_acme-challenge.`, which may enable users to generate wildcard certs
for their sslip.io domains.
Prior behavior was that the same trinity of NS records was returned for
every NS query:
- ns-aws.nono.io.
- ns-azure.nono.io.
- ns-gce.nono.io.
This commit introduces a change in that behavior: IF the NS query includes
the string `_acme-challenge.` AND the query has an embedded IP address
THEN the NS record returned is the query with the `_acme-challenge.`
stripped.
For example:
```
dig +short ns _acme-challenge.104.155.144.4.sslip.io
```
Would return:
```
104.155.144.4.sslip.io.
```
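The rule above can be sketched roughly as below. This is an illustration under assumptions, not the server's code: `delegatedNS` and `ipv4RE` are hypothetical names, only IPv4 is handled, and the match on `_acme-challenge.` is case-sensitive.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
	"strings"
)

const acmePrefix = "_acme-challenge."

// ipv4RE matches a dotted or a dashed IPv4 address delimited by dots,
// dashes, or the ends of the name (an assumed pattern for this sketch).
var ipv4RE = regexp.MustCompile(`(^|[.-])(\d{1,3}(\.\d{1,3}){3}|\d{1,3}(-\d{1,3}){3})($|[.-])`)

// delegatedNS returns the name server for an `_acme-challenge.` query that
// embeds an IP address: the queried name with the prefix stripped. The
// boolean is false when the rule does not apply.
func delegatedNS(fqdn string) (string, bool) {
	if !strings.Contains(fqdn, acmePrefix) {
		return "", false
	}
	rest := strings.Replace(fqdn, acmePrefix, "", 1)
	m := ipv4RE.FindStringSubmatch(rest)
	if m == nil {
		return "", false
	}
	// Reject look-alikes such as 999.1.1.1 that are not valid addresses.
	if net.ParseIP(strings.ReplaceAll(m[2], "-", ".")) == nil {
		return "", false
	}
	return rest, true
}

func main() {
	ns, ok := delegatedNS("_acme-challenge.104.155.144.4.sslip.io.")
	fmt.Println(ns, ok)
}
```

When the boolean is false, the caller would fall back to the usual three name servers.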
This is an attempt to enable
[DNS-01](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge)
challenge for wildcard certs from Let's Encrypt or other CAs
(Certificate Authorities).
Note that the embedded IP address would need to be routable (NOT 10.x,
172.16-31.x, or 192.168.x).
Note that you would also need to run a DNS server such as
[acme-dns](https://github.com/joohoi/acme-dns) at that address.
Thanks @normanr !
[#6]
I try to use random domain names (`randomDomain`) in my tests wherever
possible, rather than `"example.com."` or such.
When domains have custom records, I use the variable name
`customizedDomain`.
I've bumped up the number of IPv6 fuzz tests 1,000 → 10,000 because the
tests are so cheap (quick).
`NXResources()` now takes an argument. I don't use it yet, but I plan
to.
This underscores a flaw in my code; log messages are difficult to
test, so I test them minimally.
I inlined the constant `hostmaster` ("brian.cunnie@gmail.com"). It was
only used in one place, and we shave nanoseconds by not converting
it to a `dnsmessage.Name` every time we return an SOA record.
fixes
</var/vcap/sys/log/sslip.io-dns-server/sslip.io-dns-server.stderr.log>
(IP changed for anonymity):
```
2020/12/20 03:15:09 54.186.222.15.60886 TypeMX A8aB69E3e4D8.55.74.79.60.sslip.io. ? 0 A8aB69E3e4D8.55.74.79.60.sslip.io.^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
```
The DNS server now returns CNAME records. We have updated
`Customizations` to include `sslip.io`'s DKIM signing records.
We have renamed `MXResource()` → `MXResources()` (plural). Although
there can only be one CNAME, there can be multiple MX records. We have
also moved `MXResources` upwards to maintain a semblance of alphabetical
ordering.
We no longer have a trail of zeros at the end of our `[255]byte` arrays.
We let Golang populate the zeros for us. We also updated our Golang
Playground link to new code that doesn't produce the unnecessary zeros.
Rather than hard-coding domain names in our test, we try to use
`random8ByteString()` wherever possible.
Previously the DNS server only returned the first A record of a
customized domain; now it will return all the A records.
Documented in unit tests what happens when there are multiple matches
(spoiler: it only returns one IP address, not several):
- If there's a mix of dashed and dotted IP addresses, it returns the
dashed. (`127.0.0.1.192-168-0-1.sslip.io` → `192.168.0.1`)
- If there is more than one matched IP address, it returns the leftmost
match. (`127-0-0-1.192-168-0-1.sslip.io` → `127.0.0.1`)
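The two rules above can be reproduced with a short sketch that tries a dashed pattern before a dotted one and takes the leftmost match of whichever pattern hits. The regular expressions and the `embeddedIPv4` helper are assumptions made for illustration and are not the server's actual patterns.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
	"strings"
)

var (
	// Both patterns require the address to be delimited by a dot, a dash,
	// or the start or end of the name, and neither mixes dots with dashes.
	dashedRE = regexp.MustCompile(`(^|[.-])(\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3})($|[.-])`)
	dottedRE = regexp.MustCompile(`(^|[.-])(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})($|[.-])`)
)

// embeddedIPv4 returns the IPv4 address embedded in fqdn, or nil if none.
// Dashed addresses win over dotted ones; within a pattern, leftmost wins.
func embeddedIPv4(fqdn string) net.IP {
	if m := dashedRE.FindStringSubmatch(fqdn); m != nil {
		if ip := net.ParseIP(strings.ReplaceAll(m[2], "-", ".")); ip != nil {
			return ip.To4()
		}
	}
	if m := dottedRE.FindStringSubmatch(fqdn); m != nil {
		if ip := net.ParseIP(m[2]); ip != nil {
			return ip.To4()
		}
	}
	return nil
}

func main() {
	for _, q := range []string{
		"127.0.0.1.192-168-0-1.sslip.io",
		"127-0-0-1.192-168-0-1.sslip.io",
		"minio-01.192-168-1-100.sslip.io",
	} {
		fmt.Println(q, "→", embeddedIPv4(q))
	}
}
```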
Rather than using Docker Hub's automated build feature (which doesn't
seem to work when setting up new repositories), I've opted to manually
build & push the images.
There are workarounds which might allow me to use Docker Hub's automated
build feature, like creating an organization, moving the repos to the
new organization, and creating a 'bot' user to publish the images, but
that seems like a lot of work for little gain.
fixes:
> Fetch source repositories failed.
> Connect a GitHub account to cunnie to enable automated builds. If it is already connected, please re-link the source provider.
We use the Alpine image; it's a lean 5.6 MB, and with our 3 MB server
the total stays below 9 MB.
Though we include instructions to build the Dockerfile, we plan to use
Docker Hub's automated builds feature.
When we released our new Golang-based DNS server, we had a banner that
asked users to let us know if anything breaks, but we neglected to tell
them _how_ to let us know. Now we include a link that opens a GitHub issue.
Previously we were returning one TXT record with multiple strings for
_sslip.io_. That did not work for ProtonMail's domain verification.
It seems a convention that each TXT record has one string. _google.com_,
for example, has a separate TXT record for each string.
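The difference between the two layouts can be sketched with a stand-in type. `txtRecord` is hypothetical; it only mirrors the idea that a single TXT record may carry several strings, and `onePerString` is an illustrative helper rather than the server's code.

```go
package main

import "fmt"

// txtRecord is a stand-in for a TXT resource: one record, one or more strings.
type txtRecord struct {
	Strings []string
}

// onePerString returns a separate TXT record for each value, which is the
// layout that worked, instead of one record holding all the values.
func onePerString(values []string) []txtRecord {
	records := make([]txtRecord, 0, len(values))
	for _, v := range values {
		records = append(records, txtRecord{Strings: []string{v}})
	}
	return records
}

func main() {
	values := []string{"first-verification-string", "second-verification-string"}
	before := []txtRecord{{Strings: values}} // one record, two strings
	after := onePerString(values)            // two records, one string each
	fmt.Println(len(before), len(after))
}
```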
It turns out I had misunderstood the
[StackExchange](https://serverfault.com/questions/815841/multiple-txt-fields-for-same-subdomain)
thread.
fixes (from ProtonMail domain verification):
> Verification did not succeed, please try again in an hour.
In order to restore email service for the sslip.io domain, we need to
return custom TXT records.
The custom records are in the `xip.Customizations` variable. This lays
the groundwork for ACMEv2 wildcard DNS, which, IIRC, works via TXT
records.
Drive-by: removed an unused constant, `MxHost`. That information is
either in the `Customization` struct or generated on-the-fly.
fixes:
> Dear valued customer, We have disabled your domain sslip.io and all of its addresses. No emails will be received or sent for it.
[#6]
- 🐞 fix IPv6 resolution:
  2601-41d0-2-e01e--56dB-3598.sSLIP.io. → 2601:41d0:2:e01e::56db (wrong)
                                        → 2601:41d0:2:e01e::56db:3598 (right)
- 🐞 fix IPv4 resolution:
  minio-01.192-168-1-100.sslip.io → 1.192.168.1 (wrong)
                                  → 192.168.1.100 (right)
- MX records are customized
- sslip.io's records point to protonmail
- everyone else's point to themselves (whatever FQDN they queried)
- License switched to Apache because GNU is too burdensome
(trust me, I've been on the receiving end)
- include notes for myself to create BOSH releases
(DEVELOPER.md)
To avoid being caught with our pants down & having certain IPv6
addresses not resolve correctly, we introduce fuzz testing to catch any
errors. Each run tests 1k IPv6 addresses.
We haven't found any errors yet.
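A fuzz run of this kind can be sketched as a round trip: generate a random IPv6 address, turn it into a dashed hostname, resolve it back, and compare. `toHostname`, `fromHostname`, and `fuzz` are hypothetical names, and `fromHostname` is a simple stand-in for the server's lookup rather than its real implementation.

```go
package main

import (
	"fmt"
	"math/rand"
	"net"
	"strings"
)

// toHostname renders an IPv6 address as a dashed sslip.io hostname.
func toHostname(ip net.IP) string {
	return strings.ReplaceAll(ip.String(), ":", "-") + ".sslip.io."
}

// fromHostname is a stand-in for the server's lookup: it converts the first
// label's dashes back into colons and parses the result.
func fromHostname(fqdn string) net.IP {
	label := strings.SplitN(fqdn, ".", 2)[0]
	return net.ParseIP(strings.ReplaceAll(label, "-", ":"))
}

// fuzz round-trips n random IPv6 addresses and counts the mismatches.
func fuzz(n int, seed int64) (failures int) {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < n; i++ {
		ip := make(net.IP, net.IPv6len)
		for j := range ip {
			ip[j] = byte(rng.Intn(256))
		}
		if ip.To4() != nil {
			continue // IPv4-mapped addresses print in dotted form; skip them
		}
		if got := fromHostname(toHostname(ip)); !ip.Equal(got) {
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", fuzz(1000, 1))
}
```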
IPv6 resolution was truncated if there was more than one section after
the double-dash (`--`):
    2601-41d0-2-e01e--56dB-3598.sSLIP.io. → 2601:41d0:2:e01e::56db (wrong)
                                          → 2601:41d0:2:e01e::56db:3598 (right)

The fix was to use `regexp.Longest()`.
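To illustrate what `Longest()` changes, here is a toy pattern that truncates in the same way. It is NOT the server's IPv6 regular expression; `tailPattern` and `firstMatch` are made up for this sketch. By default Go's `regexp` prefers the first alternative that matches (leftmost-first), and after `Longest()` it prefers the longest match (leftmost-longest).

```go
package main

import (
	"fmt"
	"regexp"
)

// tailPattern is a toy pattern: a hex group optionally followed by a dash
// and a second hex group. The empty alternative is listed first on purpose.
const tailPattern = `[0-9a-f]+(|-[0-9a-f]+)`

// firstMatch returns the first match of tailPattern in s, optionally after
// switching the compiled expression to leftmost-longest matching.
func firstMatch(s string, longest bool) string {
	re := regexp.MustCompile(tailPattern)
	if longest {
		re.Longest()
	}
	return re.FindString(s)
}

func main() {
	fmt.Println(firstMatch("56db-3598", false)) // leftmost-first: 56db
	fmt.Println(firstMatch("56db-3598", true))  // leftmost-longest: 56db-3598
}
```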
`git diff` makes it appear that I modified the IPv6 RE. I didn't. This
is merely a whitespace change caused by having forgotten to run `gofmt`
before committing the previous commit.
fixes (from the logs):
```
TypeAAAA 2601-41d0-2-e01e--56dB-3598.sSLIP.io. ? 2601:41d0:2:e01e::56db
```
Long-ago behavior (PowerDNS):
minio-01.192-168-1-100.sslip.io → 192.168.1.100
More-recent behavior (Golang):
minio-01.192-168-1-100.sslip.io → 1.192.168.1
This behavior is counter-intuitive & wrong. We now restore the long-ago
behavior by being much more strict--no more mixing of dots and dashes!
Thanks @pandaxin!
[fixes #9]
...and not the deprecated PowerDNS pipe backend shell script, which we
no longer use.
README now has the badge for the unit tests, and the placeholder is
gone.
fixes:
```
resources.6h: '6h' is not a valid identifier: must start with a lowercase letter^
```