Metrics are served via metrics.status.sslip.io

- The metrics aren't fleshed out. In fact, there are only two so far:
  1. uptime
  2. number of queries
- Even though the metrics aren't complete, I'm checking it in because
  this commit is already much too big.
- I moved the version information to `version.status.sslip.io`;
  previously it was at `version.sslip.io`. I didn't want one endpoint
  for both metrics & version (worry: DNS amplification), and I wanted a
  consistent subdomain to find that information (i.e.
  `status.sslip.io`).
- I'm not worried about atomic updates to the metrics; if a metric is
  off by one because two lookups happened at the exact same time and I
  skipped a count, I don't care.
- The `Metrics` struct is a pointer within `Xip` because I might have
  several copies of `Xip` (if I'm binding to several interfaces
  individually), but I must have only one copy of `Metrics`.
- I only include the metrics I'm interested in, usually because it took
  some work to implement that feature. I don't care about MX records,
  but I do care about IPv6 lookups, DNS-01 challenges, and public IP lookups.
- Got rid of a section of unreachable code at the end of
  `processQuestion()`; I was tired of Goland flagging it. I had it there
  mostly because I was paranoid about falling through a `switch` statement.
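The commit deliberately skips atomic updates and accepts the occasional lost count. For comparison only, here is a minimal sketch of what an atomic query counter could look like with Go's standard `sync/atomic` package. The type `atomicMetrics` and the helper `simulateLookups` are hypothetical and do not exist in the sslip.io codebase.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomicMetrics is a hypothetical, pared-down variant of xip.Metrics whose
// query counter can be bumped safely by many goroutines at once.
type atomicMetrics struct {
	queries int64
}

// incrementQueries would replace the non-atomic `x.Metrics.Queries += 1`.
func (m *atomicMetrics) incrementQueries() {
	atomic.AddInt64(&m.queries, 1)
}

// Queries reads the counter without racing against concurrent writers.
func (m *atomicMetrics) Queries() int64 {
	return atomic.LoadInt64(&m.queries)
}

// simulateLookups fires n concurrent "lookups" and waits for all of them.
func simulateLookups(m *atomicMetrics, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.incrementQueries()
		}()
	}
	wg.Wait()
}

func main() {
	m := &atomicMetrics{}
	simulateLookups(m, 1000)
	fmt.Println(m.Queries()) // 1000 on every run
}
```

The cost is a slightly more expensive increment per query. Note that even when the plain `+=` produces plausible numbers, Go's race detector (`go test -race`) can report it as a data race, which is the main practical argument for the atomic version.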
Brian Cunnie
2022-01-19 06:47:21 -08:00
parent 20e4238037
commit bf4f039001
8 changed files with 156 additions and 27 deletions

@@ -25,7 +25,7 @@ require (
golang.org/x/text v0.3.7 // indirect
golang.org/x/tools v0.1.8 // indirect
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
google.golang.org/genproto v0.0.0-20220114231437-d2e6a121cae0 // indirect
google.golang.org/genproto v0.0.0-20220118154757-00ab72f36ad5 // indirect
google.golang.org/grpc v1.43.0 // indirect
google.golang.org/protobuf v1.27.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect

@@ -301,8 +301,8 @@ google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98
google.golang.org/genproto v0.0.0-20200513103714-09dca8ec2884/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c=
google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo=
google.golang.org/genproto v0.0.0-20210602131652-f16073e35f0c/go.mod h1:UODoCrxHCcBojKKwX1terBiRUaqAsFqJiF615XL43r0=
google.golang.org/genproto v0.0.0-20220114231437-d2e6a121cae0 h1:aCsSLXylHWFno0r4S3joLpiaWayvqd2Mn4iSvx4WZZc=
google.golang.org/genproto v0.0.0-20220114231437-d2e6a121cae0/go.mod h1:5CzLGKJ67TSI2B9POpiiyGha0AjJvZIUgRMt1dSmuhc=
google.golang.org/genproto v0.0.0-20220118154757-00ab72f36ad5 h1:zzNejm+EgrbLfDZ6lu9Uud2IVvHySPl8vQzf04laR5Q=
google.golang.org/genproto v0.0.0-20220118154757-00ab72f36ad5/go.mod h1:5CzLGKJ67TSI2B9POpiiyGha0AjJvZIUgRMt1dSmuhc=
google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg=
google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY=

@@ -0,0 +1,58 @@
package main_test
import (
"fmt"
"os/exec"
"strings"
"time"
"xip/xip"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
. "github.com/onsi/gomega/gexec"
)
var _ = Describe("IntegrationMetrics", func() {
var digCmd *exec.Cmd
var digSession *Session
var digArgs string
When("the server is queried", func() {
It("should update metrics", func() {
startMetrics := getMetrics()
digArgs = "@localhost non-existent.sslip.io +short"
digCmd = exec.Command("dig", strings.Split(digArgs, " ")...)
digSession, err = Start(digCmd, GinkgoWriter, GinkgoWriter)
Expect(err).ToNot(HaveOccurred())
// we want to make sure digSession has exited because we
// want to parse the _full_ contents of stdout
Eventually(digSession, 1).Should(Exit(0))
expectedMetrics := startMetrics
expectedMetrics.Queries += 2 // two queries: non-existent.sslip.io, metrics.status.sslip.io
actualMetrics := getMetrics()
Expect(expectedMetrics.MostlyEquals(actualMetrics)).To(BeTrue())
})
})
})
func getMetrics() (m xip.Metrics) {
digArgs := "@localhost metrics.status.sslip.io txt +short"
digCmd := exec.Command("dig", strings.Split(digArgs, " ")...)
stdout, err := digCmd.Output()
Expect(err).ToNot(HaveOccurred())
var uptime int
var junk string
_, err = fmt.Sscanf(string(stdout),
"\"uptime (seconds): %d\"\n"+
"\"key-value store: %s\n"+ // %s "swallows" the double-quote at the end
"\"queries: %d\"\n",
&uptime,
&junk,
&m.Queries,
)
Expect(err).ToNot(HaveOccurred())
m.Start = time.Now().Add(-time.Duration(uptime) * time.Second)
//_, err = fmt.Fscanf(digSession.Out, "queries: %d", &m.Queries)
return m
}
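`getMetrics()` relies on a `fmt.Sscanf` format string that must match the TXT records line for line, so adding or reordering a metric breaks it. A minimal sketch of a more forgiving parser follows, assuming the `dig +short` output shape used in the test (one quoted `key: value` string per line). `parseMetrics` is a hypothetical helper, not part of sslip.io, and the sample input is copied from the Server Metrics documentation in this commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMetrics is a hypothetical helper: it turns `dig +short` output for
// metrics.status.sslip.io (one quoted "key: value" TXT string per line)
// into a map keyed by metric name.
func parseMetrics(digOutput string) map[string]string {
	metrics := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(digOutput))
	for scanner.Scan() {
		// strip surrounding whitespace, then the double quotes dig adds
		line := strings.Trim(strings.TrimSpace(scanner.Text()), `"`)
		parts := strings.SplitN(line, ": ", 2)
		if len(parts) != 2 {
			continue // e.g. a bare heading such as "successful:"
		}
		metrics[parts[0]] = parts[1]
	}
	return metrics
}

func main() {
	// sample output copied from the Server Metrics documentation
	out := "\"uptime (seconds): 1200\"\n" +
		"\"key-value store: builtin\"\n" +
		"\"queries: 46202\"\n" +
		"\"queries/second: 38.5\"\n"
	m := parseMetrics(out)
	fmt.Println(m["queries"], m["key-value store"]) // 46202 builtin
}
```

Values stay strings here; a caller would convert the numeric ones (for example with `strconv.Atoi`) only for the metrics it actually asserts on.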

@@ -109,10 +109,10 @@ var _ = Describe("sslip.io-dns-server", func() {
"@localhost example.com srv +short",
`\A\z`,
`TypeSRV example.com. \? nil, SOA example.com. briancunnie.gmail.com. 2021123100 900 900 1800 300\n$`),
Entry(`TXT for version.sslip.io is the version number of the xip software (which gets overwritten during linking)`,
"@127.0.0.1 version.sslip.io txt +short",
Entry(`TXT for version.status.sslip.io is the version number of the xip software (which gets overwritten during linking)`,
"@127.0.0.1 version.status.sslip.io txt +short",
`\A"dev"\n"today"\n"xxx"\n\z`,
`TypeTXT version.sslip.io. \? \["dev"\], \["today"\], \["xxx"\]`),
`TypeTXT version.status.sslip.io. \? \["dev"\], \["today"\], \["xxx"\]`),
Entry(`TXT is the querier's IPv4 address and the domain "ip.sslip.io"`,
"@127.0.0.1 ip.sslip.io txt +short",
`127.0.0.1`,

@@ -33,12 +33,14 @@ func main() {
}
// I don't need to `defer etcdCli.Close()`; it's redundant in the main routine: when main() exits, everything is closed.
conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 53})
// set up our global metrics struct, setting our start time
xipMetrics := xip.Metrics{Start: time.Now()}
// common err hierarchy: net.OpError → os.SyscallError → syscall.Errno
switch {
case err == nil:
log.Println(`Successfully bound to all interfaces, port 53.`)
wg.Add(1)
readFrom(conn, etcdCli, &wg)
readFrom(conn, &wg, etcdCli, &xipMetrics)
case isErrorPermissionsError(err):
log.Println("Try invoking me with `sudo` because I don't have permission to bind to port 53.")
log.Fatal(err.Error())
@@ -62,7 +64,7 @@ func main() {
} else {
wg.Add(1)
boundIPsPorts = append(boundIPsPorts, conn.LocalAddr().String())
go readFrom(conn, etcdCli, &wg)
go readFrom(conn, &wg, etcdCli, &xipMetrics)
}
}
if len(boundIPsPorts) > 0 {
@@ -77,7 +79,7 @@ func main() {
wg.Wait()
}
func readFrom(conn *net.UDPConn, etcdCli *clientv3.Client, wg *sync.WaitGroup) {
func readFrom(conn *net.UDPConn, wg *sync.WaitGroup, etcdCli xip.V3client, xipMetrics *xip.Metrics) {
defer wg.Done()
for {
query := make([]byte, 512)
@@ -87,7 +89,8 @@ func readFrom(conn *net.UDPConn, etcdCli *clientv3.Client, wg *sync.WaitGroup) {
continue
}
go func() {
response, logMessage, err := xip.Xip{SrcAddr: addr.IP, Etcd: etcdCli}.QueryResponse(query)
xipServer := xip.Xip{SrcAddr: addr.IP, Etcd: etcdCli, Metrics: xipMetrics}
response, logMessage, err := xipServer.QueryResponse(query)
if err != nil {
log.Println(err.Error())
return

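In `readFrom()` a fresh `xip.Xip` value is built for every query, yet all of them must update the same counters; that is why the commit message says `Metrics` is a pointer within `Xip`. The sketch below illustrates the idea with simplified, hypothetical stand-in types (`server` for `xip.Xip`, `metrics` for `xip.Metrics`); it is not the sslip.io code.

```go
package main

import "fmt"

// metrics is a simplified, hypothetical stand-in for xip.Metrics.
type metrics struct {
	Queries int
}

// server is a hypothetical stand-in for xip.Xip: a small value type that is
// copied freely, but which carries a pointer to the one shared metrics struct.
type server struct {
	Name    string
	Metrics *metrics
}

// handle has a value receiver, like xip.Xip.QueryResponse: s is a copy,
// yet s.Metrics still points at the shared struct.
func (s server) handle() {
	s.Metrics.Queries += 1
}

func main() {
	shared := &metrics{}
	// one server value per bound interface, all sharing one metrics struct
	servers := []server{
		{Name: "127.0.0.1:53", Metrics: shared},
		{Name: "[::1]:53", Metrics: shared},
	}
	for _, s := range servers {
		s.handle()
		s.handle()
	}
	fmt.Println(shared.Queries) // 4
}
```

Had `Metrics` been an embedded value rather than a pointer, each copy would have incremented its own private counter and every increment would have been discarded when the copy went out of scope.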
@@ -8,6 +8,7 @@ import (
"errors"
"fmt"
"net"
"reflect"
"regexp"
"strconv"
"strings"
@@ -33,6 +34,18 @@ type V3client interface {
type Xip struct {
SrcAddr net.IP
Etcd V3client
Metrics *Metrics
}
type Metrics struct {
Start time.Time
Queries int
SuccessfulQueries int
SuccessfulAQueries int
SuccessfulAAAAQueries int
SuccessfulTXTSrcIPQueries int
SuccessfulTXTVersionQueries int
SuccessfulTXTDNS01ChallengeQUeries int
}
// DomainCustomization is a value that is returned for a specific query.
@@ -48,7 +61,7 @@ type DomainCustomization struct {
AAAA []dnsmessage.AAAAResource
CNAME dnsmessage.CNAMEResource
MX []dnsmessage.MXResource
TXT func(string) ([]dnsmessage.TXTResource, error)
TXT func(Xip) ([]dnsmessage.TXTResource, error)
// Unlike the other record types, TXT is a function in order to enable more complex behavior
// e.g. IP address of the query's source
}
@@ -115,7 +128,7 @@ var (
MX: mx2,
},
},
TXT: func(_ string) ([]dnsmessage.TXTResource, error) {
TXT: func(_ Xip) ([]dnsmessage.TXTResource, error) {
// Although multiple TXT records with multiple strings are allowed, we're sticking
// with multiple TXT records with a single string apiece because that's what ProtonMail requires
// and that's what google.com does.
@@ -161,8 +174,8 @@ var (
"ip.sslip.io.": {
TXT: ipSslipIo,
},
"version.sslip.io.": {
TXT: func(_ string) ([]dnsmessage.TXTResource, error) {
"version.status.sslip.io.": {
TXT: func(_ Xip) ([]dnsmessage.TXTResource, error) {
return []dnsmessage.TXTResource{
{TXT: []string{VersionSemantic}}, // e.g. "2.2.1"
{TXT: []string{VersionDate}}, // e.g. "2021/10/03-15:08:54+0100"
@@ -170,6 +183,9 @@ var (
}, nil
},
},
"metrics.status.sslip.io.": {
TXT: metricsSslipIo,
},
}
)
@@ -220,6 +236,7 @@ func (x Xip) QueryResponse(queryBytes []byte) (responseBytes []byte, logMessage
response.Header.ID = queryHeader.ID
response.Header.RecursionDesired = queryHeader.RecursionDesired
x.Metrics.Queries += 1
b := dnsmessage.NewBuilder(nil, response.Header)
b.EnableCompression()
if err = b.StartQuestions(); err != nil {
@@ -538,8 +555,6 @@ func (x Xip) processQuestion(q dnsmessage.Question) (response Response, logMessa
return response, logMessage + "nil, SOA " + soaLogMessage(soaResource), nil
}
}
// The following is flagged as "Unreachable code" in Goland, and that's expected
return response, "", errors.New("unexpectedly fell through x.processQuestion()")
}
// NSResponse sets the Answers/Authorities depending whether we're delegating or authoritative
@@ -720,7 +735,7 @@ func (x Xip) TXTResources(fqdn string) ([]dnsmessage.TXTResource, error) {
if domain, ok := Customizations[strings.ToLower(fqdn)]; ok {
// Customizations[strings.ToLower(fqdn)] returns a _function_,
// we call that function, which has the same return signature as this method
return domain.TXT(x.SrcAddr.String())
return domain.TXT(x)
}
return nil, nil
}
@@ -750,8 +765,28 @@ func SOAResource(name dnsmessage.Name) dnsmessage.SOAResource {
}
// when TXT for "ip.sslip.io" is queried, return the IP address of the querier
func ipSslipIo(sourceIP string) ([]dnsmessage.TXTResource, error) {
return []dnsmessage.TXTResource{{TXT: []string{sourceIP}}}, nil
func ipSslipIo(x Xip) ([]dnsmessage.TXTResource, error) {
return []dnsmessage.TXTResource{{TXT: []string{x.SrcAddr.String()}}}, nil
}
// when TXT for "metrics.status.sslip.io" is queried, return the cumulative metrics
func metricsSslipIo(x Xip) (txtResources []dnsmessage.TXTResource, err error) {
var metrics []string
uptime := time.Since(x.Metrics.Start)
metrics = append(metrics, fmt.Sprintf("uptime (seconds): %.0f", uptime.Seconds()))
keyValueStore := "etcd"
// comparing interfaces to nil is tricky: an interface contains both a type
// and a value, and although the value is nil, the type isn't, so we need the following
if x.Etcd == nil || reflect.ValueOf(x.Etcd).IsNil() {
keyValueStore = "builtin"
}
metrics = append(metrics, "key-value store: "+keyValueStore)
metrics = append(metrics, fmt.Sprintf("queries: %d", x.Metrics.Queries))
metrics = append(metrics, fmt.Sprintf("queries/second: %.1f", float64(x.Metrics.Queries)/uptime.Seconds()))
for _, metric := range metrics {
txtResources = append(txtResources, dnsmessage.TXTResource{TXT: []string{metric}})
}
return txtResources, nil
}
// when TXT for "k-v.io" is queried, return the key-value pair
@@ -868,3 +903,17 @@ func soaLogMessage(soaResource dnsmessage.SOAResource) string {
strconv.Itoa(int(soaResource.Expire)) + " " +
strconv.Itoa(int(soaResource.MinTTL))
}
// MostlyEquals compares all fields except `Start` (timestamp)
func (a Metrics) MostlyEquals(b Metrics) bool {
if a.Queries == b.Queries &&
a.SuccessfulQueries == b.SuccessfulQueries &&
a.SuccessfulAQueries == b.SuccessfulAQueries &&
a.SuccessfulAAAAQueries == b.SuccessfulAAAAQueries &&
a.SuccessfulTXTSrcIPQueries == b.SuccessfulTXTSrcIPQueries &&
a.SuccessfulTXTVersionQueries == b.SuccessfulTXTVersionQueries &&
a.SuccessfulTXTDNS01ChallengeQUeries == b.SuccessfulTXTDNS01ChallengeQUeries {
return true
}
return false
}

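The comment in `metricsSslipIo()` about comparing interfaces to nil refers to a well-known Go pitfall: an interface holding a nil pointer is not itself `== nil`. The sketch below reproduces the check with hypothetical stand-ins (`kvStore` for `xip.V3client`, `etcdClient` for the real client type). Two details matter: the `s == nil` test must come first, because calling `IsNil()` on the zero `reflect.Value` produced by `reflect.ValueOf(nil)` panics, and `IsNil()` also panics for non-nillable kinds such as plain structs, so the check assumes the implementation is a pointer (as `*clientv3.Client` is).

```go
package main

import (
	"fmt"
	"reflect"
)

// kvStore is a hypothetical stand-in for the xip.V3client interface.
type kvStore interface {
	Get(key string) string
}

// etcdClient is a hypothetical implementation; only *etcdClient satisfies kvStore.
type etcdClient struct{}

func (c *etcdClient) Get(key string) string { return "" }

// isNilStore reports whether s is a nil interface or wraps a nil pointer.
func isNilStore(s kvStore) bool {
	return s == nil || reflect.ValueOf(s).IsNil()
}

func main() {
	var typedNil *etcdClient        // a nil *etcdClient
	var store kvStore = typedNil    // interface now holds (type=*etcdClient, value=nil)
	fmt.Println(store == nil)       // false: the type half is set
	fmt.Println(isNilStore(store))  // true
	fmt.Println(isNilStore(nil))    // true: short-circuits before reflect
}
```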
@@ -8,7 +8,7 @@ export OLD_VERSION=2.2.4
export VERSION=2.3.0
cd ~/workspace/sslip.io
git pull -r --autostash
# update the version number for the TXT record for version.sslip.io
# update the version number for the TXT record for version.status.sslip.io
sed -i '' "s/$OLD_VERSION/$VERSION/g" \
bin/make_all \
bosh-release/packages/sslip.io-dns-server/packaging \
@@ -17,7 +17,7 @@ sed -i '' "s/$OLD_VERSION/$VERSION/g" \
sed -i '' "s~/$OLD_VERSION/~/$VERSION/~g" \
k8s/document_root/index.html \
k8s/Dockerfile-sslip.io-dns-server
# update the git hash for the TXT record for version.sslip.io for BOSH release
# update the git hash for the TXT record for version.status.sslip.io for BOSH release
sed -i '' "s/VersionGitHash=[0-9a-fA-F]*/VersionGitHash=$(git rev-parse --short HEAD)/g" \
bosh-release/packages/sslip.io-dns-server/packaging
cd bosh-release/
@@ -49,7 +49,7 @@ dig +short sSlIp.Io
echo 78.46.204.247
dig @$IP txt ip.sslip.io +short | tr -d '"'
curl curlmyip.org; echo
dig @$IP txt version.sslip.io +short | grep $VERSION
dig @$IP txt version.status.sslip.io +short | grep $VERSION
echo "\"$VERSION\""
dig @$IP my-key.kv.sslip.io txt +short # returns nothing
echo " ===" # separator because the results are too similar
@@ -101,7 +101,7 @@ git pull -r
nvim sslip.io.yml
bosh -e vsphere -d sslip.io deploy sslip.io.yml -l <(lpass show --note deployments.yml) --no-redact
dig @ns-azure 127-0-0-1.sslip.io +short # output should be 127.0.0.1
dig @ns-azure.nono.io txt version.sslip.io +short
dig @ns-azure.nono.io txt version.status.sslip.io +short
git add -p
git ci -v -m"Bump sslip.io BOSH release: $OLD_VERSION → $VERSION"
git push

@@ -226,7 +226,7 @@ dig @ns.sslip.io txt ip.sslip.io +short -6 # forces IPv6 lookup; sample reply "2
<code>ns-aws.sslip.io</code> requires a mere 592 bytes spread over 2 packets; Querying <a href=
"https://icanhazip.com/">https://icanhazip.com/</a> requires 8692 bytes spread out over 34 packets—over 14 times
as much! Admittedly bandwidth usage is a bigger concern for the one hosting the service than the one using the
service.</p>
service.</p><!--
<h4 id="key-value-store"><code>k-v.io</code>: (key-value) read/write/delete TXTs</h4>
<p>We enable special behavior under the <code>k-v.io</code> domain: it can be treated as a key-value store, the
subdomain being the key, and the TXT record being the value.</p>
@@ -260,10 +260,11 @@ dig @ns.sslip.io txt ip.sslip.io +short -6 # forces IPv6 lookup; sample reply "2
<li>Values are not consistent. If a value is set in <code>ns-aws.sslip.io</code>, it does not propagate to
<code>ns-gce.sslip.io</code> nor <code>ns-azure.sslip.io</code>.</li>
</ul>
-->
<h4 id="version">Determining The Server Version of Software</h4>You can determine the server version of the
sslip.io software by querying the TXT record of <code>version.sslip.io</code>:
sslip.io software by querying the TXT record of <code>version.status.sslip.io</code>:
<pre>
dig @ns-aws.sslip.io txt version.sslip.io +short
dig @ns-aws.sslip.io txt version.status.sslip.io +short
"2.2.3"
"2021/11/27-11:35:50-0800"
"074f0a8"
@@ -271,10 +272,28 @@ dig @ns-aws.sslip.io txt version.sslip.io +short
<p>The first number, ("2.2.3"), is the version of the sslip.io DNS software, and is most relevant. The other two
numbers are the date compiled and the most recent git hash, but those values can differ across servers due to the
manner in which the software is deployed.</p>
<h4 id="metrics">Server Metrics</h4>You can retrieve metrics from a given server by querying the TXT records of
<code>metrics.status.sslip.io</code>:
<pre>
dig @ns-azure.sslip.io txt metrics.status.sslip.io +short
"uptime (seconds): 1200"
"key-value store: builtin"
"queries: 46202"
"queries/second: 38.5"
"successful:"
"- queries/second: 14.5"
"- A: 2000"
"- AAAA: 20"
"- IP TXT: 2"
"- version TXT: 2"
"- DNS-01 challenge: 2"
</pre>
<h3 id="related">Related Services</h3>
<ul>
<li>
<a href="http://xip.io/">xip.io</a>: the inspiration for sslip.io
<a href="http://xip.io/">xip.io</a>: the inspiration for sslip.io. Sadly, this appears to be no longer
maintained after <a href="https://twitter.com/sstephenson/status/1388146129284603906">Sam Stephenson left
Basecamp</a>.
</li>
<li>
<a href="http://nip.io">nip.io</a>: similar to xip.io, but the PowerDNS backend is written in elegant Python