Replace ambiguous metric, "Answered Queries"

I've always been uncomfortable with the metric "Answered Queries" — it
implies that we don't answer all the queries. But we do answer all the
queries!

What the metric actually measures is "the number of DNS responses we send that
have one or more records in the ANSWER section".

The new metric is "Answer ≥ 1". Not great, but better than before.
Brian Cunnie
2024-11-19 09:25:10 -08:00
parent 3ed466bc74
commit 0fc3c81641
3 changed files with 25 additions and 26 deletions

@@ -205,12 +205,13 @@ func getMetrics(port int) (m xip.Metrics) {
 Expect(err).ToNot(HaveOccurred())
 var uptime int
 var junk string
+var greaterThanOrEqualsUnicode string
 _, err = fmt.Sscanf(string(stdout),
 "\"Uptime: %d\"\n"+
 "\"Blocklist: %s %s %s\n"+
 "\"Queries: %d (%s\n"+ // %s "swallows" the `/s"` at the end
 "\"TCP/UDP: %d/%d\"\n"+
-"\"Answered Queries: %d (%s\n"+ // %s "swallows" the `/s"` at the end
+"\"Answer %s 1: %d (%s\n"+ // %s "swallows" the `/s"` at the end
 "\"A: %d\"\n"+
 "\"AAAA: %d\"\n"+
 "\"TXT Source: %d\"\n"+
@@ -222,7 +223,7 @@ func getMetrics(port int) (m xip.Metrics) {
 &junk, &junk, &junk,
 &m.Queries, &junk,
 &m.TCPQueries, &m.UDPQueries,
-&m.AnsweredQueries, &junk,
+&greaterThanOrEqualsUnicode, &m.AnsweredQueries, &junk,
 &m.AnsweredAQueries,
 &m.AnsweredAAAAQueries,
 &m.AnsweredTXTSrcIPQueries,

@@ -125,8 +125,8 @@ src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]-->
 accomplish this, set the following three DNS servers as NS records for the subdomain “xip.example.com”</p>
 <div class="alert alert-danger" role="alert">
 <b>2024-11-16</b> <code>ns-aws.sslip.io</code> and <code>ns-azure.sslip.io</code> are deprecated. Please update
-your nameservers to the nameservers
-below. <code>ns-aws</code> and <code>ns-azure</code>will be shut down on <b>2024-12-25</b>.<br>
+your nameservers to the nameservers below. <code>ns-aws</code> and <code>ns-azure</code>will be shut down on
+<b>2024-12-25</b>.<br>
 <br>
 In October 2024, AWS charged me $113.88 for bandwidth for 1,265.3 GB at $0.09 / GB, and I am loath to spend
 $1,366 on yearly bandwidth when other vendors, such as OVH and Hetzner, are much more reasonable.
@@ -147,13 +147,13 @@ src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]-->
 </tr>
 <tr class="even">
 <td><code>ns-hetzner.sslip.io.</code></td>
-<td>5.78.115.44<br />
+<td>5.78.115.44<br>
 2a01:4ff:1f0:c920::</td>
 <td>USA</td>
 </tr>
 <tr class="odd">
 <td><code>ns-ovh.sslip.io.</code></td>
-<td>51.75.53.19<br />
+<td>51.75.53.19<br>
 2001:41d0:602:2313::1</td>
 <td>Poland</td>
 </tr>
@@ -267,7 +267,7 @@ dig @ns-azure.sslip.io metrics.status.sslip.io txt +short
 "Blocklist: 2023-10-04 07:37:50-07 3,6"
 "Queries: 14295231 (86.3/s)"
 "TCP/UDP: 5231/14290000"
-"Answered Queries: 4872793 (29.4/s)"
+"Answer ≥ 1: 4872793 (29.4/s)"
 "A: 4025711"
 "AAAA: 247215"
 "TXT Source: 57"
@@ -294,24 +294,23 @@ dig @ns-azure.sslip.io metrics.status.sslip.io txt +short
 <dt>TCP/UDP</dt>
 <dd>This is the number of queries received on the TCP protocol versus the UDP protocol. The sum should equal
 the number of queries. DNS typically uses the UDP protocol</dd>
-<dt>Answered Queries</dt>
+<dt>Answer ≥ 1</dt>
 <dd>This consists of two numbers: the first is the number of queries we responded to with at least one record
 in the answer section, and the second is the first number divided by the uptime (i.e. queries/second). Note
-that the number of answered queries is typically a third or fourth the size of the overall queries. This is
-normal. One reason for this disparity is that often both the IPv4 (A) and IPv6 (AAAA) records will be checked,
-but only one reply will have a record in the answer section . For example, browsing to "127.0.0.1.sslip.io"
-generates two lookups, one with an answer (IPv4), and one without (IPv6). Another reason is that lookups
-follow
-a chain, e.g. looking up "127.0.0.1.sslip.io" may generate up to four queries for A records ("1.sslip.io",
-"0.1.sslip.io", "0.0.1.sslip.io" and "127.0.0.1.sslip.io"), only the last of which returns a record in the
-answer section. Pro-tip: if you want to shave milliseconds off name resolution, use dashes not dots in your
-hostname (e.g. "10-9-9-30.sslip.io" instead of "10.9.9.30.sslip.io")</dd>
+that the number of responses with an answer record is typically a fourth the size of the overall responses.
+This is normal. One reason for this disparity is that often both the IPv4 (A) and IPv6 (AAAA) records will be
+checked, but only one reply will have a record in the answer section . For example, browsing to
+"127.0.0.1.sslip.io" generates two lookups, one with an answer (IPv4), and one without (IPv6). Another reason
+is that lookups follow a chain, e.g. looking up "127.0.0.1.sslip.io" may generate up to four queries for A
+records ("1.sslip.io", "0.1.sslip.io", "0.0.1.sslip.io" and "127.0.0.1.sslip.io"), only the last of which
+returns a record in the answer section. Pro-tip: if you want to shave milliseconds off name resolution, use
+dashes not dots in your hostname (e.g. "10-9-9-30.sslip.io" instead of "10.9.9.30.sslip.io")</dd>
 <dt>A</dt>
-<dd>The number of responses which included an A (IPv4) record since starting operation (e.g. "dig
-127.0.0.1.sslip.io")</dd>
+<dd>The number of responses which included an A (IPv4) record in the answer section since starting operation
+(e.g. "dig 127.0.0.1.sslip.io")</dd>
 <dt>AAAA</dt>
-<dd>The number of responses which included an AAAA (IPv6) record since starting operation (e.g. "dig
---1.sslip.io aaaa")</dd>
+<dd>The number of responses which included an AAAA (IPv6) record in the answer section since starting operation
+(e.g. "dig --1.sslip.io aaaa")</dd>
 <dt>TXT Source</dt>
 <dd>The number of responses which included a TXT record of the querier's IP address since starting operation
 (e.g. "dig @ns.sslip.io ip.sslip.io txt")</dd>
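
The "lookups follow a chain" explanation in the `Answer ≥ 1` definition can be sketched as follows. The sketch assumes the resolver asks about the name one label at a time, starting just below `sslip.io` (our reading of the text, not code from this commit); `queryChain` is a hypothetical helper. For "127.0.0.1.sslip.io" it produces the four names the definition lists, while "10-9-9-30.sslip.io" is a single label below the zone and needs only one query, which is the point of the pro-tip:

```go
package main

import (
	"fmt"
	"strings"
)

// queryChain is a hypothetical helper: it lists the names a resolver would
// query if it walked from just below the zone apex down to the full name,
// adding one label per step.
func queryChain(name, zone string) []string {
	name = strings.TrimSuffix(name, ".")
	zone = strings.TrimSuffix(zone, ".")
	prefix := strings.TrimSuffix(name, "."+zone)
	if prefix == name || prefix == "" {
		// Not below the zone, or nothing to walk: a single query.
		return []string{name}
	}
	labels := strings.Split(prefix, ".")
	chain := make([]string, 0, len(labels))
	// Start from the rightmost label and prepend one more label each step.
	for i := len(labels) - 1; i >= 0; i-- {
		chain = append(chain, strings.Join(labels[i:], ".")+"."+zone)
	}
	return chain
}

func main() {
	for _, host := range []string{"127.0.0.1.sslip.io", "10-9-9-30.sslip.io"} {
		chain := queryChain(host, "sslip.io")
		fmt.Printf("%s: %d queries %v\n", host, len(chain), chain)
	}
}
```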
@@ -339,8 +338,8 @@ dig @ns-azure.sslip.io metrics.status.sslip.io txt +short
 <a href="http://nip.io">nip.io</a>: similar to xip.io, but the PowerDNS backend is written in elegant Python
 </li>
 <li>
-<a href="https://letsencrypt.org/">Let's Encrypt</a>: A Certificate Authority providing TLS certificates; they
-have never failed to increase our rate limits when asked. If you can, <a
+<a href="https://letsencrypt.org/">Let's Encrypt</a>: A Certificate Authority providing TLS certificates;
+they have never failed to increase our rate limits when asked. If you can, <a
 href="https://www.abetterinternet.org/donate/">donate</a>.
 </li>
 </ul>
@@ -349,8 +348,7 @@ dig @ns-azure.sslip.io metrics.status.sslip.io txt +short
 <p><a id="status"><sup>[Status]</sup></a> A status of “build failing” rarely means the system is failing. Its
 more often an indication that when the servers were last checked (currently every six hours), the CI (continuous
 integration) <a href="https://ci.nono.io/teams/main/pipelines/sslip.io">server</a> had difficulty reaching one
-of
-the three sslip.io name servers. Thats normal. <sup><a href="#timeout" class="alert-link">[connection timed
+of the three sslip.io name servers. Thats normal. <sup><a href="#timeout" class="alert-link">[connection timed
 out]</a></sup></p>
 <p><a id="timeout"><sup>[connection timed out]</sup></a></p>
 <p>DNS runs over <a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol">UDP</a> which has no guaranteed

@@ -985,7 +985,7 @@ func TXTMetrics(x *Xip, _ net.IP) (txtResources []dnsmessage.TXTResource, err er
 len(x.BlocklistCIDRs)))
 metrics = append(metrics, fmt.Sprintf("Queries: %d (%.1f/s)", x.Metrics.Queries, float64(x.Metrics.Queries)/uptime.Seconds()))
 metrics = append(metrics, fmt.Sprintf("TCP/UDP: %d/%d", x.Metrics.TCPQueries, x.Metrics.UDPQueries))
-metrics = append(metrics, fmt.Sprintf("Answered Queries: %d (%.1f/s)", x.Metrics.AnsweredQueries, float64(x.Metrics.AnsweredQueries)/uptime.Seconds()))
+metrics = append(metrics, fmt.Sprintf("Answer ≥ 1: %d (%.1f/s)", x.Metrics.AnsweredQueries, float64(x.Metrics.AnsweredQueries)/uptime.Seconds()))
 metrics = append(metrics, fmt.Sprintf("A: %d", x.Metrics.AnsweredAQueries))
 metrics = append(metrics, fmt.Sprintf("AAAA: %d", x.Metrics.AnsweredAAAAQueries))
 metrics = append(metrics, fmt.Sprintf("TXT Source: %d", x.Metrics.AnsweredTXTSrcIPQueries))