This rate limits new connections to prevent DoS attacks.
To rate limit QUIC connections effectively, we now gate QUIC connection attempts before the handshake, so that we don't spend compute on handshakes for connections that will eventually be cancelled.
We can only set a single ConnContext per quic-go Transport, as there's only one listener per quic-go Transport. So we cannot set a different ConnContext for listeners on the same address.
As we're now gating QUIC connections before the handshake, we use source address verification to ensure that spoofed IPs cannot DoS new connections from a particular IP. This is done by ensuring that some of the connection attempts always verify the source address. We get DoS protection at the expense of increased latency of source address verification.
This introduces addrsReachabilityTracker that tracks reachability on
a set of addresses. It probes reachability for addresses periodically
and has an exponential backoff in case there are too many errors
or we don't have any valid autonatv2 peer.
There's no smartness in the address selection logic currently. We just
test all provided addresses. It also doesn't use the addresses provided
by `AddrsFactory`, so currently there's no way to get a user-provided
address tested for reachability, which would be a problem for
DNS addresses. I intend to introduce an alternative to
`AddrsFactory`, something like `AnnounceAddrs(addrs []ma.Multiaddr)`,
whose addresses are simply appended to the set of addresses that we have,
and to check reachability for those addresses as well.
There's only one method exposed in the BasicHost right now:
`ReachableAddrs() []ma.Multiaddr`, which returns the host's reachable
addrs. Users can also use the event `EvtHostReachableAddrsChanged`
to be notified when any address's reachability changes.
This introduces a new GatedMaListener type which gates conns
accepted from a manet.Listener with a gater and creates the rcmgr
scope for it. Explicitly passing the scope allows for many guardrails
that the previous interface assertion didn't.
This breaks the previous responsibility of the upgradeListener method
into two: one step gates the connection initially, and the other upgrades
the connection with security and muxer selection.
This split makes it easy to gate the connection with the resource
manager as early as possible. This is especially true for websocket
because we want to gate the connection just after the TCP connection is
established, and not after the tls handshake + websocket upgrade is
completed.
Allows the same socket to be shared amongst the TCP, WS, and WSS transports.
---------
Co-authored-by: sukun <sukunrt@gmail.com>
Co-authored-by: Marco Munizaga <git@marcopolo.io>
* autonat: fix interaction with autorelay
* Fix race in test
* Use deadline from context if available for DialBack
* Return hasNewAddrs correctly
* nit: cleanup contains check
* Shuffle peers
* nits
* Change comment to indicate the bug
* holepuncher: pass address function in constructor (#2979)
* holepunch: pass address function in constructor
* nit
* Remove getPublicAddrs
---------
Co-authored-by: Marco Munizaga <git@marcopolo.io>
* Make a copy of the multiaddr slice in Addrs()
---------
Co-authored-by: Marco Munizaga <git@marcopolo.io>
* Remove unused resolver in basic host
* Refactor Swarm.resolveAddrs
Refactors how DNS Address resolution works.
* lint
* Move MultiaddrDNSResolver interface to core
* Reserve output space for addresses left to resolve
* feat: core/transport: Add SkipResolver interface (#2989)
* Rebase on top of resolveAddrs refactor
* Add comments
* Sanitize address inputs when returning a reservation message (#3006)
Using the `BasicHost` constructor transfers the ownership of the swarm.
This is similar to how using `libp2p.New` transfers the ownership of
user provided config options like `ResourceManager`, all of which are
closed on `host.Close`
* config: refactor AutoNAT construction into separate method
* config: use a lifecycle hook to start listening on swarm addresses
* use Fx to construct the host
* add a test for constructing a routed host
* use Fx hooks to start the host
* config: use Fx lifecycle hooks to start AutoRelay and for PeerRouting
* basichost: don't close the swarm
The swarm is not constructed by the basic host, thus it shouldn't be
closed by it.
* config: use Fx hook to close the quicreuse connection manager
* test for goroutine leaks when starting/stopping fx
To do this, I've had to move a few leaky tests into a separate package.
I've filed a bug for the AutoNAT issue (#2743) but the "error on
startup" issue is going to require some pretty invasive changes (we need
to construct _then_ start).
* go fmt
* Ignore one more top function
* Typo
* Ignore any not top
---------
Co-authored-by: Sukun <sukunrt@gmail.com>
Co-authored-by: Steven Allen <steven@stebalien.com>
Co-authored-by: Marco Munizaga <git@marcopolo.io>
* pass an event bus to the swarm constructor
* make the eventbus parameter a required swarm constructor parameter
* emit Connectedness notifications from the swarm
* remove peer connectedness watchers from hosts
* swarm: emit connectedness events when holding the mutex
* Refactor relay_finder and start autorelay after identify
* Clock fork
* Remove multiple timers and use a single rate limiting chan for findNodes
* Remove clock fork
* Rename
* Use scheduledWork.nextAllowedCallToPeerSource.Add(rf.conf.minInterval)
* Fix flaky test that relied on time
* add autonat metrics
* add benchmarks
* use increase instead of sum by with rate in dashboard
* add interface assertion
* add no alloc test
* update dashboard
* autonat: minor dashboard tweaks
---------
Co-authored-by: Marten Seemann <martenseemann@gmail.com>
* swarm: add very basic metrics for opening and closing connections
* swarm: use a sync.Pool to make metrics collection allocation-free
* swarm: introduce a MetricsTracer interface
* swarm: add the transport to the dial error metric
* swarm: add Grafana dashboard
* swarm: use the prometheus namespace option
* quic: add an integration test for QUIC version support
* quic: refactor the stateless reset test
* quic: simplify the interface of the noreuseConn
DecreaseCount now closes the underlying UDP conn, so that callers don't
need to pay attention if they're dealing with a reuseConn or a
noreuseConn.
* implement a quicreuse to manage QUIC connections
* quicreuse: introduce options
* config: construct the quicreuse.ConnManager using fx
* webtransport: use the quicreuse
* add integration test for QUIC and WebTransport sharing the same UDP addr
* Handle errors in accept loop goroutine
* Add comment
* Remove todo
* Rename mutexes
* Cleanup extra close
* Only log on err
* Use webtransport-go 0.4.0
* Fix expected error
Co-authored-by: Marco Munizaga <git@marcopolo.io>