
# Features

- [Dynamic Server Timeout Calculation](#dynamic-server-timeout-calculation)
- [Failed Server Isolation](#failed-server-isolation)
- [Query Cache](#query-cache)
- [DNS 0x20 Query Name Case Randomization](#dns-0x20-query-name-case-randomization)
- [DNS Cookies](#dns-cookies)
- [TCP FastOpen (0-RTT)](#tcp-fastopen-0-rtt)
- [Event Thread](#event-thread)
- [System Configuration Change Monitoring](#system-configuration-change-monitoring)


## Dynamic Server Timeout Calculation

Metrics are stored for every server in time series buckets for both the current
time span and prior time span in 1 minute, 15 minute, 1 hour, and 1 day
intervals, plus a single since-inception bucket (of the server in the c-ares
channel).

These metrics are then used to calculate the average latency for queries on
each server, which automatically adjusts to network conditions.  This average
is then multiplied by 5 to come up with a timeout to use for the query before
re-queuing it.  If there is not sufficient data yet to calculate a timeout
(need at least 3 prior queries), then the default of 2000ms is used (or an 
administrator-set `ARES_OPT_TIMEOUTMS`).

The timeout is then adjusted to a minimum bound of 250ms which is the
approximate RTT of network traffic half-way around the world, to account for the
upstream server needing to recurse to a DNS server far away.  It is also
bounded on the upper end to 5000ms (or an administrator-set 
`ARES_OPT_MAXTIMEOUTMS`).

If a server does not reply within the calculated timeout, the timeout will
approximately double the next time the query is re-queued to the same server,
so that timeouts adjust automatically once a successful reply is recorded.
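
The rules above can be sketched as a small pure function.  This is only an
illustration of the described arithmetic, not the c-ares implementation: the
function and macro names are hypothetical, and the choice to apply the doubling
after the lower bound and to cap it at the upper bound is an assumption.

```c
#include <assert.h>

/* Values taken from the text above; the names are illustrative only. */
#define SKETCH_MIN_TIMEOUT_MS     250u /* lower bound on any timeout       */
#define SKETCH_MIN_SAMPLES        3u   /* replies needed before averaging  */
#define SKETCH_LATENCY_MULTIPLIER 5u   /* timeout = average latency * 5    */

/*
 * total_latency_ms, replies : recorded metrics for one server
 * configured_ms             : fallback (2000, or ARES_OPT_TIMEOUTMS)
 * max_ms                    : upper bound (5000, or ARES_OPT_MAXTIMEOUTMS)
 * requeues                  : times this query already timed out on the server
 */
unsigned int sketch_query_timeout_ms(unsigned long total_latency_ms,
                                     unsigned long replies,
                                     unsigned int  configured_ms,
                                     unsigned int  max_ms,
                                     unsigned int  requeues)
{
  unsigned long long timeout_ms;
  unsigned int       i;

  assert(max_ms >= SKETCH_MIN_TIMEOUT_MS);

  if (replies < SKETCH_MIN_SAMPLES) {
    /* Not enough data yet: fall back to the configured timeout. */
    timeout_ms = configured_ms;
  } else {
    timeout_ms = (unsigned long long)(total_latency_ms / replies) *
                 SKETCH_LATENCY_MULTIPLIER;
  }

  /* Apply the lower and upper bounds. */
  if (timeout_ms < SKETCH_MIN_TIMEOUT_MS) {
    timeout_ms = SKETCH_MIN_TIMEOUT_MS;
  }
  if (timeout_ms > max_ms) {
    timeout_ms = max_ms;
  }

  /* Assumption: every re-queue doubles the timeout, still capped at max_ms. */
  for (i = 0; i < requeues && timeout_ms < max_ms; i++) {
    timeout_ms *= 2;
  }
  if (timeout_ms > max_ms) {
    timeout_ms = max_ms;
  }

  return (unsigned int)timeout_ms;
}
```

For instance, an average latency of 20ms would yield 100ms before bounding and
is therefore raised to the 250ms minimum, while an average of 100ms gives 500ms
on the first attempt and 1000ms after one re-queue.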

In order to calculate the optimal timeout, it is highly recommended to ensure
`ARES_OPT_QUERY_CACHE` is enabled with a non-zero `qcache_max_ttl` (it is
enabled by default with a max TTL of 3600s).  The goal is to record the
recursion time as part of query latency.  Since the upstream server also
caches results, repeated queries that bypass the local cache would mostly
measure fast cached replies rather than recursion.

This feature requires the c-ares channel to persist for the lifetime of the
application.


## Failed Server Isolation

Each server is tracked for failures relating to consecutive connectivity issues
or unrecoverable response codes.  Servers are sorted in priority order based
on this metric.  A downed server is brought back online either when the
current highest priority server fails, or when a query randomly selected to
probe the downed server determines that it is online again.

By default a downed server won't be retried for 5 seconds, and after this
timeframe each query has a 10% chance of being chosen to test a downed server.
When a downed server is selected to be probed, the query is duplicated
and sent to the downed server independently of the original query.  This
means that a probe always uses a legitimate query, but does not delay the
response to the original query if that server is still down.
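
A minimal sketch of the probe decision described above, using the default
values from the text.  The names are hypothetical, the delay is assumed to be
measured from the time of the recorded failure, and the random value is
supplied by the caller so the example stays deterministic.

```c
#include <stdbool.h>

#define SKETCH_RETRY_DELAY_MS 5000u /* default: no retry for 5 seconds   */
#define SKETCH_RETRY_CHANCE   10u   /* default: 1-in-10, i.e. 10% chance */

/*
 * Decide whether the current query should also be duplicated to a downed
 * server as a probe.  random_value is any uniformly distributed number.
 */
bool sketch_should_probe(unsigned long long now_ms,
                         unsigned long long failed_at_ms,
                         unsigned int       random_value)
{
  /* Still inside the retry delay (or the clock went backwards): no probe. */
  if (now_ms < failed_at_ms ||
      now_ms - failed_at_ms < SKETCH_RETRY_DELAY_MS) {
    return false;
  }

  /* After the delay, roughly one query in SKETCH_RETRY_CHANCE is chosen. */
  return (random_value % SKETCH_RETRY_CHANCE) == 0;
}
```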

Administrators may customize these settings via `ARES_OPT_SERVER_FAILOVER`.

Additionally, when using `ARES_OPT_ROTATE` or a system configuration option of
`rotate`, c-ares will randomly select a server from the list of highest priority
servers based on failures.  Any servers in any lower priority bracket will be
omitted from the random selection.
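
The rotate behavior can be illustrated as follows.  This sketch assumes that a
"priority bracket" corresponds to a server's failure count, which is a
simplification; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick a server index at random, considering only the servers that share the
 * lowest failure count.  random_value is supplied by the caller.
 */
size_t sketch_pick_rotated(const unsigned int *failures, size_t count,
                           unsigned int random_value)
{
  unsigned int best;
  size_t       i;
  size_t       eligible = 0;
  size_t       target;

  assert(failures != NULL && count > 0);

  /* Find the best (lowest) failure count. */
  best = failures[0];
  for (i = 1; i < count; i++) {
    if (failures[i] < best) {
      best = failures[i];
    }
  }

  /* Count the servers in that top bracket. */
  for (i = 0; i < count; i++) {
    if (failures[i] == best) {
      eligible++;
    }
  }

  /* Select the target-th eligible server; all others are skipped. */
  target = random_value % eligible;
  for (i = 0; i < count; i++) {
    if (failures[i] == best) {
      if (target == 0) {
        return i;
      }
      target--;
    }
  }

  return 0; /* not reached: at least one server always matches best */
}
```

With failure counts `{0, 2, 0, 1}`, only the servers at index 0 and 2 can ever
be returned.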

This feature requires the c-ares channel to persist for the lifetime of the
application.


## Query Cache

Every successful query response, as well as every `NXDOMAIN` response
containing an `SOA` record, is cached using the `TTL` returned or the SOA
Minimum as appropriate.  This timeout is bounded by the `ARES_OPT_QUERY_CACHE`
`qcache_max_ttl`, which defaults to 1 hour (3600s).

The query is cached at the lowest possible layer.  A call into
`ares_search_dnsrec()` or `ares_getaddrinfo()` may spawn multiple queries
in order to complete its lookup, and each individual backend query result will
be cached.

Any server list change will automatically invalidate the cache in order to
purge any possible stale data.  For example, if `NXDOMAIN` is cached but system
configuration has changed due to a VPN connection, the same query might now
result in a valid response.

This feature is not expected to cause any issues that wouldn't already be
present due to the upstream DNS server having substantially similar caching
already.  However if desired it can be disabled by setting `qcache_max_ttl` to
`0`.
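
As a rough sketch of the TTL handling described in this section (hypothetical
names, not the c-ares implementation), the cache lifetime of a response could
be derived like this:

```c
#include <stdbool.h>

/*
 * Returns the number of seconds a response may be cached, or 0 when it must
 * not be cached.
 *
 * is_nxdomain    : true for an NXDOMAIN response carrying an SOA record
 * answer_ttl     : TTL returned with a successful response
 * soa_minimum    : SOA Minimum field, used for NXDOMAIN responses
 * qcache_max_ttl : upper bound from ARES_OPT_QUERY_CACHE, 0 disables caching
 */
unsigned int sketch_cache_ttl(bool is_nxdomain, unsigned int answer_ttl,
                              unsigned int soa_minimum,
                              unsigned int qcache_max_ttl)
{
  unsigned int ttl = is_nxdomain ? soa_minimum : answer_ttl;

  if (qcache_max_ttl == 0) {
    return 0; /* cache disabled */
  }

  return ttl < qcache_max_ttl ? ttl : qcache_max_ttl;
}
```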

This feature requires the c-ares channel to persist for the lifetime of the
application.


## DNS 0x20 Query Name Case Randomization

DNS 0x20 is the name of the feature which automatically randomizes the case
of the characters in a UDP query as defined in
[draft-vixie-dnsext-dns0x20-00](https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00).

For example, if name resolution is performed for `www.example.com`, the actual
query sent to the upstream name server may be `Www.eXaMPlE.cOM`.

The reason to randomize case characters is to provide additional entropy in the
query to be able to detect off-path cache poisoning attacks for UDP.  This is
not used for TCP connections which are not known to be vulnerable to such
attacks due to their stateful nature.
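
The idea can be sketched in a few lines of C.  The function names are
hypothetical, and the random bits are passed in by the caller to keep the
example deterministic; a real resolver would draw them from a cryptographically
secure random source.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Rewrite name in place: the n-th letter becomes upper case when bit n
 * (modulo 64) of random_bits is set, lower case otherwise.  Non-letters
 * such as dots and digits are left untouched.
 */
void sketch_dns0x20_encode(char *name, unsigned long long random_bits)
{
  size_t i;
  size_t letter = 0;

  for (i = 0; name[i] != '\0'; i++) {
    unsigned char c = (unsigned char)name[i];

    if (!isalpha(c)) {
      continue;
    }
    if ((random_bits >> (letter % 64)) & 1u) {
      name[i] = (char)toupper(c);
    } else {
      name[i] = (char)tolower(c);
    }
    letter++;
  }
}

/*
 * A reply is only trusted if it echoes the question name byte for byte,
 * including the randomized case.  Returns non-zero on a match.
 */
int sketch_dns0x20_matches(const char *sent, const char *echoed)
{
  return strcmp(sent, echoed) == 0;
}
```

An off-path attacker who cannot see the query has to guess the case pattern in
addition to the other query identifiers, which is where the extra entropy
comes from.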

[Google](https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M)
has performed extensive research on case randomization and has generally
found it to be effective and widely supported.

This feature is disabled by default and can be enabled via `ARES_FLAG_DNS0x20`.
There are some instances where servers do not properly facilitate this feature.
Unlike a recursive resolver, where it may be possible to determine that an
authoritative server is incapable, it is much harder for a stub resolver to
come to any reliable conclusion as to where in the path the issue resides.
Due to the recent wide deployment of DNS 0x20 in large public DNS servers,
compatibility is expected to improve rapidly, so that in time this feature may
be enabled by default.

Another feature which can be used to prevent off-path cache poisoning attacks
is [DNS Cookies](#dns-cookies).


## DNS Cookies

DNS Cookies are a method of learned mutual authentication between a server
and a client as defined in
[RFC7873](https://datatracker.ietf.org/doc/html/rfc7873)
and [RFC9018](https://datatracker.ietf.org/doc/html/rfc9018).

This mutual authentication ensures clients are protected from off-path cache
poisoning attacks, and protects servers from being used as DNS amplification
attack sources.  Many servers will disable query throttling limits when DNS
Cookies are in use.  DNS Cookies only apply to UDP connections.

Since DNS Cookies are optional and learned dynamically, this is an always-on
feature and will automatically adjust based on the upstream server state.  The
only potential issue arises if a server that once supported DNS Cookies stops
supporting them: a regression timeout of 2 minutes must then elapse before
responses without cookies are accepted.  Such a scenario would be exceedingly
rare.
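
One possible reading of the regression timeout, written as a sketch.  The
names are hypothetical and the exact conditions used by c-ares are an
assumption here; validation of the cookie contents themselves is out of scope.

```c
#include <stdbool.h>

#define SKETCH_COOKIE_REGRESSION_S 120u /* regression timeout: 2 minutes */

/*
 * Decide whether a UDP response may be accepted with respect to cookies.
 *
 * response_has_cookie : the response carries a cookie (validated elsewhere)
 * server_cookie_seen  : this server has returned a cookie before
 * last_cookie_s       : time the last server cookie was received, in seconds
 * now_s               : current time, in seconds
 */
bool sketch_accept_response(bool response_has_cookie, bool server_cookie_seen,
                            unsigned long long last_cookie_s,
                            unsigned long long now_s)
{
  if (response_has_cookie) {
    return true;
  }
  if (!server_cookie_seen) {
    return true; /* server never advertised cookie support */
  }
  if (now_s < last_cookie_s) {
    return false; /* clock went backwards: stay on the safe side */
  }

  /* Cookie-less replies are only accepted once the timeout has elapsed. */
  return now_s - last_cookie_s >= SKETCH_COOKIE_REGRESSION_S;
}
```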

Interestingly, the large public recursive DNS servers such as provided by
[Google](https://developers.google.com/speed/public-dns/docs/using),
[CloudFlare](https://one.one.one.one/), and
[OpenDNS](https://opendns.com) do not have this feature enabled.  That said,
most DNS products like [BIND](https://www.isc.org/bind/) enable DNS Cookies
by default.

This feature requires the c-ares channel to persist for the lifetime of the
application.


## TCP FastOpen (0-RTT)

TCP Fast Open is defined in [RFC7413](https://datatracker.ietf.org/doc/html/rfc7413)
and enables data to be sent with the TCP SYN packet when establishing the
connection, thus rivaling the performance of UDP.  A previous connection must
already have been established in order to obtain the client cookie, which
allows the server to trust the data sent in the first packet and know it is not
part of an off-path attack.

TCP FastOpen can only be used with idempotent requests since in timeout
conditions the SYN packet with data may be re-sent which may cause the server
to process the packet more than once.  Luckily DNS requests are idempotent by
nature.

TCP FastOpen is supported on Linux, MacOS, and FreeBSD.  Most other systems
either do not support this feature or, like Windows, require completion
notifications to use it, whereas c-ares relies on readiness notifications.

Supported systems also need to be configured appropriately on both the client
and server systems.

### Linux TFO
On Linux, a single sysctl value is used with flags to set the desired fastopen
behavior.

It is recommended to make any changes permanent by creating a file in
`/etc/sysctl.d/` with the appropriate key and value.  Legacy Linux systems
might need to update `/etc/sysctl.conf` directly.  After modifying the
configuration, it can be loaded via `sysctl -p`.

`net.ipv4.tcp_fastopen`:
   - `1` = client only (typically default)
   - `2` = server only
   - `3` = client and server
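
Since the value is a set of flags, it can be decoded with simple bit tests.
The helper names below are hypothetical; the value itself would typically be
obtained from `sysctl net.ipv4.tcp_fastopen`.

```c
#include <stdbool.h>

/* Flag bits of net.ipv4.tcp_fastopen as listed above. */
#define SKETCH_TFO_CLIENT 0x1u
#define SKETCH_TFO_SERVER 0x2u

/* True when the client side of TCP FastOpen is enabled. */
bool sketch_tfo_client_enabled(unsigned int sysctl_value)
{
  return (sysctl_value & SKETCH_TFO_CLIENT) != 0;
}

/* True when the server side of TCP FastOpen is enabled. */
bool sketch_tfo_server_enabled(unsigned int sysctl_value)
{
  return (sysctl_value & SKETCH_TFO_SERVER) != 0;
}
```

A stub resolver such as c-ares only acts as a client, so on the client system
the client flag is the one that matters.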

### MacOS TFO
In MacOS, TCP FastOpen is enabled by default for clients and servers.  You can
verify via the `net.inet.tcp.fastopen` sysctl.

If any change is needed, you should make it persistent as per this guidance:
[Persistent Sysctl Settings](https://discussions.apple.com/thread/253840320?)

`net.inet.tcp.fastopen`:
   - `1` = client only
   - `2` = server only
   - `3` = client and server (typically default)

### FreeBSD TFO
In FreeBSD, server mode TCP FastOpen is typically enabled by default but
client mode is disabled.  It is recommended to edit `/etc/sysctl.conf` and
place in the values you wish to persist to enable or disable TCP Fast Open.
Once the file is modified, it can be loaded via `sysctl -f /etc/sysctl.conf`.

- `net.inet.tcp.fastopen.server_enable` (boolean) - enable/disable server
- `net.inet.tcp.fastopen.client_enable` (boolean) - enable/disable client


## Event Thread

Historic c-ares integrations required integrators to have their own event loop
which would be required to notify c-ares of read and write events for each
socket.  It was also required to notify c-ares at the appropriate timeout if
no events had occurred.  This could be difficult to do correctly and could
lead to stalls or other issues.

The Event Thread is currently supported on all systems except DOS which does
not natively support threading (however it could in theory be possible to
enable with something like [FSUpthreads](https://arcb.csc.ncsu.edu/~mueller/pthreads/)).

c-ares is built by default with threading support enabled, however it may be
disabled at compile time.  The event thread must also be specifically enabled
via `ARES_OPT_EVENT_THREAD`.

Using the Event Thread feature also facilitates some other features like
[System Configuration Change Monitoring](#system-configuration-change-monitoring),
and automatically enables the `ares_set_pending_write_cb()` feature to optimize
multi-query writing.
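
A minimal sketch of enabling the Event Thread when creating a channel.  It
assumes a recent c-ares release that provides `ares_threadsafety()`,
`ARES_EVSYS_DEFAULT` and `ares_queue_wait_empty()`, and it must be linked
against c-ares; error handling is kept deliberately short.

```c
#include <stdio.h>
#include <string.h>
#include <ares.h>

int main(void)
{
  ares_channel_t     *channel = NULL;
  struct ares_options options;
  int                 optmask = 0;
  int                 status;

  status = ares_library_init(ARES_LIB_INIT_ALL);
  if (status != ARES_SUCCESS) {
    fprintf(stderr, "ares_library_init: %s\n", ares_strerror(status));
    return 1;
  }

  /* The Event Thread needs a build with threading support. */
  if (!ares_threadsafety()) {
    fprintf(stderr, "c-ares was built without threading support\n");
    ares_library_cleanup();
    return 1;
  }

  /* Ask c-ares to run its own event loop in a background thread. */
  memset(&options, 0, sizeof(options));
  optmask      |= ARES_OPT_EVENT_THREAD;
  options.evsys = ARES_EVSYS_DEFAULT;

  status = ares_init_options(&channel, &options, optmask);
  if (status != ARES_SUCCESS) {
    fprintf(stderr, "ares_init_options: %s\n", ares_strerror(status));
    ares_library_cleanup();
    return 1;
  }

  /* Queries, for example via ares_getaddrinfo(), would be issued here.
   * No socket or timeout notifications have to be passed to c-ares. */

  /* Block until all outstanding queries have completed (-1 = no timeout). */
  ares_queue_wait_empty(channel, -1);

  ares_destroy(channel);
  ares_library_cleanup();
  return 0;
}
```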


## System Configuration Change Monitoring

The system configuration is automatically monitored for changes to the network
and DNS settings.  When a change is detected, a thread is spawned to read the
new configuration and then apply it to the current c-ares configuration.

This feature requires the [Event Thread](#event-thread) to be enabled via
`ARES_OPT_EVENT_THREAD`.  Otherwise it is up to the integrator to do their own
configuration monitoring and call `ares_reinit()` to reload the system
configuration.

It is supported on Windows, MacOS, iOS, and any system that is configured via
`/etc/resolv.conf` and similar files, such as Linux and FreeBSD.  Specifically
excluded are DOS and Android due to missing mechanisms to support such a
feature.  On Linux, file monitoring results in immediate change detection,
whereas on other unix-like systems a polling mechanism is used that checks
every 30s for changes.

This feature requires the c-ares channel to persist for the lifetime of the
application.