The ISP Column
A monthly column on things Internet
December 2020
Geoff Huston, Joao Damas


DNS 2XL

The first part of this report on the handling of large DNS responses looked at the behaviour of the DNS, and in particular at the interaction between recursive resolvers and authoritative name servers, examining what happens when the DNS response is around the Internet's de facto MTU size of 1,500 octets. For responses larger than 1,500 octets we saw failure in some 2.5% of all cases.

We observed two forms of DNS failure. The first was that resolvers were signalling to the server via query attributes that it was acceptable for the server to send large responses over fragmented UDP, but were then unable to reassemble the fragmented response, due either to local host or to local network constraints. This scenario occurred in around three quarters of all such failure cases in our measurement tests. The second failure form was where the resolver had received a truncated DNS response and there was a subsequent TCP failure. This included the failure to open a TCP session, or a TCP path MTU mismatch where the TCP session hung when attempting to pass back the DNS response. This occurred in slightly less than one quarter of all failure cases in our measurement tests. The measurement setup and the results from this work are to be found in part 1 of this report, “DNS XL” (https://www.potaroo.net/ispcol/2020-11/xldns.html).

However, we're not finished with these measurements. The results presented in the first part of this report are based on respecting the packet size constraints expressed in DNS queries. These constraints are that no UDP DNS response should exceed 512 octets unless there is an EDNS(0) extension with a UDP buffer size provided in the query, and the value of this buffer size field is greater than 512. When there is a UDP buffer size in the query, the DNS response should be no larger than this size. Where this is not possible, the server will respond with a truncated DNS response over UDP. In this measurement the truncated response packet has an empty answer section, so the resolver making the query cannot use this truncated response to assemble an answer, and it should trigger the resolver to repeat the query over a TCP session with the server.

In this, the second part of the report, we ask the question: what if we break with these conventions? In particular, we are interested in understanding the behaviour of fragmented UDP responses, the behaviour of TCP responses, and the behaviour of the DNS as a whole if all recursive resolvers were to use the DNS Flag Day 2020 (https://dnsflagday.net/2020/) setting of 1,232 octets as the buffer size in their queries.

Here, we will look at the behaviour of the DNS when we process incoming queries as if they all had an EDNS(0) extension with a buffer size set to a particular value. Yes, this server-based rewriting of queries is cheating, and it's not what resolvers may be expecting, but it allows us to gain some further insights into the capabilities of the resolver-to-authoritative part of the DNS.

We are going to perform five variants of changing DNS queries. We will first test UDP buffer sizes, where all incoming queries are altered to have buffer sizes of 512, 1,232, 1,440 and 4,096 octets. We will then modify the MSS of incoming TCP SYN packets and set this value to 1,200 octets.
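For readers less familiar with the mechanism being exercised here, the following is a minimal sketch, using the dnspython library, of how a resolver client attaches an EDNS(0) UDP buffer size to a query. The query name and server address are placeholders, and this is an illustration of the convention rather than any part of the measurement rig.

    import dns.flags
    import dns.message
    import dns.query

    # Build a query that carries an EDNS(0) option advertising a 1,232-octet
    # UDP buffer size (the DNS Flag Day 2020 value discussed in this report).
    query = dns.message.make_query("example.com", "AAAA", use_edns=0, payload=1232)

    # Send it over UDP to a placeholder authoritative server address.
    response = dns.query.udp(query, "192.0.2.53", timeout=3)

    # Under the conventional rules the UDP response should be no larger than
    # 1,232 octets; anything bigger should come back truncated (TC=1) instead,
    # prompting the resolver to re-query over TCP.
    print(len(response.to_wire()), bool(response.flags & dns.flags.TC))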
1. All UDP

When we force the buffer size to 4,096 octets for all incoming queries then at no stage will a recursive resolver receive a response with the truncation bit set. This means that the server will respond to all queries over UDP with a UDP response, and it will fragment all larger UDP responses. The fragmentation onset will reflect the server's local MTU setting of 1,500 octets. The results are shown in Table 1.

Actually, that's not quite “all.” In 1% of cases we observe a query over TCP, even though a truncated response has not previously been sent. It appears that some of the time a resolver that is not receiving fragmented UDP responses will probe the server with TCP in some kind of liveness test.

   Forced UDP
   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,140,192    99.6%    0.4%    0.6%    0.1%
    1190     1,138,792    99.6%    0.4%    0.6%    0.1%
    1230     1,273,730    99.6%    0.4%    0.6%    0.1%     0.5%
    1270     1,272,765    98.1%    1.9%    2.4%    1.2%     0.5%
    1310     1,275,436    98.2%    1.8%    2.4%    1.2%     0.5%
    1350     1,272,634    98.2%    1.8%    2.4%    1.2%     0.5%
    1390     1,273,332    98.1%    1.9%    2.4%    1.2%     0.5%
    1430     1,274,189    97.8%    2.2%    2.6%    1.6%     0.5%
    1470     1,274,581    96.9%    3.1%    3.7%   17.6%     1.0%
    1510     1,273,496    85.0%   15.0%   14.2%   17.6%     2.4%
    1550     1,274,776    85.0%   15.0%   14.4%   17.7%     2.6%
    1590     1,276,441    85.1%   14.9%   14.4%   17.6%     2.6%
    1630     1,275,233    85.1%   14.9%   14.5%   17.6%     2.6%

   Table 1 – Failure Rate on UDP Test

The columns in Table 1 reflect the dual-stack failure rate, an IPv4-only experiment, an IPv6-only experiment, and the control, which is the experiment that does not alter the received buffer size in any way.

There are some unexpected outcomes in this data. The first is that we observed a 2% failure rate for unfragmented UDP responses with DNS payload sizes of 1,270 octets and greater. Oddly enough, the failure rate for DNS payloads between 1,270 octets and 1,430 octets in IPv4-only (2.4%) is double that of IPv6-only (1.2%). These DNS responses are packaged by the server as unfragmented UDP packets. As the smaller control unfragmented DNS response was successfully processed by the resolver, this presumably implies that there is some network infrastructure close to some resolvers that is discarding UDP packets where the payload size is between 1,270 and 1,430 octets, or that the resolvers themselves are not accepting incoming DNS packets larger than 1,232 octets in some circumstances.

This particular result is likely to be due to the nature of the experimental setup and resolver behaviour, rather than to network behaviour. In this experiment we are deliberately abusing the DNS specification, and the experiment's server is ignoring the resolver clients' offered UDP buffer size values. Most resolver implementations appear not to raise an exception if the DNS response in the UDP packet is larger than the UDP buffer size specified in the query, but some resolver implementations appear to perform a correlation between query and response. These implementations appear to be discarding a UDP response if the DNS payload is larger than the UDP buffer size in the original query. Similarly, there are instances where the response is being discarded if no buffer size was originally given by the resolver client and the response is larger than 512 octets.
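To make this inferred behaviour concrete, here is a sketch of the kind of strict client-side check that would produce this pattern. It is a hypothetical illustration based on the observed behaviour, not code taken from any particular resolver implementation.

    from typing import Optional

    # RFC 1035 limit that applies when the query carried no EDNS(0) option at all.
    PLAIN_DNS_LIMIT = 512

    def accept_udp_response(wire: bytes, advertised_payload: Optional[int]) -> bool:
        """Return True if a strictly-checking resolver would keep this UDP response."""
        limit = advertised_payload if advertised_payload is not None else PLAIN_DNS_LIMIT
        return len(wire) <= limit

    # A 1,270-octet response to a query that advertised a 1,232-octet buffer is
    # discarded, even though the packet itself arrived intact.
    assert accept_udp_response(b"\x00" * 1270, 1232) is False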
When we compare the resolver client's buffer size against the size of the UDP response, then for each individual test there are three possible types of response: all responses for the test are smaller than the query-specified buffer sizes, all responses for the test are larger than the query-specified buffer sizes, or it's a mixed scenario. We then divide up each case into success and fail. The results are as follows:

                       Success                        Fail
   DNS Size    Smaller   Larger   Mixed     Smaller   Larger   Mixed
    1150        58.20%   37.70%   4.10%       0.04%   99.95%   0.02%
    1190        58.62%   37.44%   3.95%       0.04%   99.94%   0.02%
    1230        58.74%   37.19%   4.06%       0.04%   99.94%   0.02%
    1270        59.55%   36.80%   3.65%       0.04%   99.94%   0.02%
    1310        59.69%   36.37%   3.93%       0.04%   99.94%   0.02%
    1350        59.44%   36.94%   3.62%       0.04%   99.94%   0.02%
    1390        59.67%   36.40%   3.93%       0.04%   99.94%   0.02%
    1430        59.67%   36.52%   3.81%       0.04%   99.94%   0.02%
    1470        57.42%   38.23%   4.35%       0.04%   99.94%   0.02%

It's clear that in these unfragmented UDP cases the majority of failures occur when the DNS response is larger than the query-specified buffer size. The conclusion drawn from this data is that the observed loss rates for unfragmented UDP responses, when we use a test that deliberately disregards the offered UDP buffer size, are generally attributable to these resolver clients rejecting the server's responses in those cases where the response is larger than the size specified in the original query. There is no evidence of systematic network failure when using these packet sizes, either in IPv4 or in IPv6.

When we quote figures about IPv6 we are talking about the pass and failure rates as they relate to the subset of users who are located behind IPv6-capable DNS resolvers. This is currently measured to be around some 55% of users (https://www.potaroo.net/ispcol/2020-07/dns6.html).

For UDP packets that are fragmented by the server before they are sent, namely those with payloads greater than 1,472 octets (and 1,452 octets in IPv6), the failure rate rises considerably for both protocols. IPv6 fragmentation is evidently not handled as well as IPv4, but both protocols show an extremely high loss rate. There are likely to be two factors at play in this scenario. Firstly, there is the ‘oversized’ response being discarded by the resolver, which would account for a 2.4% failure rate based on the data from the smaller unfragmented packets. The additional failure component appears to be related to a fragmentation drop behaviour, which appears to account for the remaining 12% failure rate. In IPv6 the fragmentation-related drop rate appears to account for 15.2% of failure cases, while in IPv4 the ‘oversize’ drop rate is higher and the residual fragmentation drop rate is 12%.

Why isn't the IPv6 fragmentation drop rate of 15.2% even higher? Other studies have reported IPv6 fragmented packet drop rates of between 20% and upwards of 45%. The reason probably lies in the particular circumstances of this experiment. Here we are looking at the path between recursive resolvers and a small set of authoritative servers. The servers are located in a data-centre hosted environment that admits fragmented IPv6 packets, and the recursive resolvers would presumably be located in operationally managed facilities that are likely also managed to achieve operational robustness. In other words, here we are looking more at the ‘core’ of the network rather than at the connections to the edges. The higher IPv6 fragmented packet drop rates have generally been observed in studies using end-to-end measurements, which would presumably include edge networks.
This implies that the observed 15.2% IPv6 UDP fragmentation drop rate reflects aspects of the recursive-to-authoritative network path, but is not a good starting point for more universal claims about IPv6 fragmentation performance in the end-to-end Internet. It's also the case that the IPv4 fragmentation drop rate is 12% in this scenario. This is a critical observation, in that other studies of end-to-end fragmentation drop rates in IPv4 do not report such high levels of packet drop. This implies that, in this particular measurement scenario, the observed IPv6 fragmentation drop rate is more likely to be due to specific security-based filter rules relating to UDP packet fragmentation than to network behaviours dropping IPv6 packets with extension headers.

What is also somewhat unexpected is that the average query count is so high for failure cases when the response is fragmented (Table 2). The lack of a truncated response leads some resolution systems to re-query at a high rate over the 60-second measurement window.

                     Pass                 Fail
   DNS Size      UDP    Control       UDP    Control
    1150         5.2                 12.1
    1190         5.2                 12.0
    1230         5.3        4.8      11.6       12.3
    1270         5.3        4.9      11.0       12.0
    1310         5.3        4.9      11.2       12.0
    1350         5.3        4.9      11.2       12.5
    1390         5.3        4.9      11.1       12.7
    1430         5.3        4.9      12.3       12.6
    1470         5.4        5.2      11.8       27.5
    1510         7.8        5.8      91.0       46.1
    1550         7.9        5.8      90.9       43.7
    1590         7.8        5.8      90.9       43.6
    1630         7.9        5.9      91.4       44.2

   Table 2 – Average Query Count for UDP-only Test

A similar pattern is visible when looking at the average time taken to perform this resolution task (Table 3). While the average number of queries to successfully resolve a name rises by 2 queries for fragmented UDP packets, the average time taken to successfully complete the resolution process rises by a further 80ms on average when the UDP response is fragmented.

                     Pass                  Fail
   DNS Size      UDP    Control       UDP     Control
    1150         201                7,420
    1190         200                7,395
    1230         201        66      6,903       5,801
    1270         204        68      2,870       5,829
    1310         203        68      2,934       5,970
    1350         203        71      2,971       6,107
    1390         203        72      2,925       6,163
    1430         202        73      3,579       6,190
    1470         234       247      6,501      12,385
    1510         287     1,221     24,010      20,842
    1550         289     1,230     23,863      19,708
    1590         289     1,232     23,828      19,605
    1630         293     1,250     23,710      19,799

   Table 3 – Average Query Time (ms) for UDP-only Test

These results do not place fragmented UDP in a good light for the DNS, irrespective of the IP protocol version. There is a base rate of some 14% of experiments that fail when the only resolution mechanism is fragmented UDP, and this rises by a further 2.5% when IPv6-only is used. The elapsed time to resolve also stretches out, and 8 seconds on average for resolution of a name when fragmented UDP is the only resolution mechanism is simply too long a time to be useful.

These results suggest that the original recommendation in RFC 6891 to use a default buffer size parameter value of 4,096 octets was overly optimistic about the performance characteristics of fragmented UDP when negotiating the firewalls and filters in front of DNS resolvers. Avoiding UDP fragmentation in the DNS appears to be a prudent measure, not because of network drop per se, but because of the common operational convention of filtering fragmented DNS-over-UDP packets.

Let's test this theory some more. What if we alter our measurement environment to truncate every response larger than 512 octets and only serve larger DNS responses over TCP?

2. All TCP

When we force the buffer size to 512 octets for all received queries then the experiment's server will send a truncated response for all queries received over UDP. The truncated response contains no answer section, so the resolver will need to perform the query over TCP to resolve the name.
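The re-query behaviour that this truncated response is intended to trigger looks roughly like the following sketch, again using dnspython with placeholder names and addresses. It illustrates the resolver-side mechanism being exercised here, not the experiment's own code.

    import dns.flags
    import dns.message
    import dns.query

    SERVER = "192.0.2.53"   # placeholder authoritative server address

    def resolve(name: str, rdtype: str = "AAAA") -> dns.message.Message:
        """Query over UDP first, and re-query over TCP if the response is truncated."""
        query = dns.message.make_query(name, rdtype, use_edns=0, payload=512)
        response = dns.query.udp(query, SERVER, timeout=3)
        if response.flags & dns.flags.TC:
            # In this experiment the truncated UDP response carries no usable answer
            # section, so repeating the query over TCP is the only way to get the data.
            response = dns.query.tcp(query, SERVER, timeout=3)
        return response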
The results are shown in Table 4.

   Forced TCP
   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,104,539    98.5%    1.6%    1.9%    1.6%
    1190     1,105,126    98.5%    1.6%    1.9%    1.6%
    1230     1,105,601    98.5%    1.6%    1.9%    1.6%     0.5%
    1270     1,104,571    98.5%    1.6%    1.9%    1.6%     0.5%
    1310     1,104,521    98.5%    1.6%    1.9%    1.6%     0.5%
    1350     1,104,068    98.5%    1.6%    2.0%    1.6%     0.5%
    1390     1,105,080    98.5%    1.6%    1.9%    1.6%     0.5%
    1430     1,104,527    98.5%    1.6%    1.9%    1.6%     0.5%
    1470     1,103,423    98.3%    1.8%    2.1%    1.8%     1.0%
    1510     1,104,960    98.3%    1.8%    2.1%    1.8%     2.4%
    1550     1,105,566    98.3%    1.8%    2.1%    1.8%     2.6%
    1590     1,103,609    98.3%    1.8%    2.1%    1.8%     2.6%
    1630     1,106,284    98.3%    1.8%    2.1%    1.8%     2.6%

   Table 4 – Failure Rate on TCP Test

It appears that some 1.6% of users sit behind a resolver that cannot perform DNS over TCP. If we look at the users behind IPv4-capable resolvers, the proportion rises slightly to 1.9%. When we look at the subset of users behind IPv6-capable resolvers, the number drops slightly to 1.6%. It is likely that more recent resolver deployments support both IPv6 and TCP, while there is a set of legacy resolver systems that do not support IPv6, and a higher proportion of these resolvers do not support TCP.

The failure rate rises slightly, by 0.2%, when the TCP response requires two TCP segments. This also means that the first TCP segment is sent using a segment size equal to the receiver's offered MSS value. If there are any path MTU issues on the TCP path, then the first full-size packet may encounter a TCP black hole situation where the ICMP message is not passed back to the TCP sender (the DNS server), and the TCP connection hangs.

   DNS Size    Count    No TCP   ACK Fail   TCP OK
    1150      16,090     60.2%      2.5%     37.3%
    1190      16,235     59.8%      2.7%     37.5%
    1230      16,287     58.7%      2.4%     39.0%
    1270      16,258     59.5%      2.4%     38.1%
    1310      16,272     59.5%      2.6%     37.9%
    1350      16,249     59.0%      2.3%     38.7%
    1390      16,099     59.1%      2.7%     38.1%
    1430      16,373     60.1%      2.2%     37.8%
    1470      18,092     53.1%     11.8%     35.2%
    1510      18,055     52.9%     12.7%     34.4%
    1550      18,220     52.7%     12.5%     34.8%
    1590      18,469     52.5%     12.4%     35.1%
    1630      18,283     52.5%     12.1%     35.4%

   Table 5 – ACK Failure Rate on TCP Test

This appears to be the reason behind the increase in TCP ACK failures when the DNS payload exceeds the MSS and the response is delivered using a full-sized packet (where the offered MSS equals the outbound MTU minus the packet header overheads). As we noted in the first part of this report (Figure 11 of DNS XL, Part 1), some 80% of TCP sessions over IPv4 and 57% of TCP sessions over IPv6 use an MSS setting in the TCP session that assumes a 1,500-octet path MTU.

However, the more dominant factors when failure occurs are cases where there is no TCP at all, and cases where there is what appears to be a successfully completed TCP transaction. More than half the time, failure occurs when the resolver cannot open the TCP session and pass the query to the server. Most likely this is an enthusiastic filter setting close to the resolver that does not allow the DNS to use TCP port 53. The other failure mode is not so readily explained. In a little over one third of cases the TCP session passes the response to the remote client and the client end of the TCP session acknowledges the data. This would normally lead us to conclude that the resolver now has the data. But the resolver does not then complete the overall DNS resolution process. It is unclear why this occurs.
A possible explanation is that the DNS application is discarding TCP responses that exceed its UDP payload size, although why a resolver would apply a UDP maximum payload setting to responses received over TCP is not readily explained.

The average query count for pass experiments is 1 to 2 queries greater than the control, and 1 query greater than the UDP-only count for smaller packets, and much the same as UDP-only for larger DNS responses. The query count for failed experiments is 10 times higher than UDP-only for smaller packets, and similar for the larger DNS responses (Table 6).

                     Pass                 Fail
   DNS Size      TCP    Control       TCP    Control
    1150         7.1                104.1
    1190         6.9                110.1
    1230         6.9        4.8      87.8       12.3
    1270         6.9        4.9      88.7       12.0
    1310         6.9        4.9     100.6       12.0
    1350         6.8        4.9      91.1       12.5
    1390         7.1        4.9      86.3       12.7
    1430         6.9        4.9      68.8       12.6
    1470         7.0        5.2      78.7       27.5
    1510         6.9        5.8      75.1       46.1
    1550         6.9        5.8      94.8       43.7
    1590         6.9        5.8      98.3       43.6
    1630         6.9        5.9      73.6       44.2

   Table 6 – Average Query Count for TCP-only Test

TCP takes some additional time to start in the DNS. There is one round-trip time to deliver the truncated UDP response and a further round-trip time to complete the TCP handshake, so we can expect the delay with TCP to be longer than with simple UDP. Compared to the results in Table 3 (where only UDP was used), the elapsed times for this TCP-only experiment are a little under double (Table 7). However, larger responses are delivered reliably. Unlike fragmented UDP, the TCP failure rate is consistently low.

                     Pass                  Fail
   DNS Size      TCP    Control        TCP    Control
    1150         341               33,938
    1190         299               30,372
    1230         331        66     29,431       5,801
    1270         369        68     28,893       5,829
    1310         340        68     28,887       5,970
    1350         382        71     29,169       6,107
    1390         339        72     28,734       6,163
    1430         366        73     31,033       6,190
    1470         298       247     28,351      12,385
    1510         331     1,221     30,201      20,842
    1550         410     1,230     28,335      19,708
    1590         315     1,232     28,387      19,605
    1630         321     1,250     29,088      19,799

   Table 7 – Average Query Time (ms) for TCP-only Test

It appears that unfragmented UDP is both fast and reliable, while for larger responses where UDP fragmentation is unavoidable, TCP is more reliable, albeit somewhat slower. What happens when we force this behaviour by setting the buffer size in all queries to a value where UDP fragmentation is avoided?

3. Buffer Size of 1,232 octets

The next scenario to be explored here is that being used in DNS Flag Day 2020. Here we set our server to behave as if all incoming queries use a buffer size of 1,232 octets. The intent is to use UDP when we can be reasonably confident that the UDP packet will not encounter UDP fragmentation, and then shift to TCP for larger responses. The shift to TCP is of course controlled by the server providing a truncated response in UDP. In our case we are once again pushing this beyond conventional behaviour, in that we are not loading an answer section into the truncated response. The only way that the resolver will receive the response is by using TCP once the DNS response size exceeds 1,232 octets.
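A minimal sketch of the server-side behaviour being emulated in this case, using dnspython message objects: every query is treated as though it had offered a 1,232-octet buffer, and any response that would exceed this limit is replaced with an empty, truncated response. This is an illustration of the logic described above, not the server code used in the experiment.

    import dns.flags
    import dns.message

    FORCED_BUFFER_SIZE = 1232   # every query is treated as if it advertised this value

    def udp_reply(query: dns.message.Message, full_response: dns.message.Message) -> bytes:
        """Return the wire-format UDP reply, truncating any oversized response."""
        wire = full_response.to_wire()
        if len(wire) <= FORCED_BUFFER_SIZE:
            return wire
        # Send back a truncated response with an empty answer section, which
        # forces the resolver to repeat the query over TCP.
        truncated = dns.message.make_response(query)
        truncated.flags |= dns.flags.TC
        return truncated.to_wire()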
The results of this measurement experiment are shown in Table 8.

   Forced 1232 Buffer Size
   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,113,090    99.5%    0.5%    0.5%    0.6%
    1190     1,113,104    99.5%    0.5%    0.5%    0.6%
    1230     1,111,703    99.4%    0.6%    0.6%    0.7%     0.5%
    1270     1,114,563    98.4%    1.6%    1.5%    1.8%     0.5%
    1310     1,113,632    98.4%    1.6%    1.5%    1.8%     0.5%
    1350     1,113,669    98.4%    1.6%    1.5%    1.8%     0.5%
    1390     1,115,152    98.4%    1.6%    1.5%    1.8%     0.5%
    1430     1,114,069    98.4%    1.6%    1.5%    1.8%     0.5%
    1470     1,111,607    98.2%    1.8%    1.7%    2.0%     1.0%
    1510     1,112,349    98.2%    1.8%    1.7%    1.9%     2.4%
    1550     1,112,795    98.2%    1.8%    1.7%    2.0%     2.6%
    1590     1,112,351    98.2%    1.8%    1.7%    1.9%     2.6%
    1630     1,112,523    98.2%    1.8%    1.7%    2.0%     2.6%

   Table 8 – Failure Rate on Buffer Size of 1,232 Test

Predictably, we see the UDP-only failure rate (0.5%) for DNS responses of less than 1,232 octets and the TCP-only failure rate (1.6%) for larger packets. This is comparable to the control experiment for smaller responses, slightly worse than the control for responses up to 1,430 octets, and slightly better for larger responses.

The average query count in this case is 2 queries more than the control case for smaller DNS responses and 1 query more for larger responses.

                     Pass                 Fail
   DNS Size     1232    Control      1232    Control
    1150         7.1                104.1
    1190         6.9                110.1
    1230         6.9        4.8      87.8       12.3
    1270         6.9        4.9      88.7       12.0
    1310         6.9        4.9     100.6       12.0
    1350         6.8        4.9      91.1       12.5
    1390         7.1        4.9      86.3       12.7
    1430         6.9        4.9      68.8       12.6
    1470         7.0        5.2      78.7       27.5
    1510         6.9        5.8      75.1       46.1
    1550         6.9        5.8      94.8       43.7
    1590         6.9        5.8      98.3       43.6
    1630         6.9        5.9      73.6       44.2

   Table 9 – Average Query Count for Buffer Size 1,232 Test

The elapsed time to complete resolution rises once the DNS payload exceeds 1,232 octets, and there is on average a further 100ms to complete the resolution process for these larger packets. This is due to the overheads of the truncated DNS response and the TCP handshake time for these response sizes.

                     Pass                  Fail
   DNS Size     1232    Control       1232    Control
    1150         185                7,118
    1190         185                7,375
    1230         184        66      7,049       5,801
    1270         290        68     18,805       5,829
    1310         289        68     18,725       5,970
    1350         290        71     18,986       6,107
    1390         290        72     18,809       6,163
    1430         290        73     18,594       6,190
    1470         293       247     17,958      12,385
    1510         292     1,221     18,193      20,842
    1550         290     1,230     17,933      19,708
    1590         292     1,232     18,162      19,605
    1630         295     1,250     18,060      19,799

   Table 10 – Average Query Time (ms) for Buffer Size 1,232 Test

With an overall loss rate of 1.8% for DNS payloads larger than 1,232 octets, the obvious question is whether we can improve on this scenario. What if we lift the buffer size to just below the onset of UDP packet fragmentation, namely to 1,440 octets?

4. Buffer Size of 1,440 octets

Let's now look at the scenario of lifting the threshold point to switch to TCP to just below a packet size of 1,500 octets. We will force all queries to use a buffer size setting of 1,440 octets. We know from the all-UDP experiment (Table 1) that there is an elevated response loss rate when the DNS payload size in UDP exceeds the resolver client's specified buffer size in the query, and this is visible in Table 11. This appears to account for a minimum of some 2% of the 2.6% observed failure rate for these smaller-sized packets. The UDP loss rate for this size range exceeds the TCP loss rate that we observed in Table 8, where the lower buffer size setting of 1,232 octets was used.
   Forced 1440 Buffer Size
   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,113,090    99.5%    0.5%    0.6%    0.6%
    1190     1,113,104    99.5%    0.5%    0.6%    0.6%
    1230     1,111,703    99.4%    0.6%    0.7%    0.6%     0.5%
    1270     1,114,563    97.5%    2.5%    2.3%    2.9%     0.5%
    1310     1,113,632    97.5%    2.5%    2.3%    2.9%     0.5%
    1350     1,113,669    97.4%    2.6%    2.3%    3.0%     0.5%
    1390     1,115,152    97.5%    2.5%    2.3%    2.9%     0.5%
    1430     1,114,069    97.2%    2.8%    2.6%    3.2%     0.5%
    1470     1,111,607    98.3%    1.7%    1.8%    1.7%     1.0%
    1510     1,112,349    98.3%    1.7%    1.8%    1.8%     2.4%
    1550     1,112,795    98.3%    1.7%    1.9%    1.7%     2.6%
    1590     1,112,351    98.3%    1.7%    1.8%    1.8%     2.6%
    1630     1,112,523    98.3%    1.7%    1.8%    1.7%     2.6%

   Table 11 – Failure Rate on Buffer Size 1,440 Test

The UDP average query count is uniformly low up until the TCP point, and the truncation and switch to TCP lifts the average query count for successful resolution efforts by slightly over 2 queries. The unsuccessful query count is more than quadrupled when there is a shift to TCP (Table 12).

                     Pass                 Fail
   DNS Size     1440    Control      1440    Control
    1150         4.3                 29.3
    1190         4.3                 26.9
    1230         4.3        4.8      28.4       12.3
    1270         4.3        4.9      28.9       12.0
    1310         4.3        4.9      30.8       12.0
    1350         4.3        4.9      30.8       12.5
    1390         4.3        4.9      29.1       12.7
    1430         4.3        4.9      29.5       12.6
    1470         6.6        5.2     154.6       27.5
    1510         6.6        5.8     142.7       46.1
    1550         6.6        5.8     133.2       43.7
    1590         6.8        5.8     187.2       43.6
    1630         6.7        5.9     180.4       44.2

   Table 12 – Average Query Count for Buffer Size 1,440 Test

The UDP-based retrieval is also considerably faster than TCP, completing the resolution in an average of 130ms, compared to 260ms, which is consistent with the overheads of the TCP connection (Table 13).

                     Pass                  Fail
   DNS Size     1440    Control       1440    Control
    1150         156               26,500
    1190         133               23,522
    1230         134        66     24,315       5,801
    1270         131        68     24,761       5,829
    1310         138        68     25,378       5,970
    1350         157        71     25,027       6,107
    1390         128        72     24,930       6,163
    1430         142        73     24,375       6,190
    1470         274       247     25,677      12,385
    1510         228     1,221     26,329      20,842
    1550         265     1,230     25,950      19,708
    1590         267     1,232     25,610      19,605
    1630         247     1,250     26,266      19,799

   Table 13 – Average Query Time (ms) for Buffer Size 1,440 Test

This data suggests that the lower buffer size of 1,232 octets is more robust for resolvers, but it will add delays in resolution time and impose a greater query load on the server, both in terms of the TCP control overhead and the additional query volume for responses whose size falls into the range of 1,232 to 1,440 octets. It is possible, even likely, that the loss rate would fall were resolvers to use a default buffer size of 1,440 octets rather than 1,232 octets. The issue here appears to be application-level settings disregarding received packets, and not an intrinsic behavioural property of the network path between the servers and recursive resolvers.

5. Buffer Size of 1,440 octets, TCP MSS of 1,200

There is another variant to examine here, and that is to try to reduce the incidence of TCP path MTU issues. One way to achieve this is to drop the MTU setting on the server, so that it will not push out 1,500-octet IP packets. Another way is to modify the MSS of incoming TCP connection packets and rewrite the MSS to a lower value. In this experiment we've used the approach of rewriting the MSS on incoming TCP SYN packets, changing the MSS value to 1,200 octets. This should reduce the TCP failure rate where the server sends the DNS data and does not receive an ACK for the data.
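For illustration, one way a server operator could cap the MSS it announces, without touching packets on the wire, is to set the TCP_MAXSEG socket option on the listening socket before accepting connections, as in the hedged sketch below. This is Linux-specific behaviour and is not the method used in this experiment, which rewrote the MSS field of incoming SYN packets instead.

    import socket

    # Hedged sketch: cap the MSS announced by a DNS-over-TCP listener so that no
    # outgoing segment assumes a 1,500-octet path MTU. On Linux, a TCP_MAXSEG value
    # set before listening is generally inherited by accepted connections.
    listener = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1200)
    listener.bind(("::", 53))    # binding to port 53 requires suitable privileges
    listener.listen(128)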
   1440 Tests
   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,202,770    99.5%    0.5%    0.5%    0.1%
    1190     1,207,607    99.5%    0.5%    0.5%    0.1%
    1230     1,205,935    99.4%    0.6%    0.5%    0.1%     0.5%
    1270     1,206,166    97.5%    2.5%    1.1%    0.2%     0.5%
    1310     1,204,420    97.5%    2.5%    1.1%    0.2%     0.5%
    1350     1,205,097    97.4%    2.6%    1.1%    0.2%     0.5%
    1390     1,204,737    97.5%    2.5%    1.1%    0.2%     0.5%
    1430     1,204,415    97.2%    2.8%    1.5%    0.9%     0.5%
    1470     1,205,472    98.3%    1.7%    1.7%    1.6%     1.0%
    1510     1,208,416    98.3%    1.7%    1.8%    1.6%     2.4%
    1550     1,207,806    98.3%    1.7%    1.7%    1.6%     2.6%
    1590     1,205,885    98.3%    1.7%    1.7%    1.6%     2.6%
    1630     1,206,097    98.3%    1.7%    1.7%    1.6%     2.6%

   Table 14 – Failure Rate on Buffer Size 1,440, MSS 1,200 Test

There is a very small change in the failure rate for DNS responses larger than 1,500 octets, and the change is around 0.1% (Table 14). The change improves the IPv6 performance, dropping the failure rate for larger packets from 1.7% to 1.6%.

The query count profile is largely unaltered, as one would expect, although the level of query thrashing for large responses that fail in TCP is higher. Once the issue of TCP “black hole” failure is removed, the other failure cases relating to TCP become the dominant factor, and the number of TCP queries that are made in 60 seconds increases once the stalled TCP sessions are eliminated (Table 15).

                     Pass                 Fail
   DNS Size     1440    Control      1440    Control
    1150         4.3                 27.0
    1190         4.3                 27.2
    1230         4.3        4.8      26.0       12.3
    1270         4.3        4.9      27.8       12.0
    1310         4.3        4.9      29.1       12.0
    1350         4.3        4.9      28.9       12.5
    1390         4.3        4.9      26.8       12.7
    1430         4.3        4.9      26.8       12.6
    1470         7.2        5.2     171.6       27.5
    1510         7.0        5.8     205.5       46.1
    1550         7.2        5.8     172.2       43.7
    1590         7.2        5.8     187.5       43.6
    1630         7.0        5.9     228.9       44.2

   Table 15 – Average Query Count for Buffer Size 1,440, MSS 1,200 Test

                     Pass                  Fail
   DNS Size     1440    Control       1440    Control
    1150         178               24,725
    1190         189               23,700
    1230         169        66     23,283       5,801
    1270         201        68     23,550       5,829
    1310         195        68     24,592       5,970
    1350         177        71     23,693       6,107
    1390         167        72     22,922       6,163
    1430         172        73     21,919       6,190
    1470         408       247     29,350      12,385
    1510         494     1,221     29,500      20,842
    1550         586     1,230     29,800      19,708
    1590         506     1,232     29,676      19,605
    1630         663     1,250     30,189      19,799

   Table 16 – Average Query Time (ms) for Buffer Size 1,440, MSS 1,200 Test

The profile of time to resolve is also similar, although the elapsed time for larger responses is somewhat longer (Table 16).

6. Max Buffer Size of 1,232 octets, TCP MSS of 1,200

So far, we have assumed a model where the resolver client is in control of the onset of UDP fragmentation by using the buffer size parameter in the EDNS(0) extension attached to a DNS query. Some DNS implementations also allow the server to influence the onset of UDP fragmentation in DNS responses over UDP. In the BIND resolver the configuration option is the max-udp-size value:

   max-udp-size
   Sets the maximum EDNS UDP message size named will send in bytes. Valid values
   are 512 to 4096 (values outside this range will be silently adjusted). The
   default value is 4096. The usual reason for setting max-udp-size to a
   non-default value is to get UDP answers to pass through broken firewalls that
   block fragmented packets and/or block UDP packets that are greater than 512
   bytes. This is independent of the advertised receive buffer (edns-udp-size).

The intent of this setting is to allow the server to set its own maximum UDP response size. If the query provides a lower value for the buffer size then the server will use it, but if the query has a higher buffer size value, then this local setting will be used. What happens when we combine this approach with the server-side imposed TCP MSS value of 1,200?
The results of this experiment are shown in Table 17.

   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,198,284    99.2%    0.8%    1.0%    0.7%
    1190     1,196,442    99.2%    0.8%    1.1%    0.7%
    1230     1,196,874    99.2%    0.8%    1.1%    0.7%     0.5%
    1270     1,196,063    98.3%    1.7%    2.4%    1.3%     0.5%
    1310     1,198,020    98.4%    1.6%    2.5%    1.3%     0.5%
    1350     1,197,269    98.3%    1.7%    2.5%    1.4%     0.5%
    1390     1,196,841    98.4%    1.6%    2.4%    1.3%     0.5%
    1430     1,198,235    98.3%    1.7%    2.4%    1.3%     0.5%
    1470     1,196,930    98.3%    1.7%    2.5%    1.3%     1.0%
    1510     1,196,544    98.3%    1.7%    2.5%    1.3%     2.4%
    1550     1,197,824    98.3%    1.7%    2.4%    1.3%     2.6%
    1590     1,197,728    98.3%    1.7%    2.5%    1.4%     2.6%
    1630     1,196,426    98.3%    1.7%    2.5%    1.3%     2.6%

   Table 17 – Failure Rate on Buffer Size max 1,232 with 1,200 TCP MSS Test

The change here is that we are avoiding the case where the client drops the response because it is larger than the client's originally specified maximum UDP response size. Because no UDP response is larger than 1,232 octets of payload, all intermediate-sized responses (1,270 to 1,430 octets) and large responses (larger than 1,430 octets) switch to TCP, and the larger TCP failure rate (of some 1.7%) kicks in. As observed already, the TCP failure rate for IPv4 resolvers is almost double the IPv6 failure rate.

The profile of the number of queries (Table 18) and the time to resolve the name (Table 19) is largely similar to the previous case.

                      Pass                  Fail
   DNS Size   <=1232    Control    <=1232    Control
    1150         6.0                251.2
    1190         6.0                261.6
    1230         6.1        4.8     268.8       12.3
    1270         7.6        4.9     160.7       12.0
    1310         7.7        4.9     185.5       12.0
    1350         7.6        4.9     170.1       12.5
    1390         7.6        4.9     162.9       12.7
    1430         7.7        4.9     154.7       12.6
    1470         7.8        5.2     152.4       27.5
    1510         7.5        5.8     140.8       46.1
    1550         7.6        5.8     179.9       43.7
    1590         7.6        5.8     143.1       43.6
    1630         7.6        5.9     172.3       44.2

   Table 18 – Average Query Count for Buffer Size max 1,232, MSS 1,200 Test

                      Pass                   Fail
   DNS Size   <=1232    Control    <=1232     Control
    1150         412               12,556
    1190         526               11,257
    1230         426        66     11,803       5,801
    1270         609        68     28,076       5,829
    1310         545        68     29,785       5,970
    1350         510        71     29,608       6,107
    1390         489        72     29,621       6,163
    1430         521        73     28,852       6,190
    1470         561       247     29,070      12,385
    1510         687     1,221     29,540      20,842
    1550         589     1,230     27,622      19,708
    1590         590     1,232     29,055      19,605
    1630         491     1,250     28,403      19,799

   Table 19 – Average Query Time (ms) for Buffer Size max 1,232, MSS 1,200 Test

7. Max Buffer Size of 1,440 octets, TCP MSS of 1,200

This case is similar to case 6, but with the UDP-to-TCP threshold lifted to 1,440 octets.

   Max 1440 Tests
   DNS Size      Tests     Pass    Fail    IPv4    IPv6   Control
    1150     1,199,907    99.2%    0.8%    0.9%    0.6%
    1190     1,201,395    99.2%    0.8%    1.0%    0.7%
    1230     1,201,890    99.2%    0.8%    0.9%    0.7%     0.5%
    1270     1,200,722    99.2%    0.8%    1.0%    0.6%     0.5%
    1310     1,201,045    99.2%    0.8%    1.0%    0.7%     0.5%
    1350     1,200,161    99.2%    0.8%    0.9%    0.7%     0.5%
    1390     1,200,845    99.2%    0.8%    0.9%    0.7%     0.5%
    1430     1,201,287    99.2%    0.8%    1.0%    0.6%     0.5%
    1470     1,200,239    98.3%    1.7%    2.1%    1.4%     1.0%
    1510     1,202,629    98.3%    1.7%    2.1%    1.4%     2.4%
    1550     1,200,767    98.3%    1.7%    2.1%    1.4%     2.6%
    1590     1,202,288    98.3%    1.7%    2.1%    1.4%     2.6%
    1630     1,203,165    98.3%    1.7%    2.1%    1.4%     2.6%

   Table 20 – Failure Rate on Buffer Size max 1,440 with 1,200 TCP MSS Test

The outcomes for IPv4 and IPv6 non-fragmented packets in Table 20 are slightly better than the results in Table 14, particularly for DNS response sizes in the range 1,270 to 1,470 octets. It appears that some 2% of users sit behind recursive resolvers that will check the UDP DNS response size against the buffer size in the original query and reject the response if it is larger than the query-specified size.
                      Pass                  Fail
   DNS Size   <=1440    Control    <=1440    Control
    1150         6.2                303.6
    1190         6.1                183.5
    1230         6.1        4.8     273.8       12.3
    1270         6.1        4.9     229.8       12.0
    1310         6.0        4.9     253.0       12.0
    1350         6.0        4.9     245.2       12.5
    1390         6.3        4.9     215.2       12.7
    1430         6.0        4.9     215.5       12.6
    1470         7.5        5.2     170.8       27.5
    1510         7.7        5.8     187.5       46.1
    1550         7.4        5.8     129.2       43.7
    1590         7.7        5.8     151.2       43.6
    1630         7.7        5.9     148.7       44.2

   Table 21 – Average Query Count for Buffer Size max 1,440, MSS 1,200 Test

                      Pass                   Fail
   DNS Size   <=1440    Control    <=1440     Control
    1150         340               11,138
    1190         344               10,573
    1230         331        66     10,506       5,801
    1270         317        68     10,696       5,829
    1310         384        68     11,113       5,970
    1350         359        71     11,701       6,107
    1390         318        72     10,319       6,163
    1430         314        73     11,497       6,190
    1470         405       247     27,144      12,385
    1510         497     1,221     27,214      20,842
    1550         411     1,230     26,893      19,708
    1590         376     1,232     27,901      19,605
    1630         388     1,250     27,268      19,799

   Table 22 – Average Query Time (ms) for Buffer Size max 1,440, MSS 1,200 Test

The number of queries (Table 21) and the query time (Table 22) show a marked performance improvement for intermediate-sized responses, as would be expected.

Conclusions

Let's collect the results of these individual experiments into a single table that looks at the failure rates for the various packet size management scenarios (Table 23).

There is a set of design trade-offs in the choice of transport for the DNS protocol. For short responses UDP is an efficient and reliable transport vehicle. However, when the size of the UDP response is larger than the network path MTU and UDP fragmentation is required, then fragmentation packet losses create serious problems for the protocol, and it becomes unreliable. For that reason, TCP will be far more reliable than fragmented UDP for larger responses on average. However, TCP is slower and far less efficient than UDP, and its basic reliability rate is worse than unfragmented UDP. If carriage efficiency and reliability are considerations for the DNS, then unfragmented UDP is clearly superior to TCP, while TCP is clearly superior to fragmented UDP.

   Failure Rates
   DNS Size   Control   512 (TCP)   4096 (UDP)    1232   <=1232    1440   <=1440
    1150                     1.6%         0.4%    0.5%     0.5%    0.5%     0.5%
    1190                     1.6%         0.4%    0.5%     0.5%    0.5%     0.5%
    1230         0.5%        1.6%         0.4%    0.6%     0.6%    0.6%     0.6%
    1270         0.5%        1.6%         1.9%    1.6%     1.7%    2.5%     0.6%
    1310         0.5%        1.6%         1.8%    1.6%     1.6%    2.5%     0.6%
    1350         0.5%        1.6%         1.8%    1.6%     1.7%    2.6%     0.6%
    1390         0.5%        1.6%         1.9%    1.6%     1.6%    2.5%     0.6%
    1430         0.5%        1.6%         2.2%    1.6%     1.7%    2.8%     0.6%
    1470         1.0%        1.8%         3.1%    1.8%     1.7%    1.7%     1.7%
    1510         2.4%        1.8%        15.0%    1.8%     1.7%    1.7%     1.7%
    1550         2.6%        1.8%        15.0%    1.8%     1.7%    1.7%     1.7%
    1590         2.6%        1.8%        14.9%    1.8%     1.7%    1.7%     1.7%
    1630         2.6%        1.8%        14.9%    1.8%     1.7%    1.7%     1.7%

   Table 23 – Summary of Failure Rates

What this means is that UDP should be used for as long as it will not encounter fragmentation, and then the DNS should shift to TCP. How can this be achieved? It is unreasonable to expect that a lightweight UDP-based packet exchange should perform a path MTU discovery operation for each and every transaction. This implies that both the client and the server should use conservative settings for transport parameters that avoid path MTU issues.
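The specific values in the recommendations that follow all come from the same header arithmetic, applied to a 1,500-octet path MTU. A short worked calculation:

    # Header arithmetic behind the recommended settings, assuming a 1,500-octet path MTU.
    MTU = 1500          # octets
    IPV4_HEADER = 20    # octets, no IP options
    IPV6_HEADER = 40    # octets, fixed header, no extension headers
    UDP_HEADER = 8      # octets
    TCP_HEADER = 20     # octets, no TCP options

    # Largest DNS payload that fits in a single unfragmented UDP packet:
    print(MTU - IPV4_HEADER - UDP_HEADER)    # 1472 octets over IPv4
    print(MTU - IPV6_HEADER - UDP_HEADER)    # 1452 octets over IPv6

    # Largest TCP MSS that still fits within the path MTU:
    print(MTU - IPV4_HEADER - TCP_HEADER)    # 1460 octets over IPv4
    print(MTU - IPV6_HEADER - TCP_HEADER)    # 1440 octets over IPv6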
What should a DNS client do? The DNS Flag Day 2020 settings are a good start, but I think that they don't quite capture the entirety of the space. A client should use an EDNS(0) payload size setting equal to or less than 1,452 octets in IPv6 (accounting for a 40-octet IPv6 header and an 8-octet UDP header), and 1,472 octets in IPv4 (accounting for a 20-octet IPv4 header and an 8-octet UDP header). For TCP, a client should also use a TCP MSS setting of less than 1,440 octets in IPv6 (accounting for a 40-octet IPv6 header and a 20-octet TCP header) and 1,460 octets in IPv4 (accounting for a 20-octet IPv4 header and a 20-octet TCP header).

What should a DNS server do? The server should also avoid fragmentation, and it can do this by setting a maximum payload size value no larger than 1,452 octets in IPv6 and 1,472 octets in IPv4. It should also impose a ceiling on the size of outgoing TCP packets of 1,440 octets in IPv6 and 1,460 octets in IPv4.

Specific circumstances vary, and there is a difference between measurements at the edge of the Internet and within the infrastructure of the network. Our extensive measurements of the behaviour of the inner infrastructure of the Internet, between recursive resolvers and authoritative servers, indicate that the network behaviour is relatively uniform with IP packet sizes up to 1,500 octets. If we restrict ourselves to settings that relate only to the transactions between recursive resolvers and authoritative servers, then the DNS Flag Day 2020 setting of 1,232 octets is too low. The result is that the transaction will invoke TCP too early. A more efficient outcome can be achieved by pushing the UDP packet size to 1,500 octets including the IP header. At the same time, it is prudent to pull the TCP segment size down. The incremental performance cost of using a 1,200-octet MSS value is extremely small when looking at DNS transactions.

This leads to some recommendations for transport parameter values for DNS clients and servers, shown in Table 24. The intent of these settings is to use UDP all the way to 1,500 octets of IP packet size, then use TCP with a more conservative MSS setting that increases the reliability of TCP sessions.

                                    IPv4     IPv6
   Client EDNS(0) Buffer Size      1,472    1,452
   Client TCP MSS                  1,200    1,200
   Server Max Buffer Size          1,472    1,452
   Server Max TCP MSS              1,200    1,200

   Table 24 – Summary of DNS transport settings, recursive to authoritative

It must be noted that these settings apply only to the “inside” of the Internet, on the path between recursive resolvers and authoritative servers. The edge of the Internet shows greater levels of variability, and it is probably prudent to use a lower UDP upper bound there, although this is an aspect of the DNS where our measurement technique cannot gain a direct insight, so we've refrained from making any particular recommendations for the edge stub-to-recursive resolver scenario. It's likely that the TCP MSS setting of 1,200 octets would still make sense, but it is less clear whether the higher buffer size parameter is equally applicable at the edge.


Disclaimer

The above views do not necessarily represent the views or positions of the Asia Pacific Network Information Centre.

Author

Geoff Huston is the Chief Scientist at APNIC, the Regional Internet Registry serving the Asia Pacific region.

www.potaroo.net