ISP Column - June 2006 An occasional column on things Internet The BGP Report for 2005 June 2006 Geoff Huston So how's the Internet's inter-domain routing system getting along these days? Some time back in this column I looked at the state of inter-domain routing, and speculated as to how it could evolve (see "The State of Inter-Domain Routing", March 2004, and the earlier RFC (a "Commentary on Inter-Domain Routing in the Internet", RFC 3221). At the time it looked as if we'd be seeing some very real scaling problems with inter-domain routing, where the routing system was growing at a rate that appeared to outstrip router hardware capabilities, and the two forward trend lines of routing requirements and router capabilities would meet sometime around 2003 to 2005 (IETF Plenary Presentation, March 2001). That was some years ago, and now its 2006. So what's changed since then, and where are we with inter-domain routing? The first piece of news, and maybe its not so surprising, is that it still a BGP version 4 inter-domain routing world as far as the Internet is concerned, and nothing substantive has changed in the protocol we use today over what was in use over 12 years ago. The larger these system become the more inertial mass they accumulate, and fundamental change becomes harder to deploy. So I'll hazard the guess that nothing much in inter-domain routing technology is going to change in the near future. While the Internet used to take some comfort in its ability to perform feats of rapid deployment of innovative technologies up and down the protocol stack to address various forms of growing pains, these days the lower layers of the protocol stack are accreting significant levels of inertia, and it's the upper levels of the stack are left to carry the innovation burden. Routing is, perhaps unfortunately, an inhabitant of one of these lower levels of the protocol stack, while much of the innovative agenda is taking place at the application level. The Border Gateway Protocol really has not changed at all in its almost two decades of deployment. BGP remains a classic distance vector protocol, using an explicitly enumerated path vector as a combined path metric and loop detector. Indeed the introduction of 32-bit AS numbers to BGP could be argued as one of the larger forthcoming changes to the BGP protocol since the introduction of explicit address prefix mask (Classless Inter-Domain Routing, or "CIDR") back in 1994, and even this change is a relatively minor change to the protocol. Given that its just plain old BGP, and given that we're likely to be stuck with it for some years to come, whether its an IPv4, IPv6 or mixed protocol world, than now is as good a time as any to ask how BGP is going, and to see if we can make some guesses as to what kind of routing load BGP will be required to cope with in the coming years. There are a large number of measurements of the BGP routing table that can describe the dimensions and dynamic characteristics of the inter- domain internet. Here I'd like to concentrate on the use and behaviour of the protocol itself, so in this article I will take a look at BGP acro the year of 2005, and see how well BGP fared. BGP The Protocol To recap from last month's article, BGP is a distance vector routing protocol, as distinct from a link-state routing protocol or a map-based routing protocol. BGP is a distributed computation that uses addre prefixes as its basic unit of routing. Each BGP speaker maintains a set of tables (Routing Information Bases, or RIBs) - one for each BGP neighbour and one for its own internal use for forwarding. BGP keeps a copy of all prefixes and associated routes that have been advertised by its peers (Adjacency-RIB-IN). It selects the "best" of these routes to use for its local forwarding decisions (Local-RIB), and sends a copy of this "best" route to all its peer (Adjacency-RIB-OUT). Like any distance-vector routing protocol, BGP operates as a loosely synchronized distributed computation based on partial information forwarding. A BGP peer session uses TCP a reliable transport protocol, so that periodic re-flooding of the route tables, so beloved by the interior routing protocol RIP, is not required in BGP. BGP is a far more parsimonious protocol where once a BGP session has been set up and the initial route set is exchanged, then the subsequent protocol traffic is limited to notification of a prefix that is no longer reachable, or when the characteristics of the local "best" route have changed and the local BGP instance wants to inform it neighbouring peers. This information is passed in a BGP update message. This protocol message contains a collection of route attributes, and a list of prefixes that share this attribute set (announcements) and a set of prefixes that are no longer reachable (withdrawals). If the entire network is perfectly stable, with no changes of any form, then BGP would be a very quiet protocol, with only the intermittent (30 second by default) exchange of keepalive message to indicate any activity at all. On the other hand, a large dynamic network where prefixes are appearing and disappearing, and where paths are created and lost, such as in the Internet, is capable of generating a relatively impressive set of updates in very small time intervals. Each received update represents work to be undertaken. The incoming update message causes a change in the Adjacency-RIB-IN. If the information is a prefix withdrawal, then a comparison needs to be made with the local RIB. If there is a match, then all other Adjacency-RIB-Ins need to be scanned and a new "best" route installed into the local RIB, as well as loading new announcement messages in the Adjacency-RIB-OUTs to reflect this local change of best path. If there are no other candidate routes in the other RIB-IN's then the route is withdrawn from the local RIB and a withdrawal message is passed to the BGP Speaker's peers. If the incoming update message is an announcement, then the BGP engine ha to update the Adjacency-RIB-IN and then compare this route to the current best path in the Local-RIB. If this new route represents a "better" path, then the Local-RIB is updated and announcement messages are queued in all the Adjacency-RIB-OUTs. In terms of protocol workload and routing stability its not the size of the BGP routing table that is the critical issue - it's the dynamic characteristics of BGP update messages. The longer the delay in processing update messages the longer the time for the entire system to converge upon a stable routing state that reflect optimised paths across the inter-domain space, and the larger the number of intermediate messages that are generated during thi process of convergence, which in turn compounds the problem. At the extreme case the local BGP engine will exhaust its incoming BGP message buffer and fail to process updates. At this stage there i the potential for inconsistent information to be embedded in the routing system, leading to loops and black holes in the routing system. This is the point at while the routing could be said to have "collapsed". Looking at the BGP update rate, and in particular the relative rate of growth of the BGP routing table as compared to the rates of growth of update messages, and updated prefixes can give us a helpful indicator of the pressures for growth in the routing system, and also an indicator of what size router we'll need to use to cover the Internet's routing system in the coming years. So what can we say about the Internet and inter-domain routing in 2005? Lets have a look at a number of vital statistics for the year. The following graphs were generated from a stream of one-hourly 'snapshots' of the routing table across 2005, taken from the boundary of AS1221. [Fig 1. The number of IPv4 BGP Prefixes] [Fig 2. The total span of IPv4 address space in the routing table] [Fig 3. The number of AS numbers in the routing table] The IPv4 data can be summarised as follows: Prefixes 148,000 - 175,400 +18% +26,900 entries Prefix Roots 72,600 - 85,500 +18% +12,900 entries More Specifics 77,200 - 88,900 +18% +14,000 entries Addresses 80.6 - 88.9 (/8s) +10% +8.3 /8s ASNs 18,600 - 21,300 +14% 2,600 ASNs What this table indicates is that for the IPv4 Internet the use of aggregates in the routing system has not improved. The average size of advertisements is getting smaller in terms of address span per routing table entry, the span of originating addresses per AS i getting smaller, the average AS path length is constant at around 3.5 AS hops and the number of AS's is increasing, and the interconnection degree of AS's is getting higher. The implication i that the granularity of the inter- domain routing system continues to get finer and the density of interconnection is getting greater. For a distance vector protocol such as BGP is not heartening news. A similar exercise has been done for IPv6 for 2005: [Fig 4. The number of IPv6 BGP Prefixes] [Fig 5. The total span of IPv6 address space in the routing table] [Fig 6. The number of AS numbers in the routing table] The IPv6 data can be summarised as follows: Prefixes 700 - 850 +21% +150 entries Prefix Roots 555 - 640 +15% +185 entries More Specifics 145 - 210 +51% +65 entries Addresses 9.0 - 13.5 +50% +4.5 ASNs 500 - 600 +20% +100 ASNs Its far harder to make generalizations about the trends in the IPv6 network over 2005, as the IPv6 network is simply not large enough to show any overall trend behaviour as yet. However the IPv4 trends for 2005 are a source of some concern. How big can the Internet grow in the coming years? Will we continue to be able to deploy routers in the default-free routing zone of the Internet that can comfortably route the Internet. Can we add additional functionality into the routing system and still stay within comfortable limits of the capability of the routing system and the routers? If you are an ISP and are considering purchasing new 'core' routers what capabilities should you specify for an operational lifetime of 2 years? How about for the next 5 years? And if you are a router vendor designing routing products for the market 3 or 5 years in the future what capacity should you build into the router? How much processing capacity should you plan for to support default-free BGP? How much memory is necessary? These are all relevant questions, of course, so the next question i what data can we gather to attempt to provide some likely answers? These snapshots give us some rough information about likely trends, but to provide a more reasoned response its useful to take a more detailed examination of BGP over the year. Perhaps the best question to pose here is: how have these overall trends manifested themselves in the operation of the BGP protocol? For this exercise a BGP measurement point was set up inside AS1221, and all BGP protocol messages (or "updates") that were passed within that network were recorded with a timestamp on a logging host. The update data was processed to eliminate the internal routing change and the set of exterior BGP updates was analysed. Only the IPv4 BGP traffic is reported here. The aim here is to see if there are some trend data that we can extract from the assembled update logs for the year and make some predictions about overall BGP capacity requirements in the coming years. Update messages per Day The data set is admitted large - some 146 million BGP update messages were recorded for the entire year. One way of breaking down this data is looking at the number of BGP Update messages per day. On a daily basis the number of update messages appears to have almost doubled for 2005, starting from some 260,000 update message per day at the start of 2005 to some 550,000 update messages per day by the end of the year. Considering that even by the end of the year there were only 170,000 prefixes in the global routing table, to have this routing population generate 550,000 updates messages per is an impressive achievement. This is a growth rate that is much higher than the growth in the table size. Either the network is far less stable than we'd like to believe, or some other factor i driving up the BGP update rate. The increasing density of interconnection in the inter-domain space may be relevant to thi very high growth rate. [Fig 7. BGP Update messages per Day] The other interesting observation is that BGP has 'good' days and "bad" days - one day in November recorded 1 million update message in a single day. This is a very high level of variation, and it indicates a level of instability in the Internet that is not clearly evident at the user level, where most users tend to see a relatively stable and reliable Internet service. Prefixes per Update Message Why has the number of Update messages increased so significantly? The daily update rate has doubled over the year, while the size of the routing table itself increased by a much smaller growth factor of 18%. Each BGP update messages contains a number of prefixes. One question to ask is whether the number of prefixes in each update message is increasing or decreasing on average. The daily average number of prefixes per update message is The next area of interest is the average number of prefixes contained in each update message. On average there were between 8.1 and 8.3 prefixes per originating AS across 2005, and if it is really the case that prefixes are managed in a manner such that each AS has a single coherent routing policy then we would expect to see a relatively consistent number of prefixes in each BGP update message. This i not the case, and the number of prefixes per update message declined over the year. [Fig 8. Daily Average number of Prefixes per Update Message] The inevitable conclusion here is that the "unit" of inter-domain routing appears to be converging closer to the level of an individual prefix than to an individual AS. The implication here i that if we wish to contemplate a new routing system based on inter-AS connectivity then we need to understand the extent of the number of unique routing policies that must be encompassed in such an environment, and their dynamic behaviour. Again the level of daily variation in this average is very high, and while a least squares best fit indicates an overall downward trend for 2005 from 2.4 prefixes per update message at the start of the year to 2.3 prefixes per update message at the end of the year. The high 'spikes' of this measure on some individual days indicates some form of BGP session resets, where a number of peering sessions may have been reset on a day and the resultant reconstruction of the BGP peering session would normally use dense packing of a large number of prefixes in each update message. But there are on average some 8 prefixes per AS, and the average of a little over 2 prefixes per update message appears to indicate a use of fine-grained routing policies at a level finer than an AS. It would appear that the 'unit' of a BGP routing policy is more fine-grained than an AS, and is now heading towards the level of each advertised prefix having individual routing policies and individual attributes. This implie that the efforts of BGP to compress the update load by grouping prefixes into bundles is no longer as effective as it may have been in the past as a measure of assisting in making BGP an efficient routing protocol. So if we want to look at the trends in BGP, perhaps we should be looking at the update and withdrawal rates of individual prefixes, rather than looking at the level of BGP protocol update messages. So what data is available for the number of prefix updates across 2005? Prefix Update and Withdrawal Rate A similar approach has been made to look at the average number of prefixes that are updates each day in BGP. As Prefixes may be withdrawn or updated, the following graph shows the update and withdrawals per day, counting the number of prefixes in each category [Fig 9. Daily average prefix count of updates and withdrawal] Again the high level of daily variation is visible, and there is now a clearer indication of when there were full BGP session reset without backup paths (high withdrawal and update counts) and BGP re-routing (high update count without a corresponding high withdrawal count). [Fig 10. Prefix Update Count] Here the trend across 2005 is visible for updates. The trend line here is an exponential curve best fit, with an overall growth trend from 570,000 prefixes updated per day at the start of the year to some 850,000 prefixes being updated each day by the end of the year. Again that is a very high growth rate, and it should also be remembered that there are, on average some 165,000 unique prefixe in the Internet's routing table. Clearly some prefixes are evidently generating a very high number of updates on a daily basis. A similar trend is visible in the prefix withdrawal counts for 2005 [Fig 11. Prefix Withdrawal Count] Again an exponential curve best fit trend has been plotted against the withdrawal counts, and the withdrawal count has grown from some 160,000 prefixes being withdrawn on a daily basis at the start of the year to some 340,000 withdrawn prefixes per day by the end of the year. Trend Behaviour in BGP The next question is to relate these prefix update and withdrawal rates against the BGP table size, and look at the likely trends of the load of the BGP protocol in terms of prefix update and withdrawal rates against the trend of the projections of growth of the BGP table itself. The BGP table size over the period from 2002 until the start of 2006 is shown in the following figure. [Fig 12. BGP Prefix Table Size] In this figure the raw data of hourly snapshots (the blue line) ha been smoothed as part of the first step in generating a trend projection. The next step is to take the first order differential of the smoothed data series. [Fig 13. First order differential of BGP Table Size] The linear approximation of the first order differential can be fitted to a trend of an O(2) polynomial trend in the BGP table size. This allows a trend projection in the BGP table over the next 3 - 5 years using this O(2) polynomial, as shown in the figure below. [Fig 14. BGP Table Size Projection] If current trends in BGP continue for the next 3 - 5 years then thi model predicts that the BGP routing table will grown from the level of some 176,000 entries at the end of 2005 to 275,000 entries at the end of 2008 and some 370,000 prefixes by the end of 2010. It is possible to use this predictive model to also forecast the amount of BGP update activity. In this model the starting point is the trend of the number of prefix updates and withdrawals per BGP routing table entry across 2005. [Fig 15. Relative Prefix Update and Withdrawal Rates per BGP Table Entry These trend lines can then be applied to the BGP projection model, as shown in the next figure. [Fig 16. Prefix Update Rate Projection] The projections of BGP activity from this model indicate a growth rate of some 1.7 million prefix updates per day by the end of 2008 and 2.8 million prefix updates per day by the end of 20010. That' four times the update rate as of the end of 2005. A similar growth trend is forecast for prefix withdrawal rates, to 0.9 million withdrawals per day by the end of 2008 and 1.6 million withdrawal by the end of 2010. This implies a CPU processing load that will increase by a similar factor over this 3 to 5 year period. These projections are summarized in the following table: DateBGP Table Size Daily Prefix Updates Daily Prefix Withdrawal End 2005 176,000 700,000 400,000 End 2008 275,000 1,700,000 900,000 End 2010 370,000 2,800,000 1,600,000 Some Observation Any projection of this nature is ultimately a guess about a potential future here, but irrespective of the precise values in these projections it is evident that there are some accelerating factors within BGP that tend to suggest that the 'load' of BGP, in terms of processing update messages and in terms of processor cycle (update-related processing) is growing faster than the memory requirements and the forwarding decision structure (table size-related aspects). It appears that the combination of finer levels of granularity of routing information in the routing system, denser levels of interconnectivity in the network, greater levels of policy discrimination in the routing system are all combining to create the picture of a system that is increasingly sensitive to perturbation and increasingly difficult to discover and stabilise on a new converged state following each dynamic change. It would appear that these factors of BGP 'load' are growing far faster than the relatively simple metric of number of advertised prefixes in the BGP Routing Table. There is a further multiplicative factor in the load projection that appears to indicate that as the routing system grows, the level of routing overhead grows at a far higher rate. The other significant factor here is one of peak capacity a compared to average capacity in the routing system. BGP appears to be a very chaotic system in terms of burstiness of traffic, and the peak per-second rate of updates within BGP can be some 1,000 time greater than the daily average. The implication here is that the components of the system should be able to handle very short term peak loads rather than extended average loads in order to preserve any reasonable form of convergence in the routing system. In addition, how the routing system could cope with adding additional functionality, such as with additional processing overheads relating to improving the overall security in BGP, or with adding further policy-based functions to direct route propagation remains to be seen. It would appear that if the original question was about the capacity of a routing engine to cope with the anticipated routing load over the coming 3 to 5 years, the basic answer is that very much bigger than what we are using today is very definitely better! ________________________________________ Disclaimer The views expressed are the author’s and not those of APNIC, unless APNIC is specifically identified as the author of the communication. APNIC will not be legally responsible in contract, tort or otherwise for any statement made in this publication. ________________________________________ About the Author GEOFF HUSTON B.Sc., M.Sc., has been closely involved with the development of the Internet for many years, particularly within Australia, where he was responsible for the initial build of the Internet within the Australian academic and research sector. He is author of a number of Internet-related books, and has been active in the Internet Engineering Task Force for many years. www.potaroo.net