Internet Engineering Task Force Christian E. Hopps
INTERNET-DRAFT Merit Network
Expires June 1999 5 January 1999
Analysis of an ECMP Algorithm
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet- Drafts as reference
material or to cite them other than as "work in progress."
To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nic.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).
Abstract
Equal cost multipath (ECMP) is a routing technique for routing
packets along multiple paths of equal cost. The forwarding engine
identifies paths by next-hop. When forwarding a packet the router
must decide which next hop (path) to use. This document gives an
analysis of one popular method for making that decision. The
analysis includes the performance of the algorithm and the disruption
caused by changes to the set of next-hops.
Expires June 1999 [Page 1]
Draft Analysis of an ECMP Algorithm January 1999
1. Hash-Threshold
One popular method for determining which next-hop to use when routing
with ECMP can be called hash-threshold. The router first selects a
key by performing a hash (e.g., modulo-K where K is large, or CRC16)
over the packet header fields that identify a flow. The N next-hops
have been assigned unique regions in the key space. The router uses
the key to determine which region and thus which next-hop to use.
This method is used in [1].
As an example of hash-threshold, upon receiving a packet the router
performs a CRC16 on the packet's header fields that define the flow
(e.g., the source and destination fields of the packet), this is the
key. Say for this destination there are 4 next-hops to choose from.
Each next-hop is assigned a region in 16 bit space (the key space).
For equal usage the router may have chosen to divide it up evenly so
each region is 65536/4 or 16k large. The next-hop is chosen by
determining which region contains the key (i.e., the CRC result).
2. Analysis
There are a few concerns when choosing an algorithm for deciding
which next hop to use. One is performance, the computational
requirements to run the algorithm. Another is disruption (i.e., the
changing of which path a flow uses). Balancing is a third concern;
however since the algorithm's balancing characteristics are directly
related to the chosen hash function this analysis does not treat this
concern in depth.
For this analysis we will assume regions of equal size. If the hash
function is uniformly distributed the distribution of flows amongst
paths will also be uniform.
2.1. Performance
The performance of the hash-threshold algorithm can be broken down
into three parts: selection of regions for the next-hops, obtaining
the key and comparing the key to the regions to decide which next-hop
to use.
Since regions are restricted to be of equal size the calculation of
region boundaries is trivial. The boundaries can be calculated as
follows:
Expires June 1999 [Page 2]
Draft Analysis of an ECMP Algorithm January 1999
i = 0;
regionssize = keyspace.size / #{ next-hops }
for n in { next-hops }
n.start = i * regionsize;
n.end = n.start + regionsize;
i = i + 1
done
This calculation is O(N). Furthermore the calculation can be done
when the next-hops are added to or removed from the destination
route.
The algorithm doesn't specify the hash function used to obtain the
key. Its performance in this area will be exactly the performance of
the hash function. It is presumed that if this calculation proves to
be a concern it can be done in hardware parallel to other operations
that need to complete before deciding which next-hop to use.
Finally the next-hop must be chosen. This is done by determining
which region contains the key. The time required to do this is
dependent on the way the next-hops are organized in memory. The
problem reduces to a search. For example if the next-hops are stored
as a linked list the time is O(N) as the router must traverse the
list comparing each next-hop's region boundaries against the key. If
the next-hops are stored as an ordered array a binary search can be
used yielding O(logN).
As [1] notes if the number of next-hops is limited to a fixed maximum
the comparison can be done in parallel in hardware, thus O(1).
2.2. Disruption
Protocols such as TCP perform better if the path they flow along does
not change while the stream is connected. Disruption is the
measurement of how many flows have their paths changed due to some
change in the router. We measure disruption as the fraction of total
flows whose path changes in response to some change in the router.
Some algorithms such as round-robin (i.e., upon receiving a packet
the least recently used next-hop is chosen) are disruptive regardless
of any change in the router. Clearly this is not the case with hash-
threshold. As long as the region boundaries remain unchanged the
same next-hop will be chosen for a given flow.
Expires June 1999 [Page 3]
Draft Analysis of an ECMP Algorithm January 1999
Because we have required regions to be equal in size the only reason
for a change in region boundaries is the addition or removal of a
next-hop. In this case the regions must all grow or shrink to fill
the key space. The analysis begins with some examples of this.
0123456701234567012345670123456701234567
+-------+-------+-------+-------+-------+
| 1 | 2 | 3 | 4 | 5 |
+-------+-+-----+---+---+-----+-+-------+
| 1 | 2 | 4 | 5 |
+---------+---------+---------+---------+
0123456789012345678901234567890123456789
Figure 1. Before and after deletion of region 3
In figure 1. region 3 has been deleted. The remaining regions grow
equally and shift to compensate. In this case 1/4 of region 2 is now
in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is
in region 4 and 1/4 of region 4 is in region 5. Since each of the
original regions represent 1/5 of the flows, the total disruption is
1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10.
Note that the disruption to flows when adding a region is equivalent
to that of removing a region. That is, we are considering the
fraction of total flows that changes regions when moving from N to
N-1 regions, and that same fraction of flows will change when moving
from N-1 to N regions.
0123456701234567012345670123456701234567
+-------+-------+-------+-------+-------+
| 1 | 2 | 3 | 4 | 5 |
+-------+-+-----+---+---+-----+-+-------+
| 1 | 2 | 3 | 5 |
+---------+---------+---------+---------+
0123456789012345678901234567890123456789
Figure 2. Before and after deletion of region 4
In figure 2. region 4 has been deleted. Again the remaining regions
grow equally and shift to compensate. 1/4 of region 2 is now in
region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in
Expires June 1999 [Page 4]
Draft Analysis of an ECMP Algorithm January 1999
region 3 and 1/4 of region 4 is in region 5. Since each of the
original regions represent 1/5 of the flows the, total disruption is
7/20.
To generalize, upon removing a region K the remaining N-1 regions
grow to fill the 1/N space. This growth is evenly divided between
the N-1 regions and so the change in size for each region is
1/N/(N-1) or 1/(N(N-1)). This change in size causes non-end regions
to move. The first region grows and so the second region is shifted
towards K by the change in size of the first region. 1/(N(N-1)) of
the flows from region 2 are subsumed by the change in region 1's
size. 2/(N(N-1)) of the flows in region 3 are subsumed by region 2.
This is because region 2 has shifted by 1/(N(N-1)) and grown by
1/(N(N-1)). This continues from both ends until you reach the
regions that bordered K. The calculation for the number of flows
subsumed from the Kth region into the bordering regions accounts for
the removal of the Kth region. Thus we have the following equation.
K-1 N
--- i --- (i-K)
disruption = \ --- + \ ---
/ (N)(N-1) / (N)(N-1)
--- ---
i=1 i=K+1
We can factor 1/((N)(N-1)) out as it is constant.
/ K-1 N \
1 | --- --- |
= --- | \ i + \ (i-K) |
(N)(N-1) | / / |
\ --- --- /
1 i=K+1
We now use the the concrete formulas for the sum of integers. The
first summation is (K)(K-1)/2. For the second summation notice that
we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2.
Expires June 1999 [Page 5]
Draft Analysis of an ECMP Algorithm January 1999
(K-1)(K) + (N-K)(N-K+1)
= -----------------------
2(N)(N-1)
Considering the summations, one can see that the least disruption is
when K is as close to half way between 1 and N as possible. This can
be proven by finding the minimum of the concrete formula for K
holding N constant. First break apart the quantities and collect.
2K*K - 2K - 2NK + N*N + N
= -------------------------
2(N)(N-1)
K*K - K - NK N + 1
= -------------- + -------
(N)(N-1) 2(N-1)
Since we are minimizing for K the right side (N+1)/2(N-1) is constant
as is the denominator (N)(N-1) so we can drop them. To minimize we
take the derivative.
d
-- (K*K - (N+1)K)
dk
= 2K - (N+1)
Which is zero when K is (N+1)/2.
The last thing to consider is that K must be an integer. When N is
odd (N+1)/2 will yield an integer, however when N is even (N+1)/2
yields an integer + 1/2. In the case, because of symmetry, we get
the least disruption when K is N/2 or N/2 + 1.
Expires June 1999 [Page 6]
Draft Analysis of an ECMP Algorithm January 1999
Now since the formula is quadratic with a global minimum half way
between 1 and N the maximum possible disruption must occur when edge
regions (1 and N) are removed. If K is 1 or N the formula reduces to
1/2.
Thus to minimize disruption we recommend adding new regions to the
center rather than the ends.
3. Comparison to other algorithms
Other algorithms exist to decide which next-hop to use. These
algorithms all have different performance and disruptive
characteristics. Of these algorithms we will only consider ones that
are not disruptive by design (i.e., if no change to the set of next-
hops occurs the path a flow takes remains the same). This will
exclude round-robin and random choice. We will look at modulo-N and
highest random weight.
Modulo-N is a simpler form of hash-threshold. Given N next-hops the
hash function includes a final modulo-N which directly maps the
result to one of the next-hops. This operation is the fastest of the
three we consider, however if a next-hop is added or removed the
disruption is (N-1)/N.
Highest random weight (HRW) is another comparative method similar to
hash-threshold. The router seeds a pseudo-random number generator
with the packet header fields which describe the flow and the next-
hop to obtain a weight. The next-hop which receives the highest
weight is selected. The advantage with using HRW is that it has
minimal disruption (i.e., disruption due to adding or removing a
next-hop is always 1/N.) The disadvantage with HRW is an only
slightly more complex function to choose the next-hop. A description
of HRW along with comparisons to other methods can be found in [2].
Although not used for next-hop calculation an example usage of HRW
can be found in [3].
If the complexity of HRW's next-hop selection processes is acceptable
we think it should be considered as an alternative to hash-threshold.
4. Security Considerations
This document is an analysis of an algorithm used to implement an
ECMP routing decision. This analysis does not directly effect the
Expires June 1999 [Page 7]
Draft Analysis of an ECMP Algorithm January 1999
security of the Internet Infrastructure.
5. References
[1] Villamizar, C., "OSPF Optimized Multipath (OSPF-OMP)", draft-ietf-
ospf-omp-01.txt, October, 1998.
[2] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
Increase Hit Rates", IEEE/ACM Transactions on Networking, February
1998.
[3] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
"Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
Specification", RFC 2362, June 1998.
6. Author's Address
Christian E. Hopps
Merit Network
4251 Plymouth Road, Suite C.
Ann Arbor, MI 48105
Phone: +1 734 936 0291
EMail: chopps@merit.edu
Expires June 1999 [Page 8]