Network Working Group                                   E. Levy-Abegnoli
Internet-Draft                                              Cisco Systems
Intended status: Standards Track                            June 02, 2009
Expires: December 4, 2009


Preference Level based Binding Table

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 4, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

A trusted database located on the first switch, storing the bindings between end-nodes' Link-Layer Addresses (LLA) and their IPv6 addresses, would be an essential part of source address validation. To build such a database, one must:

  1. Describe the source of the information
  2. Describe how the information is maintained
  3. Describe how collisions are resolved.

Solutions may differ in one or more of these elements.

While also getting its binding data from NDP, this draft proposes an alternative to the "first-come, first-served" approach [fcfs] (Nordmark, E. and M. Bagnulo, “First-Come First-Serve Source-Address Validation Implementation,” March 2009), by specifying a preference algorithm to deal with collisions. Instead of the simplistic first-come, first-served collision handling, the proposed algorithm relies on the following criteria to choose between two colliding entries:

Since the state of the entry is one of the elements of the algorithm, this draft also describes a tracking mechanism to maintain entries in states where the preference algorithm can enable end-node movement.



Table of Contents

1.  Introduction
2.  Goals and assumptions
    2.1.  Definitions and Terminology
    2.2.  Scenarios considered
3.  Source of information
4.  Binding table
    4.1.  Data model
    4.2.  Entry preference algorithm
        4.2.1.  Preference Level
        4.2.2.  Entry update algorithm
        4.2.3.  Enabling slow movement
    4.3.  Binding entry tracking
    4.4.  Binding table state machine
5.  Configuration
    5.1.  Switch port configuration
    5.2.  Binding table configuration
6.  Bridging NDP traffic
    6.1.  Bridging DAD NS
    6.2.  Bridging other NDP messages
7.  Normative References
Appendix A.  Contributors and Acknowledgments
Author's Address





1.  Introduction

To populate the first-switch binding table, this document proposes a scheme based on NDP snooping, and introduces a preference-level algorithm to deal with collisions. It is organized as follows:




2.  Goals and assumptions

The primary goal of the proposed approach is for the layer-2 switch to maintain an accurate view of the nodes attached to it, directly or via another layer-2 switch. This view is referred to as the switch "binding table". The following goals are also looked at:

The binding table includes each node's IPv6 address, link-layer address, and the switch port it was learnt from, whether an access port or a trunk port (a port to another switch).

This binding table is the keystone for detecting and arbitrating collisions. It also brings a couple of interesting by-products: it can provide some address-spoofing mitigation, and it can be used to limit multicast traffic forwarding.




2.1.  Definitions and Terminology

The following terminology is used:

plb-switch:
A switch that implements the algorithms described in this draft.




2.2.  Scenarios considered

Three main scenarios are considered in this document:

  1. Scenario A: a plb-switch connected to a set of L3-nodes, whether hosts or routers
    
      +------+
      |HostA +-----------------+
      +------+                 |
                               |
                         +-----+------+
      +------+           |            |
      |HostB +-----------+ SWITCHA    +
      +------+           |            |
                         +-----+------+
                               |
      +------+                 |
      |HostC +-----------------+
      +------+
    
    
  2. Scenario B: a plb-switch SWITCH_A connected to L3-nodes and to another plb-switch SWITCH_B
      +------+                                                   +------+
      |HostA +-----------------+                  +--------------+HostD |
      +------+                 |                  |              +------+
                               |                  |
                         +-----+------+     +-----+------+
      +------+           |            |     |            |       +------+
      |HostB +-----------+ SWITCHA    +-----+ SWITCHB    +-------|HostE |
      +------+           |            |     |            |       +------+
                         +-----+------+     +-----+------+
                               |                  |
      +------+                 |                  |              +------+
      |HostC +-----------------+                  +--------------+HostF |
      +------+                                                   +------+
    
  3. Scenario C: a plb-switch SWITCH_A connected to L3-nodes and to a non plb-switch SWITCH_B




3.  Source of information

Basically, there should be the following sources of data for filling the table:

Note that the binding information can also be learnt from other protocol sources, such as DHCP, or even be configured statically on the switch. It is outside the scope of this document to detail how this would be performed. However, binding table entries learnt by non-NDP methods might collide with entries learnt via NDP snooping, and Section 4.2 (Entry preference algorithm) describes how to prefer one entry over another.




4.  Binding table

A table is maintained on the switch(es) that binds layer-3 (IPv6) addresses to link-layer (MAC) addresses.




4.1.  Data model

A record of the binding table should contain the following information:

A global-scope address should be unique across ports and vlans. A link-local-scope address is unique within a vlan. Therefore, the database is a collection of l3-entries, keyed by ipv6-address and zoneid. A zoneid is a function of the scope of the address (LINK-LOCAL, GLOBAL) and the vlanid:

A collision between an existing entry and a candidate entry would occur if the two entries have the same v6addr and zoneid. These fields are referred to as the "key".

The fields of an entry other than the key (port, vlanid, lla, etc.) will be referred to as attributes. Changing the attributes of an entry requires complying with the entry update algorithm described in Section 4.2 (Entry preference algorithm).
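
As a non-normative illustration, here is a minimal Python sketch of such a record and of the keying by (ipv6-address, zoneid); the field names and the exact zoneid derivation are assumptions made for the example:

    from dataclasses import dataclass
    from enum import Enum

    class Scope(Enum):
        LINK_LOCAL = 1
        GLOBAL = 2

    def zone_id(scope: Scope, vlanid: int) -> tuple:
        # A link-local address is only unique within its vlan, so the
        # vlanid is part of its zone; a global address is unique across
        # vlans and ports, so a single shared zone is used.
        return ("link-local", vlanid) if scope == Scope.LINK_LOCAL else ("global",)

    @dataclass
    class BindingEntry:
        # Key fields: collisions are detected on (v6addr, zoneid).
        v6addr: str
        zoneid: tuple
        # Attribute fields: may only change through the entry update
        # algorithm of Section 4.2.
        lla: str = None        # link-layer address; None while INCOMPLETE
        port: str = None       # switch port the entry was learnt from
        vlanid: int = 0
        preflevel: int = 0
        state: str = "INCOMPLETE"

        @property
        def key(self):
            return (self.v6addr, self.zoneid)

    # The binding table itself: a dict keyed by (v6addr, zoneid).
    binding_table = {}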




4.2.  Entry preference algorithm




4.2.1.  Preference Level

The preference level (preflevel) is an attribute of an entry in the binding table. It is set when the entry is learnt, based on where it was learnt from, the credentials associated with it, and other criteria to be defined. The preflevel is used to arbitrate between two candidate entries (with identical keys) in the binding table. The higher the preference level, the more preferred the entry.

One of the key elements of the preflevel associated with an entry is the port it was learnt from. For example, an entry would have different preflevels if it is learnt from:

Another important element is the credentials associated with this learning. An entry could be associated with cryptographic proof (CGA), and/or the LLA learnt could match the source MAC of the frame from which it was learnt.

The following preflevel values have been identified (from lowest to highest):

An entry can sum up preference values; for instance, it could be TRUNK_PORT + LLA_MAC_MATCH. However, the preference level values should be encoded in such a way that the sum of preferences 1 to N-1 is smaller than preference N.
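
For example, encoding each value as a distinct power of two satisfies this property, since 1 + 2 + ... + 2^(N-1) = 2^N - 1 < 2^N. A minimal Python sketch follows; the value names appear in this draft, but their exact order and the completeness of the set are assumptions:

    # Illustrative preflevel encoding (order and completeness assumed).
    TRUNK_PORT    = 1 << 0
    ACCESS_PORT   = 1 << 1
    LLA_MAC_MATCH = 1 << 2
    TRUSTED_TRUNK = 1 << 3
    TRUSTED_PORT  = 1 << 4
    CGA_PROOF     = 1 << 5

    # An entry can sum several values, yet any single higher value still
    # outweighs the sum of all the lower ones:
    assert TRUNK_PORT + LLA_MAC_MATCH < TRUSTED_TRUNK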




4.2.2.  Entry update algorithm

Once an entry is installed in the binding table, its attributes cannot be changed without complying with this “entry update algorithm”.

The algorithm is as follows, starting with rule_1, up to rule_5, in that order, until one rule is satisfied (a code sketch follows the list):

  1. Updating of an entry is allowed when the preflevel carried by the change is greater than the preflevel stored in the entry.
  2. Updating of an entry is denied if the preflevel carried by the change is smaller than the preflevel stored in the entry.
  3. Updating of an entry in state INCOMPLETE is denied if the change is not associated with the port this entry was first learnt from.
  4. Updating of an entry is denied if the preflevel carried by the change is equal to the preflevel stored in the entry, and the entry is in state REACHABLE or VERIFY (see Section 4.4 (Binding table state machine)).
  5. Updating of an entry is allowed otherwise.
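
A minimal sketch of these rules, continuing the earlier Python illustration (entry states are represented as plain strings; this is not a normative implementation):

    def may_update(entry, change_pref, change_port):
        """Apply rule_1 through rule_5 in order; the first rule that
        matches decides the outcome."""
        # Rule 1: a strictly higher preference level always wins.
        if change_pref > entry.preflevel:
            return True
        # Rule 2: a strictly lower preference level is always denied.
        if change_pref < entry.preflevel:
            return False
        # From here on, the preference levels are equal.
        # Rule 3: an INCOMPLETE entry may only be updated from the port
        # it was first learnt on.
        if entry.state == "INCOMPLETE" and change_port != entry.port:
            return False
        # Rule 4: an equal preference does not displace a live binding.
        if entry.state in ("REACHABLE", "VERIFY"):
            return False
        # Rule 5: otherwise, the update is allowed.
        return True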




4.2.3.  Enabling slow movement

It is quite a common scenario that an end-node moves from one port of the switch to another, or to a different switch. It is also possible that the end-node updates its hardware and starts using a different MAC address. There are two paradoxical goals for the trusted binding table: ensuring entry ownership and enabling movement. The former drives the locking of the address, MAC, and port altogether, and prevents updates other than on the basis of preference. It also works a lot better when the entry lifetime is very long or infinite. The latter requires that a node can easily move from one port to another, or from one MAC to another. Enforcing address ownership will tend to lead to the rejection of any movement, classifying it as an attack.

The algorithm described in Section 4.2.2 (Entry update algorithm), combined with the capability to manage entry states reviewed in Section 4.4 (Binding table state machine), enables end-nodes to move from one switch port to another (or from one MAC to another) under three scenarios:

  1. The node disconnects from its original port for at least T1 (T1 is configurable, as described in Section 5 (Configuration)), and the move does not lead to a less preferred entry.
  2. The node disconnects for at least T3 (T3 is also configurable).
  3. The entry seen after the node moves is "preferred", for instance because the node moved from an ACCESS_PORT to a TRUSTED_PORT.

Note that movement driven by T1 is tied to the accuracy of the REACHABLE state. Maintaining this state with an entry tracking mechanism, as described in Section 4.3 (Binding entry tracking), is going to make it a lot more efficient.
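
Continuing the sketch above, here is how the T1-driven scenario could play out for a host moving between two (hypothetical) ports at an equal preference level:

    entry = BindingEntry("2001:db8::1", ("global",),
                         lla="00:11:22:33:44:55", port="port1",
                         preflevel=ACCESS_PORT, state="REACHABLE")

    # While the entry is REACHABLE, rule 4 denies the move:
    assert not may_update(entry, ACCESS_PORT, "port2")

    # After the node has been silent for T1, tracking (Section 4.3)
    # fails to verify it and the entry ends up STALE; rule 5 now
    # allows the move to the new port:
    entry.state = "STALE"
    assert may_update(entry, ACCESS_PORT, "port2")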




4.3.  Binding entry tracking

In order to maintain an accurate view of each device's location and state, which is a key element of the binding table entry preference algorithm, an entry tracking mechanism can be enabled. The tracking of entries is performed on a per-port, per-IPv6-address basis, by "layer-2 unicasting" a DAD NS on the port the address was first learnt from, to the Destination MAC (DMAC) known to be bound to that address.

The DMAC can be learnt from the LLA option carried in some NDP messages, configured statically, or, as a last resort, taken from the source MAC (SMAC) address of NDP messages referring to that address. For NDP messages not sourced with the UNSPECIFIED address, the address referred to is the source address of the message; for a DAD NS, it is the target address.
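
As an illustration only, such a probe could be built with the scapy library as sketched below; the draft does not mandate any particular implementation, and the interface name and addresses are made up:

    from ipaddress import IPv6Address
    from scapy.all import Ether, IPv6, ICMPv6ND_NS, sendp

    def solicited_node(target):
        # ff02::1:ffXX:XXXX, from the low 24 bits of the target address.
        low24 = int(IPv6Address(target)) & 0xFFFFFF
        return "ff02::1:ff%02x:%04x" % (low24 >> 16, low24 & 0xFFFF)

    def track(target, dmac, port):
        # A DAD NS (unspecified source, no SLLA option), delivered as a
        # layer-2 *unicast* to the MAC bound to the target, on the port
        # the address was first learnt from.
        probe = (Ether(dst=dmac) /
                 IPv6(src="::", dst=solicited_node(target)) /
                 ICMPv6ND_NS(tgt=target))
        sendp(probe, iface=port, verbose=False)

    track("2001:db8::1", "00:11:22:33:44:55", "eth0")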




4.4.  Binding table state machine

The entry lifecycle is driven by the switch, not by NDP: this is especially important to ensure that entries are kept in the table as long as needed, rather than following the rules of the ND cache, which are dictated by other requirements.

Typically, an entry will be created INCOMPLETE, move to REACHABLE when the binding is known, move back and forth between REACHABLE and VERIFY if tracking is enabled, and at some point move to STALE when the device (the address owner) stops talking on the link. The entry could stay in that state for a very long time, sometimes forever, depending on the configuration (see Section 5 (Configuration)).

Four states are defined:

  1. INCOMPLETE: an entry is set to this state when it does not have the L3/L2 binding yet. This happens when an entry is created without the LLA. Typically, such an entry is created when an end-node coming up sends a DAD NS to verify address uniqueness (a DAD NS does not carry the SLLA option). Creating an entry in that state still requires an L3 address, found in the target field of the DAD NS, or in the source field for any other message. While the entry is created INCOMPLETE, the switch waits T0 to avoid a collision. Then it unicasts a DAD NS on the port where the first message was seen, to the SMAC address found in the received frame. In the absence of a response, the DAD NS is retried every T0, up to R0 times. There are two ways to get out of that state: either the LLA is learnt (event E1) and the entry moves to REACHABLE, or R0 retries are exhausted and the entry is deleted.
  2. REACHABLE: as soon as the LLA is learnt, the entry moves to REACHABLE and, if tracking is enabled, a timer T1 is started (see Section 5 (Configuration)). Upon T1 expiration, the entry moves into the VERIFY state. If tracking is not enabled, the entry remains at most T1 in that state without any reachability hint (obtained with NDP inspection or other features) before moving to STALE.
  3. VERIFY: in this state, a binding is known (L3/L2) but must be verified. A DAD NS is unicast to the L3/L2 destinations and a timer T2 is started. There are two ways to get out of that state: either a response is received (event E1) and the entry moves back to REACHABLE, or R2 retries are exhausted and the entry moves to STALE.
  4. STALE: when entering that state, a timer T3 is started, based on the configuration (see Section 5 (Configuration)). Upon expiry, the entry is deleted.

The binding table state machine is as follows (a code sketch follows the diagram):

      T0                                                  E1
    +------+ send DAD-NS                             +----------+
    |      | increment r0                            |          |
    |      V                                         |          |
+---+--------------+                  +--------------+---+      |
|                  |      E1          |                  |<-----+
|  INCOMPLETE      +----------------->|  REACHABLE       |
|                  |           T1     |                  |
|                  |   /--------------+                  |
+-----+------------+  /               +------+-----------+
      |R0            /                  A    |     A
      |             /                  /     |     |
      V            /                  /      |T1   |E1
    delete        /                  /       |     |
                 V                  /        V     |
+------------------+         E1    /  +------------+-----+
|                  +---------------   |                  |T3
|   VERIFY         |         R2       |   STALE          +---> delete
|                  +----------------->|                  |
|                  |                  |                  |
+---+--------------+                  +------------------+
    |      A
    |      | send DAD-NS
    +------+ increment r2
       T2
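
A condensed sketch of these transitions, using the timer and event names from the diagram; the retry limits and the returned action strings are illustrative, and the tracking-enabled path is the one shown for REACHABLE:

    R0_MAX = R2_MAX = 3   # retry limits; the actual defaults are assumptions

    def on_timer(entry, timer, retries):
        """React to the expiry of a timer for a given entry."""
        if entry.state == "INCOMPLETE" and timer == "T0":
            if retries < R0_MAX:
                return "send DAD-NS, restart T0"   # stay INCOMPLETE
            return "delete"                        # R0 exhausted
        if entry.state == "REACHABLE" and timer == "T1":
            entry.state = "VERIFY"                 # tracking enabled
            return "send DAD-NS, start T2"
        if entry.state == "VERIFY" and timer == "T2":
            if retries < R2_MAX:
                return "send DAD-NS, restart T2"
            entry.state = "STALE"                  # R2 exhausted
            return "start T3"
        if entry.state == "STALE" and timer == "T3":
            return "delete"

    def on_reachability_evidence(entry):
        """Event E1: the L3/L2 binding is established or confirmed."""
        entry.state = "REACHABLE"
        return "restart T1"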

The state transitions are driven by the following events: E1, the receipt of an NDP message that establishes or confirms the L3/L2 binding (reachability evidence); the expiration of the timers T0, T1, T2, and T3; and the exhaustion of the retry counts R0 and R2.

Default values are as follows: T0 = 3 seconds, T1 = 300 seconds, T3 = 24 hours (see Section 5.2 (Binding table configuration)).

It should be possible to override all the default values by configuration.




5.  Configuration




5.1.  Switch port configuration

Qualifying the ports of the switch is of primary importance in influencing the "entry update algorithm" (see Section 4.2 (Entry preference algorithm)). The switch configuration should allow the following values to be configured on a per-port basis:




5.2.  Binding table configuration

The following elements, acting on the binding table behavior, should be configurable, globally or on a per-port basis (a sketch of the defaults follows the list):

  1. T0: (global) frequency at which the switch unicasts DAD NS to obtain an INCOMPLETE entry's link-layer address. Default is three seconds. Associated configuration elements are:
  2. T1: (per-port) maximum reachable lifetime, i.e. the time an entry is kept in REACHABLE without any sign of activity, before transitioning to VERIFY (if "tracking on") or STALE otherwise. T1 may be set to "infinite". Default value is 300 seconds.
  3. Tracking on/off: (per-port) when turned on, it enables the tracking of entries in the binding table. Reachability of entries is then tested every T1 by unicasting (at layer 2) DAD NS (unless reachability is established indirectly by NDP inspection). Associated configuration elements are:
  4. T3: (per-port) maximum stale lifetime, i.e. the time an entry is kept in STALE without any sign of activity, before being deleted from the binding table. T3 may be set to "infinite". Default value is 24 hours.
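
As an illustration, the defaults above could be represented as follows; the configuration schema and the per-port override mechanism are assumptions, and only the timer values come from this section:

    # Global defaults for the binding table (values from Section 5.2).
    DEFAULTS = {
        "T0": 3,             # seconds; DAD NS retry period (INCOMPLETE)
        "T1": 300,           # seconds; maximum REACHABLE lifetime
        "T3": 24 * 3600,     # seconds; maximum STALE lifetime
        "tracking": False,   # the default on/off value is an assumption
    }

    # Hypothetical per-port overrides, merged over the global defaults.
    port_config = {"port1": {"tracking": True, "T1": 60}}

    def config_for(port):
        return {**DEFAULTS, **port_config.get(port, {})}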




6.  Bridging NDP traffic

One important aspect of an "NDP-aware" switch is to efficiently bridge NDP traffic to its destinations. In some areas, the switch's behavior might differ from that of a regular, non plb-switch:

  1. When intercepting an NDP message carrying binding information, the switch can look up its binding table, decide the message is not worth bridging, and drop it. This may be the case when a binding entry already exists and is not consistent with the one being received.
  2. When the received message is a DAD NS for a target for which the switch has a pending (INCOMPLETE) entry, received from a different port, the switch may decide to drop it. If it came "second", in the (small) window during which the switch is attempting to track the entry, this suggests it might be an attack.
  3. When intercepting a multicast NDP message, such as a DAD NS, for which it already has an entry in its binding table, the switch may decide to forward it only to the target owner.
  4. When receiving a DAD NS or other multicast NDP messages, a switch enabled for MLD snooping might decide to prevent the bridging of the message on trunk ports to other switches (based on the MLD reports received on these ports). The plb-switch may, however, decide to force a copy of these messages onto these trunks, to ensure the other switch is able to populate its own binding table. This behavior should be configurable on a per-port basis.

The general bridging algorithm is as follows (a code sketch follows). When an NDP message is received by the layer-2 switch, the switch extracts the link-layer information, if any. If no LLA is provided, the switch should bridge the message to its destination normally. If an LLA is provided, the switch can look up the entry in its binding table. If no entry is found, it creates one and bridges the message normally. If an entry is found with attributes consistent with the ones received (port, zoneid, etc.), it should bridge the message normally. If the attributes are not consistent, and a change is allowed (see Section 4.2 (Entry preference algorithm)), it should update the attributes and bridge the message. If the change is disallowed, it should drop the message.
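
A minimal sketch of this decision flow, reusing BindingEntry and may_update from the earlier illustrations; the msg object and its fields are assumed to have been extracted from the received NDP message:

    def bridge_ndp(msg, binding_table):
        if msg.lla is None:
            return "bridge"          # no binding info: bridge normally
        entry = binding_table.get(msg.key)
        if entry is None:            # unknown address: learn, then bridge
            binding_table[msg.key] = BindingEntry(
                *msg.key, lla=msg.lla, port=msg.port,
                preflevel=msg.preflevel)
            return "bridge"
        if (entry.port, entry.lla) == (msg.port, msg.lla):
            return "bridge"          # consistent with the stored entry
        if may_update(entry, msg.preflevel, msg.port):
            entry.port, entry.lla = msg.port, msg.lla   # Section 4.2.2
            entry.preflevel = msg.preflevel
            return "bridge"
        return "drop"                # disallowed change: drop the message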




6.1.  Bridging DAD NS

Bridging DAD NS is critical to both security and binding table distribution. The flows below illustrate some relevant cases.

In scenario A, the switch SWITCH_A has only end-nodes connected to it.

Scenario A:

+--------+          +--------+          +--------+         +--------+
| host 1 +          |SWITCH_A|          |host 2  |         | host 3 |
+--------+          +--------+          +--------+         +--------+
    |                   |                   |                  |
    |                switch up              |                  |
    |                   |    DAD NS tgt=X   |                  |
    |                   |<------------------+                  |
    |                no hit                 |                  |
    |                X stored, pref=ACCESS  |                  |
    |                   |                   |                  |
    |  DAD NS tgt=X conditional forward (1) |                  |
    |<------------------O------------------------------------->|
    |  NA               |                   |                  |
    |------------------>|                   |                  |
    |                 hit, newpref=ACCESS   |                  |
    |                 do not replace        |                  |
    |                 drop                  |                  |
    |                   |                   |                  |
    |                   |   ...             |                  |
    |                   |                   |    DAD NS tgt=X  |
    |                   |<-------------------------------------|
    |                 hit, newpref=ACCESS   |                  |
    |                 forward to owner      |                  |
    |                   |------------------>|                  |
    |                   |                   |                  |
    |   DAD NS tgt=X conditional forward (1)|                  |
    |<------------------|                   |                  |
    |                replace                |                  |
    |  NA               |                   |                  |
    |<------------------|                   |                  |
    |                   |                   |                  |
    |                   |                   |                  |

When nodes come up, the switch is assumed to be already up. As a result, since the switch stores entries for all addresses it snoops, it is going to have a fairly accurate view of the nodes (addresses). Host 2 comes up and sends a DAD NS for target X, intercepted by the switch. SWITCH_A does not have X in its binding table, stores it (INCOMPLETE), and bridges it to the other nodes, host1 and host3. If MLD snooping is in effect, the switch might decide not to forward it at all (no other known group listener for the solicited-node multicast group), or only to a few hosts. Regardless of MLD snooping, flow (1) is not absolutely "useful" and could even be harmful. If we assume the switch knows all addresses on the link/vlan, then it knows nobody owns this address yet. In that case, sending it to other hosts would be an invitation for an attack. There is a tradeoff between two issues which are not equally probable: the risk of breaking DAD and the risk of being vulnerable to a DoS on address resolution.

The latter is well understood: should the switch broadcast the DAD NS, an attacker can immediately claim ownership with an NA. As for the former, it would happen if the following conditions are met:

  1. The initial DAD NS for X, and any subsequent NDP packets (NA to all-nodes, etc.) were missed by the switch.
  2. In addition:

In scenario B, SWITCH_A is also connected to a second switch, SWITCH_B, which runs the same logic to populate its own binding table.

Scenario B:

+--------+          +--------+          +--------+         +--------+
| host 1 +          |SWITCH_A|          |SWITCH_B|         | host 2 |
+--------+          +--------+          +--------+         +--------+
    |                   |                switch up             |
    |                   |                   |    DAD NS tgt=X  |
    |                   |                   |<-----------------|
    |                   |             No hit, no trunk up      |
    |               switch up         X stored in Bt, pref= ACCESS
    |                   |                   |                  |
    |  DAD NS tgt=X     |                   |                  |
    |------------------>|                   |                  |
    |               no hit                  |                  |
    |               X stored, pref=ACCESS   |                  |
    |               forward on trunk (2)    |                  |
    |                   |------------------>|                  |
    |                   |                 hit (host2)          |
    |                   |                   | forward to owner |
    |                   |                   |----------------->|
    |                   |                   |    NA            |
    |                   |                   |<-----------------|
    |                   |                  hit, owner          |
    |                   |     NA           forward on trunk    |
    |                   |<------------------|                  |
    |                hit, newpref=TRUSTED_TRUNK                |
    |                replace                |                  |
    |  NA               |                   |                  |
    |<------------------|                   |                  |
    |                   |                   |                  |
    |                   |                   |                  |

When SWITCH_A comes up, it may come up after SWITCH_B. In this case, it is unaware of the end-nodes attached to SWITCH_B. SWITCH_B, however, knows all of them, under the same assumptions as in scenario A. Upon receiving a DAD NS for target X, and in the absence of a hit, SWITCH_A creates an INCOMPLETE entry and forwards the NS to SWITCH_B.

  1. If SWITCH_B has it in its table, then it can forward it only on the interface of X's owner (host2). Host2 responds, and the response reaches SWITCH_A. SWITCH_A already has an entry for X associated with the interface to host1, while this one is received from the trunk. The trunk is a TRUSTED_TRUNK, hence entries received over it are preferred. SWITCH_A updates its binding table and propagates the response to host1. This is the case of a valid address duplication.
  2. If SWITCH_B, receiving the DAD NS over the trunk, does not have X in its table, it can drop the NS while creating an INCOMPLETE entry for X. Or it can broadcast it locally (with the same reasoning as for the previous scenario).

Scenario C connects SWITCH_A to a SWITCH_B that does not run the same binding table algorithm (referred to as a non plb-switch). In this scenario, SWITCH_A forwards a DAD NS for target X on the trunk. Configuration should tell whether any response coming from SWITCH_B is to be trusted (in the absence of better credentials such as a CGA/RSA proof). If SWITCH_B is fully trusted, then the trunk is configured as "TRUSTED_TRUNK" and scenario B applies. Otherwise, the trunk is configured as "TRUNK" and the response is ignored.

Scenario C:

+--------+          +--------+          +--------+         +--------+
| host 1 +          |SWITCH_A|          |SWITCH_B|         | host 2 |
+--------+          +--------+          +--------+         +--------+
    |                   |                switch up             |
    |                   |                   |   DAD NS tgt=X   |
    |                   |                   |<-----------------|
    |                   |                   |                  |
    |               switch up               |                  |
    |                   |                   |                  |
    |  DAD NS tgt=X     |                   |                  |
    |------------------>|                   |                  |
    |               no hit                  |                  |
    |               X stored, pref=ACCESS   |                  |
    |                   |------------------>|                  |
    |                   |                   |   to group       |
    |                   |                   |----------------->|
    |                   |                   |   NA             |
    |                   |                   |<-----------------|
    |                   |     NA            |                  |
    |                   |<------------------|                  |
    |                hit, newpref=TRUNK     |                  |
    |                do not replace         |                  |
    |                drop NA                |                  |
    |                   |                   |                  |
    |                   |                   |                  |
    |                   |                   |                  |




6.2.  Bridging other NDP messages

When running the proposed binding table population algorithm, switches are expected to have an accurate view of the end-nodes attached to them. While scenario C is problematic, scenarios A and B are clearer. If a switch has an entry in its table that conflicts with the binding observed in a just-received NDP message, it should drop the message (if the new data has a smaller preflevel) or update its entry and bridge the message.

If the switch does not have such an entry, it should create the entry and bridge the message, including to trunks.

In the case of multicast messages, it should bridge them on trunks regardless of group registration, to give other switches a chance to build up a more accurate binding table.




7. Normative References

[RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, “SEcure Neighbor Discovery (SEND),” RFC 3971, March 2005.
[RFC3972] Aura, T., “Cryptographically Generated Addresses (CGA),” RFC 3972, March 2005.
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, “Neighbor Discovery for IP version 6 (IPv6),” RFC 4861, September 2007.
[RFC4862] Thomson, S., Narten, T., and T. Jinmei, “IPv6 Stateless Address Autoconfiguration,” RFC 4862, September 2007.
[fcfs] Nordmark, E. and M. Bagnulo, “First-Come First-Serve Source-Address Validation Implementation,” draft-ietf-savi-fcfs-01 (work in progress), March 2009.



Appendix A.  Contributors and Acknowledgments

This draft benefited from the input from: Pascal Thubert.




Author's Address

  Eric Levy-Abegnoli
  Cisco Systems
  Village d'Entreprises Green Side - 400, Avenue Roumanille
  Biot-Sophia Antipolis - 06410
  France
  Email: elevyabe@cisco.com