KARP                                                           W. Atwood
Internet-Draft                                    R. Bangalore Somanatha
Intended status: Standards Track                Concordia University/CSE
Expires: January 31, 2013                                  July 30, 2012


      Automatic Key and Adjacency Management for Routing Protocols
                      draft-atwood-karp-akam-rp-02

Abstract

When tightening the security of the core routing infrastructure, two steps are necessary. The first is to secure the routing protocols' packets on the wire. The second is to ensure that the keying material for the routing protocol exchanges is distributed only to the appropriate routers. This document specifies requirements on that distribution and proposes the use of a set of protocols to achieve those requirements.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 31, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1.  Introduction
  1.1.  Terminology
2.  Keying Groups (Key Scopes)
  2.1.  Keying Groups
  2.2.  Key Scopes
3.  Problem Statement
  3.1.  Security Goals
  3.2.  Non-security Goals
4.  High Level Design
  4.1.  Global View
  4.2.  Entities in the system
  4.3.  Protocol Operations
5.  Detailed Design
  5.1.  System Design
    5.1.1.  Communication among the Entities
    5.1.2.  Inner View of a GM
    5.1.3.  Hierarchical Design
  5.2.  Protocol Design
    5.2.1.  Step 1 - Initial Exchanges: GCKS, GM mutual authentication
    5.2.2.  Step 2 - Key Management Message Exchanges between GCKS, GM
    5.2.3.  Step 3 - GM-GM mutual authentication
    5.2.4.  Step 4 - Key Management Message Exchanges between GMs
    5.2.5.  Variations for handling other Keying Groups
6.  Other Aspects of the Key Management Problem
  6.1.  Key Updates
  6.2.  Regular Key Updates
    6.2.1.  Same key for the entire AD
    6.2.2.  Key per link
    6.2.3.  Key per sending router
    6.2.4.  Key per sending router per interface
    6.2.5.  Key per peer
  6.3.  Router Installation/Uninstallation
    6.3.1.  Same key for the entire AD
    6.3.2.  Key per link
    6.3.3.  Key per sending router
    6.3.4.  Key per sending router per interface
    6.3.5.  Key per peer
  6.4.  Router Reboots
  6.5.  Scalability
  6.6.  Option to Turn Off Adjacency Management
  6.7.  Incremental Deployment
  6.8.  Smooth Key Rollover
  6.9.  Eliminating Single Point of Failure
7.  Detailed Packet Formats
8.  IANA Considerations
9.  Acknowledgements
10. Change History (RFC Editor: Delete Before Publishing)
11. Needs Work in Next Draft (RFC Editor: Delete Before Publishing)
12. References
  12.1.  Normative References
  12.2.  Informative References
Authors' Addresses

1.  Introduction

Within the Keying and Authentication for Routing Protocols working group, there are several goals:

o  Determining how to update the security of existing routing protocols, and guiding this work;

o  Development of automated mechanisms for management of the keying material.

Within the second goal, there is at this time considerable activity on protocols and procedures for creating shared keys, under the assumption that the end points of the exchanges (the routers) are entitled to enter into the conversation, i.e., that they can prove that they are who they say they are. However, there appears to be no work on ensuring that the end points are entitled to be neighbors.

This document addresses this issue. In particular, it addresses the need to ensure that keying material is distributed only to routers that legitimately form part of the "neighbor set" of a particular speaking router.

1.1.  Terminology

Autonomous System ...

Administrative Domain ...

Traffic Encryption Key (TEK) ...

2.  Keying Groups (Key Scopes)

2.1.  Keying Groups

In an AD, all routers having the same TEK can be referred to as forming a 'keying group'. Routers can form a 'keying group' as follows:

A group per AD - This is the most coarsely grained category of keying group, where all routers in an AD share the same traffic key. Hence the incoming and outgoing keys for protecting control traffic on all routers are the same. This is the case typically in use today with manual keying.

A group per link - Here, all routers sharing a link share the key for that link. The routers could have different keys on their different interfaces, and share them with the other routers connected to those respective links.

A group per sending router - This category is more finely grained than the previous two cases; each router uses a different key to secure its outgoing control traffic.

A group per sending router per interface - This is the most finely grained category, wherein each router has a different key for each of its interfaces, which in turn is different from the keys used by other routers to secure their outgoing traffic.

A group per peer router - This category is strictly for unicast communication, wherein peer routers share keys for their interaction. There is one outgoing key corresponding to each router in every pair of routers. These keys can be established through a unicast key management protocol such as IKE [RFC2409] or IKEv2 [RFC5996].

2.2.  Key Scopes

Alternatively, keying groups can be viewed from another perspective. Instead of looking at the granularity of keying from the point of view of the members, we can look at it from the point of view of the keys. This can be referred to as 'key scope'. The key scopes corresponding to the above categories of keying groups, in the same order, could be defined as follows:

Same key for the entire AD - all routers in the domain share the same key.

Key per link - all routers on a link share the same key.

Key per sending router - each router has a different key to secure its outgoing control traffic.

Key per sending router per interface - each router uses different keys for each of its interfaces, which in turn are different from the keys used by the other routers for securing their outgoing traffic.

Key per peer router - there exist two keys corresponding to every pair of routers.

3.  Problem Statement

The aim of this document is to specify an overall system for automated key management that eliminates the disadvantages of the manual method of key updating. The basic function of this automated system is the secure generation of keys and their distribution.
The system should also enable key updates at regular intervals so as to protect against both active intruders and passive intruders who could be eavesdropping on the traffic after having secretly gained access to the keys.

Along with these basic goals, a key management system should satisfy an additional set of requirements. These requirements ensure, among other things, security, easy deployment, robustness and scalability. We have compiled this set after referring to the KARP Design Guide [RFC6518], the KARP Threats and Requirements Guide [I-D.ietf-karp-threats-reqs] and the PIM-SM "security on the wire" specification [RFC5796].

3.1.  Security Goals

1.   Peer authentication for unicast and authentication of all members of the group for multicast protocols.

2.   Message authentication, which includes data origin authentication and message integrity.

3.   Protection of the system from replay attacks.

4.   Peer liveness.

5.   Secrecy of key management messages.

6.   Authorization to ensure that only authorized routers get the keys.

7.   Adjacency management, which implies ensuring the legitimacy of the neighbor relationships of each router. This includes providing an option to turn off adjacency management if required.

8.   Ensuring Perfect Forward Security (PFS) and Perfect Backward Security (PBS).

9.   Resistance to man-in-the-middle attacks.

10.  Resistance to DoS attacks.

11.  Usage of strong keys, that is, keys that are unpredictable and of sufficient length.

3.2.  Non-security Goals

1.  Ability to handle various categories of keying groups depending on the security level required.

2.  Possibility for easy and incremental deployment.

3.  Smooth key rollover.

4.  Robustness across router reboots.

5.  Scalable design.

6.  Single key management architecture accommodating both unicast and multicast systems.

4.  High Level Design

In this section, we propose an architecture for an automated key management and adjacency management system. In order to build this framework, we have reused parts of some existing proposals and fitted them into their correct places in the overall architecture. We have then extended or modified them so as to handle the key management issues that they appear to have overlooked. Our design deals with securing the control traffic of routers within an AD.

4.1.  Global View

The main entities in our system are the following:

1.  Administrator

2.  Policy Server

3.  GCKS

4.  Standby GCKS

5.  GMs

These entities and their functions are explained in the next section.

4.2.  Entities in the system

The entities are based on those in GSAKMP. The difference is that the Group Owner in GSAKMP has been replaced by a Policy Server, and the Subordinate GC/KS has been replaced by a Standby GCKS in our design. We have chosen the term 'Policy Server' in order to be consistent with RFC 3740 [RFC3740], and the term 'Standby GCKS' since it is not a subordinate in our design and is a standby that is capable of performing all operations performed by the active GCKS. Our design conforms to the Multicast Group Security Architecture [RFC3740].

The network administrator configures the Policy Server and the GCKS. Security policies go to the Policy Server, and configurations related to the AD go to the GCKS.

The Policy Server is the entity that manages security policies for the AD. The behavior of the Policy Server we describe here draws on, and is very similar to, the 'Group Owner' in GSAKMP. The security policies include general policies such as authorization details for the GCKS, access control for the GMs, rekey intervals, as well as other specific policies that may be necessary for the group.
These policies are put together into a 'Policy Token' [RFC4535] and sent to the GCKS.

The GCKS is either a router or a server chosen by the administrator as the group controller. It is the entity whose major function is key management and adjacency management. The GCKS should also ensure that the security policies in the policy token are enforced. This implies that whenever a GM requests keys from the GCKS, the GCKS should enforce access control for the GM according to the terms specified in the policy token. The administrator configures the GCKS with information such as the type of keying group to be enforced for the AD and the adjacencies for each router in the AD corresponding to a particular routing protocol (or a set of similar routing protocols). This last point is due to our proposal that there could be one instance of a GCKS per routing protocol or per set of similar routing protocols. This is in fact necessary because the GCKS is the entity that should ensure adjacency management, and adjacencies may be defined differently for different routing protocols. Also, according to [I-D.ietf-karp-ops-model], "KARP must not permit configuration of an inappropriate key scope". This means that each routing protocol could have a different key scope requirement, and that requirement needs to be satisfied. The GCKS may also generate, distribute and update keys, depending on the type of keying group to be enforced in the AD.

The standby GCKS is an entity that is always kept in sync with the active GCKS, ready to take over at any time should the active GCKS fail. This design eliminates the possibility of a single point of failure in a centralized system.

GMs are the group member routers that communicate with each other as well as with the GCKS. When they request keys from the GCKS, they are given the keys along with the policy token.
GMs are required to check the rules specified in the policy token to determine if the GCKS is authorized to act in that role. Each GM has a Local Key Server (LKS) [atwo2009:AKM]. It is a key generation and storage entity within the GM. A GM may sometimes be required to generate keys itself, depending on the category of keying group being enforced. This kind of design ensures that the architecture is distributed, in the sense that key management responsibility is divided between the GCKS and the LKSes.

From the description above, it can be seen that the architecture we propose is a balance between a completely centralized model and a completely distributed one, developed by combining the strengths of both. It defines the concept of a GCKS, which is a centralized entity, as well as the concept of an LKS, which is distributed, with one entity per router.

A centralized entity is considered necessary mainly to make adjacency management possible. In the absence of a central controller that has information about the adjacencies of each router in the AD, individual routers will not be able to establish the legitimacy of their neighbors. Adjacency management is especially important since we are dealing with control packets, which are usually exchanged with immediate neighbors. At the same time, loading the centralized entity with multiple responsibilities may lead to its failure. Hence we have a localized entity that can take up some of the functions of the central controller as and when the need arises. This enhances scalability, which is essential in a key management system.

Another factor leading to scalability is the presence of the standby GCKS. A centralized system could have the disadvantage of having a single point of failure.
Our design tries to eliminate this by defining a standby for the central controller that is always kept in sync with it, ready to take over at any time.

4.3.  Protocol Operations

The operations of key management and adjacency management occur at two different levels. To ensure scalability of the system, as many operations as possible need to take place among adjacent routers. However, to ensure overall control, policies need to be set centrally for the entire AD.

We recognize two types of groups, which represent the two levels of operation:

o  a group consisting of the GCKS and all the routers (called group members or GMs);

o  many small groups, each consisting of a set of adjacent routers.

The overall operation proceeds in four steps:

1.  Establishment of a secure path between each GM and the GCKS.

2.  Exchange of policy information between each GM and the GCKS. This policy information defines the key management approach and parameters and the adjacency management approach and parameters.

3.  Establishment of a secure path between pairs of adjacent GMs, where the legitimacy of the adjacency was established in step 2.

4.  (If required) Exchange or generation of the shared key (and other security parameters) that will be used to protect the routing protocol packets.

If the key scope corresponds to "same key for the entire AD", then the key management policy in step 2 could be "use this key", where "this key" is the same for all GMs, and is sent as a parameter along with the policy. In this case, the key generation in step 4 is not necessary.

If the key scope corresponds to "key per link", then the key may be mutually determined by the routers on that link, or a "local" GCKS may be elected and assume the task of generating the key, which will then be distributed on the secure paths established in step 3.
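
The dependence of step 4 on the key scope can be summarized as a small lookup. The following is a minimal sketch only, not part of the protocol specification: the enum member names and the functions tek_source and step4_needs_key_generation are hypothetical, the per-sender entries anticipate the paragraph that follows, and the per-peer entry assumes a unicast key management protocol such as IKE, as suggested in Section 2.1.

```python
from enum import Enum


class KeyScope(Enum):
    """Key scopes from Section 2.2 (member names are illustrative)."""
    ENTIRE_AD = "same key for the entire AD"
    PER_LINK = "key per link"
    PER_SENDING_ROUTER = "key per sending router"
    PER_SENDING_ROUTER_PER_INTERFACE = "key per sending router per interface"
    PER_PEER = "key per peer router"


def tek_source(scope: KeyScope) -> str:
    """Return which entity supplies the TEK for the given key scope."""
    if scope is KeyScope.ENTIRE_AD:
        # The GCKS sends "this key" along with the policy in step 2.
        return "GCKS"
    if scope is KeyScope.PER_LINK:
        # Mutually determined on the link, or generated by an elected
        # "local" GCKS and distributed over the step 3 secure paths.
        return "routers on the link, or an elected local GCKS"
    if scope in (KeyScope.PER_SENDING_ROUTER,
                 KeyScope.PER_SENDING_ROUTER_PER_INTERFACE):
        # The sender generates and distributes its own outgoing key(s).
        return "sending router"
    # Assumption: per-peer keys come from a unicast KMP such as IKE.
    return "peer pair"


def step4_needs_key_generation(scope: KeyScope) -> bool:
    """Step 4 key generation is skipped only when the GCKS hands out the key."""
    return scope is not KeyScope.ENTIRE_AD
```

A GM could consult such a table after step 2 to decide whether to wait for a key from the GCKS or to generate or negotiate one locally.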

If the key scope corresponds to "key per sending router" or "key per sending router per interface", then the sending router assumes the responsibility for generating and distributing the key(s) that it will use to send its routing protocol traffic. In the first case, a router with "n" neighbors maintains (n+1) keys: one per neighbor, for incoming traffic from that neighbor, and one key for its own outgoing traffic. In the second case, each router maintains (n+k) keys, where "k" is the number of its interfaces.

Similarly, if the key scope corresponds to "same key for the entire AD", then the adjacency management policy is probably "accept any router that claims to be your neighbor" or "accept any router that presents a valid router identification string". For other key scopes, the authentication part of step 3 will have to confirm that a match exists between what is presented by the neighbor router and what is specified in the adjacency management policy information.

If IPsec is to be used to protect the routing protocol packets, negotiation of the Security Parameter Index (SPI) to be used will be done as part of step 4. This has to be mutually negotiated among the users of a particular key, because it cannot be arbitrarily set by any particular member of the group of adjacent routers. (This is in contrast with a two-party Security Association, where the SPI can be safely set by the (single) receiver of the incoming packets.) However, in the case where a single key is being used for the entire AD, the SPI may be dictated by the GCKS.

5.  Detailed Design

This section provides a detailed description of the automated key and adjacency management system. This is followed by the details of the communication among the various entities of the system.

5.1.  System Design

This section provides a detailed description of the architecture, showing also the communication among the different entities.

5.1.1.  Communication among the Entities

Figure 1 gives a closer view of the entities in our design as described previously and shows the interactions among them.

                    -----------------
                    | Policy Server |
                    -----------------
                      ^   |
            Security /    |
           Policies /     |
                   /      |
                  /       |
-----------------         |
| Administrator |         | Security
-----------------         | Policies
                \         |
         Config- \        |
         urations \       |
                   \      |
                    v     v
              -----------------                 ----------------
              | GCKS (Group   | Synchronization | Standby GCKS |
              | Controller    |<--------------->|              |
              | Key Server)   |                 |              |
              -----------------                 ----------------
                  |       |
           Step 1 |       | Step 1
      followed by |       | followed by
           Step 2 |       | Step 2
        -----------       ----------------
        |                                |
        |                                |
 ---------------                  ---------------
 | GM 1 (Group |                  | GM 2        |
 | Member)     |      Step 3      |             |
 |             |   followed by    |             |
 | also hosts  |      Step 4      | also hosts  |
 | an LKS      |<---------------->| an LKS      |
 | (Local Key  |                  |             |
 | Server)     |                  |             |
 ---------------                  ---------------

          Figure 1: Communication between the entities

There is a centralized GCKS in the system, and an LKS local to each GM router. The GCKS and the LKS have the ability to generate SA parameters through a KMP, and to store them in a key store. The different scenarios to be considered and the steps of communication are described in this section and the next.

5.1.2.  Inner View of a GM

Figure 2 shows an inner view of a GM with interactions among the KMP, a routing protocol and the LKS.

                -----------------------
                | KMP (Key Management |
                | Protocol)           |
                -----------------------
                    ^   |        \  - SA parameters related to TEK
- request for an    |   |         --------  (Traffic Encryption Key)
  initial key       |   |                 \
- request to change |   |                  v
  the key (if       |   |                 --------------------------
  required)         |   |                 | LKS (Local Key Server) |
                    |   |                 |                        |
                    |   |                 |     --------------     |
                    |   |                 |     | Key Store  |     |
                    |   | - notification  |     --------------     |
                    |   |   of new keys   --------------------------
                    |   |                     /
                    |   |                    /  - SA parameters related
                    |   |         -----------     to TEK
                    |   |        /
                    |   v       v
                -------------------
                | RP (Routing     |
                | Protocol)       |
                -------------------

                  Figure 2: Inside view of a GM

Initially the routing protocol requests keys from the KMP to secure its control traffic. This starts the communication between the GM and the GCKS through the KMP, as shown by the numbered steps in Figure 1. The key generation policy specified by the GCKS is transferred to the GM. Then the keys are generated by the LKS of the GM, and stored into a key store hosted by the LKS. The KMP notifies the routing protocol that new keys are available for its use, as shown in Figure 2. The routing protocol then retrieves the keys from the key store.

For some categories of keying groups, the LKS is given the keys directly by the GCKS. For others, it may negotiate the keys with its neighbors. These cases are explored in detail in the sections that follow.

The proposed KMP runs between the GCKS and the GMs, and among the GMs themselves. The KMP messages need to be protected, and this can be achieved by running a protocol prior to it to derive keys to protect it. This is similar to the manner in which GDOI messages are protected by keys generated by a phase 1 protocol such as IKE.

5.1.3.  Hierarchical Design

The design we propose is a hierarchical one. There are two kinds of groups that can be formed here (not to be confused with keying groups).
The first kind is the one formed by the GCKS with each GM in the AD. The second kind is the one formed among the GMs.

The design comprises five main steps. Together, the steps help ensure key and adjacency management in a secure manner.

Step 1 - Mutual authentication between the GCKS and each GM in the AD.

Step 2 - Communication between the GCKS and each GM in the AD for secure distribution of policies and keys.

Step 3 - Inter-GM authentication.

Step 4 - Communication among the GMs themselves for key distribution.

Step 5 - The actual transfer of routing protocol control packets using the keys derived through the previous four steps.

Each step is dependent on the previous ones, leading to a hierarchy and ensuring modularity of design. Our design concentrates on steps 1 through 4 in order to enable a secure step 5. The details of each of these steps are explained in the next section.

5.2.  Protocol Design

In this section, we give a detailed description of our proposal for a protocol that serves as a solution to the key management problem outlined in Section 3. To summarise, the intention is to develop a protocol for an automated key management system such that all the requirements listed in Section 3 are satisfied. We have seen the set of entities in the proposed design in Section 4. Now we shall see the exact messages exchanged among them so that the keys required for securing routing protocol control traffic can be generated and distributed to the appropriate routers.

Initially the administrator configures security rules on the Policy Server, and configuration parameters on the GCKS. The security rules include, among other things, access control rules related to GMs and authorization rules related to the GCKS. The configuration parameters include, among other things, the key scope information pertaining to the AD and adjacency information corresponding to each router in the AD.
If required, the Policy Server generates other security policies relevant to the group and puts them together into a policy token. This policy token is sent to the GCKS.

Once this is done, steps 1, 2, 3 and 4 as outlined in Section 5.1.3 follow. Step 1 is for GCKS-GM authentication, step 2 is for key and/or policy transfer from the GCKS to each GM, step 3 is for GM-GM authentication, and step 4 is for key exchange between GMs that need to communicate with each other. Steps 2 and 4 have small variations depending on the key scope being enforced for the AD.

Steps 1 and 2 are based on the GDOI GROUPKEY-PULL protocol [RFC6407]. However, step 2 in our case is an extension of GROUPKEY-PULL in the sense that it accommodates various cases of keying groups and adjacency management as well. Steps 3 and 4 have been designed such that GROUPKEY-PULL has been extended to inter-GM communication. Now we shall look at each of these steps in detail.

5.2.1.  Step 1 - Initial Exchanges: GCKS, GM mutual authentication

Initially, when a routing protocol instance wishes to start communication, be it unicast or multicast communication, it informs the KMP instance on the router. This information is communicated by the KMP instance from that router to the KMP instance on the router or server it believes to be the GCKS. At this point, the GCKS needs the identity of the requesting router in order to authenticate it. The requesting router also has to authenticate the GCKS.

Any of the ISAKMP group of unicast protocols could be used for step 1 communication between the GCKS and each router that requests keys from it. IKE/IKEv2 is an example of such a protocol. This protocol provides peer authentication, and parameters for an SA including a key to help provide confidentiality and message integrity for the next step, where the actual traffic keys would be generated.
We call the key derived in this phase SKEYID_a (term taken from GDOI). It is assumed that the routers have agreed upon a way to establish their identity during authentication, either through pre-shared keys, asymmetric keys or certificates.

If peer authentication is successful, the router becomes a GM. As already mentioned, GM stands for 'Group Member'. When talking about the GCKS-GM interactions, 'group' typically means the entire set of GMs in the AD. When talking about the GM-GM interactions, 'group' typically means the sending router and some set of its neighbors. This set may include all of its neighbors or only a subset, depending on the key scope in use. For example, when the key scope is per link, a 'group' may refer to all routers sharing a link. This will become evident as we see the GM-GM interactions shortly.

5.2.1.1.  Message Exchanges for Step 1

The protocol message exchanges for this step are the standard IKE exchanges, since we propose using IKE for this step. Whenever we say IKE, we intend to refer to IKE or IKEv2, unless explicitly stated otherwise.

5.2.2.  Step 2 - Key Management Message Exchanges between GCKS, GM

This is the step where the KMP takes over. The goal of the KMP is to provide parameters for an SA to be eventually used by a routing protocol to secure its control traffic. Messages in this step are secured by the key generated by the step 1 protocol, that is, SKEYID_a. This key helps achieve authentication and confidentiality for step 2.

For step 2, we have taken most of the messages from the GROUPKEY-PULL protocol of GDOI. However, there are some modifications and important additions of functionality in our case, with the GCKS passing additional information to the GMs. We shall see this in this section.

We shall initially look at the KMP details for one of the finely grained cases of keying groups, namely, the group per sending router. This is a flavor of multicast communication. Soon after this we will see the small variations necessary in order to handle the other categories of keying groups.

In step 2, each GM requests from the GCKS, through the KMP, the SA parameters required to secure its control traffic. In the request to the GCKS, the GM specifies the identity of the routing protocol for which it needs the keys. Although the GCKS corresponding to the routing protocol would have already been selected in step 1, specifying the routing protocol id again here helps to handle the case where the same GCKS may be used for a category of similar routing protocols.

When the GCKS receives this request from the GM, it verifies that the GM can be given access to key related information according to the rules in the policy token. If the checks fail, the communication with the GM should not be continued. The exact behavior can be determined from the rules in the policy token. If the checks succeed, the GCKS delivers to the GM the following information:

o  SA policy corresponding to the TEK. This could include the actual SA parameters as well, depending on the category of keying group being enforced. The TEK is the traffic key whose scope could be any of those described under key scopes in Section 2. The SA policy includes policy information about SA parameters. This could include information pertaining to the algorithms, the TEK, the SPI and other parameters. For the category of keying group being discussed now, that is, the key per sending router, the exact TEK and SA parameters are not delivered by the GCKS to the GM. Only rules pertaining to their generation are handed down.
The actual SA parameters are generated by the GM itself soon after step 2 so that the GCKS is not overloaded.

o  A certificate signed with the private key of the GCKS. This is to be used by the GM for authentication purposes when it communicates with neighboring GMs and with the GCKS for any future SA updates.

o  The policy token information received by the GCKS from the Policy Server. As already mentioned, this includes authorization and access control related information. This is read by the GM in order to authorize the GCKS and verify that it is entitled to perform the role of GCKS.

o  The key scope being enforced in the AD. This configuration is made by the administrator on the GCKS and is pushed to the GM. This is necessary so that the GM knows whether to expect the traffic keys from the GCKS, or whether it needs to generate them itself.

o  The adjacency information, which includes details of all legitimate neighbors on all interfaces of the GM and not only the neighbors online at that point of time. This is in order to avoid a DoS attack on the GCKS that could result if the GMs started querying the GCKS for every router coming up, especially during the boot up sequence, to know if it is a legitimate neighbor. Also, this ensures completeness of information. It even helps eliminate spoofing attacks where a legitimate neighbor may appear on an interface other than the one it was supposed to appear on. The adjacency information is used by the GM to know the set of authorized neighbors with which it should communicate during steps 3 and 4.

5.2.2.1.  Message Exchanges for Step 2

The protocol message exchanges for step 2 are shown in Figure 3.

   GM->GCKS:  HDR*, HASH(1), Ni, RP_ID                           (1)
   GCKS->GM:  HDR*, HASH(2), Nr, SA, CERT, K_SCOPE, PT, ADJ      (2)
   GM->GCKS:  HDR*, HASH(3)                                      (3)

              Figure 3: Message exchanges for Step 2

In the message exchanges, HDR is an ISAKMP header payload. It has a message id M-ID. The '*' indicates that the message contents following the header are encrypted.
The encryption is done with SKEYID_a. This ensures authentication (since the key is a secret generated in step 1 and can be possessed only by the GCKS and the GM with which step 1 was carried out) as well as secrecy (due to the encryption). Hashes are used for ensuring message integrity and data origin authentication; this will be explained shortly.

In exchange (1), the GM requests SA information from the GCKS to protect its control traffic corresponding to the routing protocol whose id is given by RP_ID. Ni is a nonce used to protect against replay attacks as well as to ensure liveness of the GM.

In exchange (2), the GCKS first confirms from the rules in the policy token that the GM can be given SA information. It also verifies the freshness of the nonce Ni. If this is successful, the GCKS proceeds to deliver to the GM the following information:

o  SA policy corresponding to the TEK - through the parameter SA

o  A signed certificate - CERT

o  Key Scope - K_SCOPE

o  Policy token - PT

o  Adjacency information - ADJ

The details of these pieces of information have already been explained. Nr is a nonce used for replay protection and to ensure liveness of the GCKS.

In exchange (3), the GM first verifies the freshness of the nonce Nr so as to detect a replay attack. It then confirms the authorization of the GCKS by referring to the policy token. If the GCKS is an authorized entity, the GM uses the key scope information to know how to proceed with respect to key generation. The adjacency list is used to note the list of legitimate neighbors and the allowed interfaces on which they can appear online. Once this is done, the GM sends an acknowledgement. This acknowledgement includes a hash for integrity purposes. If the GCKS is not authorized, the GM needs to end the communication with the GCKS.
The behavior in such cases can be determined by the policies specified in the policy token.

The hashes are computed with a pseudorandom function (prf) as shown in Figure 4.

   HASH(1) = prf(SKEYID_a, M-ID | Ni | RP_ID)
   HASH(2) = prf(SKEYID_a, M-ID | Ni_b | Nr | SA | CERT | K_SCOPE |
                 PT | ADJ)
   HASH(3) = prf(SKEYID_a, M-ID | Ni_b | Nr_b)

              Figure 4: Hashes used in Step 2

According to [RFC6407], "Each HASH calculation is a pseudo-random function ("prf") over the message ID (M-ID) from the ISAKMP header concatenated with the entire message that follows the hash including all payload headers, but excluding any padding added for encryption."

SKEYID_a is included in the hashes to ensure that both parties have the step 1 key. The hashes include the nonces from previous messages to ensure that both parties have the exchanged nonces. This is used for data origin authentication purposes. Hence Ni_b and Nr_b refer to Ni and Nr from exchanges (1) and (2) respectively. An important function of hashes is to provide message integrity. The receiver computes the hash of the received message and compares it with the hash value received to determine whether the message has been tampered with.

Once the GM has received this information, it generates the TEK and determines the parameters to be used for its outgoing SA. Here the functionality of the LKS of the GM as a generator of keys comes into play. Since the key scope being discussed now is one key per sending router, the LKS of each GM generates one TEK. The key generation is to be followed by key information exchange with legitimate neighbors so that the incoming SAs can be determined. Note that this key generation can instead be done at the beginning of step 4, once the inter-GM mutual authentication has happened in step 3.

5.2.3.  Step 3 - GM-GM mutual authentication

After the GM generates TEK-based information, and before exchanging it with its neighbors, it needs to ensure that a secure TEK exchange can take place. This is done in step 3 by each GM engaging in a unicast communication with each of its legitimate neighbors through any of the ISAKMP group of unicast key management protocols, such as IKE. This protocol provides peer authentication as well as a secret key to provide confidentiality, authentication and message integrity for step 4, which is the actual TEK exchange step. We call this secret key SKEYID_b. The legitimate neighbors are determined by referring to the adjacency information given by the GCKS to the GM in step 2. During peer authentication in step 3, the certificate given to the GM by the GCKS could be used.

5.2.3.1.  Message Exchanges for Step 3

The protocol message exchanges for this step are the standard IKE exchanges, since we propose using IKE for this step.

5.2.4.  Step 4 - Key Management Message Exchanges between GMs

This is the step where the TEK information is exchanged between GMs that need to communicate with each other. Unicast communication is, by definition, between two peers. For multicast communication, since we are dealing with control traffic only, and control traffic is typically link-local, each router on a link needs to be aware of the TEK of all other routers on the same link. These legitimate neighbors are determined from the adjacency information received from the GCKS. The LKSes of the corresponding GMs communicate to exchange their TEK information in order to populate their incoming and outgoing SAs.

Messages in this step are secured by the key generated by the step 3 protocol, that is, SKEYID_b. This key helps provide authentication as well as confidentiality.
In step 4, the LKS of the GM pushes the SA information corresponding to its TEK to each of its neighbors. The LKS also requests TEK information from its neighbors. Each of the neighbors then sends its outgoing TEK information and this is maintained as an incoming key on the querying LKS.

As a result of step 4, all GMs have the TEK information corresponding to all their neighbors so that a secure control traffic exchange can start.

5.2.4.1.  Message Exchanges for Step 4

The message exchanges for Step 4 are shown in Figure 5.

   GMi->GMr:  HDR*, HASH(4), N1, CERT1                           (4)
   GMr->GMi:  HDR*, HASH(5), N2, CERT2                           (5)
   GMi->GMr:  HDR*, HASH(6), SA1, KD1, KREQ                      (6)
   GMr->GMi:  HDR*, HASH(7), SA2, KD2                            (7)

              Figure 5: Message exchanges for Step 4

GMi and GMr depict the initiator and the responder GMs respectively. The message exchanges in this step are similar to those in step 2 in that the HDR is an ISAKMP header payload with a message id M-ID. The '*' indicates that the message contents following the header are encrypted. The encryption is now done with the key SKEYID_b derived in step 3. This ensures both authentication and secrecy. Hashes are used for ensuring message integrity and data origin authentication. Nonces are used to resist replay attacks and to ensure peer liveness.

In exchanges (4) and (5), we show mutual authentication between GMs through the certificates received from the GCKS in step 2. CERT1 is the certificate received by GMi and CERT2 is the one received by GMr from the GCKS. Authentication would have happened in step 3 so exchanges (4) and (5) can be eliminated. They have been shown here for the sake of completeness.

In exchange (6), the initiator GM communicates to its neighbor its outgoing SA parameters in SA1 as well as the outgoing TEK information explicitly in KD1. This is the TEK that it will be using henceforth to secure its control packets.
It also requests the outgoing SA information from the neighboring GM so that it can be installed as incoming SA information on the querying GM. This request is represented by KREQ, which stands for Key Request.

In exchange (7), the neighboring GM responds with its outgoing SA information in SA2 as well as the TEK in KD2. This will be the TEK the neighboring GM will use henceforth to secure its control packets.

As already mentioned, the nonces N1 and N2 help provide replay protection and a confirmation that the peer is alive.

The hashes are computed with a pseudorandom function as shown in Figure 6.

   HASH(4) = prf(SKEYID_b, M-ID | N1 | CERT1)
   HASH(5) = prf(SKEYID_b, M-ID | N1_b | N2 | CERT2)
   HASH(6) = prf(SKEYID_b, M-ID | N1_b | N2_b | SA1 | KD1 | KREQ)
   HASH(7) = prf(SKEYID_b, M-ID | N1_b | N2_b | SA2 | KD2)

              Figure 6: Hashes used in Step 4

Hash computation is similar to that explained in step 2. In step 4, hashes are computed by applying a pseudorandom function to the key SKEYID_b, along with the message id concatenated with the message contents following the hash. Also, nonces from a message exchange are included in the hash computation of the subsequent exchanges in order to ensure that both parties have the nonces just exchanged. This helps in data origin authentication. Hence N1_b and N2_b refer to N1 and N2 in exchanges (4) and (5) respectively. Hashes are essential to ensure message integrity and to confirm that the messages have not been modified (possibly by an intruder) during transit.

All information received by the LKS of a GM from the GCKS as well as from neighboring LKSes is written to stable storage that persists across reboots. This can be used to avoid flooding the GCKS with requests on a router reboot. This is one of the advantages of the proposed design over GDOI [RFC6407], where routers that reboot come back up with no information and the GCKS is flooded with requests.
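The hash chaining of Figure 6 can be illustrated with a minimal sketch. It rests on several assumptions that are not part of this document: the prf is taken to be HMAC-SHA-256 (no prf is mandated here), the payloads are opaque placeholder byte strings rather than real ISAKMP encodings, and the names prf, step4_hashes and verify are hypothetical helpers chosen for illustration only.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Assumed prf: HMAC-SHA-256. The draft itself does not fix a prf."""
    return hmac.new(key, data, hashlib.sha256).digest()


def step4_hashes(skeyid_b, m_id, n1, n2, cert1, cert2,
                 sa1, kd1, kreq, sa2, kd2):
    """Compute HASH(4)..HASH(7) as laid out in Figure 6.

    '|' in the figure is modelled as plain byte concatenation.
    N1_b and N2_b are the same values as N1 and N2, carried forward
    from exchanges (4) and (5).
    """
    return {
        4: prf(skeyid_b, m_id + n1 + cert1),
        5: prf(skeyid_b, m_id + n1 + n2 + cert2),
        6: prf(skeyid_b, m_id + n1 + n2 + sa1 + kd1 + kreq),
        7: prf(skeyid_b, m_id + n1 + n2 + sa2 + kd2),
    }


def verify(key: bytes, data: bytes, received: bytes) -> bool:
    """Receiver side: recompute the hash and compare in constant time."""
    return hmac.compare_digest(prf(key, data), received)
```

A receiver that holds SKEYID_b and the earlier nonces recomputes the same value and rejects the message on a mismatch. In a real implementation the concatenated fields would be length-delimited ISAKMP payloads, which removes the ambiguity that raw concatenation of variable-length strings would otherwise have.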
The routing protocol is notified by the KMP that the new SA is available in the key table for it to protect its control traffic. The routing protocol security mechanism would store the incoming and outgoing SA information, and the adjacency information, into the relevant databases.

As we can see, confidentiality and authentication have been ensured for all steps by means of secret keys and certificates.

In the following section, we describe the small variations required in the basic protocol design proposed above in order to handle the various categories of keying groups.

5.2.5.  Variations for handling other Keying Groups

We have seen the different granularities possible for a keying group, that is, the different key scopes, in Section 2. We have also seen that the design proposed in Section 5.2 is able to handle the keying group where there is a separate key per sending router. This has been achieved by each router generating its own key, which would be the same for all its interfaces. Hence each router has a different SA for outgoing traffic and multiple SAs for incoming traffic, one corresponding to each neighbor.

Note that, because keys are generated locally, there is a small possibility of two routers ending up with the same key when they generate it randomly. However, if a good random number generator is used for key generation, the probability of ending up with the same key is drastically reduced. This extremely small possibility can be ignored since the method has the more important advantage of reducing the load on the GCKS. Also, the GCKS does not need to be aware of the individual keys of each router. This can be considered a tradeoff.

In this section, we shall see how the remaining cases of keying groups can be handled. They can be handled by minor variations to the basic design.
In essence, these variations can be implemented by the GM interpreting the key scope information given to it by the GCKS in step 2, and thereby knowing whether to expect keys from the GCKS or to derive them itself. This also makes the GM aware of the path to be followed. As we shall see, in a majority of cases it is step 4 that gets slightly altered.

Same key for the entire AD - Let us take the most coarsely grained case, namely, a keying group per AD. Since all routers have to share the same key (TEK), the centralized GCKS is the one that should generate it. Every GM gets the TEK and other SA parameters directly from the GCKS in step 2. The TEK information received from the GCKS can be stored as both the outgoing as well as the incoming key since all GMs share the same key. Therefore, step 4 can be eliminated. However, step 3, which involves GMs authenticating neighboring GMs, is necessary before the GMs can start exchanging control packets.

In essence, this variation of key scope can be implemented by the GM interpreting the key scope information given to it from the GCKS in step 2, and thereby knowing that it should expect the TEK from the GCKS (TEK is also received in the same step).

Key per link - This is another flavor of keying groups wherein there exists a TEK per link, that is, a key is shared by all routers sharing a link. As far as steps 1, 2 and 3 are concerned, this can be handled in a manner similar to the single key per router case already described. However, there is a slight variation required in step 4. Previously, the LKS of each GM generated a single key to be used on all interfaces of the GM. However, in this case, an LKS needs to generate as many TEKs as the number of its interfaces by interacting with the neighbors on the respective links.
This is done by GMs on a link interacting to derive a TEK and other SA parameters through any of the mutual key agreement protocols. Some examples of protocols that could be used for this purpose are MRKMP [I-D.hartman-karp-mrkmp], group Diffie-Hellman, and the STS protocol. Since MRKMP specifies how keys can be generated and distributed on a LAN by electing a GCKS, it can be used for TEK generation for the case where the key scope is per link. The TEK and the other SA parameters generated are stored by all LKSes sharing the link as the outgoing and incoming parameters on that particular link. This procedure is repeated by all GMs for all their links in turn.

Key per sending router per interface - The only difference here when compared to the separate key per router case is that in that case, each GM generates a single TEK to be used on all of its interfaces, whereas here each GM generates a different TEK for each of its interfaces. In step 4, it gives each neighbor the TEK that it plans to use on the connecting link between them.

Key per peer - This is the last category of keying groups. This refers to unicast communication where peer routers exchange control packets. Here the SA parameters corresponding to the traffic key TEK and the TEK itself can be generated using a unicast key management protocol such as IKE or even KMPRP. However, an important point to note here is that adjacency management is necessary even for this case since routers should exchange keys only with legitimate neighbors. This can be achieved only by having a central authority that is aware of all valid adjacencies. Our design handles this. Steps 1, 2 and 3 of the design are sufficient. The key derived in step 3, namely, SKEYID_b, serves as the TEK.
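The way a GM might interpret K_SCOPE to choose its path through steps 2 to 4 can be summarized in a small lookup table. The sketch below only restates the variations described above; the names KeyScope, Plan, PLANS and plan_for are hypothetical and do not correspond to any encoding defined in this document.

```python
from dataclasses import dataclass
from enum import Enum


class KeyScope(Enum):
    """Hypothetical labels for the five keying groups."""
    PER_AD = "same key for the entire AD"
    PER_LINK = "key per link"
    PER_SENDER = "key per sending router"
    PER_SENDER_PER_INTERFACE = "key per sending router per interface"
    PER_PEER = "key per peer"


@dataclass(frozen=True)
class Plan:
    tek_source: str   # who produces the TEK
    tek_count: str    # how many outgoing TEKs a GM holds
    step4: str        # what step 4 consists of, if it runs at all


# One row per key scope, taken from the prose of Section 5.2.5.
PLANS = {
    KeyScope.PER_AD: Plan(
        "GCKS, delivered in step 2", "one, shared AD-wide", "skipped"),
    KeyScope.PER_LINK: Plan(
        "key agreement among the GMs on the link", "one per link",
        "key agreement on each link"),
    KeyScope.PER_SENDER: Plan(
        "LKS of the GM", "one",
        "push own TEK, request neighbors' TEKs"),
    KeyScope.PER_SENDER_PER_INTERFACE: Plan(
        "LKS of the GM", "one per interface",
        "push the per-interface TEK, request neighbors' TEKs"),
    KeyScope.PER_PEER: Plan(
        "step 3 exchange (SKEYID_b)", "one per peer", "skipped"),
}


def plan_for(k_scope: KeyScope) -> Plan:
    """Return the path a GM follows for the key scope pushed by the GCKS."""
    return PLANS[k_scope]
```

For example, plan_for(KeyScope.PER_AD).step4 is "skipped", reflecting that step 4 is eliminated when the GCKS delivers the TEK directly, while step 3 is still run in every case.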
We have mentioned that the SA parameters along with the TEK are either delivered to the GMs by the GCKS (for the single key per AD case) or generated by the GMs themselves, possibly through interactions with other GMs (for the other keying groups, depending on the particular category). A parameter that could have a slightly different behavior is the SPI. This is also one of the parameters of an SA. However, the range of SPIs to be used in an AD could be decided by the administrator. Whatever the category of keying group, the administrator may choose to have the same SPI for all GMs. In this case, the GCKS could deliver the SPI to the GMs along with the policy for the remaining parameters of the SA. Alternatively, the administrator may want each GM to use a different SPI for its outgoing traffic. In this case, the GCKS should not be overloaded with the task of generating a different SPI for each GM. GMs should generate the SPI themselves, possibly by communicating with other GMs. If that happens, even for the single key per AD category of keying groups, the SPI is generated by the GMs, although the TEK may be obtained from the GCKS (since the TEK is to be the same for all GMs for this category of key scope). In other words, the key scope may be different from the scope of the SPI used in the AD. Our design is flexible enough to handle this since the SA policy handed down by the GCKS to the GMs would indicate to the GM the exact steps to be followed.

In all cases of keying groups, the LKS stores SA information in persistent storage to be used across reboots. Keys are stored in the key table [I-D.ietf-karp-crypto-key-table] and the KMP informs the routing protocol of this, which would then start using the keys to secure its control traffic.
This is the step 5 mentioned in the explanation of the concept of hierarchical design in Section 5.1.3.

6.  Other Aspects of the Key Management Problem

In this section, we address some of the other important aspects of the key management problem. First, we show how this automated system allows key updates to be done as frequently as desired. After that, we show how various desirable features have been incorporated in the proposed design. Some of these features are scalability, incremental deployment ability, effective handling of router reboots and smooth key rollover. Addition of these features would help in achieving the requirements stated in Section 3.

6.1.  Key Updates

Keys used by the routing protocols to secure their traffic need to be updated at regular intervals. They may have to be updated at other non-specific times as well, depending on the requirement. There are several reasons why key updates are required:

o  As a good practice in order to protect against passive intruders who could have obtained access to the keys and could be eavesdropping on the traffic.

o  Whenever a new member comes up on a link, in order to ensure PBS. This means that the new member should not be able to get access to keys currently being used on the link, since that could mean that the member can comprehend old messages exchanged on the link when it was not part of it.

o  Whenever a member leaves, in order to ensure PFS. This means that going forward, even if the old member manages to get hold of messages exchanged among the remaining members on the same link, it should not be able to comprehend them.

One of the important points to be noted here is that PFS and PBS can be achieved easily and in a straightforward way for unicast communication. Unicast communication involves a pair of routers that share keys for securing their traffic.
Every pair of routers derives its own set of keys and those keys are known only to that particular pair of routers. Hence a change in any one of the members of the pair of routers would mean that the old keys are no longer valid and new keys are derived for communication. This automatically takes care of PFS and PBS. When a router, say R1, is uninstalled, the keys used by the other routers for pairwise (unicast) communication with R1 are no longer used. This ensures PFS. When a new router, say R2, is installed, all routers engaging in a unicast communication with it derive new pairwise keys with it. This ensures PBS.

For multicast communication, key updates are essential on a router uninstallation or an installation to ensure PFS and PBS respectively. This is because in multicast communication, multiple routers share the same key and a key remains valid even if one of the routers involved in the communication is changed. To achieve PFS and PBS, keys have to be updated so that the leaving or entering routers do not have access to information they are not entitled to.

We now have to determine which keys need to be updated. For regular updates, the traffic keys of all the routers have to be changed. The other case to consider is when the routers in an AD change, due to either an installation or an uninstallation. When the same traffic key is used for the entire AD, that key should be changed, which has the effect of changing the keys for all the routers. However, for all other key scopes, only the keys corresponding to the neighbors of the leaving/entering router need to be changed. This is because, as far as control traffic is concerned, routers have knowledge of the keys of their neighbors only. Of course, the adjacencies, and hence the neighbors, may be defined differently for the various routing protocols.
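The rule for which routers must change their keys when a router is installed or uninstalled can be sketched as follows. This is an illustration under assumptions: the adjacency information is modelled as a plain mapping from router name to its set of neighbors, the scope labels are hypothetical strings, and the function name routers_to_rekey is not defined anywhere in this document.

```python
from typing import Dict, Set

# Hypothetical scope labels used only in this sketch.
SCOPES = {"per_ad", "per_link", "per_sender",
          "per_sender_per_interface", "per_peer"}


def routers_to_rekey(scope: str,
                     adjacency: Dict[str, Set[str]],
                     changed: str) -> Set[str]:
    """Return the routers whose traffic keys need an explicit update
    when the router `changed` is installed or uninstalled."""
    if scope not in SCOPES:
        raise ValueError(f"unknown key scope: {scope}")
    if scope == "per_peer":
        # Pairwise keys are simply derived anew (or dropped) with the
        # changed router, so no explicit update is triggered.
        return set()
    if scope == "per_ad":
        # One shared key: every other router in the AD is affected.
        return set(adjacency) - {changed}
    # All remaining scopes: only the neighbors of the changed router.
    return set(adjacency.get(changed, set()))
```

With an adjacency map of R1-R2, R1-R3 and R3-R4, a change to R4 affects R1, R2 and R3 under the per-AD scope, but only R3 under the per-link or per-sending-router scopes.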
One of the major problems with the manual method of key management is that keys cannot be updated as frequently as desired. This is due to the lack of authorized people to carry out the task. This issue can be overcome by an automated key management system. Let us see how the two cases, a regular rekey and a rekey on a router installation/uninstallation, can be handled by the automated key management system we propose.

6.2.  Regular Key Updates

In this section, we discuss how our design for automated key management aids key updates at regular intervals. The interval at which key updates are to be done is determined from the policies handed down by the Policy Server entity described in Section 4.2. These policies are handed down by the Policy Server to the GCKS in the form of a policy token, which in turn is handed down by the GCKS to the GMs in Step 2 of the protocol as explained in Section 5.2.

We now need to see how key updates for all variations of keying groups can be addressed. As we shall see, when all routers in the AD share the same traffic key, the centralized GCKS is the generator of the new key, whereas in all other cases, the GMs generate the new keys appropriately. This is in fact similar to the process of initial key generation described in Section 5.2.

6.2.1.  Same key for the entire AD

First, let us take the case of having a single key for the entire AD. Here, when a rekey is required, the GCKS generates the new traffic key and unicasts it to each individual GM. This ensures that all GMs share the same new TEK after the rekey.

As an alternative to transferring the new TEK through unicast communication, the GCKS and all GMs in the AD could share a key called a 'TEK Encryption Key'. This key could be used by the GCKS for encrypting the newly derived TEK and multicasting it to all GMs.
The advantage of this approach over the unicast method is that it eliminates the need to have multiple key update messages sent out by the GCKS, one corresponding to each GM. This in turn reduces the network traffic. However, the downside to the multicast approach is the overhead of maintaining a group key (and appropriately updating it) just for the rekey purposes. This is a tradeoff.

6.2.2.  Key per link

In this category of keying group, routers sharing a link also share the traffic key for that link. Here, when a TEK update is required, GMs on a link execute one of the key agreement protocols such as MRKMP, group Diffie-Hellman or the STS protocol to derive a new TEK. This is similar to the manner in which they interact to derive the initial TEK for the link. The interval after which the TEK should be changed is determined from the policy token.

6.2.3.  Key per sending router

In this case, every router has a different TEK that it uses for securing its control traffic. When a rekey is required, each GM generates a new TEK individually and then communicates it to all its neighbors. The neighbors update the incoming TEK information corresponding to that router in their databases.

6.2.4.  Key per sending router per interface

This case is very similar to the previous one. The only difference is that here, each GM generates as many new TEKs as the number of its interfaces, one per interface. The GM then communicates to each of its neighbors the TEK it plans to use on the interface corresponding to that particular neighbor.

6.2.5.  Key per peer

This is the unicast case. Keys can be updated just by every pair of routers executing a unicast key management protocol such as IKE.

In all the above cases, the LKS updates the key store as well as its persistent storage with the updated key information.
The KMP notifies the routing protocol of a change in the keys used to secure the control traffic.

6.3.  Router Installation/Uninstallation

Along with the regular key updates, keys need to be updated even when an existing router is uninstalled or a new router is installed. These are for PFS and PBS purposes respectively, as already explained in Section 6.1. There are two differences between key updates in these cases and the regular key updates.

o  Regular traffic key updates require that the traffic keys corresponding to all routers in the AD be updated. However, key updates on a router removal or addition require only the keys corresponding to the neighbors of the leaving or entering router to be changed. This is because routers have knowledge of the keys corresponding to their neighbors only, as far as control traffic is concerned. But if the same traffic key is being used for all routers in the AD, then a change in the key automatically implies that the key gets changed for all the routers.

o  Regular key updates are done at intervals determined from the policy token given by the Policy Server. However, key updates on a router removal or addition are done based on instructions given by the GCKS in such a situation. This is because routers in the AD (other than the GCKS) would not be aware of the fact that a particular router is either installed or uninstalled.

Apart from these differences, the process of key updates during a router change is very similar to the regular key updates. We shall now discuss briefly how key updates on a router change can be handled for each of the categories of keying groups.

6.3.1.  Same key for the entire AD

For this category of key scope, the same traffic key is shared by all routers in the AD.
When a router is removed or a new router is installed, the GCKS derives a new TEK and unicasts it to each of the routers in the AD.

As an alternative to transferring the new key through the unicast method, the GCKS and all GMs could share a key called the 'TEK Encryption Key'. If this option is followed, the TEK Encryption Key first has to be changed on a router change. For the case of router installation, the GCKS multicasts the new TEK Encryption Key, encrypted in the old key, to all existing routers. It then unicasts the new TEK Encryption Key to the newly installed router. After this, the GCKS derives a new TEK and multicasts it to all the routers after encrypting it in the new TEK Encryption Key. This can be decrypted by the new router as well, since it now possesses the latest TEK Encryption Key.

For the case of router uninstallation, the GCKS changes the TEK Encryption Key and unicasts it to all the remaining routers. The new TEK Encryption Key cannot be multicast in this case since the old router would also be able to decrypt it. Changing the TEK is the same as for router installation. The new TEK is sent in a multicast message to all routers, encrypted in the new TEK Encryption Key.

When compared with the unicast method of key updates, this multicast method has the advantage of low bandwidth consumption. However, the disadvantage of the multicast method is that an extra key, the TEK Encryption Key, now needs to be maintained and updated accurately. The choice of method is therefore left to the administrator.

6.3.2.  Key per link

For this case, on a router installation or an uninstallation, the GCKS informs the neighbors of that router. These routers interact with each other (and with the new router if it is a case of router installation) and derive a new traffic key for that particular link where the neighbor change has occurred.
Any of the mutual key agreement protocols such as MRKMP, group Diffie-Hellman or the STS protocol can be used.

6.3.3.  Key per sending router

Here again the GCKS appropriately informs the neighbors of the affected router. Each such neighbor runs a randomized key generation algorithm to derive a new traffic key and communicates the key to its neighbors. This is very similar to the case of regular key updates.

6.3.4.  Key per sending router per interface

This category of keying group can be handled in a similar manner. The GCKS informs the neighbors of the affected router. Each such router derives a new traffic key for the interface on which the neighbor change has occurred. The router then communicates the new key to its new set of neighbors on that particular interface.

6.3.5.  Key per peer

As already explained, explicit key updates on a router change are not applicable to unicast communication. This is because in unicast communication, a key is shared by only two routers. A router addition or a removal results in a change in a particular pair (or pairs) of routers. Hence new keys are in any case derived to be shared by the new pair. Thus this can be considered as an automatic update of keys without any explicit processing.

6.4.  Router Reboots

Router reboots form a very important case to be considered in any design pertaining to networks. Especially in a centralized architecture, care should be taken to prevent the central entity from being flooded with requests when multiple routers happen to reboot almost simultaneously.

In our architecture, it is the persistent storage of the distributed LKS that plays a major role on a router reboot. As already seen, the LKS of each GM writes to persistent storage some configuration and policy information such as the key scope, adjacencies, SAs, the traffic keys corresponding to itself and its neighbors, the certificate received from the GCKS, and the policy token. Hence on a GM reboot, the LKS retrieves information from the persistent storage.
This is an extremely important feature, since it prevents the GCKS from being flooded with requests for information when multiple routers in the AD happen to reboot.

However, information retrieval from the persistent storage may not always be sufficient. Occasionally a rekey could have happened while a router was down. This could have been either a regular rekey or a rekey due to a router installation or removal. These cases should be dealt with so as to ensure that the rebooted router gets the latest SA and adjacency information.

In order to handle these cases, a router needs to query its neighbors on a reboot. This is done as soon as the router has rebooted and read the relevant information from its persistent store. The neighbors communicate their traffic key and SA information to the rebooted router. Depending on this information, as well as the key scope information retrieved from the persistent storage, the rebooted router can handle a rekey appropriately. This interaction with the neighbors is explained below for each of the key scopes:

Same key for the entire AD - To handle this case, a router initially gets the TEK-related information from one of its neighbors. It compares this key with the key corresponding to that neighbor as retrieved from the persistent storage (which is the same as its own key, since the same key is shared by all routers in the AD). If the two keys match, then it is evident that no rekey has happened on the neighbor. Since the key scope is such that the same key is used for the entire AD, it can be concluded that there has been no rekey in the AD. Hence the rebooted router need not do anything else. If the keys do not match, the rebooted router concludes that a rekey has happened in the AD, either due to a regular key update or due to a key update based on a router change.
In either case, the router changes its outgoing traffic key to the new key obtained from its neighbor. This helps maintain consistency of all traffic keys across the AD.

Key per link - For this case, the rebooted router queries its neighbors in turn, one neighbor on each of its links. Again it compares the traffic key received from its neighbor with the corresponding information retrieved from its persistent store. If the two keys match, there has been no rekey on that link. If the keys do not match, a rekey has happened on the link. The rebooted router then changes its own outgoing traffic key on that link to the new key obtained from the neighbor. In either case, the router proceeds with querying its neighbors on its remaining links. This is different from the previous case, where a single key was used by all routers in the AD, because in the key per link case determining whether a rekey has happened on a particular link does not help determine the status of other links. Hence at least one neighbor on each link has to be queried.

Key per sending router - For this case, the rebooted router starts by querying one neighbor on each of its interfaces. If the traffic keys of all the queried neighbors are the same as the corresponding keys retrieved from the persistent storage of the rebooted router, there is nothing to be done. If there is at least one neighbor whose key has changed, the rebooted router changes its own key and communicates it to its neighbors. The rebooted router can stop querying its neighbors at this point. An interesting observation here is that a neighbor's key could have changed either due to a regular rekey or due to an installation or uninstallation of its neighboring router. This neighboring router may or may not be a common neighbor of the rebooted router.
Since the exact situation cannot be determined, the rebooted router simply goes ahead with its key change once it sees that the key of its neighbor has changed. This is acceptable, since an extra key update is not harmful.

Key per sending router per interface - This case is similar to the key per link case. The rebooted router queries one neighbor per interface and compares the traffic key information received with the corresponding information from the persistent key store. If the keys match, there has been neither a regular update nor a router change on that interface. If the keys do not match, there has been a key update, either as part of a regular rekey or due to a neighbor change on that interface. Hence the rebooted router derives a new traffic key for that interface and communicates it to its neighbors on that interface. The router then proceeds with querying its neighbors on the remaining interfaces to determine whether the keys used on those interfaces need to be changed.

Key per peer - This category of keying group represents unicast communication. Here, when a router comes back up after a reboot, it queries its counterpart for the traffic keys corresponding to this pair of routers. Since in unicast communication a pair of routers derives traffic keys together, new keys for this pair would not yet be available, even though a regular rekey interval may have passed while the router was down. Therefore the two routers could either engage in a unicast key management protocol such as IKE to derive new traffic keys, or decide to continue using the old keys until the next rekey interval has passed.

The method described above helps ensure that, in a majority of cases, rekeys that could have happened while a router was down are handled. A couple of cases remain to be considered.
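Before turning to those cases, the per-scope comparisons described above can be illustrated with a short, non-normative Python sketch. All function names, the dictionary-based representation of links and interfaces, and the use of random 32-byte keys are assumptions made for illustration; this document does not define them. The sketch also assumes that the neighbors' replies have already been collected, whereas a real router would query them one at a time. The key per peer case is omitted, since it is handled by a unicast key management exchange.

```python
import secrets
from typing import Callable, Dict, List, Tuple

Key = bytes


def _random_key() -> Key:
    # Hypothetical key generator; the key length is an assumption.
    return secrets.token_bytes(32)


def reboot_ad_wide(stored: Key, reported: Key) -> Key:
    # Same key for the entire AD: one neighbor is enough. A mismatch
    # means the AD was rekeyed, so the neighbor's key is adopted.
    return reported if reported != stored else stored


def reboot_per_link(stored: Dict[str, Key],
                    reported: Dict[str, Key]) -> Tuple[Dict[str, Key], List[str]]:
    # Key per link: every link is checked, because the state of one
    # link says nothing about the others.
    updated, rekeyed = dict(stored), []
    for link, old in stored.items():
        if reported[link] != old:
            updated[link] = reported[link]
            rekeyed.append(link)
    return updated, rekeyed


def reboot_per_sending_router(own_key: Key,
                              stored_nbr: Dict[str, Key],
                              reported_nbr: Dict[str, Key],
                              keygen: Callable[[], Key] = _random_key) -> Tuple[Key, bool]:
    # Key per sending router: stop at the first neighbor whose key has
    # changed and replace our own key (the caller must then announce it).
    for iface, old in stored_nbr.items():
        if reported_nbr[iface] != old:
            return keygen(), True
    return own_key, False


def reboot_per_router_per_interface(own_keys: Dict[str, Key],
                                    stored_nbr: Dict[str, Key],
                                    reported_nbr: Dict[str, Key],
                                    keygen: Callable[[], Key] = _random_key
                                    ) -> Tuple[Dict[str, Key], List[str]]:
    # Key per sending router per interface: a new key is derived only
    # for the interfaces on which the neighbor's key has changed.
    updated, rekeyed = dict(own_keys), []
    for iface, old in stored_nbr.items():
        if reported_nbr[iface] != old:
            updated[iface] = keygen()
            rekeyed.append(iface)
    return updated, rekeyed
```

In each function, "stored" refers to what the LKS read from persistent storage and "reported" to what the queried neighbors returned.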
First, the rebooted router should verify whether the adjacencies retrieved from its persistent storage are still accurate. They could now be stale, because a router could have been installed or uninstalled while it was down.

Second, in the discussion above of how reboots are handled for the different categories of keying groups, a router queries only one neighbor in some cases, and one neighbor per link or interface in other cases. A situation could arise in which the queried neighbor had itself gone through a reboot, so that its own key is stale. In that case the querying router cannot rely on the information obtained from this single neighbor.

One way to address both of these issues is for the rebooted router to query the GCKS for the updated information. However, we do not want the GCKS to be flooded with requests from the various routers in the AD. Hence there are two layers of protection, designed as follows:

o  As already explained, the rebooted router retrieves information from its persistent store. It then queries its neighbors and either changes its keys appropriately or concludes that a key update is not required.

o  Once this is done, the rebooted router waits for a randomly chosen time interval before querying the GCKS, so as to avoid clashes with other routers querying the GCKS. Due to the randomness introduced, the chances of the GCKS being flooded with requests are reduced.

When queried, the GCKS could give the router information about its new adjacencies, possibly the time at which its adjacencies changed, and any other relevant rekey information. This enables the rebooted router to know whether its traffic keys are stale.

A further fine point is that, very rarely, a rekey could be in progress when the router comes up.
This is a corner case and is left for future work.

6.5.  Scalability

Any system intended for widespread deployment should be designed with scalability in mind. If scalability is overlooked during the design phase, the system will fail under high load when actually deployed. We have designed the automated key management system to be scalable.

We have already mentioned that we are limiting the scope of our problem to key and adjacency management within an AD. Even within an AD, since the number of routers is not fixed, the system should be able to handle a variable and possibly large number of routers. The proposed protocol involves a set of GCKS-GM interactions and a set of GM-GM interactions. The GM-GM communication is only among neighboring GMs, and hence scalability is not an issue there. For the GCKS-GM communication in the normal case, there should not be any issue either, since the GMs are not all installed or turned on at the same time. However, a situation to be considered is when the GMs reboot. It could happen that, due to a power outage, all GMs in the AD go down and come back up at approximately the same time. It is extremely important to ensure that the GCKS is not flooded with requests at this point.

Our proposal handles this case in two ways. First, we have seen that the LKS of each GM maintains a stable storage. All important pieces of information, such as those obtained from the GCKS and from the neighboring GMs, are written to this storage, which is persistent across reboots. Hence a GM, after a reboot, reads information directly from its persistent storage, thereby preventing the GCKS from being flooded with requests. Second, after retrieving information from the local storage, when the GMs need to query the GCKS itself, they do so by starting a timer and querying after a random time interval.
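As a non-normative illustration, the random query time could be chosen as in the following Python sketch. The uniform distribution, the upper bound max_delay_s, and the function names are assumptions made for illustration; this document does not specify them. In a deployment each GM would draw its own delay independently; the single random generator here only simulates that.

```python
import random
from typing import Iterable, List, Optional, Tuple


def gcks_query_delay(max_delay_s: float, rng: random.Random) -> float:
    """Pick a uniformly random delay in [0, max_delay_s] seconds."""
    if max_delay_s < 0:
        raise ValueError("max_delay_s must be non-negative")
    return rng.uniform(0.0, max_delay_s)


def schedule_queries(router_ids: Iterable[str],
                     max_delay_s: float,
                     seed: Optional[int] = None) -> List[Tuple[float, str]]:
    """Simulate the order in which the GCKS would see the queries.

    Returns (delay, router_id) pairs sorted by delay.
    """
    rng = random.Random(seed)
    return sorted((gcks_query_delay(max_delay_s, rng), rid)
                  for rid in router_ids)
```

With N routers rebooting together, spreading their queries over a window of max_delay_s seconds reduces the average arrival rate at the GCKS to about N / max_delay_s queries per second.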
This randomized delay plays a major role in preventing the GCKS from being overloaded, and thereby contributes to scalability.

Another factor that enhances scalability, by partially distributing functionality, is the presence of the Standby GCKS. If the active GCKS fails (which could be due to an overload), the Standby GCKS immediately takes over the functionality of the active one. This eliminates a single point of failure and hence allows the system to withstand higher loads, or a larger number of GMs in the AD.

6.6.  Option to Turn Off Adjacency Management

We have already discussed why it is important for an automated key management system to manage adjacencies well. Routing protocol updates are usually exchanged with neighbors, which in turn leads to the requirement that communicating routers should be legitimate neighbors. It is good practice to have adjacency management turned on in a network, so that, for any router, all of its legitimate neighbors and only its legitimate neighbors get to know the keys it uses for securing its control traffic.

However, an administrator may sometimes decide to turn off adjacency checks, for example because the network of routers is small and the extra overhead is not justified. This would mean that any router is then allowed to query for and receive the traffic keys of any other router in the network, even though the routers may not be neighbors. If adjacency management is turned off, the routing protocols would also respond to all control packets without performing adjacency checks. This definitely reduces security in the network.

If the key scope is such that the same traffic key is used throughout the AD, not much harm is caused if a router gives its key information to any other router in the AD, since all routers share the same key.
Mutual authentication of the routers should of course still take place, in order to establish that the routers are valid members of the AD. However, an administrator could also use the key per sending router model, for example, and turn off adjacency management. The administrator then relies on physical adjacency to ensure that a router far away from another router does not query it for keys.

6.7.  Incremental Deployment

Whenever a new system is to be deployed in the real world, the ease with which that can be done is of utmost importance. Network operators may not be ready to switch over to a new system if it is not easy to deploy. Also, operators switching from an existing setup to a new one usually want to deploy the new system incrementally. This helps them detect problems in the new system, if any, and then decide whether or not to move completely to the new model. We have designed our automated key management system with this requirement in mind.

The model we have proposed can be deployed on a per interface basis. This means that initially GMs could be manually configured with the TEKs for some of their interfaces, and made to run the key management protocol to derive the TEKs for the other interfaces. This applies to the case of a separate key per interface of each router. The other cases of keying groups can be handled in a similar manner. In addition, the new system can be used to provide TEKs for one routing protocol at a time. This again smooths the transition from manual configuration to the automated method.

6.8.  Smooth Key Rollover

Whenever the TEK is changed, smooth key rollover should be ensured so that no packets are dropped during the key transition. In order to achieve this, while transitioning from the old key to the new one, routers have to accept messages secured using either key for a short duration.
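The acceptance rule can be sketched as follows (non-normative). The class name RolloverKeys, the overlap_s parameter, and the use of HMAC-SHA-256 as the integrity algorithm are assumptions made for illustration only; the actual transforms are determined by the routing protocol and its SAs.

```python
import hashlib
import hmac
from typing import List, Optional


class RolloverKeys:
    """Holds the current key and, during rollover, the previous key."""

    def __init__(self, current_key: bytes) -> None:
        self.current: bytes = current_key
        self.previous: Optional[bytes] = None
        self.previous_expires_at: float = 0.0

    def install(self, new_key: bytes, now: float, overlap_s: float) -> None:
        # Keep the old key acceptable for overlap_s seconds (assumed timer).
        self.previous = self.current
        self.previous_expires_at = now + overlap_s
        self.current = new_key

    def _acceptable(self, now: float) -> List[bytes]:
        keys = [self.current]
        if self.previous is not None and now < self.previous_expires_at:
            keys.append(self.previous)
        return keys

    def sign(self, msg: bytes) -> bytes:
        # Outgoing messages always use the current key.
        return hmac.new(self.current, msg, hashlib.sha256).digest()

    def verify(self, msg: bytes, tag: bytes, now: float) -> bool:
        # Incoming messages may be secured with either acceptable key.
        return any(
            hmac.compare_digest(
                hmac.new(key, msg, hashlib.sha256).digest(), tag)
            for key in self._acceptable(now)
        )
```

A receiver that has already installed the new key still verifies messages from a sender that has not yet received it, until the overlap period ends.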
Accepting either key for a short period allows for the delay involved in the new keys being received by all routers participating in that particular communication. After a certain time period, as determined by a timer, the old key information can be cleared. For smooth key rollover in multicast communication, these points are explained in more detail in [RFC5374]. For unicast communication, either this method could be followed, or the two participating routers could exchange new keys and acknowledge receipt of the keys just before beginning to use them.

6.9.  Eliminating Single Point of Failure

The proposed design for key management describes the use of a centralized GCKS as the controller and co-ordinator for the entire AD. Any centralized system has a potential single point of failure: if the central entity goes down, the entire system could stop functioning due to loss of important data. This can be avoided by having a backup entity take over when the primary controller goes down. This is precisely what is proposed in our design in Section 4.2. We propose maintaining a Standby GCKS, which is always kept in sync with the primary GCKS. This can be done by syncing all data from the active GCKS to the standby at regular intervals. The appropriate interval could be determined by the policies handed down by the Policy Server to the GCKS. Whenever the active GCKS goes down, the standby can immediately take over its responsibilities, thereby preventing any interruption in the functioning of the system. This introduces a certain degree of distribution of functionality and hence eliminates the single point of failure.

7.  Detailed Packet Formats

TBD

8.  IANA Considerations

This document has no actions for IANA.

9.  Acknowledgements

10.  Change History (RFC Editor: Delete Before Publishing)

[NOTE TO RFC EDITOR: this section for use during I-D stage only. Please remove before publishing as RFC.]

atwood-karp-akam-rp-02

o  Inserted ASCII art for figures and hashes

o  Resolved internal cross-references

o  Resolved external citations

atwood-karp-akam-rp-01

o  copied in the rest of the relevant material from Revathi's thesis

o  added overview material on protocol operations

atwood-karp-akam-rp-00 (original submission, based on Revathi's thesis)

o  copied in some sections of the thesis that are relevant to the specification.

11.  Needs Work in Next Draft (RFC Editor: Delete Before Publishing)

[NOTE TO RFC EDITOR: this section for use during I-D stage only. Please remove before publishing as RFC.]

List of stuff that still needs work

o  Create the section on packet formats

12.  References

12.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

[I-D.hartman-karp-mrkmp]  Hartman, S., Zhang, D., and G. Lebovitz, "Multicast Router Key Management Protocol (MaRK)", draft-hartman-karp-mrkmp-04 (work in progress), March 2012.

[I-D.ietf-karp-crypto-key-table]  Housley, R., Polk, T., Hartman, S., and D. Zhang, "Database of Long-Lived Symmetric Cryptographic Keys", draft-ietf-karp-crypto-key-table-03 (work in progress), June 2012.

[I-D.ietf-karp-ops-model]  Hartman, S. and D. Zhang, "Operations Model for Router Keying", draft-ietf-karp-ops-model-03 (work in progress), July 2012.

[I-D.ietf-karp-threats-reqs]  Lebovitz, G. and M. Bhatia, "Keying and Authentication for Routing Protocols (KARP) Overview, Threats, and Requirements", draft-ietf-karp-threats-reqs-05 (work in progress), May 2012.

[RFC2409]  Harkins, D. and D. Carrel, "The Internet Key Exchange (IKE)", RFC 2409, November 1998.

[RFC3740]  Hardjono, T. and B. Weis, "The Multicast Group Security Architecture", RFC 3740, March 2004.

[RFC4535]  Harney, H., Meth, U., Colegrove, A., and G. Gross, "GSAKMP: Group Secure Association Key Management Protocol", RFC 4535, June 2006.

[RFC5374]  Weis, B., Gross, G., and D. Ignjatic, "Multicast Extensions to the Security Architecture for the Internet Protocol", RFC 5374, November 2008.

[RFC5796]  Atwood, W., Islam, S., and M. Siami, "Authentication and Confidentiality in Protocol Independent Multicast Sparse Mode (PIM-SM) Link-Local Messages", RFC 5796, March 2010.

[RFC5996]  Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, "Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 5996, September 2010.

[RFC6407]  Weis, B., Rowles, S., and T. Hardjono, "The Group Domain of Interpretation", RFC 6407, October 2011.

[RFC6518]  Lebovitz, G. and M. Bhatia, "Keying and Authentication for Routing Protocols (KARP) Design Guidelines", RFC 6518, February 2012.

[atwo2009:AKM]  Atwood, J., "Automated Key Management for Router Updates", October 2009.

Authors' Addresses

William Atwood
Concordia University/CSE
1455 de Maisonneuve Blvd, West
Montreal, QC H3G 1M8
Canada

Phone: +1(514)848-2424 ext3046
Email: william.atwood@concordia.ca
URI: http://users.encs.concordia.ca/~bill

Revathi Bangalore Somanatha
Concordia University/CSE
1455 de Maisonneuve Blvd, West
Montreal, QC H3G 1M8
Canada

Email: revathi.bs@gmail.com