Problem Statement                                               P.Groom
Internet-Draft                                         October 24, 2003
Expires: April 24, 2004

      VPN Performance Measurements - an open model based proposal
                     draft-groom-vpn-tunnel-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 9, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

I. ABSTRACT

   This document discusses a proposed solution for measuring the
   performance of a VPN tunnel, in terms of latency, packet loss and
   availability, using a standards based methodology. Although various
   vendors have provided methods of measuring VPN performance, all of
   them use either proprietary protocols or bespoke software. This
   paper proposes a standards based solution that can be easily
   integrated into an NMS (Network Management System).

II. INTRODUCTION

   Currently, the management of a VPN terminating device (VTD) is well
   understood, using a mixture of SNMP (Simple Network Management
   Protocol) and proprietary vendor protocols. For example, several
   vendors provide SNMP MIBs (Management Information Bases) that can be
   used to interrogate their VTDs, in order to check the availability
   status of a particular VPN tunnel.
   [1][2] Whilst such an approach does provide availability monitoring
   (fault management), there is no open standard to measure metrics
   such as packet loss or latency through a VPN tunnel. Figure 1 shows
   the current architecture.

    +-------------------+                  +-------------------+
    |                   |                  |                   |
    | +---------------+ |                  | +---------------+ |
App.| |VPN Terminating| |  +------------+  | |VPN Terminating| |App.
----+-| Device (VTD)  |-+--| VPN tunnel |--+-| Device (VTD)  |-+----
Flow| +---------------+ |  +------------+  | +---------------+ |Flow
    |                   |                  |                   |
    |     Sphere of     |                  |     Sphere of     |
    |    Management     |                  |    Management     |
    +-------------------+                  +-------------------+
                                  |
                                  |
                                  |
                          +---------------+
                          |    Network    |
                          |  Management   |
                          |    Station    |
                          |     (NMS)     |
                          +---------------+

        Figure 1 Current VPN network management architecture

III. How do vendors remedy this problem?

   Vendors typically remedy this problem in one of three ways:

   o  Use of Netflow data [3]

   o  Use of other proprietary management protocols [4]

   o  Use of custom crafted packets and bespoke software [5]

   The use of proprietary Netflow data can provide session information
   through a particular VPN tunnel, but it does not provide information
   on latency or packet loss. The other two types of solution can
   provide latency and packet loss information but are, by their very
   nature, closed systems.

IV. Proposed solution

   The proposed solution is intended to be an add-on to the IPSec
   protocol described in RFCs 2402, 2406, 2408 and 2409 [6][7][8][9]
   and will therefore provide an open standard that will be available
   to all VPN vendors to implement as an option. Clearly, in cases
   where there are different vendors at each end of a VPN tunnel, both
   must support the option for it to be effective.

   In a nutshell, this solution proposes the use of special keepalive
   packets that can be generated by just one or both VTDs, sent down
   the VPN tunnel to the remote VTD and returned to the initiating VTD.
   The initiating VTD tracks the response packet and calculates the RTT
   (Round Trip Time). If the response packet does not appear within a
   timeout period then the packet is assumed to be lost. This provides
   the packet loss figures for the VPN tunnel under test.

   0 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                              SPI                              |
   |                                                               |
   +---------------------------------------------------------------+
   |                                                               |
   |                        Sequence Number                        |
   |                                                               |
   +---------------------------------------------------------------+
   |                                                               |
   |  Payload containing an incremental counter (variable length)  |
   |                                                               |
   +---------------------------------------------------------------+
   |                                                               |
   |                   Padding (variable length)                   |
   |                                                               |
   +---------------+---------------+-------------------------------+
   |               |               |                               |
   |    Length     |  Next Header  | Authentication Data (variable |
   |               |               |            length)            |
   +---------------+---------------+-------------------------------+

              Figure 2 Initiating keepalive packet format

   Figures 2 and 3 show the formats for the special packets used by
   this solution.
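   As an illustration only, the bookkeeping described above (send a
   keepalive, match its response, compute the RTT, and count a loss
   once a timeout passes) could be sketched in Python as follows. The
   class name, method names and the 2-second timeout are assumptions
   made for this sketch; this document does not define an
   implementation or a default timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProbeTracker:
    """Hypothetical per-tunnel bookkeeping on the initiating VTD."""

    timeout: float = 2.0              # assumed: seconds before a probe counts as lost
    next_counter: int = 0             # incremental payload counter, starts at zero
    lost: int = 0                     # probes that timed out
    last_rtt: Optional[float] = None  # most recent round trip time, in seconds
    _pending: Dict[int, float] = field(default_factory=dict)  # counter -> send time

    def send(self, now: float) -> int:
        """Record an outgoing keepalive and return the counter it carries."""
        counter = self.next_counter
        self._pending[counter] = now
        self.next_counter += 1
        return counter

    def receive(self, counter: int, now: float) -> Optional[float]:
        """Match a response to its keepalive and return the RTT.

        Returns None for a response that is unknown, duplicated, or that
        arrives after the keepalive was already counted as lost.
        """
        sent = self._pending.pop(counter, None)
        if sent is None:
            return None
        self.last_rtt = now - sent
        return self.last_rtt

    def expire(self, now: float) -> int:
        """Count keepalives whose timeout has passed; return how many were lost."""
        overdue = [c for c, sent in self._pending.items()
                   if now - sent >= self.timeout]
        for c in overdue:
            del self._pending[c]
        self.lost += len(overdue)
        return len(overdue)
```

   In this sketch a caller would invoke send() once per polling period,
   receive() for each response packet, and expire() periodically. A
   response that arrives after expire() has already counted its
   keepalive as lost is ignored rather than un-counted, which is one
   possible reading of "assumed to be lost".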

   0 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                              SPI                              |
   |                                                               |
   +---------------------------------------------------------------+
   |                                                               |
   |                        Sequence Number                        |
   |                                                               |
   +---------------------------------------------------------------+
   |                                                               |
   |    Payload containing the same incremental counter as the     |
   |              initiating packet (variable length)              |
   |                                                               |
   +---------------------------------------------------------------+
   |                                                               |
   |                   Padding (variable length)                   |
   |                                                               |
   +---------------+---------------+-------------------------------+
   |               |               |                               |
   |    Length     |  Next Header  | Authentication Data (variable |
   |               |               |            length)            |
   +---------------+---------------+-------------------------------+

                Figure 3 Return response packet format

   The return response packet in effect mirrors the initiating
   keepalive packet: the payload is a 64-bit counter that starts at
   zero and counts up, and the responder returns the same counter value
   it received. This allows missing packets to be detected, as well as
   packets received out of sequence.

   Figure 4 shows the relationship between initiator and responder
   across the VPN tunnel. It also shows performance information being
   collected in both directions.

                   +------+------+------------+-+-+--+ Response packet
           <------ |      |      |            | | |  | from B towards A
                   +------+------+------------+-+-+--+
Initiating packet  +--+-+-+------------+------+------+
from A towards B   |  | | |            |      |      | ------>
                   +--+-+-+------------+------+------+

         +---------------+                      +---------------+
    App. |VPN Terminating|    +------------+    |VPN Terminating|App.
   ------| Device (VTD)  |----| VPN tunnel |----| Device (VTD)  |----
    Flow |       A       |    +------------+    |       B       |Flow
         +---------------+                      +---------------+

                   +------+------+------------+-+-+--+ Initiating packet
           <------ |      |      |            | | |  | from B towards A
                   +------+------+------------+-+-+--+
Response packet    +--+-+-+------------+------+------+
from A towards B   |  | | |            |      |      | ------>
                   +--+-+-+------------+------+------+

     Figure 4 Communication steps between initiator and responder

   The initiating VTD retains the performance information and, via its
   configuration, can opt to collect performance data on one, some or
   all of its VPN tunnels. All of this performance information is
   available via an SNMP MIB entry which can be queried by the NMS,
   once the necessary SNMP MIB has been loaded into it. If there is no
   desire for SNMP traffic to traverse the VPN tunnel then only the
   local VTD needs to be configured, whereas to collect performance
   information in both directions, both VTDs need configuring and SNMP
   traffic will traverse the VPN tunnel.

   The last value of RTT will be stored in the MIB and used for
   comparison. Should three (by default, although this is configurable)
   successive response packets not arrive at the initiating VTD then an
   alarm can be generated, either on the console, via a syslog message
   or as an SNMP trap. The same methods can be employed should the RTT
   values exceed a configured threshold (by default, set to two
   seconds). Furthermore, a 'cancellation of threshold exceeded' alarm
   can be generated when three successive RTTs drop below the
   threshold. Additionally, the MIB can provide SNMP trap capabilities
   so that alerts can be generated whenever the number of lost response
   packets exceeds a set threshold.

   Figure 5 shows the proposed architecture.

    +----------------------------------------------------------+
    |                                                          |
    | +---------------+                      +---------------+ |
App.| |VPN Terminating|    +------------+    |VPN Terminating| |App.
----+-| Device (VTD)  |----| VPN tunnel |----| Device (VTD)  |-+----
Flow| +---------------+    +------------+    +---------------+ |Flow
    |                                                          |
    |                                                          |
    |                   Sphere of Management                   |
    +----------------------------------------------------------+
                                  |
                                  |
                                  |
                          +---------------+
                          |    Network    |
                          |  Management   |
                          |    Station    |
                          |     (NMS)     |
                          +---------------+

          Figure 5 Proposed network management architecture

V. SNMP MIBs and Agent

   Figure 6, below, shows the MIB through which the SNMP agents on the
   VTDs can be monitored. It is expected that the VTDs will need to
   incorporate additional functionality in order to provide the
   information to support these MIBs.

   VPN-PERFORMANCE-MIB DEFINITIONS ::= BEGIN

   IMPORTS
       MODULE-IDENTITY, OBJECT-TYPE, TimeTicks, Integer32, mib-2,
       IpAddress, Counter32, NOTIFICATION-TYPE
           FROM SNMPv2-SMI
       MODULE-COMPLIANCE, OBJECT-GROUP, NOTIFICATION-GROUP
           FROM SNMPv2-CONF
       DisplayString
           FROM SNMPv2-TC;

   vpnPerformanceMIB MODULE-IDENTITY
       LAST-UPDATED "200309151900Z"  -- 15 September 2003
       ORGANIZATION "Credit Suisse First Boston"
       CONTACT-INFO
           "        Peter Groom
                    Credit Suisse First Boston
                    One Cabot Square
                    London
                    United Kingdom
            Email:  peter.groom@csfb.com"
       DESCRIPTION
           "The MIB module for performance management of VPN tunnels."
       REVISION "200309151900Z"  -- 15 September 2003
       DESCRIPTION
           "Initial version, yet to be published as a RFC."
       ::= { mib-2 XXX }  -- To be assigned by IANA.
                          -- Request to assign same number as LMP
                          -- ifType.

   vpnPerformanceMIBObjects OBJECT IDENTIFIER
       ::= { vpnPerformanceMIB 1 }

   vpnPerformanceTrapNotifications OBJECT IDENTIFIER
       ::= { vpnPerformanceMIBObjects 0 }
   vpnPerformance OBJECT IDENTIFIER
       ::= { vpnPerformanceMIBObjects 1 }
   vpnPerformanceTRAP OBJECT IDENTIFIER
       ::= { vpnPerformanceMIBObjects 2 }

   -- the VPN Performance MIB-Group
   --
   -- a collection of objects providing performance information about
   -- VPN tunnels

   vpnPerformanceIfTable OBJECT-TYPE
       SYNTAX      SEQUENCE OF VpnPerformanceIfEntry
       MAX-ACCESS  not-accessible
       STATUS      current
       DESCRIPTION
           "The table containing line entries for each VPN tunnel."
       ::= { vpnPerformance 1 }

   vpnPerformanceIfEntry OBJECT-TYPE
       SYNTAX      VpnPerformanceIfEntry
       MAX-ACCESS  not-accessible
       STATUS      current
       DESCRIPTION
           "A line entry containing information about a particular
            configured tunnel."
       INDEX       { vpnPerformanceIndex }
       ::= { vpnPerformanceIfTable 1 }

   VpnPerformanceIfEntry ::= SEQUENCE {
       vpnPerformanceIndex                Integer32,
       vpnPerformanceIfLocalAddress       IpAddress,
       vpnPerformanceIfRemoteAddress      IpAddress,
       vpnPerformanceIfPollPeriod         TimeTicks,
       vpnPerformanceIfHighThreshold      TimeTicks,
       vpnPerformanceIfLostPackets        Counter32,
       vpnPerformanceIfRearmThreshold     TimeTicks,
       vpnPerformanceIfHolddownThreshold  TimeTicks,
       vpnPerformanceIfLastRoundTripTime  TimeTicks
   }

   vpnPerformanceIndex OBJECT-TYPE
       SYNTAX      Integer32 (1..2147483647)
       MAX-ACCESS  not-accessible
       STATUS      current
       DESCRIPTION
           "A unique value for each table entry."
       ::= { vpnPerformanceIfEntry 1 }

   vpnPerformanceIfLocalAddress OBJECT-TYPE
       SYNTAX      IpAddress
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "The address of the local VPN tunnel endpoint."
       ::= { vpnPerformanceIfEntry 2 }

   vpnPerformanceIfRemoteAddress OBJECT-TYPE
       SYNTAX      IpAddress
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "The address of the remote VPN tunnel endpoint."
       ::= { vpnPerformanceIfEntry 3 }

   vpnPerformanceIfPollPeriod OBJECT-TYPE
       SYNTAX      TimeTicks
       UNITS       "1/100th Seconds"
       MAX-ACCESS  read-write
       STATUS      current
       DESCRIPTION
           "The time period used to send performance packets to the
            remote end of the VPN tunnel."
       ::= { vpnPerformanceIfEntry 4 }

   vpnPerformanceIfHighThreshold OBJECT-TYPE
       SYNTAX      TimeTicks
       UNITS       "1/100th Seconds"
       MAX-ACCESS  read-write
       STATUS      current
       DESCRIPTION
           "The time threshold used to compare round trip values
            against."
       ::= { vpnPerformanceIfEntry 5 }

   vpnPerformanceIfLostPackets OBJECT-TYPE
       SYNTAX      Counter32
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "The counting threshold used to compare the number of lost
            packet values against."
       ::= { vpnPerformanceIfEntry 6 }

   vpnPerformanceIfRearmThreshold OBJECT-TYPE
       SYNTAX      TimeTicks
       UNITS       "1/100th Seconds"
       MAX-ACCESS  read-write
       STATUS      current
       DESCRIPTION
           "The rearm threshold used to compare successive alarm
            values against."
       ::= { vpnPerformanceIfEntry 7 }

   vpnPerformanceIfHolddownThreshold OBJECT-TYPE
       SYNTAX      TimeTicks
       UNITS       "1/100th Seconds"
       MAX-ACCESS  read-write
       STATUS      current
       DESCRIPTION
           "The holddown threshold used to compare successive alarm
            values against."
       ::= { vpnPerformanceIfEntry 8 }

   vpnPerformanceIfLastRoundTripTime OBJECT-TYPE
       SYNTAX      TimeTicks
       UNITS       "1/100th Seconds"
       MAX-ACCESS  read-write
       STATUS      current
       DESCRIPTION
           "The last round trip value collected."
       ::= { vpnPerformanceIfEntry 9 }

   -- the VPN Performance TRAP-Group
   --
   -- a collection of objects generating snmp traps based on
   -- performance information about VPN tunnels

   vpnPerformanceTrapTable OBJECT-TYPE
       SYNTAX      SEQUENCE OF VpnPerformanceTrapEntry
       MAX-ACCESS  not-accessible
       STATUS      current
       DESCRIPTION
           "The agent's table containing the alarm information."
       ::= { vpnPerformanceTRAP 1 }

   vpnPerformanceTrapEntry OBJECT-TYPE
       SYNTAX      VpnPerformanceTrapEntry
       MAX-ACCESS  not-accessible
       STATUS      current
       DESCRIPTION
           "Information about the last trap generated by the agent.
            There is always one entry in this table, indexed by the
            integer value 1."
       INDEX       { vpnPerformanceTrapIndex }
       ::= { vpnPerformanceTrapTable 1 }

   VpnPerformanceTrapEntry ::= SEQUENCE {
       vpnPerformanceTrapIndex     Integer32,
       vpnPerformanceTrapSequence  Counter32,
       vpnPerformanceTrapText      DisplayString,
       vpnPerformanceTrapPriority  Integer32,
       vpnPerformanceTrapTime      Counter32,
       vpnPerformanceTrapSuspect   IpAddress
   }

   vpnPerformanceTrapIndex OBJECT-TYPE
       SYNTAX      Integer32 (1..2147483647)
       MAX-ACCESS  not-accessible
       STATUS      current
       DESCRIPTION
           "A unique value for each table entry."
       ::= { vpnPerformanceTrapEntry 1 }

   vpnPerformanceTrapSequence OBJECT-TYPE
       SYNTAX      Counter32
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "A counter of the number of alarms generated since the
            agent was last initialised."
       ::= { vpnPerformanceTrapEntry 2 }

   vpnPerformanceTrapText OBJECT-TYPE
       SYNTAX      DisplayString
       MAX-ACCESS  read-write
       STATUS      current
       DESCRIPTION
           "Trap text to make the alarm obvious."
       ::= { vpnPerformanceTrapEntry 3 }

   vpnPerformanceTrapPriority OBJECT-TYPE
       SYNTAX      Integer32 (1..255)
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "The priority level as set on the agent for this type of
            trap."
       ::= { vpnPerformanceTrapEntry 4 }

   vpnPerformanceTrapTime OBJECT-TYPE
       SYNTAX      Counter32
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "The time that the condition or event occurred which
            generated the alarm. The value is given in seconds since
            00:00:00 Greenwich Mean Time (GMT) January 1, 1970."
       ::= { vpnPerformanceTrapEntry 5 }

   vpnPerformanceTrapSuspect OBJECT-TYPE
       SYNTAX      IpAddress
       MAX-ACCESS  read-only
       STATUS      current
       DESCRIPTION
           "The IP address of the host which caused the alarm."
       ::= { vpnPerformanceTrapEntry 6 }

   vpnPerformanceTrapLostPackets NOTIFICATION-TYPE
       OBJECTS     { vpnPerformanceTrapSequence,
                     vpnPerformanceTrapText,
                     vpnPerformanceTrapPriority,
                     vpnPerformanceTrapTime,
                     vpnPerformanceTrapSuspect }
       STATUS      current
       DESCRIPTION
           "The agent has detected the number of lost packets has
            exceeded a threshold."
       ::= { vpnPerformanceTrapNotifications 1 }

   vpnPerformanceTrapLostPacketsOk NOTIFICATION-TYPE
       OBJECTS     { vpnPerformanceTrapSequence,
                     vpnPerformanceTrapText,
                     vpnPerformanceTrapPriority,
                     vpnPerformanceTrapTime,
                     vpnPerformanceTrapSuspect }
       STATUS      current
       DESCRIPTION
           "The agent has detected the number of lost packets has
            dropped below a threshold."
       ::= { vpnPerformanceTrapNotifications 2 }

   vpnPerformanceNotificationsGroup NOTIFICATION-GROUP
       NOTIFICATIONS { vpnPerformanceTrapLostPackets,
                       vpnPerformanceTrapLostPacketsOk }
       STATUS      current
       DESCRIPTION
           "Generic VPN Performance Notifications."
       ::= { vpnPerformanceTrapNotifications 3 }

   -- Conformance information

   vpnPerformanceConformance OBJECT IDENTIFIER
       ::= { vpnPerformanceMIB 2 }
   vpnPerformanceTrapConformance OBJECT IDENTIFIER
       ::= { vpnPerformanceMIB 3 }

   vpnPerformanceCompliances OBJECT IDENTIFIER
       ::= { vpnPerformanceConformance 1 }
   vpnPerformanceGroups OBJECT IDENTIFIER
       ::= { vpnPerformanceConformance 2 }

   vpnPerformanceTrapCompliances OBJECT IDENTIFIER
       ::= { vpnPerformanceTrapConformance 1 }
   vpnPerformanceTrapGroups OBJECT IDENTIFIER
       ::= { vpnPerformanceTrapConformance 2 }

   -- Compliance statement

   vpnPerformanceCompliance MODULE-COMPLIANCE
       STATUS      current
       DESCRIPTION
           "The compliance statement for SNMPv2 entities which
            implement the VPN performance module."
       MODULE  -- this module
           MANDATORY-GROUPS { vpnPerformanceGroup }
       ::= { vpnPerformanceCompliances 1 }

   vpnPerformanceTrapCompliance MODULE-COMPLIANCE
       STATUS      current
       DESCRIPTION
           "The compliance statement for SNMPv2 entities which
            implement the VPN performance trap module."
       MODULE  -- this module
           MANDATORY-GROUPS { vpnPerformanceTrapGroup }
       ::= { vpnPerformanceTrapCompliances 1 }

   -- Units of Conformance

   vpnPerformanceGroup OBJECT-GROUP
       OBJECTS     { vpnPerformanceIfLocalAddress,
                     vpnPerformanceIfRemoteAddress,
                     vpnPerformanceIfPollPeriod,
                     vpnPerformanceIfHighThreshold,
                     vpnPerformanceIfLostPackets,
                     vpnPerformanceIfRearmThreshold,
                     vpnPerformanceIfHolddownThreshold,
                     vpnPerformanceIfLastRoundTripTime }
       STATUS      current
       DESCRIPTION
           "The vpnPerformance group of objects providing for the
            management of VPN performance metrics."
       ::= { vpnPerformanceGroups 1 }

   vpnPerformanceTrapGroup OBJECT-GROUP
       OBJECTS     { vpnPerformanceTrapSequence,
                     vpnPerformanceTrapText,
                     vpnPerformanceTrapPriority,
                     vpnPerformanceTrapTime,
                     vpnPerformanceTrapSuspect }
       STATUS      current
       DESCRIPTION
           "The vpnPerformance group of objects providing for the
            management of VPN performance traps."
       ::= { vpnPerformanceTrapGroups 1 }

   END

             Figure 6 MIB to support SNMP queries and traps

   The MIB tree from the above MIB is shown below in Figure 7.

   +-mib-2
     |
     +-vpnPerformanceMIB(xxx)
       |
       +-vpnPerformanceMIBObjects(1)
       | |
       | +-vpnPerformanceTrapNotifications(0)
       | | |
       | | +-vpnPerformanceTrapLostPackets(1)
       | | +-vpnPerformanceTrapLostPacketsOk(2)
       | | +-vpnPerformanceNotificationsGroup(3)
       | |
       | +-vpnPerformance(1)
       | | |
       | | +-vpnPerformanceIfTable(1)
       | |   |
       | |   +-vpnPerformanceIfEntry(1)
       | |     |
       | |     +-vpnPerformanceIndex(1)
       | |     +-vpnPerformanceIfLocalAddress(2)
       | |     +-vpnPerformanceIfRemoteAddress(3)
       | |     +-vpnPerformanceIfPollPeriod(4)
       | |     +-vpnPerformanceIfHighThreshold(5)
       | |     +-vpnPerformanceIfLostPackets(6)
       | |     +-vpnPerformanceIfRearmThreshold(7)
       | |     +-vpnPerformanceIfHolddownThreshold(8)
       | |     +-vpnPerformanceIfLastRoundTripTime(9)
       | |
       | +-vpnPerformanceTRAP(2)
       |   |
       |   +-vpnPerformanceTrapTable(1)
       |     |
       |     +-vpnPerformanceTrapEntry(1)
       |       |
       |       +-vpnPerformanceTrapIndex(1)
       |       +-vpnPerformanceTrapSequence(2)
       |       +-vpnPerformanceTrapText(3)
       |       +-vpnPerformanceTrapPriority(4)
       |       +-vpnPerformanceTrapTime(5)
       |       +-vpnPerformanceTrapSuspect(6)
       |
       +-vpnPerformanceConformance(2)
       | |
       | +-vpnPerformanceCompliances(1)
       | | |
       | | +-vpnPerformanceCompliance(1)
       | |
       | +-vpnPerformanceGroups(2)
       |   |
       |   +-vpnPerformanceGroup(1)
       |
       +-vpnPerformanceTrapConformance(3)
         |
         +-vpnPerformanceTrapCompliances(1)
         | |
         | +-vpnPerformanceTrapCompliance(1)
         |
         +-vpnPerformanceTrapGroups(2)
           |
           +-vpnPerformanceTrapGroup(1)

            Figure 7 MIB tree for the VPN-PERFORMANCE-MIB.

   It is expected that the configuration of the VTD will allow the lost
   packet threshold, RTT threshold and the polling period to be set on
   a per-VPN-tunnel basis. The relationship between the threshold,
   polling period and SNMP traps is shown below in Figure 8.

Occurrence
          ^              +---+                 +---+            +---+
Alert     |              |   |                 |   |            |   |
          |              |   |                 |   |            |   |
          |              |   |                 |   |            |   |
          |              |   |                 |   |            |   |
          |              |   |                 |   |            |   |
          |              |   |                 |   |            |   |
No Alert  +--------------+---+-----------------+---+------------+---+----->
                            SNMP alerts over time
                         A            B        C        D       E
                         |   Re-arm   |        |Holddown|       |
Round Trip Time          |   Period   |        | Period |       |
          ^              |<---------->|        |<------>|       |
Packet    |
lost or   +-----/--------\------------/-------------------------\---------
too late  |    /          \          /                           \
threshold |   /            \        /                             \
          |  /              \______/                               \
          | /                                                       \_____
          |/
          |
          |
        0 +--------------------------------------------------------------->
          0     2     4     6     8     10    12    14    16    18    20
                              Responses over time

        Figure 8 The relationship between RTTs and SNMP alerts

   In Figure 8, point A is the point where three successive keepalives
   have either not returned or arrived too late. At this point, an SNMP
   trap is sent to the NMS. Following the alert, either a re-arm period
   or a hold-down period is counted. Following point A, response
   packets begin arriving within the required timeframe so a re-arm
   timer is used (until point B). After point B, the counters are reset
   and some time later, at point C, a further alert is generated.
   However, after point C, there are still no response packets so the
   hold-down timer is used. The hold-down timer can be considerably
   longer than the re-arm timer in order to minimize the number of
   unnecessary alerts. After the hold-down timer has expired, at point
   D, the counters are reset and finally, at point E, the last alert is
   generated.

VI. Security Considerations

   Since the proposed protocol is, by its very nature, concerned only
   with gathering performance information, any information gained from
   unauthorised access to it is unlikely to be critical in nature. It
   is, therefore, proposed that the security for this implementation be
   covered by the standard access controls in place to limit SNMP
   access to VTDs.

VII. Conclusions

   This paper discusses an open standard methodology for capturing and
   reporting on performance data in a VPN tunnel environment.
   It is envisaged that such information will be particularly useful in
   a multi-vendor environment where the use of SLAs (Service Level
   Agreements) is being considered.

VIII. References

   [1] Intel NetStructure VPN Gateway Family - Checking Total Interface
       and Tunnel Count using SNMP with VPN Gateway (041675-PM01).
       Available at