Internet DRAFT - draft-elkins-ippm-pdm-metrics

draft-elkins-ippm-pdm-metrics



 



INTERNET-DRAFT                                                 N. Elkins
                                                               B. Jouris
                                                         Inside Products
                                                              K. Haining
                                                              U. S. Bank
                                                            M. Ackermann
Intended Status: Proposed Standard                         BCBS Michigan
Expires: July 2014                                      January 30, 2014


         IPPM Considerations for the IPv6 PDM Extension Header
                    draft-elkins-ippm-pdm-metrics-04

Table of Contents

   1  Background  . . . . . . . . . . . . . . . . . . . . . . . . . .  5
     1.1  Terminology . . . . . . . . . . . . . . . . . . . . . . . .  5
     1.2 Why End-to-end Response Time is Needed . . . . . . . . . . .  5
     1.3 Trending of Response Time Data . . . . . . . . . . . . . . .  6
     1.4 What to measure? . . . . . . . . . . . . . . . . . . . . . .  6
     1.5 TCP Timestamp not enough . . . . . . . . . . . . . . . . . .  6
     1.6 Inadequacy of Current Instrumentation Technology . . . . . .  7
       1.6.1 Synthetic transactions . . . . . . . . . . . . . . . . .  7
       1.6.2 PING . . . . . . . . . . . . . . . . . . . . . . . . . .  7
       1.6.3 Estimates of Network Time  . . . . . . . . . . . . . . .  8
       1.6.4 Server / Client Agents . . . . . . . . . . . . . . . . .  8
   2 Solution Parameters  . . . . . . . . . . . . . . . . . . . . . .  9
     2.1 Rationale for proposed solution  . . . . . . . . . . . . . .  9
     2.2 Merits of timestamp / delta in PDM . . . . . . . . . . . . .  9
     2.3 What kind of timestamp?  . . . . . . . . . . . . . . . . . . 10
   2  Why Packet Sequence Number  . . . . . . . . . . . . . . . . . . 10
     2.1 IPv4 IPID : DeFacto Sequence Number  . . . . . . . . . . . . 11
       2.1.1 Description of IPID in IPv4  . . . . . . . . . . . . . . 11
       2.1.2 DeFacto Use of IPID  . . . . . . . . . . . . . . . . . . 11
       2.1.3 Merits of DeFacto Usage  . . . . . . . . . . . . . . . . 12
       2.1.4 Use Cases of IPv4 IPID in Diagnostics  . . . . . . . . . 12
     2.2 TCP sequence number is not enough  . . . . . . . . . . . . . 14
     2.3 Inadequacy of current measurement techniques . . . . . . . . 14
       2.3.1 SNMP / CMIP Counters . . . . . . . . . . . . . . . . . . 15
       2.3.2 Router / Firewall Logs . . . . . . . . . . . . . . . . . 15
       2.3.3 Netflow  . . . . . . . . . . . . . . . . . . . . . . . . 15
       2.3.4 Access to Intermediate Devices . . . . . . . . . . . . . 15
       2.3.4 Modifications to an Operational Production Network . . . 16
   3 Solution Parameters  . . . . . . . . . . . . . . . . . . . . . . 16
     3.1 Packet Trace Meets Criteria  . . . . . . . . . . . . . . . . 17
       3.1.1 Limitations of Packet Capture  . . . . . . . . . . . . . 17
       3.1.2 Problem Scenario 1 . . . . . . . . . . . . . . . . . . . 17
       3.1.2 Problem Scenario 2 . . . . . . . . . . . . . . . . . . . 17
 


Elkins                     Expires July, 2014                   [Page 1]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   4 Rationale for Proposed Solution (PDM)  . . . . . . . . . . . . . 18
   5  Performance and Diagnostic Metrics Destination Option Layout  . 18
     5.1  Destination Options Header  . . . . . . . . . . . . . . . . 18
     5.2  PDM Types . . . . . . . . . . . . . . . . . . . . . . . . . 19
     5.3  Performance and Diagnostic Metrics Destination Option
          (Type 1)  . . . . . . . . . . . . . . . . . . . . . . . . . 19
     5.4  Performance and Diagnostic Metrics Destination Option
          (Type 2)  . . . . . . . . . . . . . . . . . . . . . . . . . 21
   6  Use of the PDM  . . . . . . . . . . . . . . . . . . . . . . . . 24
     6.1 Packet Identification Data . . . . . . . . . . . . . . . . . 24
     6.2 Data in the PDM Destination Option Headers . . . . . . . . . 24
   7 Metrics Derived from the PDM Destination Options . . . . . . . . 25
   8 Base Derived Metrics . . . . . . . . . . . . . . . . . . . . . . 25
     8.1 One-Way Delay  . . . . . . . . . . . . . . . . . . . . . . . 25
     8.2 Round-Trip Delay . . . . . . . . . . . . . . . . . . . . . . 25
     8.3 Server Delay . . . . . . . . . . . . . . . . . . . . . . . . 26
   9 Sample Implementation Flow (PDM Type 1)  . . . . . . . . . . . . 26
     9.1 Step 1 (PDM Type 1)  . . . . . . . . . . . . . . . . . . . . 26
     9.2 Step 2 (PDM Type 1)  . . . . . . . . . . . . . . . . . . . . 27
     9.3 Step 3 (PDM Type 1)  . . . . . . . . . . . . . . . . . . . . 28
     9.4 Step 4 (PDM Type 1)  . . . . . . . . . . . . . . . . . . . . 29
     9.5 Step 5 (PDM Type 1)  . . . . . . . . . . . . . . . . . . . . 30
   10 Sample Implementation Flow (PDM 2)  . . . . . . . . . . . . . . 30
     10.1 Step 1 (PDM Type 2) . . . . . . . . . . . . . . . . . . . . 30
     10.2 Step 2 (PDM Type 2) . . . . . . . . . . . . . . . . . . . . 31
     10.3 Step 3 (PDM Type 2) . . . . . . . . . . . . . . . . . . . . 32
     10.4 Step 4 (PDM Type 2) . . . . . . . . . . . . . . . . . . . . 33
     10.5 Step 5 (PDM Type 2) . . . . . . . . . . . . . . . . . . . . 34
   11 Derived Metrics : Advanced  . . . . . . . . . . . . . . . . . . 34
     11.1 Advanced Derived Metrics : Triage . . . . . . . . . . . . . 34
     11.2 Advanced Derived Metrics : Network Diagnostics  . . . . . . 35
       11.2.1 Retransmit Duplication (RD) . . . . . . . . . . . . . . 35
       11.2.2 ACK Lag (AL)  . . . . . . . . . . . . . . . . . . . . . 36
       11.2.3 Third-party Connection Reset (TPCR) . . . . . . . . . . 36
       11.2.4 Potential Hang (PH) . . . . . . . . . . . . . . . . . . 37
     11.3 Advanced Metrics : Session Classification . . . . . . . . . 37
   12  Use Cases  . . . . . . . . . . . . . . . . . . . . . . . . . . 37
   13  Security Considerations  . . . . . . . . . . . . . . . . . . . 38
   14  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 38
   15 References  . . . . . . . . . . . . . . . . . . . . . . . . . . 38
     15.1 Normative References  . . . . . . . . . . . . . . . . . . . 38
     15.2 Informative References  . . . . . . . . . . . . . . . . . . 39
   16 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 39
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 39


Abstract

 


Elkins                     Expires July, 2014                   [Page 2]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   To diagnose performance and connectivity problems, metrics on real
   (non-synthetic) transmission are critical for timely end-to-end
   problem resolution. Such diagnostics may be real-time or after the
   fact, but must not impact an operational production network. These
   metrics are defined in the IPv6 Performance and Diagnostic Metrics
   Destination Option (PDM). The base metrics are: packet sequence
   number and packet timestamp. Other metrics may be derived from these
   for use in diagnostics.  This document specifies such metrics, their
   calculation, and usage.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


















 


Elkins                     Expires July, 2014                   [Page 3]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


































 


Elkins                     Expires July, 2014                   [Page 4]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


1  Background

   To diagnose performance and connectivity problems, metrics on real
   (non-synthetic) transmission are critical for timely end-to-end
   problem resolution. Such diagnostics may be real-time or after the
   fact, but must not impact an operational production network. The base
   metrics are: packet sequence number and packet timestamp.  Metrics
   derived from these will be described separately. This document starts
   with the background and rationale for the requirement for end-to-end
   response time and packet sequence number(s).

   Current methods are inadequate for these purposes because they assume
   unreasonable access to intermediate devices, are cost prohibitive,
   require infeasible changes to a running production network, or do not
   provide timely data. The IPv6 Performance and Diagnostic Metrics
   destination option PDM) provides a solution to these problems.

1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2 Why End-to-end Response Time is Needed

   The timestamps or delta values in the PDM traveling along with the
   packet will be used to calculate end-to-end response time, without
   requiring agents in devices along the path. In many networks, end-to-
   end response times are a critical component of Service Levels
   Agreements (SLAs).

   End-to-end response is what the user of a network system actually
   experiences.  When the end user is an individual, he is generally
   indifferent to what is happening along the network; what he really
   cares about is how long it takes to get a response back.  But this is
   not just a matter of individuals' personal convenience.  In many
   cases, rapid response is critical to the business being conducted.

   When the end user is a device (e.g. with the Internet of Things),
   what matters is the speed with which requested data can be
   transferred -- specifically, whether the requested data can be
   transferred in time to accomplish the desired actions.  This can be
   important when the relevant external conditions are subject to rapid
   change.

   Response time and consistency are not just "nice to have".  On many
   networks, the impact can be financial hardship or endanger human
   life.  In some cities, the emergency police contact system operates
 


Elkins                     Expires July, 2014                   [Page 5]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   over IP, law enforcement uses TCP/IP networks, transactions on our
   stock exchanges are settled using IP networks.  The critical nature
   of such activities to our daily lives and financial well-being demand
   a solution.  Section 1.5 will detail the current state of end-to-end
   response time monitoring today.

1.3 Trending of Response Time Data

   In addition to the need for tracking current service, end-to-end
   response time is valuable for capacity planning.  By tracking
   response times, and identifying trends, it becomes possible to
   determine when network capacity is being approached.  This allows
   additional capacity to be obtained before service levels fall below
   requirements.  Without that kind of tracking, the only option is to
   wait until there is a problem, and then scramble to get additional
   capacity on an emergency (and probably high cost) basis.

1.4 What to measure?

   End to end response time can be broken down into 3 parts:

   - Network delay - Application (or server) delay- Client delay

   Network delay may be one-way delay [RFC2679] or round-trip delay
   [RFC2681].

   Additionally, network delay may include multiple hops.  Application
   and server delay include operating system by stack time.  By and
   large, the three timings are 'good enough' measurements to allow
   rapid triage into the failing component.

   Ways are available (provided by operating systems) to measure
   Application and Client times.  Network time can also be measured in
   isolation via some of the measurement techniques described in section
   1.5. The most difficult portion is to integrate network time with the
   server or application times.  Products exist to do this but are
   available at an exorbitant cost, require agents, and will likely
   become more prohibitive as the speed of networks grow and as the
   world becomes more connected via mobile devices.  

   Measuring network time needs to occur at the end-points of the
   transactions being measured.  The time needs to be available,
   regardless of the upper layer protocol being used by the transaction.
   That is, it cannot be for just TCP packets.

1.5 TCP Timestamp not enough

   Some suggest that the TCP Timestamp option might be sufficient to
 


Elkins                     Expires July, 2014                   [Page 6]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   calculate end-to-end response time.

   The TCP Timestamp Option is defined in RFC1323 [RFC1323].  The reason
   for the TCP Timestamp option is to be able to discard packets when
   the TCP sequence number wraps.  (PAWS)

   The problems with the TCP Timestamp option are:

   1.  Not everyone turns this on.
   2.  It is only available for TCP applications
   3.  No indication of date in long-running connections. (That is
       connections which last longer than one day)
   4.  The granularity of the timestamp is at best at millisecond level.

   In the future, as speeds of devices and networks grow and network
   types proliferate, TCP timestamp values, both in terms of granularity
   and date specification, will become more and more inadequate.  Even
   today, on many networks, the timings are at microsecond level not
   millisecond.  New networks called Delay Tolerant Networks may have
   connection times which are very large indeed - hours or even days.

1.6 Inadequacy of Current Instrumentation Technology

   The current technology includes:

   1.  Synthetic transactions
   2.  Pings
   3.  Estimates of network time
   4.  Server / Client Agents

   Let us discuss each of these in detail.

1.6.1 Synthetic transactions

   Synthetic transactions, also known as active measurement, can be
   extremely useful. However, in a dynamic network, the routes taken by
   the packet or the current load on the application may not be the same
   for the real transaction as when the active test was performed.  For
   example, if you time how long it takes for me to drive to work at
   2:00am in the morning, that may not be the same as how long it takes
   me to drive to work during rush hour at 8:00am in the morning.  So,
   it is important to have embedded measurement in the actual packet.

1.6.2 PING

   An ICMP ping measures network time. First, you can PING the remote
   device.  Then you assume that the time it takes to get a response to
   a PING is the same as the time that a transaction would take to
 


Elkins                     Expires July, 2014                   [Page 7]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   traverse the network. However, QoS rules, firewalls, etc. may mean
   that PING, (and other synthetic transactions) may not be subject to
   the same conditions.  PINGs, though extremely useful, also measure
   only network delays.  Server delays must also be provided.

1.6.3 Estimates of Network Time

   If a packet trace is done, it is possible to look at the time between
   when a response was seen to be sent at the packet capture device and
   when the ACK for the response comes back.

   If you assume that the ACK took the same amount of time as the
   original query, you have the network time. Unfortunately, the time
   for the ACK may not be the same as the time for a much larger query
   transaction to traverse the network.

   The biggest problem with this method is that of TCP delayed
   acknowledgements.  If the client is doing delayed ACKs, then the ACK
   will be held until the next request is ready to go out.  In this
   case, the time to receive the ACK has no correlation with network
   time.

1.6.4 Server / Client Agents

   There are also products which claim that they can determine end-to-
   end response times, integrating server and network times - and indeed
   they can do so. But they require agents which must be placed at each
   point which is to be monitored.  That is, it is necessary to add
   those agents EVERYWHERE around the network, at a very high cost -
   both in terms of manpower, knowledge and costs.  These kind of
   products can be purchased by only the richest 1% of the corporations.
   As the speed of networks grow, and as the world becomes more
   connected via mobile devices, such products will only become more
   expensive.  If, indeed, their technology can keep up.

   There are many situations where agents cannot be deployed.  Many
   situations which demand a lightweight, cost effective solution.  You
   may think of an ISP with many customers.  If the customer complains
   of poor response time, it is much more cost-effective for the ISP to
   simply take a packet trace with embedded diagnostics than to
   instrument the entire customer network.

   TCP/IP networks, including the Internet, are used throughout the
   world.  If there is not a scalable and affordable way to measure
   performance bottlenecks and failures, the growth of these networks
   will suffer and indeed may reach a plateau where further growth
   becomes impossible.

 


Elkins                     Expires July, 2014                   [Page 8]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


2 Solution Parameters

   What is needed is:

   1) A method to identify and/or track the behavior of a connection
   without assuming access to the transport devices.

   2) A method to observe a connection in flight without introducing
   agents.

   3) a method to observe arbitrary flows at multiple points within a
   network and correlate the results of those observations in a
   consistent manner.

   4) A method to signal and correlate transport issues to application
   end-to-end behavior.

   5) A method which does not require changes to a production network in
   real time.

   6) Adequate granularity in the measurement technique to provide the
   needed metrics.

   7) A method that is scalable to very large networks.

   8) A method that is affordable to all.

2.1 Rationale for proposed solution

   The current IPv6 specification does not provide a timestamp nor
   similar field in the IPv6 main header or in any extension header. So,
   we propose the IPv6 Performance and Diagnostic Metrics destination
   option (PDM) [ELKPDM].

2.2 Merits of timestamp / delta in PDM

   Advantages include:

   1.  Less overhead than other alternatives.
   2.  Real measure of actual transactions.
   3.  Less cost to provide solutions
   4.  More accurate and complete information.
   5.  Independence from transport layer protocols.
   6.  Ability to span organizational boundaries with consistent
       instrumentation

   In other words, this is a solution to a long-standing problem.  The
   PDM will provide a metric which will allow those responsible for
 


Elkins                     Expires July, 2014                   [Page 9]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   network support to determine what is happening in their network
   without expensive equipment (agents) at each device.

   The PDM does not solve every response time issue for every situation.
   Network connections with multiple hops will still need more granular
   metrics, as will the differentiation between multiple components at
   each host.  That is, TCP/IP stack time vs. applications time will
   still need to be broken out by client software.  What the PDM does
   provide is the ability to do rapid triage.  That is, to determine
   quickly if the problem is in the network or in the server or
   application.

2.3 What kind of timestamp?

   Questions arise about exactly the kind of timestamp to use.  Both the
   Network Time Protocol (NTP) [RFC5905] and Precision Time Protocol
   (PTP) [IEEE1588] are used to provide timing on TCP/IP networks.

   NTP has evolved within the IETF structure while PTP has evolved
   within the Institute of Electrical and Electronics Engineers (IEEE)
   community.  By and large, operating systems such as Windows, Linux,
   and IBM mainframe computers use NTP. These are the source and
   destination systems for packets. Intermediate nodes such as routers
   and switches may prefer PTP.

   Since we are describing a new extension header for destination
   systems, the timestamp to be used will be in accordance with NTP. The
   document, draft-ackermann-ntp-pdm-ntp-usage [NTPPDM], discusses
   guidelines for implementing NTP for use with the PDM.  The timestamp
   is only relevant for PDM type 1.  PDM type 2 uses delta values and
   requires no time synchronization.

2  Why Packet Sequence Number

   While performing network diagnostics of an end-to-end connection, it
   often becomes necessary to find the device along the network path
   creating problems.  Diagnostic data may be collected at multiple
   places along the path (if possible), or at the source and
   destination. Then, the diagnostic data must be matched.  Packet
   sequence number is critical in this matching process.  The timestamp
   or even the IP addresses may be different at different devices.  In
   IPv4 networks, the IPID field was used as a de facto sequence number.

   This method of data collection along the path is of special use on
   large multi-tier networks to determine where packet loss or packet
   corruption is happening.  Multi-tier networks are those which have
   multiple routers or switches on the path between the sender and the
   receiver.
 


Elkins                     Expires July, 2014                  [Page 10]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


2.1 IPv4 IPID : DeFacto Sequence Number

   With IPv4 networks, on many stack implementations, but not all, the
   IPID field has the property of sequentiality.  That is, the IP stack
   sending the packets sent them in numerical order.  This was not a
   requirement for the field, but an implementation which turned out to
   be quite useful in diagnostics.

2.1.1 Description of IPID in IPv4

   In IPv4, the 16 bit IP Identification (IPID) field is located at an
   offset of 4 bytes into the IPv4 header and is described in RFC0791
   [RFC0791]. In IPv6, the IPID field is a 32-bit field contained in the
   Fragment Header defined by section 4.5 of RFC2460 [RFC2460].
   Unfortunately, unless fragmentation is being done by the source node,
   the IPv6 packet will not contain this Fragment Header, and therefore
   will have no Identification field.

   The intended purpose of the IPID field, in both IPv4 and IPv6, is to
   enable fragmentation and reassembly, and as currently specified is
   required to be unique within the maximum segment lifetime (MSL) on
   all datagrams.  The MSL is often 2 minutes.

2.1.2 DeFacto Use of IPID

   In a number of networks, the IPID field is used for more than
   fragmentation.  During network diagnostics, packet traces may be
   taken at multiple places along the path, or at the source and
   destination. Then, packets can be matched by looking at the IPID.

   The inclusion of the IPID makes it easier to identify flows belonging
   to a single node, even if that node might have a different IP
   address.  For example, in the case of sessions going through a NAT or
   proxy server.

   For its de-facto diagnostic mode usage, the IPID field needs to be
   available whether or not fragmentation occurs.  It also needs to be
   unique in the context of the session, and across all the connections
   controlled by the stack.  In IPv4, the IPID is in the main header, so
   it is available for all packets.  As it is a 16-bit field, it wrapped
   during the course of the session and thus had some limitations.

   Even with these limitations, the IPID has been valuable and useful in
   IPv4 for diagnostics and problem resolution.  It is a practical
   solution that is 'good enough' in many instances.  Not having it
   available in IPv6, may be a major detriment to new IPv6 deployments
   and contribute to protracted downtimes in existing IPv6 operations.

 


Elkins                     Expires July, 2014                  [Page 11]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


2.1.3 Merits of DeFacto Usage

   As network technology evolves, the uses to which fields are put can
   change as well.  De-facto use is powerful, and should not be lightly
   ignored.  In fact, it is a testament to the power and pervasiveness
   of the protocol that users create new uses for the original
   technology.

   For example, the use of the IPID goes beyond the vision of the
   original authors.  This sort of thing has happened with numerous
   other technologies and protocols.

   The implementation of the traceroute command sends ICMP echo packets
   with a varying TTL.  This is a very useful for diagnostics yet
   departs from the original purpose of TTL.

   Similarly, cell phones have evolved to be more than just a means of
   vocal communication, including Internet communications, photo-
   sharing, stock exchange transactions, etc.  Indeed, the Internet
   itself has evolved, from a small network for researchers and the
   military to share files into the pervasive global information
   superhighway that it is today.

2.1.4 Use Cases of IPv4 IPID in Diagnostics

   Use Case # 1 --- Large Insurance Company

   -  (estimated time saved by use of IPID:  7 hours)

   Performance Tool produces extraneous packets

   - Issue was whether a performance tool was accurately replicating
   session flow during performance testing.

   - Trace IPIDs showed more unique packets within same flow from
   performance tool compared to IE Browser.

   - Having the clear IPID sequence numbers also showed where and why
   the extra packets were being generated.

   - Solution: Problem rectified in subsequent version of performance
   tool.

   - Without IPID, it was not clear if there was an issue at all.


   Use Case #2 --- Large Bank

 


Elkins                     Expires July, 2014                  [Page 12]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   -   (estimated time saved by use of IPID:  4 hours)

   Batch transfer duration increases 12x


   -  A data transfer which formerly took 30 minutes to complete started
   taking 6-8 hours to complete.

   -  Was there packet loss?  All the vendors said no.

   -  The other applications on the network did not report any problems.

   -  4 trace points were used, and the IPIDs in the packets were
   compared.

   -  The comparison showed 7% packet loss.

   -  Solution: WAN hardware was replaced and problem fixed.

   -  Without IPID, no one would agree a problem existed


   Use Case #3 --- Large Bank

   -    (estimated time saved by use of IPID: 6 hours)

   Very slow interactive performance

   - All network links looked good.

   - Traces showed duplicated small packets (which can be OK).

   - We saw that the IPID was the same in both packets but the TTL was
   always + 1.

   - A network device was "splitting" only small packets over two
   interfaces.

   - The small packets were control info, telling other side to slow
   down.

   - It erroneously looked like network congestion.

   - Solution:  Network device replaced and good interactive performance
   restored.

   - Without IPID, flows would have appeared OK.

 


Elkins                     Expires July, 2014                  [Page 13]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   Use Case #4 --- Large Government Agency

   -    (estimated time saved by use of IPID: 9 hours)


   VPN drops

   - Cell phone connections to law enforcement were being dropped. The
   connections were going through a VPN.

   - All parties (both sides of VPN connection, application, etc.) said
   it was not their problem.  The problem went on for weeks.

   - Finally, we took a trace which showed packets with IPID and TTL
    that did not match others in the flow AT ALL coming from the
   router nearest the application server end of VPN.

   - Solution:  Provider for VPN for application server changed. Problem
   resolved.

   - Without IPID, much harder to diagnose problem. Same case also
   happened with large corporation.  Again, all       parties saying not
   their fault until proven via packet trace.)

2.2 TCP sequence number is not enough

   TCP Sequence number is defined in RFC0793 [RFC0793].  Some have
   proposed that this field will meet the needs of diagnostics for a
   packet sequence number.  Indeed, the TCP Sequence Number along with
   the TCP Acknowledgment number can be used to calculate dropped
   packets, duplicate packets, out-of-order packets etc. That is, IF the
   packet flow itself reflects accurately what happened on the wire!

   See Scenario 1 (Section 1.5.2) and Scenario 2 (Section 1.5.3) for
   what happens with packet trace capture in real networks.

   The TCP Sequence Number is, obviously, available only for TCP and not
   other higher layer protocols.

2.3 Inadequacy of current measurement techniques

   The question arises of whether current methods of instrumentation
   cannot be used without a change to the protocol.  Current methods of
   measuring network data, other than packet traces, are inadequate
   because they assume unreasonable access to intermediate devices, are
   cost prohibitive, require infeasible changes to a running production
   network, or do not provide timely data.  This section will discuss
   each of these in detail.
 


Elkins                     Expires July, 2014                  [Page 14]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   Current methods include both instrumentation and third party
   products.  These include SNMP, CMIP, router logs, and firewall logs.

2.3.1 SNMP / CMIP Counters

   The traditional network performance counters measured by SNMP or CMIP
   do not provide information at the granularity desired on the behavior
   of application flows across the network.  The problem is that such
   counters do not contain enough data be able to provide a detailed and
   realistic view of the end-to-end behavior of a connection.

2.3.2 Router / Firewall Logs

   Router and firewall logs may provide some information for diagnostics
   Routers and firewalls in a production network are generally set to do
   minimal logging and diagnostics to allow maximum efficiency and
   throughput.  Such devices cannot be asked to collect detailed data
   for an operational problem, as this requires a change to a production
   network.

2.3.3 Netflow

   Netflow is instrumentation which is available from some middle
   devices.  In production networks, such devices are generally set to
   do minimal logging and diagnostics to allow maximum efficiency and
   throughput.

   It is often also not possible to start data collection in the middle
   of the day on a production network.

2.3.4 Access to Intermediate Devices

   The above current methods require access to the transport
   infrastructure - that is, the routers, switches or other intermediate
   devices.  In some cases, this is possible; in others, the connections
   in question may cross a number of administrative entities (both in
   the transport and in the endpoints).  When it is the enterprise at
   the endpoint which is interested in the diagnostics, the
   administrative entities who own the devices in the middle of the path
   have no stake in operational measurement at the enterprise or
   application level.   They have no reason to provide the necessary
   data or to impact the basic transport with the instrumentation
   necessary to capture flow-oriented data as a continuous stream
   suitable for general consumption.

   In other words, if you don't own the path end-to-end, you will not be
   able to get the data you need if you are required to get it from the
   devices in the middle.  Not only that, the devices in the middle do
 


Elkins                     Expires July, 2014                  [Page 15]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   not have the instrumentation necessary to make it easy to do end-to-
   end diagnostics because they are not responsible for that and so do
   not want to burden their devices with doing those kind of functions.

   Many networks may not own the path end-to-end.  They may be working
   with a business partner's network or crossing the Internet.

2.3.4 Modifications to an Operational Production Network

   Even when the enterprise does own all the devices along the entire
   path, to get enough data to adequately resolve a problem means
   changing the device configuration to do detailed diagnostics.  In a
   production network, devices are generally set to do minimal logging
   and diagnostics.  This is to allow maximum efficiency and throughput.
   The more logging and diagnostics such devices do, the fewer resources
   they have for actually transmitting traffic across the network.

   So, if devices are to be asked to collect more data for an
   operational problem, this requires a change to a production network.
   This is generally not possible as it destabilizes a critical network
   during business hours, thus potentially disrupting many customers.
   Making changes is usually a lengthy process requiring change control,
   testing on a test network, etc.  On networks which are critical to
   the business function, changing configuration "in flight" is
   generally not an option.

3 Solution Parameters

   What is needed is:

   1) A method to identify and/or track the behavior of a connection
   without assuming access to the transport devices.

   2) A method to observe a connection in flight without introducing
   agents at endpoints.

   3) A method to observe arbitrary flows at multiple points within a
   network and correlate the results of those observations in a
   consistent manner.

   4) A method to signal and correlate transport issues to application
   end-to-end behavior.

   5) A method which does not require changes to a production network in
   real time.

   6) Adequate granularity in the measurement technique to provide the
   needed metrics.
 


Elkins                     Expires July, 2014                  [Page 16]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


3.1 Packet Trace Meets Criteria

   The only instrumentation which provides enough detail to diagnose
   end-to-end problems is a packet trace. Packet traces do not require
   changes to devices in production mode because in many networks,
   products are available to capture packets in passive mode. Such
   products continuously monitor network traffic. Often, they are used
   not for diagnostic reasons but for regulatory reasons.  For example,
   there may be legal requirements to log all stock exchange
   transactions.

   Products for packet tracing are available freely and can be used at a
   client host without disrupting major portions of the network.

3.1.1 Limitations of Packet Capture

   Even though packets are the only reliable way to provide data at the
   needed granularity, there are limitations with collecting packet
   traces in some situations.  They are as follows:

3.1.2 Problem Scenario 1

   1. Packets are captured for analysis at places like large core
   switches.  All packets are kept.  Again, not necessarily for
   diagnostic reasons but for regulatory ones. For example, records of
   all stock trades may need to be kept for a certain number of years.

   2. When there is a problem, an analyst extracts the needed
   information.

   3. If the extract is done incorrectly, as often happens, or the
   packet capture itself is incorrect, then there may be false duplicate
   packets which can be quite misleading and can lead to wrong
   conclusions. Are these real TCP duplicates?  Is there congestion on
   the subnet? Are these retransmissions?  Situations have been seen
   where routers incorrectly send two packets instead of one - is this
   such a situation?

   4.  This is the type of problem that can be solved by having an IP
   packet sequence number.

3.1.2 Problem Scenario 2

   1. In this scenario, packets are captured for analysis at places like
   a middleware box.  It may be because problems are suspected with the
   box itself or it is a central point of the suspected failure.

   2. The box may not offer any way to tailor the packet capture.  "You
 


Elkins                     Expires July, 2014                  [Page 17]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   will get what we give you, how we give it to you!" is their
   philosophy.

   3. The packet capture incorrectly duplicates only packets going to
   certain nodes.

   4. Again, there are false duplicate packets which can be misleading
   and can lead to wrong conclusions. Are these real TCP duplicates?  Is
   there congestion on the subnet?  Situations have been seen where
   routers incorrectly send two packets instead of one - is this such a
   situation?

4 Rationale for Proposed Solution (PDM)

   The current IPv6 specification does not provide a packet sequence
   number or similar field in the IPv6 main header.  One option might be
   to force all IPv6 packets to contain a Fragment Header.  In packets
   which are entire in and of themselves, the fragment ID would be zero-
   that is, an atomic fragment. Why was a new destination option header
   defined rather than recommending that Fragment Header be used?

   Our reasoning was that the PDM destination option header would
   provide multiple benefits : the packet sequence number and the
   timings to calculate response time.

   As defined in RFC2460 [RFC2460], destination options are carried by
   the IPv6 Destination Options extension header.  Destination options
   include optional information that need be examined only by the IPv6
   node given as the destination address in the IPv6 header, not by
   routers in between.

   The PDM DOH will be carried by each packet in the network, if this is
   configured.  That is, the PDM DOH is optional.  If the user of the OS
   configures the PDM DOH to be used, then it will be carried in the
   packet.

   The metrics in the PDM are for 'real' or passive data.  That is, they
   are of the traffic actually traveling on the network.

5  Performance and Diagnostic Metrics Destination Option Layout

5.1  Destination Options Header

   The IPv6 Destination Options Header is used to carry optional
   information that need be examined only by a packet's destination
   node(s). The Destination Options Header is identified by a Next
   Header value of 60 in the immediately preceding header and is defined
   in RFC2460 [RFC2460].
 


Elkins                     Expires July, 2014                  [Page 18]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


5.2  PDM Types

   The IPv6 Performance and Diagnostic Metrics Destination Option (PDM)
   is an implementation of the Destination Options Header (Next Header
   value = 60).  Two types of PDM are defined. PDM type 1 requires time
   synchronization.  PDM type 2 does not require time synchronization.

   PDM type 1 and PDM type 2 are mutually exclusive.  That is, a 5-tuple
   can either both send PDM type 1 or both send PDM type 2.

5.3  Performance and Diagnostic Metrics Destination Option (Type 1)

   PDM type 1 is used to facilitate diagnostics by including a packet
   sequence number and timestamp.

   The PDM type 1 is encoded in type-length-value (TLV) format as
   follows:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Option Type  | Option Length | PSN This Packet               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                                                               +
      |                                                               |
      +                         TimeStamp This Packet (64-bit)        +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   PSN Last Packet             |   Reserved                    |
      |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                                                               +
      |                                                               |
      +                         TimeStamp Last Packet (64-bit)        +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Option Type

   TBD = 0xXX (TBD)  [To be assigned by IANA] [RFC2780]

   Option Length
 


Elkins                     Expires July, 2014                  [Page 19]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   8-bit unsigned integer. Length of the option, in octets, excluding
   the Option Type and Option Length fields. This field MUST be set to
   22.


   Packet Sequence Number This Packet (PSNTP)

   16-bit unsigned integer.  This field will wrap. It is intended for
   human use.

   Initialized at a random number and monotonically incremented for
   packet on the 5-tuple.  The 5-tuple consists of the source and
   destination IP addresses, the source and destination ports, and the
   upper layer protocol (ex. TCP, ICMP, etc).

   Operating systems MUST implement a separate packet sequence number
   counter per 5-tuple. Operating systems MUST NOT implement a single
   counter for all connections.

   Note: This is consistent with the current implementation of the IPID
   field in IPv4 for many, but not all, stacks.


   TimeStamp This Packet (TSTP)

   A 64-bit unsigned integer field containing a timestamp that this
   packet was sent by the source node.  The value indicates the number
   of seconds since January 1, 1970, 00:00 UTC, by using a fixed point
   format.  In this format, the integer number of seconds is contained
   in the first 32 bits of the field, and the remaining 32 bits resolve
   to picoseconds.

   This follows timestamp formats used in Network Time Protocol (NTP)
   [RFC5905] and SEND [RFC3971]. A discussion of how to implement NTP
   for use with PDM header type 1 is in draft-ackermann- ntp-pdm-ntp-
   usage-00 [NTPPDM].

   Implementation note: This format is compatible with the usual
   representation of time under UNIX, although the number of bits
   available for the integer and fraction parts in different Unix
   implementations vary.


   Packet Sequence Number Last Received (PSNLR)

   16-bit unsigned integer.  This is the PSN of the packet last received
   on the 5-tuple.

 


Elkins                     Expires July, 2014                  [Page 20]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   TimeStamp Last Received (TSLR)

   A 64-bit unsigned integer field containing a timestamp.  This is the
   timestamp of the packet last received on the 5-tuple.  Format is the
   same as TSTP.

5.4  Performance and Diagnostic Metrics Destination Option (Type 2)

   The second type of IPv6 Performance and Diagnostic Metrics
   Destination Option (PDM) is as follows.  PDM type 1 and PDM type 2
   are mutually exclusive.  That is, a 5-tuple can either both send PDM
   type 1 or both send PDM type 2.

   PDM type 2 contains the following fields:

               PSNTP    : Packet Sequence Number This Packet
               PSNLR    : Packet Sequence Number Last Received 
               DELTALR  : Delta Last Received 
               PSNLS    : Packet Sequence Number Last Sent   
               DELTALS  : Delta Last Sent

   PDM destination option type 2 is encoded in type-length-value (TLV)
   format as follows:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Option Type  | Option Length | PSN This Packet               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   PSN Last Received           |  PSN Last Sent                |
      |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Delta Last Received         |  Delta Last Sent              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | TType |
      +-+-+-+-+


   Option Type

   TBD = 0xXX (TBD)  [To be assigned by IANA] [RFC2780]

   Option Length

   8-bit unsigned integer. Length of the option, in octets, excluding
   the Option Type and Option Length fields. This field MUST be set to
   22.


 


Elkins                     Expires July, 2014                  [Page 21]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   Packet Sequence Number This Packet (PSNTP)

   16-bit unsigned integer.  This field will wrap. It is intended for
   human use.

   Initialized at a random number and monotonically incremented for
   packet on the 5-tuple.  The 5-tuple consists of the source and
   destination IP addresses, the source and destination ports, and the
   upper layer protocol (ex. TCP, ICMP, etc).

   Operating systems MUST implement a separate packet sequence number
   counter per 5-tuple. Operating systems MUST NOT implement a single
   counter for all connections.

   Note: This is consistent with the current implementation of the IPID
   field in IPv4 for many, but not all, stacks.


   Packet Sequence Number Last Received (PSNLR)

   16-bit unsigned integer.  This is the PSN of the packet last received
   on the 5-tuple.


   Packet Sequence Number Last Sent (PSNLS)

   16-bit unsigned integer.  This is the PSN of the packet last sent on
   the 5-tuple.


   Delta TimeStamp Type (TIMETYPE)

   4-bit unsigned integer.  This is the type of time contained in the
   delta fields below.

   0 - unknown
   1 - time is in units of nanoseconds
   2 - time is in units microseconds
   3 - time is in units of milliseconds
   4 - time is in units of seconds
   5 - time is in units of minutes
   6 - time is in units of hours
   7 - time is in units of days

   The values 5 - 7 are relevant for Delay Tolerant Networks (DTN) which
   may operate with long delays between packets.

   Delta Last Received (DELTALR)
 


Elkins                     Expires July, 2014                  [Page 22]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   A 16-bit unsigned integer field.  This is server delay.

   DELTALR = Send time packet 2 - Receive time packet 1

   The value is according to the scale in TIMETYPE.


   Delta Last Sent (DELTALS)

   A 16-bit unsigned integer field.  This is round trip or end-to-end
   time.

   Delta Last Sent = Receive time packet 2 - Send time packet 1

   The value is in according to the scale in TIMETYPE.


   Option Type

   The two highest-order bits of the Option Type field are encoded to
   indicate specific processing of the option; for the PDM destination
   option, these two bits MUST be set to 00. This indicates the
   following processing requirements:

   00 - skip over this option and continue processing the header.

   RFC2460 [RFC2460] defines other values for the Option Type field.
   These MUST NOT be used in the PDM.  The other values are as follows:

   01 - discard the packet.

   10 - discard the packet and, regardless of whether or not the
   packet's Destination Address was a multicast address, send an ICMP
   Parameter Problem, Code 2, message to the packet's Source Address,
   pointing to the unrecognized Option Type.

   11 - discard the packet and, only if the packet's Destination Address
   was not a multicast address, send an ICMP Parameter Problem, Code 2,
   message to the packet's Source Address, pointing to the unrecognized
   Option Type.

   In keeping with RFC2460 [RFC2460], the third-highest-order bit of the
   Option Type specifies whether or not the Option Data of that option
   can change en-route to the packet's final destination.

   In the PDM, the value of the third-highest-order bit MUST be 0.  The
   possible values are as follows:

 


Elkins                     Expires July, 2014                  [Page 23]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   0 - Option Data does not change en-route


   1 - Option Data may change en-route

   The three high-order bits described above are to be treated as part
   of the Option Type, not independent of the Option Type.  That is, a
   particular option is identified by a full 8-bit Option Type, not just
   the low-order 5 bits of an Option Type.

6  Use of the PDM

6.1 Packet Identification Data

   Each packet contains information about the sender and receiver. In IP
   protocol the identifying information is called a "5-tuple".  The
   flows described below are for the set of packets flowing between A
   and B without consideration of any other packets sent to any other
   device from Host A or Host B.

   The 5-tuple consists of:

   SADDR : IP address of the sender
   SPORT : Port for sender
   DADDR : IP address of the destination
   DPORT : Port for destination
   PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP, etc.)

6.2 Data in the PDM Destination Option Headers

   The IPv6 Performance and Diagnostic Metrics Destination Option (PDM)
   is an implementation of the Destination Options Header (Next Header
   value = 60).  Two types of PDM are defined. PDM type 1 requires time
   synchronization.  PDM type 2 does not require time synchronization.

   PDM type 1 and PDM type 2 are mutually exclusive.  That is, a 5-tuple
   can either both send PDM type 1 or both send PDM type 2.

   PDM type 1 contains the following fields:

   PSNTP : Packet Sequence Number This Packet
   TSTP  : Timestamp This Packet
   PSNLR : Packet Sequence Number Last Received
   TSLR  : Timestamp Last Received

   PDM type 2 contains the following fields:

   PSNTP    : Packet Sequence Number This Packet 
 


Elkins                     Expires July, 2014                  [Page 24]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   PSNLR    : Packet Sequence Number Last Received 
   DELTALR  : Delta Last Received 
   PSNLS    : Packet Sequence Number Last Sent 
   DELTALS  : Delta Last Sent

   The metrics which may be derived from these fields will be discussed
   in the following sections.

7 Metrics Derived from the PDM Destination Options

   A number of metrics may be derived from the data contained in the
   PDM.  Some are relationships between two packets, others require
   analysis of multiple packets or multiple protocols.

   These metrics fall into the following categories:

   1. Base derived metrics
   2. Metrics used for triage
   3. Metrics used for network diagnostics
   4. Metrics used for session classification
   5. Metrics used for end user performance optimization

   It must be understood that when a metric is discussed, it includes
   the average, median, and other statistical variations of that metric.

   In the next section, we will discuss the base metrics.  In later
   sections, we will discuss the more advanced metrics and their uses.

8 Base Derived Metrics

   The base metrics which may be derived from the PDM are:

   1.  One-way delay
   2.  Round-trip delay
   3.  Server delay

8.1 One-Way Delay

   One-way delay is the time taken to traverse the path one way between
   one network device to another.  The path from A to B is distinguished
   from the path from B to A.  For many reasons, the paths may have
   different characteristics and may have different delays.  One-way
   delay is discussed in "A One-way Delay Metric for IPPM" [RFC2679].

8.2 Round-Trip Delay

   Round-trip delay is the time taken to traverse the path both ways
   between one network device to another.  The entire delay to travel
 


Elkins                     Expires July, 2014                  [Page 25]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   from A to B and B to A is used. Round-trip delay cannot tell if one
   path is quite different from another.  Round-trip delay is discussed
   in "A Round-trip Delay Metric for IPPM" [RFC2681].

8.3 Server Delay

   Server delay is the interval between when a packet is received by a
   device and a subsequent packet is sent back in response.  This may be
   "Server Processing Time".  It may also be a delay caused by
   acknowledgements.  Server processing time includes the time taken by
   the combination of the stack and application to return the response.

9 Sample Implementation Flow (PDM Type 1)

   Following is a sample simple flow with one packet sent from Host A
   and one packet received by Host B.

   Time synchronization is required between Host A and Host B. See
   draft-ackermann-ntp-pdm-ntp-usage-00 [NTPPDM] for a description of
   how an NTP implementation may be set up to achieve good time
   synchronization.

   Each packet, in addition to the PDM, contains information on the
   sender and receiver. This is the 5-tuple consisting of:

   SADDR : IP address of the sender
   SPORT : Port for sender
   DADDR : IP address of the destination
   DPORT : Port for destination
   PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP, etc.)

   It should be understood that the packet identification information is
   in each packet. We will not repeat that in each of the following
   steps.

9.1 Step 1 (PDM Type 1)

   Packet 1 is sent from Host A to Host B.  The time for Host A is set
   initially to 10:00AM.

   The timestamp and packet sequence number are sent in the PDM.

   The initial PSNTP from Host A starts at a random number.  In this
   case, 25.  The sub-second portion of the timestamp has been omitted
   for the sake of simplicity.



 


Elkins                     Expires July, 2014                  [Page 26]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


               Packet 1

               +----------+             +----------+
               |          |             |          |
               |   Host   | ----------> |   Host   |
               |    A     |             |    B     |
               |          |             |          |
               +----------+             +----------+

               PDM Contents:
               PSNTP : Packet Sequence Number This Packet:   25
               TSTP  : Timestamp This Packet:                10:00:00
               PSNLR : Packet Sequence Number Last Received: -
               TSLR  : Timestamp Last Received:              -

   There are no derived statistics after packet 1.

9.2 Step 2 (PDM Type 1)

   Packet 1 is received by Host B.  The time for Host B was synchronized
   with Host A.  Both were set initially to 10:00AM.

   The timestamp and PSN for the received packet are placed in the PSNLR
   and TSLR fields. These are from the point of view of B.  That is,
   they indicate when the packet from A was received and which packet it
   was.

   The PDM is not sent at this point. It is only prepared. It will be
   sent when the response to packet 1 is sent by Host B.

               Packet 1 Received

                    +----------+             +----------+
                    |          |             |          |
                    |   Host   | ----------> |   Host   |
                    |    A     |             |    B     |
                    |          |             |          |
                    +----------+             +----------+

               PDM Contents:

               PSNTP : Packet Sequence Number This Packet:   -
               TSTP  : Timestamp This Packet:                -
               PSNLR : Packet Sequence Number Last Received: 25
               TSLR  : Timestamp Last Received:              10:00:03

   At this point, the following metric may be derived: one-way delay. In
   fact, we now know the one-way delay and the path. We will call this
 


Elkins                     Expires July, 2014                  [Page 27]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   path 1.  This will be the outbound path from the point of view of
   Host A and the inbound path from the point of view of Host B.

   The calculation of one-way delay (path 1) is as follows:

   One-way delay (path 1) = Time packet 1 was received by B - Time
   Packet 1 was sent by A

   If we make the substitutions from our sample case above, then:

   One-way delay (path 1) = 10:00:03 - 10:00:00 or 3 seconds

9.3 Step 3 (PDM Type 1)

   Packet 2 is sent from Host B to Host A.  The initial PSNTP from Host
   B starts at a random number. In this case, 12.

               Packet 2

                       +----------+             +----------+
                       |          |             |          |
                       |   Host   | <---------- |   Host   |
                       |    A     |             |    B     |
                       |          |             |          |
                       +----------+             +----------+

              PDM Contents:

              PSNTP : Packet Sequence Number This Packet:   12
              TSTP  : Timestamp This Packet:                10:00:07
              PSNLR : Packet Sequence Number Last Received: 25
              TSLR  : Timestamp Last Received:              10:00:03

   After Packet 2 is sent, the following metric may be derived: server
   delay.

   The calculation of server delay is as follows:

   Server delay = Time Packet 2 is sent by B - Time Packet 1 was
   received by B

   Again, making the substitutions from the sample case: Server delay =
   10:00:07 - 10:00:03 or 4 seconds

   Further elaborations of server delay may be done by limiting the data
   length to be greater than 1.  Some protocols, for example, TCP, have
   acknowledgements with a data length of 0 or keep-alive packets with a
   data length of 1.  An ACK may preceed the actual response data
 


Elkins                     Expires July, 2014                  [Page 28]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   packet. Keep-alives may be interspersed within the data flow.

9.4 Step 4 (PDM Type 1)

   Packet 2 is received by Host A.

   The timestamp and PSN for the received packet are placed in the PSNLR
   and TSLR fields. These are from the point of view of A.  That is,
   they indicate when the packet from B was received and which packet it
   was.

   The PDM is not sent at this point. It is only prepared. It will be
   sent when the NEXT packet to Host B is sent by Host A.

                 Packet 2 Received

                       +----------+             +----------+
                       |          |             |          |
                       |   Host   | <---------- |   Host   |
                       |    A     |             |    B     |
                       |          |             |          |
                       +----------+             +----------+

               PDM Contents:

               PSNTP : Packet Sequence Number This Packet:   -
               TSTP  : Timestamp This Packet:                -
               PSNLR : Packet Sequence Number Last Received: 12
               TSLR  : Timestamp Last Received:              10:00:10

   However, at this point, the following metric may be derived: one-way
   delay (path 2).

   The calculation of one-way delay (path 2) is as follows:

   One-way delay (path 2) = Time packet 2 received by A - Time packet 2
   sent by B

   If we make the substitutions from our sample case above, then:

   One-way delay (path 2) = 10:00:10 - 10:00:07 or 3 seconds







 


Elkins                     Expires July, 2014                  [Page 29]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


9.5 Step 5 (PDM Type 1)

   Packet 3 is sent from Host A to Host B.

               Packet 3

                    +----------+             +----------+
                    |          |             |          |
                    |   Host   | ----------> |   Host   |
                    |    A     |             |    B     |
                    |          |             |          |
                    +----------+             +----------+

               PDM Contents:

               PSNTP : Packet Sequence Number This Packet:   26
               TSTP  : Timestamp This Packet:                10:00:50
               PSNLR : Packet Sequence Number Last Received: 12
               TSLR  : Timestamp Last Received:              10:00:10


   At this point the PDM flows across the network revealing the last
   received timestamp and PSN.


10 Sample Implementation Flow (PDM 2)

   Following is a sample simple flow for PDM type 2 with one packet sent
   from Host A and one packet received by Host B.  PDM type 2 does not
   require time synchronization between Host A and Host B.  The
   calculations to derive meaningful metrics for network diagnostics is
   shown below each packet sent or received.

   Each packet, in addition to the PDM contains information on the
   sender and receiver. As discussed before, a 5- tuple consists of:

   SADDR : IP address of the sender
   SPORT : Port for sender
   DADDR : IP address of the destination
   DPORT : Port for destination
   PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP)

   It should be understood that the packet identification information is
   in each packet. We will not repeat that in each of the following
   steps.


10.1 Step 1 (PDM Type 2)
 


Elkins                     Expires July, 2014                  [Page 30]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   Packet 1 is sent from Host A to Host B.  The time for Host A is set
   initially to 10:00AM.

   The timestamp and packet sequence number are noted by the sender
   internally.  The packet sequence number and timestamp are sent in the
   packet.

   Packet 1

                       +----------+             +----------+
                       |          |             |          |
                       |   Host   | ----------> |   Host   |
                       |    A     |             |    B     |
                       |          |             |          |
                       +----------+             +----------+

               PDM type 2 Contents:

               PSNTP    : Packet Sequence Number This Packet:     25 
               PSNLR    : Packet Sequence Number Last Received:   - 
               DELTALR  : Delta Last Received:                    - 
               PSNLS    : Packet Sequence Number Last Sent:       - 
               DELTALS  : Delta Last Sent:                        -

   Internally, within the sender, Host A, it must keep:

               PSNTP : Packet Sequence Number This Packet:     25
               TSTP : Timestamp This Packet:                   10:00:00

   Note, the initial PSNTP from Host A starts at a random number.  In
   this case, 25.  The sub-second portion of the timestamp has been
   omitted for the sake of simplicity.

   There are no derived statistics after packet 1.

10.2 Step 2 (PDM Type 2)

   Packet 1 is received at Host B.  His time is set to one hour later
   than Host A.  In this case, 11:00AM

   Internally, within the receiver, Host B, it must keep: 

               PSNLR : Packet Sequence Number Last Received:    25 
               TSLR  : Timestamp Last Received             :    11:00:03

   Note, this timestamp is in Host B time.  It has nothing whatsoever to
   do with Host A time.

 


Elkins                     Expires July, 2014                  [Page 31]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   At this point, we have no derived statistics.  In PDM type 1, the
   derived statistic one-way delay (path 1) could have been calculated.
   In PDM type 2, this is not possible because there is no time
   synchronization.

10.3 Step 3 (PDM Type 2)

   Packet 2 is sent by Host B to Host A.  Note, the initial PSNTP from
   Host B starts at a random number.  In this case, 12.   Before sending
   the packet, Host B does a calculation of deltas.  Since Host B knows
   when it is sending the packet, and it knows when it received the
   previous packet, it can do the following calculation:

   Sending time (packet 2) - receive time (packet 1)

   We will call the result of this calculation: Delta Last Received.

   That is:

   DELTALR = Sending time (packet 2) - receive time (packet 1)

   Note, both sending time and receive time are saved internally in Host
   B.  They do not travel in the packet. Only the Delta is in the
   packet.

   Assume that within Host B is the following:

               PSNLR : Packet Sequence Number Last Received:   25 
               TSLR  : Timestamp Last Received             :   11:00:03 
               PSNTP : Packet Sequence Number This Packet  :   12 
               TSTP  : Timestamp This Packet               :   11:00:07

   Hence, DELTALR becomes:

   4 seconds = 11:00:07 - 11:00:03

   Let us look at the PDM, and then we will look at the derived metrics
   at this point.

               Packet 2

                    +----------+             +----------+
                    |          |             |          |
                    |   Host   | <---------- |   Host   |
                    |    A     |             |    B     |
                    |          |             |          |
                    +----------+             +----------+

 


Elkins                     Expires July, 2014                  [Page 32]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


                    PDM Type 2 Contents:

               PSNTP    : Packet Sequence Number This Packet:    12
               PSNLR    : Packet Sequence Number Last Received:  25
               DELTALR  : Delta Last Received:                    4
               PSNLS    : Packet Sequence Number Last Sent:       -
               DELTALS  : Delta Last Sent:                        -

   After Packet 2, the following metrics may be derived:

   Server delay = DELTALR

   Metrics left to be calculated are the path delay for path 2. This may
   be calculated when Packet 3 is sent. Clearly, if there is NO next
   packet for the 5-tuple, then this value will be missing.

10.4 Step 4 (PDM Type 2)

   Packet 2 is received at Host A.  Remember, its time is set to one
   hour earlier than Host B. It will keep internally:

               PSNLR : Packet Sequence Number Last Received:   12 
               TSLR  : Timestamp Last Received             :   10:00:12

   Note, this timestamp is in Host A time.  It has nothing whatsoever to
   do with Host B time.

   At this point, we have two derived metrics:

   1. Two-way delay or Round Trip time
   2. Total end-to-end time

   The formula for end-to-time is:

   Time Last Received - Time Last Sent

   For example, packet 25 was sent by Host A at 10:00:00. Packet 12 was
   received by Host A at 10:00:12 so:

   End-to-End response time = 10:00:12 - 10:00:00 or 12

   This derived metric we will call DELTALS or Delta Last Sent.

   To calculate two-way delay, the formula is:

   Two-way delay = DELTALS - DELTALR

   Or:
 


Elkins                     Expires July, 2014                  [Page 33]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   Two-way delay = 12 - 4 or 8

   Now, the only problem is that at this point all metrics are in the
   Host and not exposed in a packet. To do that, we need a third packet.

10.5 Step 5 (PDM Type 2)

   Packet 3 is sent from Host A to Host B.

               Packet 3

                       +----------+             +----------+
                       |          |             |          |
                       |   Host   | ----------> |   Host   |
                       |    A     |             |    B     |
                       |          |             |          |
                       +----------+             +----------+

               PDM Type 2 Contents:

               PSNTP    : Packet Sequence Number This Packet:    26 
               PSNLR    : Packet Sequence Number Last Received:  12 
               DELTALR  : Delta Last Received:                   * 
               PSNLS    : Packet Sequence Number Last Sent:      25 
               DELTALS  : Delta Last Sent:                       12

11 Derived Metrics : Advanced

   A number of more advanced metrics may be derived from the data
   contained in the PDM.  Some are relationships between two packets,
   others require analysis of multiple packets. The more advanced
   metrics fall into the categories shown below:


   1. Metrics used for triage
   2. Metrics used for network diagnostics
   3. Metrics used for session classification
   4. Metrics used for end user performance optimization

   We will discuss each of these in turn.

11.1 Advanced Derived Metrics : Triage

   In this case, triage means to distinguish between problems occurring
   on the network paths or the server. The PDM provides one-way delay
   and server delay.  This will enable distinguishing which path is a
   bottleneck as well as whether the server is a bottleneck.

 


Elkins                     Expires July, 2014                  [Page 34]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


11.2 Advanced Derived Metrics : Network Diagnostics

   The data provided by the PDM may be used in combination with data
   fields in other protocols.  We will call this Inter-Protocol Network
   Diagnostics (IPND).

   The PDM also allows us to use only a single trace point for a number
   of diagnostic situations where today we need to trace at multiple
   points to get required data. In diagnostics, there is often the
   question of did the end device really send the packet and it got lost
   in the network or did it not send it at all.

   So, what is done is that diagnostic traces are run at both client and
   server to get the required data.  With the data provided by the PDM,
   in a number of the cases, this will not be necessary.

   For example, taking PDM values along with data fields in the TCP
   protocol, the following may be found:

   1. Retransmit duplication (RD)
   2. ACK lag (AL)
   3. Third-party connection reset (TPCR)
   4. Elapsed time connection reset (ETCR)

   A description of these follows.

11.2.1 Retransmit Duplication (RD)

   The TCP protocol will retransmit segments given indications from the
   partner that it has not received them.  The retransmitted segments
   contain the TCP sequence number and acknowledgement. The sequence
   number is started at a random number and increased by the amount of
   data sent in each packet.

   Consider the following scenario.   There is a packet sequence number
   in the packet at the IP layer.  This is in the PDM that we have
   defined.  The TCP sequence number already exists in the protocol.

   Host A sends the following packets:

      IP PSN 20, TCP SEQ 10
      IP PSN 21, TCP SEQ 11
      IP PSN 22, TCP SEQ 12

      Host B receives:

      IP PSN 20, TCP SEQ 10
      IP PSN 22, TCP SEQ 12
 


Elkins                     Expires July, 2014                  [Page 35]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   Host B indicates to Host A to resend packet with TCP SEQ 2.
   Retransmits are done at the TCP layer.

   Host A sends the following packet:

   IP PSN 23, TCP SEQ 11

   The packet never reaches B.  B waits until a timeout for retransmits
   expires.  It asks for the packet again.

   Host A sends the following packet:

   IP PSN 24, TCP SEQ 11

   This time, it reaches Host B.  Having the combination of PSN (as
   provided in the PDM) and the TCP sequence number allows us to see
   whether the problem is that the network is losing the packet or
   somehow, the sender is not sending the packet correctly.

   As we said before, this also allows us a single trace point rather
   than at the client and server to get the required data.

11.2.2 ACK Lag (AL)

   Some protocols, such as TCP, acknowledge packets.  The PDM will allow
   or a calculation of rate of ACKs. Clients can be reconfigured to
   optimize acknowledgements and to speed traffic flow.

11.2.3 Third-party Connection Reset (TPCR)

   Connections may be aborted by a packet containing a particular flag.
   In the TCP protocol, this is the RESET flag.  Sometimes a third-
   party, for example, a VPN router, will abort the connection.  This
   may happen because the router is overloaded, the traffic is too
   noisy, or other reasons.  This can also be quite hard to detect
   because the third-party will spoof the address of the sender.

   Much time can be spent by the two endpoints pointing fingers at the
   other for having dropped the connection.

   Such a third-party spoofer would likely not have the PDM Destination
   Option.  Routers and other middle boxes are not required to support
   the Destination Options Extension Header. Even if a PDM DOH was
   generated, it would most likely violate the pattern of PSNs and time
   stamps being used.  This would be a clue to the diagnostician that
   the TPCR event has occurred.


 


Elkins                     Expires July, 2014                  [Page 36]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


11.2.4 Potential Hang (PH)

   Connections may be aborted by a packet containing a particular flag.
   In the TCP protocol, this is the RESET flag.  Sometimes this is done
   because a set amount of time has elapsed without activity. The PSN in
   the PDM can be used to determine the last packet sent by the partner
   and if a response is required -- a "hang" situation.

   This can be distinguished from connections which are set to be
   aborted after a certain period of inactivity.

11.3 Advanced Metrics : Session Classification

   The PDM may be used to classify sessions as follows:

   One way traffic flow
   Two way traffic flow
   One way traffic flow with keep-alive
   Two way traffic flow with keep-alive
   Multiple send traffic flow
   Multiple receive traffic flow
   Full duplex traffic flow
   Half duplex traffic flow

   Immediate ACK data flow
   Delayed ACK data flow
   Proxied ACK data flow

   A session classification system will assist the network
   diagnostician.  This system will also help in categorizing the server
   delay.

12  Use Cases

   The scheme outlined above can also handle the following types of
   cases:

   1.  Host clocks not synchronized (shown above)
   2.  IP fragmentation
   3.  Multiple sends from one side (multiple segments)
   4.  Out of order segments
   5.  Retransmits
   6.  One-way transmit only (ex. FTP)
   7.  One-way transmit only
      (e.g.real time transports and streaming protocols)
   8.  Duplicate ACKs
   9.  Duplicate segments
   10. Delayed ACKs
 


Elkins                     Expires July, 2014                  [Page 37]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   11. ACKs preceeding send for another reason
   12. Proxy servers
   13. Full duplex traffic
   14. Keep alive (0 / 1 byte segments, larger segments)
   15. No response from other side
   16. Drop without retransmit (real time transports)
   17. Looped packets (where the same packet may pass the same point
       multiple times without duplication)
   18. Multihoming via SHIM6

13  Security Considerations

   There are no security considerations.

14  IANA Considerations

   There are no IANA considerations.

15 References

15.1 Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791, September
   1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
   793, September 1981.

   [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
   for High Performance", RFC 1323, May 1992.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
   (IPv6) Specification", RFC 2460, December 1998.

   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
   Delay Metric for IPPM", RFC 2679, September 1999.

   [RFC2681]  Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip
   Delay Metric for IPPM", RFC 2681, September 1999.

   [RFC2780]  Bradner, S. and V. Paxson, "IANA Allocation Guidelines For
   Values In the Internet Protocol and Related Headers", BCP 37, RFC
   2780, March 2000.

   [RFC3971]  Arkko, J., Ed., Kempf, J., Zill, B., and P. Nikander,
 


Elkins                     Expires July, 2014                  [Page 38]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


   "SEcure Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
   "Network Time Protocol Version 4: Protocol and Algorithms
   Specification", RFC 5905, June 2010.

15.2 Informative References

   [NTPPDM]  Ackermann, M., "draft-ackermann-ntp-pdm-ntp-usage-00",
   Internet Draft, January 2014.

   [ELKPDM]  Elkins, N., "draft-elkins-6man-ipv6-pdm-dest-option-05",
   Internet Draft, January 2014.

   [IEEE1588]  IEEE 1588-2002 standard, "Standard for a Precision Clock
   Synchronization Protocol for Networked Measurement and Control
   Systems"

16 Acknowledgments

   The authors would like to thank Al Morton, Brian Trammel, David
   Boyes, and Rick Troth for their comments and assistance.

Authors' Addresses

       Nalini Elkins
       Inside Products, Inc.
       36A Upper Circle
       Carmel Valley, CA 93924
       United States
       Phone: +1 831 659 8360
       Email: nalini.elkins@insidethestack.com
       http://www.insidethestack.com

       William Jouris
       Inside Products, Inc.
       36A Upper Circle
       Carmel Valley, CA 93924
       United States
       Phone: +1 925 855 9512
       Email: bill.jouris@insidethestack.com
       http://www.insidethestack.com

       Michael S. Ackermann
       Blue Cross Blue Shield of Michigan
       P.O. Box 2888
       Detroit, Michigan 48231
       United States
 


Elkins                     Expires July, 2014                  [Page 39]

INTERNET DRAFT         elkins-ippm-pdm-metrics-04           January 2014


       Phone: +1 310 460 4080
       Email: mackermann@bcbsmi.com
       http://www.bcbsmi.com

       Keven Haining
       US Bank
       16900 W Capitol Drive
       Brookfield, WI 53005
       United States
       Phone: +1 262 790 3551
       Email: keven.haining@usbank.com
       http://www.usbank.com







































Elkins                     Expires July, 2014                  [Page 40]