Internet Research Task Force                                W. Tavernier
Internet-Draft                                   Ghent University - IBBT
Intended status: Informational                          D. Papadimitriou
Expires: April 18, 2011                              Alcatel-Lucent Bell
                                                                D. Colle
                                                 Ghent University - IBBT
                                                        October 15, 2010



    Learning Capable Communication Network (LCCN) Problem Statement

             draft-tavernier-irtf-lccn-problem-statement-00


Abstract

   Operational procedures and protocols of today's communication
   networks typically use explicitly defined mechanisms and
   representations to reach the goals associated with their design.
   This practice results in numerous protocols having a restricted
   space for (self-)adaptability and sensitivity with respect to their
   network context (e.g. network traffic conditions, failure conditions,
   etc.). On the other hand, a wide spectrum of learning and
   optimization techniques is available with which networks could learn
   and optimize their behavior in the running context. This document
   describes the opportunities and challenges for a Learning Capable
   Communication Network (LCCN).

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 18, 2011.
 



Tavernier, et al.        Expires April 18, 2011                 [Page 1]

Internet-Draft           LCCN Problem Statement             October 2010


Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Learning opportunities . . . . . . . . . . . . . . . . . . . .  4
     2.1.  Availability of network data and statistics  . . . . . . .  4
     2.2.  Availability of processing capacity  . . . . . . . . . . .  5
   3.  The learning process . . . . . . . . . . . . . . . . . . . . .  5
   4.  Architectural implications . . . . . . . . . . . . . . . . . .  7
     4.1.  From a pre-defined open-loop control towards a
           self-adaptive closed-loop control  . . . . . . . . . . . .  7
     4.2.  The integration of learning capability . . . . . . . . . .  9
   5.  Applicability  . . . . . . . . . . . . . . . . . . . . . . . . 10
     5.1.  Functional domains . . . . . . . . . . . . . . . . . . . . 10
     5.2.  Scope with respect to the hourglass model  . . . . . . . . 10
   6.  Research directions and objectives . . . . . . . . . . . . . . 11
     6.1.  Relation to existing research domains  . . . . . . . . . . 11
     6.2.  Experimental research objectives . . . . . . . . . . . . . 12
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 12
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 12
   9.  Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 13
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13
   11. Informative references . . . . . . . . . . . . . . . . . . . . 13
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13


1.  Introduction

   As currently instantiated, the Internet hour-glass model drives a
   top-down approach. Current communication networks typically operate
   with an explicit internal representation of themselves, their network
   knowledge, and their global goals. Routers follow explicitly pre-
   defined behavior, persistently decide and uniformly execute. Global
   Internet behavior is evaluated and configuration changed when the
   evaluation results indicate that the networking systems are not
   accomplishing what they were intended to, or when better
   functionality or performance is expected.

   In several Internet areas, this operational model shows its limits.
   Inter-domain routing protocols such as BGP are, due to their inherent
   exploration properties, increasingly impacted by topology and policy
   dynamics that delay their convergence. Network management becomes
   more and more complex, networks do not automatically take network
   traffic statistics into account, etc. Several efforts have been
   undertaken to overcome the increasing number of issues. However,
   improving the routing system to accommodate challenges of various
   scales in network efficiency further complicates its operation
   ([I-D.ietf-idr-bgp-issues]). Further patching the inter-domain
   routing system and equipment will result in more operational
   complexity.

   In this document, we suggest an alternative (bottom-up) approach to
   the operation of the Internet routing and forwarding system. Compared
   to current routed networks, which require explicit specification of
   their expected behavior, self-organizing and self-adaptive systems
   could dynamically modify or adjust their behavior to varying network
   conditions in order to tune their operation, optimize their overall
   performance and even add functionalities through closed-loop adaptive
   control.

   We see three main drivers for the design of Learning Capable
   Communication Networks (LCCN): i) the availability of network-related
   data, ii) the wide range of possible learning paradigms that can be
   borrowed from domains such as Artificial Intelligence (AI), machine
   learning, and bio-inspired learning, and iii) the increased CPU
   capacity available at both forwarding and control plane level,
   allowing for background monitoring, learning and optimization in
   routers.

   The structure of this document is as follows. In Section 2, we
   describe the opportunities for communication networks to learn how to
   best improve their performance. The next section (Section 3) gives a
   more formal but broad definition of the concept of learning.
   Section 4 provides a first set of architectural implications of
   adding learning capability to communication networks. The
   applicability domain of LCCNs is covered in Section 5, and possible
   research directions are described in Section 6. Concluding remarks
   and future work are outlined in Section 9.


2.  Learning opportunities

2.1.  Availability of network data and statistics

   Hosts communicate by sending packets to each other via transit
   network nodes. As such, a communication network is loaded with
   packets corresponding to network traffic flows between given source
   and destination nodes. Many techniques exist to gather statistics
   about the resulting traffic flows crossing routers.

   o  Online statistical counters: properties of the traffic transiting
      a router are measured using counters, for example the number of
      packets per destination prefix or the distribution of packet
      sizes.

   o  Traffic sampling: instead of counting certain traffic
      characteristics, unmodified traffic is captured for some time
      interval. This sample is then used to derive certain
      characteristics, for example by means of a sample-and-hold
      technique using the setting proposed in [Estan04].
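
   The two measurement techniques listed above can be illustrated with
   a minimal Python sketch. It is not taken from [Estan04]: the class
   names, the packet trace and the per-packet admission probability p
   are assumptions made for illustration only, and the sample-and-hold
   variant shown is deliberately simplified.

```python
import random
from collections import Counter


class PrefixCounter:
    """Online statistical counter: packets and bytes per destination prefix."""

    def __init__(self):
        self.packets = Counter()
        self.octets = Counter()

    def observe(self, prefix, size):
        self.packets[prefix] += 1
        self.octets[prefix] += size


class SampleAndHold:
    """Simplified sample-and-hold (assumption: per-packet probability p).

    A flow that is not yet tracked is admitted with probability p for
    each of its packets; once admitted, all its later packets are counted.
    """

    def __init__(self, p, rng=None):
        self.p = p
        self.rng = rng or random.Random()
        self.held = {}

    def observe(self, flow_id, size):
        if flow_id in self.held:
            self.held[flow_id] += size      # hold: count every packet
        elif self.rng.random() < self.p:
            self.held[flow_id] = size       # sample: start tracking the flow


# Hypothetical packet trace: (destination prefix, packet size in bytes).
trace = [("10.0.0.0/8", 1500)] * 50 + [("192.0.2.0/24", 64)] * 5

counter = PrefixCounter()
sampler = SampleAndHold(p=1.0)   # p=1.0 admits every flow (deterministic)
for prefix, size in trace:
    counter.observe(prefix, size)
    sampler.observe(prefix, size)

print(counter.packets.most_common(1))
```

   With p below 1.0, small flows are likely to be missed while large
   flows are likely to be admitted early, which keeps the table of
   tracked flows small at the cost of underestimating byte counts.
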

   Unfortunately, the resulting statistical data is rarely used to
   directly improve the routing and/or forwarding decisions of network
   nodes (referring to the active self-adaptive closed control loop in
   Section 4.1). However, it is clear that network operation could
   benefit from taking these statistics automatically into account to
   allow for inherent load balancing, prioritized traffic flow
   switch-over, network load maximization, etc. To a lesser extent
   (since the routing system is deterministically adaptive to
   topological and/or policy changes), this observation also applies to
   routing information exchanges.

   Not only the statistics of network traffic are valuable but also the
   behavioral aspects of the network itself possibly contain usable
   information for increasing the performance of the network.
   Statistics about node or link failures can help network recovery
   mechanisms to fine tune their operation based on the specific
   statistical context of the running network. Convergence behavior of
   routing protocols in the specific running context can be monitored
   so as to reduce the duration of transient loops. In brief, the
   specific running conditions of communication networks may hide
   (statistical) information that is currently (largely) unused by
   Internet protocols, yet offers an opportunity to better analyze the
   behavior of the network depending on the context it is running in.

2.2.  Availability of processing capacity

   The possibility of maintaining network statistics is not only
   dependent on the network conditions and environment themselves, but
   also on the physical feasibility of monitoring and storing them over
   longer periods.

   Supported by Moore's law, we observe that processing power has been
   increasing over the last years, either through higher CPU clock
   frequencies or through the combination of multiple CPUs on one chip.
   In combination with the strong increase in line card speeds (up to
   100 Gbps), the possibility of capturing useful network statistics in
   the background seems within reach.


3.  The learning process

   Many research fields study the concept of learning from various
   viewpoints. In the context of LCCNs, learning algorithms correspond to
   the (broad) class of algorithms that discover the relationship
   between system variables (i.e. input, output and hidden variables)
   from data samples of its environment (obtained by means of
   measurement/monitoring).  More formally, the learning process
   consists of the following steps (see Figure 1).


                             ,--.
                            +    +
                            |`--'|
                            |KIB |<------------------+
                            +    +                   |
                             `--'                    |
                               |                     |
                               v                     |
                        +----------------+           |
               ,--.     |    Learner     |        +------+
   E          +    +    |                |       /        \
   v          |`--'|    | +------------+ |      / Hypothe- \
   e -------->|    |------> Learning   | |----->\  sis h   /
   n      |   +    +    | |  algorithm | |       \        /
   t      |    `--'     | +------------+ |        +------+
          |  Training   +----------------+          | ^
          |  data set                               | |
          |                    +--------------------+ |
          |                    v                      +
          |    ,--.     +----------------+           /  \
          |   +    +    |   Performer    |          /    \
          |   |`--'|    | +------------+ |         / Test \     Target
          +-->|    |------> Learned    | |-------->\      /---> function
              +    +    | |  hypothesis| |          \    /
               `--'     | +------------+ |           \  /
               Test     +----------------+            +
             data set                                 ^
                |                                     |
                +-------------------------------------+

                                 Figure 1

   o  Step 0: Choose training and test data sets associated with a given
      (sequence of) event(s) observed in the system's running
      environment.

   o  Step 1: Training (learner): learn a hypothesis h (model), i.e. a
      function of the input (training data set) that best approximates
      the output y (symbolic output = classification, numeric output =
      regression). Knowledge: use prior "knowledge" stored in the
      Knowledge Information Base (KIB) to learn h.

   o  Step 2: Testing (performer): evaluate learned model using test
      data set.
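
   Steps 0 to 2 can be sketched in a few lines of Python. The sketch
   below is only one possible reading of Figure 1: the choice of a
   least-squares linear model as hypothesis h, the dictionary used as
   KIB, the sample values, and the rule that h is written back to the
   KIB only when it passes the test are all assumptions made for
   illustration.

```python
def learn(training_set, kib):
    """Learner: fit h(x) = a*x + b by least squares on (x, y) samples.

    Falls back to the prior hypothesis stored in the KIB when the
    training set cannot determine a slope.
    """
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n if n else 0.0
    mean_y = sum(y for _, y in training_set) / n if n else 0.0
    var_x = sum((x - mean_x) ** 2 for x, _ in training_set)
    if n < 2 or var_x == 0.0:
        return kib["prior"]
    cov = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    a = cov / var_x
    return (a, mean_y - a * mean_x)


def perform(h, test_set):
    """Performer: mean squared error of hypothesis h on the test set."""
    a, b = h
    return sum((a * x + b - y) ** 2 for x, y in test_set) / len(test_set)


# Hypothetical samples: (offered load in Mbit/s, observed delay in ms).
samples = [(10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0), (60, 7.0)]

# Step 0: choose training and test data sets.
training_set, test_set = samples[:4], samples[4:]

# Step 1: training, using prior knowledge from the KIB.
kib = {"prior": (0.0, 0.0)}
h = learn(training_set, kib)

# Step 2: testing; keep h as new prior knowledge only if it passes.
error = perform(h, test_set)
if error < 0.01:
    kib["prior"] = h
```

   Because the numeric output y is approximated, this is an instance
   of regression; a classifier would fill the same learner and
   performer roles with a symbolic output.
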


4.  Architectural implications

   The control of dynamic systems such as communication networks, and
   routers in particular, can be explained as an iterative cycle
   referred to as the control loop.  The coming sections explain how
   the control loop of existing communication networks and routers
   differs from the control loop of LCCNs.

4.1.  From a pre-defined open-loop control towards a self-adaptive
      closed-loop control

   The configuration and operation of existing communication networks
   typically consist of a set of components and algorithms acting in a
   relatively small space of states, transitions and optimization steps.
   Take routers as an example: they distribute topology and/or distance
   information, from which they compute (e.g. shortest) routing paths.
   Using this information, they derive the entries that are looked up to
   forward packets based on the destination address of incoming
   packets.  When a topological or distance change occurs, routing
   updates are disseminated in a timely manner in the network such that
   each router achieves a coherent full view of the new network topology
   and/or distances and can re-compute routing paths taking into account
   this new state of the network.  While these procedures might seem
   effective at first sight, they are mostly pre-determined and
   inflexible with respect to the environment they are running in.
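
   The pre-determined procedure described above can be sketched as
   follows, assuming a link-state view of the topology and Dijkstra's
   shortest-path algorithm. The function name, the four-node topology
   and the link costs are hypothetical. Note that nothing in the
   computation depends on traffic or failure statistics.

```python
import heapq


def shortest_path_next_hops(topology, source):
    """Return {destination: (distance, next_hop)} using Dijkstra's algorithm.

    topology maps each node to {neighbor: link_cost}.
    """
    table = {}
    # Heap entries: (distance, node, first hop used to reach the node).
    heap = [(0, source, None)]
    while heap:
        dist, node, first_hop = heapq.heappop(heap)
        if node in table:
            continue                  # already settled with a shorter path
        table[node] = (dist, first_hop)
        for neighbor, cost in topology[node].items():
            if neighbor not in table:
                hop = neighbor if node == source else first_hop
                heapq.heappush(heap, (dist + cost, neighbor, hop))
    del table[source]
    return table


# Hypothetical topology as seen by router "A".
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
fib = shortest_path_next_hops(topology, "A")

# Topological change: link B-C fails, so routing paths are re-computed.
del topology["B"]["C"]
del topology["C"]["B"]
fib_after = shortest_path_next_hops(topology, "A")
```

   Before the failure all destinations are reached via B; afterwards C
   and D are reached via C. The reaction is correct but purely
   reactive: the router learns nothing from how often the link fails.
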

   Indeed, routers are agnostic to traffic characteristics and to
   statistics of network failures. This situation occurs because these
   techniques have been developed in the early days of packet
   communication networks. At that time, computational and memory
   resources were scarce, and the resulting techniques needed to act
   sparingly with the available resources. Moreover, most of these
   techniques aim to automate manual procedures used to configure or
   operate communication networks. As such, routers forward packets
   based on their destination address by applying pre-determined
   decision rules and execution procedures.

   While many engineering disciplines, such as the automotive or bio-
   industry, have adopted learning techniques to improve the performance
   of their operational control loops, in computer networking, their
   application has been restricted mainly to passive applications
   leading to open-loop control procedures. Examples of such
   applications are: time series models to analyze and predict network
   traffic data, anomaly detection techniques to check networks for
   strange events, or statistical models which try to detect Shared Risk
   Link Groups (SRLG). Most of the applications of learning techniques
   are used as interesting side information in the context of network
   operation. They help network managers to understand and predict the
   behavior of their network; however few existing network operation
   models include this learning capability into their direct control
   loop.

   In this context, the overall objective is to bring the application of
   data mining and learning techniques one step further: towards the
   active integration of these techniques into the operational and
   control processes of communication networks.  For instance, we could
   augment the above control paradigm with a machine learning component
   enabling the system and network to learn about their own behavior and
   environment over time, to detect and analyze problems, adapt their
   decision, and tune their execution using output of models in order to
   increase their functionality and performance. Systems with such an
   adaptive closed-loop control have network elements autonomously
   interrelated and controlled, dynamically adapting to changing
   environments, and learning desired behavior. These fully distributed
   and technology-independent systems allow: i) self-configuration and
   self-organization, ii) self-protection and self-healing, and iii)
   self-optimization. The objective is to improve the Internet control/
   routing and forwarding process by enabling, automating, and
   distributing the decision making processes involved in execution of
   their operation.

                  +-----------+            +-----------+      
    system  ===>  |  analyze  |----------->+  decide   |  <=== rules
    knowledge     +-----------+            +-----------+
                        ^                        |
                        |                        v
                  +-----+-----+            +-----------+
           self-  |  detect   |<-----------+  execute  |  self-
      monitoring  +-----------+            +-----------+  configuration
                        ^                        |
                        |                        v
                     +------------------------------+
                     |      Controlled Element      |
                     +------------------------------+

                                 Figure 2

   Using a more advanced control loop, the network routing mechanism
   could learn from network traffic, failure patterns and other
   context-related data observed in the network, and adapt its routing
   machinery to optimize it in this context. The resulting self-adaptive
   closed-loop control is a four-step cyclic process (see Figure 2)
   consisting of: i) a detection phase, in which data is monitored
   (e.g., network traffic), ii) an analysis or learning phase, in which
   the data obtained during the detection phase is analyzed and models
   are learned from it (e.g., traffic models for prediction), iii) a
   decision phase, in which rules/decisions are inferred from the
   performed analysis or learned model such that the learned model can
   influence the operation of the network, and iv) an execution phase,
   in which these decisions are applied to the controlled element.
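
   The four phases can be sketched as a small Python loop. This is an
   illustrative sketch only: the two-link controlled element, the use
   of an exponentially weighted moving average (EWMA) as the learned
   traffic model, the threshold rule and all names and numbers are
   assumptions, not part of a specified LCCN design.

```python
class ControlledElement:
    """Toy router with two parallel links towards one destination."""

    def __init__(self):
        self.capacity = {"link1": 100.0, "link2": 100.0}   # Mbit/s
        self.share = {"link1": 1.0, "link2": 0.0}          # traffic split

    def load(self, demand):
        return {link: demand * s for link, s in self.share.items()}


class ClosedLoop:
    def __init__(self, element, alpha=0.5, threshold=0.8):
        self.element = element
        self.alpha = alpha            # EWMA smoothing factor
        self.threshold = threshold    # utilization that triggers a change
        self.predicted = None         # learned model: smoothed demand

    def detect(self, demand):
        """i) Monitor: here the offered demand is observed directly."""
        return demand

    def analyze(self, observed):
        """ii) Learn: update an EWMA model of the demand."""
        if self.predicted is None:
            self.predicted = observed
        else:
            self.predicted = (self.alpha * observed
                              + (1 - self.alpha) * self.predicted)
        return self.predicted

    def decide(self, predicted):
        """iii) Infer a rule: spread traffic if link1 would run too hot."""
        if predicted / self.element.capacity["link1"] > self.threshold:
            return {"link1": 0.5, "link2": 0.5}
        return {"link1": 1.0, "link2": 0.0}

    def execute(self, share):
        """iv) Apply the decision to the controlled element."""
        self.element.share = share

    def step(self, demand):
        self.execute(self.decide(self.analyze(self.detect(demand))))


router = ControlledElement()
loop = ClosedLoop(router)
for demand in [40.0, 60.0, 120.0, 140.0]:   # hypothetical demand in Mbit/s
    loop.step(demand)
```

   The output of the execution phase changes what the detection phase
   observes next, which is what closes the loop.
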

4.2.  The integration of learning capability

   While it is premature (and part of the research work) to detail the
   implications on the Internet architecture, the design of a control
   system incorporating learning capability would benefit from the
   following design principles.

   o  Adaptability: be modular instead of relying on a unified and
      ubiquitous approach, in order to ensure gradual development (e.g.
      access vs core router)

   o  Segmentability: rely on a relatively local view rather than a
      global network view, in order to ensure scalability, robustness,
      and resiliency

   o  Sizeability: inherit the distributed properties and capabilities
      of the routing system (e.g. intra- vs inter-domain) instead of
      relying on a uniform and ubiquitous plane construction, in order
      to ensure organic deployment

   Taking these principles into account, the resulting architecture
   should specify: i) expected behavior of the self-adaptive closed-loop
   process, ii) its components, and iii) the interfaces with existing
   routers' components and between learning-capable routers of a network
   (both intra- and inter-domain). The resulting closed-loop adaptive
   control includes a learning component that is either an upfront step
   or an online process, a feedback phase, and interactions with router/
   network control.

         Today                  Step 1                    Step 2
    +--------------+      +----------------+       +------------------+
    |              |      | +------------+ |       | +--------------+ |
    | +----------+ |      | |  Learning  | |       | |   Routing    | |
    | | Routing  | |      | +------------+ |       | |  + learning  | |
    | +----------+ |      |  weak coupling |       | +--------------+ |
    |              | ==>  | +------------+ |  ==>  |    integrated    |
    |              |      | |   Routing  | |       |  strong coupling |
    | +----------+ |      | +------------+ |       | +--------------+ |
    | |Forwarding| |      | +------------+ |       | |  Forwarding  | |
    | +----------+ |      | | Forwarding | |       | |  + learning  | |
    |              |      | +------------+ |       | +--------------+ |
    +--------------+      +----------------+       +------------------+

                                 Figure 3


   Including learning capabilities into current Internet router
   architectures can follow a phased approach. Internet routers
   typically consist of two functional components: i) a forwarding
   component which takes care of processing and forwarding packets
   according to pre-configured forwarding tables, and ii) a routing
   component which takes care of distributing topology/distance
   information, computing (shortest) routing paths using this
   information, and storing resulting entries into routing tables.
   Forwarding table entries are subsequently derived from routing table
   entries. As a first integration step, a new functional component
   comprising learning capability could be included. The new component
   would then be weakly coupled to the existing forwarding and routing
   components. This implies that the routing and/or forwarding
   component can be enhanced by functionalities of the learning
   component. These functionalities could be called via pre-defined
   interfaces between the components. While this is an overlaid but
   modular build-up of a router, integration of learning capability can
   go one step further. Indeed, in a next phase, instead of a separate
   learning component, the learning functionality could be tightly
   integrated into the routing and forwarding components themselves.
   This implies that the routing and forwarding processes themselves
   comprise a learning cycle (a self-adaptive closed-loop control). It
   is clear that both the phasing and the detailing of the architecture
   are important challenges in the design of LCCNs.
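
   The weak coupling of Step 1 in Figure 3 can be sketched as an
   optional component behind a pre-defined interface. The interface
   below (link_cost_hint), the failure-count learner and the link costs
   are hypothetical and only illustrate the idea that the routing
   component keeps working unchanged when no learning component is
   attached.

```python
from typing import Dict, Optional, Protocol


class LearningComponent(Protocol):
    """Hypothetical pre-defined interface offered to the routing component."""

    def link_cost_hint(self, link: str, configured_cost: int) -> int:
        ...


class FailureAwareLearner:
    """Learns per-link failure counts and penalizes unreliable links."""

    def __init__(self) -> None:
        self.failures: Dict[str, int] = {}

    def record_failure(self, link: str) -> None:
        self.failures[link] = self.failures.get(link, 0) + 1

    def link_cost_hint(self, link: str, configured_cost: int) -> int:
        return configured_cost + self.failures.get(link, 0)


class RoutingComponent:
    def __init__(self, costs: Dict[str, int],
                 learner: Optional[LearningComponent] = None) -> None:
        self.costs = costs
        self.learner = learner        # weak coupling: may be absent

    def effective_cost(self, link: str) -> int:
        cost = self.costs[link]
        if self.learner is not None:
            cost = self.learner.link_cost_hint(link, cost)
        return cost

    def best_link(self) -> str:
        # sorted() makes the choice deterministic when costs are equal.
        return min(sorted(self.costs), key=self.effective_cost)


costs = {"link1": 10, "link2": 11}      # hypothetical configured costs
legacy = RoutingComponent(costs)        # today's router, no learning

learner = FailureAwareLearner()
for _ in range(3):
    learner.record_failure("link1")     # link1 failed three times
enhanced = RoutingComponent(costs, learner)
```

   In the tightly integrated Step 2, the failure statistics would
   instead be part of the path computation itself rather than a hint
   obtained through an interface.
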


5.  Applicability

5.1.  Functional domains

   The incorporation of a learning component within the router
   architecture aims to i) enhance Internet functionality in order to
   cope with known operational challenges such as manageability and
   diagnosability, ii) address new challenges such as security and
   accountability, and iii) improve its performance (in terms of e.g.
   scalability and availability) by adapting forwarding and routing
   system decisions.  In this context of network quality, we can think
   of the automated inclusion of network traffic knowledge into the
   configuration of routes and resulting forwarding tables.

5.2.  Scope with respect to the hourglass model

   Even if learning paradigms can be applied at all levels of the
   hourglass model, LCCN-related research focuses on the (largest) lower
   half of the hourglass model ("everything over IP, and IP over
   everything").  As depicted in Figure 4, the goal of LCCN research is
   to apply learning capabilities from the physical layer up to the
   transport layer (thus also including the datalink and network
   layers).

   Whereas learning capability has typically been used at higher layers
   already, for example by banking applications or large-scale websites
   such as Amazon or Google, the real networking machinery that is
   running below still relies on low-information processes with very
   limited learning capabilities.  The incorporation of a learning
   component within wired and wireless communication networks aims to
   improve both their operation and performance from the physical
   layer up to the TCP/IP layer.

               +---------------------+
                \  email, WWW,      /
                 \    TV, ...      /
                  \---------------/
                   \SMTP,HTTP,RTP/
            ---     \-----------/     ---
             ^       \ TCP,    /       ^
             |        \   UDP /        |
             |         \-----/         |
      LCCN   |         / IP  \         |
      scope  |        /-------\        |
             |       /Ethernet,\       |
             |      /  PPP,...  \      |
             |     /-------------\     |
             v    / CSMA, Sonet   \    v
            ---  /-----------------\  ---
                /copper,fiber,radio \
               +---------------------+


                                 Figure 4


6.  Research directions and objectives

6.1.  Relation to existing research domains

   Learning opportunities in communication networks have characteristics
   that are typically well suited to research techniques borrowed from
   (machine) learning, robotics, AI, computational biology, etc.

   o  Difficult to explicitly characterize: events cannot be well
      characterized even when examples are available (inherent
      complexity in characterizing an event)

   o  Correlation: hidden correlations and trends between events within
      large amounts of associated data

   o  Dynamicity: changing conditions over time (in particular, for
      routing system but also variability of traffic, user expectations
      and behaviors)

   o  Quantity: amount of available data is too large for handling by
      manual intervention

   o  Evolutive: new events are constantly detected/discovered

6.2.  Experimental research objectives

   Experimental research is a primary goal of the activities to be
   conducted.  The following objectives would be targeted:

   o  The production of various studies is stimulated and should enable
      evaluation of performance and functional improvement resulting
      from the exploitation of various learning paradigms.  A common
      understanding of these paradigms and their associated capabilities
      would complement this first step.

   o  As different models can be considered for the distribution of the
      learning processes (taking into account the various objectives
      but also constraints resulting from network partition),
      determining which model best fits Internet evolution is a
      specific target of this research activity.

   o  Iterative cycles of experimentation shall make it possible to
      determine the suitability of the resulting architecture as well as
      the practical feasibility, applicability and deployability of the
      concept on a large scale.  Documentation of appropriate use
      cases/scenarios would complement this work item.


7.  IANA Considerations

   This memo includes no request to IANA.


8.  Security Considerations

   Besides the research objectives detailed above, security mechanisms
   shall be considered both for the "communication channels" between
   learning components and for the "learning components" themselves.
   These comprise, among others, message authentication as well as
   means to prevent man-in-the-middle and DDoS attacks.


9.  Conclusions

   Current communication networks fail to use network-related statistics
   which could be valuable to improve their performance.  In addition,
   current networks fail to provide solutions to challenging issues,
   because they become too complex to operate and manage by manual/open
   loop procedures. A learning-capable communication network (LCCN)
   includes a learning component which learns from the network
   environment statistics and adapts and optimizes its behavior
   accordingly. This gives new possibilities to improve network
   efficiency in several domains including network recoverability,
   accountability, security, scalability, and so on. The challenges
   (and next steps) of LCCNs lie in: i) developing a self-adaptive
   closed-loop control system relying on learning capability, ii)
   building and applying it to various network mechanisms and iii)
   verifying the resulting prototypes in experimental environments.


10.  Acknowledgements

   This work is supported by the European Commission (EC) Seventh
   Framework Programme (FP7) ECODE project (Grant No.223936).


11.  Informative references

   [AI-modern]
              Russell, S., "Artificial Intelligence: A Modern Approach",
              2003.

   [Estan04]  Estan, C., "Building a better NetFlow", October 2004.

   [I-D.ietf-idr-bgp-issues]
              Lange, A., "Issues in Revising BGP-4 (RFC1771 to
              RFC4271)", draft-ietf-idr-bgp-issues-03 (work in
              progress), August 2010.

   [PRML]     Bishop, C., "Pattern Recognition and Machine Learning",
              October 2003.


Authors' Addresses

   Wouter Tavernier 
   Ghent University - IBBT
   Gaston Crommenlaan 8 bus 201
   Gent,   9050
   Belgium

   Phone: +32(0)9 331 49 81
   Email: wouter.tavernier@intec.ugent.be


   Dimitri Papadimitriou
   Alcatel-Lucent Bell
   Copernicuslaan 50
   Antwerpen,   2018
   Belgium

   Phone:
   Email: dimitri.papadimitriou@alcatel-lucent.com


   Didier Colle
   Ghent University - IBBT
   Gaston Crommenlaan 8 bus 201
   Gent,   9050
   Belgium

   Phone: +32(0)9 331 49 70
   Email: didier.colle@intec.ugent.be