TSVWG Y. Zhuang
Internet-Draft B. Zhang
Intended status: Informational H. Pan
Expires: April 20, 2020 Huawei Technologies Co., Ltd.
October 18, 2019
Artificial Intelligence (AI) based ECN adaptive reconfiguration for
datacenter networks
draft-zhuang-tsvwg-ai-ecn-for-dcn-00
Abstract
This document describes an artificial intelligence (AI) based approach
to the adaptive reconfiguration of Explicit Congestion Notification
(ECN) parameters in datacenter networks.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 20, 2020.
Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Zhuang, et al. Expires April 20, 2020 [Page 1]
Internet-Draft AI ECN adptive reconfiguration October 2019
Table of Contents
1. Introduction
1.1. Background
1.2. Intent
1.3. Terminology
2. Architecture of the AI ECN datacenter networks
3. Scene-based ECN adaptive reconfiguration with AI
3.1. Scene Training
3.2. Scene Identification and ECN Adaptive Reconfiguration
4. Data collection and AI ECN adaptive reconfiguration
4.1. Data collection
4.2. ECN adaptive Reconfiguration
5. Security Considerations
6. Manageability Consideration
7. IANA Considerations
8. References
8.1. Normative References
8.2. Informative References
Acknowledgements
Authors' Addresses
1. Introduction
1.1. Background
As defined in [RFC3168], Explicit Congestion Notification (ECN) allows
congestion to be signaled to IP endpoints before packets are dropped.
Application latency is reduced as a result, because fewer dropped
packets need to be retransmitted. MPLS also supports ECN, as
specified in RFC 5129. For tunneling, [RFC6040] defines how the ECN
field should be constructed for IP-in-IP tunnels.
Meanwhile, upper-layer transport protocols have been specified to
support ECN, such as TCP in [RFC3168], DCCP in [RFC4341][RFC4342],
and RTP over UDP in [RFC6679].
With ECN marking, active queue management (AQM) can indicate
congestion on a device without packet loss, rather than by dropping
packets, which triggers retransmission and increases latency. By
using AQM, a network device can signal congestion to common
congestion-controlled transports, which keeps the queue length in the
buffer under control and reduces traffic latency. Random Early
Detection (RED), described in [RFC2309], is one of the AQM algorithms
recommended for implementation in routers.
As stated in [RFC7567], RED can be an effective algorithm when its
parameters are chosen properly. However, dynamically predicting the
right set of parameters (minimum threshold and maximum threshold) is
difficult. As a result, its present use in the Internet is limited.
Other AQM algorithms have also been developed, but finding proper
parameters for a given mix of application traffic remains difficult,
and poor choices degrade network performance.
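
To illustrate why the two thresholds matter, the following is a
minimal sketch of a RED-style ECN marking decision. It is a
simplification and not part of this document's proposal: it omits the
averaging of the queue length and the packet-count adjustment of full
RED, and it assumes that every packet is marked once the averaged
queue reaches the maximum threshold. The function name, the units
(KB), and the example values are hypothetical.

```python
def red_mark_probability(avg_queue_kb, min_th_kb, max_th_kb, max_p):
    """Return the ECN marking probability for an averaged queue length.

    Simplified RED: no marking below min_th_kb, a linear ramp up to max_p
    between the thresholds, and marking of every packet at or above
    max_th_kb (an assumption of this sketch).
    """
    if not 0 <= min_th_kb < max_th_kb:
        raise ValueError("thresholds must satisfy 0 <= min < max")
    if not 0.0 <= max_p <= 1.0:
        raise ValueError("max_p must be a probability")
    if avg_queue_kb < min_th_kb:
        return 0.0
    if avg_queue_kb >= max_th_kb:
        return 1.0
    ramp = (avg_queue_kb - min_th_kb) / (max_th_kb - min_th_kb)
    return max_p * ramp


# Hypothetical settings: the same queue length is treated very differently.
for min_th, max_th in ((100, 400), (200, 800)):
    p = red_mark_probability(250, min_th, max_th, max_p=0.5)
    print(f"min={min_th} KB max={max_th} KB -> p={p:.3f}")
```

With an averaged queue of 250 KB, the first threshold pair marks far
more aggressively than the second, which is why a single static pair
is hard to choose for changing traffic.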
In data center networks, traffic patterns change as applications such
as storage and high-performance computing are deployed and as their
traffic varies, which makes the network more dynamic. At the same
time, such applications have stricter requirements for high
throughput and ultra-low latency. In this environment, finding a
single set of static ECN configurations that suits all traffic at all
times is challenging.
This document therefore describes a way to adaptively reconfigure ECN
in a running data center network by using AI technologies.
1.2. Intent
Our intent is to find proper ECN parameters through adaptive
reconfiguration, using artificial intelligence technologies to
achieve self-tuning in a running data center network, so that the
network adapts to changes in traffic and resources and network
performance improves.
We also offer this as a starting point for using AI technologies to
find adaptive parameters for other algorithms and network
reconfigurations. We do not change the way ECN works as defined in
[RFC3168]. This document only provides a way to achieve adaptive ECN
reconfiguration in a dynamic data center network environment by using
AI technologies.
1.3. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2. Architecture of the AI ECN datacenter networks
The following figure shows a simple two-layer data center network
architecture with an analyzer that performs AI-based adaptive ECN
reconfiguration as network traffic changes.
+------------------------------------------------------+
| Analyzer |
+-.-----.-------------.-------.--------------.-----.---+
. . . . . .
. . . . . .
. +---.-----------+ . . +-----------.---+ .
. | Spine | . . | Spine | .
. ++--+--+----+---+ . . +-+-+-+----+----+ .
. | | +----------.-------.---------------+ .
. | +-------------.-------.-+ | | | | | .
. | | +--.-------.--------+ | | .
. | +-------------.-------.------+ | | .
+---+--+-+ ++--+--.-+ +.-+--+--+ ++-+----.+
| | | | | | | |
| Leaf | | Leaf | | Leaf | | Leaf |
++------++ ++------++ ++------++ ++------++
| | | | | | | |
| | | | | | | |
+++ +++ +++ +++ +++ +++ +++ +++
|S| ...|S| |S| ...|S| |S| ...|S| |S| ...|S|
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+
........ information collecting path
-------- data path
Figure 1. The architecture of a 2-layer data center network
The analyzer can be integrated with a spine device or deployed as an
independent device; the choice is left to the implementation. In
this design, it is responsible for collecting device information and
periodically inferring proper parameters for adaptive ECN
reconfiguration.
3. Scene-based ECN adaptive reconfiguration with AI
The idea of AI ECN in this document is to identify the "scene" of the
current network at a given time, based on information collected over
a period. The identified scene (which can also be considered a
network traffic pattern) is one of the scenes collected and learned
during the training process from datacenter networks running traffic
of various applications. The ECN settings of these scenes are
decided based on human experience. The ECN parameters of the current
network can then be tuned to the settings of the identified scene.
This adaptive reconfiguration process runs periodically to
accommodate changes in the running network environment caused by
traffic changes.
3.1. Scene Training
Scene training is the first process in the procedure. It consists of
two steps. First, construct typical scenes and generate a learning
model that identifies these scenes from a set of network performance
indicators. Second, provide proper ECN settings for these typical
scenes based on human experience.
In the first step, the network operator may need to select, based on
experience, some typical applications and combinations of traffic to
be used as the typical training scenes. For these typical scenes, a
learning algorithm (for example, a neural network) is run to learn
the characteristics of the scenes from periodically collected network
performance indicators.
The selected network performance indicators can be, for example, a
device's port bandwidth and queue size, which may be related to the
applications and traffic in the network.
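
The first training step can be sketched as follows. This document
names a neural network only as an example; to keep the sketch short
and self-contained, a much simpler nearest-centroid classifier is used
here as a stand-in. The two indicators (port utilization and queue
occupancy, both as fractions), the scene labels, and all sample values
are hypothetical.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """Learn one centroid per scene from (indicator_vector, scene) pairs."""
    grouped = defaultdict(list)
    for indicators, scene in samples:
        grouped[scene].append(indicators)
    # The centroid of a scene is the per-indicator mean of its samples.
    return {
        scene: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for scene, vectors in grouped.items()
    }


def identify_scene(centroids, indicators):
    """Return the scene whose centroid is closest to the indicators."""
    return min(centroids, key=lambda scene: dist(centroids[scene], indicators))


# Hypothetical training data: (port utilization, queue occupancy) -> scene.
training_samples = [
    ((0.90, 0.60), "storage"),
    ((0.80, 0.70), "storage"),
    ((0.30, 0.10), "hpc"),
    ((0.40, 0.20), "hpc"),
]
model = train_centroids(training_samples)
print(identify_scene(model, (0.75, 0.50)))
```

A real deployment would likely use more indicators, normalize them,
and validate the model on held-out scenes; those choices are outside
the scope of this sketch.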
In the second step, the experience of network administrators can be
used to provide proper ECN configurations for these typical scenes.
AI technologies can also be used to enrich the scene set based on
this human experience; how to do so is left to the implementation.
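
The outcome of the second step can be thought of as a table that maps
each trained scene to operator-chosen ECN settings. The sketch below
shows one possible shape for such a table. The field names, the scene
labels, the fallback entry, and every number are hypothetical
placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcnSettings:
    """ECN marking parameters chosen by an administrator for one scene."""
    min_threshold_kb: int
    max_threshold_kb: int
    max_mark_probability: float

    def __post_init__(self):
        # Reject settings that no RED-style marker could apply.
        if not 0 <= self.min_threshold_kb < self.max_threshold_kb:
            raise ValueError("thresholds must satisfy 0 <= min < max")
        if not 0.0 < self.max_mark_probability <= 1.0:
            raise ValueError("max_mark_probability must be in (0, 1]")


# Hypothetical scene table filled in from human experience.
SCENE_ECN_SETTINGS = {
    "storage": EcnSettings(200, 800, 0.10),
    "hpc": EcnSettings(50, 200, 0.50),
}

# Conservative fallback for a scene without an entry (an assumption).
DEFAULT_SETTINGS = EcnSettings(100, 400, 0.20)


def settings_for(scene):
    """Look up the ECN settings for a scene, falling back to the default."""
    return SCENE_ECN_SETTINGS.get(scene, DEFAULT_SETTINGS)
```

Validating each entry when the table is built keeps an invalid
threshold pair from ever reaching a device.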
3.2. Scene Identification and ECN Adaptive Reconfiguration
In the operational network, the analyzer periodically collects the
selected network performance indicators from network nodes. This
information is then fed into the pre-trained model, which outputs the
identified scene. The ECN settings of the network devices are then
reconfigured to the parameters of the identified scene. This process
repeats periodically.
The length of the adaptation period can be decided according to
experience, or it can be a result of the training process defined in
Section 3.1.
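
The periodic procedure above can be sketched as a small control loop.
The collection, identification, and configuration steps are passed in
as callables so that the loop itself stays independent of any
protocol or model. All names and values are hypothetical, and
skipping the reconfiguration when the scene is unchanged is a design
choice of this sketch rather than a requirement of this document.

```python
import time


def run_cycle(collect, identify, settings_by_scene, apply, last_scene=None):
    """Run one adaptation cycle and return the scene now in effect."""
    indicators = collect()
    scene = identify(indicators)
    if scene == last_scene or scene not in settings_by_scene:
        return last_scene  # nothing to change, or no settings known
    apply(settings_by_scene[scene])
    return scene


def run_periodically(cycle, period_s, iterations, sleep=time.sleep):
    """Call cycle(last_scene) repeatedly, waiting period_s between runs."""
    scene = None
    for _ in range(iterations):
        scene = cycle(scene)
        sleep(period_s)
    return scene


# Stub wiring with hypothetical values, for illustration only.
readings = iter([(0.9, 0.6), (0.9, 0.7), (0.2, 0.1)])
applied = []
final = run_periodically(
    lambda last: run_cycle(
        collect=lambda: next(readings),
        identify=lambda x: "storage" if x[0] > 0.5 else "hpc",
        settings_by_scene={"storage": (200, 800), "hpc": (50, 200)},
        apply=applied.append,
        last_scene=last,
    ),
    period_s=0,
    iterations=3,
    sleep=lambda _seconds: None,
)
```

In this run the settings are applied twice: once when the first scene
is identified and once when the traffic pattern changes.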
4. Data collection and AI ECN adaptive reconfiguration
4.1. Data collection
In both the training and the adaptive reconfiguration processes, the
analyzer needs to collect information about the network, i.e., a set
of network performance indicators.
Data collection can be achieved by gRPC, YANG-Push, or other
protocols.
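
Independent of the transport used, the raw samples have to be reduced
to the indicators that the model consumes. The sketch below assumes
that each sample carries a cumulative transmit byte counter and an
instantaneous queue depth for one port; this record layout and the
two derived indicators are hypothetical, and counter wrap-around is
not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSample:
    """One telemetry sample for a port (hypothetical record layout)."""
    timestamp_s: float
    tx_bytes: int           # cumulative transmit counter
    queue_depth_bytes: int  # instantaneous queue depth


def summarize(samples, link_bps, buffer_bytes):
    """Reduce samples to (port utilization, mean queue occupancy)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples, key=lambda s: s.timestamp_s)
    elapsed_s = ordered[-1].timestamp_s - ordered[0].timestamp_s
    if elapsed_s <= 0:
        raise ValueError("samples must span a positive interval")
    # Utilization comes from the counter delta over the whole window.
    sent_bits = 8 * (ordered[-1].tx_bytes - ordered[0].tx_bytes)
    utilization = sent_bits / (link_bps * elapsed_s)
    mean_queue = sum(s.queue_depth_bytes for s in ordered) / len(ordered)
    return utilization, mean_queue / buffer_bytes


# Hypothetical window: a 10 Gbit/s port with a 1 MB buffer.
window = [
    PortSample(0.0, 0, 0),
    PortSample(1.0, 625_000_000, 200_000),
    PortSample(2.0, 1_250_000_000, 400_000),
]
utilization, occupancy = summarize(window, 10_000_000_000, 1_000_000)
```

Both results are fractions between 0 and 1, which keeps indicators
from ports with different speeds and buffer sizes comparable.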
4.2. ECN adaptive Reconfiguration
The adaptive reconfiguration of ECN in a running network environment
can be achieved by network configuration protocols such as NETCONF.
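
As an illustration, the sketch below builds the <config> payload that
a NETCONF <edit-config> operation could carry. Only the NETCONF base
namespace is real. This document does not define a data model for
ECN, so the module namespace, the element names, and the example
values are hypothetical; an actual device would require its own YANG
model.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
ECN_NS = "urn:example:ecn"  # hypothetical module namespace


def build_ecn_config(interface, queue, min_kb, max_kb, mark_percent):
    """Return a <config> payload with ECN thresholds for one queue."""
    config = ET.Element(f"{{{NETCONF_NS}}}config")
    ecn = ET.SubElement(config, f"{{{ECN_NS}}}ecn")
    profile = ET.SubElement(ecn, f"{{{ECN_NS}}}queue-profile")
    # Element names below are illustrative, not from a published model.
    fields = (
        ("interface", interface),
        ("queue", queue),
        ("min-threshold-kb", min_kb),
        ("max-threshold-kb", max_kb),
        ("mark-probability-percent", mark_percent),
    )
    for tag, value in fields:
        ET.SubElement(profile, f"{{{ECN_NS}}}{tag}").text = str(value)
    return ET.tostring(config, encoding="unicode")


payload = build_ecn_config("eth0", 3, 200, 800, 10)
print(payload)
```

The analyzer would send one such payload per device and queue each
time the identified scene changes.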
5. Security Considerations
TBD
6. Manageability Consideration
TBD
7. IANA Considerations
This document has no IANA actions.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
8.2. Informative References
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
S., Wroclawski, J., and L. Zhang, "Recommendations on
Queue Management and Congestion Avoidance in the
Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
<https://www.rfc-editor.org/info/rfc2309>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>.
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion
Control Protocol (DCCP) Congestion Control ID 2: TCP-like
Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March
2006, <https://www.rfc-editor.org/info/rfc4341>.
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for
Datagram Congestion Control Protocol (DCCP) Congestion
Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
DOI 10.17487/RFC4342, March 2006,
<https://www.rfc-editor.org/info/rfc4342>.
[RFC5632] Griffiths, C., Livingood, J., Popkin, L., Woundy, R., and
Y. Yang, "Comcast's ISP Experiences in a Proactive Network
Provider Participation for P2P (P4P) Technical Trial",
RFC 5632, DOI 10.17487/RFC5632, September 2009,
<https://www.rfc-editor.org/info/rfc5632>.
[RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
Notification", RFC 6040, DOI 10.17487/RFC6040, November
2010, <https://www.rfc-editor.org/info/rfc6040>.
[RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
and K. Carlberg, "Explicit Congestion Notification (ECN)
for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
2012, <https://www.rfc-editor.org/info/rfc6679>.
[RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF
Recommendations Regarding Active Queue Management",
BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015,
<https://www.rfc-editor.org/info/rfc7567>.
Acknowledgements
We would like to thank the following persons for their great efforts
and contributions to the work: Huafeng Wen, Binghui Wu, Weiqin Kong,
Ke Meng, Xitong Jia, Liang Shan, Siyu Yan, Weishan Deng, Boding Wang,
Jungan Yan, Haonan Ye and Liang Zhang.
Authors' Addresses
Yan Zhuang
Huawei Technologies Co., Ltd.
Email: zhuangyan.zhuang@huawei.com
Bai Zhang
Huawei Technologies Co., Ltd.
Email: white.zhangbai@huawei.com
Haotao Pan
Huawei Technologies Co., Ltd.
Email: panhaotao@huawei.com