rtgwg                                                              Y. Li
Internet-Draft                                                L. Iannone
Intended status: Informational                       Huawei Technologies
Expires: May 4, 2021                                               J. He
                                            City University of Hong Kong
                                                                 L. Geng
                                                                  P. Liu
                                                            China Mobile
                                                                  Y. Cui
                                                     Tsinghua University
                                                        October 31, 2020

Architecture of Dynamic-Anycast in Compute First Networking
(CFN-Dyncast)
draft-li-rtgwg-cfn-dyncast-architecture-00

Abstract

Compute First Networking (CFN) Dynamic Anycast refers to in-network
edge computing, where a single service offered by a provider has
multiple instances attached to multiple edge sites. In this scenario,
flows are assigned and consistently forwarded to a specific instance
through an anycast approach based on the network status as well as
the status of the different instances.

This document describes an architecture for Dynamic Anycast (Dyncast)
in Compute First Networking (CFN). It provides an overview, a
description of the various components, and a workflow example showing
how to provide a multi-edge service that is balanced in terms of both
computing and networking resources through dynamic anycast in real
time.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

Li, et al.                 Expires May 4, 2021                  [Page 1]

Internet-Draft           CFN-dyncast Architecture           October 2020

This Internet-Draft will expire on May 4, 2021.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Definition of Terms
3. CFN-Dyncast Architecture Overview
4. Architectural Components and Interactions
   4.1. Service Identity and Bindings
   4.2. Service Notification between Instances and CFN node
   4.3. CFN Dyncast Control Plane
   4.4. Service Demand Dispatching
   4.5. CFN Dispatcher
5. Summary of the key elements of CFN Dyncast Architecture
6. Conclusion (and call for contributions)
7. Security Considerations
8. IANA Considerations
9. Informative References
Acknowledgements
Authors' Addresses

1. Introduction

The use cases and problem statement document for dynamic anycast in
Compute First Networking (CFN-Dyncast)
[I-D.geng-rtgwg-cfn-dyncast-ps-usecase] describes usage scenarios in
which an edge has to be selected dynamically from multiple edge sites
to serve an edge computing service demand, based on the computing
resources available at each site and on the network status in real
time.

Multiple edges provide service equivalency and service dynamism in
CFN. The current network architecture in edge computing provides
relatively static service dispatching, for example to the closest
edge, or to the server with the most computing resources without
considering the network status. Dynamic Anycast takes the dynamic
nature of the computing load as well as the network status as metrics
for deciding where to dispatch a flow, and at the same time maintains
flow affinity during the service life cycle.

The CFN-Dyncast architecture presents anycast-based service and
access models. The aim is to solve the problematic aspects of
existing network-layer edge computing service deployment, including
unawareness of the computing resource information of a service,
static edge selection, isolated network and computing metrics, and
slow refresh of status.

CFN-Dyncast assumes there are multiple equivalent edge instances
implementing the same single service (for example, the same service
function instantiated on several edge nodes). A single edge node has
limited computing resources attached, and different edge nodes may
have different resources available, such as CPU or GPU. Because
multiple edge nodes are interconnected and can collaborate with each
other, it is possible to balance the service load and network load in
CFN.

The computing resources available to serve a request are usually
considered the main metric for assigning a service demand to an
instance of the service. However, the status of the network, in
particular of the paths toward the instances, varies over time and
paths may become congested. The network status is therefore another
key attribute to be considered. CFN-Dyncast aims at providing a
layer 3 protocol framework able to dispatch the service demand to the
"best" edge node in terms of both computing resources and network
status, in real time and with no application- or service-specific
dependencies.
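
As a rough illustration of why both kinds of metric matter, the
following Python sketch compares "closest edge" and "least loaded
edge" with a selection that combines the two. It is only a sketch
under stated assumptions: the Candidate fields, the equal weights,
the 50 ms normalization bound and the sample values are all
hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    edge: str
    compute_load: float    # assumed scale: 0.0 (idle) to 1.0 (saturated)
    path_delay_ms: float   # assumed network metric: delay toward the edge


def combined_cost(c, w_compute=0.5, w_network=0.5, max_delay_ms=50.0):
    """Hypothetical cost: weighted sum of load and normalized delay."""
    network_cost = min(c.path_delay_ms / max_delay_ms, 1.0)
    return w_compute * c.compute_load + w_network * network_cost


def select_edge(candidates):
    """Pick the candidate with the lowest combined cost."""
    return min(candidates, key=combined_cost)


# Made-up sample values for three equivalent edges.
candidates = [
    Candidate("edge site 1", compute_load=0.95, path_delay_ms=5.0),
    Candidate("edge site 2", compute_load=0.30, path_delay_ms=12.0),
    Candidate("edge site 3", compute_load=0.10, path_delay_ms=45.0),
]

closest = min(candidates, key=lambda c: c.path_delay_ms)
least_loaded = min(candidates, key=lambda c: c.compute_load)
best = select_edge(candidates)
print(closest.edge, least_loaded.edge, best.edge)
```

With these sample values the closest edge is nearly saturated and the
least loaded edge sits behind a slow path, so the combined cost
favours a third edge that neither static policy would choose.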

This document describes a general architecture for service
notification, status update and service dispatch in CFN edge
computing.

2. Definition of Terms

CFN: Compute First Networking.

SID: Service ID, an anycast IP address that represents a service and
that clients use to access that service. The SID is independent of
which service instance serves the service demand. Usually multiple
service instances serve a single service.

BID: Binding ID, an address used to reach a service instance for a
given SID. It is usually a unicast IP address. A service can be
provided by multiple service instances with different BIDs.

CFN-Dyncast: as defined in [I-D.geng-rtgwg-cfn-dyncast-ps-usecase].

3. CFN-Dyncast Architecture Overview

Service instances can be hosted on servers, virtual machines, access
routers or gateways in an edge data center. The CFN node is the glue
of a CFN-Dyncast network: it exchanges the computing resource
information of the service instances attached to it, and it forwards
flows consistently toward those instances.

Figure 1 shows the architecture of CFN-Dyncast. CFN nodes are usually
deployed at the edges of the operator infrastructure, where clients
are connected. As such, clients can be considered to be logically
connected to CFN nodes. The purpose of a CFN node is to consistently
direct flows coming from clients to an instance of the service the
flow is supposed to go through. Service instances are instantiated at
different edge sites, where a CFN node is also running. A single
service can have a huge number of instances running on different CFN
nodes.

A "Service ID" (SID) is used to uniquely identify a service, at the
same time identifying the whole set of instances of that specific
service, no matter where those instances are running.
There can be several instances of the service running on the same CFN
node (e.g., one instance per CPU core), and instances can also run on
several different CFN nodes (e.g., one instance per PGW-U in a 5G
network). Each instance is associated with a "Binding ID" (BID)
indicating where the instance is running. Hence, there is a dynamic
binding between a SID (the service) and a set of BIDs (the instances
of the service). These bindings are enriched with information about
the network state and the available resources, so that at each new
service request (a new flow) CFN nodes can decide which instance is
the most appropriate to handle the request.

This highlights the anycast part of CFN-Dyncast, since flows are
routed toward one service end-point among a set of equivalent ones,
i.e., one-to-one-out-of-many.

When a client sends a service demand, it is delivered to the most
appropriate instance of the service attached to a CFN node. A service
demand is normally the first packet of a data flow, not necessarily
an explicit out-of-band service request. Once the CFN node has
decided which instance has to serve the flow, flow affinity must be
guaranteed, meaning that all packets belonging to the same flow have
to go through the same service instance.

(Diagram: service instances at edge sites 1 to 3; CFN node 1, CFN
node 2 and CFN node 3 interconnected by the infrastructure; a CFN
Dispatcher; clients.)

                 Figure 1: CFN-Dyncast Architecture

4. Architectural Components and Interactions

Figure 1 also shows that the local components of the architecture are
the service instance, the CFN node, the CFN dispatcher and the
client. The following subsections provide an overview of how some of
these architectural components interact. The figures accompanying the
examples do not show the interconnecting infrastructure, to avoid
cluttering them.

4.1. Service Identity and Bindings

As previously stated, the CFN-Dyncast architecture uses the Service
ID (SID) and the Binding ID (BID) to identify services and their
instances.

The Service ID (SID) is an anycast service identifier (which may or
may not be a routable IP address). It is used to access a specific
service no matter which service instance eventually handles the
client's flow. CFN nodes must know the SIDs (and their bindings) in
advance and must be able to identify which flow needs which service.
This can be achieved in different ways, for example by using a
special range or coding of anycast IP addresses as SIDs, or by using
DNS.

The Binding ID (BID) is a unicast IP address. It is usually the
interface IP address of a service instance. The mapping and binding
from a SID to a BID is dynamic and depends on the computing resources
and network state at the time the service demand is made. The CFN
node must be able to guarantee flow affinity, i.e., it must always
steer the flow toward the same instance.

Figure 2 shows an abstract example of the use of SIDs and BIDs. There
are three services, namely SID1, SID2, and SID3. In particular, SID2
has two instances on different CFN nodes (CFN node 2 and CFN node 3).
In this case the complete list of bindings (only in terms of SID and
BID, with no network or resource state) is:

o SID1:BID21

o SID2:BID22,BID32

o SID3:BID33
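
The binding list above can be pictured as a small table keyed by SID.
The following Python sketch is illustrative only: the class and
method names (Binding, BindingTable, add, remove, bids) are
hypothetical, and the metrics field is a placeholder, since this
document does not define a metric format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Binding:
    bid: str         # unicast address of the service instance
    cfn_node: str    # CFN node the instance is attached to
    metrics: dict = field(default_factory=dict)  # placeholder, undefined here


class BindingTable:
    """Hypothetical SID -> {BID: Binding} table kept by a CFN node."""

    def __init__(self):
        self._by_sid = defaultdict(dict)

    def add(self, sid, binding):
        """Register or refresh one instance of a service."""
        self._by_sid[sid][binding.bid] = binding

    def remove(self, sid, bid):
        """Withdraw one instance; forget the SID when none is left."""
        self._by_sid[sid].pop(bid, None)
        if not self._by_sid[sid]:
            del self._by_sid[sid]

    def bids(self, sid):
        """Return the BIDs currently bound to a SID, sorted."""
        return sorted(self._by_sid.get(sid, {}))


# The bindings of the example: SID2 has instances behind two CFN nodes.
table = BindingTable()
table.add("SID1", Binding("BID21", "CFN node 2"))
table.add("SID2", Binding("BID22", "CFN node 2"))
table.add("SID2", Binding("BID32", "CFN node 3"))
table.add("SID3", Binding("BID33", "CFN node 3"))
print(table.bids("SID2"))
```

Keeping the CFN node next to each BID reflects that a dispatch
decision selects an egress CFN node as well as an instance.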

                                        SID: Service ID
                                        BID: Binding ID

                                               SID1
                                            +--------+  service
                                         +--| BID21  |  instance1
                                         |  +--------+
                       +----------+      |
                +------|CFN node 2|------|     SID2
                |      +----------+      |  +--------+  service
                |                        +--| BID22  |  instance2
                |                           +--------+
                |
+------+   +----------+
|client|---|CFN node 1|                        SID2
+------+   +----------+                     +--------+  service
                |                        +--| BID32  |  instance3
                |                        |  +--------+
                |      +----------+      |
                +------|CFN node 3|------|     SID3
                       +----------+      |  +--------+  service
                                         +--| BID33  |  instance4
                                            +--------+

        Figure 2: CFN-Dyncast Architectural Concept Example

4.2. Service Notification between Instances and CFN node

The service side of CFN-Dyncast is responsible for notifying the CFN
node it is attached to about the SID-to-BID mapping whenever a
service instance is instantiated or terminated, or when its metrics
(e.g., load) change, as shown in Figure 3.

                                        SID: Service ID
                                        BID: Binding ID

                          service info
                     (SID1, BID21, metrics)
                     (SID2, BID22, metrics)
                        <--------------->      SID1
                                            +--------+  service
                                         +--| BID21  |  instance1
                                         |  +--------+
                       +----------+      |
                +------|CFN node 2|------|     SID2
                |      +----------+      |  +--------+  service
                |                        +--| BID22  |  instance2
                |                           +--------+
                |
+------+   +----------+
|client|---|CFN node 1|                        SID2
+------+   +----------+                     +--------+  service
                |                        +--| BID32  |  instance3
                |                        |  +--------+
                |      +----------+      |
                +------|CFN node 3|------|     SID3
                       +----------+      |  +--------+  service
                                         +--| BID33  |  instance4
                                            +--------+
                       <---------------->
                          service info
                     (SID2, BID32, metrics)
                     (SID3, BID33, metrics)

             Figure 3: CFN-Dyncast Service Notification

The computing resource information of service instances is key
information in CFN-Dyncast. Some of it is relatively static, such as
CPU/GPU capacity, and some is very dynamic, for example CPU/GPU
utilization, the number of associated sessions, and the number of
queued requests. The service side has to notify this information to
the CFN node it is attached to, and keep it refreshed.
This can be done in various ways, for instance via a protocol or via
an API of the management system. Conceptually, a CFN node keeps track
of the SIDs and computing metrics of all service instances attached
to it in real time.

4.3. CFN Dyncast Control Plane

CFN Dyncast needs a control plane that allows information about
resources and costs to be shared. Through the control plane, CFN
nodes share and update among themselves the service information and
the associated computing metrics of the service instances attached to
them. As a network node, a CFN node also monitors the network state
toward other CFN nodes. In this way, each CFN node is able to
aggregate the information and build a complete view of the available
resources and the cost to reach them. For instance, in the scenario
of Figure 3, the different CFN nodes will learn that there exist two
instances of SID2, each of which has a certain computational capacity
expressed in the metrics. Different mechanisms can be used to update
the status, for instance BGP [RFC4760], an IGP, or a controller-based
mechanism.

An important question that CFN Dyncast raises is how to represent the
computing metrics. A single numeric value calculated from weighted
attributes, such as CPU/GPU consumption and/or the number of
associated sessions, may be the easiest option. However, it may not
accurately reflect the computing resources of interest.
Multi-dimensional variables may give finer information, but their
structure and algorithmic processing should be sufficiently general
to accommodate different types of services (i.e., metrics).

A second important issue is related to system stability and signaling
overhead. As computing metrics may change very frequently, when and
how frequently such information is exchanged among CFN nodes has to
be determined.
A spectrum of approaches can be employed: interval-based updates,
threshold-based updates, policy-based updates, etc.

4.4. Service Demand Dispatching

Assuming that the set of metrics is well defined and that the update
rate is tailored to keep the system stable, the CFN Dyncast data
plane has the task of dispatching flows to the "best" service
instance. When a new flow arrives at a CFN ingress node, that node
selects the most appropriate CFN egress in terms of the network
status and the computing resources of the attached service instances,
and it guarantees flow affinity for the flow from then on.

Flow affinity is one of the critical features that CFN-Dyncast should
support. Flow affinity means that the packets of the same flow for a
service should always be sent to the same CFN egress, to be processed
by the same service instance.

When a new flow arrives and the most appropriate CFN egress and
service instance have been determined, a flow binding table should
record this binding, which may include the flow identifier, the
selected CFN egress node, the affinity timeout value, etc. The
subsequent packets of the flow are forwarded based on the table.
Figure 4 shows an example of what a flow binding table at a CFN
ingress node can look like.

+-----------------------------------------+------------+--------+
|             Flow Identifier             |            |        |
+------+--------+---------+--------+------+ CFN egress | timeout|
|src_IP| dst_IP |src_port |dst_port|proto |            |        |
+------+--------+---------+--------+------+------------+--------+
|  X   |  SID2  |    -    |  8888  | tcp  | CFN node 2 |  xxx   |
+------+--------+---------+--------+------+------------+--------+
|  Y   |  SID2  |    -    |  8888  | tcp  | CFN node 3 |  xxx   |
+------+--------+---------+--------+------+------------+--------+

             Figure 4: Example of flow binding table

A flow entry in the flow binding table can be identified using the
classic 5-tuple value.
However, it is worth noting that different services may need a
different granularity of flow identification. For instance, an RTP
video stream may use different port numbers for video and audio, and
it would be identified as two flows if a 5-tuple flow identifier were
used, although the two should certainly be treated as the same flow.
A 3-tuple based flow identifier is therefore more suitable in this
case. Hence, it is desirable to provide a certain level of
flexibility in identifying flows in order to apply flow affinity.

Flow affinity attributes can be configured per service in advance.
For each service, the information can include the flow identifier
type, the affinity timeout value, etc. The flow identifier type
indicates which values are used as the flow identifier, for instance
a 5-tuple, a 3-tuple or anything else. Because each rule deals with a
single service, the matching rules have to be disjoint, meaning that
two different services must have non-overlapping matching flow sets.

4.5. CFN Dispatcher

When a CFN node maintains the flow binding table, the memory consumed
is determined by the number of flows that the CFN ingress node
handles. The ingress node can be an edge data center gateway, hence
it may cover hundreds of thousands of users, and each user may have
tens of flows. The memory space consumed by the binding table at the
CFN ingress node can be a concern. To alleviate it, a functional
entity called CFN Dispatcher can help. A CFN Dispatcher is deployed
closer to the clients and normally handles the flows of a limited
number of clients. In this case, the memory space required by the
binding table will be much smaller.

The CFN dispatcher is an entity located on the client side which
directs traffic to a CFN egress node. It is not a CFN node itself,
that is to say, it does not participate in the status updates about
network and computing metrics among CFN nodes.
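
Whichever entity holds it, the flow binding table of Section 4.4 and
its per-service flow identifier type can be sketched as follows. This
is a minimal Python sketch under stated assumptions: the AFFINITY
configuration, the timeout values, the choice of services and the
reading of "3-tuple" as (source IP, destination IP, protocol) are all
hypothetical, since this document leaves them open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str      # the SID addressed by the client
    src_port: int
    dst_port: int
    proto: str


# Hypothetical per-service affinity attributes, configured in advance.
AFFINITY = {
    "SID2": {"flow_id_type": "5-tuple", "timeout": 30.0},
    "SID3": {"flow_id_type": "3-tuple", "timeout": 60.0},
}


def flow_key(pkt):
    """Build the flow identifier with the type configured for the service."""
    if AFFINITY[pkt.dst_ip]["flow_id_type"] == "3-tuple":
        # Assumption: 3-tuple = (source IP, destination IP, protocol).
        return (pkt.src_ip, pkt.dst_ip, pkt.proto)
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)


class FlowBindingTable:
    def __init__(self):
        self._entries = {}   # flow key -> (CFN egress, expiry time)

    def bind(self, pkt, egress, now):
        """Record (or refresh) the egress chosen for the packet's flow."""
        timeout = AFFINITY[pkt.dst_ip]["timeout"]
        self._entries[flow_key(pkt)] = (egress, now + timeout)

    def lookup(self, pkt, now):
        """Return the bound egress and refresh its timer, or None."""
        key = flow_key(pkt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        egress, expiry = entry
        if expiry <= now:
            del self._entries[key]   # affinity timed out
            return None
        self.bind(pkt, egress, now)
        return egress


table = FlowBindingTable()
video = Packet("X", "SID3", 5004, 5004, "udp")
audio = Packet("X", "SID3", 5006, 5006, "udp")
table.bind(video, "CFN node 3", now=0.0)
print(table.lookup(audio, now=10.0))
```

In this sketch the audio packet follows the video binding because
both map to the same 3-tuple key, while two connections to a 5-tuple
service would be bound independently.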

The CFN dispatcher does not determine by itself the best CFN egress
to which the packets of a new flow are forwarded. It has to learn
this information from a CFN node and maintain it to ensure flow
affinity for the subsequent packets. In this way, the CFN node simply
selects the most appropriate egress for new flows and informs the CFN
dispatcher in an explicit or implicit way. The CFN node is thus
relieved from maintaining the flow binding table.

Figure 5 shows the interaction between a CFN Dispatcher and a CFN
node. After the CFN node makes the service demand dispatch decision,
it informs the CFN dispatcher about the CFN egress node selected for
the flow. The CFN dispatcher then maintains the flow binding table to
ensure flow affinity.

The message exchange between the CFN dispatcher and its corresponding
CFN node needs to be defined. The CFN dispatcher can simply forward
the first packet of a flow to the CFN node, which decides which
instance to use and pushes this information into the flow binding
table of the CFN dispatcher. However, in case of failures, e.g., when
the CFN egress is no longer reachable, further interaction is needed
between the CFN dispatcher and the CFN node.

                                          SID: Service ID
                                          BID: Binding ID

                                                        SID1
                                                     +--------+
                                                  +--| BID21  |
              binding info                        |  +--------+
            (flow1,egress2)        +----------+   |
            (flow2,egress3)     +--|CFN node 2|---|     SID2
                <-----          |  +----------+   |  +--------+
+------+                        |                 +--| BID22  |
|Client|-+                      |                    +--------+
+------+  \                     |
           \                    |
      +--------------+     +----------+
      |CFN Dispatcher|-----|CFN Node 1|
      +--------------+     +----------+
           /                    |                       SID2
+------+  /                     |                    +--------+
|Client|-+                      |                 +--| BID32  |
+------+                        |                 |  +--------+
                                |  +----------+   |
                                +--|CFN node 3|---|     SID3
                                   +----------+   |  +--------+
                                                  +--| BID33  |
                                                     +--------+

       Figure 5: Service Demand Dispatch with CFN Dispatcher

5. Summary of the key elements of CFN Dyncast Architecture

o CFN Control Plane:

   * SID: CFN nodes have to be made aware of existing services
     through the existence of the corresponding SID. This can be
     achieved in different ways, for example by using a special range
     or coding of anycast IP addresses as service IDs, or by using
     DNS.

   * BID bindings: SIDs are bound to a set of BIDs representing the
     different instances of the service. Associated with these BIDs
     there is also a set of metrics describing the state of each
     instance. These bindings have to be shared among the CFN nodes
     so that they are aware of the different instances and their
     computing resource status.

   * Network state: CFN nodes have to be able to share network status
     so as to estimate the impact of a dispatching decision in terms
     of link congestion.

   * Metric and network status updates need to be sufficiently sparse
     to limit the signaling overhead and keep the system stable, but
     also sufficiently regular to make the system reactive to sudden
     traffic fluctuations.

o CFN Data Plane:

   * In case of a new flow: the CFN ingress node selects the most
     appropriate CFN egress in terms of the network status and the
     computing resources of the service instances attached to the
     egresses.

   * Flow affinity: CFN ingress nodes make sure that the subsequent
     packets of an existing flow are always delivered to the same CFN
     egress node so that they can be served by the same service
     instance.

6. Conclusion (and call for contributions)

This document introduces an architecture for CFN Dyncast that enables
a service demand to be sent to an optimal edge in order to improve
overall system load balancing. It can dynamically adapt to changes in
computing resource consumption and network status, and avoid
overloading single edges.
CFN-Dyncast is a network-based architecture that supports a large
number of edges and is independent of the applications or services
hosted on the edge.

The present document is a strawman for defining the CFN-Dyncast
architecture. More discussion on control plane and data plane
approaches is welcome.

7. Security Considerations

TBD

8. IANA Considerations

No IANA action is required so far.

9. Informative References

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
          "Multiprotocol Extensions for BGP-4", RFC 4760,
          DOI 10.17487/RFC4760, January 2007.

[I-D.geng-rtgwg-cfn-dyncast-ps-usecase]
          Geng, L., Liu, P., and P. Willis, "Dynamic-Anycast in
          Compute First Networking (CFN-Dyncast) Use Cases and
          Problem Statement",
          draft-geng-rtgwg-cfn-dyncast-ps-usecase-00 (work in
          progress), October 2020.

Acknowledgements

TBD

Authors' Addresses

Yizhou Li
Huawei Technologies
Email: liyizhou@huawei.com

Luigi Iannone
Huawei Technologies
Email: Luigi.iannone@huawei.com

Jianfei He
City University of Hong Kong
Email: jianfeihe2-c@my.cityu.edu.hk

Liang Geng
China Mobile
Email: gengliang@chinamobile.com

Peng Liu
China Mobile
Email: liupengyjy@chinamobile.com

Yong Cui
Tsinghua University
Email: cuiyong@tsinghua.edu.cn