Internet Research Task Force (IRTF)                          R. Krishnan
Internet Draft                                                   Brocade
Category: Informational                               Dilip Krishnaswamy
                                                            IBM Research
                                                             D. R. Lopez
                                                          Telefonica I+D
                                                              Asif Qamar
                                                                   Evolv
Expires: April 2015                                     October 25, 2014


  NFV Architectural Framework for Real-time Analytics and Orchestration

            draft-krishnan-nfvrg-real-time-analytics-orch-00

Abstract

   One of the key goals of NFV is to optimize infrastructure resource
   usage while driving operational simplicity. Real-time analytics
   that provide insight into components such as compute (e.g., dynamic
   CPU utilization), storage (e.g., dynamic capacity usage), network
   (e.g., dynamic bandwidth utilization), and energy (e.g., dynamic
   power consumption) are key not only to providing visibility into
   the NFV infrastructure, and thus driving operational simplicity,
   but also to optimizing resource usage for the purposes of
   orchestration. This draft focuses on an NFV architecture for
   real-time analytics and orchestration, including Big Data
   predictive analytics, that addresses these requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

Krishnan                  Expires April 2015                    [Page 1]

Internet-Draft   NFV Real-time Analytics and Orchestration  October 2014

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire in April 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119.

Table of Contents

   1. Introduction
   2. Real-time Analytics Application for Optimizing Resource
      Utilization
      2.1. Enhancements to Real-time Analytics Application
         2.1.1. Distributed Predictive Analytics
         2.1.2. Detecting Noisy Neighbors
         2.1.3. Addressing security issues due to inconsistent
                configuration
   3. Summary
   4. Future Work
   5. IANA Considerations
   6. Security Considerations
   7. Contributors
   8. Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   Operator Network Point-of-Presence (N-PoP) locations
   [ETSI-NFV-TERM] often have capacity, energy, and other constraints.
   Thus, optimizing overall resource usage is an important requirement
   [ETSI-NFV-REQ]. The general case must consider a distributed
   (elastic) VNF platform implementation in which VMs running
   different VNFs (with different characteristics) can co-exist on the
   same physical server. This case must address the goal of optimizing
   overall resource usage through mechanisms such as bin-packing
   [BIN-PACK]. In this context, some of the important challenges are:

   o  Performance issues due to the noisy neighbor effect, where a VM
      running one VNF can affect the VM(s) running another VNF.

   o  Security issues, especially those due to inconsistent
      configuration in a dynamic environment where one VNF could
      affect others.

   o  Energy efficiency, given that servers have substantial idle
      power usage.

   The purpose of this document is two-fold. First, it intends to
   discuss various possible solutions to the above challenges. Second,
   it will depict an architectural framework for real-time analytics
   and orchestration that applies these solutions in a multi-vendor
   environment.

2. Real-time Analytics Application for Optimizing Resource Utilization

   A real-time analytics application periodically polls individual
   VMs, VNFs, physical servers, network elements, etc. to collect
   information about sub-systems such as compute (e.g., dynamic CPU
   utilization), storage (e.g., dynamic capacity usage), network
   (e.g., dynamic bandwidth utilization), and energy (e.g., dynamic
   power consumption). From these samples, the application computes
   the average utilization of each VM, VNF, physical server, network,
   etc. for each sub-system: compute (e.g., average CPU utilization),
   storage (e.g., average capacity usage), network (e.g., average
   bandwidth utilization), and energy (e.g., average power
   consumption).
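
   The polling and averaging steps described above can be sketched in
   a few lines of Python. This is a minimal illustration and not part
   of any NFV specification: the class name UtilizationTracker, the
   helper poll_once, the read_metrics collector callable, the metric
   names, and the choice of a sliding window over the last N samples
   are all assumptions introduced here. The draft does not prescribe
   how the average is computed.

```python
from collections import defaultdict, deque


class UtilizationTracker:
    """Keeps the most recent polled samples per (entity, metric) pair.

    Hypothetical helper: `entity` names a VM, VNF, server, etc., and
    `metric` names a sub-system reading such as "cpu_pct" or "power_w".
    """

    def __init__(self, window=5):
        # Each deque silently drops its oldest sample once it is full.
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, entity, metric, value):
        """Store one polled sample."""
        self._samples[(entity, metric)].append(float(value))

    def average(self, entity, metric):
        """Return the mean of the retained samples, or None if none exist."""
        samples = self._samples.get((entity, metric))
        if not samples:
            return None
        return sum(samples) / len(samples)


def poll_once(tracker, read_metrics):
    """Run one polling cycle.

    `read_metrics` is an assumed collector callable that returns a
    mapping of the form {entity: {metric: value}}.
    """
    for entity, metrics in read_metrics().items():
        for metric, value in metrics.items():
            tracker.record(entity, metric, value)
```

   For instance, with window=3, recording CPU samples of 40, 60, and
   80 percent for one VM yields an average of 60.0; a fourth sample of
   100 evicts the oldest reading and moves the average to 80.0. A
   bounded window is only one possible choice; it keeps memory use
   constant per monitored element while letting the average follow
   recent load.
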
   Using the average utilization information, the real-time analytics
   application provides real-time visibility into the NFV
   infrastructure, thus driving operational efficiency.

   The NFV orchestrator uses the average utilization information from
   the real-time analytics application to determine the appropriate
   time to scale the running software instances up or down. Typically,
   the scale up/down thresholds are manually programmed into the
   system; this may not be optimal for performance, since workloads
   and deployment scenarios can vary substantially.

   In addition, predictive analytics based on machine learning
   techniques [MACHINE-LEARNING-BOOK] can be used by the real-time
   analytics application to automatically determine the appropriate
   scale up/down thresholds for the running software instances under
   differing workloads (including events related to social behavior,
   such as a YouTube video going viral) and deployment scenarios. This
   information can be used by the orchestrator to optimize overall
   performance and maximize energy efficiency. Energy efficiency
   improves because, with appropriate scale up/down thresholds, the
   workloads can be consolidated onto a minimum set of physical
   resources, and the remaining unused physical resources can be
   completely powered off to avoid any idle power consumption.
   [SPEC-BENCHMARK] analyzes the power profile of physical servers
   from various vendors; the active idle power consumption of physical
   servers could be as much as 30%.

2.1. Enhancements to Real-time Analytics Application

2.1.1. Distributed Predictive Analytics

   A real-time analytics application could be notified of significant
   events by individual running software instances (VMs, VNFs, etc.)
   or by infrastructure elements such as physical servers,
   hypervisors, etc.
   Such event notification reduces the rate at which the real-time
   analytics application needs to poll and lets it react much faster
   to significant events such as overload. The challenge in this case
   is to determine the appropriate thresholds for event notification
   (e.g., average power consumption has been higher than x Watts for t
   seconds). Predictive analytics engines that use machine learning
   techniques [MACHINE-LEARNING-BOOK] can be used to determine the
   appropriate thresholds per running software instance and per
   infrastructure element for different workloads and deployment
   scenarios. These predictive analytics engines can run in various
   nodes in the infrastructure, forming a distributed predictive
   analytics architectural framework.

2.1.2. Detecting Noisy Neighbors

   In the context of multiple VNFs, the "Noisy Neighbor Effect" could
   be defined as follows: the VM running one VNF can affect the
   performance of a VM running another VNF when both use the same
   physical resources (physical servers, physical network elements).

   A real-time analytics application could help in detecting and
   mitigating the noisy neighbor effect. A good example is the case
   where the VMs running two VNFs share the same physical server, are
   memory access intensive (load balancers, firewalls, etc.), and have
   correlated memory access patterns for the given workload and
   deployment scenario. Real-time big data analytics techniques
   [RT-ANALYTICS-BOOK] can be used by the analytics application to
   determine, in real-time, such correlation patterns that can affect
   performance. Additionally, predictive analytics based on machine
   learning techniques [MACHINE-LEARNING-BOOK] can be used to predict
   the frequency and duration of such correlation patterns. This
   information can be used to create dynamic anti-affinity rules for
   VM placement and migration, including redundancy considerations;
   for example,
   VMs of VNF "A" cannot co-exist with VMs of VNF "B".

2.1.3. Addressing security issues due to inconsistent configuration

   NFV configuration is expected to be dynamic, especially in the edge
   NFV PoPs where capacity is limited; a good example is handling a
   viral event such as a mobile gaming application. While autonomic
   networking techniques could be used to automate the configuration
   process, including modular updates, it is important to take into
   account that incomplete and/or inconsistent configuration may lead
   to security issues. Distributed VNF implementations (e.g., VMs of a
   single VNF that span different physical servers) typically use an
   eventually consistent configuration model [CAP-THEOREM] for
   scalability reasons; this poses additional security challenges.

   Real-time analytics techniques [RT-ANALYTICS-BOOK] can be used by
   the analytics application to detect, in real-time, communication
   pattern anomalies caused by incomplete and/or inconsistent
   configuration by analyzing event logs. Additionally, predictive
   analytics based on machine learning techniques
   [MACHINE-LEARNING-BOOK] can be used to predict the frequency and
   duration of such communication pattern anomalies. A simple example
   is a flow-specific firewall rule that was never installed for
   reasons such as control plane messaging issues, a data plane table
   full condition, etc.

3. Summary

   TBD

4. Future Work

   TBD

5. IANA Considerations

   This draft does not have any IANA considerations.

6. Security Considerations

7. Contributors

8. Acknowledgements

   None.

9. References

9.1. Normative References

9.2. Informative References

   [ETSI-NFV-WHITE]
      "ETSI NFV White Paper,"
      http://portal.etsi.org/NFV/NFV_White_Paper.pdf

   [ETSI-NFV-USE-CASES]
      "ETSI NFV Use Cases,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/001/01.01.01_60/gs_NFV001v010101p.pdf

   [ETSI-NFV-REQ]
      "ETSI NFV Virtualization Requirements,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/004/01.01.01_60/gs_NFV004v010101p.pdf

   [ETSI-NFV-ARCH]
      "ETSI NFV Architectural Framework,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.01.01_60/gs_NFV002v010101p.pdf

   [ETSI-NFV-TERM]
      "Terminology for Main Concepts in NFV,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/003/01.01.01_60/gs_nfv003v010101p.pdf

   [OPENSTACK]
      "OpenStack Open Source Software,"
      https://www.openstack.org/

   [OPENSTACK-CONGRESS-POLICY-ENGINE]
      "A policy as a service open source project in OpenStack,"
      https://wiki.openstack.org/wiki/Congress

   [OPENSTACK-CELIOMETER-MEASUREMENT]
      "OpenStack Ceilometer,"
      http://docs.openstack.org/developer/ceilometer/measurements.html

   [OPENSTACK-NOVA-COMPUTE]
      "OpenStack Nova,"
      https://wiki.openstack.org/wiki/Nova

   [NFV-MANO-SPEC]
      "NFV Management and Orchestration Framework Specification,"
      http://docbox.etsi.org/ISG/NFV/Open/Latest_Drafts/NFV-MAN001v061-%20management%20and%20orchestration.pdf

   [BIN-PACK]
      Coffman, Jr., E., Garey, M., and D. Johnson, "Approximation
      Algorithms for Bin-Packing: An Updated Survey," in Algorithm
      Design for Computer System Design, ed. by Ausiello, Lucertini,
      and Serafini, Springer-Verlag, 1984.

   [SPEC-BENCHMARK]
      "SPEC Benchmark Results: HP Proliant DL380p Rack Server,"
      http://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/Comparing-Dell-R720-and-HP-Proliant-DL380p-Gen8-Servers.pdf

   [CAP-THEOREM]
      Eric Brewer, "CAP twelve years later: How the "rules" have
      changed," IEEE Computer, Volume 45, Issue 2 (2012), pp. 23-29.

   [MACHINE-LEARNING-BOOK]
      Ian H. Witten et al., "Data Mining: Practical Machine Learning
      Tools and Techniques, Third Edition," Morgan Kaufmann, 2011.

   [RT-ANALYTICS-BOOK]
      Byron Ellis, "Real-Time Analytics: Techniques to Analyze and
      Visualize Streaming Data," Wiley, 2014.

Authors' Addresses

   Ram (Ramki) Krishnan
   Brocade Communications
   ramk@brocade.com

   Dilip Krishnaswamy
   IBM Research
   dilikris@in.ibm.com

   Diego Lopez
   Telefonica I+D
   Don Ramon de la Cruz, 82
   Madrid, 28006, Spain
   +34 913 129 041
   diego.r.lopez@telefonica.com

   Asif Qamar
   Evolv
   asif@asifqamar.com