Network Working Group D. Waltermire, Ed. Internet-Draft NIST Intended status: Informational May 16, 2012 Expires: November 17, 2012 Automated XML Content Data Exchange and Management draft-waltermire-content-repository-00 Abstract TBD... Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on November 17, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Waltermire Expires November 17, 2012 [Page 1] Internet-Draft Content Repository May 2012 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 7 1.2. Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.2.1. Content . . . . . . . . . . . . . . . . . . . . . . . . 7 1.2.2. Security Automation Content . . . . . . . . . . . . . . 7 1.2.3. Content Producer . . . . . . . . . . . . . . . . . . . 7 1.2.4. Content Consumer . . . . . . . . . . . . . . . . . . . 7 1.2.5. Content Bundle . . . . . . . . . . . . . . . . . . . . 7 2. Key Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1. The Content Metadata Model . . . . . . . . . . . . . . . . 7 2.2. Content federation . . . . . . . . . . . . . . . . . . . . 8 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 4. Security Considerations . . . . . . . . . . . . . . . . . . . . 8 5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 5.1. Normative References . . . . . . . . . . . . . . . . . . . 8 5.2. Informative References . . . . . . . . . . . . . . . . . . 9 Appendix A. Additional Stuff . . . . . . . . . . . . . . . . . . . 9 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 9 Waltermire Expires November 17, 2012 [Page 2] Internet-Draft Content Repository May 2012 1. Introduction Data-driven programming is a common paradigm in software engineering. When using this approach, a program is developed to process a series of data statements that describe the sequence of actions to be taken. These data statements, often referred to as content, provide the user with a dynamic degree of control over the function of the software. In many cases, this approach can lead to a proliferation of content. Without adequate content management and distribution capabilities, use of content can become impractical. It is common practice today to format content using the Extensible Markup Language XML . While many content management solutions exist today, few are designed to support the management and distribution of XML-based content. Current solutions largely focus on exploiting the raw XML syntax or a specific data model. Some solutions, such as XML databases, expose the raw syntax of XML for querying using techniques like XQuery. Other solutions utilize specialized database schema designed to support one or more specific data models represented in XML using XML Schema . These solutions are often brittle, inflexible to revisions of the underlying data models and do not adequately represent the logical information components used within data-driven programs. XML-based data-driven content is produced by many organizations in a range of formats, covering many different information domains. Where content repositories exist to support this content, they often operate independently and vary in the data models and capabilities they support. Rarely do these repositories interact and if they do it is through proprietary interfaces. Content consumers often have to manually download the content they want to use with their tools. In many cases they may want to customize this content for local use and must contend with managing updates to the content manually. One example of where data-driven programming is used is in the IT Security Automation community. Standardized security automation content is used to provide the instructions necessary for security tools to examine a computer's state to evaluate and report on the degree of compliance to configuration policies, to detect the presence of vulnerabilities, and to verify the installation state of patches. Other tools use data-driven content to collect and correlate digital events or to aggregate security information. Much of the focus in the security automation community has been on defining the standards and schemas for expressing security-related data in XML. Standardizing the methods for retrieval and exchange of security automation content has not been a primary area of focus. The content management challenges introduced by diverse data models, Waltermire Expires November 17, 2012 [Page 3] Internet-Draft Content Repository May 2012 decentralized production and use of content, and the proprietary nature of content repositories today create a need to define common content exchange requirements and mechanisms that will complement the content specifications and XML schemas. The following challenges are addressed by this specification: Distribution - In the absence of a standardized, automated distribution mechanism, content producers have no way to notify content consumers when new or updated content is available. Content consumers must manually import content at the point of use. This specification defines an automated notification mechanism that can be used to indicate to content consumers when new or updated content is available. The specification also defines the technical mechanisms used to exchange content between repositories, providing a standardized delivery mechanism to make remotely published content available at the point of use. Reuse Without a standardized method to search, retrieve and utilize existing content, both content consumers and producers have a tendency to recreate content. This duplication often causes content to become static or stale, introduces errors, and reduces the efficiency for developing content. In support of making content more reusable, this specification provides mechanisms for querying content so that it can be searched and gathered from many content providers. This allows organizations that are developing content to leverage, extend, and customize existing content from a variety of sources. This specification also defines a stable method of identifying blocks of externally provided content enabling content to be remotely referenced. This approach supports reuse and reduces the need for manual duplication across repositories. Interoperability Content repositories may require proprietary clients or tools to access their content. This hampers the ability for a content consumer to retrieve content from a variety of content sources using a single tool implementation. This specification standardizes the methods used to publish to and retrieve content from a content repository enabling standardized clients to be developed. Access to content repositories may be restricted or require the use of various standard or proprietary communication protocols (e.g. HTTP, FTP). Content is often packaged using various Waltermire Expires November 17, 2012 [Page 4] Internet-Draft Content Repository May 2012 file formats and compression algorithms, such as Zip, CAB or GZIP. Variation in these approaches hampers interoperability. This specification standardizes the communication protocol and distribution formats used promoting interoperability. Content packaging XML-based content is exchanged as XML documents, also called instances. This document centric view of information does not align well with how humans use information. Humans are more comfortable working with logical objects that represent a concept (e.g. rule, assessment check, logical construct) verses XML syntax. While XML Schema enables these concepts to be modeled, XML is still represented as a collection of elements and attributes. This specification defines a metamodel that identifies the logical objects that are represented in XML- based content and their boundaries within the XML model enabling content repositories to use the conceptual view of the content. This technique enables XML instances to be treated as containers of conceptual constructs. These conceptual constructs can be exchanged individually and can be composed into new documents dynamically based on metadata rules. This specification will provide a methodology for gathering and packaging content based on the needs or interest of the content consumer using a metadata approach. Integrity Content consumers and need assurances that the content that has been received has not been modified during the exchange process. This specification defines the use of automated mechanisms for verifying the integrity of exchanged content. Confidentiality In some scenarios, it is necessary to secure the exchange of content or restrict access to specific content. This specification will detail mechanisms for securing repository- to-repository and client-to-repository communications. Additionally this specification will specify authorization mechanisms that enable restricted access to content if needed. Content Version Management The content managed by content repositories may often undergo revision. When revisions occur, it is important to be able to Waltermire Expires November 17, 2012 [Page 5] Internet-Draft Content Repository May 2012 query specific revision to maintain the integrity of content bundles. This specification provides a query method that enables either a specific revision or the latest revision to be retrieved. This approach also enables remote references to include a content identifier and a specific revision. Model Revision Management Content repositories are often based on a specific data specification revision. When using this approach, updating content repository software to support specification revisions may require costly, time-consuming effort. Organizations maintaining content repositories may be reluctant to adopt new revisions or support old revisions due to this burden. This makes it difficult for a tool to use content based on an older or newer model revision. This specification defines properties within the metadata model to indicate where content is backwards and forwards compatible. These properties are then used to enable content to be provided based on the required model revision or to drive proper error handling where content is incompatible. For example the Open Vulnerability and Assessment Language (OVAL) versions content based on the major and minor revision of the OVAL XML schema. A repository containing OVAL content may have content ranging from OVAL 5.3 to 5.10. The difference in model version, while minor, could negatively impact a security tool's ability to properly process content that is outside of its expected range . This could cause tool errors or unexpected results to be produced. By using the model revision properties in the metamodel, the effective model revision of content returned from a content repository may be calculated based on the maximum schema revision used. Alternately, substitute content may be provided that supports a specific maximum schema revision provided in the query. By addressing these challenges, content producers will be able to effectively manage and share content they produce, and content consumers will be able to effectively use content provided by many different providers. By defining communication interfaces that can leverage existing communication protocols, we can begin to automate content distribution among disparate systems and make content more readily available. By defining a federated data model, we can establish rules and relationships of data types which allow for flexible content management with support for dynamic methods for collecting and bundling content for consumers. Sections [...] of this document focus on: Waltermire Expires November 17, 2012 [Page 6] Internet-Draft Content Repository May 2012 TBD 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 1.2. Terms 1.2.1. Content 1.2.2. Security Automation Content 1.2.3. Content Producer 1.2.4. Content Consumer 1.2.5. Content Bundle 2. Key Concepts This section provides a high-level overview of key concepts introduced in this specification. The first concept subsection describes a content metamodel that provides a needed level of abstraction over XML-based data models. The second subsection describes the federated content architectural approach defined within this specification. Through the use of these concepts, a robust, general purpose, distributed content management system is possible that supports automated content exchange between content consumers and producers. 2.1. The Content Metadata Model In order to create a generalized approach to XML-based content management it is necessary to generalize how XML-based data is processed by the content system. A variety of XML schema languages are used to define the syntax used to express a data model in XML. While these languages provide rules to constrain XML instance data, they do not adequately describe the information objects that exist within the model or the relationships between information objects. An information object is a block of XML data that represents a specific concept such as policy definition, a configuration setting or a scanning rule. Relationships represent cross references or links between information objects. Information objects and relationships are concepts that humans use to conceptualize the data model primitives that exist within content. In order for a content Waltermire Expires November 17, 2012 [Page 7] Internet-Draft Content Repository May 2012 management approach to be successful, a mechanism is needed that bridges the gap between the XML syntax understood by machines and the conceptual primitives that humans understand. The content metadata model provides this bridge. Within the content metamodel, an information object is represented as an entity definition. Complete this section... 2.2. Content federation Complete this section... Discuss Use of namespaces within content identifiers for repository lookup using DNS SRV records. Discuss using external namespaces for other cases. Discuss authoritative content repositories vs. caching repository content. Discuss using an architectural model similar to DNS for content repositories (e.g. local, forwarding, caching). 3. IANA Considerations This memo includes no request to IANA. 4. Security Considerations All drafts are required to have a security considerations section. See RFC 3552 [RFC3552] for a guide. 5. References 5.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Waltermire Expires November 17, 2012 [Page 8] Internet-Draft Content Repository May 2012 5.2. Informative References [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003. Appendix A. Additional Stuff This becomes an Appendix if needed. Author's Address David Waltermire (editor) National Institute of Standards and Technology 100 Bureau Drive Gaithersburg, Maryland 20877 USA Phone: Email: david.waltermire@nist.gov Waltermire Expires November 17, 2012 [Page 9]