Network Working Group                                 D. Waltermire, Ed.
Internet-Draft                                                      NIST
Intended status: Informational                              May 16, 2012
Expires: November 17, 2012


           Automated XML Content Data Exchange and Management
                 draft-waltermire-content-repository-00

Abstract

   TBD...

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 17, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Waltermire              Expires November 17, 2012               [Page 1]

Internet-Draft             Content Repository                   May 2012


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . . . 7
     1.2.  Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
       1.2.1.  Content . . . . . . . . . . . . . . . . . . . . . . . . 7
       1.2.2.  Security Automation Content . . . . . . . . . . . . . . 7
       1.2.3.  Content Producer  . . . . . . . . . . . . . . . . . . . 7
       1.2.4.  Content Consumer  . . . . . . . . . . . . . . . . . . . 7
       1.2.5.  Content Bundle  . . . . . . . . . . . . . . . . . . . . 7
   2.  Key Concepts  . . . . . . . . . . . . . . . . . . . . . . . . . 7
     2.1.  The Content Metadata Model  . . . . . . . . . . . . . . . . 7
     2.2.  Content Federation  . . . . . . . . . . . . . . . . . . . . 8
   3.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8
   4.  Security Considerations . . . . . . . . . . . . . . . . . . . . 8
   5.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 8
     5.1.  Normative References  . . . . . . . . . . . . . . . . . . . 8
     5.2.  Informative References  . . . . . . . . . . . . . . . . . . 9
   Appendix A.  Additional Stuff . . . . . . . . . . . . . . . . . . . 9
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . . 9




1.  Introduction

   Data-driven programming is a common paradigm in software engineering.
   When using this approach, a program is developed to process a series
   of data statements that describe the sequence of actions to be taken.
   These data statements, often referred to as content, provide the user
   with a dynamic degree of control over the function of the software.
   In many cases, this approach can lead to a proliferation of content.
   Without adequate content management and distribution capabilities,
   use of content can become impractical.

   It is common practice today to format content using the Extensible
   Markup Language (XML).  While many content management solutions exist
   today, few are designed to support the management and distribution of
   XML-based content.  Current solutions largely focus on exploiting the
   raw XML syntax or a specific data model.  Some solutions, such as XML
   databases, expose the raw syntax of XML for querying using techniques
   like XQuery.  Other solutions utilize specialized database schemas
   designed to support one or more specific data models represented in
   XML using XML Schema.  These solutions are often brittle and
   inflexible to revisions of the underlying data models, and they do
   not adequately represent the logical information components used
   within data-driven programs.

   XML-based data-driven content is produced by many organizations in a
   range of formats, covering many different information domains.  Where
   content repositories exist to support this content, they often
   operate independently and vary in the data models and capabilities
   they support.  Rarely do these repositories interact, and when they
   do, it is through proprietary interfaces.  Content consumers often
   have to manually download the content they want to use with their
   tools.  In many cases they may want to customize this content for
   local use and must contend with managing updates to the content
   manually.

   One example of where data-driven programming is used is in the IT
   Security Automation community.  Standardized security automation
   content is used to provide the instructions necessary for security
   tools to examine a computer's state to evaluate and report on the
   degree of compliance with configuration policies, to detect the
   presence of vulnerabilities, and to verify the installation state of
   patches.  Other tools use data-driven content to collect and
   correlate digital events or to aggregate security information.  Much
   of the focus in the security automation community has been on
   defining the standards and schemas for expressing security-related
   data in XML.  Standardizing the methods for retrieval and exchange of
   security automation content has not been a primary area of focus.

   The content management challenges introduced by diverse data models,
   decentralized production and use of content, and the proprietary
   nature of content repositories today create a need to define common
   content exchange requirements and mechanisms that will complement the
   content specifications and XML schemas.

   The following challenges are addressed by this specification:

      Distribution

         In the absence of a standardized, automated distribution
         mechanism, content producers have no way to notify content
         consumers when new or updated content is available.  Content
         consumers must manually import content at the point of use.
         This specification defines an automated notification mechanism
         that can be used to indicate to content consumers when new or
         updated content is available.  The specification also defines
         the technical mechanisms used to exchange content between
         repositories, providing a standardized delivery mechanism to
         make remotely published content available at the point of use.
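
         As a non-normative illustration of the notification idea, the
         Python sketch below has a repository call back every subscribed
         consumer whenever content is published.  The class name, the
         callback signature, and the "example.org:rule-1" identifier are
         hypothetical; this specification does not yet define the actual
         notification mechanism.

```python
from typing import Callable, Dict, List

# Hypothetical callback type: receives (content id, revision number).
Subscriber = Callable[[str, int], None]


class NotifyingRepository:
    """Toy repository that tells subscribers about new or updated content."""

    def __init__(self) -> None:
        self._revisions: Dict[str, int] = {}
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        self._subscribers.append(callback)

    def publish(self, content_id: str) -> int:
        # The first publication is revision 1; each update increments it.
        revision = self._revisions.get(content_id, 0) + 1
        self._revisions[content_id] = revision
        for notify in self._subscribers:
            notify(content_id, revision)
        return revision


received = []
repo = NotifyingRepository()
repo.subscribe(lambda content_id, revision: received.append((content_id, revision)))
repo.publish("example.org:rule-1")   # new content
repo.publish("example.org:rule-1")   # updated content
print(received)
```

         A real deployment would deliver notifications over a network
         protocol rather than through in-process callbacks.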

      Reuse

         Without a standardized method to search, retrieve and utilize
         existing content, both content consumers and producers have a
         tendency to recreate content.  This duplication often causes
         content to become static or stale, introduces errors, and
         reduces the efficiency of content development.  To make content
         more reusable, this specification provides mechanisms for
         querying content so that it can be searched and gathered from
         many content providers.  This allows organizations that are
         developing content to leverage, extend, and customize existing
         content from a variety of sources.  This specification also
         defines a stable method of identifying blocks of externally
         provided content, enabling content to be remotely referenced.
         This approach supports reuse and reduces the need for manual
         duplication across repositories.

      Interoperability

         Content repositories may require proprietary clients or tools
         to access their content.  This hampers the ability of a content
         consumer to retrieve content from a variety of content sources
         using a single tool implementation.  This specification
         standardizes the methods used to publish content to and
         retrieve content from a content repository, enabling
         standardized clients to be developed.

         Access to content repositories may be restricted or require the
         use of various standard or proprietary communication protocols
         (e.g., HTTP, FTP).  Content is often packaged using various
         file formats and compression algorithms, such as Zip, CAB or
         GZIP.  Variation in these approaches hampers interoperability.
         This specification standardizes the communication protocol and
         distribution formats used, promoting interoperability.

      Content packaging

         XML-based content is exchanged as XML documents, also called
         instances.  This document-centric view of information does not
         align well with how humans use information.  Humans are more
         comfortable working with logical objects that represent a
         concept (e.g., rule, assessment check, logical construct) than
         with XML syntax.  While XML Schema enables these concepts to be
         modeled, XML is still represented as a collection of elements
         and attributes.  This specification defines a metamodel that
         identifies the logical objects that are represented in XML-
         based content and their boundaries within the XML model,
         enabling content repositories to use the conceptual view of the
         content.

         This technique enables XML instances to be treated as
         containers of conceptual constructs.  These conceptual
         constructs can be exchanged individually and can be composed
         into new documents dynamically based on metadata rules.  This
         specification will provide a methodology for gathering and
         packaging content based on the needs or interest of the content
         consumer using a metadata approach.
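
         As a minimal, non-normative sketch of this idea, the Python
         below treats an XML instance as a container of logical objects,
         indexes each object individually, and composes a new document
         from the objects whose metadata match a consumer's interest.
         The element names ("rule", "check"), the "id" and "platform"
         attributes, and the split_instance and compose_bundle helpers
         are all hypothetical; none of them is defined by this
         specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance; element and attribute names are illustrative only.
INSTANCE = """
<content>
  <rule id="rule-1" platform="windows"><title>Password length</title></rule>
  <rule id="rule-2" platform="linux"><title>SSH root login</title></rule>
  <check id="check-1" platform="linux"><title>sshd_config test</title></check>
</content>
"""

# Assumed set of element names that mark logical object boundaries.
OBJECT_TAGS = {"rule", "check"}


def split_instance(xml_text):
    """Return a dict mapping object id -> Element for each logical object."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el for el in root.iter() if el.tag in OBJECT_TAGS}


def compose_bundle(objects, predicate):
    """Build a new document from the objects whose metadata satisfy predicate."""
    bundle = ET.Element("bundle")
    for object_id in sorted(objects):
        element = objects[object_id]
        if predicate(element.attrib):
            bundle.append(element)
    return bundle


objects = split_instance(INSTANCE)
linux_bundle = compose_bundle(objects, lambda meta: meta.get("platform") == "linux")
print([child.get("id") for child in linux_bundle])   # ['check-1', 'rule-2']
```

         Here the attributes on each element stand in for metadata; a
         repository could equally keep such metadata outside the XML.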

      Integrity

         Content consumers need assurance that the content they receive
         has not been modified during the exchange process.  This
         specification defines the use of automated mechanisms for
         verifying the integrity of exchanged content.
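
         The text above does not name a particular integrity mechanism.
         As one possible illustration only, the Python sketch below
         assumes that a SHA-256 digest is published alongside the content
         and that the consumer recomputes it on receipt.  The function
         names and the choice of SHA-256 are assumptions, not
         requirements of this specification.

```python
import hashlib
import hmac


def content_digest(content: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the content bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_content(content: bytes, expected_digest: str) -> bool:
    """Return True when the received bytes match the published digest."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(content_digest(content), expected_digest.lower())


published = b"<rule id='rule-1'/>"
digest = content_digest(published)

print(verify_content(published, digest))          # True: unmodified
print(verify_content(published + b" ", digest))   # False: modified in transit
```

         A bare digest only helps when the digest itself is obtained
         over a trusted channel; otherwise a digital signature or a
         keyed MAC would be needed.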

      Confidentiality

         In some scenarios, it is necessary to secure the exchange of
         content or restrict access to specific content.  This
         specification will detail mechanisms for securing repository-
         to-repository and client-to-repository communications.
         Additionally, this specification will specify authorization
         mechanisms that enable restricted access to content if needed.

      Content Version Management

         The content managed by content repositories may often undergo
         revision.  When revisions occur, it is important to be able to
         query a specific revision to maintain the integrity of content
         bundles.  This specification provides a query method that
         enables either a specific revision or the latest revision to be
         retrieved.  This approach also enables remote references to
         include a content identifier and a specific revision.
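
         The query behavior described above can be sketched as follows.
         This is a non-normative Python illustration: the ContentStore
         class, the integer revision numbers, and the "id@revision"
         reference syntax are hypothetical and are not defined by this
         specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ContentStore:
    # content id -> {revision number -> content body}
    _items: Dict[str, Dict[int, str]] = field(default_factory=dict)

    def publish(self, content_id: str, body: str) -> int:
        """Store a new revision and return its revision number."""
        revisions = self._items.setdefault(content_id, {})
        revision = max(revisions, default=0) + 1
        revisions[revision] = body
        return revision

    def query(self, content_id: str, revision: Optional[int] = None) -> str:
        """Return a specific revision, or the latest when none is given."""
        revisions = self._items[content_id]
        if revision is None:
            revision = max(revisions)
        return revisions[revision]


def parse_reference(reference: str) -> Tuple[str, Optional[int]]:
    """Split a hypothetical 'id@revision' reference; the revision is optional."""
    content_id, _, revision = reference.partition("@")
    return content_id, (int(revision) if revision else None)


store = ContentStore()
store.publish("example.org:rule-1", "first body")
store.publish("example.org:rule-1", "second body")

print(store.query(*parse_reference("example.org:rule-1@1")))   # first body
print(store.query(*parse_reference("example.org:rule-1")))     # second body
```

         Pinning the revision in a reference keeps a content bundle
         stable even after the referenced content is revised.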

      Model Revision Management

         Content repositories are often based on a specific data
         specification revision.  When using this approach, updating
         content repository software to support specification revisions
         may require costly, time-consuming effort.  Organizations
         maintaining content repositories may be reluctant to adopt new
         revisions or support old revisions due to this burden.  This
         makes it difficult for a tool to use content based on an older
         or newer model revision.  This specification defines properties
         within the metadata model to indicate where content is
         backwards and forwards compatible.  These properties are then
         used to enable content to be provided based on the required
         model revision or to drive proper error handling where content
         is incompatible.

         For example, the Open Vulnerability and Assessment Language
         (OVAL) versions content based on the major and minor revision
         of the OVAL XML schema.  A repository containing OVAL content
         may have content ranging from OVAL 5.3 to 5.10.  The difference
         in model version, while minor, could negatively impact a
         security tool's ability to properly process content that is
         outside of its expected range.  This could cause tool errors
         or unexpected results to be produced.  By using the model
         revision properties in the metamodel, the effective model
         revision of content returned from a content repository may be
         calculated based on the maximum schema revision used.
         Alternatively, substitute content may be provided that supports
         a specific maximum schema revision provided in the query.
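
         A minimal Python sketch of both calculations follows.  It is
         illustrative only: the helper names are hypothetical, and it
         assumes revisions are dotted "major.minor" strings such as the
         OVAL revisions mentioned above.  Revisions are compared
         numerically, because a plain string comparison would rank
         "5.9" above "5.10".

```python
def parse_revision(text):
    """Turn a dotted revision such as '5.10' into a comparable tuple (5, 10)."""
    return tuple(int(part) for part in text.split("."))


def effective_revision(schema_revisions):
    """Return the maximum schema revision used by a set of content."""
    return max(schema_revisions, key=parse_revision)


def select_compatible(candidates, max_revision):
    """Pick the newest content whose revision does not exceed max_revision.

    candidates maps a revision string to a content body; returns None when
    nothing compatible is available, so the caller can report an error.
    """
    limit = parse_revision(max_revision)
    usable = [rev for rev in candidates if parse_revision(rev) <= limit]
    if not usable:
        return None
    return candidates[max(usable, key=parse_revision)]


used = ["5.3", "5.10", "5.9"]
print(effective_revision(used))   # 5.10
print(max(used))                  # 5.9 (string comparison gives the wrong answer)

variants = {"5.3": "body for 5.3", "5.8": "body for 5.8", "5.10": "body for 5.10"}
print(select_compatible(variants, "5.9"))   # body for 5.8
print(select_compatible(variants, "5.2"))   # None
```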

   By addressing these challenges, content producers will be able to
   effectively manage and share content they produce, and content
   consumers will be able to effectively use content provided by many
   different providers.  By defining communication interfaces that can
   leverage existing communication protocols, we can begin to automate
   content distribution among disparate systems and make content more
   readily available.  By defining a federated data model, we can
   establish rules and relationships of data types which allow for
   flexible content management with support for dynamic methods for
   collecting and bundling content for consumers.

   Sections [...] of this document focus on:

      TBD

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terms

1.2.1.  Content

1.2.2.  Security Automation Content

1.2.3.  Content Producer

1.2.4.  Content Consumer

1.2.5.  Content Bundle


2.  Key Concepts

   This section provides a high-level overview of key concepts
   introduced in this specification.  The first concept subsection
   describes a content metamodel that provides a needed level of
   abstraction over XML-based data models.  The second subsection
   describes the federated content architectural approach defined within
   this specification.  Through the use of these concepts, a robust,
   general purpose, distributed content management system is possible
   that supports automated content exchange between content consumers
   and producers.

2.1.  The Content Metadata Model

   In order to create a generalized approach to XML-based content
   management it is necessary to generalize how XML-based data is
   processed by the content system.  A variety of XML schema languages
   are used to define the syntax used to express a data model in XML.
   While these languages provide rules to constrain XML instance data,
   they do not adequately describe the information objects that exist
   within the model or the relationships between information objects.
   An information object is a block of XML data that represents a
   specific concept such as a policy definition, a configuration
   setting, or a scanning rule.  Relationships represent cross
   references or
   links between information objects.  Information objects and
   relationships are concepts that humans use to conceptualize the data
   model primitives that exist within content.  In order for a content
   management approach to be successful, a mechanism is needed that
   bridges the gap between the XML syntax understood by machines and the
   conceptual primitives that humans understand.  The content metadata
   model provides this bridge.

   Within the content metamodel, an information object is represented as
   an entity definition.
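
   Because this section is still incomplete, the following Python
   sketch is only a guess at what an entity definition might carry.  It
   assumes that a definition names a concept, the XML element that
   bounds the information object, an identifier attribute, and an
   optional attribute holding a cross reference.  The EntityDefinition
   fields, the sample elements, and the "check-ref" attribute are all
   hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityDefinition:
    name: str                                  # concept, e.g. "scanning rule"
    element: str                               # XML element bounding the object
    id_attribute: str = "id"                   # attribute identifying the object
    reference_attribute: Optional[str] = None  # attribute holding a cross reference


DEFINITIONS = [
    EntityDefinition("policy definition", "policy"),
    EntityDefinition("scanning rule", "rule", reference_attribute="check-ref"),
    EntityDefinition("check", "check"),
]

INSTANCE = """
<policy id="policy-1">
  <rule id="rule-1" check-ref="check-1"/>
  <check id="check-1"/>
</policy>
"""


def find_objects(root, definitions):
    """Return (concept name, object id) for each information object found."""
    by_element = {d.element: d for d in definitions}
    return [
        (by_element[el.tag].name, el.get(by_element[el.tag].id_attribute))
        for el in root.iter()
        if el.tag in by_element
    ]


def find_relationships(root, definitions):
    """Return (source id, target id) for each cross reference found."""
    pairs = []
    for d in definitions:
        if d.reference_attribute is None:
            continue
        for el in root.iter(d.element):
            target = el.get(d.reference_attribute)
            if target is not None:
                pairs.append((el.get(d.id_attribute), target))
    return pairs


root = ET.fromstring(INSTANCE)
objects = find_objects(root, DEFINITIONS)
relationships = find_relationships(root, DEFINITIONS)
print(objects)
print(relationships)   # [('rule-1', 'check-1')]
```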

   Complete this section...

2.2.  Content Federation

   Complete this section...

   Discuss

      Use of namespaces within content identifiers for repository lookup
      using DNS SRV records.  Discuss using external namespaces for
      other cases.

      Discuss authoritative content repositories vs. caching repository
      content.

      Discuss using an architectural model similar to DNS for content
      repositories (e.g. local, forwarding, caching).
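
   To make the discussion points above concrete, the following Python
   sketch shows how a namespace embedded in a content identifier could
   yield a DNS SRV query name, and how a caching repository could
   forward misses to an authoritative one.  Everything specific here is
   an assumption: the "namespace:local-id" identifier syntax, the
   "_content-repo" service label (which is not a registered service
   name), and the CachingRepository class.  The sketch only builds the
   query name and performs no DNS lookup.

```python
from typing import Callable, Dict


def srv_query_name(content_id: str,
                   service: str = "_content-repo",
                   proto: str = "_tcp") -> str:
    """Build an SRV owner name (_service._proto.name.) from a content id."""
    namespace, _, _local_id = content_id.partition(":")
    return f"{service}.{proto}.{namespace}."


class CachingRepository:
    """Answers from a local cache and forwards misses to an upstream source."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self._upstream = upstream
        self._cache: Dict[str, str] = {}
        self.upstream_calls = 0

    def get(self, content_id: str) -> str:
        if content_id not in self._cache:
            self.upstream_calls += 1
            self._cache[content_id] = self._upstream(content_id)
        return self._cache[content_id]


# Stand-in for an authoritative repository: a simple in-memory mapping.
authoritative = {"example.org:rule-1": "<rule id='rule-1'/>"}
cache = CachingRepository(authoritative.__getitem__)

print(srv_query_name("example.org:rule-1"))   # _content-repo._tcp.example.org.
cache.get("example.org:rule-1")               # forwarded to the authority
cache.get("example.org:rule-1")               # served from the cache
print(cache.upstream_calls)                   # 1
```

   A production cache would also need an expiry policy so that revised
   content eventually replaces cached copies.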


3.  IANA Considerations

   This memo includes no request to IANA.


4.  Security Considerations

   All drafts are required to have a security considerations section.
   See RFC 3552 [RFC3552] for a guide.


5.  References

5.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.




5.2.  Informative References

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.


Appendix A.  Additional Stuff

   This becomes an Appendix if needed.


Author's Address

   David Waltermire (editor)
   National Institute of Standards and Technology
   100 Bureau Drive
   Gaithersburg, Maryland  20877
   USA

   Phone:
   Email: david.waltermire@nist.gov

