Network Working Group
Internet-Draft                                                 C. Wilper
Intended status: Experimental                                  DuraSpace
Expires: October 15, 2012                                 April 13, 2012


                    Semantic Content Packages (SCP)
                 draft-wilper-semantic-content-pkgs-00

Abstract

   This document specifies Semantic Content Packages, an experimental
   data structure and associated format for storing and transmitting a
   named set of RDF statements with a set of content streams.  Packages
   can be arranged as a set of files in a directory hierarchy or
   serialized into a single stream for transmission and archival
   storage.  For the latter, a new ZIP-based media type,
   application/scp, is defined.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 15, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

C. Wilper               Expires October 15, 2012                [Page 1]

Internet Draft            semantic-content-pkgs           April 13, 2012

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.
           Notational Conventions
   2.  The SCP Data Structure
   3.  The SCP Vocabulary
     3.1.  Bytestream
     3.2.  ContentLocation
       3.2.1.  ResolvableURI
       3.2.2.  PackagePath
     3.3.  Package
     3.4.  PackageType
   4.  The SCP Format
     4.1.  Directory Structure
     4.2.  ZIP-based Serialization
     4.3.  Examples
       4.3.1.  A Minimal Package
       4.3.2.  A Typed Package with an Image and Statements
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  SCP-OWL-Ontology.ttl
   Authors' Addresses

1.  Introduction

1.1.  Motivation

   In data management, the primitives used to describe structured and
   semi-structured data are usually determined by the kind of storage
   technology at hand.  For example, data stored in a relational
   database is modeled using the well-known entity-relationship
   paradigm, but data stored in a key-value store is modeled using a
   different set of primitives.
   Over time, high-value data will necessarily be migrated from one
   storage technology to another.  This is often complicated by the need
   to translate the data model to the target storage technology while
   retaining the original semantics.

   The Resource Description Framework (RDF) provides a core data model
   capable of expressing detailed descriptions about any kind of
   resource in a way that is independent of any storage technology or
   protocol.  With this simple but powerful capability, RDF can form the
   basis of a more durable content modeling paradigm.

   Semantic Content Packages build upon the core RDF model by providing
   a way to logically and physically bundle a set of statements with a
   set of content streams, and to include a unique identifier within
   each bundle.

1.2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  The SCP Data Structure

   A Semantic Content Package is a logical container consisting of:

   o  A set of finite-length octet stream values, also known as content,
      each with a distinct location within the package, called a path.

   o  An id, which is a Uniform Resource Identifier (URI).  Every
      package MUST contain a content stream at path '.scpi/id' whose
      value is an ASCII-encoded URI identifying the package.

   o  A set of Resource Description Framework (RDF) triples, also known
      as statements.  The path '.scpi/graph.ttl' is reserved for an RDF
      Turtle [W3C.SUBM-turtle-20110328] serialization of the statements,
      if any.

3.  The SCP Vocabulary

   SCP defines a minimal RDF vocabulary in order to:

   1.  Provide a uniform, but OPTIONAL way to refer to content within a
       package.
       This allows assertions to be made about the content using
       other, more appropriate vocabularies such as Dublin Core, SKOS,
       PREMIS-OWL, and OAI-ORE [TBD: spec refs under Informative
       References].

   2.  Allow packages to declare the compound data models, if any, to
       which they conform.

   All terms in this vocabulary are defined as Web Ontology Language
   (OWL) classes or properties, and are described below.  See Appendix A
   for the official, machine-readable ontology.

3.1.  Bytestream

   A Bytestream is defined as a specific sequence of octets that exists
   conceptually regardless of where the content might be found.  The
   "location" property refers to a ContentLocation.  A Bytestream may
   refer to any number of ContentLocations.

3.2.  ContentLocation

   A ContentLocation indicates where the content of a Bytestream is
   expected to be found.  Two subclasses are defined by this
   specification:

3.2.1.  ResolvableURI

   A ResolvableURI is a kind of ContentLocation with a URI that can be
   dereferenced.

3.2.2.  PackagePath

   A PackagePath is a kind of ContentLocation that points to a path
   within a Package.  The "path" property refers to a Plain Literal
   value, such as "path/to/file.txt", and the "inPackage" property
   refers to the Package.

3.3.  Package

   A Package is a container for content and statements.  The "type"
   property refers to a PackageType.

3.4.  PackageType

   A PackageType is a particular kind of package.  Some sort of data
   model is usually associated with a PackageType, but the means by
   which package types are described or represented is not expected to
   be uniform, nor is a method suggested by this specification.

4.  The SCP Format

   Packages that adhere to the SCP data model MAY be stored in a variety
   of ways.  This specification defines two.

4.1.  Directory Structure

   The SCP model is designed to fit naturally into a filesystem
   directory structure.
   Each content stream is stored as a file beneath the root directory of
   the package, and the 'path' of that content stream in the package is
   the file's path relative to that root directory.

   For example, if the root directory of a package is /tmp/mypackage,
   there MUST be a directory whose full path is '/tmp/mypackage/.scpi'
   containing at least a file named 'id', whose content is the id of the
   package.

4.2.  ZIP-based Serialization

   A new ZIP-based Internet Media Type, application/scp (extension
   .scp), is defined by this specification as a way to serialize
   packages.  The ZIP file is structured such that unzipping it will
   instantiate the package in a directory structure as described in
   Section 4.1.  It follows that it MUST contain an item named
   '.scpi/id' whose content is the id of the package.

4.3.  Examples

   The following examples illustrate packages in a directory structure.

4.3.1.  A Minimal Package

   In order to represent a package, a directory is only required to have
   one file in a particular place to declare its id, as shown below:

   example-package-1/
   |
   -- .scpi/
      |
      | id
      | (http://example.org/pkg1)
      -

4.3.2.  A Typed Package with an Image and Statements

   This example shows a package containing a single image bundled with
   several statements, including:

   o  An assertion of the "type" of package.

   o  An assertion from the Dublin Core vocabulary that the format of
      the image is "image/jpeg".

   o  An assertion that the image Bytestream is expected to be available
      in three known locations, one being a path within the package
      itself.

   example-package-2/
   |
   | image.jpg
   | (binary data)
   |
   -- .scpi/
      |
      | id
      | (http://example.org/pkg2)
      |
      | graph.ttl (
      |   @prefix dc: .
      |   @prefix rdf: .
      |   @prefix scp: .
      |
      |   rdf:type scp:Package ;
      |     scp:type .
      |   _:img rdf:type scp:Bytestream ;
      |     dc:format "image/jpeg" ;
      |     scp:location _:path ,
      |       ,
      |       .
      |   _:path rdf:type scp:PackagePath ;
      |     scp:path "image.jpg" ;
      |     scp:inPackage .
      | )
      -

5.  IANA Considerations

   A new media type shall be registered for this format.

   Type name:  application

   Subtype name:  scp

   Required parameters:  none

   Optional parameters:  none

6.  Security Considerations

   The security considerations for this specification are the union of
   those that apply to processing Turtle RDF and ZIP files.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [W3C.SUBM-turtle-20110328]
              Beckett, D. and T. Berners-Lee, "Turtle - Terse RDF Triple
              Language", W3C Team Submission SUBM-turtle-20110328,
              March 2011, Latest version available at .

   [TBD: Refs to RDF, OWL, ZIP, etc.]

7.2.  Informative References

   [TBD: Refs to DC, SKOS, OAI-ORE, PREMIS-OWL, etc.]

Appendix A.  SCP-OWL-Ontology.ttl

   # Namespaces

   @prefix owl: .
   @prefix rdf: .
   @prefix rdfs: .
   @prefix scp: .

   # Classes

   scp:Bytestream rdf:type owl:Class .

   scp:ContentLocation rdf:type owl:Class .

   scp:ResolvableURI rdfs:subClassOf scp:ContentLocation .

   scp:PackagePath rdfs:subClassOf scp:ContentLocation .

   scp:Package rdf:type owl:Class .

   scp:PackageType rdf:type owl:Class .

   # Properties

   scp:location rdf:type owl:ObjectProperty ;
       rdfs:domain scp:Bytestream ;
       rdfs:range scp:ContentLocation .

   scp:path rdf:type owl:DatatypeProperty , owl:FunctionalProperty ;
       rdfs:domain scp:PackagePath ;
       rdfs:range rdf:PlainLiteral .

   scp:inPackage rdf:type owl:ObjectProperty , owl:FunctionalProperty ;
       rdfs:domain scp:PackagePath ;
       rdfs:range scp:Package .

   scp:type rdf:type owl:ObjectProperty ;
       rdfs:domain scp:Package ;
       rdfs:range scp:PackageType .

Authors' Addresses

   Chris Wilper
   West Henrietta, NY
   USA

   Email: cwilper@gmail.com
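
   Non-normative illustration of Sections 4.1 and 4.2: the Python sketch
   below builds the minimal package of Section 4.3.1 in a temporary
   directory, writes it to a ZIP file, and reads the id back from the
   item '.scpi/id'.  The function names (read_package_id,
   serialize_package, build_minimal_package, demo), the trimming of
   whitespace around the id, and the choice of DEFLATE compression are
   assumptions of this sketch and are not defined by this specification.

```python
import os
import tempfile
import zipfile
from pathlib import Path

ID_PATH = ".scpi/id"  # reserved package path that holds the package id


def read_package_id(root: Path) -> str:
    """Return the id stored at '.scpi/id' beneath a package root directory."""
    id_file = root / ".scpi" / "id"
    if not id_file.is_file():
        raise ValueError(f"not an SCP package (missing {ID_PATH}): {root}")
    # The id is an ASCII-encoded URI; stripping whitespace is an assumption.
    return id_file.read_bytes().decode("ascii").strip()


def serialize_package(root: Path, target: Path) -> None:
    """Write every file beneath root into a ZIP, named by its package path."""
    read_package_id(root)  # fail early if '.scpi/id' is absent
    # DEFLATE is an assumption; the draft does not name a compression method.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                file = Path(dirpath) / name
                # A package path is relative to the root and uses '/'.
                archive.write(file, file.relative_to(root).as_posix())


def build_minimal_package(root: Path, package_id: str) -> None:
    """Create a directory containing only the required '.scpi/id' file."""
    (root / ".scpi").mkdir(parents=True)
    (root / ".scpi" / "id").write_bytes(package_id.encode("ascii"))


def demo() -> tuple:
    """Build, serialize, and re-open a minimal package."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "example-package-1"
        build_minimal_package(root, "http://example.org/pkg1")
        target = Path(tmp) / "example-package-1.scp"
        serialize_package(root, target)
        with zipfile.ZipFile(target) as archive:
            names = archive.namelist()
            stored_id = archive.read(ID_PATH).decode("ascii")
    return stored_id, names
```

   A consumer of an application/scp stream could apply the same check as
   read_package_id to the '.scpi/id' item before processing any other
   item in the archive.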