Network Working Group                                          L. Romary
Internet-Draft                                  TEI Consortium and INRIA
Intended status: Informational                               S. Lundberg
Expires: May 16, 2011                      The Royal Library, Copenhagen
                                                       November 12, 2010


The 'application/tei+xml' media type
draft-lundberg-app-tei-xml-09

Abstract

This document defines the 'application/tei+xml' media type for markup languages defined in accordance with the Text Encoding and Interchange guidelines.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on May 16, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



Table of Contents

1.  Introduction
2.  Recognizing TEI files
3.  Fragment identifier
4.  Security considerations
    4.1.  Harmful content
    4.2.  Intellectual Property Rights
    4.3.  Authenticity and confidentiality
5.  IANA Considerations
    5.1.  Registration of MIME type 'application/tei+xml'
6.  References
    6.1.  Normative References
    6.2.  Informative References
Authors' Addresses





1.  Introduction

Text Encoding and Interchange (TEI) is an international and interdisciplinary standard that is widely used by libraries, museums, publishers, and individual scholars to represent all kinds of textual material for online research and teaching [TEI].

This document defines the 'application/tei+xml' media type in accordance with [RFC3023] in order to enable generic processing of such documents on the Internet using eXtensible Markup Language (XML) [W3C.REC-xml-20081126] technologies.




2.  Recognizing TEI files

TEI files are XML documents or fragments whose root element (as defined in [W3C.REC-xml-20081126]) is in a TEI namespace. TEI namespace names are Uniform Resource Identifiers (URIs) [RFC3986], defined in accordance with [W3C.REC-xml-names-20091208], and begin with http://www.tei-c.org/ns/ followed by the version number of the namespace. The current namespace is http://www.tei-c.org/ns/1.0.

The most common root element names for TEI documents are:

<TEI>

<teiCorpus>

A teiCorpus document makes it possible to bundle multiple TEI documents into a single file.

Examples:

A document having a <TEI> root element:

	    <?xml version="1.0" encoding="UTF-8" ?>
	    <TEI xmlns="http://www.tei-c.org/ns/1.0">
	       <teiHeader>
	       ...
	       </teiHeader>
	       <text>
	       ...
	       </text>
	    </TEI>

A document having a <teiCorpus> root element:

	    <?xml version="1.0" encoding="UTF-8" ?>
	    <teiCorpus  xmlns="http://www.tei-c.org/ns/1.0">
	       <teiHeader>
	       ...
	       </teiHeader>
	       <TEI>
	          <teiHeader>
	          ...
	          </teiHeader>
	          <text>
	          ...
	          </text>
	       </TEI>
	       <TEI>
	       ... second document ...
	       </TEI>
	       <TEI>
	       ... third document  ...
	       </TEI>
	    </teiCorpus>

TEI and teiCorpus files are often given the extensions .tei and .teiCorpus, respectively. There is a third type of file, which is often given the suffix .odd. An ODD (‘One Document Does it All’) file is a TEI XML document that includes schema fragments, prose documentation, and reference documentation. It is used for the definition and documentation of XML-based languages, and primarily for the TEI Guidelines [ODD]. In other words, ODD files do not differ from other TEI files in syntax, only in function.
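The recognition rule described in this section (a root element in a namespace whose name begins with http://www.tei-c.org/ns/) can be illustrated with a minimal Python sketch. The function names below are hypothetical and chosen only for illustration; the sketch assumes the whole document is available in memory as bytes and uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Prefix shared by all TEI namespace names, as stated in this section.
TEI_NS_PREFIX = "http://www.tei-c.org/ns/"


def root_namespace_and_name(xml_bytes):
    """Return (namespace_name, local_name) of the root element."""
    root = ET.fromstring(xml_bytes)
    # ElementTree writes qualified names as "{namespace}local".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
        return namespace, local_name
    return "", root.tag


def looks_like_tei(xml_bytes):
    """True if the root element is in a TEI namespace."""
    namespace, _local_name = root_namespace_and_name(xml_bytes)
    return namespace.startswith(TEI_NS_PREFIX)
```

Because the test looks only at the namespace of the root element, it treats <TEI> and <teiCorpus> documents, as well as ODD files, in the same way. A document whose root element is named TEI but carries no namespace is not recognized.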




3.  Fragment identifier

Documents having the media type 'application/tei+xml' use the fragment identifier notation specified in [RFC3023] for the media type 'application/xml'.




4.  Security considerations

An XML resource does not in itself compromise data security. When a resource is made available on a network simply through the dereferencing of an Internationalized Resource Identifier (IRI) [RFC3987] or a URI, care must be taken to interpret the data properly in order to prevent unintended access. Hence the security issues of [RFC3986], Section 7, apply. In addition, as this media type uses the "+xml" convention, it shares the security considerations described in [RFC3023], Section 10. In general, security issues related to the use of XML in IETF protocols are treated in [RFC3470], Section 7. We will not try to duplicate this material, but review some aspects that are important for document-centric XML as applied to text encoding.




4.1.  Harmful content

Any application that accepts submitted TEI XML, or retrieves it for processing, has to be aware of the risks connected with the injection of harmful scripts and executable XML. XML inclusion [W3C.REC-xinclude-20061115] and the use of external entities are vulnerable to various forms of spoofing and can also reveal aspects of a service in a way that may compromise its security. Vulnerabilities of these kinds are, however, application specific. The TEI namespaces do not contain such elements.
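As an illustration of one possible application-level policy, the following Python sketch rejects untrusted input that carries a DOCTYPE declaration (and therefore any entity declarations) or that contains XInclude elements. This policy is an assumption made for the example and is not required by this specification; the names UnsafeXML and check_untrusted_tei are hypothetical. The sketch relies on the Expat bindings in the Python standard library.

```python
from xml.parsers import expat

# Namespace name of XInclude elements.
XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


class UnsafeXML(ValueError):
    """Raised when a document uses a construct this policy refuses."""


def check_untrusted_tei(xml_bytes):
    """Parse xml_bytes and raise UnsafeXML on DOCTYPE or XInclude use."""

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        # Without a DOCTYPE there can be no internal or external entities.
        raise UnsafeXML("DOCTYPE declarations are not accepted")

    def forbid_xinclude(name, attributes):
        # With a namespace separator, Expat reports "namespace local-name".
        if name.startswith(XINCLUDE_NS + " "):
            raise UnsafeXML("XInclude elements are not accepted")

    parser = expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.StartElementHandler = forbid_xinclude
    # Exceptions raised inside the handlers propagate out of Parse().
    parser.Parse(xml_bytes, True)
```

A stricter or more permissive policy may be appropriate depending on the application; for example, a trusted internal pipeline might resolve XInclude references against a fixed list of locations instead of refusing them.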




4.2.  Intellectual Property Rights

TEI documents often arise in the digitization of cultural heritage materials. Texts made accessible in TEI format may be unrestricted in the sense that their distribution is not limited by Digital Rights Management [DRM] or Intellectual Property Rights [IPR] constraints. However, TEI documents are heterogeneous. Some parts of a document may be unrestricted, whereas others, such as editorial text and annotations, may be subject to DRM restrictions.

The TEI format provides means for highly granular attribution, down to the content of individual XML elements. Software agents participating in the exchange or processing of TEI documents may be required to honour markup of this kind. Even when there are no IPR constraints, intellectual property attribution alone requires that document users be able to tell the difference between content from different sources.




4.3.  Authenticity and confidentiality

Historical archival records are often encoded in TEI, and legal documents may be binding centuries after they were written. Digitization and encoding of legal texts may require technologies for assuring authenticity, such as cryptographic checksums and electronic signatures.

Similarly, historical documents may in part or in their entirety be confidential. This may be required by law or by terms and conditions, as in the case of donated or deposited texts from private sources. A text archive may need content filtering or cryptographic technologies to meet such requirements.
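A cryptographic checksum of the kind mentioned above can be sketched as follows. The choice of SHA-256 and the function names are assumptions made for this example; this specification does not prescribe any particular algorithm. Note that a checksum only detects changes to the stored bytes. Establishing who produced a document requires an electronic signature, which is not shown here.

```python
import hashlib
import hmac


def tei_checksum(xml_bytes):
    """Return the SHA-256 digest of the serialized document as hex."""
    return hashlib.sha256(xml_bytes).hexdigest()


def matches_recorded_checksum(xml_bytes, recorded_hex):
    """Compare the document's digest with a previously recorded one."""
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(tei_checksum(xml_bytes), recorded_hex.lower())
```

The comparison is byte for byte, so two serializations of the same document that differ only in insignificant whitespace or attribute order will yield different digests.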




5.  IANA Considerations




5.1.  Registration of MIME type 'application/tei+xml'

MIME media type name: application

MIME subtype name: tei+xml

Required parameters: None

Optional parameters: charset

The parameter has identical semantics to the charset parameter of the "application/xml" media type as specified in [RFC3023].

Encoding considerations:

Identical to those for 'application/xml'. See [RFC3023], Section 3.2.

Security considerations:

See Section 4 (Security considerations) of this specification.

Interoperability considerations:

TEI documents are often given the extension '.xml', which is also commonly used for other XML document formats.

Published specification:

This media type registration is for TEI documents [TEI] as described in this document. TEI syntax is defined in a schema [TEIschema].

Applications which use this media type:

There are currently no known applications using the media type 'application/tei+xml'.

Additional information:

Magic number(s):

There is no single initial octet sequence that is always present in TEI documents.

File extension(s):

Common extensions are '.tei', '.teiCorpus' and '.odd'. See Section 2 (Recognizing TEI files) of this specification.

Macintosh File Type Code(s):

TEXT

Object Identifier(s) or OID(s):

Not applicable
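As an illustration of how a server might apply this registration, the following Python sketch maps the common extensions listed above to 'application/tei+xml' and builds a Content-Type value with the optional charset parameter. The function names and the default charset of utf-8 are assumptions made for this example. Extensions are registered in lower case so that the lookup does not depend on how a particular Python version handles mixed-case extensions such as '.teiCorpus'.

```python
import mimetypes

TEI_MEDIA_TYPE = "application/tei+xml"


def tei_aware_types():
    """Return a private type map that knows the common TEI extensions."""
    types = mimetypes.MimeTypes()
    for extension in (".tei", ".teicorpus", ".odd"):
        types.add_type(TEI_MEDIA_TYPE, extension)
    return types


def content_type_for(filename, charset="utf-8"):
    """Return a Content-Type value for filename, or None if unknown."""
    media_type, _encoding = tei_aware_types().guess_type(filename)
    if media_type == TEI_MEDIA_TYPE:
        # charset is the only optional parameter of this media type.
        return f"{media_type}; charset={charset}"
    return media_type
```

Since TEI documents are often given the extension '.xml', a mapping of this kind cannot identify every TEI document. Inspecting the namespace of the root element remains the more reliable test.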




6.  References




6.1. Normative References

[RFC3023] Murata, M., St. Laurent, S., and D. Kohn, “XML Media Types,” RFC 3023, January 2001.
[RFC3470] Hollenbeck, S., Rose, M., and L. Masinter, “Guidelines for the Use of Extensible Markup Language (XML) within IETF Protocols,” BCP 70, RFC 3470, January 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” STD 66, RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” RFC 3987, January 2005.
[TEI] “TEI Guidelines.”
[TEIschema] “Schema generated from ODD source.”
[W3C.REC-xml-20081126] Maler, E., Yergeau, F., Sperberg-McQueen, C., Paoli, J., and T. Bray, “Extensible Markup Language (XML) 1.0 (Fifth Edition),” World Wide Web Consortium Recommendation REC-xml-20081126, November 2008.
[W3C.REC-xml-names-20091208] Thompson, H., Hollander, D., Tobin, R., Bray, T., and A. Layman, “Namespaces in XML 1.0 (Third Edition),” World Wide Web Consortium Recommendation REC-xml-names-20091208, December 2009.



6.2. Informative References

[DRM] “Digital rights management.”
[IPR] “Intellectual property.”
[ODD] “Getting Started with P5 ODDs.”
[W3C.REC-xinclude-20061115] Orchard, D., Marsh, J., and D. Veillard, “XML Inclusions (XInclude) Version 1.0 (Second Edition),” World Wide Web Consortium Recommendation REC-xinclude-20061115, November 2006.



Authors' Addresses

  Laurent Romary
  TEI Consortium and INRIA
 
Email:  laurent.romary@inria.fr
URI:  http://www.tei-c.org/
  
  Sigfrid Lundberg
  The Royal Library, Copenhagen
  Postbox 2149
  1016 København K
  Denmark
Email:  slu@kb.dk
URI:  http://sigfrid-lundberg.se/