Network Working Group Y. Shafranovich Internet-Draft SolidMatrix Technologies, Inc. Expires: August 6, 2005 February 2, 2005 MIME Type for CSV Files draft-shafranovich-mime-csv-00.txt Status of this Memo This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she become aware will be disclosed, in accordance with RFC 3668. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on August 6, 2005. Copyright Notice Copyright (C) The Internet Society (2005). Abstract This document defines MIME types "text/csv" and "text/comma-separated-values" which used for Comma-Separated Values (CSV) files. Shafranovich Expires August 6, 2005 [Page 1] Internet-Draft MIME Type for CSV Files February 2005 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. MIME Type Registration of text/csv and text/comma-separated-values . . . . . . . . . . . . . . . . . 3 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 4 4. Security Considerations . . . . . . . . . . . . . . . . . . . 5 5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 5 5.1 Normative References . . . . . . . . . . . . . . . . . . . 5 5.2 Informative References . . . . . . . . . . . . . . . . . . 5 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 5 A. Appendix A - Discussion of the CSV format . . . . . . . . . . 6 Intellectual Property and Copyright Statements . . . . . . . . 8 Shafranovich Expires August 6, 2005 [Page 2] Internet-Draft MIME Type for CSV Files February 2005 1. Introduction The comma separated values format (CSV) has been used for exchanging and converting data between various spreadsheet programs for quite some time. Surprisingly, while this file is very common it has never been formally documented. Additionally, while the IANA MIME registration tree includes a registraton for "text/tab-separated-values" type, no MIME types have ever been registered with IANA for CSV. At the same time, various programs and operating systems have begun to use different MIME types for this format, many of which vary from system to system. This document seeks to formally register two MIME types for CSV in accordance with RFC 2048 [4]. 2. MIME Type Registration of text/csv and text/comma-separated-values This section provides the media-type registration application (as per RFC 2048 [4], which will be submitted to IANA after IESG approval of this document. To: ietf-types@iana.org Subject: Registration of MIME media types text/csv and text/comma-separated-values MIME media type name: text MIME subtype name: csv, comma-separated-values Required parameters: none Optional parameters: charset Common usage of CSV is US-ASCII, but other character sets as defined by IANA for the "text" tree may be used. Encoding considerations: While section 4.1.1. of RFC 2046 [1] stipulates that "text" subtypes MUST use a CRLF sequence as a line break, in practice that is not always true for CSV. Therefore, implementors should be aware that either CR or CRLF maybe used as a line break for this format. Security considerations: CSV files contain passive text data which should not pose any risks. However, it is possible in theory that malicious binary Shafranovich Expires August 6, 2005 [Page 3] Internet-Draft MIME Type for CSV Files February 2005 data maybe included in order to exploit potential buffer overruns in the program processing CSV data. Additionally, private data maybe shared via this format which of course applies to any text data. Interoperability considerations: Due to lack of a single specification there are considerable differences among different implementations as described in appendix A. The most common difference among various format is whether double quotes (") are used to enclose strings. Implementors should "be conservative in what you do, be liberal in what you accept from others" (RFC 793 [2]) when processing CSV files. Published specification: While numerous private specifications exist for various programs and systems, there is no single "master" specification for this format. A sampling of formats and discussion of differences is included in appendix A. Applications which use this media type: Spreadsheet programs and various data conversion utilities Additional information: Magic number(s): none File extension(s): CSV Macintosh File Type Code(s): TEXT Person & email address to contact for further information: Yakov Shafranovich Intended usage: COMMON Author/Change controller: IESG 3. IANA Considerations After IESG approval, IANA is expected to register these two types "text/csv" and "text/comma-separated-values" using the application provided in this document. Shafranovich Expires August 6, 2005 [Page 4] Internet-Draft MIME Type for CSV Files February 2005 4. Security Considerations See discussion above 5. References 5.1 Normative References [1] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996. [2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981. [3] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997. 5.2 Informative References [4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures", BCP 13, RFC 2048, November 1996. [5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File Format", 2004, . [6] Edoceo, Inc., "CSV Standard File Format", 2004, . [7] Rodger, R. and O. Shanaghy, "Documentation for Ricebridge CSV Manager", February 2005, . [8] Raymond, E., "The Art of Unix Programming, Chapter 5", September 2003, . Author's Address Yakov Shafranovich SolidMatrix Technologies, Inc. Email: ietf@shaftek.org URI: http://www.shaftek.org Shafranovich Expires August 6, 2005 [Page 5] Internet-Draft MIME Type for CSV Files February 2005 Appendix A. Appendix A - Discussion of the CSV format While there are various specifications and implementations for the CSV format (for ex. [5], [6], [7] and [8]), no formal specification exists. This causes a wide variety of interpretations for CSV files. While this document does not seek to document the CSV format, nevertheless we want to document the format that seems to be followed by most implementations: 1. Each record is located on a separate line delimited by a line break (either CR or CR/LF). For example: aaa,bbb,ccc CRLF zzz,yyy,xxx CRLF 2. The last record in the file may or may not have an ending linebreak. For example: aaa,bbb,ccc CRLF zzz,yyy,xxx 3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and will usually contain the same number of fields as the records in the rest of the file. For example: field_name,field_name,field_name CRLF aaa,bbb,ccc CRLF zzz,yyy,xxx CRLF 4. Within the header and each record there may be one or more fields, delimited by commas. The last field in the record may or may not be followed by a comma. For example: aaa,bbb,ccc 5. Each field may or may not be enclosed in double quotes, however some programs such as Microsoft Excel do not use double quotes at all. For example: "aaa","bbb","ccc" CRLF zzz,yyy,xxx 6. Field containing line breaks (CR or CR/LF) and commas should be enclosed in double-quotes. For example: "aaa","b CRLF Shafranovich Expires August 6, 2005 [Page 6] Internet-Draft MIME Type for CSV Files February 2005 bb","ccc" CRLF zzz,yyy,xxx 7. If double-quotes are used to enclosed fields, then double-quotes inside fields must be surounded by double quotes. For example: "aaa","b"""bb","ccc" 8. Whitespace immediately before and after commas maybe removed unless it appears inside double-quotes. For example: zzz, yyy , xxx would be processed as if it was: zzz,yyy,xxx The ABNF grammar [3] appears as follows: COMMA = %x2C file = [header] *record end-of-field = COMMA / (CR / CRLF) header = *(*WSP field *WSP end-of-field) record = *(*WSP field *WSP end-of-field) field = escaped / non-escaped escaped = DQUOTE *(VCHAR / CR / CRLF / 3*DQUOTE) DQUOTE non-escaped = *VCHAR Shafranovich Expires August 6, 2005 [Page 7] Internet-Draft MIME Type for CSV Files February 2005 Intellectual Property Statement The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Disclaimer of Validity This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Copyright Statement Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. Acknowledgment Funding for the RFC Editor function is currently provided by the Internet Society. Shafranovich Expires August 6, 2005 [Page 8]