SIPCLF G. Salgueiro Internet-Draft Cisco Systems Expires: January 1, 2011 V. Gurbani Bell Labs, Alcatel-Lucent A. B. Roach Tekelec June 30, 2010 Indexed File Format for the Session Initiation Protocol (SIP) Common Log Format (CLF) draft-salgueiro-sipclf-indexed-ascii-00 Abstract The SIPCLF Workgroup has defined a common log format framework for Session Initiation Protocol (SIP) servers. This common log format mimics the wildly successful event logging mechanism found in well- known web servers like Apache and web proxies like Squid. This document proposes an indexed text encoding format for the SIP Common Log Format (CLF) that retains the key advantages of a text-based format, while significantly increasing processing performance over a purely text-based implementation. This file format adheres to the SIP CLF data model and provides an effective encoding scheme for all mandatory and optional fields that appear in a SIP CLF record. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 1, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. Salgueiro, et al. Expires January 1, 2011 [Page 1] Internet-Draft Indexed File Format for SIP CLF June 2010 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Example Record . . . . . . . . . . . . . . . . . . . . . . . . 9 5. Text Tool Considerations . . . . . . . . . . . . . . . . . . . 9 6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 7. Operational Guidance . . . . . . . . . . . . . . . . . . . . . 10 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 10.1. Normative References . . . . . . . . . . . . . . . . . . 10 10.2. Informative References . . . . . . . . . . . . . . . . . 10 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 Salgueiro, et al. Expires January 1, 2011 [Page 2] Internet-Draft Indexed File Format for SIP CLF June 2010 1. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119[1]. RFC3261[3] defines additional terms used in this document that are specific to the SIP domain such as "proxy"; "registrar"; "redirect server"; "user agent server" or "UAS"; "user agent client" or "UAC"; "back-to-back user agent" or "B2BUA"; "dialog"; "transaction"; "server transaction". This document uses the term "SIP Server" that is defined to include the following SIP entities: user agent server, registrar, redirect server, a SIP proxy in the role of user agent server, and a B2BUA in the role of a user agent server. 2. Introduction The extensive list of benefits and the widespread adoption of the Apache Common Log Format (CLF) has prompted the development of a functionally equivalent event logging mechanism for the Session Initiation Protocol (SIP). Implementing a logging scheme for SIP is a considerable challenge. This is due in part to the fact that the behavior of a SIP entity is more complex as compared to an HTTP entity. Additionally, there are shortcomings to the purely text- based HTTP Common Log Format that need to be addressed in order to allow for real-time inspection of SIP log files. Experience with Apache Common Log Format has shown that dealing with large quantities of log data can be very processor intensive, as doing so necessarily requires reading and parsing every byte in the log file(s) of interest. An implementation independent framework for the SIP CLF has been defined [2]. This memo describes an indexed text file format for logging SIP messages received and sent by SIP clients, servers, and proxies that adheres to the data model presented in Section 8 of draft-ietf-sipclf-problem-statement [2]. This document defines a format that is no more difficult to generate by logging entities, while being radically faster to process. In particular, the format is optimized for both rapidly scanning through log records, as well as quickly locating commonly accessed data fields. Further, the format proposed by this document retains the key advantage of being human readable and able to be processed using the various Unix text processing tools, such as sed, awk, perl, cut, and grep. Salgueiro, et al. Expires January 1, 2011 [Page 3] Internet-Draft Indexed File Format for SIP CLF June 2010 3. Format Each data record is encoded according to the following format. Note that indications of "hexadecimal encoded" indicate that the value is to be written out in human-readable base-16 numbers using the ASCII characters 0x30 through 0x39 ('0' through '9') and 0x41 through 0x46 ('A' through 'F'). Similarly, indications of "decimal encoded" indicate that the value is to be written out in human readable base-10 number using the ASCII characters 0x30 through 0x39 ('0' through '9'). In both encodings, numbers always take up the number of bytes indicated, and are padded on the left with ASCII '0' characters to fill the entire space. 0 7 8 15 16 23 24 31 +------------+------------+------------+------------+ | Version | Record Length | 0 - 3 +------------+------------+------------+------------+ | Record Length (cont) | 0x2C | 4 - 7 +------------+------------+------------+------------+ | Flags Field | 0x2C | 8 - 11 +------------+------------+------------+------------+ | CSeq Pointer (Hex) | 12 - 15 +------------+------------+------------+------------+ | CSeq Length (Hex) | 16 - 19 +------------+------------+------------+------------+ | R-URI Pointer (Hex) | 20 - 23 +------------+------------+------------+------------+ | R-URI Length (Hex) | 24 - 27 +------------+------------+------------+------------+ | Destination:port:xport Pointer (Hex) | 28 - 31 +------------+------------+------------+------------+ | Destination:port:xport Length (Hex) | 32 - 35 +------------+------------+------------+------------+ | Source:port:xport Pointer (Hex) | 36 - 39 +------------+------------+------------+------------+ | Source:port:xport Length (Hex) | 40 - 43 +------------+------------+------------+------------+ | To Pointer (Hex) | 44 - 47 +------------+------------+------------+------------+ | To Length (Hex) | 48 - 51 +------------+------------+------------+------------+ | From Pointer (Hex) | 52 - 55 +------------+------------+------------+------------+ | From Length (Hex) | 56 - 59 +------------+------------+------------+------------+ Salgueiro, et al. Expires January 1, 2011 [Page 4] Internet-Draft Indexed File Format for SIP CLF June 2010 | Call-Id Pointer (Hex) | 60 - 63 +------------+------------+------------+------------+ | Call-Id Length (Hex) | 64 - 67 +------------+------------+------------+------------+ | Status Pointer (Hex) | 68 - 71 +------------+------------+------------+------------+ | Status Length (Hex) | 72 - 75 +------------+------------+------------+------------+ | Server-Txn Pointer (Hex) | 76 - 79 +------------+------------+------------+------------+ | Server-Txn Length (Hex) | 80 - 83 +------------+------------+------------+------------+ | Client-Txn Pointer (Hex) | 84 - 87 +------------+------------+------------+------------+ | Client-Txn Length (Hex) | 88 - 91 +------------+------------+------------+------------+ | TLV Start Pointer (Hex) | 92 - 95 +------------+------------+------------+------------+ | 0x0A | | 96 - 99 +------------+ + | Timestamp | 100 - 103 + +------------+ | | 0x2E | 104 - 107 +------------+------------+------------+------------+ | Fractional Seconds | 0x09 | 108 - 111 +------------+------------+------------+------------+ | | | | | Mandatory Fields | | | | | +------------+------------+------------+------------+ \ | 0x09 | Tag (Hex) | \ +------------+------------+------------+------------+ \ | Tag (cont) | 0x2C | Length (Hex) | \ Repeated as +------------+------------+------------+------------+ > many times | Length (cont) | 0x2C | | / as necessary +------------+------------+------------+ + / | Value | / +------------+------------+------------+------------+ / | 0x0A | +------------+ First, an 96-byte header indicates meta-data about the record. Note that the field lengths encoded in the header do not include the ASCII tab characters used to separate fields from each other. Salgueiro, et al. Expires January 1, 2011 [Page 5] Internet-Draft Indexed File Format for SIP CLF June 2010 Version (1 byte): 0x41 for this document; hexadecimal encoded. Record Length (6 bytes): Hexadecimal encoded total length of this log record, including "Flags" and "Record Length" fields, and terminating line-feed Flags Field (3 bytes): byte 1 - Request/Response flag R = request r = response byte 2 - Retransmission flag o = original transmission d = duplicate transmission s = server is stateless [i.e., retransmissions are not detected] byte 3 - Sent/Received flag r = message received s = message sent Bytes 12 through 91 contain hexadecimal encoded pointer/length pairs that point to the values of variable-length mandatory fields. The "Pointer" fields indicate absolute byte values within the record, and must be >=101. They point to the start of the corresponding value within the "Mandatory Fields" area. Implementations MUST NOT interpret (or dereference) a pointer field whose length field value is 0. A length of 0 implies that the particular field is not present in the SIPCLF record. The final pointer, "TLV Start Pointer," points to the ASCII Tab (0x09) character for the first entry in the Tag/ Length/Value area; if no such entries are present, this value is set to zero. The "Length" fields indicate the length of the corresponding value. Note that the "Length" fields do not include the tab delimiters between fields. Further note that there are no delimiters between these pointer/length values -- they are packed together as a single, 84-character hexadecimal encoded string. Salgueiro, et al. Expires January 1, 2011 [Page 6] Internet-Draft Indexed File Format for SIP CLF June 2010 CSeq: The CSeq header field. R-URI: The Request-URI in the start line (mandatory in request), including any URI parameters. Destination:port:xport The DNS name or IP address of the downstream server, including the port number. The port number must be separated from the DNS name or IP address by a single ':'. The transport must be separated from the port by a single ':'. The allowable values for the transport are 'tcp', 'udp', 'sctp', and 'tls'. Source:port:xport The DNS name or IP address of the upstream client, including the port number and the transport over which the SIP message was received. The port number must be separated from the DNS name or IP address by a single ':'. The transport must be separated from the port by a single ':'. The allowable values for the transport are 'tcp', 'udp', 'sctp', and 'tls'. To: Value of the To URI header field, including the tag parameter (if present). From: Value of the From URI header field, including the tag parameter. Whilst one may question the value of the From URI in light of RFC4474 [5], the From URI, nonetheless, imparts some information. For one, the From tag is important and, in the case of a REGISTER request, the From URI can provide information on whether this was a third-party registration or a first-party one. Call-Id: The value of the Call-ID header field. Status: Set to the value of the SIP response status code for responses. Set to 0 for requests. Server-Txn: Server transaction identification code - the transaction identifier associated with the server transaction. Implementations can reuse the server transaction identifier (the topmost branch-id of the incoming request, with or without the magic cookie), or they could generate a unique identification string for a server transaction (this identifier needs to be locally unique to the server only.) This identifier is used to correlate ACKs and CANCELs to an INVITE transaction; it is also used to aid in forking. (See Section 9 of The Common Log Format (CLF) for the Session Initiation Protocol (SIP) [2] for usage.) Salgueiro, et al. Expires January 1, 2011 [Page 7] Internet-Draft Indexed File Format for SIP CLF June 2010 Client-Txn: Client transaction identification code - this field is used to associate client transactions with a server transaction for forking proxies or B2BUAs. Upon forking, implementations can reuse the value they inserted into the topmost Via header's branch parameter, or they can generate a unique identification string for the client transaction. (See Section 9 of The Common Log Format (CLF) for the Session Initiation Protocol (SIP) [2] for usage.) Following the pointer/length pairs, several fixed-length fields are encoded. As before, all fields are completely filled, pre-pending values with '0' characters as necessary. Timestamp (10 bytes): Date and time of the request or response represented as the number of seconds since the Unix epoch (i.e. seconds since midnight, January 1st, 1970, GMT). Decimal encoded. Fractional Seconds (6 bytes): Fractional seconds portion of the Timestamp field to millisecond accuracy. Decimal encoded. Mandatory Field Data: Contains actual values for the mandatory fields. This data must appear in the order listed, and each field must be present. Fields are separated by a single ASCII Tab character (0x09). Any tab characters present in the data to be written will be replaced by an ASCII space character (0x20) prior to being logged. After the "Mandatory Fields" section, Tag/Length/Value groups appear zero or more times. The location within the log record is indicated by the "TLV Start Pointer" field. They are used to log information that is not mandatory for all messages (although specific TLVs are mandatory in request logs). Tag Field (4 bytes): indicates the type of value coded by this TLV; hexadecimal encoded. Currently defined tags are: 0x0000 - Contact value (can be repeated) Contains entire value of Contact header field Salgueiro, et al. Expires January 1, 2011 [Page 8] Internet-Draft Indexed File Format for SIP CLF June 2010 0x0001 - Remote Host (mandatory in request) The DNS name of the IP address from which the message was received (if "Sent/Received flag" is set to "r"). The DNS name of the IP address to which the message is being sent (if "Sent/ Received flag" is set to "s") 0x0002 - Authenticated User Contains the user name by which the user has been authenticated 0x0003 - Complete SIP Message (optional, should be omitted by default) Contains complete SIP message. Can be repeated multiple times to accommodate SIP messages that exceed 65535 bytes in length. Length Field (2 bytes): indicates the length of the value coded in this TLV, hexadecimal encoded. This length does NOT include the TLV header. Value Field (0 to 65535 bytes): contains the actual value of this TLV. As with the mandatory fields, ASCII Tab characters (0x09) are replaced with ASCII space characters (0x20). 4. Example Record NOTE: To be filled in later 5. Text Tool Considerations This format has been designed to allow text tools to easily process logs without needing to understand the indexing format. Index lines may be rapidly discarded by checking the first character of the line: index lines will always start with an alphabetical character, while field lines will start with a numerical character. Within a field line, script tools can quickly split fields at the tab characters. The first 12 fields are positional, and the meaning of any subsequent fields can be determined by checking the first four characters of the field. Alternately, these non-positional fields can be located using a regular expression. For example, the "Contact value" in a request can be found by searching for the perl regex /\t0000,....,([^\t]*)/. Note also that requests can be distinguished from responses by Salgueiro, et al. Expires January 1, 2011 [Page 9] Internet-Draft Indexed File Format for SIP CLF June 2010 checking the third positional field -- for requests, it will always be set to "000"; any other value indicates a response. 6. Security Considerations This document does not introduce any new security considerations beyond those in draft-ietf-sipclf-problem-statement [2]. 7. Operational Guidance SIP CLF log files will take up substantive amount of disk space depending on traffic volume at a processing entity and the amount of information being logged. As such, any enterprise using SIP CLF should establish operational procedures for file rollovers as appropriate to the needs of the organization. Listing such operational guidelines in this document is out of scope for this work. 8. IANA Considerations This document does not require any considerations from IANA. 9. Acknowledgements TBD 10. References 10.1. Normative References [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [2] Gurbani, V., Burger, E., Anjali, T., Abdelnur, H., and O. Festor, "The Common Log Format (CLF) for the Session Initiation Protocol (SIP)", draft-ietf-sipclf-problem-statement-02 (work in progress), June 2010. 10.2. Informative References [3] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Salgueiro, et al. Expires January 1, 2011 [Page 10] Internet-Draft Indexed File Format for SIP CLF June 2010 Session Initiation Protocol", RFC 3261, June 2002. [4] Gurbani, V., Burger, E., Anjali, T., Abdelnur, H., and O. Festor, "The Common Log File (CLF) format for the Session Initiation Protocol (SIP)", draft-gurbani-sipping-clf-01 (work in progress), March 2009. [5] Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", RFC 4474, August 2006. Authors' Addresses Gonzalo Salgueiro Cisco Systems 7200-12 Kit Creek Road Research Triangle Park, NC 27709 US Email: gsalguei@cisco.com Vijay Gurbani Bell Labs, Alcatel-Lucent 1960 Lucent Lane Rm 9C-533 Naperville, IL 60563 US Email: vkg@bell-labs.com Adam Roach Tekelec 17210 Campbell Rd. Suite 250 Dallas, TX 75252 US Email: adam@nostrum.com Salgueiro, et al. Expires January 1, 2011 [Page 11]