IPS
Internet Draft
Document: draft-ietf-ips-iscsi-02.txt
Category: standards-track

Julian Satran
Daniel Smith
Kalman Meth
IBM

Constantin Sapuntzakis
Cisco Systems

Matt Wakeley
Agilent Technologies

Paul Von Stamwitz
Adaptec

Randy Haagens
Hewlett-Packard Co.

Efri Zeidner
SANGate

Luciano Dalle Ore
Quantum

Yaron Klein
SANRAD

iSCSI

Julian Satran Standards-Track, Expire June 2001 1
iSCSI December 30, 2000

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

The Small Computer Systems Interface (SCSI) is a popular family of protocols for communicating with I/O devices, especially storage devices. This memo describes a transport protocol for SCSI that operates on top of TCP. The iSCSI protocol aims to be fully compliant with the requirements laid out in the SCSI Architecture Model - 2 [SAM2] document.

Acknowledgements

Besides the authors a large group of people contributed through their review, comments and valuable insights to the creation of this document - too many to mention them all. Nevertheless, we are grateful to all of them.
We are especially grateful to those that found the time and patience to participate in our weekly phone conferences and intermediate meetings in Almaden and Haifa and thus helped shape this document: Jim Hafner, John Hufferd, Prasenjit Sarkar, Meir Toledano, John Dowdy, Steve Legg, Alain Azagury (IBM), Dave Nagle (CMU), David Black (EMC), John Matze (Veritas), Mark Bakke, Steve DeGroote, Mark Shrandt (NuSpeed), Gabi Hecht (Gadzoox), Robert Snively (Brocade), Nelson Nachum (StorAge). Many more helped clean and improve this document within the IPS working group. We are especially grateful to David Robinson (Sun), Charles Monia, Joshua Tseng (Nishan), Somesh Gupta, Mallikarjun C., Michael Krause (HP), Stephen Byan (Genroco), Yaron Klein (SANRAD). And last but not least thanks Ralph Weber for keeping us in-line with T10 (SCSI) standardization.

Conventions used in this document

In examples, "I->" and "T->" indicate iSCSI PDUs sent by the initiator and target respectively.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119.

Table of Contents

Status of this Memo
Abstract
Acknowledgements
Conventions used in this document
1. Overview
  1.1 SCSI Concepts
  1.2 iSCSI Concepts & Functional Overview
    1.2.1 Layers & Sessions
    1.2.2 Ordering and iSCSI numbering
      1.2.2.1 Command numbering
      1.2.2.2 Response/Status numbering
      1.2.2.3 Data PDU numbering
    1.2.3 iSCSI Login
    1.2.4 Text mode negotiation
    1.2.5 iSCSI Full Feature Phase
    1.2.6 iSCSI Connection Termination
    1.2.7 Naming & mapping
    1.2.8 Message Framing
      1.2.8.1 Framing Justification
      1.2.8.2 Markers At Fixed Intervals
      1.2.8.3 iSCSI PDU Size
      1.2.8.4 Initial marker-less interval
2. iSCSI PDU Formats
  2.1 Template Header and Opcodes
    2.1.1 Opcode
    2.1.2 Opcode-specific fields
    2.1.3 Length
    2.1.4 LUN
    2.1.5 Initiator Task Tag
    2.1.6 Header Digest and Data Digest
  2.2 SCSI Command
    2.2.1 Flags & Task Attributes
    2.2.2 AddCDB
    2.2.3 CmdRN - Command Reference Number
    2.2.4 ExpStatRN - Expected Status Reference Number
    2.2.5 Expected Data Transfer Length
    2.2.6 CDB - SCSI Command Descriptor Block
    2.2.7 Command-Data
  2.3 SCSI Response
    2.3.1 Byte 1 - Flags
    2.3.2 Basic Residual Count
    2.3.3 Bidi-Read Residual Count
    2.3.4 Command Status
    2.3.5 Resp_length - Response length
    2.3.6 Sense_length - Length of sense data
    2.3.7 Response and/or Sense Data
    2.3.8 StatRN - Status Reference Number
    2.3.9 ExpCmdRN - next expected CmdRN from this initiator
    2.3.10 MaxCmdRN - maximum CmdRN acceptable from this initiator
  2.4 SCSI Task Management Command
    2.4.1 Function
    2.4.2 Referenced Task Tag
  2.5 SCSI Task Management Response
    2.5.1 Referenced Task Tag
  2.6 SCSI Data
    2.6.1 F (Final) bit
    2.6.2 Length
    2.6.3 Target Task Tag
    2.6.4 Buffer Offset
    2.6.5 Flags
    2.6.6 Data numbering (DataRN)
  2.7 Text Command
    2.7.1 Length
    2.7.2 Initiator Task Tag
    2.7.3 Text
  2.8 Text Response
    2.8.1 Length
    2.8.2 Initiator Task Tag
    2.8.3 Text Response
  2.9 Login Command
    2.9.1 Version-major and Version-minor
    2.9.2 CID
    2.9.3 InitCmdRN
    2.9.4 Login Parameters
  2.10 Login Response
    2.10.1 Version-major minor
    2.10.2 InitStatRN
    2.10.3 Status
    2.10.4 TSID
    2.10.5 Final bit
  2.11 NOP-Out
    2.11.1 P - Ping bit
    2.11.2 Length
    2.11.3 Initiator Task Tag
    2.11.4 Target Task Tag
    2.11.5 Ping Data
  2.12 NOP-In
    2.12.1 Target Task Tag
  2.13 Logout Command
    2.13.1 CID
    2.13.2 Reason Code
  2.14 Logout Response
    2.14.1 Status
  2.15 Ready To Transfer (R2T)
    2.15.1 Desired Data Transfer Length and Buffer Offset
    2.15.2 Target Transfer Tag
  2.16 Asynchronous Event
    2.16.1 iSCSI Event
    2.16.2 SCSI Event Indicator
  2.17 Third Party Commands
  2.18 Reject
  2.19 Reason
3. Login phase
  3.1 Login phase start
  3.2 Security negotiation
  3.3 iSCSI Security
4. iSCSI Error Handling and Recovery
  4.1 Connection failure
  4.2 Protocol Errors
  4.3 Session Errors
  4.4 Format errors
  4.5 Digest errors
5. Notes to Implementers
  5.1 Multiple Network Adapters
  5.2 Autosense
6. Security Considerations
  6.1 Data Integrity
  6.2 Network operations and the Threat Model
    6.2.1 Threat Model
      6.2.1.1 Passive Attacks
      6.2.1.2 Active Attacks
    6.2.2 Security Model
      6.2.2.1 No Security
      6.2.2.2 End-to-End Authentication
      6.2.2.3 iSCSI integrity and authentication
      6.2.2.4 Encryption
    6.2.3 Other Considerations
  6.3 Login Process
  6.4 Feasibility
7. IANA Considerations
8. References and Bibliography
9. Author's Addresses
Appendix A. iSCSI Security
  01 Security keys and values
  02 Authentication
  03 Salt
  04 Challenge
  05 Login Phase examples:
Appendix B. Examples
  06 Read operation example
  07 Write operation example
Appendix C. Login/Text keys (not security related)
  08 MaxConnections
  09 Target
  10 Initiator
  11 AccessID
  12 UPFrame
  13 UseR2T
  14 BidiUseR2T
  15 DataNumber
  16 ImmediateDataLength
  17 ITagLength
  18 PingMaxReplyLength
  19 StartSecure
  20 TotalText
  21 KeyValueText
  22 MaxOutstandingR2T
Full Copyright Statement

1. Overview

1.1 SCSI Concepts

The SCSI Architecture Model-2 [SAM2] describes in detail the architecture of the SCSI family of I/O protocols. This section provides a brief background to situate readers in the vocabulary of the SCSI architecture.

At the highest level, SCSI is a family of interfaces for requesting services from I/O devices, including hard drives, tape drives, CD and DVD drives, printers, and scanners. In SCSI parlance, an individual I/O device is called a "logical unit" (LU).

SCSI is a client-server architecture. Clients of a SCSI interface are called "initiators". Initiators issue SCSI "commands" to request service from a logical unit.
The "device server" on the logical unit accepts SCSI commands and executes them.

A "SCSI transport" maps the client-server SCSI protocol to a specific interconnect. Initiators are one endpoint of a SCSI transport. The "target" is the other endpoint. A "target" can have multiple Logical Units (LUs) behind it. Each logical unit has an address within a target called a Logical Unit Number (LUN).

A SCSI task is a SCSI command or possibly a linked set of SCSI commands. Some LUs support multiple pending (queued) tasks. The queue of tasks is managed by the target, though. The target uses an initiator provided "task tag" to distinguish between tasks. Only one command in a task can be outstanding at any given time.

Each SCSI command results in an optional data phase and a required response phase. In the data phase, information can travel from the initiator to target (e.g. WRITE), target to initiator (e.g. READ), or in both directions. In the response phase, the target returns the final status of the operation, including any errors. A response terminates a SCSI command.

For performance reasons iSCSI allows "phase-binding" - e.g., command and its associated data may be shipped together from initiator to target and data and responses may be shipped together from targets.

Command Data Blocks (CDB) are the data structures used to contain the command parameters to be handed by an initiator to a target. The CDB content and structure is defined by [SAM] and device-type specific SCSI standards.

1.2 iSCSI Concepts & Functional Overview

The iSCSI protocol is a mapping of the SCSI remote procedure invocation model on top of the TCP protocol.

In keeping with similar protocols, the initiator and target divide their communications into messages. This document will use the term "iSCSI protocol data unit" (iSCSI PDU) for these messages.

iSCSI transfer direction is defined with regard to the initiator.
Outbound or outgoing transfers are transfers from initiator to target while inbound or incoming transfers are from target to initiator.

1.2.1 Layers & Sessions

The following conceptual layering model is used in this document to specify initiator and target actions and how those relate to transmitted and received Protocol Data Units:

- the SCSI layer builds/receives SCSI CDB (Command Data Blocks) and relays/receives them with the remaining command execute parameters (cf. SAM-2) to/from the
- the iSCSI layer that builds/receives iSCSI PDUs and relays/receives them to/from
- one or more TCP connections that form an initiator-target "session".

Communication between initiator and target occurs over one or more TCP connections. The TCP connections carry control messages, SCSI commands, parameters and data within iSCSI Protocol Data Units (iSCSI PDUs). The group of TCP connections linking an initiator with a target form a session (loosely equivalent to a SCSI I-T nexus). A session is defined by a session ID (composed of an initiator part and a target part). TCP connections can be added and removed from a session. Connections within a session are identified by a connection ID (CID).

Across all connections within a session, an initiator will see one "target image". All target identifying elements, like LUN are the same. In addition, across all connections within a session a target will see one "initiator image". Initiator identifying elements like Initiator Task Tag can be used to identify the same entity regardless of the connection on which they are sent or received.

iSCSI targets and initiators MUST support at least one TCP connection and MAY support several connections in a session.

1.2.2 Ordering and iSCSI numbering

iSCSI uses Command, Status and Data numbering schemes.

Command numbering is session wide and is used for ordered command delivery over multiple connections.
It can also be used as a mechanism for command flow control over a session.

Status numbering is per connection and is used to enable recovery in case of connection failure.

Data numbering is per command and is meant to reduce the amount of memory needed by a target sending unrecoverable data for command retry.

Normally, fields in the iSCSI PDUs communicate the reference numbers between the initiator and target. During periods when traffic on a connection is unidirectional, iSCSI NOP-message PDUs may be utilized to synchronize the command and status ordering counters of the target and initiator. iSCSI NOP-Out PDUs are used as acknowledgements for data numbering.

1.2.2.1 Command numbering

iSCSI supports ordered command delivery within a session. All commands (initiator-to-target) are numbered.

Any SCSI activity is related to a task (SAM-2). The task is identified by the Initiator Task Tag for the life of the task.

Commands in transit from the initiator SCSI layer to the target SCSI layer are numbered by iSCSI and the number is carried by the iSCSI PDU as CmdRN (Command-Reference-Number). The numbering is session-wide. All iSCSI PDUs that have a task association carry this number. CmdRNs are allocated by the initiator iSCSI within a 32 bit unsigned counter (modulo 2**32). The value 0 is reserved and used to mean immediate delivery. Comparisons and arithmetic on CmdRN SHOULD use Serial Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32.

The target may choose to deliver some task management commands for immediate delivery. The means by which the SCSI layer may request immediate delivery for a command or by which iSCSI will decide by itself to mark a PDU for immediate delivery are outside the scope of this document.

CmdRNs are significant only during command delivery to the target. Once the device serving part of the target SCSI has received a command, CmdRN ceases to be significant.
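As an illustration of the Serial Number Arithmetic mentioned above, the following Python sketch compares and advances 32-bit CmdRN values. The function names are hypothetical, and skipping the reserved value 0 when the counter wraps is an assumption drawn from the statement that 0 means immediate delivery; the draft does not spell out the wrap behaviour here.

```python
SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS        # CmdRN is a counter modulo 2**32
HALF = 1 << (SERIAL_BITS - 1)     # 2**31, the RFC 1982 comparison horizon


def serial_lt(a: int, b: int) -> bool:
    """Return True if a precedes b under RFC 1982 serial number arithmetic.

    When the two values are exactly 2**31 apart the comparison is
    undefined in RFC 1982; this sketch returns False in that case.
    """
    return a != b and ((b - a) % MODULUS) < HALF


def next_cmdrn(current: int) -> int:
    """Advance CmdRN by 1 modulo 2**32.

    Assumption: the reserved value 0 (immediate delivery) is skipped
    when the counter wraps around.
    """
    nxt = (current + 1) % MODULUS
    return nxt if nxt != 0 else 1
```

With these helpers, a CmdRN of 0xFFFFFFFF is treated as preceding a CmdRN of 1, which is the property that keeps ordering intact across counter wrap-around.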
During command delivery to the target, the allocated numbers are unique session wide. The target iSCSI layer SHOULD deliver the commands to the target SCSI layer in the order specified by CmdRN.

The initiator and target are assumed to have three counters that define the allocation mechanism:

- CmdRN - the current command reference number advanced by 1 on each command shipped
- ExpCmdRN - the next expected command by the target - acknowledges all commands up to it
- MaxCmdRN - the maximum number to be shipped - MaxCmdRN - ExpCmdRN defines the queuing capacity of the receiving iSCSI layer.

The target SHOULD NOT transmit a MaxCmdRN that is more than 2**31 - 1 above the last ExpCmdRN. CmdRN can take any value from ExpCmdRN to MaxCmdRN except 0. The target MUST silently ignore any command outside this range or duplicates within the range not flagged with the retry bit (the X bit in the opcode).

The target and initiator counters MUST uphold causal ordering.

iSCSI initiators MUST implement the command numbering scheme if they support more than one connection per session (as even sessions with a single connection may be expanded beyond one connection). Command numbering for sessions that will only be made up of one connection is optional. iSCSI initiators utilizing a single connection for a session and not utilizing command numbering MUST indicate that they will not support command numbering by setting InitCmdRN to 0 in the Login command. Whenever an initiator indicates support for command numbering, by setting InitCmdRN to a non-zero value at Login, the target MUST provide ExpCmdRN and MaxCmdRN values that will enable the initiator to make progress.

1.2.2.2 Response/Status numbering

Responses in transit from the target to the initiator are numbered. The StatRN (Status Reference Number) is used for this purpose. StatRN is a counter maintained per connection.
ExpStatRN is used by the initiator to acknowledge status.

To enable command recovery the target MAY maintain enough state to enable data and status recovery after a connection failure. A target can discard all the state information maintained for recovery after the status delivery is acknowledged through ExpStatRN.

A large difference between StatRN and ExpStatRN may indicate a failed connection.

Initiators and Targets MUST support the response-numbering scheme regardless of the support for command recovery.

1.2.2.3 Data PDU numbering

Incoming Data PDUs MAY be numbered by a target to enable fast recovery of long running READ commands. Data PDUs are numbered with DataRN. NOP-Out PDUs carrying the same Initiator Tag as the Data PDUs are used to acknowledge the incoming Data PDUs with ExpDataRN. Support for Data PDU acknowledgement and the maximum number of unacknowledged data PDUs are negotiated at login. In a PDU carrying both data and status, the field is used for StatRN and the last set of data blocks is implicitly acknowledged when Status is acknowledged.

1.2.3 iSCSI Login

The purpose of iSCSI login is to enable a TCP connection for iSCSI use, authenticate the parties, negotiate the session's parameters, open a security association protocol and mark the connection as belonging to an iSCSI session.

A session is used to identify to a target all the connections with a given initiator that belong to the same I_T nexus. If an initiator and target are connected through more than one session each of the initiator and target perceives the other as a different entity on each session (a different I_T nexus in SAM-2 parlance).

The targets listen on a well-known TCP port for incoming connections. The initiator begins the login process by connecting to that well-known TCP port.

As part of the login process, the initiator and target MAY wish to authenticate each other and set a security association protocol for the session. This can occur in many different ways and is subject to negotiation.

Negotiation and security associations executed before the Login Command are outside the scope of this document although they might realize a related function (e.g., establish an IPsec or TLS session).

The Login Command starts the iSCSI Login Phase. Within the Login Phase, negotiation is carried on through parameters of the Login Command and Response and optionally through intervening Text Commands and Responses. The Login Response concludes the Login Phase. Once suitable authentication has occurred, the target MAY authorize the initiator to send SCSI commands. How the target chooses to authorize an initiator is beyond the scope of this document. The target indicates a successful authentication and authorization by sending a login response with "accept login". Otherwise, it sends a response with a "login reject", indicating a session is not established.

It is expected that iSCSI parameters will be negotiated after the security association protocol is established if there is a security association.

The login message includes a session ID - composed with an initiator part ISID and a target part TSID. For a new session, the TSID is null. As part of the response, the target will generate a TSID.

Session specific parameters can be specified only for the first login of a session (TSID null) (e.g., the maximum number of connections that can be used for this session). Connection specific parameters (if any) can be specified for any login. Thus, a session is operational once it has at least one connection.

Any message except login and text sent on a TCP connection before this connection gets into full feature phase at the initiator SHOULD be ignored by the initiator. Any message except login and text reaching a target on a TCP connection before the full feature phase MUST be silently ignored by the target.
1.2.4 Text mode negotiation

During login and thereafter some session or connection parameters are negotiated through an exchange of textual information.

In "list" negotiation, the offering party will send a list of values for a key in its order of preference.

The responding party will answer with a value from the list.

The value "none" MUST always be used to indicate a missing function. However, none is a valid selection only if it was explicitly offered and it MAY be selected by omission (i.e. <key>:none MAY be omitted).

The general format is:

Offer-> <key>:(<value1>,<value2>,...,<valuen>)
Answer-> <key>:<valuex>

In "numerical" negotiations, the offering and responding party state a numerical value. The result of the negotiation is key dependent (usually the lower or the higher of the two values).

1.2.5 iSCSI Full Feature Phase

Once the initiator is authorized to do so, the iSCSI session is in iSCSI full feature phase. The initiator may send SCSI commands and data to the various LUs on the target by wrapping them in iSCSI messages that go over the established iSCSI session.

For SCSI commands that require data and/or parameter transfer, the (optional) data and the status for a command must be sent over the same TCP connection that was used to deliver the SCSI command (we call this "connection allegiance"). Thus if an initiator issues a READ command, the target must send the requested data, if any, followed by the status to the initiator over the same TCP connection that was used to deliver the SCSI command. If an initiator issues a WRITE command, the initiator must send the data, if any, for that command and the target MUST return R2T, if any, and the status over the same TCP connection that was used to deliver the SCSI command. However consecutive commands that are part of a SCSI linked commands task MAY use different connections - connection allegiance is strictly per-command and not per-task.
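The connection allegiance rule can be pictured with a small bookkeeping sketch in Python. The class and method names are hypothetical and not part of the protocol; the sketch only assumes that each command is identified by its Initiator Task Tag and each connection by its CID, as described earlier.

```python
from typing import Dict


class AllegianceTracker:
    """Remembers which connection (CID) delivered each outstanding command."""

    def __init__(self) -> None:
        self._cid_by_tag: Dict[int, int] = {}

    def command_sent(self, task_tag: int, cid: int) -> None:
        # A later command of a linked task reuses the tag and may choose a
        # different connection: allegiance is per-command, not per-task.
        self._cid_by_tag[task_tag] = cid

    def check(self, task_tag: int, cid: int) -> bool:
        """True if a data, R2T or status PDU arrived on the command's connection."""
        return self._cid_by_tag.get(task_tag) == cid

    def status_received(self, task_tag: int, cid: int) -> bool:
        """Check allegiance for the status PDU, then forget the finished command."""
        ok = self.check(task_tag, cid)
        self._cid_by_tag.pop(task_tag, None)
        return ok
```

An implementation would decide for itself how to react when `check` returns False; the sketch only shows the bookkeeping.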
During iSCSI Full Feature Phase, the initiator and target MAY interleave unrelated SCSI commands, their SCSI Data and responses, over the session.

Outgoing SCSI data (initiator to target - user data or command parameters) will be sent as either solicited data or unsolicited data. Solicited data are sent in response to Ready To Transfer (R2T) PDUs. Unsolicited data can be part of an iSCSI command PDU ("immediate data") or an iSCSI data PDU. An initiator may send unsolicited data (immediate or in a separate PDU) up to the SCSI limit (initial burst size - mode page 02h). All subsequent data have to be solicited.

Targets operate in either solicited (R2T) data mode or unsolicited (non R2T) data mode. An initiator MUST always honor an R2T data request for a valid outstanding command (i.e., carrying a valid Initiator Task Tag) and provided the command is supposed to deliver outgoing data and the R2T specifies data within the command bounds.

It is considered an error for an initiator to send unsolicited data PDUs to a target operating in R2T mode (only solicited data). It is also an error for an initiator to send more data (whether immediate or as a separate PDU) than the SCSI limit for initial burst.

An initiator MAY request, at login, to send immediate data blocks of any size. If the initiator requests a specific block size the target MUST indicate the size of immediate data blocks it is ready to accept in its response. Besides iSCSI, SCSI also imposes a limit on the amount of unsolicited data a target is willing to accept. The iSCSI immediate data limit MUST NOT exceed the SCSI limit.

A target SHOULD NOT silently discard data and request retransmission through R2T. Initiators MUST NOT perform any score boarding for data and the residual count calculation is to be performed by the targets.

Incoming data is always implicitly solicited.
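To make the limits on outgoing data concrete, the sketch below splits a write of a given size into immediate, other unsolicited, and solicited portions. It is a minimal Python illustration under stated assumptions: the target accepts unsolicited data, the initiator sends as much unsolicited data as allowed, and the names `plan_write`, `initial_burst` and `immediate_limit` are hypothetical labels for the SCSI initial burst size and the negotiated iSCSI immediate data limit.

```python
from typing import NamedTuple


class WritePlan(NamedTuple):
    immediate: int    # bytes carried in the command PDU itself
    unsolicited: int  # bytes sent in separate data PDUs without an R2T
    solicited: int    # bytes that must wait for R2T requests


def plan_write(total: int, initial_burst: int, immediate_limit: int) -> WritePlan:
    """Split `total` outgoing bytes according to the unsolicited data limits."""
    if immediate_limit > initial_burst:
        # The iSCSI immediate data limit MUST NOT exceed the SCSI limit.
        raise ValueError("immediate data limit exceeds initial burst size")
    immediate = min(total, immediate_limit)
    burst = min(total, initial_burst)  # all unsolicited data, immediate included
    return WritePlan(immediate, burst - immediate, total - burst)
```

For example, a 100000 byte write with a 65536 byte initial burst and an 8192 byte immediate limit yields 8192 immediate bytes, 57344 further unsolicited bytes, and 34464 bytes that wait for R2T.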
SCSI Data packets are matched to their corresponding SCSI commands by using Tags that are specified in the protocol.

Initiator tags for pending commands are unique initiator-wide for a session. Target tags are not strictly specified by the protocol - it is assumed that those will be used by the target to tag (alone or in combination with the LUN) the solicited data. Target tags are generated by the target and "echoed" by the initiator. The above mechanisms are designed to accomplish efficient data delivery and a large degree of control over the data flow.

iSCSI initiators and targets MUST also enforce some ordering rules to achieve deadlock-free operation. Unsolicited data MUST be sent on every connection in the same order in which commands were sent. If the amount of data exceeds the amount allowed for unsolicited write data, the specific connection MUST be stalled - i.e., no more unsolicited data will be sent on this connection until the specific command has finished sending all its data and has received a response. However new commands can be sent on the connection. A target receiving data out of order or observing a connection violating the above rules MUST terminate the session.

Each iSCSI session to a target is treated as if it originated from a different and logically independent initiator.

1.2.6 iSCSI Connection Termination

Connection termination is assumed to be an exceptional event. Graceful TCP connection shutdowns are done by sending TCP FINs. Graceful connection shutdowns MUST only occur when there are no outstanding tasks that have allegiance to the connection. A target SHOULD respond rapidly to a FIN from the initiator by closing its half of the connection after waiting for all outstanding tasks that have allegiance to the connection to conclude and send their status. Connection termination with outstanding tasks may require recovery actions.
Connection termination is also required as a prelude to recovery. By terminating a connection before starting recovery, initiator and target can avoid receiving stale PDUs after recovery. In this case, the initiator will send a LOGOUT request on any of the operational connections of a session indicating what connection should be terminated.

1.2.7 Naming & mapping

Text string names are used in iSCSI to:

- explicitly provide a transportID for the target, to enable the latter to recognize the initiator, because the conventional IP-address and port pair is inaccurate behind firewalls and NAT devices (key - initiator)
- provide a targetID for simple configurations hiding several targets behind an IP-address and port (key - target)
- provide a symbolic address for source and destination targets in third party commands; those will be mapped into SCSI addresses by a SCSI aliasing mechanism

The targetID MUST be presented within the login phase.

The names do not require handling within iSCSI - i.e., they are opaque entities within this document. In order to enable implementers to relate them to other names and name handling mechanisms the following syntax for names SHOULD be used:

   <domain-name>[/modifier]

Where domain-name follows DNS (or dotted IP) rules and the modifier is an alphanumeric string (N.B. the whole pattern follows the URL structure).

Some mapped names for third party command use might have to include a port number. For those the following syntax SHOULD be used:

   <domain-name>[:port][/modifier]

The text to address transformation, wherever needed, will be performed through available name translation services (DNS servers, LDAP accessible directories etc.).
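
The name syntax of this section, read here as a domain-name optionally followed by ":port" and "/modifier", can be split into its parts with a few string operations. The following is a minimal sketch under that reading; the function name and the returned tuple are hypothetical, and no DNS or address validation of the domain-name is attempted.

```python
def parse_name(name):
    """Split '<domain-name>[:port][/modifier]' into (domain, port, modifier).

    port and modifier are None when absent. The layout of the returned
    tuple is an illustration only, not something the protocol defines.
    """
    host_part, slash, modifier = name.partition("/")
    domain, colon, port_text = host_part.partition(":")
    if not domain:
        raise ValueError("missing domain-name")
    port = None
    if colon:
        if not port_text.isdecimal():
            raise ValueError("port must be a decimal number")
        port = int(port_text)
    # The modifier, when present, is described as an alphanumeric string.
    if slash and not modifier.isalnum():
        raise ValueError("modifier must be alphanumeric")
    return domain, port, (modifier if slash else None)
```

For instance, parse_name("computingcenter.acme.com:4002/scanners") yields the domain-name, the port 4002 and the modifier "scanners" as separate values.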
To enable simple devices to operate without name-to-address conversion services the following conventions SHOULD be used:

A domain name that contains exactly four numbers separated by dots (.), where each number is in the range 0 through 255, will be interpreted as an IPv4 address.

A domain name that contains more than four, but at most 16 numbers separated by dots (.), where each number is in the range 0 through 255, will be interpreted as an IPv6 address.

Examples of IPv4 addresses/names:

   10.0.0.1/diskfarm1
   10.0.0.2

Examples of IPv6 addresses/names:

   12.5.7.10.0.0.1/tapefarm1
   12.5.6.10.0.0.2

For management/support tools as well as naming services that use a text prefix to express the protocol intended (as in http:// or ftp://) the following form MAY be used:

   iSCSI://<domain-name>[:port][/modifier]

Examples:

   iSCSI://diskfarm1.acme.com
   iSCSI://computingcenter.acme.com/diskfarm1
   iSCSI://computingcenter.acme.com:4002/scanners

When a target has to act as an initiator for a third party command, it MAY use the initiator name it learned during login as required by the authentication mechanism to the third party.

To address targets and logical units within a target, SCSI uses a fixed length (8 bytes) uniform addressing scheme; in this document, we call those addresses SCSI reference addresses (SRA).

To provide the target with the protocol specific addresses iSCSI relies on the SCSI aliasing mechanism (work in progress in T10). The aliasing support enables an initiator to associate protocol specific addresses with SRAs; the latter can be used in subsequent commands. For iSCSI, a protocol specific address is a TCP address and a selector.

1.2.8 Message Framing

1.2.8.1 Framing Justification

iSCSI presents a mapping of the SCSI protocol onto TCP. This encapsulation is accomplished by sending iSCSI PDUs that are of varying length.
Unfortunately, TCP does not have a built-in mechanism for signaling message boundaries at the TCP layer. iSCSI overcomes this obstacle by placing the message length in the iSCSI message header. This serves to delineate the end of the current message as well as the beginning of the next message.

In situations where IP packets are delivered in-order from the network, iSCSI message framing is not an issue (messages are processed one after the other). In the presence of IP packet reordering (e.g. frames being dropped), legacy TCP implementations store the "out of order" TCP segments in temporary buffers until the missing TCP segments arrive, upon which the data must be copied to the application buffers. In iSCSI it is desirable to steer the SCSI data within these out of order TCP segments into the pre-allocated SCSI buffers rather than store them in temporary buffers. This decreases the need for dedicated reassembly buffers as well as the latency and bandwidth related to extra copies.

Unfortunately, when relying solely on the "message length in the iSCSI message" scheme to delineate iSCSI messages, a missing TCP segment that contains an iSCSI message header (with the message length) makes it impossible to find message boundaries in subsequent TCP segments. The missing TCP segment(s) must be received before any of the following segments can be steered to the correct SCSI buffers (due to the inability to determine the iSCSI message boundaries). Since these segments cannot be steered to the correct location, they must be saved in temporary buffers that must then be copied to the SCSI buffers.

To reduce the amount of temporary buffering and copying, synchronization information (markers) is placed at fixed intervals in the TCP stream to enable accelerated iSCSI/TCP implementations to find and delineate iSCSI messages in the presence of IP packet reordering.

The use of markers is negotiable.
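
The length-based framing described above can be illustrated on an in-order byte stream. The following is a minimal sketch, assuming the header layout given in section 2 of this document (a 48-byte header with the Length field in bytes 4 to 7, and PDUs padded to a 4 byte boundary that the Length field does not count). The function name is hypothetical, the stream is assumed to start on a PDU boundary, and markers are assumed to be disabled.

```python
import struct

HEADER_LEN = 48  # size of the fixed iSCSI header in this draft


def split_pdus(stream):
    """Yield (header, data) for every complete PDU found in stream.

    stream is assumed to begin at a PDU boundary and to contain no markers.
    Digests, if any, are counted by Length and are returned as part of data.
    """
    pos = 0
    while pos + HEADER_LEN <= len(stream):
        # Length is a 32-bit big-endian integer at byte 4 of the header.
        (length,) = struct.unpack_from(">I", stream, pos + 4)
        end = pos + HEADER_LEN + length
        if end > len(stream):
            break  # the rest of this PDU has not arrived yet
        yield stream[pos:pos + HEADER_LEN], stream[pos + HEADER_LEN:end]
        # Skip the implied padding up to the next 4 byte boundary.
        pos = (end + 3) & ~3
```

The sketch also shows the weakness discussed in this section: every boundary is derived from the previous header, so one missing header makes all later boundaries unknown until the gap is filled.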
Initiator and target MAY indicate their readiness to receive and/or send markers, during login, separately for each connection. The default is NO. In certain environments a sender not willing to supply markers to a receiver willing to accept markers MAY suffer from a considerable performance degradation.

1.2.8.2 Markers At Fixed Intervals

At fixed intervals in the TCP byte stream, a "Marker" is inserted. This Marker indicates the offset to the next iSCSI message header. The Marker is eight bytes in length, and contains two 32-bit offset fields that indicate how many bytes to skip in the TCP stream to find the next iSCSI message header. There are two copies of the offset in the Marker to handle the case where the Marker straddles a TCP segment boundary.

Each end of the iSCSI session specifies during login the interval of the Marker it will be receiving, or disables the Marker altogether. If a receiver indicates that it desires a Marker, the sender SHOULD provide the Marker at the desired interval.

The marker interval (and the initial marker-less interval) are counted in terms of the TCP-sequence-number. Anything counted in the TCP sequence-number is counted for the interval and the initial marker-less interval.

Markers MUST point to a 4 byte word boundary in the TCP stream - the last 2 bits of each marker word are reserved and will be considered 0 for offset computation.

Padding iSCSI PDU payloads to 4 byte word boundaries simplifies marker manipulation.

1.2.8.3 iSCSI PDU Size

When a large iSCSI message is sent, the TCP segment(s) containing the iSCSI header may be lost. The remaining TCP segment(s) up to the next iSCSI message need to be buffered (in temporary buffers), since the iSCSI header that indicates to which SCSI buffers the data is to be steered was lost.
To minimize the amount of buffering, it is recommended that the iSCSI PDU size be restricted to a small value (perhaps a few TCP segments in length). Each end of the iSCSI session specifies during login the maximum size of an iSCSI PDU it will accept.

1.2.8.4 Initial marker-less interval

To enable the connection setup, including the login phase negotiation, the negotiated marking will be started at a negotiated boundary in the stream. The marker-less interval will not be less than 64 kbytes and the default will be 64 kbytes.

2. iSCSI PDU Formats

All multi-byte integers specified in formats defined in this document are to be represented in network byte order (i.e., big endian). Any bits not defined should be set to zero.

2.1 iSCSI PDU length and padding

iSCSI PDUs are padded to an integer number of 4 byte words.

2.2 Template Header and Opcodes

All iSCSI PDUs begin with a 48-byte header. Additional data appears, as necessary, beginning with byte 48. The fields of Opcode and Length appear in all iSCSI PDUs. In addition, the Initiator Task Tag, Logical Unit Number, and Flags fields, when used, always appear in the same location in the header.

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode        |X| Opcode-specific fields                      |
   |               |P|                                             |
   +---------------+---------------+---------------+---------------+
  4| Length of Data (after 48 byte Header)                         |
   +---------------+---------------+---------------+---------------+
  8| LUN or Opcode-specific fields                                 |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag or Opcode-specific fields                  |
   +---------------+---------------+---------------+---------------+
 20/ Opcode-specific fields                                        /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 48| Header digest (optional-constant-length)                      |
   +---------------------------------------------------------------+
 +n/                                                               /
  +/ Data (optional)                                               /
   +---------------------------------------------------------------+
  m| Data digest (optional-variable-length)                        |
   +---------------------------------------------------------------+

2.2.1 Opcode

The Opcode indicates what type of iSCSI PDU the header encapsulates. The Opcode is further encoded as follows:

   b7    Response
   b6-0  Operation

The opcodes are divided into two categories: initiator opcodes and target opcodes. Initiator opcodes are in PDUs sent by the initiators, and target opcodes are in PDUs sent by the target. The initiator MUST NOT send target opcodes and the target MUST NOT send initiator opcodes.

Target opcodes are also called responses and are distinguished by having the Response bit (bit 7) set to 1.
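
The fixed part of the template header can be decoded with one unpack call. The following is a minimal sketch, not a reference decoder: it takes b7 of the Opcode as the Response bit (consistent with the b7/b6-0 encoding above and with target opcodes starting at 0x80), reads bit 7 of byte 1 as the X or P bit, and treats the LUN and Initiator Task Tag positions as valid even though some opcodes reuse them for other purposes. The function name and dictionary keys are hypothetical.

```python
import struct


def parse_template_header(header):
    """Decode the fields that sit at fixed offsets in the 48-byte header."""
    if len(header) < 48:
        raise ValueError("an iSCSI header is 48 bytes long")
    # Big-endian: opcode (1 byte), 3 bytes skipped, Length (4), LUN (8),
    # Initiator Task Tag (4).
    opcode, length, lun, task_tag = struct.unpack_from(">B3xIQI", header, 0)
    return {
        "opcode": opcode,
        "is_response": bool(opcode & 0x80),   # b7: sent by a target
        "operation": opcode & 0x7F,           # b6-0
        "x_or_p_bit": bool(header[1] & 0x80), # bit 7 of the second byte
        "length": length,                     # bytes following the header
        "lun": lun,
        "initiator_task_tag": task_tag,
    }
```

An implementation would normally dispatch on the opcode before interpreting the LUN and tag fields, since their meaning is opcode-specific.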

Valid initiator opcodes defined in this specification are:

   0x00 NOP-Out (from initiator to target)
   0x01 SCSI Command (encapsulates a SCSI Command Descriptor Block)
   0x02 SCSI Task Management Command
   0x03 Login Command
   0x04 Text Command
   0x05 SCSI Data (for WRITE operation)
   0x06 Logout Command

Valid target opcodes are:

   0x80 NOP-In (from target to initiator)
   0x81 SCSI Response (contains SCSI status and possibly sense information or other response information)
   0x82 SCSI Task Management Response
   0x83 Login Response
   0x84 Text Response
   0x85 SCSI Data (for READ operation)
   0x86 Logout Response
   0x90 Ready To Transfer (R2T - sent by target to initiator when it is ready to receive data from initiator)
   0x91 Asynchronous Event (sent by target to initiator to indicate certain special conditions)
   0xef Reject

Initiator opcodes 0x70-0x7f and target opcodes 0xf0-0xff are vendor specific codes.

2.2.2 Opcode-specific fields

These fields have different meanings for different messages. Bit 7 of the second byte is used as a retry indicator for commands (X bit) or Poll bit (P bit) and must be 0 in all other iSCSI PDUs.

2.2.3 Length

The Length field indicates the number of bytes, beyond the first 48 bytes, that are being sent together with this message header. The length includes the header and data digests if any. It is anticipated that most iSCSI PDUs (not counting data transfer PDUs) will not need more than the 48 byte header.

The length field accounts for proper iSCSI PDU content; whatever padding is required to reach a 4 byte boundary in the TCP stream is implied by the protocol but not accounted for in the length field.

2.2.4 LUN

Some opcodes operate on a specific Logical Unit. The Logical Unit Number (LUN) field identifies which Logical Unit. If the opcode does not relate to a Logical Unit, this field either is ignored or may be used for some other purpose. The LUN field is 64-bits in accordance with [SAM2].
The exact format of this field can be found in the [SAM2] document.

2.2.5 Initiator Task Tag

The initiator assigns a Task Tag to each SCSI task that it issues. This tag is a session-wide unique identifier that can be used to uniquely identify the Task.

2.2.6 Header Digest and Data Digest

Optional header and data digests protect the integrity and authenticity of header and data, respectively. The digests, if present, appear as trailers located, respectively, after the header and PDU-specific data.

The digest types are negotiated during the login phase.

The separation of the header and data digests is useful in iSCSI routing applications, where only the header changes when a message is forwarded. In this case, only the header digest should be re-calculated.

2.3 SCSI Command

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| 0x01          |X|R|W|0 0|ATTR | Reserved (0)  | AddCDB        |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Expected Data Transfer Length                                 |
   +---------------+---------------+---------------+---------------+
 24| CmdRN                                                         |
   +---------------+---------------+---------------+---------------+
 28| ExpStatRN                                                     |
   +---------------+---------------+---------------+---------------+
 32/ SCSI Command Descriptor Block (CDB)                           /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 48/ Command Data (Command Dependent)                              /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

2.3.1 Flags & Task Attributes

The flags field for a SCSI Command is:

   b7    Retry (X)
   b6    (R) set to 1 when input data is expected
   b5    (W) set to 1 when output data is expected
   b3-4  Reserved (MUST be 0)
   b0-2  used to indicate Task Attributes

The Task Attributes (ATTR) can have one of the following integer values (see [SAM2] for details):

   0 Untagged
   1 Simple
   2 Ordered
   3 Head of Queue
   4 ACA

2.3.2 AddCDB

Additional CDB length (over 16) in units of 4 bytes.

2.3.3 CmdRN - Command Reference Number

Enables ordered delivery across multiple connections in a single session.

2.3.4 ExpStatRN - Expected Status Reference Number

Command responses up to ExpStatRN-1 (mod 2**32) have been received (acknowledges status) on the connection.

2.3.5 Expected Data Transfer Length

For unidirectional operations, the Expected Data Transfer Length field states the number of bytes of data involved in this SCSI operation. For a WRITE operation, the initiator uses this field to specify the number of bytes of data it expects to transfer for this operation. For a READ operation, the initiator uses this field to specify the number of bytes of data it expects the target to transfer to the initiator. It corresponds to the SAM-2 byte count.

For bi-directional operations, this field states the number of data bytes involved in the outbound transfer. For bi-directional operations, an additional field indicating the Expected Bidi-Read Data Transfer Length follows the (possibly extended) CDB as shown below:

   +---------------+---------------+---------------+---------------+
 48/ Additional CDB (if any)                                       /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 +n| Expected Bidi-Read Data Transfer Length                       |
   +---------------------------------------------------------------+
 +4/ Immediate data (optional)                                     /
   /                                                               /
   +---------------------------------------------------------------+

If no data will be transferred in SCSI Data packets for this SCSI operation, this field should be set to zero.

Upon completion of a data transfer, the target will inform the initiator of how many bytes were actually processed (sent or received) by the target. This will be done through residual counts.

2.3.6 CDB - SCSI Command Descriptor Block

There are 16 bytes in the CDB field to accommodate the commonly used CDB. Whenever larger CDBs are used, the CDB spillover MAY extend beyond the 48-byte header.

2.3.7 Command-Data

Some SCSI commands require additional parameter data to accompany the SCSI command. This data may be placed beyond the 48-byte boundary of the iSCSI header. Alternatively, user data (as from a WRITE operation) can be placed in the same PDU (both cases referred to as immediate data).

2.4 SCSI Response

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| 0x81          |Rsvd   |o|u|O|U| Reserved (0)                  |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Reserved (0)                                                  |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Basic Residual Count                                          |
   +---------------+---------------+---------------+---------------+
 24| StatRN                                                        |
   +---------------+---------------+---------------+---------------+
 28| ExpCmdRN                                                      |
   +---------------+---------------+---------------+---------------+
 32| MaxCmdRN                                                      |
   +---------------+---------------+---------------+---------------+
 36| Command Status| Reserved (0)                                  |
   +---------------+---------------+---------------+---------------+
 40| Resp_length                   | Sense_length                  |
   +---------------+---------------+---------------+---------------+
 44| Bidi-Read Residual Count                                      |
   +---------------+---------------+---------------+---------------+
 48/ Response and/or sense Data (optional)                         /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

2.4.1 Byte 1 - Flags

   b0    (U) set for Residual Underflow. In this case, the Basic Residual Count indicates how many bytes were not transferred out of those expected to be transferred.
   b1    (O) set for Residual Overflow. In this case, the Basic Residual Count indicates how many bytes could not be transferred because the initiator's Expected Data Transfer Length was too small.
   b2    (u) same as b0 but for the read-part of a bi-directional operation
   b3    (o) same as b1 but for the read-part of a bi-directional operation
   b4-7  not used (SHOULD be set to 0)

Bits O and U are mutually exclusive and so are bits o and u.

2.4.2 Basic Residual Count

The Basic Residual Count field is valid only if either the U bit or the O bit is set. If neither bit is set, the Basic Residual Count field SHOULD be zero. If the U bit is set, the Basic Residual Count indicates how many bytes were not transferred out of those expected to be transferred. If the O bit is set, the Basic Residual Count indicates how many bytes could not be transferred because the initiator's Expected Data Transfer Length was too small.

2.4.3 Bidi-Read Residual Count

The Bidi-Read Residual Count field is valid only if either the u bit or the o bit is set. If neither bit is set, the Bidi-Read Residual Count field SHOULD be zero. If the u bit is set, the Bidi-Read Residual Count indicates how many bytes were not transferred in out of those expected to be transferred. If the o bit is set, the Bidi-Read Residual Count indicates how many bytes could not be transferred in because the initiator's Expected Bidi-Read Transfer Length was too small.
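
The relation between the flag bits of byte 1 and the two residual count fields can be sketched as follows. This is an illustration only: the function name, the result keys and the choice to raise an exception on mutually exclusive bits are assumptions made for the example, not behavior required by this document.

```python
def decode_residuals(flags, basic_count, bidi_read_count):
    """Interpret the residual counts of a SCSI Response using its flags byte."""
    under = bool(flags & 0x01)       # b0 (U)
    over = bool(flags & 0x02)        # b1 (O)
    bidi_under = bool(flags & 0x04)  # b2 (u)
    bidi_over = bool(flags & 0x08)   # b3 (o)
    # O and U are mutually exclusive, and so are o and u.
    if under and over:
        raise ValueError("U and O bits are both set")
    if bidi_under and bidi_over:
        raise ValueError("u and o bits are both set")
    # A count is meaningful only when one of its two bits is set.
    return {
        "underflow": basic_count if under else None,
        "overflow": basic_count if over else None,
        "bidi_read_underflow": bidi_read_count if bidi_under else None,
        "bidi_read_overflow": bidi_read_count if bidi_over else None,
    }
```

A result with all four values set to None corresponds to a transfer that moved exactly the expected number of bytes in both directions.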

2.4.4 Command Status

The Command Status field is used to report the SCSI status of the command (as specified in [SAM2]).

2.4.5 Resp_length - Response length

2.4.6 Sense_length - Length of sense data

2.4.7 Response and/or Sense Data

iSCSI targets MUST support and enable autosense. If the Command Status was CHECK CONDITION (0x02), then the Response and/or Sense Data field will contain sense data for the failed command after the response data. Some sense codes will relate to iSCSI check conditions (e.g. excessive number of outstanding commands, immediate data blocks too large etc.).

The Length parameters specify the number of bytes in each section of this field. If no error occurred, and no data is needed for the response to the SCSI Command, the length field is zero. If both Response Data and Sense Data are present, the Response Data precedes the Sense Data.

2.4.8 StatRN - Status Reference Number

StatRN is a reference number that the target iSCSI layer generates per connection and that in turn enables the initiator to acknowledge status reception. StatRN is incremented by 1 for every response/status sent on a connection.

2.4.9 ExpCmdRN - next expected CmdRN from this initiator

ExpCmdRN is a reference number that the target iSCSI returns to the initiator to acknowledge command reception. It is used to update a local counter with the same name.

2.4.10 MaxCmdRN - maximum CmdRN acceptable from this initiator

MaxCmdRN is a reference number that the target iSCSI returns to the initiator to indicate the maximum CmdRN the initiator can send. It is used to update a local counter with the same name.
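
The update of the local ExpCmdRN and MaxCmdRN counters, using the wrap-around comparison that the processing rules of this section call for, can be sketched as follows. This is a minimal sketch that assumes the usual 32-bit serial number comparison (a is less than b when b is ahead of a by 1 to 2**31-1, modulo 2**32); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def serial_lt(a, b):
    """True if a precedes b in 32-bit serial arithmetic."""
    return a != b and ((b - a) & MASK32) < 0x80000000


def update_window(cur_exp, cur_max, pdu_exp, pdu_max):
    """Apply the ExpCmdRN/MaxCmdRN of a received PDU to the local counters.

    Returns the new (ExpCmdRN, MaxCmdRN) pair.
    """
    # A PDU whose MaxCmdRN is behind its own ExpCmdRN is ignored entirely.
    if serial_lt(pdu_max, pdu_exp):
        return cur_exp, cur_max
    # Each counter only moves forward; stale values are ignored.
    new_max = cur_max if serial_lt(pdu_max, cur_max) else pdu_max
    new_exp = cur_exp if serial_lt(pdu_exp, cur_exp) else pdu_exp
    return new_exp, new_max
```

Because each counter is only advanced and never moved back, responses that arrive out of order over different TCP connections cannot shrink the window the initiator has already learned.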

MaxCmdRN and ExpCmdRN are processed as follows:

- if the PDU MaxCmdRN is less than the PDU ExpCmdRN (in Serial Arithmetic Sense and with a difference bounded by 2**31-1) they are both ignored
- if the PDU MaxCmdRN is less than the current MaxCmdRN (in Serial Arithmetic Sense and with a difference bounded by 2**31-1) it is ignored, else it updates MaxCmdRN
- if the PDU ExpCmdRN is less than the current ExpCmdRN (in Serial Arithmetic Sense and with a difference bounded by 2**31-1) it is ignored, else it updates ExpCmdRN

This sequence is required as updates may arrive out of order (they travel on different TCP connections).

2.5 SCSI Task Management Command

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| 0x02          |0| Function    | Reserved (0)                  |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Referenced Task Tag or Reserved (0)                           |
   +---------------+---------------+---------------+---------------+
 24| CmdRN                                                         |
   +---------------+---------------+---------------+---------------+
 28| ExpStatRN                                                     |
   +---------------+---------------+---------------+---------------+
 32/ Reserved (0)                                                  /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 48

2.5.1 Function

The Task Management functions provide an initiator with a way to explicitly control the execution of one or more Tasks. The Task Management functions are summarized as follows (for a more detailed description see the [SAM2] document):

   1 Abort Task---aborts the task identified by the Referenced Task Tag field.
   2 Abort Task Set---aborts all Tasks issued by this initiator on the Logical Unit.
   3 Clear ACA---clears the Auto Contingent Allegiance condition.
   4 Clear Task Set---aborts all Tasks (from all initiators) for the Logical Unit.
   5 Logical Unit Reset
   6 Target Warm Reset
   7 Target Cold Reset

For the functions above a SCSI Task Management Response MUST be returned, using the Initiator Task Tag to identify the operation for which it is responding.

For the <Clear Task Set>, if SCSI control mode enables AE reporting, the target MUST send an Asynchronous Event to all other attached initiators to inform them that all pending tasks are cancelled and then enter the ACA state for any initiator for which it had pending tasks.

For the <Target Warm Reset> and <Target Cold Reset> functions, the target cancels all pending operations; both are equivalent to the Target Reset as specified by SAM-2. Provided that SCSI control mode enables AE reporting, the target MUST send an Asynchronous Event to all attached initiators notifying them that the target is being reset. In addition, for the <Target Warm Reset> the target will enter the ACA state on all sessions and all LUs on which an AE was sent. In addition, for the <Target Cold Reset> the target then MUST terminate all of its TCP connections to all initiators (all sessions are terminated). However, if the target finds that it cannot send the required response or AEN, it MUST continue the reset operation and it SHOULD log the condition for later retrieval. The logging operation MUST be reported through the target MIB.

Further actions on reset functions are specified in the relevant SCSI documents for the specific class of devices.

2.5.2 Referenced Task Tag

Initiator Task Tag of the task to be aborted - for abort task

2.6 SCSI Task Management Response

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| 0x82          |0| Reserved (0)                                |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Referenced Task Tag or Reserved (0)                           |
   +---------------+---------------+---------------+---------------+
 24| StatRN                                                        |
   +---------------+---------------+---------------+---------------+
 28| ExpCmdRN                                                      |
   +---------------+---------------+---------------+---------------+
 32| MaxCmdRN                                                      |
   +---------------+---------------+---------------+---------------+
 36| Response      | Reserved (0)                                  |
   +---------------+---------------+---------------+---------------+
 40/ Reserved (0)                                                  /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 48

For the functions , the target performs the requested Task Management function and sends a SCSI Task Management Response back to the initiator. The target provides a Response, which may take on the following values:

   0   Function Complete
   1   No Task Found
   255 Function Rejected

For the <Target Warm Reset> and <Target Cold Reset> functions, the target cancels all pending operations. If SCSI control mode enables AE reporting, the target MUST send an Asynchronous Event to all attached initiators notifying them that the target has been reset. For the <Target Cold Reset> the target MUST then close all of its TCP connections to all initiators (terminates all sessions).

2.6.1 Referenced Task Tag

Initiator Task Tag of the task not found

2.7 SCSI Data

The typical data transfer specifies the length of the data payload, the Transfer Tag provided by the receiver for this data transfer, and a buffer offset. The typical SCSI Data packet for WRITE (from initiator to target) has the following format:

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| 0x05          |F| Reserved (0)                                |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| LUN or Reserved (0)                                           |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Target Task Tag (solicited) or Reserved (0) (unsolicited)     |
   +---------------+---------------+---------------+---------------+
 24| Reserved (0)                                                  |
   +---------------+---------------+---------------+---------------+
 28| ExpStatRN                                                     |
   +---------------+---------------+---------------+---------------+
 32/ Reserved (0)                                                  /
   /                                                               /
   +---------------+---------------+---------------+---------------+
 40| Buffer Offset                                                 |
   +---------------+---------------+---------------+---------------+
 44| Reserved (0)                                                  |
   +---------------+---------------+---------------+---------------+
 48/ Payload                                                       /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

The typical SCSI Data packet for READ (from target to initiator) has the following format:

 Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| 0x85          |P| (0)   |S|O|U| Reserved (0)                  |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Reserved (0)                                                  |
   +---------------+---------------+---------------+---------------+
 12| Reserved (0)                                                  |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Residual Count                                                |
   +---------------+---------------+---------------+---------------+
 24| DataRN /StatRN                                                |
   +---------------+---------------+---------------+---------------+
 28| ExpCmdRN                                                      |
   +---------------+---------------+---------------+---------------+
 32| MaxCmdRN                                                      |
   +---------------+---------------+---------------+---------------+
 36| Command Status| Reserved (0)                                  |
   +---------------+---------------+---------------+---------------+
 40| Buffer Offset                                                 |
   +---------------+---------------+---------------+---------------+
 44| Reserved (0)                                                  |
   +---------------+---------------+---------------+---------------+
 48/ Payload                                                       /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

2.7.1 F (Final) bit

This bit is 1 for the last PDU of immediate data or the last PDU of a sequence answering a R2T.

2.7.2 Length

The length field specifies the total number of bytes in the following payload.

2.7.3 Target Task Tag

The Target Task Tag is provided to the target if the transfer is honoring a R2T. In this case, the Target Task Tag field is a replica of the Target Task Tag provided with the R2T.
The Target Task Tag values are not specified by this protocol except that the all-bits-one value (0x'ffffffff') is reserved and means that the Target Task Tag is not supplied. If the Target Task Tag is provided, then the LUN field MUST hold a valid value that is consistent with whatever was specified with the command; otherwise the LUN field is reserved.

2.7.4 Buffer Offset

The Buffer Offset field contains the offset of the following data against the complete data transfer. The sum of the buffer offset and length should not exceed the expected transfer length for the command.

2.7.5 Flags

The last SCSI Data packet sent from a target to an initiator for a particular SCSI command that completed successfully may optionally also contain the Command Status for the data transfer. In this case Sense Data cannot be sent together with the Command Status. If the command completed with an error, then the response and sense data must be sent in a SCSI Response packet and must not be sent in a SCSI Data packet.

   b0-1  as in an ordinary SCSI Response
   b2    S (status) - set to indicate that the Command Status field contains status
   b3-6  not used (should be set to 0)
   b7    P (poll) - set to indicate data acknowledgement is requested; b7 and b2 are mutually exclusive - if S bit is set P bit MUST be ignored

If the S bit is set, then the extra fields in the SCSI Data packet (StatRN, Command Status, Residual Count) are meaningful.

2.7.6 Data numbering (DataRN)

On inbound data, the target MAY number (sequence) the data packets to enable shorter recovery on connection failure. In case the target numbers data packets, the initiator MUST acknowledge them by specifying the next expected packet in a NOP-Out with the same Initiator Tag. Acknowledging NOP PDUs MAY be postponed for up to the number of incoming data PDUs negotiated at login. An explicit request for acknowledgement made by setting the P bit MUST be honored.

2.8 Text Command

The Text Command is provided to allow the exchange of information and
for future extensions. It permits the initiator to inform a target of
its capabilities or to request some special operations.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x04          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20| Reserved (0)                                                  |
    +---------------+---------------+---------------+---------------+
  24| CmdRN                                                         |
    +---------------+---------------+---------------+---------------+
  28| ExpStatRN                                                     |
    +---------------+---------------+---------------+---------------+
  32/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Text                                                          /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

2.8.1 Length

This is the length, in bytes, of the Text field.

2.8.2 Initiator Task Tag

The initiator-assigned identifier for this Text Command. If the
command is sent as part of the Login Phase, the Initiator Task Tag
MUST be the same as the one sent with the Login Command.

2.8.3 Text

The initiator sends the target a set of key:value or key:(list) pairs
encoded in UTF-8 Unicode. The key and value are separated by a ':'
(0x3A) delimiter. Many key:value pairs can be included in the Text
block by separating them with null (0x00) delimiters. Character
strings are represented following the C-language syntax. Numeric and
binary values are represented using either decimal numbers or the
hexadecimal 0x'ffff' notation.
The result is adjusted to the specific key. Some basic key:value pairs
are described in Appendices A and C.

The target responds by sending its response back to the initiator.
The target and initiator can then perform some advanced operations
based on their common capabilities.

Manufacturers may introduce new keys by prefixing them with their
(reversed) domain name. For example, the company owning the domain
acme.com can issue:

   com.acme.bar.foo.do_something:0000000000000003

Any key that the target does not understand may be ignored without
affecting basic function. Once the target has processed all the
key:value or key:(list) pairs, it responds with the Text Response
command, listing the parameters that it supports.

It is recommended that Text operations that will take a long time be
placed in their own Text command.

If the Text Response does not contain a key that was requested, the
initiator must assume that the key was not understood by the target.

Targets and initiators may limit the size of the text accepted in a
text command and text response as well as the size of key:value
pairs. Such limits should be indicated at login. The default limit is
16384 UTF-8 characters.

2.9 Text Response

The Text Response message contains the responses of the target to the
initiator's Text Command. The format of the Text field matches that of
the Text Command.
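Because the Text Command and the Text Response share one text format,
a single encoder and decoder can serve both. The following is a
minimal sketch under stated assumptions: pairs are joined with a
single 0x00 byte, a trailing 0x00 is tolerated on input, a key is
split from its value at the first ':' only, and the size limit is
counted in characters. The function names encode_text and decode_text
are hypothetical, and key:(list) values are passed through as plain
strings.

```python
DEFAULT_TEXT_LIMIT = 16384  # default limit stated in 2.8.3


def encode_text(pairs: dict, limit: int = DEFAULT_TEXT_LIMIT) -> bytes:
    """Encode key:value pairs as UTF-8 text separated by 0x00."""
    items = []
    for key, value in pairs.items():
        if ":" in key or "\x00" in key or "\x00" in value:
            raise ValueError("delimiter character inside key or value")
        items.append(f"{key}:{value}")
    text = "\x00".join(items)
    if len(text) > limit:
        raise ValueError("text exceeds the negotiated size limit")
    return text.encode("utf-8")


def decode_text(data: bytes) -> dict:
    """Decode a Text field into a dictionary of key:value pairs."""
    pairs = {}
    for item in data.decode("utf-8").split("\x00"):
        if not item:
            continue  # assumption: empty items (trailing 0x00) are skipped
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("item without ':' delimiter")
        pairs[key] = value
    return pairs
```

For example, encode_text({"UseR2T": "no"}) yields the bytes of
UseR2T:no, and decode_text applied to that result returns the original
dictionary.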

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x84          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20| Reserved (0)                                                  |
    +---------------+---------------+---------------+---------------+
  24| StatRN                                                        |
    +---------------+---------------+---------------+---------------+
  28| ExpCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  32| MaxCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  36/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Text Response                                                 /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

2.9.1 Length

This is the length, in bytes, of the Text Response field.

2.9.2 Initiator Task Tag

The Initiator Task Tag matches the tag used in the initial Text
Command or the Login Initiator Task Tag.

2.9.3 Text Response

The Text Response field contains responses in the same key:value
format as the Text Command. Appendix C lists some basic Text Commands
and their Responses. If the Text Response does not contain a key that
was requested, the initiator must assume either that the key was not
understood by the target or that the answer is :none; the two cases
MUST be treated as equivalent where applicable.

2.10 Login Command

After establishing a TCP connection between an initiator and a target,
the initiator MUST issue a Login Command to gain further access to the
target's resources.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x03          |0| Reserved (0)| Version-major | Version-minor |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8| CID                           | Reserved (0)                  |
    +---------------+---------------+---------------+---------------+
  12| ISID                          | TSID                          |
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20| Reserved (0)                                                  |
    +---------------+---------------+---------------+---------------+
  24| InitCmdRN or 0                                                |
    +---------------+---------------+---------------+---------------+
  28/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Login Parameters in Text Command Format                       /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

2.10.1 Version-major and Version-minor

Currently 0.3

2.10.2 CID

A unique id for this connection within the session.

2.10.3 InitCmdRN

InitCmdRN is significant only if TSID is zero, in which case it
indicates the starting Command reference number for this session; it
SHOULD be zero for all other instances. If it is significant (TSID is
0) and the value is zero, then this is a single-connection session
with no support for command numbering.

2.10.4 Login Parameters

The initiator MAY provide some basic parameters in order to enable the
target to determine if the initiator may in fact use the target's
resources and the initial text parameters for the security exchange.
The format of the parameters is as specified for the Text Command.
Keys and their explanations are listed in the Appendices.

2.11 Login Response

The Login Response indicates the end of the login phase.
Note that, if security is established, the login response is
authenticated.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x83          |F| Reserved (0)| Version-major | Version-minor |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  12| ISID                          | TSID                          |
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20| Reserved (0)                                                  |
    +---------------+---------------+---------------+---------------+
  24| InitStatRN                                                    |
    +---------------+---------------+---------------+---------------+
  28| ExpCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  32| MaxCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  36| Status        | Reserved (0)                                  |
    +---------------+---------------+---------------+---------------+
  40/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Login Parameters in Text Command Format                       /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

2.11.1 Version-major and Version-minor

Indicates the version supported. Assuming versions are backward
compatible, it indicates the highest (compatible) version supported by
the target.

2.11.2 InitStatRN

This is the starting status reference number for this connection.

2.11.3 Status

The Status returned in a Login Response is one of the following:

   0  accept login (will now accept SCSI commands)
   1  reject login

In the case that the Status is "accept login" the initiator may
proceed to issue SCSI commands.

In the case that the Status is "reject login" the initiator should
immediately close down its end of the TCP connection, thus freeing up
the target's port for some other connection. The target also has the
option of immediately closing down its end of the TCP connection.

2.11.4 TSID

The TSID is an initiator identifying tag set by the target. A 0 in the
returned TSID indicates that either the target supports only a single
connection or that the ISID has already been used as a leading ISID.
In both cases, the target is rejecting the login.

2.11.5 Final bit

Final bit is set to one in the Final Login Response. A Final bit of 0
indicates a "partial" response - more negotiation needed. TSID must be
returned in the partial response and the same value must be presented
with the final response.

2.12 NOP-Out

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x00          |P| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag or Reserved (0)                            |
    +---------------+---------------+---------------+---------------+
  20| Target Tag or Reserved (0x'ffffffff')                         |
    +---------------+---------------+---------------+---------------+
  24| CmdRN or (0)                                                  |
    +---------------+---------------+---------------+---------------+
  28| ExpStatRN or (0)                                              |
    +---------------+---------------+---------------+---------------+
  32| ExpDataRN or (0)                                              |
    +---------------+---------------+---------------+---------------+
  36/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Ping Data (optional)                                          /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

The NOP-Out with the P bit set
acts as a "ping command". This form of the NOP-Out can be used to
verify that a connection is still active and all its components are
operational using in-order delivery or out-of-order delivery. It may
be useful in the case where an initiator has been waiting a long time
for the response to some command, and the initiator suspects that
there is some problem with the connection. When a target receives the
NOP-Out with the Ping bit set, it should respond with a Ping Response,
duplicating as much as possible of the data that was provided in the
NOP-Out. If the initiator does not receive the NOP-In within some time
(determined by the initiator), or if the data returned by the NOP-In
is different from the data that was in the NOP-Out, the initiator may
conclude that there is a problem with the connection. The initiator
will then close the connection and may try to establish a new
connection.

The NOP-Out with the P bit not set MUST be used to acknowledge data
received from a target (data-ack) whenever data numbering is used. In
this case, the command carries the same Initiator Task Tag as the data
it acknowledges and the CmdRN field MUST be zero. Duplicate or
obsolete data acknowledgements MUST be silently discarded by the
target.

The NOP-Out can be sent by an initiator because of a NOP-In with the
poll bit set, in which case the Target Tag will copy the NOP-In value.

2.12.1 P - Ping bit

Request a NOP-In.

2.12.2 Length

This is the length of the optional Ping Data.

2.12.3 Initiator Task Tag

An initiator-assigned identifier for the operation. The NOP-Out MUST
have the Initiator Task Tag set only if the P bit is one or the DataRN
field is set.

2.12.4 Target Task Tag

A target-assigned identifier for the operation. The NOP-Out MUST have
the Target Tag set only if it is issued in response to a NOP-In with
the P bit one, in which case it copies the Target Tag from the NOP-In
PDU.
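The two uses of the NOP-Out described above (ping and data-ack) can be
sketched as a pair of header builders. This is an illustrative sketch
only. It assumes big-endian fields and that the P bit is the most
significant bit of byte 1, as drawn in the NOP-Out diagram; the
function names build_nop_out and build_data_ack are hypothetical.

```python
import struct

HEADER_LEN = 48
OPCODE_NOP_OUT = 0x00
FLAG_P = 0x80                 # assumption: P is bit 7 of byte 1
NO_TARGET_TAG = 0xFFFFFFFF    # reserved value: Target Tag not supplied


def build_nop_out(*, ping, itt=0, target_tag=NO_TARGET_TAG, cmd_rn=0,
                  exp_stat_rn=0, exp_data_rn=0, ping_data=b""):
    """Build a NOP-Out PDU: 48-byte header plus optional Ping Data."""
    hdr = bytearray(HEADER_LEN)
    hdr[0] = OPCODE_NOP_OUT
    hdr[1] = FLAG_P if ping else 0
    struct.pack_into(">I", hdr, 4, len(ping_data))   # Length
    # Offsets 16..35: ITT, Target Tag, CmdRN, ExpStatRN, ExpDataRN.
    struct.pack_into(">IIIII", hdr, 16, itt, target_tag,
                     cmd_rn, exp_stat_rn, exp_data_rn)
    return bytes(hdr) + ping_data


def build_data_ack(itt, next_data_rn, exp_stat_rn=0):
    """Data-ack form: P bit clear, CmdRN zero, same tag as the data."""
    return build_nop_out(ping=False, itt=itt, cmd_rn=0,
                         exp_stat_rn=exp_stat_rn,
                         exp_data_rn=next_data_rn)
```

An initiator answering a NOP-In that had the P bit set would pass the
received tag as target_tag; in every other case the default reserved
value is left in place.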

2.12.5 Ping Data

Binary data that will be reflected in the Ping Response.

2.13 NOP-In

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x80          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20| Target Tag or Reserved (0x'ffffffff')                         |
    +---------------+---------------+---------------+---------------+
  24| StatRN                                                        |
    +---------------+---------------+---------------+---------------+
  28| ExpCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  32| MaxCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  36/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Return Ping Data                                              /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

When a target receives the NOP-Out with the P bit set, it MUST respond
with a NOP-In, with the same Initiator Task Tag that was provided in
the Ping Command. It SHOULD also duplicate as much of the
initiator-provided Ping Data as allowed by a configurable target
parameter.

A target may issue a NOP-In on its own to test the connection and the
state of the initiator. In this case the Initiator Task Tag MUST be 0
and the Target Tag MUST be set (not 0x'ffffffff') only if the P bit is
1.

2.13.1 Target Task Tag

A target-assigned identifier for the operation.

2.14 Logout Command

The Logout command is used to perform a controlled closing of a
connection.

An initiator MAY use a logout command to remove a connection from a
session.
If an initiator intends to start recovery for a failing connection it
MUST use the Logout command to "clean-up" the target end of a failing
connection and enable recovery to start. On sessions with a single
connection, this might imply opening a second connection with the sole
purpose of cleaning up the first.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x06          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8| CID                           | Reserved (0)  | Reason Code   |
    +---------------+---------------+---------------+---------------+
  12| Reserved (0)                                                  |
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48

2.14.1 CID

The connection ID of the connection to be closed (including closing
the TCP stream).

2.14.2 Reason Code

Indicates the reason for Logout:

   0 - Remove the connection (the session is closing)
   1 - Remove the connection for recovery
   2 - Remove the connection at the target's request (requested
       through an AEN)

2.15 Logout Response

The Logout Response is used by the target to indicate that the cleanup
operation for the failed connection has completed.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x86          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  28| ExpCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  32| MaxCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  36| Status        | Reserved (0)                                  |
    +---------------+---------------+---------------+---------------+
  48

2.15.1 Status

Logout ending status:

   0 - connection closed successfully
   1 - cleanup failed

2.16 Ready To Transfer (R2T)

When an initiator has submitted a SCSI Command with data passing from
the initiator to the target (WRITE), the target may specify which
blocks of data it is ready to receive. In general, the target may
request that the data blocks be delivered in whatever order is
convenient for the target at that particular instant. This information
is passed from the target to the initiator in the Ready To Transfer
(R2T) message.

In order to allow write operations without R2T, the initiator and
target must have agreed to do so by both sending the UseR2T:no
key-pair attribute to each other (either during Login or through the
Text Command/Response mechanism).

An R2T MAY be answered with one or more iSCSI Data-out PDUs with a
matching Target Task Tag.
If an R2T is answered with a single Data PDU, the Buffer Offset in the
Data PDU MUST be the same as the one specified by the R2T and the data
length of the Data PDU must not exceed the Desired Data Length
specified in the R2T. If the R2T is answered with a sequence of Data
PDUs, the Buffer Offset and Length must be within the range of those
specified by the R2T, the last PDU should have the F bit set to 1, the
Buffer Offsets and Lengths for consecutive PDUs SHOULD form a
continuous non-overlapping range, and the PDUs should be sent in
increasing offset order.

The target may send several R2T PDUs and thus have a number of data
transfers pending. The present document does not limit the number of
outstanding data transfers. However, the target SHOULD NOT issue
overlapping R2T requests (i.e., referring to the same data area). All
outstanding R2Ts should have different Target Transfer Tags.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x90          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8| Reserved (0)                                                  |
    +                                                               +
  12|                                                               |
    +---------------+---------------+---------------+---------------+
  16| Initiator Task Tag                                            |
    +---------------+---------------+---------------+---------------+
  20| Target Task Tag                                               |
    +---------------+---------------+---------------+---------------+
  24| Reserved (0)                                                  |
    +---------------+---------------+---------------+---------------+
  28| ExpCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  32| MaxCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  36| Desired Data Length                                           |
    +---------------+---------------+---------------+---------------+
  40| Buffer Offset                                                 |
    +---------------+---------------+---------------+---------------+
  44| Reserved (0)                                                  |
    |                                                               |
    +---------------+---------------+---------------+---------------+
  48

2.16.1 Desired Data Transfer Length and Buffer Offset

The target specifies how many bytes it wants the initiator to send
because of this R2T message. The target may request the data from the
initiator in several chunks, not necessarily in the original order of
the data. The target, therefore, also specifies a Buffer Offset
indicating the point at which the data transfer should begin, relative
to the beginning of the total data transfer.

2.16.2 Target Transfer Tag

The target assigns its own tag to each R2T request that it sends to
the initiator. This can be used by the target to easily identify data
it receives. The Target Transfer Tag is copied in the outgoing data
PDUs and is provided by the target and used by the target only. There
is no protocol rule about the Target Transfer Tag, but it is assumed
that it will be used to tag the response data to the target (alone or
in combination with the LUN).

2.17 Asynchronous Event

An Asynchronous Event may be sent from the target to the initiator
without corresponding to a particular command. The target specifies
the status for the event and sense data.

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0| 0x91          |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8| Logical Unit Number (LUN)                                     |
    +                                                               +
  12|                                                               |
    +---------------+---------------+---------------+---------------+
  16/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  24| StatRN                                                        |
    +---------------+---------------+---------------+---------------+
  28| ExpCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  32| MaxCmdRN                                                      |
    +---------------+---------------+---------------+---------------+
  36|SCSI Event Ind |iSCSI Event Ind| Reserved (0)                  |
    +---------------+---------------+---------------+---------------+
  40/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Sense Data                                                    /
   +/                                                               /
    +---------------+---------------+---------------+---------------+

2.17.1 iSCSI Event

Some Asynchronous Events are strictly related to iSCSI while others
are related to SAM-2. The codes returned for iSCSI Asynchronous Events
are:

   1  Target is being reset.
   2  Target requests Logout on this connection.

2.17.2 SCSI Event Indicator

The following values are defined. (See [SAM2] for details):

   1  An error condition was encountered after command completion.
   2  A newly initialized device is available to this initiator.
   3  All Task Sets are being Reset by another Initiator.
   5  Some other type of unit attention condition has occurred.
   6  An asynchronous event has occurred.

Sense Data accompanying the report identifies the condition. The
Length parameter is set to the length of the Sense Data.

For new device identification an iSCSI target MUST support the Device
Identification page.

Please note that StatRN counts this PDU as an acknowledgeable event,
allowing initiator and target state synchronization.

2.18 Third Party Commands

SCSI allows every addressable entity to be either initiator or target.
In host-to-host communication, each one of them can take on the
initiator role. In typical I/O operations between a host and a
peripheral subsystem, the host plays the initiator role and the
peripheral subsystem plays the target role.

For third-party SCSI commands that involve device-to-device
communication, such as (EXTENDED) COPY and COMPARE, SCSI defines a
copy-manager. The copy-manager takes on the role of initiator in the
device-to-device communication. The copy-manager is the
"original-target" of the command and acts as initiator for a
(variable) number of the devices, called sources and destinations.
Sources and destinations act as targets. The whole operation is
described by one "master CDB" delivered to the copy-manager and a
series of descriptor blocks; each descriptor block addresses a source
and destination target and LU and a description of the work to be done
in terms of blocks or bytes as required by the device types.

The relevant SCSI standards do not require full support of the
(EXTENDED) COPY or COMPARE nor do they provide a detailed execution
model.

To address them, an iSCSI copy-manager will use information provided
to it through map commands and the SRAs and flags provided in the
descriptors - allowing for iSCSI and FC sources and destinations.

Enabling an FC copy-manager to support iSCSI sources and destinations
is subject to coordination with T10.

2.19 Reject

 Byte /     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +---------------+---------------+---------------+---------------+
   0|0| 0xef        |0| Reserved (0)                                |
    +---------------+---------------+---------------+---------------+
   4| Length                                                        |
    +---------------+---------------+---------------+---------------+
   8/ Reserved (0)                                                  /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  36| Reason        | Reserved (0)                                  |
    +---------------+---------------+---------------+---------------+
  40| Reserved (0)                                                  |
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  48/ Header of Bad Message                                         /
   +/                                                               /
    +---------------+---------------+---------------+---------------+
  96

It may happen that a target receives a message with a format error
(inconsistent fields, reserved fields not 0, nonexistent LUN, etc.) or
a digest error (invalid payload or header). The target returns the
header of the message in error as the data of the response.

2.20 Reason

The reject Reason is coded as follows:

   1 - Format Error
   2 - Header Digest Error
   3 - Payload Digest Error

3. Login phase

The login phase establishes an iSCSI session between initiator and
target. It sets the iSCSI protocol parameters, security parameters,
and authenticates initiator and target to each other.

The login phase is implemented via login and text commands and
responses only. The login command is sent from the initiator to the
target in order to start the login phase and the login response is
sent from the target to the initiator to conclude the login phase.
Text messages are used to implement negotiation, establish security,
and set operational parameters.

The whole login phase is considered as a single task and has a single
Initiator Task Tag (very much like the linked SCSI commands).

The login phase sequence of commands and responses proceeds as
follows:

   - Login command (mandatory)
   - Login Partial-Response (optional)
   - Text Command(s) and Response(s) (optional)
   - Login Final-Response (mandatory)

3.1 Login phase start

The login phase starts with a login request via a login command from
the initiator to the target. The login request includes:

   -Protocol version supported by the initiator (currently 0.3)
   -Session and connection Ids
   -Security Parameters (if security is requested) and
   -Protocol parameters

The target can answer in the following ways:

   -Login Response with Login Reject (and Final bit 1). This is an
    immediate rejection from the target causing the session to
    terminate. Causes for rejection are address rejection, local
    protection, etc. Login reject with Final bit 0 is a format error.

   -Login Response with Login Accept with session ID and iSCSI
    parameters and Final bit 1. In this case, the target does not
    support any security or authentication mechanism and starts with
    the session immediately (enters full feature phase).

   -Login Response with Final bit 0 indicating the start of an
    authentication/negotiation sequence. The response includes the
    protocol version supported by the target and the security
    parameters (not iSCSI parameters, those will be returned only
    after security is established to protect them) supported by the
    target.

3.2 Security negotiation

The negotiation proceeds as follows:

   -The initiator sends a text command with an ordered list of the
    options it supports for each subject (encryption algorithm,
    authentication algorithm, iSCSI parameters and so on). The options
    are listed from the most preferable (to the initiator) to the
    least.

   -The target MUST reply with the first option in the list it
    supports. The parameters are encoded in Unicode - UTF8 as
    key:value (e.g., the encryption option of triple-DES will appear
    as encryption:3des-cbc).
    The initiator MAY send proprietary options as well. The "none"
    option MUST be included in the list, indicating no algorithm
    supported by the target.

If security is to be established, the initiator MUST NOT send
parameters other than security parameters in the login command. The
general parameters should be negotiated only after security is
established at the desired level. Any operational parameters sent
before establishing a secure context MUST be reset by both the target
and the initiator when establishing the security context.

For a list of security parameters see Appendix A.

3.3 iSCSI Security

The security exchange sets the security mechanism and authenticates
the user and the target to each other. The exchange proceeds according
to the algorithms that were chosen in the negotiation phase and is
conducted by the text commands key:value parameters.

The security mechanism includes the following elements:

   -Initial authentication - the host and the target authenticate
    themselves to each other. A negotiable algorithm, e.g.,
    user/password or public key, provides this feature.

   -Message integrity - an integrity and authentication digest is
    attached to each packet and authenticates it. The algorithm is
    negotiable.

   -Encryption - data from host to target and from target to host is
    encrypted. The user MAY choose to encrypt only part of the data,
    e.g., headers only (for complexity reasons). Encryption MAY use
    IPsec. The algorithm and its parameters are negotiable.

Using IPsec for encryption or authentication may eliminate the need
for parameter negotiation at the iSCSI level (for example, ISAKMP for
IPsec). However, there is still a need to negotiate for the algorithm
itself.

If security is established in the login phase note that:

   -After setting message integrity, each iSCSI message MUST include
    the appropriate digest field (i.e., each message after the one
    through which the target chose the algorithm).

   -If encryption is to be set (e.g., IPsec), it should be set prior
    to the login phase.

   -The iSCSI parameter negotiation (non-security parameters) SHOULD
    start only after security is established. This should be carried
    on text commands.

4. iSCSI Error Handling and Recovery

4.1 Connection failure

For any outstanding SCSI command, it is assumed that iSCSI in
conjunction with SCSI at the initiator is able to keep enough
information to be able to rebuild the command PDU, and that outgoing
data is available (in host memory) for retransmission while the
command is outstanding. It is also assumed that, at a target, iSCSI
and specialized TCP implementations are able to recover unacknowledged
data packets from a closing connection or, alternatively, the target
has means to re-read data from a device server. It is further assumed
that a target will keep the "status & sense" for a command it has
executed while the total number of outstanding commands and executed
commands does not exceed its limit. A target will sequentially number
the delivered responses and thus enable initiators to tell when a
response is missing and which response is missing.

Under those conditions, iSCSI will be able to keep a session in
operation if it is able to keep/establish at least one TCP connection
between the initiator and target in a timely fashion. Unfortunately,
the maximum admissible recovery time is a function of the target and
for some devices and communications networks recovery may be complex
and may percolate to upper software layers.

It is assumed that targets and/or initiators will recognize a failing
connection by either transport level means (TCP) or by a gap in the
command or response stream that is not filled for a long time, or by a
failing iSCSI NOP-ping (the latter MAY be used periodically by highly
reliable implementations).
Initiators and targets MAY also use the keep-alive option on the TCP connection to enable early link failure detection on idle links.

The iSCSI recovery involves the following steps:

-abort offending TCP connection(s) (target & initiator) and recover at target all unacknowledged read-data
-issue a Logout command on a remaining connection or create a new connection and issue the Logout command
-wait for the Logout response
-if needed, create one or more new TCP connections (within the same session) and associate all outstanding commands from the failed connection to the new connection at both initiator and target.
-the initiator will reissue all outstanding commands with their original Initiator Task Tag and their original CmdRN if they are not acknowledged yet or a CmdRN of 0 (not-numbered) if they were acknowledged; the retry (X) flag in the command PDU will be set
-upon receiving the new/retry commands the target will resume command execution; for write commands it means requesting data retransmission through R2T, for reads retransmitting recovered data and for "terminated" commands retransmitting the Status & Sense while retaining the original StatRN. If data recovery is not possible, the target will either provide data from the media or redo the operation (if the operation is not idempotent the device server may fail the operation).

4.2 Protocol Errors

The authors recognize that mapping framed messages over a "stream" connection (like TCP) makes the proposed mechanisms vulnerable to simple software framing errors and introducing framing mechanisms may be onerous for performance and bandwidth. Command reference numbers and the above mechanisms for connection drop and reestablishment will help handle this type of mapping errors.
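The reissue step of the recovery procedure in section 4.1 can be illustrated with a minimal sketch. The class and function names below (OutstandingCommand, RetryPDU, build_retries) are hypothetical and not part of this document; only the rule they encode (original Initiator Task Tag, original CmdRN or 0, retry flag set) comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutstandingCommand:
    """Hypothetical initiator-side record of a command awaiting completion."""
    initiator_task_tag: int
    cmd_rn: int          # command reference number originally assigned
    acknowledged: bool   # True once the target has acknowledged this CmdRN


@dataclass
class RetryPDU:
    """Hypothetical view of the fields that matter when a command is reissued."""
    initiator_task_tag: int
    cmd_rn: int
    retry: bool          # the retry (X) flag of the command PDU


def build_retries(outstanding: List[OutstandingCommand]) -> List[RetryPDU]:
    """Build the retry PDUs for all commands of a failed connection."""
    retries = []
    for cmd in outstanding:
        # Acknowledged commands are reissued not-numbered (CmdRN 0);
        # unacknowledged ones keep their original CmdRN.
        cmd_rn = 0 if cmd.acknowledged else cmd.cmd_rn
        retries.append(RetryPDU(cmd.initiator_task_tag, cmd_rn, True))
    return retries
```

The Initiator Task Tag is carried over unchanged in both cases, which is what lets the target match a retry to the task it already knows.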
4.3 Session Errors

If all the connections of a session fail and can't be reestablished in a short time or if initiators detect protocol errors repeatedly, an initiator may choose to terminate a session and establish a new session. It will terminate all outstanding requests with an iSCSI error indication before initiating a new session.

A target that detects one of the above errors will take the following actions:

- Reset the TCP connections (close the session).
- Abort all Tasks in the task set for the corresponding initiator.

4.4 Format errors

Explicit violations of the rules stated in this document are considered format errors.

While a session is active, whenever a target receives an iSCSI PDU with a format error, it MUST answer with a Reject iSCSI PDU with a Reason-code of Format-error.

While a session is active, whenever an initiator receives an iSCSI PDU with a format error for which it has an outstanding task, it MUST abort the target task and report the error as a SCSI check condition status with a sense key of 4h (hardware error).

4.5 Digest errors

When a target receives an iSCSI data PDU with a data payload digest error, it MUST discard it and request retransmission with an R2T.

When a target receives an iSCSI PDU with a header digest error or a payload digest error in anything but a data iSCSI PDU, it MUST answer with a Reject iSCSI PDU with a Reason-code of Digest-error.

When an initiator receives an iSCSI data PDU with a data payload digest error or any other iSCSI PDU with a header or payload digest error, it MUST discard it and, provided it can recognize the Initiator Task Tag, restart the task. If the initiator can't recognize the Initiator Task Tag (e.g., because of a header digest error), the initiator MUST log out of the connection and restart it (including restarting all outstanding tasks).

5. Notes to Implementers

This section notes some of the performance and reliability considerations of the iSCSI protocol. This protocol was designed to allow efficient silicon and software implementations. The iSCSI tag mechanism was designed to enable RDMA at the iSCSI level or lower.

5.1 Multiple Network Adapters

The iSCSI protocol allows multiple connections, not all of which need go over the same network adapter. If multiple network connections are to be utilized with hardware support, the iSCSI protocol's command-data-status allegiance to one TCP connection ensures that there is no need to replicate information across network adapters or otherwise require them to cooperate.

5.2 Autosense

Autosense refers to the automatic return of sense data to the initiator in case a command did not complete successfully. iSCSI mandates support for autosense.

6. Security Considerations

6.1 Data Integrity

We assume that basic level end-to-end data integrity can be reasonably handled by TCP, by using the standard checksum. For those applications for which data integrity is of utmost importance iSCSI will provide an integrity option.

6.2 Network operations and the Threat Model

Historically, native storage systems have not had to consider security because their environments offered minimal security risks. That is, these environments consisted of storage devices either directly attached to hosts or connected via a subnet distinctly separate from the communications network. The use of storage protocols, such as SCSI, over IP networks requires that security concerns be addressed.

6.2.1 Threat Model

Attacks fall into three main areas: passive, active, and denial of service.

6.2.1.1 Passive Attacks

Often, data transfers will be made through a switched fabric, making sniffing difficult. In addition, the nature of the data (block transfers), even if sniffed, would not necessarily be readily understandable to the attacker.
That being said, a determined attacker, by capturing content and analyzing traffic over time, could replicate enough of a storage device to make the captured data meaningful. Certain storage operations which are mostly unidirectional, such as writing to a tape or reading from a CD-ROM, are more susceptible to passive attacks since the listener will be able to replicate most if not all of the operation.

Passive attacks by traffic analysis alone are deemed out of scope since it is unlikely that the listener will be able to guess any pertinent information without knowing the content of the messages. It is also out of scope to detect passive attacks. The protocol must be able to prevent passive attacks by masking the contents of messages through some form of encryption.

Finally, it is assumed that a strong authentication mechanism will be necessary. Therefore, any long-lived passwords or private keys SHOULD NOT be sent in the clear.

6.2.1.2 Active Attacks

Whereas passive attacks involve SNIFFING, active attacks will generally involve SPOOFING. If an attacker can successfully masquerade as a client, he will have total read/write access to those storage resources assigned to that client. Spoofing as a server is sometimes more difficult, since many operations involve client reads of some expected or otherwise understandable data.

Most likely, many of the sessions will be long-lived. This feature has a dual effect of making these sessions more vulnerable to attack (hijacking TCP connections, cryptographic attacks), while at the same time providing mechanisms to detect attacks. An attempt to open a session while one is already active can be treated as a possible attack. Both the transport and session layer protocols will have sequencing that would need to be adhered to by the attacker to avoid generating errors that could also be treated as a possible attack.
Message modification can be a significant threat to an environment reliant on the integrity of the data. Although message replay, insertion, or deletion will generally produce errors (such as data overruns/underruns) that can be recovered from successfully, they can reduce performance and as such can act as a denial of service. It is possible that an attacker can modify a message in such a way that the session becomes uncoordinated, resulting in a tear down of the session.

6.2.2 Security Model

6.2.2.1 No Security

This mode does not authenticate nor does it encrypt data. This mode should only be used in environments where there is minimal security risk and little chance for configuration errors.

6.2.2.2 End-to-End Authentication

This mode protects against unauthorized access to storage resources either through an active attack (SPOOFING) or configuration errors. Once the client is authenticated, all messages are sent and received in the clear. This mode should only be used when there is minimal risk of man-in-the-middle attacks, eavesdropping, message insertion, deletion, and modification. For example, this mode can be used when IPsec is used in security gateways.

6.2.2.3 iSCSI integrity and authentication

The iSCSI protocol provides an authentication mechanism for initiator and target. This includes login authentication and authentication trailers for headers and data. No encryption is provided at the iSCSI protocol level. The implementers may use other protocols (e.g., IPsec) for this purpose.

6.2.2.4 Encryption

This mode provides for end-to-end encryption (e.g., IPsec). In addition to authenticating the client, it provides end-to-end data integrity and protects against man-in-the-middle attacks, eavesdropping, message insertion, deletion, and modification.

A connection or multiple connections can be protected end-to-end by using IPsec.
In this case, the initiator must use the "Implicit Authentication" parameter to indicate that IPsec should be used to specify the Access ID and perform authentication.

6.2.3 Other Considerations

Due to long-lived sessions, is there a need for periodic authentication after the session is established? For example, should the client be challenged during keep-alive exchanges in addition to login?

Due to long-lived sessions with encryption, is there a higher level of vulnerability to cryptographic attacks?

6.3 Login Process

In some environments, a target will not be interested in authenticating the initiator. In this case, the target can simply ignore some or all of the parameters sent in a Login Command, and the target can simply reply with a basic Login Response indicating a successful login.

Some targets MAY want to perform some kind of authentication. Various authentication schemes can be used, including encrypted passwords and trusted certificate authorities.

Once the initiator and target are confident of the identity of the attached party, the established channel is considered secure.

6.4 Feasibility

The encryption algorithms are computationally complex. Therefore, the real time constraints on the transmission and reception may make the implementation of completely encrypted streams difficult. Working with fast networks will force the implementers to use one of the following alternatives:

-Hardware implementation
-Partial encryption

The first alternative enables the use of completely encrypted streams. Although robust, this may be (at least at top speeds) expensive.

The second alternative does not require specialized hardware, but will reduce the safety of the system. In most cases, however, the safety tradeoff is acceptable (e.g., encryption of headers only by defining an IPsec policy).

Data integrity/authentication through data and header digests can easily be performed.
7. IANA Considerations

There will be a well-known port for iSCSI connections. This well-known port will be registered with IANA.

8. References and Bibliography

[AC] A detailed proposal for Access Control, Jim Hafner, T10/99-245
[ALTC] Internet Draft: Alternative checksums (work in progress)
[CAM] ANSI X3.232-199X, Common Access Method-3 (Cam-3)
[CRC] ISO 3309, High-Level Data Link Control (CRC 32)
[FIPS-180-1] FIPS-Secure Hash Standard
[FIPS-186-2] FIPS-Digital Signature Standard
[Orm96] Orman, H., "The Oakley Key Determination Protocol", version 1, TR97-92, Department of Computer Science Technical Report, University of Arizona.
[PKIX-Part1] Housley, R., et al, "Internet X.509 Public Key Infrastructure, Certificate and CRL Profile", Internet Draft, draft-ietf-pkix-ipki-part1-11.txt
[RFC793] Transmission Control Protocol, RFC 793
[RFC1122] Requirements for Internet Hosts-Communication Layer, RFC1122, R. Braden (editor)
[RFC-1766] Alvestrand, H., "Tags for the Identification of Languages", March 1995.
[RFC1982] Elz, R., Bush, R., "Serial Number Arithmetic", RFC 1982, August 1996.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", RFC 2026, October 1996.
[RFC-2044] Yergeau, F., "UTF-8, a Transformation Format of Unicode and ISO 10646", October 1996.
[RFC-2104] Krawczyk, H., Bellare, M., and Canetti, R., "HMAC: Keyed-Hashing for Message Authentication", February 1997.
[RFC-2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC-2144] Adams, C., "The CAST-128 Encryption Algorithm", May 1997.
[RFC-2234] Crocker, D., Overell, P., "Augmented BNF for Syntax Specifications: ABNF"
[RFC-2313] Kaliski, B., "PKCS #1: RSA Encryption, Version 1.5"
[RFC-2434] Narten, T., and Alvestrand, H., "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 2434, October 1998.
[RFC-2440] Callas, J., et al, "OpenPGP Message Format", November 1998.
[SAM2] ANSI X3.270-1998, SCSI-3 Architecture Model (SAM-2)
[SBC] ANSI X3.306-199X, SCSI-3 Block Commands (SBC)
[SCSI2] ANSI X3.131-1994, SCSI-2
[Schneier] Schneier, B., "Applied Cryptography Second Edition: protocols, algorithms, and source code in C", 2nd edition, John Wiley & Sons, New York, NY, 1996.
[SPC] ANSI X3.301-199X, SCSI-3 Primary Commands (SPC)
[TLS] Dierks, T., et al, "The TLS Protocol", RFC 2246.

9. Authors' Addresses

Julian Satran
Kalman Meth
IBM, Haifa Research Lab
MATAM - Advanced Technology Center
Haifa 31905, Israel
Phone +972 4 829 6211
Email: Julian_Satran@vnet.ibm.com
       meth@il.ibm.com

Daniel F. Smith
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120-6099, USA
Phone: +1 408 927 2072
Email: dfsmith@almaden.ibm.com

Costa Sapuntzakis
Cisco Systems, Inc.
170 W. Tasman Drive
San Jose, CA 95134, USA
Phone: +1 408 525 5497
Email: csapuntz@cisco.com

Randy Haagens
Hewlett-Packard Company
8000 Foothills Blvd.
Roseville, CA 95747-5668, USA
Phone: +1 (916) 785-4578
E-mail: Randy_Haagens@hp.com

Matt Wakeley
Agilent Technologies
1101 Creekside Ridge Drive
Suite 100, M/S RH21
Roseville, CA 95661
Phone: +1 (916) 788-5670
E-Mail: matt_wakeley@agilent.com

Efri Zeidner
SANGate
Israel
efri@sangate.com

Paul von Stamwitz
Adaptec, Inc.
691 South Milpitas Boulevard
Milpitas, CA 95035
Phone: +1(408) 957-5660
E-mail: paulv@corp.adaptec.com

Luciano Dalle Ore
Quantum Corp.
Phone: +1(408) 232 6524
E-mail: lldalleore@snapserver.com

Yaron Klein
SANRAD
24 Raul Valenberg St.
Tel-Aviv, 69719 Israel
Phone: +972-3-7659998
E-mail: klein@sanrad.com

Comments may be sent to Julian Satran
Appendix A. iSCSI Security

01 Security keys and values

The parameters (keys) negotiated for security are:

- digests (header_digest:, data_digest:)
- authentication methods (init_auth:, target_auth:)
- public key algorithm (public_key)

The following table lists cyclic integrity checksums that can be negotiated for the digests.

+-----------+-------------+
| Name      | Description |
+-----------+-------------+
| crc-16    | 16 bit CRC  |
+-----------+-------------+
| crc-CCITT | 16 bit CRC  |
+-----------+-------------+
| crc-32    | 32 bit CRC  |
+-----------+-------------+
| crc-64    | 64 bit CRC  |
+-----------+-------------+
| none      | no digest   |
+-----------+-------------+

The generator polynomials for those digests are:

crc-16    - x**16+x**15+x**2+1
crc-CCITT - x**16+x**12+x**5+1
crc-32    - x**32+x**26+x**23+x**22+x**16+x**12+x**11+x**10+
            x**8+x**7+x**5+x**4+x**2+x+1
crc-64    -

Digests enable checking end-to-end data integrity (beyond the integrity checks provided by the link layers and covering the whole communication path including all elements that may change the network level PDUs - like routers, switches, proxies etc.).

crc-16 and crc-CCITT are considered adequate for very short blocks (like PDU headers or very short payloads). crc-32 and crc-64 are considered adequate for longer blocks.

Cyclic codes are particularly well suited for hardware implementations.
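As an illustration of how the generator polynomials listed above are applied, the following is a minimal bit-serial sketch. This appendix does not state the initial register value, the bit ordering, or any final XOR, so the sketch assumes the simplest convention (zero initial value, most significant bit first, no final XOR); a real implementation would have to follow whatever convention is eventually specified. The names crc_remainder, GENERATORS and digest are hypothetical.

```python
def crc_remainder(data: bytes, poly: int, width: int) -> int:
    """Remainder of data(x) * x**width divided by the generator polynomial.

    Assumptions (not taken from the draft): zero initial register,
    most-significant-bit-first processing, no final XOR.
    """
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = 0
    for byte in data:
        reg ^= byte << (width - 8)      # feed the next 8 message bits
        for _ in range(8):
            if reg & top_bit:
                reg = ((reg << 1) ^ poly) & mask
            else:
                reg = (reg << 1) & mask
    return reg


# Generator polynomials from the list above, with the leading term
# (x**16 or x**32) left implicit, as is usual in software.
GENERATORS = {
    "crc-16": (0x8005, 16),        # x**16+x**15+x**2+1
    "crc-CCITT": (0x1021, 16),     # x**16+x**12+x**5+1
    "crc-32": (0x04C11DB7, 32),    # x**32+x**26+x**23+...+x+1
}


def digest(name: str, data: bytes) -> int:
    """Compute the named digest over data under the assumptions above."""
    poly, width = GENERATORS[name]
    return crc_remainder(data, poly, width)
```

Under these assumptions a receiver can check a block by running the same division over the data followed by the transmitted digest; a zero remainder indicates that no error was detected.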
Implementations MAY also negotiate some hash functions that may provide data authentication in addition to integrity as detailed in the following table:

+-------------+-----------------------------+------------+
| Name        | Description                 | Definition |
+-------------+-----------------------------+------------+
| hmac-sha1   | HMAC-SHA1 length=20         | RFC-2104   |
+-------------+-----------------------------+------------+
| hmac-sha-96 | first 96 bits of HMAC-SHA1  | RFC-2104   |
+-------------+-----------------------------+------------+
| hmac-md5    | HMAC-MD5 length=16          | RFC-2104   |
+-------------+-----------------------------+------------+
| hmac-md5-96 | first 96 bits of HMAC-MD5   | RFC-2104   |
+-------------+-----------------------------+------------+

Other and proprietary algorithms MAY also be negotiated. The none value is the only one that MUST be supported.

The following table details authentication methods:

+-----------+---------------------------+
| Name      | Description               |
+-----------+---------------------------+
| publickey | Public key authentication |
+-----------+---------------------------+
| password  | Plain text user-password  |
+-----------+---------------------------+
| challenge | Challenge and response    |
+-----------+---------------------------+
| none      | No authentication         |
+-----------+---------------------------+

The following table details public key algorithms for authentication:
+---------+----------------+------------+
| Name    | Description    | Definition |
+---------+----------------+------------+
| ssh-dss | Simple DSS     | [FIPS-186] |
+---------+----------------+------------+
| rsa     | RSA public key | [RFC2313]  |
+---------+----------------+------------+
| none    | No Public Key  | -          |
+---------+----------------+------------+

Where the public key information is encoded as:

public_key:,

For example, if ssh-dss is selected:

public_key:ssh-dss,p,q,g,y

Here the "p", "q", "g", and "y" parameters (encoded as numbers in Unicode UTF8) form the signature key blob.

Signing and verifying using this key format are done according to the Digital Signature Standard [FIPS-186] using the SHA-1 hash. A description can also be found in [Schneier].

The dss signature blob is encoded as a string containing "r" followed by "s" (which are 160-bit integers, without lengths or padding, unsigned and in network byte order).

02 Authentication

The authentication exchange SHOULD authenticate the initiator and target to each other. Authentication is not mandatory and is distinct from the data integrity exchange.

Different levels of authentication can be applied such as initiator authentication, target authentication or both.

The authentication methods to be used are public key, user/password or challenge/response.

If public key is selected then each party MUST use:

authenticate:,

where user-id is an assigned id of the host-OS for the initiator or the World-Wide-Name for the target and blob is the public-key blob.

For user/password each party must use:

authenticate:,

where user-id is as above and password is a plain-text password.
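The dss signature blob layout given in section 01 ("r" followed by "s", each a 160-bit unsigned integer in network byte order) can be sketched as follows. The function names are hypothetical, and the sketch assumes that each integer always occupies exactly 20 bytes, leading zero bytes included; the text does not say how the blob is then carried inside a key:value pair, so that step is left out.

```python
DSS_INT_BYTES = 20  # 160 bits


def encode_dss_signature(r: int, s: int) -> bytes:
    """Concatenate r and s as fixed-size big-endian unsigned integers."""
    # int.to_bytes raises OverflowError if a value is negative or does
    # not fit in 160 bits, which is the behaviour wanted here.
    return r.to_bytes(DSS_INT_BYTES, "big") + s.to_bytes(DSS_INT_BYTES, "big")


def decode_dss_signature(blob: bytes) -> tuple:
    """Split a 40-byte signature blob back into the integers (r, s)."""
    if len(blob) != 2 * DSS_INT_BYTES:
        raise ValueError("a dss signature blob must be exactly 40 bytes")
    r = int.from_bytes(blob[:DSS_INT_BYTES], "big")
    s = int.from_bytes(blob[DSS_INT_BYTES:], "big")
    return r, s
```

Because there are no length fields, the receiver relies on the fixed 40-byte size to find the boundary between "r" and "s".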
03 Salt

salt: can be used by different authentication schemes to prevent replay attacks (a random number - cookie - or a time stamp or both)

04 Challenge

challenge: and authenticate: MUST be used for challenge answer schemes

05 Login Phase examples:

The first example is a "user-password" authentication:

In this example, the result of the negotiation is to use hmac-md5 for header digest, crc32-2k for data digest and user/password for initiator authentication. No target authentication required.

I-> Login header_digest:(hmac-md5,hmac-md5-96,crc32,none)
          data_digest:(crc32-2k)
          init_auth:(publickey,password,none)
          target_auth:(none)
          public_key:((ssh-dss,parameters),none)
T-> Text  header_digest:hmac-md5
          data_digest:crc32-2k
          init_auth:password
I-> Text  authenticate:alef,sesam

If the authentication is successful:

T-> StartSecure:HERE
...
T-> Login "login accept"

If the authentication was not successful:

T-> Login "login reject"

Note - the Text command including StartSecure:HERE and each PDU after it will have a trailer consisting of an hmac-md5 digest for the header and a crc32 for each 2k of data (or fraction thereof).

The next example is a "public-key" authentication. The initiator authenticates itself to the target; no keys are exchanged:

I-> Login header_digest:(hmac-md5,hmac-md5-96,crc32,none)
          data_digest:(crc32-2k,none)
          init_auth:(publickey,password,none)
          target_auth:(none)
          public_key:((rsa,parameters),(ssh-dss,parameters),none)
T-> Text  header_digest:hmac-md5
          data_digest:crc32-2k
          init_auth:publickey
          public_key:(ssh-dss,parameters)
I-> Text  authenticate:user,blob
          salt:578913456

NB - where blob stands for the hash of the header and the salt, i.e., hash(header || salt). The initiator SHOULD add "salt" to the packet, e.g. add the pair salt: (or timestamp or a mixture) to its packet to prevent record and replay.
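The hash(header || salt) value mentioned above, and the way a fresh salt defeats record-and-replay, can be illustrated with a short sketch. The hash function is not named in the example, so SHA-1 is assumed here only because it is the hash this appendix uses for DSS; the public-key signing step is omitted, and make_blob and ReplayGuard are hypothetical names.

```python
import hashlib
import hmac


def make_blob(header: bytes, salt: bytes) -> bytes:
    """hash(header || salt), with SHA-1 as an assumed hash function."""
    return hashlib.sha1(header + salt).digest()


class ReplayGuard:
    """Hypothetical receiver-side check that rejects a reused salt."""

    def __init__(self) -> None:
        self._seen = set()

    def accept(self, header: bytes, salt: bytes, blob: bytes) -> bool:
        if salt in self._seen:
            return False                      # recorded packet replayed
        expected = make_blob(header, salt)
        if not hmac.compare_digest(expected, blob):
            return False                      # blob does not match header/salt
        self._seen.add(salt)
        return True
```

A receiver that uses time stamps as salt could bound the set of remembered values by rejecting salts older than some window; that policy is outside what the text specifies.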
The key distribution may be done by a certificate authority or other server and is beyond the scope of this document.

If the user was not confirmed, the target sends a login response message with "login reject" to the initiator. Else, it can send a login response with "login accept" and MAY attach a secret:

T-> Text  StartSecure:HERE secret:
I-> Text  ... parameters ... EndLogin:HERE
T-> Login (accept) ... parameters ...

The next example is another "public-key" authentication. The initiator authenticates itself to the target. The target authenticates itself to the initiator and keys are exchanged:

I-> Login header_digest:(hmac-md5,hmac-md5-96,crc32,none)
          data_digest:(crc32-2k,none)
          init_auth:(publickey,password,none)
          target_auth:(none)
          public_key:((ssh-dss,parameters),(rsa,parameters),none)
T-> Text  header_digest:hmac-md5
          data_digest:crc32-2k
          init_auth:publickey
          public_key:(ssh-dss,parameters)
          target_auth:(publickey,password,none)
          public_key:(ssh-dss,parameters),none
I-> Text  authenticate:user,blob
          target_auth:publickey
          public_key:(ssh-dss,parameters)
          salt:20001103172433

where blob stands for hash(header || salt).

Note: the last packet should have the appropriate trailers.

If the initiator was not confirmed, the target sends a login response message with "login reject" to the initiator. Else, it can continue with the login process:

T-> Text  authenticate:user,blob
          salt:532678925

where blob stands for hash(header || salt).

Here, the target authenticates itself to the initiator. If the authentication was successful, the initiator responds with an empty text command, continuing the login phase. Else, it stops the login phase.

I-> Text
T-> Text  secret:blob

Where blob is a key encrypted with the initiator’s public key.

I-> Text  StartSecure:HERE ... parameters ...
...
T-> Login "login accept" ... parameters ...

In the next example the target authenticates the initiator via challenge and response.
I-> Login header_digest:(hmac-md5,hmac-md5-96,crc32,none)
          data_digest:(crc32-2k)
          init_auth:(publickey,password,challenge,none)
          target_auth:(none)
          public_key:(ssh-dss,parameters)
T-> Text  header_digest:hmac-md5
          data_digest:crc32-2k
          init_auth:challenge
          challenge:question
I-> Text  authenticate:answer

If authentication is successful, i.e., the answer to the question is correct, the target may proceed:

T-> ... parameter negotiation

Or give another challenge:

T-> Text  challenge:question2
I-> Text  authenticate:answer2

And at the end:

T-> Login "login accept"

If the authentication was not successful:

T-> Login "login reject"

Note - the Text command after authentication and each PDU thereafter will have in the trailer an hmac-md5 digest for the header and a crc32 for each 2k of data (or fraction of it).

Appendix B. Examples

06 Read operation example

+------------------+-----------------------+----------------------+
|Initiator Function| Message Type          | Target Function      |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (READ)>>> |                      |
| (read)           |                       |                      |
+------------------+-----------------------+----------------------+
|                  |                       | Prepare Data Transfer|
+------------------+-----------------------+----------------------+
| Receive Data     | <<< SCSI Data         | Send Data            |
+------------------+-----------------------+----------------------+
| Receive Data     | <<< SCSI Data         | Send Data            |
+------------------+-----------------------+----------------------+
| Receive Data     | <<< SCSI Data         | Send Data            |
+------------------+-----------------------+----------------------+
|                  | <<< SCSI Response     |Send Status and Sense |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+
07 Write operation example

+------------------+-----------------------+---------------------+
|Initiator Function| Message Type          | Target Function     |
+------------------+-----------------------+---------------------+
| Command request  |SCSI Command (WRITE)>>>| Receive command     |
| (write)          |                       | and queue it        |
+------------------+-----------------------+---------------------+
|                  |                       | Process old commands|
+------------------+-----------------------+---------------------+
|                  |                       | Ready to process    |
|                  | <<< R2T               | WRITE command       |
+------------------+-----------------------+---------------------+
| Send Data        | SCSI Data >>>         | Receive Data        |
+------------------+-----------------------+---------------------+
|                  | <<< R2T               |                     |
+------------------+-----------------------+---------------------+
|                  | <<< R2T               |                     |
+------------------+-----------------------+---------------------+
| Send Data        | SCSI Data >>>         | Receive Data        |
+------------------+-----------------------+---------------------+
| Send Data        | SCSI Data >>>         | Receive Data        |
+------------------+-----------------------+---------------------+
|                  | <<< SCSI Response     |Send Status and Sense|
+------------------+-----------------------+---------------------+
| Command Complete |                       |                     |
+------------------+-----------------------+---------------------+

Appendix C. Login/Text keys (not security related)

ISID and TSID form collectively the SSID (session id). A TSID of zero indicates a leading connection. Only a leading connection login can carry session specific parameters, e.g. MaxConnections, the maximum immediate data length requested, etc.

08 MaxConnections

MaxConnections:

Initiator and target negotiate the maximum number of connections requested/acceptable.
09 Target

Target:[/modifier]

Examples:

Target:disk-array.sj-bldg-h.cisco.com
Target:disk-array.sj-bldg-h.cisco.com/control7

This key is provided by the initiator of the TCP connection to the remote endpoint. The Target key specifies the domain name of the target, since that information is not available from the TCP layer. The target is not required to support this key. The initiator should send this key in the first login message. The Target key might be used by the target to select a unit within a multi-unit target.

10 Initiator

Initiator:[domainname[/modifier]]

Examples:

Initiator:sample.foobar.org
Initiator:cluster.foobar.org/machine1
Initiator:

The Initiator key enables the initiator to identify itself to the remote endpoint. The domain name should be that of the initiator. A zero-length domain name is interpreted as "other side of TCP connection". The target may silently ignore this key if it does not support it.

11 AccessID

AccessID:

Deliver a SCSI AccessID to the target

12 FMarker

FMarker:

Examples:

I->FMarker:send-receive
T->FMarker:send-receive

results in Marker being used in both directions while

I->FMarker:send-receive
T->FMarker:receive

results in Marker being used from the initiator to the target but not from the target to initiator.

13 RFMarkInt

RFMarkInt:

Indicates at what interval (in 4 byte words) the receiver wants the markers. The larger of the numbers (wanted by receiver and offered by sender) is selected.

14 SFMarkInt

SFMarkInt:

Indicates at what interval (in 4 byte words) the sender offers to send the markers. The larger of the numbers (wanted by receiver and offered by sender) is selected.

15 IFMarkInt

IFMarkInt:

Indicates the initial marker-less interval required by the initiator in both directions.
16 UseR2T

UseR2T:

Examples:

I->UseR2T:no
T->UseR2T:no

The UseR2T key is used to turn off the default use of R2T, thus allowing an initiator to send data to a target without the target having sent an R2T to the initiator. The default action is that R2T is required, unless both the initiator and the target send this key-pair attribute specifying UseR2T:no. Once UseR2T has been set to 'no', it cannot be set back to 'yes'. Note that only the first outgoing data item (either immediate data or a separate PDU) can be sent without being solicited by an R2T.

17 BidiUseR2T

BidiUseR2T:

Examples:

I->BidiUseR2T:no
T->BidiUseR2T:no

The BidiUseR2T key is used to turn off the default use of BiDiR2T, thus allowing an initiator to send data to a target without the target having sent an R2T to the initiator for the output data (write part) of a Bi-directional command (having both the R and the W bits set). The default action is that R2T is required, unless both the initiator and the target send this key-pair attribute specifying BidiUseR2T:no. Once BidiUseR2T has been set to 'no', it cannot be set back to 'yes'. Note that only the first outgoing data item (either immediate data or a separate PDU) can be sent without being solicited by an R2T.

18 DataNumber

DataNumber:

Example:

The DataNumber key is used by targets to turn on the use of input data packet numbering, thus allowing a target to discard input data as soon as acknowledged without losing recovery capabilities. By default data numbering is off. A nonzero value for DataNumber indicates both that data numbering is requested and the maximum number of unacknowledged packets. An initiator MUST support data numbering if requested.

19 ImmediateDataLength

ImmediateDataLength:

Initiator and target negotiate the maximum length supported for immediate data. Default is 2**32-1 bytes.
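The negotiation outcomes described for UseR2T (section 16) and for the marker intervals (sections 13 and 14) reduce to two small rules, sketched below. The function names are hypothetical, and the sketch takes already-parsed values; it does not model the text-command exchange itself or the rule that UseR2T cannot be set back to 'yes'.

```python
def r2t_required(initiator_value: str, target_value: str) -> bool:
    """R2T stays required unless BOTH sides sent UseR2T:no.

    Pass the value each side sent; use "yes" (the default behaviour)
    for a side that did not send the key at all.
    """
    return not (initiator_value == "no" and target_value == "no")


def marker_interval(wanted_by_receiver: int, offered_by_sender: int) -> int:
    """Select the marker interval (in 4 byte words) for one direction.

    The larger of the value wanted by the receiver (RFMarkInt) and the
    value offered by the sender (SFMarkInt) is selected.
    """
    return max(wanted_by_receiver, offered_by_sender)
```

The same two-sided rule would apply to BidiUseR2T, with the values taken from that key instead.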
20 ITagLength

ITagLength:

Initiator and target negotiate the significant length of the initiator tag to be used. Default is 32.

21 PingMaxReplyLength

PingMaxReplyLength:

Initiator and target negotiate the maximum length of data contained in a ping reply. Default is 4096.

22 StartSecure

StartSecure:HERE

Initiator and target indicate the end-of-authentication/integrity exchange (start of parameter negotiation if any).

23 TotalText

TotalText:

Initiator and target indicate the total text limit for any Text or Login command.

24 KeyValueText

KeyValueText:

Initiator and target indicate the total text limit for any key:value pair.

25 MaxOutstandingR2T

MaxOutstandingR2T:

Initiator and target negotiate the maximum number of outstanding R2Ts per task. The default is 256.

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."