IPS
Internet Draft
Document: draft-ietf-ips-iscsi-07.txt
Category: standards-track

   Julian Satran
   Daniel Smith
   Kalman Meth
   Ofer Biran
   Jim Hafner
      IBM
   Costa Sapuntzakis
   Mark Bakke
      Cisco Systems
   Matt Wakeley
      Agilent Technologies
   Luciano Dalle Ore
      Quantum
   Paul Von Stamwitz
      Adaptec
   Randy Haagens
   Mallikarjun Chadalapaka
      Hewlett-Packard Co.
   Efri Zeidner
      SANGate
   Yaron Klein
      SANRAD

iSCSI

Julian Satran          Standards-Track, Expire January 2002
iSCSI                  July 20, 2001

Status of this Memo

This document is an Internet-Draft and fully conforms to all
provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or made obsolete by other documents at
any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

The Small Computer Systems Interface (SCSI) is a popular family of
protocols for communicating with I/O devices, especially storage
devices. This memo describes a transport protocol for SCSI that
operates on top of TCP. The iSCSI protocol aims to be fully compliant
with the requirements laid out in the SCSI Architecture Model - 2
[SAM2] document.

Acknowledgements

In addition to the authors, a large group of people contributed to
this work through their review, comments and valuable insights. We
are grateful to all of them.
We are especially grateful to those who found the time and patience
to participate in our weekly phone conferences and intermediate
meetings in Almaden and Haifa, thus helping to shape this document:
John Hufferd, Prasenjit Sarkar, Meir Toledano, John Dowdy, Steve
Legg, Alain Azagury (IBM), Dave Nagle (CMU), David Black (EMC), John
Matze (Veritas), Steve DeGroote, Mark Shrandt (NuSpeed), Gabi Hecht
(Gadzoox), Robert Snively (Brocade), Nelson Nachum (StorAge), Uri
Elzur (Intel).

Many more helped clean up and improve this document within the IPS
working group. We are especially grateful to David Robinson and
Raghavendra Rao (Sun), Charles Monia, Joshua Tseng (Nishan), Somesh
Gupta, Michael Krause, Pierre Labat, Santosh Rao, Matthew Burbridge
(HP), Stephen Bailey (Sandburst), Robert Elliott (Compaq), Steve
Senum, Ayman Ghanem (CISCO), Barry Reinhold (Trebia Networks), Bob
Russell (UNH), Bill Lynn (Adaptec) and Doug Otis (Sanlight).

The recovery chapter was enhanced with help from Stephen Bailey
(Sandburst), Somesh Gupta (HP), Venkat Rangan (RhapsodyNetworks),
Vince Cavanna, Pat Thaler (Agilent), Eddy Quicksall (iVivity, Inc.) -
Eddy also contributed with some examples.

Last, but not least, thanks to Ralph Weber for keeping us in line
with T10 (SCSI) standardization.

We would like to thank Steve Hetzler for his unwavering support and
for coming up with such a good name for the protocol, Micky Rodeh,
Jai Menon, Clod Barrera and Andy Bechtolsheim for helping this work
happen.

At the time of the writing, this document has to be considered in
conjunction with the "Naming & Discovery" and the "Boot" documents.

The "Naming & Discovery" is authored by: Mark Bakke (Cisco), Joe
Czap, Jim Hafner, John Hufferd, Kaladhar Voruganti (IBM), Howard Hall
(Pirus), Jack Harwood (EMC), Yaron Klein (SANRAD), Lawrence Lamers
(San Valley Systems), Todd Sperry (Adaptec) and Joshua Tseng
(Nishan).
The "Boot" is authored by: Prasenjit Sarkar (IBM), Duncan Missimer
(HP) and Costa Sapuntzakis (CISCO).

We are grateful to all of them for their good work and for helping us
correlate this document with the ones they produced.

Conventions used in this document

In examples, "I->" and "T->" indicate iSCSI PDUs sent by the
initiator and target respectively.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119.

Change Log

The following changes were made from draft-ietf-ips-iSCSI-06 to
draft-ietf-ips-iSCSI-07:

- Clarified the "fate" of immediate commands and resources mandated
  (1.2.2.1) and introduced a reject-code for rejected immediate
  commands
- Clarified CmdSN handling and checking order for ITT and CmdSN
  (1.2.2.1)
- Added a statement to the effect that a receiver must be able to
  accept 0 length Data Segments to 2.7.6. Added also a statement to
  2.2.1 that a zero-length data segment implies a zero-length digest
- SCSI Mode-SET will not really set the parameters (will not cause an
  error either). The parameters will be set exclusively with text
  mode and can be retrieved with either text or Mode-SENSE. This
  enables us to disable their change after the Login negotiation.
  Also added to the negotiation (1.2.4) the value "?" with special
  meaning of enquiry
- Changed "task" to "command" wherever relevant
- EMDP usage in line with other SCSI protocols. EMDP governs how a
  target may request data and deliver. Similar to FCP a separate
  (protocol) parameter governs data PDU ordering within Sequence
  (DataDeliveryOrder). Cleaned wording of DataOrder. Fixed final bit
  to define sequences in input stream.
- Added a "persistent state" part (1.2.8)
- Some Task Management commands may require authorization or may not
  be implemented. If not authorized they will return as if executed
  with a qualifier indicating "not authorized" or "not implemented"
  (clear LU and the resets)
- Task management commands and responses are "generalized" to all
  iSCSI tagged commands (they are named now Task Management command
  and response). Their behavior with respect to their CmdSN is
  clarified and mandated
- The logic to update ExpCmdSN etc. moved to 1.2.2.1
- Explicitly specified that a target can "initiate" negotiating a
  parameter (offering) (1.2.4)
- Returned the "direction" bit and a set of codes similar to version
  05
- Introduced a "special" session type (CopyManagerSession) to be used
  between a Copy Manager and all of its targets; it may help define
  authentication and limit the type of commands to be executed in
  such a session
- Added 8.4 - How to Abort Safely a Command that Was Not Received
- Fixed the Logout Text
- AHSLength is now the first field in the AHS
- Fixed wording in 2.35 indicating AHS is mandatory for
  Bi-directional commands
- All key=value responses have to be explicit (none, not-understood
  etc.); no more selection by hiatus
- Targets can also offer key=value pairs (i.e., initiate negotiation)
  stated explicitly in 2.9.3
- Logout has a CmdSN field
- The Status SNACK can be discarded if the target has no such
  recovery
- Some parameters have been removed and replaced by "reasonable"
  defaults (read arbitrary defaults!); many others can't be changed
  anymore while the session is in full-feature phase
- NOP-Out specifies how LUN is generated when used (copied from
  NOP-In)
- Initial Marker-Less Interval is not a parameter anymore
- A response with F=1 during negotiation may not contain key=value
  pairs that may require additional answers from the initiator
- Clarified the meaning of the F bit on Write commands with regard to
  immediate and unsolicited data; F bit 0 means that unsolicited data
  will follow while F bit 1 means that this is the last of them (if
  any)
- You can have both immediate and unsolicited Data-Out PDUs
- DataPDULength and FirstBurstSize of 0 are allowed and mean
  unlimited length
- Task management command behavior relative to their own CmdSN is now
  stated in no uncertain terms (they are mandated to execute as if
  issued at CmdSN and, in case of aborts and clear/reset, no
  additional response/status is expected for those commands after the
  task management command response)
- DataSN field in R2T renamed as R2TSN (better reflects semantics)
  and SNACK explicitly says that it requests Data or R2T.
- A session can have only one outstanding text command (not sequence)
- Text for Login Response 0301 changed (removed the maintenance
  mention)
- Clarified when ExpDataSN and ExpR2TSN are reserved in SCSI Response
- Clarified the text and parameter (timers) for iSCSI event
- Padding bytes should be 0 (2.1)
- TotalAHSLength in 2.1.1.1 includes padding
- DataSegmentLength in 2.1.1.2 excludes padding
- Clarified bits in AHS type
- Limit for key/value string lengths (63, 255) in 2.8.3
- Added an example of SCSI event to Asynchronous Message
- Changed "Who" to "Who can send" in appendix
- Clarified meaning of parameters on 2.18.1 - Asynchronous Message -
  iSCSI Event
- Clarified the required initiator behavior at logout (not sending
  other commands) and how one expects the TCP close to be performed
  in 2.14
- Added a Login Response code indicating that a session can't include
  a given connection (0208)
- Clarified transition to full feature phase (per session and per
  connection and the role of the leading connection) in 1.2.5
- Corrected "one outstanding text command per connection" instead of
  "per session"
- For the Login Response TSID must be valid only if Login is accepted
  and the F bit is 1
- Added examples illustrating DataSN and R2TSN (from Eddy Quicksall)
- Added more text to the task management command 2.5
- Removed EnableACA and its dependents (in task management) and
  stated the requirement for a Unit Attention conforming to SAM2
- iSCSI Target Name if used on a connection other than the first must
  be the same as on the first (4.1)
- Fixed the examples in the Login appendix to correspond to the new
  keys
- Fixed SCSI Response Flags and made them consistent with the Data-In
  PDU
- All specified keys except X-* MUST be accepted (2.8.3)
- Hexadecimal notation is 0xab123cd (not 0x'ab123cd')
- Clarified CmdSN usage in immediate commands and the meaning of
  "execution engine" in 1.2.2.1
- Reject responses that prevent the creation of a SCSI task or result
  in a SCSI task being terminated must be followed by a SCSI Response
  with a Check Condition status (2.19.1)
- Additional Runs (AddRuns) dropped from the SNACK request (too
  complex). With it disappeared also the implicit acknowledgement of
  sequences "between runs"
- PDUs delivered because of SNACK will be exact replicas of the
  original PDUs (including all flags) (2.16)
- Added CommandReplaySupport key to negotiate support for full
  command replay (a command can be replayed after the status has been
  issued but has not been acknowledged) and a reject cause of
  unsupported command replay
- Added CommandFailoverSupport key to negotiate support for command
  allegiance change (command retry on another connection)
- Status SNACK for an acknowledged status is a protocol error (cause
  for reject)
- Reject cause "Command In Progress" when requesting replay before
  status is issued and while command is running
- Premature SNACKs are silently discarded (2.16)
- Status SNACK has to be supported only if within command or within
  connection recovery is supported. If within session recovery is
  supported SNACK can be discarded and followed by an Async. Message
  requesting logout
- StatSN added to Logout Response
- Added "CID not found" to Logout Response reason codes
- Async Message - iSCSI event 2 (request logout) has to be sent on
  the connection to be dropped. Wording fixed.
- Naming changes - iqn (stands for iSCSI qualified name) introduced
  as a replacement to fqn. Iqn prefixes also reversed names
- Text in 8.3 revised (task management implementation mechanism)
- Fixed bit 7 byte 1 in Task Management response to 1 (consistency)
- Clarified in 1.2.2 behavior when "command window" is 0 (MaxCmdSN =
  ExpCmdSN - 1)
- Added state transitions part (new part 6)
- Refreshed recovery chapter (new part 7)
- Added an appendix with detailed recovery mechanisms (Appendix E)
- Added a brief explanation of session types in part 1
- Added DiscoverySession key and SendTargets appendix
- SCSI response made to fit having both a Status and a Response
  field. Needed for target errors that result in a check condition
  and ACA. In line with SAM2 that requires both fields (former
  versions were modeled on FCP).
- The security appendix lists SRP as mandatory to implement
- Clarified initial CmdSN and the role of TSID as a serializer
- Long Text Responses - additional fields added to the text command
  and text response
- Added a SCSI to iSCSI concept mapping section 1.5
- Clarified SNACK wording to indicate that in general command,
  request, iSCSI request and iSCSI command have the same meaning.
  Also status, response or numbered response.
- Changed InitStatSN and clarified how it increases
- Added requirement for a 0x00 delimiter after each key=value
- Added binary negotiations (yes|no) explicitly to 1.2.4
- All keys and values in the spec are case sensitive (stated in the
  text command)
- Changed the "operational parameters sent before the security.. MAY
  be discarded" into MUST be discarded
- Changed the login reject 0201 to read - Security Negotiation Failed
- Added to 2.3.1 a paragraph about mandatory consistencies
Standards-Track, Expire November 2001 7 iSCSI July 20, 2001 - Stated clearly that F bit pairing is "local" (per/pair) and not per negotiation - Clarified dependent parameter status - Added CRC Example - Added OpParmReset=yes - SecurityContextComplete is mandatory if any option offered - Added a warning about the implications of not sending all unsolicited data to part 8 - Added a recommendation to send unsolicited data at FirstBurstSize and a response (error) for targets not supporting less - Many more minor editorial changes, clarifications, typos etc. Satran, J. Standards-Track, Expire November 2001 8 iSCSI July 20, 2001 Table of Contents Status of this Memo...................................................2 Abstract..............................................................2 Acknowledgements......................................................2 Conventions used in this document.....................................3 Change Log............................................................3 1. 
Overview..........................................................15 1.1 SCSI Concepts..................................................15 1.2 iSCSI Concepts and Functional Overview.........................15 1.2.1 Layers and Sessions.........................................16 1.2.2 Ordering and iSCSI Numbering................................17 1.2.2.1 Command Numbering and Acknowledging......................17 1.2.2.2 Response/Status Numbering and Acknowledging..............20 1.2.2.3 Data Sequencing..........................................20 1.2.3 iSCSI Login.................................................21 1.2.4 Text Mode Negotiation.......................................22 1.2.5 iSCSI Full Feature Phase....................................23 1.2.6 iSCSI Connection Termination................................25 1.2.7 Naming and Addressing.......................................26 1.2.8 Persistent State............................................28 1.2.9 Message Synchronization and Steering........................29 1.2.9.1 Rationale................................................29 1.2.9.2 Synch and Steering Functional Model......................30 1.2.9.3 Synch and Steering and Other Encapsulation Layers........33 1.2.9.4 Synch/Steering and iSCSI PDU Size........................33 1.3 Third Party Commands...........................................34 1.4 iSCSI session types............................................34 1.5 SCSI to iSCSI concepts mapping model...........................35 1.5.1 iSCSI Architectural Model...................................35 1.5.2 SCSI Architecture Model.....................................37 1.5.3 Consequences of the model...................................38 2. 
iSCSI PDU Formats.................................................40 2.1 iSCSI PDU Length and Padding...................................40 2.2 PDU Template, Header and Opcodes...............................40 2.2.1 Header Digest and Data Digest...............................41 2.2.2 Basic Header Segment (BHS)..................................41 2.2.2.1 X........................................................42 2.2.2.2 I........................................................42 2.2.2.3 Opcode...................................................42 2.2.2.4 Opcode-specific Fields...................................43 2.2.2.5 TotalAHSLength...........................................44 2.2.2.6 DataSegmentLength........................................44 2.2.2.7 LUN......................................................44 2.2.2.8 Initiator Task Tag.......................................44 Satran, J. Standards-Track, Expire November 2001 9 iSCSI July 20, 2001 2.2.3 Additional Header Segment...................................44 2.2.3.1 AHSType..................................................44 2.2.3.2 AHSLength................................................45 2.2.4 Extended CDB Additional Header Segment......................45 2.2.5 Bi-directional Expected Read-Data Length Additional Header Segment...........................................................45 2.3 SCSI Command...................................................46 2.3.1 Flags and Task Attributes...................................46 2.3.2 CRN.........................................................47 2.3.3 CmdSN - Command Sequence Number.............................47 2.3.4 ExpStatSN/ExpDataSN - Expected Status Sequence Number.......47 2.3.5 Expected Data Transfer Length...............................47 2.3.6 CDB - SCSI Command Descriptor Block.........................48 2.3.7 Command-Data Data Segment...................................48 2.4 SCSI Response..................................................49 2.4.1 
Byte 1 - Flags..............................................49 2.4.2 Status......................................................50 2.4.3 Response....................................................50 2.4.4 Basic Residual Count........................................51 2.4.5 Bidi-Read Residual Count....................................51 2.4.6 Sense and Response Data Segment.............................52 2.4.7 ExpDataSN...................................................52 2.4.8 ExpR2TSN....................................................52 2.4.9 StatSN - Status Sequence Number.............................53 2.4.10 ExpCmdSN - Next Expected CmdSN from this Initiator.........53 2.4.11 MaxCmdSN - Maximum CmdSN Acceptable from this Initiator....53 2.5 Task Management Command........................................54 2.5.1 Function....................................................54 2.5.2 Referenced Task Tag.........................................55 2.5.3 RefCmdSN....................................................56 2.6 Task Management Response.......................................57 2.6.1 Referenced Task Tag.........................................58 2.7 SCSI Data-out & SCSI Data-in...................................59 2.7.1 F (Final) Bit...............................................60 2.7.2 Target Transfer Tag.........................................61 2.7.3 StatSN......................................................61 2.7.4 DataSN......................................................61 2.7.5 Buffer Offset...............................................61 2.7.6 DataSegmentLength...........................................62 2.7.7 Flags.......................................................62 2.8 Text Command...................................................63 2.8.1 F (Final) Bit...............................................63 2.8.2 B (Bookmark-valid) Bit......................................64 2.8.3 Initiator Task Tag..........................................64 
2.8.4 Bookmark....................................................64 2.8.5 Text........................................................64 Satran, J. Standards-Track, Expire November 2001 10 iSCSI July 20, 2001 2.9 Text Response..................................................66 2.9.1 F (Final) Bit...............................................66 2.9.2 B (Bookmark-valid) Bit......................................67 2.9.3 Initiator Task Tag..........................................67 2.9.4 Bookmark....................................................67 2.9.5 Text Response Data..........................................67 2.10 Login Command.................................................68 2.10.1 X - Restart................................................68 2.10.2 F (Final) Bit..............................................69 2.10.3 Version-max................................................69 2.10.4 Version-min................................................69 2.10.5 CID........................................................69 2.10.6 ISID.......................................................69 2.10.7 CmdSN......................................................69 2.10.8 ExpStatSN..................................................69 2.10.9 Login Parameters...........................................70 2.11 Login Response................................................71 2.11.1 F (Final) bit..............................................71 2.11.2 Version-max................................................72 2.11.3 Version-active/lowest......................................72 2.11.4 TSID.......................................................72 2.11.5 StatSN.....................................................72 2.11.6 Status-Class and Status-Detail.............................72 2.12 NOP-Out.......................................................75 2.12.1 P (Ping) Bit...............................................76 2.12.2 Initiator Task 
Tag.........................................76 2.12.3 Target Transfer Tag........................................76 2.12.4 Ping Data..................................................77 2.13 NOP-In........................................................78 2.13.1 P bit......................................................78 2.13.2 Target Transfer Tag........................................79 2.13.3 LUN........................................................79 2.14 Logout Command................................................80 2.14.1 CID........................................................81 2.14.2 ExpStatSN..................................................81 2.14.3 Reason Code................................................81 2.15 Logout Response...............................................83 2.15.1 Response...................................................83 2.15.2 Parameter2.................................................84 2.15.3 Parameter3.................................................84 2.16 SNACK Request.................................................84 2.16.1 S..........................................................85 2.16.2 BegRun.....................................................85 2.16.3 RunLength..................................................85 2.16.4 ExpStatSN/ExpDataSN........................................85 2.17 Ready To Transfer (R2T).......................................86 Satran, J. 
Standards-Track, Expire November 2001 11 iSCSI July 20, 2001 2.17.1 R2TSN......................................................87 2.17.2 StatSN.....................................................87 2.17.3 Desired Data Transfer Length and Buffer Offset.............87 2.17.4 Target Transfer Tag........................................88 2.18 Asynchronous Message..........................................89 2.18.1 iSCSI Event................................................90 2.18.2 SCSI Event.................................................91 2.19 Reject........................................................92 2.19.1 Reason.....................................................92 2.19.2 First Bad Byte.............................................93 3. SCSI Mode Parameters for iSCSI....................................94 3.1 SCSI Disconnect-Reconnect Mode Page use in iSCSI...............94 3.1.1 MaximumBurstSize Field (16 bit).............................94 3.1.2 E - Enable Modify Data Pointers Bit (EMDP)..................94 3.1.3 D - Immediate Data Disable..................................95 3.1.4 FirstBurstSize Field (16 bit)...............................95 3.1.5 Other Fields................................................95 3.2 iSCSI Logical Unit Control Mode Page...........................95 3.2.1 Enable CRN (C)..............................................96 3.3 iSCSI Port Mode Page...........................................96 3.3.1 Protocol Identifier (iSCSI).................................97 3.3.2 LogoutLoginMinTime..........................................97 3.3.3 LogoutLoginMaxTime..........................................97 4. Login Phase.......................................................98 4.1 Login Phase Start..............................................99 4.2 iSCSI Security and Integrity Negotiation......................100 4.3 Operational Parameter Negotiation During the Login Phase......102 5. 
Operational Parameter Negotiation Outside the Login Phase........104 6. State transitions................................................105 6.1 Standard connection state diagram.............................105 6.2 Connection recovery state diagram.............................107 6.3 Session state diagram.........................................110 7. iSCSI Error Handling and Recovery................................113 7.1 Usage of retry bit (X bit) in recovery........................113 7.2 Usage of Reject PDU in recovery...............................114 7.3 Format Errors.................................................115 7.4 Digest Errors.................................................115 7.5 Sequence Errors...............................................116 7.6 SCSI Timeouts.................................................117 7.7 Negotiation failures..........................................117 7.8 Protocol Errors...............................................117 7.9 Connection Failure............................................117 7.10 Session Errors...............................................118 7.11 Recovery Levels..............................................118 7.11.1 Recovery Within-command...................................119 7.11.2 Recovery Within-connection................................120 Satran, J. Standards-Track, Expire November 2001 12 iSCSI July 20, 2001 7.11.3 Recovery Within-session...................................120 7.11.4 Session Recovery..........................................121 8. 
Notes to Implementers............................................123 8.1 Multiple Network Adapters.....................................123 8.2 Autosense and Auto Contingent Allegiance (ACA)................123 8.3 Task Management Commands and Immediate Delivery...............123 8.4 How to Abort Safely a Command that Was Not Received...........125 8.5 Synch and steering layer and performance......................126 8.6 Unsolicited data and performance..............................126 9. Security Considerations..........................................127 9.1 iSCSI Security Protection Modes...............................127 9.1.1 No Security................................................127 9.1.2 Initiator-Target Authentication............................127 9.1.3 Data Integrity and Authentication..........................127 9.1.4 Encryption.................................................128 10. IANA Considerations.............................................129 11. References and Bibliography.....................................130 12. Author's Addresses..............................................132 Appendix A. iSCSI Security and Integrity............................135 01 Security Keys and Values.......................................135 02 Authentication.................................................137 03 Login Phase Examples...........................................140 Appendix B. Examples................................................149 04 Read Operation Example.........................................149 05 Write Operation Example........................................150 06 R2TSN/DataSN use examples......................................150 07 CRC Examples...................................................153 Appendix C. Synch and Steering with Fixed Interval Markers..........155 08 Markers At Fixed Intervals.....................................155 09 Initial Marker-less Interval...................................156 Appendix D. 
Login/Text Operational Keys.............................157 10 MaxConnections.................................................157 11 SendTargets....................................................157 12 TargetAddress..................................................157 13 TargetName.....................................................158 14 InitiatorName..................................................159 15 TargetAlias....................................................159 16 InitiatorAlias.................................................159 17 TargetAddress..................................................160 18 AccessID.......................................................160 19 FMarker........................................................161 20 RFMarkInt......................................................161 21 SFMarkInt......................................................162 22 InitialR2T.....................................................162 23 BidiInitialR2T.................................................162 24 ImmediateData..................................................163 Satran, J. 
Standards-Track, Expire November 2001 13 iSCSI July 20, 2001 25 DataPDULength..................................................164 26 FirstBurstSize.................................................164 27 LogoutLoginMinTime.............................................165 28 LogoutLoginMaxTime.............................................165 29 MaxOutstandingR2T..............................................165 30 DataOrder......................................................165 31 DataDeliveryOrder..............................................166 32 CommandReplaySupport...........................................166 33 CommandFailoverSupport.........................................167 34 SessionType....................................................167 35 OpParmReset....................................................167 36 The Glen-Turner Vendor Specific Key Format.....................168 Appendix E. SendTargets operation...................................169 Appendix F. Algorithmic presentation of error recovery levels.......173 37 General Data structure and procedure description...............173 38 Within-command error recovery algorithms.......................174 1 Procedure descriptions........................................174 2 Initiator algorithms..........................................175 3 Target algorithms.............................................177 39 Within-connection recovery algorithms..........................179 4 Procedure descriptions........................................179 1. Initiator algorithms.........................................180 2. Target algorithms............................................181 5 Within-session recovery algorithms............................182 3. Procedure descriptions.......................................182 4. Initiator algorithms.........................................182 5. 
Target algorithms
Full Copyright Statement

1. Overview

1.1 SCSI Concepts

The SCSI Architecture Model-2 [SAM2] describes in detail the architecture of the SCSI family of I/O protocols. This section provides a brief background to familiarize readers with the terminology of the SCSI architecture.

At the highest level, SCSI is a family of interfaces for requesting services from I/O devices, including hard drives, tape drives, CD and DVD drives, printers, and scanners. In SCSI parlance, an individual I/O device is called a "logical unit" (LU).

SCSI is a client-server architecture. Clients of a SCSI interface are called "initiators". Initiators issue SCSI "commands" to request service from a logical unit. The "device server" on the logical unit accepts SCSI commands and executes them.

A "SCSI transport" maps the client-server SCSI protocol to a specific interconnect. Initiators are one endpoint of a SCSI transport. The "target" is the other endpoint. A target can have multiple Logical Units (LUs) behind it. Each Logical Unit has an address within a target called a Logical Unit Number (LUN).

A SCSI task is a SCSI command or possibly a linked set of SCSI commands. Some LUs support multiple pending (queued) tasks but the queue of tasks is managed by the target. The target uses an initiator-provided "task tag" to distinguish between tasks. Only one command in a task can be outstanding at any given time.

Each SCSI command results in an optional data phase and a required response phase. In the data phase, information can travel from the initiator to target (e.g., WRITE), target to initiator (e.g., READ), or in both directions. In the response phase, the target returns the final status of the operation, including any errors. A response terminates a SCSI command.
A Command Descriptor Block (CDB) is the data structure used to contain the command parameters that an initiator hands to a target. The CDB content and structure are defined by [SAM2] and device-type specific SCSI standards.

1.2 iSCSI Concepts and Functional Overview

The iSCSI protocol is a mapping of the SCSI remote procedure invocation model over the TCP protocol.

In keeping with similar protocols, the initiator and target divide their communications into messages. This document uses the term "iSCSI protocol data unit" (iSCSI PDU) for these messages.

For performance reasons, iSCSI allows a "phase-collapse". A command and its associated data may be shipped together from initiator to target, and data and responses may be shipped together from targets.

The iSCSI transfer direction is defined with regard to the initiator. Outbound or outgoing transfers are transfers from initiator to target, while inbound or incoming transfers are from target to initiator.

An iSCSI task is an iSCSI request for which a response is expected.

In this document "iSCSI request", "iSCSI command", request or (unqualified) command have the same meaning. Also, unless specified otherwise, status, response or numbered response have the same meaning.

1.2.1 Layers and Sessions

To specify initiator and target actions and how they relate to transmitted and received Protocol Data Units, the following conceptual layering model is used:

   - the SCSI layer builds/receives SCSI CDBs (Command Descriptor Blocks) and relays/receives them with the remaining command execute parameters (cf. SAM-2) to/from the iSCSI layer;
   - the iSCSI layer builds/receives iSCSI PDUs and relays/receives them to/from one or more TCP connections that form an initiator-target "session".

Communication between the initiator and target occurs over one or more TCP connections.
The TCP connections carry control messages, SCSI commands, parameters and data within iSCSI Protocol Data Units (iSCSI PDUs). The group of TCP connections that link an initiator with a target form a session (loosely equivalent to a SCSI I-T nexus). A session is defined by a session ID that is composed of an initiator part and a target part. TCP connections can be added and removed from a session. Connections within a session are identified by a connection ID (CID).

Across all connections within a session, an initiator sees one "target image". All target identifying elements, like LUN, are the same. In addition, across all connections within a session, a target sees one "initiator image". Initiator identifying elements, like the Initiator Task Tag, can be used to identify the same entity regardless of the connection on which they are sent or received.

iSCSI targets and initiators MUST support at least one TCP connection and MAY support several connections in a session.

1.2.2 Ordering and iSCSI Numbering

iSCSI uses Command and Status numbering schemes and a Data sequencing scheme.

Command numbering is session-wide and is used for ordered command delivery over multiple connections. It can also be used as a mechanism for command flow control over a session.

Status numbering is per connection and is used to enable missing status detection and recovery in the presence of transient or permanent communication errors.

Data sequencing is per command or part of a command (R2T triggered sequence) and is used to detect missing data and/or R2T PDUs due to header digest errors.

Normally, fields in the iSCSI PDUs communicate the Sequence Numbers between the initiator and target. During periods when traffic on a connection is unidirectional, iSCSI NOP-Out/In PDUs may be utilized to synchronize the command and status ordering counters of the target and initiator.
1.2.2.1 Command Numbering and Acknowledging

iSCSI supports ordered command delivery within a session. All commands (initiator-to-target PDUs) are numbered.

Any SCSI activity is related to a task (SAM-2). The task is identified by the Initiator Task Tag for the life of the task.

Commands in transit from the initiator to the target layer are numbered by iSCSI; the number is carried by the iSCSI PDU as CmdSN (Command-Sequence-Number). The numbering is session-wide. All outgoing iSCSI PDUs that have a task association, except Data-Out, carry this number. CmdSNs are allocated by the initiator iSCSI within a 32-bit unsigned counter (modulo 2**32). Comparisons and arithmetic on CmdSN SHOULD use Serial Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32.

Commands meant for immediate delivery are marked as such through an immediate delivery flag. They MAY carry any CmdSN. The CmdSN is not advanced for commands marked for immediate delivery.

Command numbering starts with the login request on the first connection of a session (the leading login) and includes every non-immediate command issued afterwards, whether during login or in full-feature phase.

If immediate delivery is used with task management commands, these commands may reach the target task management before the tasks they are supposed to act upon. However, their CmdSN is a marker of their position in the stream of commands. The task management command MUST carry the CmdSN that would be given to the next non-immediate command. The initiator and target must ensure that the task management commands act as specified by SAM2 - i.e., both commands and responses appear as if delivered in order.

Not covered in this document are the means by which one may request immediate delivery for a command or by which iSCSI will decide by itself to mark a PDU for immediate delivery.
Please note that the number of commands used for immediate delivery is not limited and their delivery to execution is not acknowledged through the numbering scheme. Immediate commands can be rejected by the iSCSI target due to lack of resources. An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI request per connection at any time.

Except for the commands marked for immediate delivery, the iSCSI target layer MUST deliver the commands for execution in the order specified by CmdSN. Commands marked for immediate delivery may be handed over by the iSCSI target layer for execution as soon as detected. iSCSI may avoid delivering some command for execution if so required by some prior SCSI or iSCSI action (e.g., clear task set Task Management request received before all the commands it was supposed to act on). Delivery for execution means delivery to the SCSI execution engine or an iSCSI-SCSI protocol specific execution engine (e.g., for text commands).

The initiator and target are assumed to have three registers, unique session wide, that define the numbering mechanism:

   - CmdSN - the current command Sequence Number, advanced by 1 on each command shipped except for commands marked for immediate delivery.
   - ExpCmdSN - the next expected command by the target. The target acknowledges all commands up to but not including this number. The target iSCSI layer sets the ExpCmdSN to the largest non-immediate CmdSN that it is able to deliver for execution plus 1 (no holes in the CmdSN sequence).
   - MaxCmdSN - the maximum number to be shipped. The queuing capacity of the receiving iSCSI layer is MaxCmdSN - ExpCmdSN + 1.

ExpCmdSN and MaxCmdSN are derived from target-to-initiator PDU fields.
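As an illustration only, the following Python sketch shows how the three registers might be handled with Serial Number Arithmetic over 32 bits. The names serial_lt, queue_capacity and InitiatorNumbering are hypothetical and are not defined by this document. The sketch also assumes that an immediate command carries the CmdSN of the next non-immediate command, which is required for task management commands but is only one permitted choice for other immediate commands.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # CmdSN is a counter modulo 2**32
HALF = 1 << (SERIAL_BITS - 1)   # comparison horizon used by RFC 1982


def serial_lt(a, b):
    """Return True if a precedes b in serial number arithmetic."""
    a %= MOD
    b %= MOD
    return a != b and ((b - a) % MOD) < HALF


def queue_capacity(exp_cmd_sn, max_cmd_sn):
    """Queuing capacity MaxCmdSN - ExpCmdSN + 1, computed modulo 2**32."""
    return (max_cmd_sn - exp_cmd_sn + 1) % MOD


class InitiatorNumbering:
    """Hypothetical holder for the initiator's session-wide CmdSN register."""

    def __init__(self, first_cmd_sn=0):
        self.cmd_sn = first_cmd_sn % MOD

    def stamp(self, immediate=False):
        """Return the CmdSN for the next PDU.

        The register advances by 1 only for non-immediate commands.
        Assumption: an immediate command carries the CmdSN that the next
        non-immediate command would receive.
        """
        sn = self.cmd_sn
        if not immediate:
            self.cmd_sn = (self.cmd_sn + 1) % MOD
        return sn
```

Because the comparison is done modulo 2**32, a CmdSN of 0xFFFFFFFF precedes a CmdSN of 0, and a window that straddles the wrap point still yields the expected capacity.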
MaxCmdSN and ExpCmdSN fields are processed as follows:

   - if the PDU MaxCmdSN is less than the PDU ExpCmdSN-1 (in Serial Arithmetic Sense and with a difference bounded by 2**31-1), they are both ignored
   - if the PDU MaxCmdSN is less than the local MaxCmdSN (in Serial Arithmetic Sense and with a difference bounded by 2**31-1), it is ignored; else it updates the local MaxCmdSN
   - if the PDU ExpCmdSN is less than the local ExpCmdSN (in Serial Arithmetic Sense and with a difference bounded by 2**31-1), it is ignored; else it updates the local ExpCmdSN

This sequence is required as updates may arrive out of order because they travel on different TCP connections.

The target MUST NOT transmit a MaxCmdSN that is more than 2**31 - 1 above the last ExpCmdSN. For non-immediate commands, the CmdSN field can take any value from ExpCmdSN to MaxCmdSN. The target MUST silently ignore any non-immediate command outside this range or non-immediate duplicates within the range that have not been flagged with the retry bit (the X bit in the opcode).

iSCSI initiators and targets MUST support the command numbering scheme.

A numbered iSCSI command will not change its allocated CmdSN regardless of the number of times and circumstances in which it is reissued. At the target, it is assumed that CmdSN is relevant only while the command has not created any execution state (i.e., the target cannot find the Initiator Task Tag). Afterwards CmdSN becomes irrelevant. Testing for execution state is assumed to precede any other action at the target and is followed by ordering and delivery if no execution state is found or delivery if execution state is found. Immediate commands can't be retried unless there is execution state available at the target for them (they are rejected for retry if the target can't find the Initiator Task Tag).

1.2.2.2 Response/Status Numbering and Acknowledging

Responses in transit from the target to the initiator are numbered.
The StatSN (Status Sequence Number) is used for this purpose. StatSN is a counter maintained per connection. ExpStatSN is used by the initiator to acknowledge status.

Status numbering starts after Login. During login, there is always only one outstanding command per connection and status numbering is not strictly needed but may be used as a sanity check. The login response includes an initial value for status numbering.

To enable command recovery the target MAY maintain enough state information to enable data and status recovery after a connection failure. A target can discard all the state information maintained for recovery after the status delivery is acknowledged through ExpStatSN.

A large difference between StatSN and ExpStatSN may indicate a failed connection. Initiators MUST undertake recovery actions if the difference is greater than 2**31-1.

Initiators and Targets MUST support the response-numbering scheme.

1.2.2.3 Data Sequencing

Data and R2T PDUs transferred as part of some command execution MUST be sequenced. The DataSN field is used for data sequencing. For input (read) data PDUs, DataSN starts with 0 for the first data PDU of an input command and advances by 1 for each subsequent data PDU. For output data PDUs, DataSN starts with 0 for the first data PDU of a sequence (the initial unsolicited sequence or any data PDU sequence issued to satisfy an R2T) and advances by 1 for each subsequent data PDU. R2Ts are also sequenced per command - i.e., the first R2T has an R2TSN of 0 and the R2TSN advances by 1 for each subsequent R2T. Unlike command and status, the data PDUs and R2Ts are not acknowledged except as implied by status. The DataSN/R2TSN field is meant to enable the initiator to detect missing data PDUs and simplify this operation at the target.

For any given write command a target must have issued fewer than 2**32-1 R2Ts.
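The detection of missing data PDUs through DataSN can be sketched as follows. This is a minimal illustration in Python and not a normative algorithm; the class name DataSequenceTracker is hypothetical. It assumes that a PDU discarded by the receiver (for example after a digest error) simply never reaches the tracker, so a gap shows up when a later DataSN arrives.

```python
class DataSequenceTracker:
    """Hypothetical per-sequence tracker for DataSN (or R2TSN) values."""

    def __init__(self):
        self.expected = 0     # the first PDU of a sequence carries 0
        self.missing = set()  # numbers skipped so far and not yet seen

    def receive(self, data_sn):
        """Record one received number and return the sorted list of gaps."""
        if data_sn >= self.expected:
            # Everything between the expected number and this one is missing.
            self.missing.update(range(self.expected, data_sn))
            self.expected = data_sn + 1
        else:
            # A previously missing PDU has been recovered.
            self.missing.discard(data_sn)
        return sorted(self.missing)
```

A tracker of this kind would be created anew for each input command and, for output data, for each unsolicited or R2T-triggered sequence, since the numbering restarts at 0 in each case.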
Any input or output data sequence MUST contain fewer than 2**32-1 numbered PDUs.

1.2.3 iSCSI Login

The purpose of the iSCSI login is to enable a TCP connection for iSCSI use, authenticate the parties, negotiate the session's parameters, open a security association protocol, and mark the connection as belonging to an iSCSI session.

A session is used to identify to a target all the connections with a given initiator that belong to the same I_T nexus. If an initiator and target are connected through more than one session, both the initiator and target perceive the other as a different entity on each session (a different I_T nexus in SAM-2 parlance).

The targets listen on a well-known TCP port for incoming connections. The initiator begins the login process by connecting to that well-known TCP port.

As part of the login process, the initiator and target MAY wish to authenticate each other and set a security association protocol for the session. This can occur in many different ways and is subject to negotiation.

Negotiation and security associations executed before the Login Command are outside the scope of this document although they may realize a related function (e.g., establish an IPsec tunnel).

The Login Command starts the iSCSI Login Phase. Within the Login Phase, negotiation is carried on through parameters of the Login Command and Response, and optionally through intervening Text Commands and Responses. The Login Response indicates the progress and/or concludes the Login Phase. Once suitable authentication has occurred, the target MAY authorize the initiator to send SCSI commands. How the target chooses to authorize an initiator is beyond the scope of this document. The target indicates a successful authentication and authorization by sending a login response with "login accept". Otherwise, it sends a response with a "login reject", which indicates that a session is not established and the connection is terminated.
It is expected that iSCSI parameters will be negotiated after the security association protocol is established, if there is a security association.

The login PDU includes a session ID that is composed of an initiator part ISID and a target part TSID. For a new session, the TSID is null. As part of the response, the target generates a TSID.

Session specific parameters can be specified only during the login phase begun by a login command containing a null TSID (e.g., the maximum number of connections that can be used for this session). Connection specific parameters, if any, can be specified during the login phase begun by any login command. Thus, a session is operational once it has at least one connection.

During session establishment the target identifies the initiator through the value pair InitiatorName and ISID (InitiatorName is described later in this part). Any state associated with an initiator that is persistent according to the SCSI standards (e.g., reservations) is associated with an initiator based on this identity.

Any PDU except login and text, which is sent on a TCP connection before this connection gets into full feature phase, is a protocol error. When received at the initiator or the target, such a PDU MUST cause the connection to terminate. At the target, closing the connection MAY be preceded by a Reject PDU sent to the initiator.

1.2.4 Text Mode Negotiation

During login and thereafter some session or connection parameters are negotiated through an exchange of textual information.

In "list" negotiation, the offering party sends for each key a list of values (which may include "none") in its order of preference.

The responding party answers with the first value from the list it supports and is allowed to use for the specific initiator.

The value "none" MUST always be used to indicate a missing function. However, none is a valid selection only if it is explicitly offered.
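A responder's side of "list" negotiation might be sketched as below. This is an illustrative Python fragment only; the function name answer_list_offer and the key ExampleAlgo with its values are invented for the example and are not keys defined by this document. The sketch answers with the reserved value "reject" of this section when no offered value is acceptable.

```python
def answer_list_offer(offer, acceptable):
    """Answer one 'key=value1,value2,...' offer.

    `acceptable` is the set of values the responder supports and is
    allowed to use with this particular initiator.
    """
    key, _, value_list = offer.partition("=")
    # Walk the offered values in the offering party's order of preference.
    for value in value_list.split(","):
        if value in acceptable:
            return f"{key}={value}"
    # Nothing offered is acceptable: use the reserved value.
    return f"{key}=reject"
```

Since the answer is always drawn from the offered list, "none" can be selected only when the offering party has explicitly included it.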
If a target does not support, or is not allowed to use with a specific initiator, any of the offered options, it may use the value "reject". The values "none" and "reject" are reserved and must be used only as described here. Any key not understood is answered with "NotUnderstood".

The general format of text negotiation is:

   Offer->  <key>=<value1>,<value2>,...,<valuen>
   Answer-> <key>=<valuex>|reject|NotUnderstood

In "numerical" negotiations, the offering and responding party state a numerical value. The result of the negotiation is key dependent; frequently the lower or the higher of the two values is used.

Binary negotiations (for keys taking the values yes or no) are a restricted form of numerical negotiations and, as in the general numerical case, the result is key dependent.

For numerical (and binary) negotiations, if the responding party does not respond with the required key, it is assumed to have answered with the default. However, not responding is considered bad practice and is discouraged.

The value "?" with any key has the meaning of enquiry and should be answered with the current value or "NotUnderstood".

Although the initiator is the requesting party and controls the request-response initiation and termination, the target can offer key=value pairs of its own as part of a sequence and not only in response to an identical key=value pair offered by the initiator.

1.2.5 iSCSI Full Feature Phase

Once the initiator is authorized to do so, the iSCSI session is in iSCSI full feature phase. A session is in full feature phase after successfully finishing the login phase on the first (leading) connection of a session. A connection is in full feature phase if the session is in full feature phase and the connection login has completed successfully. In full feature phase the initiator may send SCSI commands and data to the various LUs on the target by wrapping them in iSCSI PDUs that go over the established iSCSI session.
If an iSCSI request is issued over one TCP connection, the corresponding response or requested PDU MUST be sent over the same connection. We call this "connection allegiance".

For SCSI commands that require data and/or parameter transfer, the (optional) data and the status for a command MUST be sent over the same TCP connection that was used to deliver the SCSI command.

Thus, if an initiator issues a READ command, the target MUST send the requested data, if any, followed by the status to the initiator over the same TCP connection that was used to deliver the SCSI command. If an initiator issues a WRITE command, the initiator MUST send the data, if any, for that command and the target MUST return R2T, if any, and the status over the same TCP connection that was used to deliver the SCSI command. Retransmission requests (SNACK PDUs) as well as the data and status that they generate MUST also use the same connection.

However, consecutive commands that are part of a SCSI linked command-chain task MAY use different connections. Connection allegiance is strictly per-command and not per-task. During the iSCSI Full Feature Phase, the initiator and target MAY interleave unrelated SCSI commands, their SCSI Data and responses, over the session.

Outgoing SCSI data (initiator to target user data or command parameters) is sent as either solicited data or unsolicited data. Solicited data is sent in response to Ready To Transfer (R2T) PDUs. Unsolicited data can be sent as part of an iSCSI command PDU ("immediate data") or in separate iSCSI data PDUs. An initiator may send unsolicited data as immediate (up to the negotiated maximum PDU size) or in a separate PDU sequence (up to the negotiated limit). All subsequent data has to be solicited. The maximum size of an individual data PDU or the immediate-part of the first unsolicited burst as well as the first burst size MAY be negotiated at login.
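To make the split between immediate, unsolicited and solicited data concrete, the following Python sketch divides the data of one write command into the three parts. It is a rough illustration under stated assumptions, not a normative rule: the function plan_write and its parameter names are hypothetical, and the sketch assumes that the first burst limit covers the immediate part plus any separate unsolicited data PDUs.

```python
def plan_write(total, max_pdu, first_burst,
               immediate_allowed=True, unsolicited_allowed=True):
    """Split `total` bytes of write data into three byte counts.

    max_pdu      - negotiated maximum size of an individual data PDU
    first_burst  - negotiated first burst size (assumed to include the
                   immediate part)
    """
    immediate = min(total, max_pdu, first_burst) if immediate_allowed else 0
    # Whatever is left of the first burst may go out in separate data PDUs.
    budget = first_burst - immediate
    unsolicited = min(total - immediate, budget) if unsolicited_allowed else 0
    # All remaining data has to wait for R2T PDUs from the target.
    solicited = total - immediate - unsolicited
    return {"immediate": immediate,
            "unsolicited": unsolicited,
            "solicited": solicited}
```

For a 100000-byte write with an 8192-byte maximum PDU and a 65536-byte first burst, the sketch yields 8192 immediate bytes, 57344 further unsolicited bytes, and 34464 bytes that must be solicited.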
Targets operate in either solicited (R2T) data mode or unsolicited (non R2T) data mode. In unsolicited mode, an initial R2T is implied. A target MAY separately enable immediate data without enabling the more general (separate data PDUs) form of unsolicited data.

An initiator SHOULD honor an R2T data request for a valid outstanding command (i.e., carrying a valid Initiator Task Tag) provided the command is supposed to deliver outgoing data and the R2T specifies data within the command bounds.

It is considered an error for an initiator to send unsolicited data PDUs to a target operating in R2T mode (only solicited data is allowed). It is also an error for an initiator to send more data, whether immediate or as separate PDUs, than the SCSI limit for first burst.

At login, an initiator MAY request to send data blocks and a first burst of any size; in this case, the target MUST indicate the size of the first burst and of the immediate and data blocks that it is ready to accept. The agreed upon limits for the first burst as well as the maximum data PDU are recorded in (and are retrievable from) the disconnect-reconnect mode page.

A target SHOULD NOT silently discard data and request retransmission through R2T. Initiators SHOULD NOT do any score boarding for data. The residual count calculation is to be performed by the targets. Incoming data is always implicitly solicited. SCSI data packets are matched to their corresponding SCSI commands by using Tags that are specified in the protocol.

Initiator tags for pending commands are unique initiator-wide for a session. Target tags are not strictly specified by the protocol. It is assumed that these tags are used by the target to tag (alone or in combination with the LUN) the solicited data. Target tags are generated by the target and "echoed" by the initiator.
The above mechanisms are designed to accomplish efficient data delivery and a large degree of control over the data flow.

iSCSI initiators and targets MUST also enforce some ordering rules to achieve deadlock-free operation. Unsolicited data MUST be sent on every connection in the same order in which commands were sent. A target receiving data out of order SHOULD terminate the session.

Each iSCSI session to a target is treated as if it originated from a different and logically independent initiator.

1.2.6 iSCSI Connection Termination

Connection termination is assumed to be an exceptional event. Graceful TCP connection shutdowns are done by sending TCP FINs. Graceful connection shutdowns MUST only occur when there are no outstanding tasks that have allegiance to the connection or when the connection is not in full-feature phase. A target SHOULD respond rapidly to a FIN from the initiator by closing its half of the connection after waiting for all outstanding commands that have allegiance to the connection to conclude and send their status. Connection termination with outstanding commands may require recovery actions.

Connection termination is also required as a prelude to recovery. By terminating a connection before starting recovery, the initiator and target can avoid having stale PDUs being received after recovery. In this case, the initiator sends a Logout request on any of the operational connections of a session indicating what connection should be terminated.

Logout can also be issued by an initiator at the explicit request of a target (through an Asynchronous Message PDU) or it can be performed autonomously by the target after announcing it to the initiator (through an Asynchronous Message PDU).

1.2.7 Naming and Addressing

This section provides a summary of detail provided in the iSCSI Naming & Discovery draft [NDT].

All iSCSI initiators and targets are named.
Each target or initiator is known by an iSCSI Name. The iSCSI Name is independent of the location of the initiator and target; two formats are provided that allow the use of existing naming authorities when generating them.

One of these formats allows the use of a registered domain name as a naming authority; it is important not to confuse this with an address. The iSCSI Name is a UTF-8 text string, and is defined in [NDT].

iSCSI Names are used to provide:

   - a target identifier for configurations that present multiple targets behind a single IP address and port.
   - a method to recognize multiple paths to the same device on different IP addresses and ports.
   - a symbolic address for source and destination targets for use in third party commands.
   - an identifier for initiators and targets to enable them to recognize each other regardless of IP address and port mapping on intermediary firewalls.

The initiator MUST present both its iSCSI Initiator Name and the iSCSI Target Name to which it wishes to connect during the login phase, except for a discovery session, in which case it is optional to present the iSCSI Target Name.

A target also provides a default name called "iSCSI". This is not a globally unique name. An initiator can log into this default target name, and use a command called "SendTargets" to retrieve a list of iSCSI targets that exist at that address. The only session type that can be created with the target using the iSCSI name is a discovery session (see 1.4).

iSCSI Names do not require special handling within iSCSI; they are opaque.

iSCSI targets also have addresses. An iSCSI address specifies a single path on which to find an iSCSI target. The iSCSI Name is incorporated as part of the address. An iSCSI address is specified as a URL, such as:

   <domain-name>[:<port>]/<iSCSI-name>

Where <domain-name> is one of:

   - IPv4 address, in dotted decimal notation.
     Assumed if the name contains exactly four numbers, separated by dots (.), where each number is in the range 0..255.
   - IPv6 address, in dotted decimal notation. Assumed if the name contains more than four, but at most 16 numbers, separated by dots (.), where each number is in the range 0..255.
   - Fully Qualified Domain Name (host name). Assumed if the <domain-name> is neither an IPv4 nor an IPv6 address.

and <iSCSI-name> is the iSCSI name of the target being addressed.

The <port> in the address is optional; if specified it is the TCP port on which the target is listening for connections. If <port> is not specified, a default port, to be assigned by IANA, will be assumed.

The iSCSI address, or URL, is not generally used within normal connections between iSCSI initiators and targets; it is primarily used during discovery. Details are specified in [NDT].

Examples of iSCSI Names:

   iqn.com.disk-vendor.diskarrays.sn.45678
   iqn.com.gateways.yourtargets.24
   iqn.com.os-vendor.plan9.cdrom.12345
   iqn.com.service-provider.users.customer235.host90
   eui.02004567A425678D

Examples of IPv4 addresses/names:

   10.0.0.1/iqn.com.disk-vendor.diskarrays.sn.45678
   10.0.0.2/iscsi

Examples of IPv6 addresses/names:

   12.5.7.10.0.0.1/iqn.com.gateways.yourtargets.24
   12.5.6.10.0.0.2/iscsi

For management/support tools as well as naming services that use a text prefix to express the protocol intended (as in http:// or ftp://) the following form MAY be used:

   iSCSI://<domain-name>[:port][/iSCSI-name]

Examples:

   iSCSI://diskfarm1.acme.com/iscsi
   iSCSI://computingcenter.acme.com/eui.02004567A425678D
   iSCSI://computingcenter.acme.com:4002/iqn.com.gateways.yourtargets.24

To assist in providing a more human-readable user interface for devices containing iSCSI targets and initiators, a target or initiator may also provide an alias. This alias is a simple UTF-8 string, is not globally unique, and is never interpreted or used to identify an initiator or device within the iSCSI protocol.
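As an aside on the address format given earlier in this section, the rules for telling apart the three kinds of host part can be sketched in Python as follows. This is an illustrative reading of those rules and not a complete URL parser; the function names classify_host and parse_address are hypothetical. Since the default port is still to be assigned by IANA, the sketch reports an absent port as None.

```python
PREFIX = "iscsi://"


def classify_host(host):
    """Classify the host part as 'ipv4', 'ipv6' or 'fqdn' by dotted-number count."""
    parts = host.split(".")
    numeric = all(p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)
    if numeric and len(parts) == 4:
        return "ipv4"
    if numeric and 4 < len(parts) <= 16:
        return "ipv6"
    return "fqdn"  # neither of the numeric forms


def parse_address(address):
    """Split an address of the form host[:port]/name, with an optional prefix."""
    if address.lower().startswith(PREFIX):
        address = address[len(PREFIX):]
    host_port, slash, name = address.partition("/")
    host, _, port = host_port.partition(":")
    return {
        "host": host,
        "kind": classify_host(host),
        "port": int(port) if port else None,  # None: default port applies
        "name": name if slash else None,
    }
```

Applied to the examples in this section, 10.0.0.1 is classified as an IPv4 address, 12.5.7.10.0.0.1 as an IPv6 address, and computingcenter.acme.com as a host name.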
The use of the alias is described in [NDT].

When a target has to act as an initiator for a third party command, it MAY use the iSCSI Initiator Name it learned during login as required by the authentication mechanism to the third party.

To address targets and logical units within a target, SCSI uses a fixed length (8 bytes) uniform addressing scheme; in this document, we call those addresses SCSI reference addresses (SRA).

To provide the target with the protocol specific addresses, iSCSI relies on the SCSI aliasing mechanism (work in progress in T10). The aliasing support enables an initiator to associate protocol specific addresses with SRAs; the latter can be used in subsequent commands. For iSCSI, a protocol specific address is a TCP address and an iSCSI Name.

An initiator may use one of a few techniques to configure and/or discover the iSCSI Target Names to which it has access, along with their addresses. These techniques are discussed fully in [NDT].

1.2.8 Persistent State

iSCSI does not require any persistent state maintenance across sessions, beyond the persistent state required by SCSI. However, SCSI requires the transport to associate an initiator name to persistent reserves, access control enrolment, etc. The value pair IIN and ISID is used for this purpose.

iSCSI sessions do not persist beyond power cycles and boot operations.

All iSCSI session and connection parameters are reinitialized on session and connection creation.

Commands persist beyond connection termination if the session persists and command recovery within session is supported. However, command execution as perceived by iSCSI (i.e., involving data transfer) is suspended when a connection is dropped until a new allegiance is established by the initiator reissuing the command with a resume (restart) flag.

1.2.9 Message Synchronization and Steering

1.2.9.1 Rationale

iSCSI presents a mapping of the SCSI protocol onto TCP.
This encapsulation is accomplished by sending iSCSI PDUs that are of varying length. Unfortunately, TCP does not have a built-in mechanism for signaling message boundaries at the TCP layer. iSCSI overcomes this obstacle by placing the message length in the iSCSI message header. This serves to delineate the end of the current message as well as the beginning of the next message.

In situations where IP packets are delivered in order from the network, iSCSI message framing is not an issue; messages are processed one after the other. In the presence of IP packet reordering (e.g., frames being dropped), legacy TCP implementations store the "out of order" TCP segments in temporary buffers until the missing TCP segments arrive, upon which the data must be copied to the application buffers. In iSCSI it is desirable to steer the SCSI data within these out of order TCP segments into the pre-allocated SCSI buffers rather than store them in temporary buffers. This decreases the need for dedicated reassembly buffers as well as the latency and bandwidth related to extra copies.

Unfortunately, when relying solely on the "message length in the iSCSI message" scheme to delineate iSCSI messages, a missing TCP segment that contains an iSCSI message header (with the message length) makes it impossible to find message boundaries in subsequent TCP segments. The missing TCP segment(s) must be received before any of the following segments can be steered to the correct SCSI buffers (due to the inability to determine the iSCSI message boundaries). Since these segments cannot be steered to the correct location, they must be saved in temporary buffers that must then be copied to the SCSI buffers.

Different schemes can be used to recover synchronization. One of these schemes is detailed in Appendix C.
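The "message length in the header" scheme discussed above can be illustrated with a small Python sketch. The header used here is deliberately simplified and hypothetical (a bare 4-byte big-endian length); it is not the iSCSI PDU header format, and the names frame and split_messages are invented for the example.

```python
import struct

# Hypothetical header: a 4-byte big-endian payload length and nothing else.
HEADER = struct.Struct(">I")


def frame(payload):
    """Prefix a payload with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


def split_messages(stream):
    """Split an in-order byte stream into complete messages.

    Returns (messages, remainder); the remainder holds a partial header or
    a partial message that must wait for more bytes.
    """
    messages = []
    pos = 0
    while pos + HEADER.size <= len(stream):
        (length,) = HEADER.unpack_from(stream, pos)
        end = pos + HEADER.size + length
        if end > len(stream):
            break  # the message is not complete yet
        messages.append(stream[pos + HEADER.size:end])
        pos = end
    return messages, stream[pos:]
```

The parser only works when it starts at a message boundary and sees every header in order, which is exactly why a missing TCP segment that carried a header prevents the receiver from locating the boundaries in the segments that follow it.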
To make those schemes work, iSCSI implementations have to make sure that the appropriate protocol layers are provided with enough information to implement a synchronization and/or data steering mechanism.

1.2.9.2 Synch and Steering Functional Model

We assume that iSCSI is implemented according to the following layering scheme:

+------------------------+
|          SCSI          |
+------------------------+
|         iSCSI          |
+------------------------+
|   Synch and Steering   |
| +-------------------+  |
| |        TCP        |  |
| +-------------------+  |
+------------------------+
| Lower Functional Layers|
|         (LFL)          |
+------------------------+
|           IP           |
+------------------------+
|          Link          |
+------------------------+

In this model, LFL can be IPsec (a mechanism changing the IP stream and invisible to TCP). We assume that Synch and Steering operates just underneath iSCSI. Note that an implementation may choose to place Synch and Steering somewhere else in the stack if it can translate the information kept by iSCSI in terms valid for the chosen layer.

According to our model of layering, iSCSI considers the information it delivers to the Synch and Steering layer (headers and payloads) as a contiguous stream of bytes mapped to the positive integers from 0 to infinity. For all practical purposes, iSCSI is not supposed to have to handle infinitely long streams. The stream addressing scheme will wrap around at 2**32-1.

It is also assumed that iSCSI will deliver to the layers beneath any PDU through an indivisible (atomic) operation. If a specific implementation does PDU delivery to the Synch and Steering layer through multiple operations, it MUST bracket an operation set used to deliver a single PDU in a manner understandable to the Synch and Steering Layer.

The Synch and Steering Layer (which itself is OPTIONAL) MUST retain the PDU end address within the stream for every delivered iSCSI PDU.
To enable the Synch and Steering operation to perform Steering, additional information including identifying tags and buffer offsets MUST also be retained for every sent PDU. The Synch and Steering Layer is required to add to every sent data item (IP packet, TCP packet or some other superstructure) enough information to enable the receiver to steer it to a memory location independent of any other piece.

If the transmission stream is built dynamically, this information is used to insert Synch and Steering information in the transmission stream (at first transmission or at re-transmission) either through a globally accessible table or a call-back mechanism. If the transmission stream is built statically, the Synch and Steering information is inserted in the transmission stream.

The retained information can be released whenever the transmitted data is acknowledged by the receiver (in case of dynamically built streams by deletion from the global table or by an additional callback).

On the outgoing path, the Synch and Steering layer MUST map the outgoing stream addresses from iSCSI stream addresses to TCP stream sequence numbers.

On the incoming path, the Synch and Steering layer extracts the Synch and Steering information from the TCP stream. Then it helps steer (place) the data stream to its final location and/or recover iSCSI PDU boundaries when some TCP packets are lost or received out of order. The data stream seen by the receiving iSCSI layer is identical to the data stream that left the sending iSCSI layer. The Synch and Steering information is kept until the PDUs it refers to are completely processed by the iSCSI layer.

On the incoming path, the Synch and Steering layer does not change the way TCP notifies iSCSI about in-order data arrival.
All data placements, in-order or out-of-order, performed by the Synch and Steering layer are hidden from iSCSI, while conventional, in-order data arrival notifications generated by TCP are passed through to iSCSI.

1.2.9.3 Synch and Steering and Other Encapsulation Layers

We recognize that in many environments the following is a more appropriate layering model:

+----------------------------------+
|               SCSI               |
+----------------------------------+
|              iSCSI               |
+----------------------------------+
|  Upper Functional Layers (UFL)   |
+----------------------------------+
|        Synch and Steering        |
| +-----------------------------+  |
| |             TCP             |  |
| +-----------------------------+  |
+----------------------------------+
|  Lower Functional Layers (LFL)   |
+----------------------------------+
|                IP                |
+----------------------------------+
|               Link               |
+----------------------------------+

In this model, UFL can be TLS or some other transport conversion mechanism (a mechanism changing the TCP stream but transparent to iSCSI).

To be effective and act on reception of TCP packets out of order, Synch and Steering has to be underneath UFL, and Synch and Steering data has to be left out of any UFL transformation (encryption, compression, padding, etc.). However, Synch and Steering MUST take into account the additional data inserted in the stream by UFL. Synch and Steering MAY also restrict the type of transformations UFL may perform on the stream.

This makes implementation of Synch and Steering in the presence of otherwise opaque UFLs less attractive.

1.2.9.4 Synch/Steering and iSCSI PDU Size

When a large iSCSI message is sent, the TCP segment(s) that contain the iSCSI header may be lost. The remaining TCP segment(s) up to the next iSCSI message need to be buffered (in temporary buffers) since the iSCSI header that indicates what SCSI buffers the data is to be
steered to was lost. To minimize the amount of buffering, it is recommended that the iSCSI PDU size be restricted to a small value (perhaps a few TCP segments in length). During login, each end of the iSCSI session specifies the maximum size of an iSCSI PDU it will accept.

1.3 Third Party Commands

SCSI allows every addressable entity to be either an initiator or a target. In host-to-host communication, each such entity can take on the initiator role. In typical I/O operations between a host and a peripheral subsystem, the host plays the initiator role and the peripheral subsystem plays the target role.

For third party SCSI commands that involve device-to-device communication, such as (EXTENDED) COPY and COMPARE, SCSI defines a copy-manager. The copy-manager takes on the role of initiator in the device-to-device communication. The copy-manager is the "original-target" of the command and acts as initiator for a (variable) number of devices, which are called sources and destinations. Sources and destinations act as targets. The whole operation is described by one "master CDB" that is delivered to the copy-manager and a series of descriptor blocks; each descriptor block addresses a source and destination target and LU, and carries a description of the work to be done in terms of blocks or bytes, as required by the device types.

The relevant SCSI standards do not require full support of (EXTENDED) COPY or COMPARE, nor do they provide a detailed execution model.

Enabling a FC copy-manager to support iSCSI sources and destinations is subject to coordination with T10.
1.4 iSCSI session types

iSCSI defines several types of sessions:

a) normal operational session - an unrestricted session
b) boot session - a session intended for system boot only - target MAY limit the type of commands accepted
c) copy-manager session - a session opened between a copy manager and its targets as part of executing third party SCSI commands
d) discovery-session - a session opened only for target discovery; the target MAY accept only text commands with only the SendTargets key

The session type is defined during login with a key=value parameter in the login command.

1.5 SCSI to iSCSI concepts mapping model

The following diagram shows an example of how multiple iSCSI Nodes (targets in this case) can co-exist within the same Network Entity and can share Network Portals (IP names, addresses and TCP ports). Other more complex configurations are also possible. Detailed descriptions of the components of these diagrams are given in 1.5.1.

+-----------------------------------+
|  Network Entity (iSCSI Client)    |
|                                   |
|         +-------------+           |
|         | iSCSI Node  |           |
|         | (Initiator) |           |
|         +-------------+           |
|            |       |              |
| +--------------+ +--------------+ |
| |Network Portal| |Network Portal| |
| |  10.1.30.4   | |  10.1.40.6   | |
+-+--------------+-+--------------+-+
         |               |
         |  IP Networks  |
         |               |
+-+--------------+-+--------------+-+
| |Network Portal| |Network Portal| |
| |  10.1.30.21  | |  10.1.40.3   | |
| | TCP Port 4   | | TCP Port 4   | |
| +--------------+ +--------------+ |
|        |               |          |
|        -----------------          |
|           |         |             |
|  +-------------+ +--------------+ |
|  | iSCSI Node  | | iSCSI Node   | |
|  |  (Target)   | |  (Target)    | |
|  +-------------+ +--------------+ |
|                                   |
|   Network Entity (iSCSI Server)   |
+-----------------------------------+

1.5.1 iSCSI Architectural Model
This section describes that part of the iSCSI architectural model that has the most bearing on the relationship between iSCSI and the SCSI Architectural Model.

a) Network Entity - The Network Entity represents a device or gateway that is accessible from the IP network. This device or gateway may support one or more iSCSI Nodes. The iSCSI Node is accessed via a network portal (see below).

b) iSCSI Node - There may be one or more iSCSI Storage Nodes within the Network Entity. An iSCSI Node is identified by its iSCSI name. There is a requirement for iSCSI names to be unique. iSCSI names are useful because in some cases (e.g., when DHCP services [xxx] are used), the combination of IP address and port number cannot uniquely identify an initiator or a target. There is a default iSCSI Node object present at every target network entity that can be accessed without specifying the iSCSI name. However, if there are multiple iSCSI target Nodes that are serviced by a single Network Entity and Network Portal objects, then it is necessary for the initiator to specify the target iSCSI name to uniquely identify the target iSCSI node. An alias string could also be associated with an iSCSI target node. The target alias helps an organization to associate its own semantic meaning with the target alias string. However, the target alias string is not a substitute for the target iSCSI name.

c) Network Portal - The Network Portal is a port through which access to any iSCSI Node within the Network Entity can be obtained. A Network Entity must have one or more Network Portals, each of which is usable by some iSCSI Nodes contained in that Network Entity to gain access to the IP network. A Network Portal in an initiator is identified by its IP name or address. A Network Portal in a target is identified by its IP name or address and its listening TCP port.
d) Portal Groups - iSCSI supports multiple connections within the same session; some implementations will have the ability to do this across multiple Network Portals. A Portal Group is a group of Network Portals that collectively can support a multiple-connection session. A system may contain one or more Portal Groups. Each Network Portal belongs to exactly one portal group on each iSCSI node. Portal Groups are identified within an iSCSI Node by a portal group tag, a simple integer value. All Network Portals with the same portal group tag are in the same Portal Group.

The following diagram shows an example of one such configuration on a target and how a session may be established that shares Network Portals within a Portal Group.

----------------------------IP Network---------------------
         |               |                    |
+--------|---------------|--------------------|---------------------+
|   +----|---------------|-----+         +----|---------+           |
|   | +---------+  +---------+ |         | +---------+  |           |
|   | | Network |  | Network | |         | | Network |  |           |
|   | | Portal  |  | Portal  | |         | | Portal  |  |           |
|   | +--|------+  +---------+ |         | +---------+  |           |
|   |    |               |     |         |    |         |           |
|   |    |    Portal     |     |         |    | Portal  |           |
|   |    |    Group 1    |     |         |    | Group 2 |           |
|   +--------------------------+         +--------------+           |
|        |               |                    |                     |
|   +----------------------------+  +-----------------------------+ |
|   | iSCSI Session (Target side)|  | iSCSI Session (Target side) | |
|   |                            |  |                             | |
|   |  (iSCSI Name + TSID=0)     |  |  (iSCSI Name + TSID=1)      | |
|   +----------------------------+  +-----------------------------+ |
|                                                                   |
|                      iSCSI Target Node                            |
|              (within Network Entity, not shown)                   |
+-------------------------------------------------------------------+

1.5.2 SCSI Architecture Model

This part describes the relationship between the SCSI Architecture Model [SAM-2] constructs of SCSI device, SCSI port and I_T nexus and the iSCSI constructs described above.
This relationship implies implementation requirements in order to conform to the SAM-2 model and other SCSI operational functions. These implementation requirements are detailed in 1.5.3.

a) SCSI Device - This is the SAM-2 term for an entity that contains other SCSI entities. For example, a SCSI Initiator Device contains one or more SCSI Initiator Ports and zero or more application clients; a SCSI Target Device contains one or more SCSI Target Ports and one or more logical units. For iSCSI, the SCSI Device is the component within an iSCSI Node that provides the SCSI functionality. As such, there can be at most one SCSI Device within a given iSCSI Node. The SCSI Device Name is the same as the iSCSI Node name.

b) SCSI Port - This is the SAM-2 term for an entity in a SCSI device that provides the SCSI functionality to interface with a service delivery subsystem or transport. For iSCSI, the definitions of SCSI Initiator Port and SCSI Target Port are different.

SCSI Initiator Port: this maps to the endpoint of a session. An iSCSI session is negotiated through the login process between an iSCSI Initiator Node and an iSCSI Target Node. At successful completion of this process, a SCSI Initiator Port is created within the iSCSI Initiator Node. The SCSI Initiator Port Name and SCSI Initiator Port Identifier are both defined as the iSCSI Initiator Name together with the ISID portion of the session identifier.

SCSI Target Port: this maps to a target Portal Group. The SCSI Target Port Name and SCSI Target Port Identifier are both defined as the iSCSI Target Name together with the portal group tag.

c) I_T Nexus - The I_T Nexus is a relationship between a SCSI Initiator Port and a SCSI Target Port. For iSCSI, this relationship is a session.
It is thus a relationship between the initiator end of the session (SCSI Initiator Port) and the target portal group (SCSI Target Port) through which the session is established.

1.5.3 Consequences of the model

This section describes the implementation and behavioral requirements that result from the mapping of SCSI constructs to iSCSI constructs defined above.

ISID RULE: Between a given iSCSI Initiator and iSCSI Target Portal Group, there can be only one session with a given ISID. This does not preclude use of the same ISID with a different Target Portal Group on the same iSCSI Target Node (or on other iSCSI Target Nodes), nor does it preclude other sessions with different ISIDs to the same Target Portal Group.

The reason for this rule is to avoid instantiation of "parallel" nexuses between the same two SCSI initiator and SCSI target ports. Certain nexus relationships contain explicit state (e.g., initiator-specific mode pages or reservation state) that may need to be preserved through changes or failures in the iSCSI layer (e.g., session collapses). In order for that state to be restored, the initiator should reestablish its session (re-login) to the same Target Portal Group using the previous ISID. This is because the SCSI logical unit device server uses the SCSI Initiator Port Identifier and the SCSI Target Port Identifier (or relative target port) to identify the nexus.

To facilitate compliance with the ISID RULE, the following should hold:

iSCSI Initiator Requirements:

a) The iSCSI Name should be a configurable parameter of each initiator portal group.
b) The ISID name space of the iSCSI Initiator should be partitioned among the initiator portal groups.

iSCSI Target Requirements:

a) The iSCSI Name should be a configurable parameter of each target portal group.
b) The TSID name space of the iSCSI Target should be partitioned among the target portal groups.
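As an illustration of the ISID RULE, the following minimal sketch shows how a target node might track live sessions so that a login cannot create a parallel nexus. The class and method names (TargetSessionTable, login, session_ended) and the initiator name "host-a" are hypothetical and are not defined by this document. Rejecting the duplicate login is an assumption made for the sketch; an implementation may choose a different reaction.

```python
class IsidRuleViolation(Exception):
    """Raised when a login would create a second, parallel nexus."""


class TargetSessionTable:
    """Hypothetical per-target-node table of live sessions."""

    def __init__(self):
        # Each entry is (initiator name, ISID, target portal group tag),
        # i.e. the two SCSI port identifiers that make up the nexus.
        self._live = set()

    def login(self, initiator_name, isid, portal_group_tag):
        key = (initiator_name, isid, portal_group_tag)
        if key in self._live:
            # Same initiator, same ISID, same Target Portal Group:
            # a second session here would violate the ISID RULE.
            raise IsidRuleViolation(f"ISID {isid!r} already in use: {key!r}")
        self._live.add(key)
        return key

    def session_ended(self, key):
        # After a session collapse the initiator may log in again with
        # the previous ISID to restore nexus state.
        self._live.discard(key)


table = TargetSessionTable()
first = table.login("host-a", isid=1, portal_group_tag=1)
table.login("host-a", isid=1, portal_group_tag=2)  # other portal group: allowed
table.login("host-a", isid=2, portal_group_tag=1)  # other ISID: allowed
```

Keying the table on the full triple mirrors the rule's scope: the same ISID remains usable with a different Target Portal Group, and different ISIDs remain usable with the same one.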
SCSI Mode Pages:

If the SCSI target does not maintain initiator-specific mode pages, and an initiator makes changes to port-specific mode pages, the changes may affect all other initiators logged in to that iSCSI Target through the same Target Portal Group.

2. iSCSI PDU Formats

All multi-byte integers that are specified in formats defined in this document are to be represented in network byte order (i.e., big endian). Any bits not defined MUST be set to zero. Any reserved fields and values MUST be 0 unless specified otherwise.

2.1 iSCSI PDU Length and Padding

iSCSI PDUs are padded to an integer number of 4 byte words. The padding bytes should be 0.

2.2 PDU Template, Header and Opcodes

All iSCSI PDUs have one or more header segments and, optionally, a data segment. After the entire header segment group there MAY be a header-digest. The data segment MAY also be followed by a data-digest.

The first segment, and in many cases the only one, is the Basic Header Segment (BHS), a fixed-length 48-byte header segment. It may be followed by Additional Header Segments (AHS). Thus, when we have only a BHS (no data or digests) the size of the iSCSI PDU is 48 bytes.

The overall structure of a PDU is as follows:
Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0/ BHS                                                           /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48/ AHS (optional)                                                /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
  ----
  +---------------+---------------+---------------+---------------+
 m/ Header-Digest (optional)                                      /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
 n/ Data Segment(optional)                                        /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
 m/ Data-Digest (optional)                                        /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

All PDU segments and digests are padded to an integer number of 4 byte words. The padding bytes should be 0.

2.2.1 Header Digest and Data Digest

Optional header and data digests protect the integrity and authenticity of header and data, respectively. The digests, if present, are located, respectively, after the header and PDU-specific data and include the padding bytes. The digest types are negotiated during the login phase.

The separation of the header and data digests is useful in iSCSI routing applications, where only the header changes when a message is forwarded. In this case, only the header digest should be re-calculated.

Digests are not included in data or header length fields.

A zero-length Data Segment implies also a zero-length data-digest.

2.2.2 Basic Header Segment (BHS)

The Basic Header Segment is 48 bytes long. The Opcode, TotalAHSLength and DataSegmentLength fields appear in all iSCSI PDUs. In addition, the Initiator Task Tag, Logical Unit Number, and Flags fields, when used, always appear in the same location in the header.
Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|X|I| Opcode    |F|  Opcode-specific fields                     |
  |               |P|                                             |
  +---------------+---------------+---------------+---------------+
 4|TotalAHSLength | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| LUN or Opcode-specific fields                                 |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag or Opcode-specific fields                  |
  +---------------+---------------+---------------+---------------+
20/ Opcode-specific fields                                        /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48

2.2.2.1 X

The X bit is used as a Retry/Restart indicator for request PDUs (PDUs from initiator to target). This bit is always 1 for response PDUs (PDUs from target to initiator).

2.2.2.2 I

The I bit is used as an immediate delivery marker for request PDUs (PDUs from initiator to target). This bit is always 1 for response PDUs (PDUs from target to initiator).

2.2.2.3 Opcode

The Opcode indicates what type of iSCSI PDU the header encapsulates.

The Opcodes are divided into two categories: initiator opcodes and target opcodes. Initiator opcodes are in PDUs sent by the initiators (request PDUs), and target opcodes are in PDUs sent by the target (response PDUs).

Initiators MUST NOT use target opcodes and targets MUST NOT use initiator opcodes.
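The fixed field positions in the BHS layout can be illustrated with a short parsing sketch. It is not part of the protocol definition: the names BasicHeader and parse_bhs are hypothetical, and the sample header built at the end is invented for the example. Bit and byte positions follow the BHS diagram, with multi-byte integers in network byte order.

```python
from typing import NamedTuple

BHS_LENGTH = 48  # fixed size of the Basic Header Segment, in bytes


class BasicHeader(NamedTuple):
    x: bool                   # byte 0, bit 7: Retry/Restart indicator
    i: bool                   # byte 0, bit 6: immediate delivery marker
    opcode: int               # byte 0, bits 5-0
    final_or_poll: bool       # byte 1, bit 7: F/P bit
    total_ahs_length: int     # byte 4
    data_segment_length: int  # bytes 5-7
    lun: bytes                # bytes 8-15 (or opcode-specific)
    initiator_task_tag: int   # bytes 16-19 (or opcode-specific)


def parse_bhs(buf: bytes) -> BasicHeader:
    """Split the fixed-position BHS fields out of a raw PDU buffer."""
    if len(buf) < BHS_LENGTH:
        raise ValueError("a BHS is 48 bytes long")
    return BasicHeader(
        x=bool(buf[0] & 0x80),
        i=bool(buf[0] & 0x40),
        opcode=buf[0] & 0x3F,
        final_or_poll=bool(buf[1] & 0x80),
        total_ahs_length=buf[4],
        # Multi-byte integers are in network byte order (big endian).
        data_segment_length=int.from_bytes(buf[5:8], "big"),
        lun=bytes(buf[8:16]),
        initiator_task_tag=int.from_bytes(buf[16:20], "big"),
    )


# Invented sample: opcode 0x01 with the I bit and the F/P bit set.
raw = bytearray(BHS_LENGTH)
raw[0] = 0x40 | 0x01
raw[1] = 0x80
raw[5:8] = (512).to_bytes(3, "big")
raw[16:20] = (0x12345678).to_bytes(4, "big")
hdr = parse_bhs(bytes(raw))
print(hdr.opcode, hdr.i, hdr.data_segment_length)
```

The sketch only separates fields; whether the LUN and Initiator Task Tag positions carry those values or opcode-specific data depends on the opcode.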
Valid initiator opcodes defined in this specification are:

0x00 NOP-Out (from initiator to target)
0x01 SCSI Command (encapsulates a SCSI Command Descriptor Block)
0x02 SCSI Task Management Command
0x03 Login Command
0x04 Text Command
0x05 SCSI Data-out (for WRITE operations)
0x06 Logout Command
0x10 SNACK Request

Valid target opcodes are:

0x20 NOP-In (from target to initiator)
0x21 SCSI Response (contains SCSI status and possibly sense information or other response information)
0x22 SCSI Task Management Response
0x23 Login Response
0x24 Text Response
0x25 SCSI Data-in (for READ operations)
0x26 Logout Response
0x31 Ready To Transfer (R2T - sent by target to initiator when it is ready to receive data from initiator)
0x32 Asynchronous Message (sent by target to initiator to indicate certain special conditions)
0x3f Reject

Initiator opcodes 0x1c-0x1e and target opcodes 0x3c-0x3e are vendor specific codes.

2.2.2.4 Opcode-specific Fields

These fields have different meanings for different opcode types.

Bit 7 of the second byte is used as a Poll/Final bit (P/F bit) for some iSCSI PDUs and MUST be 0 in all other iSCSI PDUs. When used as a Poll bit it indicates that an answer is required. When used as a Final bit it indicates a Final PDU in a logical sequence (e.g., the last Data PDU of an unsolicited or solicited data PDU sequence or the perceived final Request/Response of the Login Phase).

2.2.2.5 TotalAHSLength

Total length of all AHS header segments in 4 byte words including padding if any.

2.2.2.6 DataSegmentLength

This is the data segment payload length in bytes (excluding padding).

2.2.2.7 LUN

Some opcodes operate on a specific Logical Unit. The Logical Unit Number (LUN) field identifies which Logical Unit. If the opcode does not relate to a Logical Unit, this field either is ignored or may be used for some other purpose. The LUN field is 64 bits in accordance with [SAM2].
The exact format of this field can be found in the [SAM2] document.

2.2.2.8 Initiator Task Tag

The initiator assigns a Task Tag to each iSCSI task that it issues. While a task exists this tag MUST uniquely identify it session-wide.

2.2.3 Additional Header Segment

The general format of the additional header segments is:

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| AHSLength                     | AHSType       | AHS-Specific  |
  +---------------+---------------+---------------+---------------+
 4/ AHS-Specific                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
 x

2.2.3.1 AHSType

The AHSType field is coded as follows:

B7 - Drop Bit - if set to 1 this AHS may be ignored if not understood; if set to 0 this AHS must be rejected if not understood.
B6 - Reserved - must be 0
B0-5 - AHS code
   0 - Reserved
   1 - Extended CDB
   2 - Bi-Directional SCSI commands Expected Read Data Length
   3-59 Reserved
   60-63 Non iSCSI extensions

2.2.3.2 AHSLength

This field contains the effective length in bytes of the AHS excluding AHSType and AHSLength (not including padding). The AHS is padded to an integer number of 4 byte words.
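The AHSType coding and the 4-byte padding rule can be sketched as follows. This is only an illustration under stated assumptions: the function names decode_ahs_type, describe_ahs_code and padded_length are hypothetical, and raising an error for a non-zero reserved bit is a choice made for the sketch rather than a behavior this document mandates.

```python
from typing import Tuple


def decode_ahs_type(ahs_type: int) -> Tuple[bool, int]:
    """Return (drop_bit, ahs_code) for one AHSType byte."""
    if not 0 <= ahs_type <= 0xFF:
        raise ValueError("AHSType is a single byte")
    if ahs_type & 0x40:
        # B6 is reserved and must be 0 (error handling is an assumption).
        raise ValueError("reserved bit B6 is set")
    drop_bit = bool(ahs_type & 0x80)  # B7: may be ignored if not understood
    ahs_code = ahs_type & 0x3F        # B0-5: AHS code
    return drop_bit, ahs_code


def describe_ahs_code(code: int) -> str:
    """Map an AHS code to the categories listed for AHSType."""
    if code == 1:
        return "Extended CDB"
    if code == 2:
        return "Bi-Directional Expected Read Data Length"
    if 60 <= code <= 63:
        return "Non iSCSI extension"
    return "Reserved"


def padded_length(length_in_bytes: int) -> int:
    """Round a length up to an integer number of 4 byte words."""
    return (length_in_bytes + 3) & ~3


drop, code = decode_ahs_type(0x82)
print(drop, describe_ahs_code(code), padded_length(5))
```

A receiver that does not understand a code would consult the drop bit to decide between ignoring the segment and rejecting it.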
2.2.4 Extended CDB Additional Header Segment

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| CDBLength-16                  | 0x01          | Reserved (0)  |
  +---------------+---------------+---------------+---------------+
 4/ ExtendedCDB...+padding                                        /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
 x

2.2.5 Bi-directional Expected Read-Data Length Additional Header Segment

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| 0x05                          | 0x02          | Reserved (0)  |
  +---------------+---------------+---------------+---------------+
 4| Expected Read-Data Length                                     |
  +---------------+---------------+---------------+---------------+
 8

2.3 SCSI Command

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|X|I| 0x01      |F|R|W|0 0|ATTR | Reserved      | CRN or Rsvd   |
  +---------------+---------------+---------------+---------------+
 4|TotalAHSLength | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| Logical Unit Number (LUN)                                     |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Expected Data Transfer Length                                 |
  +---------------+---------------+---------------+---------------+
24| CmdSN                                                         |
  +---------------+---------------+---------------+---------------+
28| ExpStatSN or ExpDataSN                                        |
  +---------------+---------------+---------------+---------------+
32/ SCSI Command Descriptor Block (CDB)                           /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
  / DataSegment - Command Data (optional)                         /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

2.3.1 Flags and Task Attributes

The flags for a SCSI Command are:

b7 (F) set to 1 when no unsolicited SCSI Data-Out PDUs follow this PDU. For a write, if Expected Data Transfer Length is larger than the DataSegmentLength, the target may solicit additional data through R2T.
b6 (R) set to 1 when input data is expected
b5 (W) set to 1 when output data is expected
b3-4 Reserved (MUST be 0)
b0-2 used to indicate Task Attributes

The Task Attributes (ATTR) can have one of the following integer values (see [SAM2] for details):

0 Untagged
1 Simple
2 Ordered
3 Head of Queue
4 ACA

Having both the W and the F bit set to 0 is an error. The R and W bits MAY be 1 while the corresponding Expected Data Transfer Lengths are 0, but they MUST NOT be 0 when the corresponding Expected Data Transfer Lengths are not 0.

2.3.2 CRN

SCSI command reference number - if present in the SCSI execute command arguments (according to [SAM2]).

2.3.3 CmdSN - Command Sequence Number

Enables ordered delivery across multiple connections in a single session.

2.3.4 ExpStatSN/ExpDataSN - Expected Status Sequence Number

Command responses up to ExpStatSN-1 (mod 2**32) have been received (acknowledges status) on the connection. If the command is a retry (the X bit is 1) establishing a new connection allegiance, this field will contain the next consecutive input DataSN number expected by the initiator (no gaps) for this command in a previous execution.

2.3.5 Expected Data Transfer Length

For unidirectional operations, the Expected Data Transfer Length field states the number of bytes of data involved in this SCSI operation.
For a WRITE (W flag set to 1 and R flag set to 0) operation, the initiator uses this field to specify the number of bytes of data it expects to transfer for this operation. For a READ (W flag set to 0 and R flag set to 1) operation, the initiator uses this field to specify the number of bytes of data it expects the target to transfer to the initiator. It corresponds to the SAM-2 byte count.

If the Expected Data Transfer Length for a WRITE and the length of the immediate data part that follows the command (if any) are the same, then no more data PDUs are expected to follow. In this case, the F bit MUST be set to 1.

If the Expected Data Transfer Length is higher than the FirstBurstSize (the negotiated maximum amount of unsolicited data the target will accept), the initiator SHOULD send the maximum size of unsolicited data. The target MAY terminate in error a command for which the Expected Data Transfer Length is higher than the FirstBurstSize and for which the initiator sent less than FirstBurstSize unsolicited data.

For bi-directional operations (both R and W flags are set to 1), this field states the number of data bytes involved in the outbound transfer. For bi-directional operations, an additional header segment MUST be present in the header sequence indicating the Expected Bi-directional Read Data Length.

Upon completion of a data transfer, the target informs the initiator of how many bytes were actually processed (sent or received) by the target. This is done through residual counts.

2.3.6 CDB - SCSI Command Descriptor Block

There are 16 bytes in the CDB field to accommodate the commonly used CDBs. Whenever the CDB is larger than 16 bytes, an Extended CDB AHS MUST be used to contain the CDB spillover.

2.3.7 Command-Data Data Segment

Some SCSI commands require additional parameter data to accompany the SCSI command. This data may be placed beyond the boundary of the iSCSI header (a data segment).
Alternatively, user data (as from a WRITE operation) can be placed in the same PDU (both cases referred to as immediate data). Those data are governed by the general rules for solicited vs. unsolicited data.

2.4 SCSI Response

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x21      |1|Rsv|o|u|O|U|0| Status        | Response      |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)  | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| Reserved (0)                                                  |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Basic Residual Count                                          |
  +---------------+---------------+---------------+---------------+
24| StatSN                                                        |
  +---------------+---------------+---------------+---------------+
28| ExpCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
32| MaxCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
36| ExpDataSN or Reserved (0)                                     |
  +---------------+---------------+---------------+---------------+
40| ExpR2TSN or Reserved (0)                                      |
  +---------------+---------------+---------------+---------------+
44| Bidi-Read Residual Count                                      |
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
  / Sense Data and Response Data (optional)                       /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

2.4.1 Byte 1 - Flags

b0 (0) Reserved
b1 (U) set for Residual Underflow. In this case, the Basic Residual Count indicates how many bytes were not transferred out of those expected to be transferred.
   b2 (O) set for Residual Overflow. In this case, the Basic Residual
          Count indicates how many bytes could not be transferred
          because the initiator's Expected Data Transfer Length was too
          small.
   b3 (u) same as b1 but for the read-part of a bi-directional
          operation
   b4 (o) same as b2 but for the read-part of a bi-directional
          operation
   b5-6   Reserved

   Bits O and U are mutually exclusive and so are bits o and u.
   For a response (S=0) b1-b4 MUST be 0.

2.4.2 Status

   The Status field is used to report the SCSI status of the command
   (as specified in [SAM2]).

   If a SCSI device error is detected while data from the initiator is
   still expected (the command PDU did not contain all the data and the
   target has not received a Data PDU with the F bit set), the target
   MUST wait until it receives a Data PDU with the F bit set before
   sending the Response PDU.

2.4.3 Response

   This field contains the iSCSI service response.

   Valid iSCSI service response codes are:

      0x00 - Command Completed at Target
      0x01 - Target Failure
      0x02 - Delivery Subsystem Failure
      0x03 - Unsolicited data rejected
      0x04 - Not enough unsolicited data
      0x05 - Command in progress
      0x80-0xff - Reserved for Vendor-Unique Responses

   N.B. Response code 0x04 must be used only if the target does not
   support output (write) operations in which the total data length is
   higher than FirstBurstSize but the initiator sent less than
   FirstBurstSize, and out-of-order R2Ts can't be used.

   The Response is used to report a Service Response. The exact mapping
   of the iSCSI response codes to SAM service response symbols is
   outside the scope of this document.
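   As an illustration only, the flag bits of 2.4.1 and the service
   response codes of 2.4.3 could be decoded as in the following Python
   sketch. The function and constant names are hypothetical and not
   part of the protocol; bit numbers are taken from the list in 2.4.1,
   with bit 0 assumed to be the least significant bit of byte 1.

```python
# Residual flag masks for byte 1 of the SCSI Response PDU.
# Assumption: bit 0 is the least significant bit, as drawn in the diagram.
FLAG_U = 0x02  # b1: Residual Underflow
FLAG_O = 0x04  # b2: Residual Overflow
FLAG_u = 0x08  # b3: underflow, read-part of a bi-directional operation
FLAG_o = 0x10  # b4: overflow, read-part of a bi-directional operation

# Service response codes listed in 2.4.3.
RESPONSE_CODES = {
    0x00: "Command Completed at Target",
    0x01: "Target Failure",
    0x02: "Delivery Subsystem Failure",
    0x03: "Unsolicited data rejected",
    0x04: "Not enough unsolicited data",
    0x05: "Command in progress",
}


def decode_residual_flags(flags_byte):
    """Return the set of residual flag names set in byte 1.

    Raises ValueError when a mutually exclusive pair is set together.
    """
    names = set()
    for name, mask in (("U", FLAG_U), ("O", FLAG_O),
                       ("u", FLAG_u), ("o", FLAG_o)):
        if flags_byte & mask:
            names.add(name)
    if {"U", "O"} <= names:
        raise ValueError("bits O and U are mutually exclusive")
    if {"u", "o"} <= names:
        raise ValueError("bits o and u are mutually exclusive")
    return names


def describe_response(code):
    """Map a Response byte to the text used in 2.4.3."""
    if code in RESPONSE_CODES:
        return RESPONSE_CODES[code]
    if 0x80 <= code <= 0xFF:
        return "Vendor-Unique Response"
    return "Unassigned"
```

   A receiver would typically call decode_residual_flags() before
   trusting either residual count field, since each count is only
   meaningful when one of its two flags is set.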
   Certain response codes MUST be associated by the target with a Check
   Condition Status as outlined in the next table:

   +--------+------------------------------+---------------------------+
   | Code   | Reason                       | with SCSI CHECK CONDITION |
   +--------+------------------------------+---------------------------+
   |0x01    | Target Failure               | ASC = 0x44 ASCQ = 0x00    |
   +--------+------------------------------+---------------------------+
   |0x02    | Delivery Subsystem Failure   | ASC = 0x47 ASCQ = 0x03    |
   +--------+------------------------------+---------------------------+
   |0x03    | Unsolicited data rejected    | ASC = 0x49 ASCQ = 0x00    |
   +--------+------------------------------+---------------------------+
   |0x04    | Not enough unsolicited       | ASC = 0x49 ASCQ = 0x00    |
   +--------+------------------------------+---------------------------+
   |0x05    | SNACK rejected               | ASC = 0x47 ASCQ = 0x03    |
   +--------+------------------------------+---------------------------+

   As listed above, each defined response code MUST be used (under the
   conditions described in the 'Reason' field), only when the
   corresponding SCSI CHECK CONDITION is signaled, to convey additional
   protocol service information.

   A SCSI CHECK CONDITION with the ASC and ASCQ values as tabulated
   MUST be signaled by iSCSI targets for all the instances in this
   document referring to usage of one of the above defined response
   codes.

   Please note that a response of "Command Completed at Target" may
   also be associated with an error status.

2.4.4 Basic Residual Count

   The Basic Residual Count field is valid only in the case where
   either the U bit or the O bit is set. If neither bit is set, the
   Basic Residual Count field SHOULD be zero. If the U bit is set, the
   Basic Residual Count indicates how many bytes were not transferred
   out of those expected to be transferred.
   If the O bit is set, the Basic Residual Count indicates how many
   bytes could not be transferred because the initiator's Expected Data
   Transfer Length was too small.

2.4.5 Bidi-Read Residual Count

   The Bidi-Read Residual Count field is valid only in the case where
   either the u bit or the o bit is set. If neither bit is set, the
   Bidi-Read Residual Count field SHOULD be zero. If the u bit is set,
   the Bidi-Read Residual Count indicates how many bytes were not
   transferred to the initiator out of those expected to be
   transferred. If the o bit is set, the Bidi-Read Residual Count
   indicates how many bytes could not be transferred to the initiator
   because the initiator's Expected Bidi-Read Transfer Length was too
   small.

2.4.6 Sense and Response Data Segment

   iSCSI targets MUST support and enable autosense. If the Command
   Status was CHECK CONDITION (0x02), then the Data Segment contains
   sense data for the failed command.

   For some iSCSI responses, the response data segment MAY contain some
   response related information, (e.g., for a target failure it may
   contain a vendor specific detailed description of the failure).

   If the Data Segment Length is not 0 the format of the Sense and
   Response Data Segment is:

 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|Sense Length                   | Sense Data                    |
     +---------------+---------------+---------------+---------------+
    x/ Sense Data                                                    /
     +---------------+---------------+---------------+---------------+
    y/ Response Data                                                 /
     +                                                               +
     /                                                               /
     +---------------+---------------+---------------+---------------+
    z|

2.4.7 ExpDataSN

   One past the largest DataSN in an input (read) data PDU the target
   has sent for the command. 0 means no data PDUs were sent.

   This field is reserved if S is 0.

2.4.8 ExpR2TSN

   One past the largest R2TSN the target has sent for the command.
   0 means no R2T PDUs were sent.

   This field is reserved if S is 0.

2.4.9 StatSN - Status Sequence Number

   StatSN is a Sequence Number that the target iSCSI layer generates
   per connection and that in turn enables the initiator to acknowledge
   status reception. StatSN is incremented by 1 for every
   response/status sent on a connection except for responses sent as a
   result of a retry or SNACK. For responses sent because of a
   retransmission request the StatSN used MUST be the same as the first
   time the PDU was sent unless the connection was restarted since
   then.

2.4.10 ExpCmdSN - Next Expected CmdSN from this Initiator

   ExpCmdSN is a Sequence Number that the target iSCSI returns to the
   initiator to acknowledge command reception. It is used to update a
   local register with the same name. An ExpCmdSN equal to MaxCmdSN+1
   indicates that the target cannot accept new commands.

2.4.11 MaxCmdSN - Maximum CmdSN Acceptable from this Initiator

   MaxCmdSN is a Sequence Number that the target iSCSI returns to the
   initiator to indicate the maximum CmdSN the initiator can send. It
   is used to update a local register with the same name. If MaxCmdSN
   is equal to ExpCmdSN-1 that indicates to the initiator that the
   target can't receive any additional commands. When MaxCmdSN changes
   at the target while the target has no pending PDUs to convey this
   information to the initiator it MUST generate a NOP-IN to carry the
   new MaxCmdSN.
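   The relation between ExpCmdSN and MaxCmdSN can be sketched in Python
   as follows. This is a minimal illustration, not a normative
   algorithm: the function names are hypothetical, and the sketch
   assumes that the 32-bit sequence numbers wrap around modulo 2**32.

```python
MOD = 1 << 32  # assumption: sequence numbers are 32-bit and wrap around


def command_window(exp_cmd_sn, max_cmd_sn):
    """Number of new commands the target is currently willing to accept.

    The window is MaxCmdSN - ExpCmdSN + 1, so it is 0 exactly when
    MaxCmdSN equals ExpCmdSN-1 (equivalently ExpCmdSN == MaxCmdSN+1).
    """
    return (max_cmd_sn - exp_cmd_sn + 1) % MOD


def may_send(cmd_sn, exp_cmd_sn, max_cmd_sn):
    """True if cmd_sn lies inside the window [ExpCmdSN, MaxCmdSN]."""
    return (cmd_sn - exp_cmd_sn) % MOD < command_window(exp_cmd_sn, max_cmd_sn)


def update_registers(local, exp_cmd_sn, max_cmd_sn):
    """Copy the values carried by a response into the local registers."""
    local["ExpCmdSN"] = exp_cmd_sn
    local["MaxCmdSN"] = max_cmd_sn
    return local
```

   With ExpCmdSN=5 and MaxCmdSN=9 the window holds five commands (5
   through 9); with ExpCmdSN=5 and MaxCmdSN=4 the window is closed and
   the initiator has to wait for a later response or NOP-IN that
   carries a new MaxCmdSN.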
2.5 Task Management Command

 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|X|I| 0x02      |0| Function    | Reserved (0)                  |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
    8| Logical Unit Number (LUN) or Reserved (0)                     |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Referenced Task Tag or Reserved (0xffffffff)                  |
     +---------------+---------------+---------------+---------------+
   24| CmdSN                                                         |
     +---------------+---------------+---------------+---------------+
   28| ExpStatSN                                                     |
     +---------------+---------------+---------------+---------------+
   32| RefCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   36/ Reserved (0)                                                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48

2.5.1 Function

   The Task Management functions provide an initiator with a way to
   explicitly control the execution of one or more Tasks (SCSI and
   iSCSI tasks). The Task Management functions are summarized as
   follows (for a more detailed description of SCSI task management see
   the [SAM2] document):

      1  Abort Task - aborts the task identified by the Referenced
         Task Tag field.
      2  Abort Task Set - aborts all Tasks issued by this initiator on
         the Logical Unit.
      3  Clear ACA - clears the Auto Contingent Allegiance condition.
      4  Clear Task Set - Aborts all Tasks (from all initiators) for
         the Logical Unit.
      5  Logical Unit Reset
      6  Target Warm Reset
      7  Target Cold Reset

   For all these functions, if executed, the Task Management Response
   MUST be returned using the Initiator Task Tag to identify the
   operation for which it is responding. All those functions apply to
   the referenced tasks regardless of whether they are proper SCSI
   tasks or tagged iSCSI operations. Task management commands must be
   executed as if all the commands having a CmdSN lower than or equal
   to the task management CmdSN have been received by the target (i.e.,
   have to be executed as if received for ordered delivery even when
   marked for immediate delivery). For all the tasks covered by the
   task management response (i.e., with CmdSN not higher than the task
   management command CmdSN), additional responses MUST NOT be
   delivered to the SCSI layer after the task management response.

   For the , the target MUST enter a Unit Attention Condition for all
   other attached initiators to inform them that all pending tasks are
   cancelled.

   The Target Reset function (Warm and Cold) implementation is OPTIONAL
   and when implemented they should act as described below. Target
   Reset MAY also be subject to authorization of the requesting
   initiator. When not implemented or when authorization fails at the
   target, Target Reset functions should end as if the function was
   executed successfully and the response qualifier will detail what
   was executed.

   For the and functions, the target cancels all pending operations and
   are both equivalent to the Hard Reset as specified by SAM-2. The
   target MUST enter a Unit Attention Condition for all attached
   initiators notifying them that the target is being reset.

   In addition, for the the target then MUST terminate all of its TCP
   connections to all initiators (all sessions are terminated).
   However, if the target finds that it cannot send the required
   response or AEN, it MUST continue the reset operation and it SHOULD
   log the condition for later retrieval.
   The logging operation MUST be reported through the target MIB.

   Further actions on reset functions are specified in the relevant
   SCSI documents for the specific class of devices.

2.5.2 Referenced Task Tag

   Initiator Task Tag of the task to be aborted - for abort task

2.5.3 RefCmdSN

   For abort-task the task CmdSN to enable task removal. If RefCmdSN is
   lower than ExpCmdSN or higher than MaxCmdSN the target will ignore
   RefCmdSN.

2.6 Task Management Response

 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|1|1| 0x22      |1| Reserved (0)                                |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
    8| Logical Unit Number (LUN)                                     |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Referenced Task Tag or Reserved (0xffffffff)                  |
     +---------------+---------------+---------------+---------------+
   24| StatSN                                                        |
     +---------------+---------------+---------------+---------------+
   28| ExpCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   32| MaxCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   36| Response      | Qualifier     | Reserved (0)                  |
     +---------------+---------------+---------------+---------------+
   40/ Reserved (0)                                                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48

   For the functions , the target performs the requested Task
   Management function and sends a Task Management Response back to the
   initiator.
   The target provides a Response, which may take on the following
   values:

        0  Function Complete
        1  Task was not in task set
        2  Command not received yet but placeholder marked for task
           abort
        3  LUN does not exist
      255  Function Rejected

   The Qualifier field details the Response. For Function Complete the
   valid Qualifiers are:

      0 - Function Executed
      1 - Function not implemented
      2 - Not Authorized

   For the and functions, the target cancels all pending operations. If
   SCSI control mode enables AE reporting, the SCSI target MUST send an
   Asynchronous Message to all logged-in initiators notifying them that
   the target has been reset. For the the target MUST then close all of
   its TCP connections to all initiators (terminates all sessions).

   The mapping of the response code into a SCSI service response code,
   if needed, is outside the scope of this document.

2.6.1 Referenced Task Tag

   Initiator Task Tag of the task that was not found; used in
   conjunction with Response value 1. It MUST be set to 0xffffffff in
   other cases.

2.7 SCSI Data-out & SCSI Data-in

   The typical data transfer specifies the length of the data payload,
   the Target Transfer Tag provided by the receiver for this data
   transfer, and a buffer offset.
   The typical SCSI Data PDU for WRITE (from initiator to target) has
   the following format:

 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|0|0| 0x05      |F| Reserved (0)                                |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)  | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| LUN or Reserved (0)                                           |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Target Transfer Tag or (0xffffffff)                           |
     +---------------+---------------+---------------+---------------+
   24| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   28| ExpStatSN                                                     |
     +---------------+---------------+---------------+---------------+
   32| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   36| DataSN                                                        |
     +---------------+---------------+---------------+---------------+
   40| Buffer Offset                                                 |
     +---------------+---------------+---------------+---------------+
   44| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   48| Digests if any...                                             |
     +---------------+---------------+---------------+---------------+
     / DataSegment                                                   /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
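   As a rough illustration, the 48-byte header drawn above for the
   WRITE direction could be assembled with Python's struct module as
   shown below. The function name and argument names are hypothetical.
   The sketch assumes network (big-endian) byte order for multi-byte
   fields and takes the F bit to be the most significant bit of byte 1,
   as drawn in the diagram.

```python
import struct

DATA_OUT_OPCODE = 0x05          # byte 0 as drawn for SCSI Data-out
TTT_NOT_SUPPLIED = 0xFFFFFFFF   # reserved Target Transfer Tag value

# Assumed layout: byte 0, byte 1, 2 reserved bytes, 1 reserved byte plus
# 3-byte DataSegmentLength (packed as one 32-bit word), 8-byte LUN,
# then eight 32-bit words covering offsets 16 through 47.
_HEADER = struct.Struct(">BBHIQIIIIIIII")


def pack_data_out_header(final, data_len, lun, itt, ttt,
                         exp_stat_sn, data_sn, buffer_offset):
    """Build the 48-byte Data-out header (digests and data not included)."""
    if not 0 <= data_len < (1 << 24):
        raise ValueError("DataSegmentLength must fit in 3 bytes")
    byte1 = 0x80 if final else 0x00   # F bit, remaining bits reserved (0)
    return _HEADER.pack(
        DATA_OUT_OPCODE, byte1, 0,
        data_len,        # top byte stays 0: the reserved byte at offset 4
        lun,             # offset 8
        itt,             # offset 16: Initiator Task Tag
        ttt,             # offset 20: Target Transfer Tag
        0,               # offset 24: reserved
        exp_stat_sn,     # offset 28
        0,               # offset 32: reserved
        data_sn,         # offset 36
        buffer_offset,   # offset 40
        0,               # offset 44: reserved
    )
```

   For unsolicited data the caller would pass TTT_NOT_SUPPLIED as the
   Target Transfer Tag and 0 as the LUN, since the LUN field is
   reserved when no Target Transfer Tag is provided.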
   The typical SCSI Data packet for READ (from target to initiator) has
   the following format:

 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|1|1| 0x25      |F| (0)   |O|U|S| Reserved (0)  |Status or Rsvd |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)  | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| Reserved (0)                                                  |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   24| StatSN or Reserved (0)                                        |
     +---------------+---------------+---------------+---------------+
   28| ExpCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   32| MaxCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   36| DataSN                                                        |
     +---------------+---------------+---------------+---------------+
   40| Buffer Offset                                                 |
     +---------------+---------------+---------------+---------------+
   44| Residual Count                                                |
     +---------------+---------------+---------------+---------------+
   48| Digests if any...                                             |
     +---------------+---------------+---------------+---------------+
     / DataSegment                                                   /
    +/                                                               /
     +---------------+---------------+---------------+---------------+

   Status can accompany the last Data-in PDU if the command did not end
   with an exception. Presence of status (and of a residual count) is
   signaled through the S flag bit.

2.7.1 F (Final) Bit

   For outgoing data, this bit is 1 for the last PDU of unsolicited
   data or the last PDU of a sequence answering a R2T.

   For incoming data, this bit is 1 for the last input (read) data PDU
   of a sequence.
2.7.2 Target Transfer Tag

   On outgoing data, the Target Transfer Tag is provided to the target
   if the transfer is honoring a R2T. In this case, the Target Transfer
   Tag field is a replica of the Target Transfer Tag provided with the
   R2T.

   The Target Transfer Tag values are not specified by this protocol
   except that the all-bits-one value (0xffffffff) is reserved and
   means that the Target Transfer Tag is not supplied. If the Target
   Transfer Tag is provided then the LUN field MUST hold a valid value
   and be consistent with whatever was specified with the command,
   otherwise the LUN field is reserved.

2.7.3 StatSN

   This field MUST be set only if the S bit is set to 1.

2.7.4 DataSN

   For input (read) data PDUs, the DataSN is the data PDU number
   (starting with 0) within the data transfer for the command
   identified by the Initiator Task Tag.

   For output (write) data PDUs, the DataSN is the data PDU number
   (starting with 0) within the current output sequence. The current
   output sequence is identified by the Initiator Task Tag (for
   unsolicited data) or is a data sequence generated for one R2T (for
   data solicited through R2T).

   Any input or output data sequence MUST contain fewer than 2**32-1
   numbered PDUs.

2.7.5 Buffer Offset

   The Buffer Offset field contains the offset of this PDU payload data
   within the complete data transfer. The sum of the buffer offset and
   length should not exceed the expected transfer length for the
   command.

   The order of data PDUs within a sequence is determined by the
   DataDeliveryOrder (when set to yes it means that PDUs have to be in
   increasing Buffer Offset order and overlays are forbidden).

   Data ordering between sequences is determined by EMDP (DataOrder)
   (EMDP=0 means that sequence ordering is mandatory).
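   The Buffer Offset rules of 2.7.5 lend themselves to a simple
   receiver-side check. The following Python sketch is illustrative
   only; the function name, the (offset, length) tuple representation
   and the returned messages are assumptions made for this example.

```python
def check_data_sequence(pdus, expected_length, in_order=True):
    """Check one data sequence against the Buffer Offset rules.

    pdus is a list of (buffer_offset, data_segment_length) tuples in
    arrival order. Returns a list of human-readable problems; an empty
    list means no rule in this sketch was violated.
    """
    problems = []
    highest_end = 0
    for index, (offset, length) in enumerate(pdus):
        end = offset + length
        # Offset plus length should not exceed the expected transfer length.
        if end > expected_length:
            problems.append(
                "PDU %d ends at %d, beyond expected length %d"
                % (index, end, expected_length))
        # In-order delivery: increasing offsets and no overlays, so each
        # PDU has to start at or after the end of all earlier ones.
        if in_order and index > 0 and offset < highest_end:
            problems.append(
                "PDU %d at offset %d overlaps or precedes earlier data"
                % (index, offset))
        highest_end = max(highest_end, end)
    return problems
```

   For example, two 512-byte PDUs at offsets 0 and 512 pass for an
   expected length of 1024, while a second PDU at offset 256 would be
   reported as an overlay when in-order delivery is required.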
2.7.6 DataSegmentLength

   This is the data payload length of a SCSI Data-In or SCSI Data-Out
   PDU; sending of 0 length data segments should be avoided, but
   initiators and targets must be able to properly receive 0 length
   data segments.

2.7.7 Flags

   The last SCSI Data packet sent from a target to an initiator for a
   particular SCSI command that completed successfully may also
   optionally contain the Command Status for the data transfer. In this
   case, Sense Data cannot be sent together with the Command Status. If
   the command is completed with an error, then the response and sense
   data MUST be sent in a SCSI Response PDU (i.e., MUST NOT be sent in
   a SCSI Data packet). For Bi-directional commands, the status MUST be
   sent in a SCSI Response PDU.

   b0     S (status) - set to indicate that the Command Status field
          contains status. If this bit is set to 1 the F bit MUST also
          be set to 1
   b1-2   as in a SCSI Response
   b3-6   not used (should be set to 0)

   The fields StatSN, Command Status, Residual Count have meaningful
   content only if the S bit is set to 1.

2.8 Text Command

   The Text Command is provided to allow the exchange of information
   and for future extensions. It permits the initiator to inform a
   target of its capabilities or to request some special operations.
 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|X|I| 0x04      |F|B| Reserved (0)                              |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)  | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| Reserved (0)                                                  |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   24| CmdSN                                                         |
     +---------------+---------------+---------------+---------------+
   28| ExpStatSN                                                     |
     +---------------+---------------+---------------+---------------+
   32| Reserved (0)                                                  |
     |                                                               |
     +---------------+---------------+---------------+---------------+
   40| Bookmark or Reserved (0)                                      |
     |                                                               |
     +---------------+---------------+---------------+---------------+
   48| Digests if any...                                             |
     +---------------+---------------+---------------+---------------+
     / DataSegment (Text)                                            /
    +/                                                               /
     +---------------+---------------+---------------+---------------+

2.8.1 F (Final) Bit

   When set to 1 it indicates that this is the last or only text
   command in a sequence of commands; otherwise it indicates that more
   commands will follow.

2.8.2 B (Bookmark-valid) Bit

   A value of 1 indicates that the Bookmark field is valid.

2.8.3 Initiator Task Tag

   The initiator assigned identifier for this Text Command. If the
   command is sent as part of a sequence of commands (e.g., the Login
   Phase or a sequence of Text commands) the Initiator Task Tag MUST be
   the same for all the commands within the sequence (similar to linked
   SCSI commands).

2.8.4 Bookmark

   An opaque handle copied from a previous text response.
   It is supposed to allow a target to transfer a large amount of
   textual data over a sequence of text-command/text-response
   exchanges. The target associates the bookmark it issues with the
   Initiator Task Tag and a received Bookmark is considered valid by
   the Target only if received with the same Initiator Task Tag and if
   the target did not receive on the same connection a text command
   with a different Initiator Task Tag since it issued the Bookmark. A
   target MAY reject an old Bookmark.

   The Bookmark field is valid only if the B bit is 1.

   Long text responses are handled as in the following example:

      I->T Text SendTargets=all (F=1,B=0)
      T->I Text (F=0,B=1,Bookmark)
      I->T Text (F=1,B=1,Bookmark)
      T->I Text (F=0,B=1,Bookmark)
      I->T Text (F=1,B=1,Bookmark)
      ...
      T->I Text (F=1,B=0)

2.8.5 Text

   The initiator sends the target a set of key=value or key=list pairs
   encoded in UTF-8 Unicode. All the text keys and text values
   specified in this document are to be presented and interpreted in
   the case they appear in this document (they are case sensitive). The
   key and value are separated by a '=' (0x3D) delimiter. Every
   key=value pair (including the last or only pair) MUST be followed by
   a null (0x00) delimiter. A list is a set of values separated by
   commas (0x2C). Large binary items can be encoded using their
   hexadecimal representation (e.g., 8190 is 0x1FFE) or decimal
   representation. The maximum length of an individual value (not its
   string representation) is 255 bytes.

   The data lengths of a text command or response MUST NOT exceed 4096
   bytes. Key names MUST NOT exceed 63 bytes. Key values MUST NOT
   exceed 255 characters.

   Character strings are represented as plain text. Numeric and binary
   values are represented using either decimal numbers or the
   hexadecimal 0xFFFF notation. Upper and lower case letters may be
   used interchangeably in hexadecimal notation (i.e., 0x1aBc, 0x1AbC
   and 0x1ABC are equivalent).
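   The encoding rules above can be illustrated with a short Python
   sketch. The function names are hypothetical. As a simplifying
   assumption, the 255-character limit is applied here to the whole
   value string of a pair, including a comma-separated list.

```python
MAX_KEY_BYTES = 63
MAX_VALUE_CHARS = 255
MAX_SEGMENT_BYTES = 4096


def encode_text(pairs):
    """Encode (key, value) pairs as a text data segment.

    A value may be a string or a list of strings; a list is joined with
    commas. Every pair, including the last, is followed by a null byte.
    """
    out = bytearray()
    for key, value in pairs:
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        key_bytes = key.encode("utf-8")
        if len(key_bytes) > MAX_KEY_BYTES:
            raise ValueError("key name exceeds 63 bytes: %r" % key)
        if len(value) > MAX_VALUE_CHARS:
            raise ValueError("value exceeds 255 characters for key %r" % key)
        out += key_bytes + b"=" + value.encode("utf-8") + b"\x00"
    if len(out) > MAX_SEGMENT_BYTES:
        raise ValueError("text data segment exceeds 4096 bytes")
    return bytes(out)


def decode_text(segment):
    """Split a text data segment back into (key, value) string pairs."""
    pairs = []
    for item in segment.split(b"\x00"):
        if not item:
            continue  # skip the empty piece after the final null
        key, sep, value = item.partition(b"=")
        if not sep:
            raise ValueError("missing '=' delimiter in %r" % item)
        pairs.append((key.decode("utf-8"), value.decode("utf-8")))
    return pairs


def parse_number(text):
    """Parse a decimal or hexadecimal (0x...) numeric value."""
    if text[:2].lower() == "0x":
        return int(text, 16)
    return int(text, 10)
```

   Because hexadecimal digits are parsed case-insensitively,
   parse_number() treats 0x1aBc, 0x1AbC and 0x1ABC as the same value.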
   The target responds by sending its response back to the initiator.
   The response text format is similar to the request text format.

   Some basic key=value pairs are described in Appendix A and D. All of
   these keys, except for the X- extension format, MUST be supported by
   iSCSI initiators and targets.

   Manufacturers may introduce new keys by prefixing them with X-
   followed by their (reversed) domain name, for example the company
   owning the domain acme.com can issue:

      X-com.acme.bar.foo.do_something=0000000000000003

   Any other key not understood by the target may be ignored by the
   target without affecting basic function. However the Text Response
   for a key that was not understood MUST be key=NotUnderstood.

   Text operations are usually meant for parameter setting/negotiations
   but can be used also to perform some active operations.

   It is recommended that Text operations that will take a long time
   should be placed in their own Text command.

   A connection may have only one outstanding text command at any given
   time.

2.9 Text Response

   The Text Response PDU contains the target's responses to the
   initiator's Text Command. The format of the Text field matches that
   of the Text Command.
 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|1|1| 0x24      |F|B| Reserved (0)                              |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)  | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| Reserved (0)                                                  |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   24| StatSN                                                        |
     +---------------+---------------+---------------+---------------+
   28| ExpCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   32| MaxCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   36| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   40| Bookmark or Reserved (0)                                      |
     |                                                               |
     +---------------+---------------+---------------+---------------+
   48| Digests if any...                                             |
     +---------------+---------------+---------------+---------------+
     / DataSegment (Text)                                            /
    +/                                                               /
     +---------------+---------------+---------------+---------------+

2.9.1 F (Final) Bit

   When set to 1 in response to a text command with the Final bit set
   to 1, the F bit indicates that the target has finished its
   operation. Otherwise, if set to 0 in response to a text command with
   the Final bit set to 1, it indicates that the target has more work
   to do (invites a follow-on text command). A text response with the F
   bit set to 1 in response to a text command with the F bit set to 0
   is a protocol error.

   A text response with an F bit set to 1 MUST NOT contain key=value
   pairs that may require additional answers from the initiator.

2.9.2 B (Bookmark-valid) Bit

   A value of 1 indicates that the Bookmark field is valid. F bit must
   be 0.
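   The F and B bit rules for a Text Response can be summarized in a
   small Python sketch. The function names and the message strings are
   hypothetical and serve only to restate the rules of 2.9.1 and 2.9.2
   in executable form.

```python
def check_text_response(cmd_final, rsp_final, rsp_bookmark_valid):
    """Return the list of rule violations for one command/response pair.

    cmd_final is the F bit of the Text Command; rsp_final and
    rsp_bookmark_valid are the F and B bits of the Text Response.
    """
    errors = []
    if rsp_final and not cmd_final:
        errors.append("F=1 response to an F=0 command is a protocol error")
    if rsp_bookmark_valid and rsp_final:
        errors.append("B=1 requires the F bit to be 0")
    return errors


def follow_on_command_invited(cmd_final, rsp_final):
    """True when the target signals that it has more work to do.

    This is the case of an F=0 response to an F=1 command, which
    invites a follow-on text command from the initiator.
    """
    return cmd_final and not rsp_final
```

   In the long text response exchange, every intermediate response has
   F=0 and B=1, so follow_on_command_invited() stays true until the
   target sends the last response with F=1 and B=0.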
2.9.3 Initiator Task Tag

   The Initiator Task Tag matches the tag used in the initial Text
   Command or the Login Initiator Task Tag.

2.9.4 Bookmark

   An opaque handle to be copied to the next text command by the
   initiator. It is supposed to allow a target to transfer a large
   amount of textual data over a sequence of text-command/text-response
   exchanges. The target associates the bookmark it issues with the
   Initiator Task Tag and a received Bookmark is considered valid by
   the Target only if received with the same Initiator Task Tag and if
   the target did not receive on the same connection a text command
   with a different Initiator Task Tag since it issued the Bookmark. A
   target MAY reject an old Bookmark.

   The Bookmark is valid only if the F bit is 0 and the B bit is 1.

2.9.5 Text Response Data

   The Text Response Data Segment contains responses in the same
   key=value format as the Text Command and with the same length and
   coding constraints. Appendix C lists some basic Text Commands and
   their Responses.

   Text response key=value pairs should be delivered in the same order
   as the command key=value pairs whenever applicable.

   Although the initiator is the requesting party and controls the
   request-response initiation and termination, the target can offer
   key=value pairs of its own as part of a sequence and not only in
   response to an identical key=value pair offered by the initiator.

2.10 Login Command

   After establishing a TCP connection between an initiator and a
   target, the initiator MUST issue a Login Command to gain further
   access to the target's resources. A Login Command MUST NOT be issued
   more than once on an iSCSI TCP connection.
 Byte /      0       |       1       |       2       |       3       |
     /               |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|X|0| 0x03      |F| Reserved (0)| Version-max   | Version-min   |
     +---------------+---------------+---------------+---------------+
    4| Reserved (0)  | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| CID                           | Reserved (0)                  |
     +---------------+---------------+---------------+---------------+
   12| ISID                          |TSID                           |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   24| CmdSN or Reserved (0)                                         |
     +---------------+---------------+---------------+---------------+
   28| ExpStatSN or Reserved (0)                                     |
     +---------------+---------------+---------------+---------------+
   32/ Reserved (0)                                                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48/ DataSegment - Login Parameters in Text Command Format         /
    +/                                                               /
     +---------------+---------------+---------------+---------------+

2.10.1 X - Restart

   If this bit is set to 1 then this command is an attempt to reinstate
   a failed connection. CID does not change and this command performs
   first the logout function of the old connection if an explicit
   logout was not performed earlier. In sessions with a single
   connection, this may imply the opening of a second connection with
   the sole purpose of cleaning-up the first. Targets should support
   opening a second connection even when not supporting multiple
   connections in full feature phase. A restart login indicates to the
   target that commands may be missing and therefore it should be
   handled immediately.

2.10.2 F (Final) Bit

   If set to 1 indicates that the initiator has no more parameters to
   set.

2.10.3 Version-max

   Maximum Version number supported.
2.10.4 Version-min

   Minimum Version supported.

   The version number of the current draft is 0x2.

2.10.5 CID

   This is a unique ID for this connection within the session.

2.10.6 ISID

   This is an initiator-defined session-identifier. It MUST be the same
   for all connections within a session. An initiator is uniquely
   identified by the value pair (InitiatorName, ISID).

   When a target detects an attempt to open a new session by the same
   initiator (same InitiatorName and ISID) it MUST check if the old
   session is active. If it is not, the old session must be reset by
   the target and the new session established. Otherwise the Login MUST
   be terminated with a Login Response

2.10.7 CmdSN

   CmdSN is either the initial command sequence number of a session
   (for the first Login of a session - the "leading" login) or the
   command sequence number in the command stream (e.g., if the leading
   login carries the CmdSN 123 the next command carries the number 124
   etc.).

2.10.8 ExpStatSN

   This is ExpStatSN for the old connection.

   This field is valid only if the X bit is set to 1.

2.10.9 Login Parameters

   The initiator MAY provide some basic parameters in order to enable
   the target to determine if the initiator may use the target's
   resources and the initial text parameters for the security exchange.

   The format of the parameters is as specified for the Text Command.
   Keys and their explanations are listed in the Appendixes.

2.11 Login Response

   The Login Response indicates the progress and/or end of the login
   phase. Note that after security is established, the login response
   is authenticated.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x23      |F| Reserved (0)| Version-max   | Version-active|
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)  | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
12| ISID                          | TSID                          |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
24| StatSN                                                        |
  +---------------+---------------+---------------+---------------+
28| ExpCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
32| MaxCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
36| Status-Class  | Status-Detail | Reserved (0)                  |
  +---------------+---------------+---------------+---------------+
40/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
  / DataSegment - Login Parameters in Text Command Format         /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

2.11.1 F (Final) bit

   The Final bit is set to 1 in the final Login Response. A Final bit
   of 0 indicates a "partial" response, which means "more negotiation
   needed".

   A login response with the F bit set to 1 MUST NOT contain key=value
   pairs that may require additional answers from the initiator.

2.11.2 Version-max

   This is the highest version number supported by the target.

2.11.3 Version-active/lowest

   Indicates the version supported (the highest version supported by
   the target and initiator).
   If the target does not support a version within the range specified
   by the initiator, the target rejects the login and this field
   indicates the lowest version supported by the target.

2.11.4 TSID

   The TSID is an initiator-identifying tag set by the target. It MUST
   be valid only if the login is accepted and the F bit is 1.

2.11.5 StatSN

   For the first Login Response this is the starting status sequence
   number for the connection (the next response of any kind will carry
   this number + 1). This field is valid only if the Status-Class is 0.

2.11.6 Status-Class and Status-Detail

   The Status returned in a Login Response indicates the status of the
   login request. The status includes:

      Status-Class
      Status-Detail

   The Status-Class is sufficient for a simple initiator to use when
   handling errors, without having to look at the Status-Detail. The
   Status-Detail allows finer-grained error recovery for more
   sophisticated initiators, as well as better information for error
   logging.

   The status classes are as follows:

      0 - Success - indicates that the iSCSI target successfully
          received, understood, and accepted the request. The
          numbering fields (StatSN, ExpCmdSN, MaxCmdSN) are valid only
          if Status-Class is 0.

      1 - Redirection - indicates that further action must be taken
          by the initiator to complete the request. This is usually
          due to the target moving to a different address. All of the
          redirection status class responses MUST return one or more
          text key parameters of the type "TargetAddress", which
          indicates the target's new address.

      2 - Initiator Error - indicates that the initiator likely caused
          the error. This MAY be due to a request for a resource for
          which the initiator does not have permission.

      3 - Target Error - indicates that the target is incapable of
          fulfilling the request.

   The table below shows all of the currently allocated status codes.
   The codes are in hexadecimal; the first byte is the status class
   and the second byte is the status detail. The allowable state of
   the Final (F) bit in responses with each of the codes is also
   displayed.

-----------------------------------------------------------------
Status        | Code | F   | Description
              |(hex) | bit |
-----------------------------------------------------------------
Accept Login  | 0000 | 1/0 | Login is OK, moving to Full Feature
              |      |     | Phase (F=1) or Operational Parameter
              |      |     | Negotiation (F=0).
-----------------------------------------------------------------
Authenticate  | 0001 | 0   | The iSCSI TargetName (ITN) exists and
              |      |     | authentication proceeds.
-----------------------------------------------------------------
iSCSI Target  | 0002 | 0   | The ITN must be provided
Name required |      |     | for authentication to proceed.
-----------------------------------------------------------------
Target Moved  | 0101 | 1   | The requested ITN has moved
Temporarily   |      |     | temporarily to the address provided.
-----------------------------------------------------------------
Target Moved  | 0102 | 1   | The requested ITN has moved
Permanently   |      |     | permanently to the address provided.
-----------------------------------------------------------------
Proxy Required| 0103 | 1   | The initiator must use an iSCSI
              |      |     | proxy for this target.
              |      |     | The address is provided.
-----------------------------------------------------------------
Initiator     | 0200 | 1   | Miscellaneous iSCSI initiator
Error         |      |     | errors
-----------------------------------------------------------------
Security Nego-| 0201 | 1   | The security negotiation failed
tiation Failed|      |     |
-----------------------------------------------------------------
Forbidden     | 0202 | 1   | The initiator is not allowed access
Target        |      |     | to the given target.
-----------------------------------------------------------------
Not Found     | 0203 | 1   | The requested ITN does not
              |      |     | exist at this address.
-----------------------------------------------------------------
Target Removed| 0204 | 1   | The requested ITN has been
              |      |     | removed. No forwarding address is
              |      |     | provided.
-----------------------------------------------------------------
Target        | 0205 | 1   | Target is currently in use by
Conflict      |      |     | another initiator and does
              |      |     | not support multiple initiators.
-----------------------------------------------------------------
Initiator     | 0206 | 1   | Invalid TSID - no more connections
SID error     |      |     | accepted
-----------------------------------------------------------------
Missing       | 0207 | 1   | Missing parameters (e.g., iSCSI
parameter     |      |     | Initiator and/or Target Name)
-----------------------------------------------------------------
Can't include | 0208 | 1   | Target does not support session
in session    |      |     | spanning to this connection (address)
-----------------------------------------------------------------
Session open  | 0209 | 1   | The iSCSI InitiatorName and ISID
already with  |      |     | identify an existing session
this Initiator|      |     | with this initiator
-----------------------------------------------------------------
Session type  | 020a | 1   | Target does not support this type
Not supported |      |     | of session (not from this Initiator)
-----------------------------------------------------------------
Target Error  | 0300 | 1   | Miscellaneous iSCSI target
              |      |     | errors (out of resources, etc.).
-----------------------------------------------------------------
Service       | 0301 | 1   | The iSCSI service or target is not
Unavailable   |      |     | currently operational.
-----------------------------------------------------------------
Unsupported   | 0302 | 1   | The required version is not
version       |      |     | supported by the target.
-----------------------------------------------------------------

   If the Status is "accept login" (0x0000) and the F bit is 1, the
   initiator may proceed to issue SCSI commands.

   If the Status is "accept login" (0x0000) and the F bit is 0, the
   initiator may proceed to negotiate operational parameters.

   The target MUST NOT set the Status to 0x0000 and the F bit to 1 if
   the Login Command had the F bit set to 0.

   If the Status-Class is not 0, the initiator and target MUST close
   the TCP connection.

   If the target wishes to reject the login request for more than one
   reason, it should return the primary reason for the rejection.

2.12 NOP-Out

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|X|I| 0x00      |P| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)  | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| LUN or Reserved (0)                                           |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag or Reserved (0xffffffff)                   |
  +---------------+---------------+---------------+---------------+
20| Target Transfer Tag or Reserved (0xffffffff)                  |
  +---------------+---------------+---------------+---------------+
24| CmdSN                                                         |
  +---------------+---------------+---------------+---------------+
28| ExpStatSN                                                     |
  +---------------+---------------+---------------+---------------+
32/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
  / DataSegment - Ping Data (optional)                            /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

   The NOP-Out with the P bit set acts as a "ping command".
   This form of the NOP-Out can be used to verify that a connection is
   still active and all its components are operational. This command
   MAY use in-order delivery or immediate delivery. The NOP-Out may be
   useful in the case where an initiator has been waiting a long time
   for the response to some command, and the initiator suspects that
   there is some problem with the connection. When a target receives
   the NOP-Out with the P (Ping) bit set, it should respond with a
   Ping Response (a NOP-In), duplicating the data that was provided in
   the NOP-Out as much as possible. If the initiator does not receive
   the NOP-In within some time (determined by the initiator), or if
   the data returned by the NOP-In is different from the data that was
   in the NOP-Out, the initiator may conclude that there is a problem
   with the connection. The initiator then closes the connection and
   may try to establish a new connection.

   A NOP-Out should also be used to confirm a changed ExpStatSN if
   there is no other PDU to carry it for a long time.

   The NOP-Out can be sent by an initiator because of a NOP-In with
   the poll bit set. In this case the Target Transfer Tag is copied
   from the NOP-In, the P bit MUST be 0, and the I bit must be 1.

2.12.1 P (Ping) Bit

   Request a NOP-In.

2.12.2 Initiator Task Tag

   An initiator-assigned identifier for the operation.

   The NOP-Out MUST have the Initiator Task Tag set only if the P bit
   is 1.

2.12.3 Target Transfer Tag

   A target-assigned identifier for the operation.

   The NOP-Out MUST have the Target Transfer Tag set only if it is
   issued in response to a NOP-In with the P bit set to 1, in which
   case it copies the Target Transfer Tag from the NOP-In PDU.

   When the Target Transfer Tag is set, the LUN field is also copied
   from the NOP-In.

2.12.4 Ping Data

   Ping data is reflected in the Ping Response. Note that the length
   of the reflected data is limited to 4096 bytes and the initiator
   should avoid sending more than 4096 bytes.
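
   The ping form of the NOP-Out described above can be sketched in
   Python as follows. This is only an illustration under stated
   assumptions: fields are taken to be big-endian (network byte
   order), the I bit is taken to be bit 6 of byte 0 as drawn in the
   layout, and digests and any data padding are left out. The helper
   names build_nop_out_ping and ping_reply_ok are hypothetical and do
   not come from this document.

```python
import struct

RESERVED_TAG = 0xFFFFFFFF   # reserved tag value shown in the NOP-Out layout
MAX_PING_DATA = 4096        # reflected ping data is limited to 4096 bytes


def build_nop_out_ping(itt, cmd_sn, exp_stat_sn, ping_data=b"", immediate=False):
    """Return a 48-byte NOP-Out header with P=1, followed by the ping data.

    Assumptions (not stated in this section): big-endian fields,
    no digests, no padding of the data segment.
    """
    if len(ping_data) > MAX_PING_DATA:
        raise ValueError("ping data should not exceed 4096 bytes")
    byte0 = (0x40 if immediate else 0x00) | 0x00  # X=0, optional I bit, opcode 0x00
    byte1 = 0x80                                  # P bit set, remaining bits reserved
    header = struct.pack(
        ">BB2xx3s8sIIII16x",
        byte0,
        byte1,
        len(ping_data).to_bytes(3, "big"),  # DataSegmentLength, bytes 5-7
        bytes(8),                           # LUN field left reserved (0)
        itt,                                # Initiator Task Tag, set because P=1
        RESERVED_TAG,                       # Target Transfer Tag not set
        cmd_sn,
        exp_stat_sn,
    )
    return header + ping_data


def ping_reply_ok(sent_data, returned_data):
    """Compare NOP-In ping data with what was sent (up to 4096 bytes)."""
    return returned_data == sent_data[:MAX_PING_DATA]
```

   An initiator following this sketch would start a timer after
   sending the PDU and treat either a timeout or a False result from
   ping_reply_ok as a sign of a problem with the connection.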

2.13 NOP-In

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x20      |P| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)  | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| LUN or Reserved (0)                                           |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag or Reserved (0xffffffff)                   |
  +---------------+---------------+---------------+---------------+
20| Target Transfer Tag or Reserved (0xffffffff)                  |
  +---------------+---------------+---------------+---------------+
24| StatSN                                                        |
  +---------------+---------------+---------------+---------------+
28| ExpCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
32| MaxCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
36/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
  / DataSegment - Return Ping Data                                /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

   When a target receives the NOP-Out with the P bit set, it MUST
   respond with a NOP-In with the same Initiator Task Tag that was
   provided in the NOP-Out Command. It SHOULD also duplicate up to the
   first 4096 bytes of the initiator-provided Ping Data. For such a
   response, the P bit MUST be 0.

2.13.1 P bit

   A target may issue a NOP-In on its own to test the connection and
   the state of the initiator. If the target wants to test the
   initiator, it sets the P bit to 1 in order to ask for an answer
   from the initiator.
   In this case the Initiator Task Tag MUST be 0xffffffff and the
   Target Transfer Tag MUST be set to a valid value (not 0xffffffff).
   The LUN field MUST also contain a valid LUN. If the target wants
   only to test the connection, the P bit is set to 0 and both tags
   MUST hold the reserved value 0xffffffff.

   A target may also issue a NOP-In on its own to carry a changed
   ExpCmdSN and/or MaxCmdSN if there is no other PDU to carry them for
   a long time.

   Whenever the NOP-In is not issued in response to a NOP-Out, the
   StatSN field will contain, as usual, the next StatSN, but StatSN
   for this connection is not advanced.

2.13.2 Target Transfer Tag

   A target-assigned identifier for the operation.

2.13.3 LUN

   A LUN MUST be set to a correct value when the P bit is set to 1 and
   the Target Transfer Tag is set.

2.14 Logout Command

   The Logout command is used to perform a controlled closing of a
   connection.

   An initiator MAY use a logout command to remove a connection from a
   session or to close an entire session.

   After sending the Logout PDU, an initiator MUST NOT send any new
   iSCSI commands on the closing connection except SNACK and task
   management commands required for recovery.

   After receiving the Logout command, the target completes all
   pending commands (device activity, data to/from the initiator, R2T
   and status transfers) that it deems fit to conclude, and then
   issues the Logout response and half-closes the TCP connection
   (sends FIN). After receiving the Logout response and the FIN, the
   initiator MUST completely close the logging-out connection.

   Note that a Logout for a CID may be performed on a different
   transport connection when the TCP connection for the CID had
   already been terminated. In such a case, only a logical "closing"
   of the iSCSI connection for the CID is implied with a Logout.

   All commands that were not completed (with status) when the
   connection is closed completely can be restarted on a new
   connection if the target supports in-session command recovery.

   All the commands that were completed but whose status was not
   acknowledged when the connection is closed completely are subject
   to command replay if the target supports command replay.

   If a closed connection has status that was unacknowledged that
   status is either associated to a new connection by a login or will
   be cleared after

   If an initiator intends to start recovery for a failing connection,
   it MUST use either the Logout command to "clean up" the target end
   of a failing connection and enable recovery to start, or use the
   restart option of the Login command for the same effect. In
   sessions with a single connection, this may imply the opening of a
   second connection with the sole purpose of cleaning up the first.
   In this case, the restart option of the Login should be used.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|0|I| 0x06      |1| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
 8| CID or Reserved               | Reserved (0)  | Reason Code   |
  +---------------+---------------+---------------+---------------+
12| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
24| CmdSN                                                         |
  +---------------+---------------+---------------+---------------+
28| ExpStatSN or (0)                                              |
  +---------------+---------------+---------------+---------------+
32/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48

2.14.1 CID

   This is the connection ID of the connection to be closed (including
   closing the TCP stream). This field is valid only if the reason
   code is not "close session".

2.14.2 ExpStatSN

   This is the last ExpStatSN value for the connection to be closed.

2.14.3 Reason Code

   Indicates the reason for the Logout:

      0 - closes the session - the session is closed - all commands
          associated with the session (if any) are aborted

      1 - closes the connection - the connection is closed - all
          commands associated with the connection (if any) are aborted

      2 - removes the connection for recovery - the connection is
          closed and all commands associated with it (if any) are to
          be prepared for a new allegiance

      3 - removes the connection at the target's request (requested
          through an Asynchronous Message) - will result in a logout
          only if the target issued the message

2.15 Logout Response

   The logout response is used by the target to indicate that the
   cleanup operation for the connection has completed.

   After Logout, the TCP connection referred to by the CID MUST be
   closed at both ends (or all connections must be closed if the
   logout reason was session close).

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x26      |1| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
24| StatSN                                                        |
  +---------------+---------------+---------------+---------------+
28| ExpCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
32| MaxCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
36| Response      | Reserved (0)                                  |
  +---------------+---------------+---------------+---------------+
40| Parameter2 or Reserved (0)    | Parameter3 or Reserved (0)    |
  +---------------+---------------+---------------+---------------+
44| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
48

2.15.1 Response

   Logout response:

      0 - connection or session closed successfully
      1 - CID not found
      2 - cleanup failed for various reasons

2.15.2 Parameter2

   Minimum time to wait before Login for a new session on this target,
   in seconds.

2.15.3 Parameter3

   Maximum time to wait for a Login that associates unacknowledged
   status to a new connection. After this time the status is discarded
   as acknowledged by hiatus.
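
   As an illustration of the Logout Response layout above, the
   following Python sketch unpacks the 48-byte header into named
   fields. It assumes big-endian (network byte order) fields and that
   the opcode occupies the low six bits of byte 0, as the diagram
   suggests. The function name parse_logout_response and the
   dictionary keys are hypothetical names chosen for this example.

```python
import struct

# Byte offsets follow the Logout Response diagram; "x" marks reserved bytes.
LOGOUT_RESPONSE_FORMAT = ">BB14xI4xIIIB3xHH4x"

LOGOUT_RESPONSE_TEXT = {
    0: "connection or session closed successfully",
    1: "CID not found",
    2: "cleanup failed for various reasons",
}


def parse_logout_response(header):
    """Unpack a Logout Response header (assumed big-endian, no digests)."""
    if len(header) < struct.calcsize(LOGOUT_RESPONSE_FORMAT):
        raise ValueError("Logout Response header must be 48 bytes")
    (byte0, _byte1, itt, stat_sn, exp_cmd_sn, max_cmd_sn,
     response, parameter2, parameter3) = struct.unpack_from(
        LOGOUT_RESPONSE_FORMAT, header)
    if byte0 & 0x3F != 0x26:
        raise ValueError("not a Logout Response (opcode 0x26 expected)")
    return {
        "itt": itt,
        "stat_sn": stat_sn,
        "exp_cmd_sn": exp_cmd_sn,
        "max_cmd_sn": max_cmd_sn,
        "response": response,
        "response_text": LOGOUT_RESPONSE_TEXT.get(response, "unknown"),
        "min_wait_before_login": parameter2,  # Parameter2, in seconds
        "max_wait_for_status_login": parameter3,  # Parameter3
    }
```

   A caller would typically check that "response" is 0 before treating
   the connection or session as closed.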

2.16 SNACK Request

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|0|1| 0x10      |1|Reserved(0)|S| Reserved (0)                  |
  +---------------+---------------+---------------+---------------+
 4/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag or Reserved (0xffffffff)                   |
  +---------------+---------------+---------------+---------------+
20| BegRun                                                        |
  +---------------+---------------+---------------+---------------+
24| RunLength                                                     |
  +---------------+---------------+---------------+---------------+
28| ExpStatSN/ExpDataSN                                           |
  +---------------+---------------+---------------+---------------+
32/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
48

   Support for SNACK is optional.

   The SNACK request is used to request retransmission of
   numbered-responses, data or R2T PDUs from the target. The SNACK
   request indicates to the target the missed numbered-response or
   data run, where the run is composed of an initial missed StatSN,
   DataSN or R2TSN and the number of additional missed Status, Data or
   R2T PDUs (0 means only the initial).

   The numbered-response, Data or R2T PDUs requested by a SNACK have
   to be delivered as exact replicas of the ones the initiator missed,
   including all their flags.

   Any SNACK requesting a numbered-response, Data or R2T that was not
   sent by the target must be silently discarded.

2.16.1 S

   If 1, indicates that this is a Status SNACK, i.e., one requesting a
   numbered response; otherwise it is a Data or R2T SNACK.

   A Data/R2T SNACK for a command MUST precede status acknowledgement
   for the given command.

   For a Data/R2T SNACK, the Initiator Task Tag MUST be set to the
   Initiator Task Tag of the referenced Command. Otherwise, it is
   reserved.

   An iSCSI target that does not support recovery within connection
   MAY discard status SNACK. If the target supports command recovery
   within session it MAY discard the SNACK after which it MUST issue
   an Asynchronous Message PDU with an iSCSI event indicating "Request
   Logout".

2.16.2 BegRun

   First missed DataSN, R2TSN or StatSN.

2.16.3 RunLength

   Number of additional sequential missed DataSN or StatSN. If BegRun
   is the only one missing, RunLength MUST be 0.

2.16.4 ExpStatSN/ExpDataSN

   ExpStatSN if S is 1, otherwise ExpDataSN.

2.17 Ready To Transfer (R2T)

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x31      |1| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Target Transfer Tag                                           |
  +---------------+---------------+---------------+---------------+
24| StatSN                                                        |
  +---------------+---------------+---------------+---------------+
28| ExpCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
32| MaxCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
36| R2TSN                                                         |
  +---------------+---------------+---------------+---------------+
40| Buffer Offset                                                 |
  +---------------+---------------+---------------+---------------+
44| Desired Data Transfer Length                                  |
  +---------------+---------------+---------------+---------------+
48

   When an initiator has submitted a SCSI Command with data passing
   from the initiator to the target (WRITE), the target may specify
   which blocks of data it is ready to receive. In general, the target
   may request that the data blocks be delivered in whichever order is
   convenient for the target at that particular instant.
   This information is passed from the target to the initiator in the
   Ready To Transfer (R2T) PDU.

   In order to allow write operations without an explicit initial R2T,
   the initiator and target MUST have agreed to do so by sending the
   InitialR2T=no key-pair to each other, which happens either during
   Login or through the Text Command/Response mechanism.

   An R2T MAY be answered with one or more SCSI Data-out PDUs with a
   matching Target Transfer Tag. If an R2T is answered with a single
   Data PDU, the Buffer Offset in the Data PDU MUST be the same as the
   one specified by the R2T. The data length of the Data PDU MUST NOT
   exceed the Desired Data Transfer Length specified in the R2T. If
   the R2T is answered with a sequence of Data PDUs, the Buffer Offset
   and Length must be within the range of those specified by the R2T,
   and the last PDU should have the F bit set to 1. The Data-Out PDU
   ordering is governed by DataDeliveryOrder. If DataDeliveryOrder is
   set to yes, the Buffer Offsets and Lengths for consecutive PDUs
   should form a continuous non-overlapping range and the PDUs should
   be sent in increasing offset order.

   The target may send several R2T PDUs (up to a negotiated number)
   and thus have a number of data transfers pending. Within a
   connection, outstanding R2Ts MUST be fulfilled by the initiator in
   the order in which they were received.

   Buffer offset ordering in consecutive R2Ts is governed by EMDP. If
   EMDP is 0, consecutive R2Ts SHOULD refer to continuous
   non-overlapping ranges. However, even when EMDP is 0, a target MAY
   send out-of-order R2Ts (e.g., for recovery) and an initiator MAY
   choose to terminate a command when receiving an out-of-order R2T
   that it can't fulfill, with an appropriate response after aborting
   the command at the target with the appropriate task management
   command.

2.17.1 R2TSN

   R2TSN is the R2T PDU number (starting with 0) within the command
   identified by the Initiator Task Tag.

   The number of R2Ts in a command MUST be less than 0xffffffff.

2.17.2 StatSN

   The StatSN field will contain, as usual, the next StatSN, but
   StatSN for this connection is not advanced.

2.17.3 Desired Data Transfer Length and Buffer Offset

   The target specifies how many bytes it wants the initiator to send
   because of this R2T PDU. The target may request the data from the
   initiator in several chunks, not necessarily in the original order
   of the data. The target, therefore, also specifies a Buffer Offset
   that indicates the point at which the data transfer should begin,
   relative to the beginning of the total data transfer. The Desired
   Data Transfer Length should not be 0.

2.17.4 Target Transfer Tag

   The target assigns its own tag to each R2T request that it sends to
   the initiator. This tag can be used by the target to easily
   identify the data it receives. The Target Transfer Tag is copied in
   the outgoing data PDUs and is used by the target only. There is no
   protocol rule about the Target Transfer Tag, but it is assumed that
   it is used to tag the response data to the target (alone or in
   combination with the LUN).

2.18 Asynchronous Message

   An Asynchronous Message may be sent from the target to the
   initiator without corresponding to a particular command. The target
   specifies the status and reason for the event and sense data.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x32      |1| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)  | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8| Logical Unit Number (LUN)                                     |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
24| StatSN                                                        |
  +---------------+---------------+---------------+---------------+
28| ExpCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
32| MaxCmdSN                                                      |
  +---------------+---------------+---------------+---------------+
36| SCSI Event    | iSCSI Event   | Parameter1 or Reserved (0)    |
  +---------------+---------------+---------------+---------------+
40| Parameter2 or Reserved (0)    | Parameter3 or Reserved (0)    |
  +---------------+---------------+---------------+---------------+
44| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
  / DataSegment - Sense Data                                      /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

   Some Asynchronous Messages are strictly related to iSCSI while
   others are related to SCSI [SAM2]. An Asynchronous Message may
   contain both types of events.

   Please note that StatSN counts this PDU as an acknowledgeable
   event, allowing initiator and target state synchronization.

2.18.1 iSCSI Event

   The codes used for iSCSI Asynchronous Messages (Events) are:

      0 No iSCSI Event

      1 Target is being reset.

      2 Target requests Logout. This Async Message MUST be sent on the
        same connection as the one being requested to be logged out.
        The initiator MUST honor this request by issuing a Logout as
        early as possible, but no later than Parameter3 seconds. The
        initiator MUST send a Logout with a reason code of "Close the
        connection" to cleanly shut down the connection. If the
        initiator does not Logout in Parameter3 seconds, the target
        MAY send an Async PDU with iSCSI event code "Dropped the
        connection" if possible, or simply terminate the transport
        connection. Parameter1 and Parameter2 are reserved.

      3 Target indicates it will drop the connection - the Parameter1
        field indicates which CID, the Parameter2 field indicates, in
        seconds, the minimum time to wait before attempting to
        reconnect, and Parameter3 indicates the maximum time to
        reconnect and/or restart commands after the initial wait
        (Parameter2). If the initiator does not attempt to reconnect
        and/or restart the outstanding commands within the time
        specified by Parameter3, or if Parameter3 is 0, the target
        will terminate all outstanding commands on this connection; no
        other responses should be expected from the target for the
        outstanding commands on this connection, and the initiator
        should generate the appropriate responses. A value of 0 for
        Parameter2 indicates that reconnect can be attempted
        immediately.

      4 Target indicates it will drop all the connections of this
        session - the Parameter2 field indicates, in seconds, the
        minimum time to wait before attempting to reconnect, and
        Parameter3 indicates the maximum time to reconnect and restart
        commands after the initial wait (Parameter2). If the initiator
        does not attempt to reconnect within the time specified by
        Parameter3, or if Parameter3 is 0, the session is terminated.
        In this case, the target will terminate all outstanding
        commands in this session; no other responses should be
        expected from the target for the outstanding commands in this
        session, and the initiator should generate the appropriate
        responses. A value of 0 for Parameter2 indicates that
        reconnect can be attempted immediately.
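
   The iSCSI event codes listed above lend themselves to a small
   dispatch routine on the initiator side. The Python sketch below
   decodes bytes 36-43 of an Asynchronous Message header and returns a
   summary of what the initiator is expected to do. It assumes
   big-endian fields; the function name decode_async_event, the action
   labels and the dictionary keys are hypothetical and serve only this
   illustration.

```python
import struct


def decode_async_event(header):
    """Summarize the event fields of an Asynchronous Message header.

    Assumes big-endian fields at the offsets shown in the diagram:
    SCSI Event (36), iSCSI Event (37), Parameter1..3 (38-43).
    """
    if len(header) < 44:
        raise ValueError("header too short for the event fields")
    scsi_event, iscsi_event, p1, p2, p3 = struct.unpack_from(">BBHHH", header, 36)
    plan = {"scsi_event_reported": scsi_event == 1, "iscsi_event": iscsi_event}
    if iscsi_event == 0:
        plan["action"] = "none"
    elif iscsi_event == 1:
        plan["action"] = "target_reset"
    elif iscsi_event == 2:
        # Logout must be issued no later than Parameter3 seconds.
        plan.update(action="logout", deadline_seconds=p3)
    elif iscsi_event == 3:
        # Parameter3 == 0 means outstanding commands are terminated.
        plan.update(action="reconnect_connection", cid=p1,
                    min_wait_seconds=p2, max_wait_seconds=p3,
                    commands_terminated=(p3 == 0))
    elif iscsi_event == 4:
        plan.update(action="reconnect_session",
                    min_wait_seconds=p2, max_wait_seconds=p3,
                    session_terminated=(p3 == 0))
    else:
        plan["action"] = "unknown"
    return plan
```

   For example, an event 3 message with Parameter1=5, Parameter2=10
   and Parameter3=30 would yield a plan to wait at least 10 seconds
   and then reconnect CID 5 within the following 30 seconds.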

2.18.2 SCSI Event

   The following values are defined (see [SAM2] for details):

      0 No SCSI Asynchronous Event is reported.

      1 A SCSI Asynchronous Event is reported in the sense data. Sense
        Data that accompanies the report, in the data segment,
        identifies the condition.

   Example: the event that reports that LU data has changed - a new
   LUN has been added to the target:

      Sense data will be: 0x710006000000000000000003f0e
      DataSegmentLength is 14

2.19 Reject

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|1|1| 0x3f      |1| Reserved (0)                                |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)  | DataSegmentLength                             |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40| Reason        | Reserved (0)  | First Bad Byte or Rsvd(0)     |
  +---------------+---------------+---------------+---------------+
44| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
48| Digests if any...                                             |
  +---------------+---------------+---------------+---------------+
xx/ Complete Header of Bad PDU                                    /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
yy

   It may happen that a target receives a PDU with a format error
   (e.g., inconsistent fields etc.) or a digest error (e.g., invalid
   payload or header). The target returns the header (not including
   digest) of the PDU in error as the data of the response.
2.19.1 Reason

The reject Reason is coded as follows:

   1 - Format Error
   2 - Data (payload) Digest Error
   3 - Data-SNACK Reject
   4 - Command-Retry Reject
   5 - Protocol Error (e.g., SNACK request for a status that was already acknowledged)
   6 - Command-in-progress
   7 - Command Replay Not Supported
   8 - Immediate Command Reject - too many immediate commands
   9 - Immediate Command Retry Reject - task not found
   10 - Bookmark rejected (old or different ITT)
   15 - Full Feature Phase Command before login

Some of the reject reasons terminate or prevent the creation of a task at the target, and no retry is possible in those cases. Format error for a command, Command Retry Reject, and Full Feature Phase Command before login are in this category.

In all the cases in which creation of a SCSI task is prevented or a SCSI task is terminated because of the reject, the target must issue a proper SCSI command response including a Check Condition Status (0x02). The sense key to be used is iSCSI REJECT (the numeric value and format for additional-sense-data to be coordinated with T10).

If the error is detected while data from the initiator is still expected (the command PDU did not contain all the data and the target has not received a Data PDU with the final bit set), the target MUST wait until it receives the Data PDU with the F bit set before sending the Response PDU.

2.19.2 First Bad Byte

For a format error reject, this is the offset of the first offending byte in the header.

3. SCSI Mode Parameters for iSCSI

This chapter describes fields and mode pages that control and report the behavior of the iSCSI protocol. All fields not described here MUST control the behavior of iSCSI devices as defined by the corresponding command set standard.

The mode parameters cannot be set by SCSI mode-set but can be retrieved by SCSI mode-sense commands.
The mode-set commands will be executed without really changing the values of the mode page parameters (to ensure that old programs using this mechanism will not fail). The mode parameters can be set only through text command negotiations. The text commands offer the added convenience that at the end of the exchange the value selected is known to both parties.

3.1 SCSI Disconnect-Reconnect Mode Page use in iSCSI

The following outlines the SCSI Disconnect-Reconnect mode page usage for iSCSI:

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|P|0| 0x02      | 0x0e          | Reserved(0)                   |
  +---------------+---------------+---------------+---------------+
 4| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
 8| Reserved (0)                  | MaximumBurstSize              |
  +---------------+---------------+---------------+---------------+
12|E| 0   |M| 0   | Reserved (0)  | FirstBurstSize                |
  +---------------+---------------+---------------+---------------+

3.1.1 MaximumBurstSize Field (16 bit)

This field defines, in units of 512 bytes, the maximum data payload of an iSCSI data PDU or of the immediate data in a command PDU. This value can also be set by a text-mode key=value pair (DataPDULength).

3.1.2 E - Enable Modify Data Pointers Bit (EMDP)

Data PDU Sequences can be in any order (EMDP = 1) or at continuously increasing addresses (EMDP = 0). EMDP can also be set by a text-mode key=value pair (DataOrder).

3.1.3 D - Immediate Data Disable

This field is used to control the use of immediate data. A value of 1 in this field means that immediate data is disabled. D can also be set by a text-mode key=value pair (ImmediateData).
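As a worked example of the 512-byte units used by the MaximumBurstSize field (Section 3.1.1), the following Python sketch converts between the 16-bit field value and a byte count. The function names are hypothetical and the sketch is only an illustration of the arithmetic, not a definition of the field.

```python
BURST_UNIT = 512     # bytes per unit, as stated in Section 3.1.1
FIELD_MAX = 0xFFFF   # largest value a 16-bit field can hold


def burst_field_to_bytes(field: int) -> int:
    """Convert a 16-bit burst-size field value to a byte count."""
    if not 0 <= field <= FIELD_MAX:
        raise ValueError("field must fit in 16 bits")
    return field * BURST_UNIT


def bytes_to_burst_field(nbytes: int) -> int:
    """Largest field value whose byte count does not exceed nbytes."""
    if nbytes < 0:
        raise ValueError("byte count must be non-negative")
    return min(nbytes // BURST_UNIT, FIELD_MAX)
```

A field value of 128 corresponds to 128 * 512 = 65536 bytes, and the largest encodable value, 0xFFFF, corresponds to 33553920 bytes.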
3.1.4 FirstBurstSize Field (16 bit)

This field defines, in units of 512 bytes, the maximum amount of unsolicited data an iSCSI initiator is allowed to send to the target. This value can also be set by a text-mode key=value pair (FirstBurstSize).

3.1.5 Other Fields

No other fields in this page are used by iSCSI.

3.2 iSCSI Logical Unit Control Mode Page

The following outlines the iSCSI Logical Unit Control mode page:

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|P|0| 0x18      | 0x02          | 0x05          | Reserved (0)|C|
  +---------------+---------------+---------------+---------------+

3.2.1 Enable CRN (C)

When this field is set to 1, the CRN field is considered by the LU. This field is LU specific and can be set only through the SCSI Mode Set.

3.3 iSCSI Port Mode Page

The following outlines the iSCSI Port mode page:

Byte/     0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0|P|0| 0x19      | 0x06          | 0x05          | Reserved (0)  |
  +---------------+---------------+---------------+---------------+
 4| LogoutLoginMinTime            | LogoutLoginMaxTime            |
  +---------------+---------------+---------------+---------------+

3.3.1 Protocol Identifier (iSCSI)

This field is set to the iSCSI code set by T10 (xx).

3.3.2 LogoutLoginMinTime

Minimum time in seconds from a target-initiated logout (disconnect asynchronous message) or connection drop after which an initiator may attempt a login. This value is also returned as Parameter2 in an asynchronous disconnect message. LogoutLoginMinTime can also be negotiated through the corresponding key=value pair in a text command.
3.3.3 LogoutLoginMaxTime

Maximum time in seconds after a logout or disconnect asynchronous message up to which recovery actions can be attempted (resources are maintained by targets). This value is also returned as Parameter3 in an asynchronous disconnect message. LogoutLoginMaxTime can also be negotiated through the corresponding key=value pair in a text command.

4. Login Phase

In the rest of this chapter, whenever we mention security we mean security and/or data integrity.

The login phase establishes an iSCSI session between initiator and target. It sets the iSCSI protocol parameters and security parameters, and authenticates the initiator and target to each other.

Operational parameters MAY be negotiated within or outside (after) the login phase.

Security MUST be completely negotiated within the Login Phase or provided by external means (e.g., IPSec).

In some environments, a target or an initiator is not interested in authenticating its counterpart. It is possible to bypass authentication through the Login Command and Response.

The initiator and target MAY want to negotiate authentication and data integrity parameters. Once this negotiation is completed, the channel is considered secure.

Authentication and a Secure Channel setup MAY be performed independent of iSCSI (as when using tunneling IPSec or some implementations of transport IPSec), in which case the Login phase can be reduced to operational parameter negotiations.

The login phase is implemented via login and text commands and responses only. The login command is sent from the initiator to the target in order to start the login phase. The login response is sent from the target to the initiator to conclude the login phase. Text PDUs are used to implement negotiation, establish security, and set operational parameters.
The whole login phase is considered as a single task and has a single Initiator Task Tag (similar to the linked SCSI commands).

The login phase sequence of commands and responses proceeds as follows:

   - Login command (mandatory)
   - Login Partial-Response (optional)
   - Text Command(s) and Response(s) (optional)
   - Login Final-Response (mandatory)

The Login Final-Response accepts or rejects the Login Command.

The Login Final-Response that accepts a Login Command can come only as a response to a Login command with the F bit set to 1 or a Text Command with the F bit set to 1.

4.1 Login Phase Start

The login phase starts with a login request via a login command from the initiator to the target. The login request includes:

   -Protocol version supported by the initiator (currently 0x'02')
   -Session and connection Ids
   -Security/Integrity Parameters OR
   -iSCSI operational parameters

A target MAY use the iSCSI Initiator Name as part of its access control mechanism; therefore, the iSCSI Initiator Name MUST be sent before the target is required to disclose its LUs.

If the iSCSI Target Name is going to be used in determining the security mode or it is an implicit part of authentication, then the iSCSI Target Name MUST be sent in the login command for the first connection of a session to identify the storage endpoint of the session. If sent on new connections within an existing session, it MUST be the same as the one used for the leading connection. If the iSCSI Target Name is going to be used only for access control, it can be sent after the Security Context Complete is achieved.

An unknown target can be accessed by using "iSCSI" as a placeholder for the iSCSI Target Name.

The iSCSI Names MUST be in text command format.

The target can answer in the following ways:

   -Login Response with Login Reject (and F bit 1). This is an immediate rejection from the target that causes the session to terminate.
   -Login Response with Login Accept with session ID and iSCSI parameters and F bit set to 1. This is a valid response only if the Login Command also had the F bit set to 1. In this case, the target does not support any security or authentication mechanism and starts with the session immediately (enters full feature phase).

   -Login Response with F bit 0 indicating the start of a negotiation sequence. The response includes the protocol version supported by the target and either security/integrity parameters or iSCSI parameters (when no security/integrity mechanism is chosen) supported by the target. It also indicates what sequence is expected next (security/integrity or iSCSI parameters negotiation). The initiator MAY decide to drop the connection if the sequence is not what it expects (e.g., an initiator that expects a security/integrity sequence and gets a response indicating that iSCSI parameters negotiation is the next phase).

4.2 iSCSI Security and Integrity Negotiation

The security exchange sets the security mechanism and authenticates the user and the target to each other. The exchange proceeds according to the algorithms that were chosen in the negotiation phase and is conducted through the key=value parameters of the login and text commands.

The negotiable security mechanisms include the following modes:

   -Initiator-target authentication - the host and the target authenticate themselves to each other. A negotiable algorithm such as Kerberos provides this feature.
   -PDU integrity - an integrity/authentication digest is attached to each packet. The algorithm is negotiable.

Using IPsec for encryption or authentication may eliminate part of the security negotiation at the iSCSI level but not necessarily all.

If security is established in the login phase note that:

   -After the security context negotiation is complete, each iSCSI PDU MUST include the appropriate digest field if any.
   -The iSCSI parameter negotiation (non-security parameters) SHOULD start only after security is established. This should be performed using text commands.

The negotiation proceeds as follows:

   -The initiator sends a text command with an ordered list of the options it supports for each subject (authentication algorithm, iSCSI parameters and so on). The options are listed in the initiator's order of preference.

   -The target MUST reply with the first option in the list that it supports and that is allowed for the specific initiator. The parameters are encoded in UTF8 as key=value. The initiator MAY also send proprietary options. The "none" option, if allowed, MUST be included in the list, which indicates that no algorithm is supported by the target. The operational parameters should be negotiated only after security is established at the desired level (i.e., if security is to be established, the initiator MUST NOT send parameters other than security parameters in the login command). When establishing the security context, any operational parameters sent before establishing a secure context MUST be discarded by both the target and the initiator. For a list of security parameters see Appendix A.

   -Every party in the security negotiation indicates that it has completed building its security context (has all the required information) by sending the key=value pair:

      SecurityContextComplete=yes

   The other party either offers some more parameters or answers with the same:

      SecurityContextComplete=yes

   The party that is ready keeps sending the SecurityContextComplete=yes pair (in addition to new security parameters if required) until the handshake is complete.

   If the initiator has been the last to complete the handshake, it MUST NOT start sending operational parameters within the same text command; a text response including only SecurityContextComplete=yes concludes the security sub-phase.
   If the target has been the last to complete the handshake, the initiator can start the operational parameter negotiation with the next text command; the security negotiation sub-phase ends with the target text response.

The SecurityContextComplete handshake MUST be performed if any of the negotiating parties has offered a security/integrity item (even if it is not selected).

All PDUs sent after the security negotiation sub-phase MUST be built using the agreed security.

If the security negotiation fails at the target, then the target MUST send the appropriate Login Response PDU. If the security negotiation fails at the initiator, the initiator shall drop the connection.

4.3 Operational Parameter Negotiation During the Login Phase

Operational parameter negotiation during the login MAY be done:

   - starting with the Login command if the initiator does not offer any security/integrity option
   - starting immediately after the security/integrity negotiation if the initiator and target perform such a negotiation

An operational parameter negotiation on a connection SHOULD NOT start before the security/integrity negotiation if such a negotiation exists. Operational parameters negotiated inadvertently before the security/integrity negotiation MAY be reset after the security/integrity negotiation at the explicit request of the initiator or target.

Operational parameter negotiation MAY involve several request-response exchanges (login and/or text) started and terminated by the initiator. The initiator MUST indicate its intent to terminate the negotiation by setting the F bit to 1; the target sets the F bit to 1 on the last response. The last response MUST be the Login Response.
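The list-based selection rule stated in Section 4.2 (the target replies with the first option in the initiator's ordered list that it supports and that is allowed for that initiator) can be sketched in Python as follows. The function name `select_option` is hypothetical, the option names used in the usage note are illustrative placeholders rather than values taken from Appendix A, and how a target reports the absence of a common option is outside this sketch.

```python
from typing import Iterable, Optional, Sequence


def select_option(offered: Sequence[str],
                  allowed: Iterable[str]) -> Optional[str]:
    """Return the first offered option the target supports and permits.

    `offered` is the initiator's list, in its order of preference.
    `allowed` holds the options this target supports and allows for
    this particular initiator. None means there is no common option.
    """
    permitted = set(allowed)
    for option in offered:
        if option in permitted:
            return option
    return None
```

With an offer of `["optA", "optB", "none"]` and a target that permits `{"optB", "none"}`, the sketch selects `"optB"`, because the initiator's ordering takes precedence over any ordering on the target side.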
If the target answers a text or Login command that has the F bit set to 1 with a text response or a login response that has the F bit set to 0, the initiator must keep sending the text command (even empty) with the F bit set to 1 until it gets the Login Response with the F bit set to 1.

In a negotiation sequence, the F bit settings in one pair of text/login request-responses have no bearing on the F bit settings of the next pair. An initiator having F bit set to 1 in one pair and being answered with an F bit setting of 0 may issue the next request with F bit set to 0.

Whenever parameter action or acceptance is dependent on other parameters, the dependent parameters MUST be sent after the parameters they depend on. If they are sent within the same command, a response for a parameter might imply responses for others.

A target MUST NOT send more than one Login Response with the F bit set to 0.

An initiator MUST send a single Login command per connection, per session.

For a list of operational parameters, see Appendix D.

5. Operational Parameter Negotiation Outside the Login Phase

Operational parameters MAY be negotiated outside (after) the login phase.

Operational parameter negotiation MAY involve several text request-response exchanges always started and terminated by the initiator. The initiator MUST indicate its intent to terminate the negotiation by setting the F bit to 1; the target sets the F bit to 1 on the last response.

If the target answers a text command that has the F bit set to 1 with a text response that has the F bit set to 0, the initiator must keep sending the text command (even empty) with the F bit set to 1 until it gets the text response with the F bit set to 1.
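The termination rule above can be sketched as a small initiator-side loop in Python. This is an illustration only. The `exchange` callable is a hypothetical transport hook that sends one text command (key=value pairs plus the F bit) and returns the text response (pairs plus its F bit); the `max_rounds` limit is an assumption of the sketch, since this document defines no such bound. The sketch also ignores any new offers a target might make in a response with the F bit set to 0.

```python
from typing import Callable, Dict, Tuple

Pairs = Dict[str, str]
# Hypothetical hook: (request pairs, F bit) -> (response pairs, F bit)
Exchange = Callable[[Pairs, bool], Tuple[Pairs, bool]]


def finish_negotiation(exchange: Exchange, last_request: Pairs,
                       max_rounds: int = 16) -> Pairs:
    """Send text commands with F=1 until the target answers with F=1."""
    collected: Pairs = {}
    request = last_request
    for _ in range(max_rounds):
        response, final = exchange(request, True)  # F bit set to 1
        collected.update(response)
        if final:
            return collected
        request = {}                               # empty command, still F=1
    raise TimeoutError("target never answered with the F bit set to 1")
```

The initiator's final set of key=value pairs goes out in the first command; every later command in the loop is empty and exists only to let the target finish its side of the exchange.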
Responding to a text command that has the F bit set to 1 with an empty text response (no key=value pairs) is not an error but is discouraged.

In a negotiation sequence, the F bit settings in one pair of text request-responses have no bearing on the F bit settings of the next pair. An initiator having F bit set to 1 in one pair and being answered with an F bit setting of 0 may issue the next request with F bit set to 0.

Whenever parameter action or acceptance is dependent on other parameters, the dependent parameters MUST be sent after the parameters they depend on. If they are sent within the same command, a response for a parameter might imply responses for others.

6. State transitions

An iSCSI connection and an iSCSI session go through several well-defined states from the time the connection and the session are created to the time they are cleared. An iSCSI connection is a transport connection that is used for carrying out iSCSI activity.

The connection state transitions are described in two separate but dependent state diagrams for ease of understanding. The first of these two is called a "standard connection state diagram" and it describes the connection state transitions when the iSCSI connection is not in connection recovery mode. The second diagram is called a "connection recovery state diagram", which describes the connection state transitions while performing connection recovery.

The "session state diagram" describes the state transitions an iSCSI session would go through during its lifetime, and it depends on the states of possibly multiple iSCSI connections that are participating in the session.
6.1 Standard connection state diagram

Symbolic names for States:

   S1: FREE
   S2: XPT_WAIT (illegal for target)
   S3: XPT_UP
   S4: LOGIN_SENT (initiator)/LOGIN_RCVD (target)
   S5: FAILED
   S6: EXITING
   S7: LOGGED_IN (full-feature phase)
   S8: LOGO_SENT (initiator)/LOGO_RCVD (target)
   S9: LOGGED_OUT
   S10: ASYNC_MSG_SENT (target)/ASYNC_MSG_RCVD (initiator)
   S11: LOGO_FAILED
   S12: XPT_CLEANUP
   S13: BUSY

Due to the number of states and the transitions involved in the description, the standard connection state diagram is defined using only a state transition table. Each row represents the starting state for a given transition, which after taking a transition marked in a table cell would end in the state represented by the column of the cell (for example, from state S1, the connection takes the T4 transition to arrive at state S3). Transitions that take place because of the same set of events, and which arrive into the same end state (from different starting states), share the same transition number, but are given different suffixes. The fields marked "-" correspond to undefined transitions.
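The row-and-column convention described above can be illustrated with a Python lookup keyed by (starting state, end state). The entries are transcribed from the transition table of this section; the names `TRANSITIONS`, `transition`, and `next_states` are hypothetical and exist only for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# (starting state, end state) -> transition label. Pairs that are absent
# correspond to the "-" (undefined) cells of the table.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("S1", "S2"): "T1", ("S1", "S3"): "T4",
    ("S2", "S1"): "T3", ("S2", "S3"): "T2",
    ("S3", "S1"): "T21-1", ("S3", "S4"): "T5", ("S3", "S6"): "T9-1",
    ("S4", "S1"): "T21-2", ("S4", "S3"): "T8", ("S4", "S5"): "T7",
    ("S4", "S6"): "T9-2", ("S4", "S7"): "T6",
    ("S5", "S1"): "T21-3", ("S5", "S6"): "T9-3",
    ("S6", "S1"): "T10",
    ("S7", "S8"): "T11", ("S7", "S10"): "T13-1",
    ("S7", "S12"): "T20-1", ("S7", "S13"): "T19-1",
    ("S8", "S8"): "T15", ("S8", "S9"): "T12", ("S8", "S11"): "T16",
    ("S8", "S12"): "T20-2", ("S8", "S13"): "T19-2",
    ("S9", "S1"): "T21-4", ("S9", "S6"): "T9-4",
    ("S10", "S8"): "T14", ("S10", "S10"): "T13-2",
    ("S10", "S12"): "T20-3", ("S10", "S13"): "T19-3",
    ("S11", "S12"): "T17", ("S11", "S13"): "T19-4",
    ("S12", "S13"): "T18",
}


def transition(start: str, end: str) -> Optional[str]:
    """Return the transition label, or None for an undefined cell."""
    return TRANSITIONS.get((start, end))


def next_states(start: str) -> List[str]:
    """List the end states reachable from `start`, in numeric order."""
    ends = [end for (s, end) in TRANSITIONS if s == start]
    return sorted(ends, key=lambda name: int(name[1:]))
```

Looking up `("S1", "S3")` yields `"T4"`, matching the example in the text, and `next_states("S13")` is empty because the standard diagram defines no transition out of BUSY.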
   +-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
   |S1   |S2 |S3 |S4 |S5 |S6  |S7 |S8  |S9  |S10  |S11 |S12  |S13  |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S1| -   |T1 |T4 | - | - | -  | - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S2|T3   | - |T2 | - | - | -  | - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S3|T21-1| - | - |T5 | - |T9-1| - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S4|T21-2| - |T8 | - |T7 |T9-2|T6 | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S5|T21-3| - | - | - | - |T9-3| - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S6|T10  | - | - | - | - | -  | - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S7| -   | - | - | - | - | -  | - |T11 | -  |T13-1| -  |T20-1|T19-1|
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S8| -   | - | - | - | - | -  | - |T15 |T12 | -   |T16 |T20-2|T19-2|
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
 S9|T21-4| - | - | - | - |T9-4| - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
S10| -   | - | - | - | - | -  | - |T14 | -  |T13-2| -  |T20-3|T19-3|
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
S11| -   | - | - | - | - | -  | - | -  | -  | -   | -  |T17  |T19-4|
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
S12| -   | - | - | - | - | -  | - | -  | -  | -   | -  | -   |T18  |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+
S13| -   | - | - | - | - | -  | - | -  | -  | -   | -  | -   | -   |
---+-----+---+---+---+---+----+---+----+----+-----+----+-----+-----+

State transition descriptions:

   T1: Transport connect request was made (ex: TCP SYN sent). (Initiator only)
   T2: Transport connection is established. (Initiator only)
   T3: Transport connection request timed out or failed. (Initiator only)
   T4: Transport connection is established. (Target only)
   T5: iSCSI login was sent by the initiator (or was received for a target).
   T6: A login success was received/sent.
   T7: A login redirection/initiator error/target error was received, or login timed out. (Initiator only)
   T8: A login redirection/initiator error/target error was sent. (Target only)
   T9-1, T9-2, T9-3, T9-4: Transport disconnect request was sent/indication received (ex: TCP FIN received/sent).
   T10: Both sides closed the transport connection.
   T11: Logout was sent by the initiator (or was received for a target).
   T12: Logout Response (success) was received by the initiator (or sent by the target).
   T13-1, T13-2: Async PDU with iSCSI event 2 received by the initiator (or sent by the target).
   T14: Logout was sent by the initiator (or was received for a target).
   T15: Async PDU with iSCSI event 2 received. (Initiator only)
   T16: Logout Response (failure) was received by the initiator (or sent by the target).
   T17: Transport disconnect request was sent/indication received (ex: TCP FIN received/sent).
   T18: Both sides closed the transport connection.
   T19-1, T19-2, T19-3, T19-4: Transport connection deemed non-responsive by either end; or transport RESET received by either end; or Async PDU with iSCSI event 3 (for this CID), or event 4 received by the initiator.
   T20-1, T20-2, T20-3: Unexpected transport disconnect indication received by either side.
   T21-1, T21-2, T21-3, T21-4: Transport connection deemed non-responsive by either end; or transport RESET received by either end.

The BUSY state (S13) implies that there are possibly iSCSI tasks that have not reached conclusion and are still considered busy.

6.2 Connection recovery state diagram

Symbolic names for states:

   R1: BUSY (same as S13)
   R2: IN_RECOVERY
   R3: RECOVERY_DONE (same as S1)

Whenever a connection state machine (say, CSM-R) enters the BUSY state (S13), it must go through the state transitions additionally described in the connection recovery state diagram. These additional state transitions may be traversed either by using a connection in the LOGGED_IN state with an explicit logout (let us call it CSM-E), or on a new transport connection in the FREE state with an implicit logout (let us call it CSM-I). This recovery state diagram hence is applicable only to the instance of the connection in recovery, i.e. CSM-R. In the case of an implicit logout for example, CSM-R reaches RECOVERY_DONE at the time CSM-I reaches LOGGED_IN. In the case of an explicit logout, CSM-R reaches RECOVERY_DONE when CSM-E receives a successful logout response while continuing to be in the LOGGED_IN state.

State diagram:

          -------
         / R1    \
      +--\       /<-+
     /    ---+---    \
    /        |        \ M3
M1 |         |M2       |
   |         |        /
   |         |       /
   |         |      /
   |         V     /
   |      ------- /
   |     / R2    \
   |     \       /
   |      -------
   |         |
   |         |M4
   |         |
   |         |
   |         |
   |         V
   |      -------
   |     / R3    \
   +---->\       /
          -------

State transition table:

     +----+----+----+
     |R1  |R2  |R3  |
-----+----+----+----+
 R1  | -  |M2  |M1  |
-----+----+----+----+
 R2  |M3  | -  |M4  |
-----+----+----+----+
 R3  | -  | -  | -  |
-----+----+----+----+

State transition descriptions:

   M1: Connection state timeout happened on either side.
   M2: An implicit/explicit logout was sent by the initiator (or received by the target)
      - In CSM-I case, a recovery login was sent by the initiator (or received by the target) in state S1.
      [OR]
      - In CSM-E case, a logout was sent by the initiator (or received by the target) in state S7.
   M3: Logout failure detected
      - CSM-I failed to reach S7, instead arrived into S1.
      [OR]
      - CSM-E either moved out of S7/Logout timed out and/or aborted/Logout Response (failure) received by the initiator (or sent by the target).
   M4: Successful implicit/explicit logout was performed.
      - CSM-I reached state S7.
      [OR]
      - CSM-E stayed in S7, and received Logout Response (success) by the initiator (or sent by the target).

6.3 Session state diagram

If any connection participating in a session is LOGGED_IN (S7), the session state is LOGGED_IN (Q3 below). The first connection entering into S7 and the last connection leaving S7 toggle the session state.

Symbolic Names for States:

   Q1: FREE
   Q2: ACTIVE
   Q3: LOGGED_IN
   Q4: FAILED

State diagram:

             -------
            / Q1    \
         +->\       /<-+
        /    ---+---    \
       /        |        \ N4
   N6 |         |N1       |
      |         |        /
      |         |       /
      |         |      /
      |         V     /
      |      ------- /
      |  N7 / Q2    \
      |  +->\       /<-+
      |  |+--+----+-   |
      |  ||       |    | N3
      |  ||N5     |N2  |
      |  ||       |    |
      |  ||       |    |
      |  ||       |    |
      |  |V       V    |
     -+--+--      -----+-
    / Q4    \ N5 / Q3    \
    \       /<---\       /
     -------      -------

State transition table:

     +----+----+----+----+
     |Q1  |Q2  |Q3  |Q4  |
-----+----+----+----+----+
 Q1  | -  |N1  | -  | -  |
-----+----+----+----+----+
 Q2  |N4  | -  |N2  |N5  |
-----+----+----+----+----+
 Q3  | -  |N3  | -  |N5  |
-----+----+----+----+----+
 Q4  |N6  |N7  | -  | -  |
-----+----+----+----+----+

State transition descriptions:

   N1: At least one transport connection was established for the session.
   N2: At least one transport connection reached the LOGGED_IN state.
   N3: Last LOGGED_IN connection had ceased to be LOGGED_IN.
   N4: Last participating transport connection was dropped.
   N5: Session failure (all connections reported BUSY, or recovery failed).
   N6: Session state timeout happened on either side.
   N7: Session recovery attempt with an implicit logout (i.e. login).

7. iSCSI Error Handling and Recovery

For any outstanding SCSI command, it is assumed that iSCSI in conjunction with SCSI at the initiator is able to keep enough information to be able to rebuild the command PDU, and that outgoing data is available (in host memory) for retransmission while the command is outstanding. It is also assumed that at the target, incoming data (read data) MAY be kept for recovery or it can be re-read from a device server.

It is further assumed that a target will keep the "status & sense" for a command it has executed, if it supports status retransmission or command replay.

Many of the recovery details in an iSCSI implementation are a local matter, beyond the scope of protocol standardization. However, some external aspects of the processing must be standardized, to ensure interoperability. This section (and the corresponding appendix in more detail) describes a general model for recovery, in support of interoperability. Compliant implementations need not match details of this model as presented, but the external behavior of such implementations must correspond to the externally observable characteristics of this model.

7.1 Usage of retry bit (X bit) in recovery

The retry bit in the iSCSI command PDUs is used to signal to the target that the initiator is re-attempting the command for one of the following three reasons.

   (b) Initiator is attempting to "plug" (what it thinks are) the discontinuities in CmdSN ordering on the target end. These discontinuities may have been created because of discarded command PDUs due to digest errors or format errors.

   (c) Initiator is signaling its intent to continue an already active command (but with no current connection allegiance) as part of connection recovery. This means that a new connection allegiance is being established for the command, associating it to the connection on which the retry is being issued.

   (d) Initiator is attempting to "replay" an entire command that was already satisfied by the target.
   A retry request issued after the target has sent status but before the initiator has acknowledged it is interpreted by the target as a replay request.

Note that the retry bit MUST NOT be used for any reasons other than these. All PDU retransmission (for data, or status) requests for a currently allegiant command in progress must be conveyed to the target using only the SNACK mechanism already described. This does not however constitute a requirement on initiators to use SNACK.

Initiators, as part of addressing reason (b) above, may at times inadvertently issue retries for allegiant commands already in progress (i.e. targets did not see the discontinuities in CmdSN ordering). Targets MUST reject such command PDUs with a reason code of "Command in progress". Targets MUST use the same reason code for any replay requests (reason (d) above) that they receive before they had reported status for the command. This also helps the initiators to tune their command retransmission logic and identifies inadvertent connection allegiance switching attempts, while updating the initiator of the target view of the command.

In satisfying a command retry (arising from reason (c) above), the targets SHOULD continue the command from its current state, for example taking advantage of ExpDataSN in the command PDU for read commands (must be set to zero if there had been no data transfer). However, targets MAY choose to send/receive the entire data on a re-establishment of connection allegiance, and it is not considered an error.

When the retry bit (X bit) is specified, the command PDU MUST carry the original Initiator Task Tag and the original operational attributes (ex. flags, function names, LUN, CDB etc.). In addition, it MUST hold the original CmdSN.
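The target-side handling of a command PDU that carries the retry bit can be sketched as a decision function in Python. This is a simplified illustration, not a definitive implementation. The `TaskState` and `Action` enumerations and the function `handle_retry` are hypothetical names; the sketch assumes that allegiance switching was agreed at login and that the target can classify the retried command into one of the four states shown. The reject reason numbers in the comments refer to the list in Section 2.19.1.

```python
from enum import Enum, auto


class TaskState(Enum):
    UNKNOWN = auto()      # target never saw this command (CmdSN gap)
    ALLEGIANT = auto()    # in progress, with a current connection allegiance
    ORPHANED = auto()     # active, but with no current connection allegiance
    STATUS_SENT = auto()  # status sent, not yet acknowledged by the initiator


class Action(Enum):
    EXECUTE_AS_NEW = auto()              # reason (b): plugs the CmdSN gap
    CONTINUE_ON_THIS_CONNECTION = auto() # reason (c): new allegiance
    REPLAY = auto()                      # reason (d): replay the command
    REJECT_COMMAND_IN_PROGRESS = auto()  # Reject reason 6
    REJECT_REPLAY_NOT_SUPPORTED = auto() # Reject reason 7


def handle_retry(state: TaskState, replay_supported: bool) -> Action:
    """Choose the target's reaction to a command PDU with the X bit set."""
    if state is TaskState.UNKNOWN:
        return Action.EXECUTE_AS_NEW
    if state is TaskState.ORPHANED:
        return Action.CONTINUE_ON_THIS_CONNECTION
    if state is TaskState.STATUS_SENT:
        if replay_supported:
            return Action.REPLAY
        return Action.REJECT_REPLAY_NOT_SUPPORTED
    # Still in progress with an allegiance: this covers inadvertent retries
    # and replay requests that arrive before status was reported.
    return Action.REJECT_COMMAND_IN_PROGRESS
```

Keeping the decision in one function makes it easy to see that every retried command either proceeds in exactly one way or is rejected with exactly one reason code.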
It is optional for targets to support the replay functionality (as agreed by the CommandReplaySupport text key at the login time) and the allegiance switching (as agreed by the CommandFailoverSupport text key at the login time), while they MUST support the retry bit and the rest of the retry functionality described in this section. When a target does not implement replay, it MUST reject the command with a reason code of "Command Replay Not Supported".

7.2 Usage of Reject PDU in recovery

Targets MUST NOT implicitly terminate an active task by sending a Reject PDU for any PDU exchanged during the life of the task. If the target decides to terminate the task, a Response PDU (SCSI, Text, Task etc.) must be returned by the target to conclude the task. If the task had never been active before the Reject (i.e. the Reject is on the command PDU), targets should not send any further responses since the command itself is being discarded.

The above rule means that the initiators can eventually expect a response even on seeing Rejects, if the Reject is not for the command itself. The non-command Rejects only have diagnostic value in logging the errors, and they may be used for retransmission decisions as well by the initiators.

7.3 Format Errors

Explicit violations of the rules stated in this document are format errors. While a session is active, whenever a target receives an iSCSI PDU with a format error, it MUST answer with a Reject iSCSI PDU with a Reason-code of Format Error. It MUST also provide a 2-byte offset of the first offending byte in the rejected PDU.

When an initiator receives an iSCSI PDU with a format error, for which it has an outstanding task, it MUST abort the target task and report the error through an appropriate service response (e.g., Target Failure). The exact coding of the service response is outside the scope of this document.
7.4 Digest Errors

When a target receives any iSCSI PDU with a header digest error, it MUST silently discard the PDU.

When a target receives any iSCSI PDU with a payload digest error, it MUST answer with a Reject iSCSI PDU with a Reason-code of Data-Digest-Error and discard the PDU.

- If the discarded PDU is an iSCSI data PDU,
   a) Target MAY request retransmission with a R2T.
   [OR]
   b) Target MUST answer with a command response PDU with a response-code of delivery-subsystem-failure and terminate the task. If the target chooses to implement this, it MUST wait to receive all the data (signaled by a Data PDU with the final bit set for all outstanding R2Ts) before sending the command response PDU.
- No further action is necessary for targets if the discarded PDU is a non-data PDU.

When an initiator receives any iSCSI PDU with a header digest error, it MUST discard the PDU.

When an initiator receives any iSCSI PDU with a payload digest error, it MUST discard the PDU.

- If the discarded PDU is an iSCSI data PDU,
   a) Initiator MAY request the missing data PDU through SNACK. In its turn, the target MUST either reject the SNACK with a Reject PDU with a reason-code of Data-SNACK-Reject or resend the data PDU.
   [OR]
   b) Initiator MUST abort the task and terminate the command with an error.
- If the discarded PDU is a response PDU,
   c) Initiator MAY replay the command as described in section 7.1.
   [OR]
   d) Initiator MAY alternately request PDU retransmission with a status SNACK.
   [OR]
   e) If the initiator does not choose to do either, it MUST logout the connection for recovery and continue the tasks on a different connection instance as described in section 7.1.
- No further action is necessary for initiators if the discarded PDU is an unsolicited PDU.
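The digest error rules of section 7.4 amount to a small decision table. The following sketch restates them in executable form; the function name, argument values and action strings are hypothetical labels chosen for illustration, not protocol elements.

```python
def digest_error_actions(role, digest, pdu_kind):
    """Return (mandatory_steps, alternatives) for a PDU failing a digest check.

    role is "target" or "initiator", digest is "header" or "payload", and
    pdu_kind is "data", "response" or "other". The alternatives list holds
    the options joined by [OR] in section 7.4; exactly one of them is chosen.
    """
    if role not in ("target", "initiator"):
        raise ValueError("unknown role: %r" % (role,))
    if digest == "header":
        # Both sides simply discard a PDU whose header digest is wrong.
        return ["discard PDU"], []
    if digest != "payload":
        raise ValueError("unknown digest: %r" % (digest,))
    if role == "target":
        steps = ["send Reject (Data-Digest-Error)", "discard PDU"]
        if pdu_kind == "data":
            return steps, [
                "request retransmission with R2T",
                "wait for all outstanding data, then respond with "
                "delivery-subsystem-failure and terminate the task",
            ]
        return steps, []
    steps = ["discard PDU"]
    if pdu_kind == "data":
        return steps, ["request the data PDU through SNACK", "abort the task"]
    if pdu_kind == "response":
        return steps, [
            "replay the command",
            "request retransmission with a status SNACK",
            "logout the connection for recovery",
        ]
    return steps, []
```

Returning the alternatives separately from the mandatory steps keeps the implementer's choice explicit: the first list is always performed, and one entry of the second list is selected by local policy.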
7.5 Sequence Errors

When an initiator receives an iSCSI R2T/data PDU with an out-of-order DataSN or a SCSI response PDU with an ExpDataSN implying missing data PDU(s), it means that the initiator must have hit a header or payload digest error on one or more earlier R2T/data PDUs. The initiator MUST address these implied digest errors as described in section 7.4.

When a target receives a data PDU with an out-of-order DataSN, it means that the target must have hit a header or payload digest error on at least one of the earlier data PDUs. The target MUST address these implied digest errors as described in section 7.4.

When an initiator receives an iSCSI status PDU with an out-of-order StatSN implying missing responses, it MUST address the one or more missing status PDUs as described in section 7.4. As a side effect of receiving the missing responses, the initiator may discover missing data PDUs. The initiator MUST NOT acknowledge the received responses until it has completed receiving all the data PDUs of a SCSI command.

7.6 SCSI Timeouts

An iSCSI initiator SHOULD attempt to plug a command sequence gap on the target end (in the absence of an acknowledgement of the command by way of ExpCmdSN) before the ULP timeout by re-transmitting the unacknowledged command with the retry bit set, as described in section 7.1.

On a ULP timeout for a command that carried a CmdSN of n, if the ExpCmdSN is still less than (n+1), the iSCSI initiator MUST assume a session failure and take the appropriate actions as described in section 7.11.4.

7.7 Negotiation failures

Text command and response sequences, when used to set/negotiate operational parameters, constitute the negotiation/parameter setting. A negotiation failure is considered to be one or both of the following:

- None of the choices, or the stated value, is acceptable to one negotiating side.
- The text command timed out, and was possibly aborted.
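The first kind of failure can be made concrete with a short sketch of list selection. It assumes that the offering side lists values in order of preference and that the responder picks the first one it accepts; that selection rule, the exception class and the function names are assumptions made for illustration.

```python
class NegotiationFailure(Exception):
    """Raised when no offered value is acceptable to the responder."""


def choose_value(key, offered, acceptable):
    """Pick the first offered value that the responder accepts."""
    for value in offered:
        if value in acceptable:
            return value
    raise NegotiationFailure("no acceptable value offered for %s" % key)


def negotiate(offers, acceptable_by_key):
    """Negotiate every key; any single failure fails the whole exchange."""
    agreed = {}
    for key, offered in offers.items():
        agreed[key] = choose_value(key, offered, acceptable_by_key.get(key, ()))
    return agreed
```

Because negotiate() only returns after every key has succeeded, a caller that keeps its previous parameters until the call returns never applies partial results.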
The following two rules are to be used to address negotiation failures.

- During Login, any failure in negotiation MUST be considered as the login process failure and the connection must be dropped.
- A failure in negotiation while in the full-feature phase MUST terminate the entire negotiation sequence, possibly consisting of a series of text commands using the same Initiator Task Tag. The operational parameters of the session or the connection MUST continue to be the values agreed upon during an earlier successful negotiation - i.e., any partial results of this unsuccessful negotiation must be undone.

7.8 Protocol Errors

The authors recognize that mapping framed messages over a "stream" connection, such as TCP, makes the proposed mechanisms vulnerable to simple software framing errors. The introduction of framing mechanisms may be onerous for performance and bandwidth. Command Sequence Numbers and the above mechanisms for connection drop and reestablishment help handle these types of mapping errors.

7.9 Connection Failure

iSCSI can keep a session in operation if it is able to keep/establish at least one TCP connection between the initiator and target in a timely fashion. It is assumed that targets and/or initiators recognize a failing connection by either transport level means (TCP), a gap in the command or response stream that is not filled for a long time, or by a failing iSCSI NOP-ping. The latter MAY be used periodically by highly reliable implementations. Initiators and targets MAY also use the keep-alive option on the TCP connection to enable early link failure detection on otherwise idle links.

At connection failure, the initiator and target MUST either attempt connection recovery within the session or session recovery.
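As a minimal sketch of the keep-alive option mentioned above, the standard socket API can be used to turn on TCP keep-alive for a connection. The helper names and the idle-time check are illustrative assumptions; the keep-alive timing itself is left to the operating system defaults, and this document does not prescribe any timeout value.

```python
import socket


def enable_tcp_keepalive(sock):
    """Turn on TCP keep-alive probes for an otherwise idle connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back so the caller can confirm it took effect.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def connection_suspect(last_heard, now, idle_limit):
    """Report whether nothing has been heard for longer than idle_limit.

    All three arguments are in seconds. idle_limit is a local policy value
    (hypothetical here), for example the interval after which a NOP-ping
    is considered to have failed.
    """
    return (now - last_heard) > idle_limit
```

An implementation might call enable_tcp_keepalive() right after the connection is established and evaluate connection_suspect() from a periodic timer to decide when to start recovery.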
7.10 Session Errors

If all the connections of a session fail and cannot be reestablished in a short time, or if initiators detect protocol errors repeatedly, an initiator may choose to terminate a session and establish a new session. It terminates all outstanding requests with an appropriate response before initiating a new session. The target takes the following actions:

- Resets the TCP connections (closes the session).
- Aborts all Tasks in the task set for the corresponding initiator.

7.11 Recovery Levels

iSCSI enables the following levels of recovery (in increasing order of coverage):

- within a command (i.e., without requiring command restart).
- within a connection (i.e., without requiring the connection to be rebuilt but perhaps requiring command restart).
- within a session (i.e., perhaps requiring connections to be rebuilt and commands to be reissued).
- session recovery.

The recovery scenarios detailed in the rest of this section are representative rather than exclusive. In every case, they detail the lowest level recovery that MAY be attempted. The implementer is left to decide under which circumstances to raise the recovery level and/or what recovery levels to implement.

At all levels, the implementer has the choice of deferring errors to the SCSI initiator (with an appropriate response code), in which case the task, if any, has to be removed from the target and all the side-effects (like ACA) have to be considered.

Recovery within a connection and within a task MUST NOT be attempted before the connection is in full feature phase.

7.11.1 Recovery Within-command

At the target, the following cases lend themselves to within-command recovery:

(1) Lost data PDU - a data PDU may be lost due to a header digest error or a data digest error. In case of a data digest error, the error is recognized immediately, and the target MAY request the missing data through R2T.
In case of a header digest error, the target will recognize the missing data either when receiving a subsequent piece out of sequence or by a timeout in completing a sequence (no data or partial-data-and-no-F-bit). In this case, too, the target MAY request the missing data through a R2T. The timeout value to be used by a target is outside the scope of this document.

At the initiator, the following cases lend themselves to within-command recovery:

(1) Lost data PDU or lost R2T - a data PDU or R2T may be lost due to a header digest error or a data digest error. In case of a data digest error, the error is recognized immediately and the initiator MAY request the missing data through SNACK. In case of a header digest error, the initiator recognizes the missing data or R2T either when receiving a subsequent piece out of sequence or by a timeout in completing a sequence (no status). In this case, the initiator MAY request the missing data or R2T through a SNACK. Note, however, that an initiator SHOULD NOT request a missing R2T by a SNACK after a timeout, to avoid a race; this action is better left to the target. The timeout value to be used by an initiator is outside the scope of this document.

Both the iSCSI target and initiator MAY resort to a more drastic, not-within-command recovery procedure in any of these cases.

An iSCSI target MAY reject a data-SNACK with a reject response of data SNACK rejected. In this case, it MUST terminate the command with an iSCSI command response of SNACK rejected; the task is terminated and no further action is expected at target and initiator.

An iSCSI target on detecting missing data MAY terminate the command with an iSCSI error response of Delivery Subsystem Failure.
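To illustrate how an initiator might turn "a subsequent piece out of sequence" into retransmission requests, the following sketch computes the runs of missing DataSN values for one command. The function name and the (first, count) representation of a run are assumptions made for illustration, and sequence number wrap-around is deliberately ignored.

```python
def missing_runs(received, expected_count):
    """Return runs of missing sequence numbers as (first, count) pairs.

    received is an iterable of the DataSN values seen so far and
    expected_count is the number of data PDUs the command should have
    produced (numbered 0 .. expected_count - 1).
    """
    seen = set(received)
    runs = []
    start = None
    for sn in range(expected_count):
        if sn not in seen:
            if start is None:
                start = sn  # a new gap begins here
        elif start is not None:
            runs.append((start, sn - start))  # the gap ended just before sn
            start = None
    if start is not None:
        runs.append((start, expected_count - start))  # gap runs to the end
    return runs
```

With DataSN values 0, 1, 3, 4 and 7 received out of 8 expected, the result is [(2, 1), (5, 2)], i.e. one request for DataSN 2 and one for DataSN 5 and 6.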
7.11.2 Recovery Within-connection

At the initiator, the following cases lend themselves to within-connection recovery:

(1) Lost iSCSI numbered Response. It is recognized by either receiving the Response with a data digest error or receiving a Response PDU with a higher StatSN than expected. The initiator MAY request the missing responses through SNACK, in which case the target MUST reissue them.

(2) Requests not acknowledged for a long time. Requests are acknowledged explicitly through ExpCmdSN or implicitly by receiving data and/or status. The initiator MAY reissue non-acknowledged commands. The reissued, non-acknowledged commands MUST carry their original CmdSN and the X (retry) flag set to 1.

N.B. While the original connection for a command is still "active" (i.e., has not been logged-out or restarted), any command MUST be retried only on the original connection. After logging out the original connection, commands can be retried on a different connection, but MUST still carry the original CmdSN.

At the target, the following cases lend themselves to within-connection recovery:

(1) Status/Response not acknowledged for a long time. The target MAY issue a NOP-IN, with the P bit set to 1 or 0, which indicates in the StatSN field the next status number it is going to issue. This helps the initiator detect missing StatSN and issue a SNACK-status.

The timeout values used by both initiator and target are outside the scope of this document.

Both the iSCSI target and initiator MAY resort to a more drastic, not-within-connection recovery procedure in any of those cases.

7.11.3 Recovery Within-session

At an iSCSI initiator, the following cases lend themselves to within-session recovery:

(1) TCP connection failure.
The initiator MUST close the connection, following which it MUST either Logout the failed connection, or Login with an implied Logout, and reissue all commands associated with the failed connection on another connection (that MAY be a newly established connection) with the X (retry) flag set to 1.

N.B. The logout function is mandatory, while a new connection establishment is mandatory only if the failed connection was the last or only connection in the session.

N.B. As an alternative to Logout and reissue commands, the initiator MAY instead reset the target and terminate all outstanding commands with a service response indicating Delivery Subsystem Failure. The initiator MUST perform one of the two actions.

(2) Receiving an Asynchronous Message requiring recovery Logout. The initiator MUST handle it as a TCP connection failure for the connection referred to in the PDU.

At an iSCSI target, the following cases lend themselves to within-session recovery:

(1) TCP connection failure. The target MUST close the connection and then, if more than one connection is available, the target SHOULD send an Asynchronous Message indicating it has dropped the connection. Following that, the target will wait for the initiator to continue recovery.

7.11.4 Session Recovery

Session recovery is to be performed when all other recovery attempts have failed. Very simple initiators and targets MAY perform session recovery on all iSCSI errors and therefore place the burden of recovery on the SCSI layer and above.

Session recovery implies closing of all TCP connections, aborting at target all executing and queued tasks for the given initiator, terminating at initiator all outstanding SCSI commands with an appropriate SCSI service response and restarting a session on a new connection set (TCP connection establishment and login on all new connections).

8. Notes to Implementers

This section notes some of the performance and reliability considerations of the iSCSI protocol. This protocol was designed to allow efficient silicon and software implementations. The iSCSI tag mechanism was designed to enable RDMA at the iSCSI level or lower.

The guiding assumption made throughout the design of this protocol was that targets are resource constrained relative to initiators.

8.1 Multiple Network Adapters

The iSCSI protocol allows multiple connections, not all of which need go over the same network adapter. If multiple network connections are to be utilized with hardware support, the iSCSI protocol command-data-status allegiance to one TCP connection ensures that there is no need to replicate information across network adapters or otherwise require them to cooperate.

However, some task management commands may require some loose form of cooperation or replication, at least on the target.

8.2 Autosense and Auto Contingent Allegiance (ACA)

Autosense refers to the automatic return of sense data to the initiator in case a command did not complete successfully. iSCSI mandates support for autosense.

ACA helps preserve ordered command execution in the presence of errors. As iSCSI can have many commands in-flight between initiator and target, iSCSI mandates support for ACA.

8.3 Task Management Commands and Immediate Delivery

A task management command may reach the target and, in the case immediate delivery was requested, be executed before all of the tasks it was meant to act upon have been delivered or even reached the target.

It is assumed that, while pending delivery from iSCSI to SCSI at the target, commands are kept in an iSCSI queue at both the initiator and the target and that the target queue contains both commands and "holes" (placeholders for commands not received yet).
The following general mechanism can be used to achieve the effect of ordered delivery for task management commands while enabling the "urgent" delivery that some of them imply and immediate execution of the task management commands. The mechanism manages discarding commands while they are in the iSCSI layer at the target and prevents these discarded commands from being delivered to the target's SCSI layer. The initiator must keep a record of these commands to determine which will not receive a response. The target does not generate a response to a command that is aborted while in the iSCSI layer.

The "barrier list" described in the following sections is a list containing information relating to all task management commands marked for immediate delivery.

At the initiator when a relevant task management command marked for immediate delivery is issued:

   a) if ExpCmdSN is equal to CmdSN (there are no commands in the queue) skip to step c
   b) mark all pending commands with a CmdSN field between the current ExpCmdSN and the current CmdSN as candidates for cleanup and retain CmdSN of the task management command in a "barrier list".
   c) send the task management command for immediate delivery to the target

Note: for clarity, the barrier list contains "items" and the command queue contains "entries".

At the initiator when updating ExpCmdSN:

   a) if the "barrier list" is empty or ExpCmdSN is less than the CmdSN of the oldest item in the barrier list then skip to step d
   b) remove the oldest barrier list item, and remove and silently discard all entries marked for cleanup having a CmdSN field less than ExpCmdSN.
   c) go to step a
   d) release all queued entries between the old and new ExpCmdSN from the queue.

Note: Any entries that had been marked as a candidate for cleanup have now been delivered by the target to its SCSI layer. The SCSI layer will have to determine if they are to be aborted.
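The initiator-side steps above can be traced with a small model. This is a sketch under stated assumptions, not a definitive implementation: the class and method names are hypothetical, "between ExpCmdSN and CmdSN" is read as ExpCmdSN <= n < CmdSN, an immediate command is assumed to carry the current CmdSN without advancing it, and plain integer comparison is used in place of wrap-around sequence arithmetic.

```python
from collections import deque


class InitiatorQueue:
    """Toy model of the initiator command queue and barrier list."""

    def __init__(self, exp_cmd_sn=0):
        self.exp_cmd_sn = exp_cmd_sn  # last ExpCmdSN seen from the target
        self.cmd_sn = exp_cmd_sn      # next CmdSN to assign
        self.entries = {}             # CmdSN -> marked-for-cleanup flag
        self.barriers = deque()       # barrier list items (CmdSN values)
        self.discarded = []           # entries that will get no response
        self.released = []            # entries acknowledged by the target

    def send_command(self):
        """Queue an ordinary (non-immediate) command."""
        sn = self.cmd_sn
        self.entries[sn] = False
        self.cmd_sn += 1
        return sn

    def send_immediate_task_mgmt(self):
        """Issue a task management command marked for immediate delivery."""
        if self.exp_cmd_sn != self.cmd_sn:                   # step a
            for sn in self.entries:                          # step b
                if self.exp_cmd_sn <= sn < self.cmd_sn:
                    self.entries[sn] = True
            self.barriers.append(self.cmd_sn)
        return self.cmd_sn                                   # step c

    def update_exp_cmd_sn(self, new_exp):
        """Process a new ExpCmdSN value received from the target."""
        old = self.exp_cmd_sn
        self.exp_cmd_sn = new_exp
        while self.barriers and new_exp >= self.barriers[0]:  # steps a, c
            self.barriers.popleft()                           # step b
            for sn in sorted(self.entries):
                if self.entries[sn] and sn < new_exp:
                    del self.entries[sn]
                    self.discarded.append(sn)
        for sn in sorted(self.entries):                       # step d
            if old <= sn < new_exp:
                del self.entries[sn]
                self.released.append(sn)
```

Starting at ExpCmdSN 10 with three queued commands (CmdSN 10 to 12) followed by an immediate task management command, an update to ExpCmdSN 12 releases entries 10 and 11, and a later update to 13 passes the barrier and silently discards the still-marked entry 12.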
At the target when receiving a relevant task management command for immediate delivery:

   a) if ExpCmdSN is equal to CmdSN skip to step c
   b) mark all pending entries (commands received and placeholders) with a CmdSN field between ExpCmdSN and the current CmdSN as candidates for cleanup and retain CmdSN in a "barrier list" including the referenced LUN (or an ALL marker)
   c) send the task management command to SCSI for immediate execution

At the target when updating ExpCmdSN (releasing ordered commands to SCSI):

   a) if the "barrier list" is empty or ExpCmdSN is less than the oldest item in the barrier list then skip to step d
   b) remove the oldest barrier list item and evaluate all queued entries that have a CmdSN field less than ExpCmdSN, removing and silently discarding each queued command that meets the criteria for cleanup candidacy (as specified by the task management function)
   c) go to step a
   d) release all queued entries between the old and new ExpCmdSN from the queue

Note that this scheme will withstand connection recovery.

The following table details the candidates for cleanup:

+---+------------------+------------------------------------------+
|No | Function         | Candidacy Selection                      |
+---+------------------+------------------------------------------+
| 1 | Abort Task       | The task identified by the               |
|   |                  | Referenced Task Tag field and initiator. |
+---+------------------+------------------------------------------+
| 2 | Abort Task Set   | All tasks associated with the specified  |
|   |                  | LUN and initiator.                       |
+---+------------------+------------------------------------------+
| 3 | Clear ACA        | No entries are marked for candidacy.     |
+---+------------------+------------------------------------------+
| 4 | Clear Task Set   | All tasks associated with the specified  |
|   |                  | LUN and initiator. For all other         |
|   |                  | initiators all tasks at LUN with no      |
|   |                  | regard to order.                         |
+---+------------------+------------------------------------------+

8.4 How to Abort Safely a Command that Was Not Received

To abort safely a task for which the task abort answer is "Command Not Received Yet", the initiator must issue another abort command on the same connection as the original command, unless this connection was logged out, in which case it may send it on any connection. The expected response for the second abort is Function Complete (if the command did not arrive) or "Task was not in task set".

8.5 Synch and steering layer and performance

Although a synch and steering layer is optional, an initiator/target without synch and steering working against a target/initiator demanding synch and steering may experience performance degradation caused by packet reordering and loss. Providing a synch and steering mechanism is recommended for all high-speed implementations.

8.6 Unsolicited data and performance

Unsolicited data on write are meant to reduce the effect of latency on throughput (no R2T is needed to start sending data). In addition, immediate data are meant to reduce the protocol overhead (both bandwidth and execution time).

However, negotiating an amount of unsolicited data for writes and sending less than the negotiated amount when the total data amount to be sent by a command is larger than the negotiated amount may negatively impact performance and may not be supported by all the targets.

9. Security Considerations

Historically, native storage systems have not had to consider security because their environments offered minimal security risks. That is, these environments consisted of storage devices either directly attached to hosts or connected via a subnet distinctly separate from the communications network.
The use of storage protocols, such as SCSI, over IP networks requires that security concerns be addressed. iSCSI implementations MUST provide means of protection against active attacks (impersonating another identity, message insertion, deletion, and modification) and MAY provide means of protection against passive attacks (eavesdropping, gaining advantage by analyzing the data sent over the line).

The following section describes the security protection modes to be provided by an iSCSI implementation.

Authentication and a Secure Channel setup MAY be performed independent of iSCSI (as when using tunneling IPSec or some implementations of transport IPSec).

9.1 iSCSI Security Protection Modes

9.1.1 No Security

This mode neither authenticates nor encrypts data. This mode should only be used in environments where the security risk is minimal and configuration errors are improbable.

9.1.2 Initiator-Target Authentication

In this mode, the target authenticates the initiator and the initiator optionally authenticates the target. An attacker should not gain any advantage by inspecting the authentication phase PDUs (i.e., sending "clear password" is out of the question).

This mode protects against unauthorized access to storage resources by use of a false identity (spoofing). Once the authentication phase is completed, all PDUs are sent and received in clear. This mode should only be used when there is minimal risk of man-in-the-middle attacks, eavesdropping, message insertion, deletion, and modification.

9.1.3 Data Integrity and Authentication

This mode provides origin authentication and data integrity for every PDU that is sent after a security context is established. It protects against man-in-the-middle attacks, message insertion, deletion, and modification.

It is possible to use different authentication mechanisms for headers and data.
Every compliant iSCSI initiator and target MUST be able to provide initiator-target authentication and data integrity and authentication. This quality of protection MAY be achieved on every connection through properly configured IPSec involving only administrative (indirect) interaction with iSCSI implementations.

9.1.4 Encryption

This mode provides data privacy in addition to data integrity and authentication, and protects against eavesdropping, man-in-the-middle attacks, message insertion, deletion, and modification.

A connection or multiple connections MAY be protected end-to-end or partial-path (gateway tunneling) by using IPSec.

10. IANA Considerations

There will be a well-known port for iSCSI connections. This well-known port will be registered with IANA.

11. References and Bibliography

[AC] A Detailed Proposal for Access Control, Jim Hafner, T10/99-245
[BOOT] P. Sarkar & team, draft-ietf-ips-iscsi-boot-01.txt
[CAM] ANSI X3.232-199X, Common Access Method-3 (Cam-3)
[Castagnoli93] Guy Castagnoli, Stefan Braeuer and Martin Herrman, "Optimization of Cyclic Redundancy-Check Codes with 24 and 32 Parity Bits", IEEE Transact. on Communications, Vol. 41, No. 6, June 1993
[CRC] ISO 3309, High-Level Data Link Control (CRC 32)
[NDT] M. Bakke & team, draft-ietf-ips-iSCSI-NamingAndDiscovery-00.txt
[RFC793] Transmission Control Protocol, RFC 793
[RFC1122] Requirements for Internet Hosts-Communication Layer, RFC1122, R. Braden (editor)
[RFC1510] J. Kohl, C. Neuman, "The Kerberos Network Authentication Service (V5)", September 1993.
[RFC1766] Alvestrand, H., "Tags for the Identification of Languages", March 1995.
[RFC1964] J. Linn, "The Kerberos Version 5 GSS-API Mechanism", June 1996.
[RFC1982] Elz, R., Bush, R., "Serial Number Arithmetic", RFC 1982, August 1996.
[RFC1994] "W.
Simpson, PPP Challenge Handshake Authentication Protocol (CHAP)", RFC 1994, August 1996.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", RFC 2026, October 1996.
[RFC2044] Yergeau, F., "UTF-8, a Transformation Format of Unicode and ISO 10646", October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2025] C. Adams, "The Simple Public-Key GSS-API Mechanism (SPKM)", October 1996.
[RFC2234] D. Crocker, P. Overell, "Augmented BNF for Syntax Specifications: ABNF"
[RFC2434] T. Narten and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC2434, October 1998.
[RFC2401] S. Kent, R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998
[RFC2945] Wu, T., "The SRP Authentication and Key Exchange System", September 2000.
[SAM2] ANSI X3.270-1998, SCSI-3 Architecture Model (SAM-2)
[SBC] ANSI X3.306-199X, SCSI-3 Block Commands (SBC)
[SCSI2] ANSI X3.131-1994, SCSI-2
[Schneier] Schneier, B., "Applied Cryptography: Protocols, Algorithms, and Source Code in C", 2nd edition, John Wiley & Sons, New York, NY, 1996.
[SPC] ANSI X3.301-199X, SCSI-3 Primary Commands (SPC)

12. Author's Addresses

   Julian Satran
   IBM, Haifa Research Lab
   MATAM - Advanced Technology Center
   Haifa 31905, Israel
   Phone +972 4 829 6264
   Email: Julian_Satran@vnet.ibm.com

   Kalman Meth
   IBM, Haifa Research Lab
   MATAM - Advanced Technology Center
   Haifa 31905, Israel
   Phone +972 4 829 6341
   Email: meth@il.ibm.com

   Ofer Biran
   IBM, Haifa Research Lab
   MATAM - Advanced Technology Center
   Haifa 31905, Israel
   Phone +972 4 829 6253
   Email: biran@il.ibm.com

   Daniel F. Smith
   IBM Almaden Research Center
   650 Harry Road
   San Jose, CA 95120-6099, USA
   Phone: +1 408 927 2072
   Email: dfsmith@almaden.ibm.com

   Jim Hafner
   IBM Almaden Research Center
   650 Harry Road
   San Jose, CA 95120
   Phone: +1 408-927-1892
   Email: hafner@almaden.ibm.com

   Costa Sapuntzakis
   Cisco Systems, Inc.
   170 W. Tasman Drive
   San Jose, CA 95134, USA
   Phone: +1 408 525 5497
   Email: csapuntz@cisco.com

   Mark Bakke
   Cisco Systems, Inc.
   6450 Wedgwood Road
   Maple Grove, MN
   USA 55311
   Phone: +1 763-398-1000
   E-Mail: mbakke@cisco.com

   Randy Haagens
   Hewlett-Packard Company
   8000 Foothills Blvd.
   Roseville, CA 95747-5668, USA
   Phone: +1 (916) 785-4578
   E-mail: Randy_Haagens@hp.com

   Matt Wakeley
   Agilent Technologies
   1101 Creekside Ridge Drive
   Suite 100, M/S RH21
   Roseville, CA 95661
   Phone: +1 (916) 788-5670
   E-Mail: matt_wakeley@agilent.com

   Efri Zeidner
   SANGate
   Israel
   efri@sangate.com

   Paul von Stamwitz
   (current address)
   TrueSAN Networks, Inc.
   Phone: +1(408)869-4219
   E-mail: pvonstamwitz@truesan.com

   Luciano Dalle Ore
   Quantum Corp.
   Phone: +1(408) 232 6524
   E-mail: ldalleore@snapserver.com

   Mallikarjun Chadalapaka
   Hewlett-Packard Company
   8000 Foothills Blvd.
   Roseville, CA 95747-5668, USA
   Phone: +1 (916) 785-5621
   E-mail: cbm@rose.hp.com

   Yaron Klein
   SANRAD
   24 Raul Valenberg St.
   Tel-Aviv, 69719 Israel
   Phone: +972-3-7659998
   E-mail: klein@sanrad.com

Comments may be sent to Julian Satran

Appendix A.
iSCSI Security and Integrity

01 Security Keys and Values

The parameters (keys) negotiated for security are:

- Digests (HeaderDigest, DataDigest)
- Authentication method (AuthMethod)

Digests enable checking end-to-end data integrity beyond the integrity checks provided by the link layers and covering the whole communication path including all elements that may change the network level PDUs like routers, switches, proxies, etc.

The following table lists cyclic integrity checksums that can be negotiated for the digests and MUST be implemented by every iSCSI initiator and target. Note that these digest options have only error detection significance.

+---------------------------------------------+
| Name          | Description     | Generator |
+---------------------------------------------+
| crc-32C       | 32 bit CRC      | 11EDC6F41 |
+---------------------------------------------+
| none          | no digest                   |
+---------------------------------------------+

The generator polynomial for this digest is given in hex-notation, for example 3b stands for 0011 1011 - the polynomial x**5+x**4+x**3+x+1.

The generator polynomial selected is evaluated in [Castagnoli93].

When using the CRC, the CRC register must be initialized to all 1s (0xFFFFFFFF) and the CRC bits must be complemented before transmission. Padding bytes, when present in a segment covered by a CRC, have to be set to 0 and are included in the CRC.

Implementations MAY also negotiate digests with security significance for data authentication and integrity as detailed in the following table:
+-------------------------------------------------------------+
| Name         | Description                     | Definition |
+-------------------------------------------------------------+
| KRB5_MD5     | the SGN_CKSUM field (8 bytes)   | RFC1964    |
|              | of the GSS_GetMIC() token in    |            |
|              | GSS_KRB5_INTEG_C_QOP_MD5 QOP    |            |
|              | (partial MD5 ("MD2.5") )        |            |
+-------------------------------------------------------------+
| KRB5_DES_MD5 | the SGN_CKSUM field (8 bytes)   | RFC1964    |
|              | of the GSS_GetMIC() token in    |            |
|              | GSS_KRB5_INTEG_C_QOP_DES_MD5    |            |
|              | QOP (DES MAC of MD5)            |            |
+-------------------------------------------------------------+
| KRB5_DES_MAC | the SGN_CKSUM field (8 bytes)   | RFC1964    |
|              | of the GSS_GetMIC() token in    |            |
|              | GSS_KRB5_INTEG_C_QOP_DES_MAC    |            |
|              | QOP (DES MAC)                   |            |
+-------------------------------------------------------------+
| SPKM         | the int-cksum field of the      | RFC2025    |
|              | SPKM-MIC token, calculated      |            |
|              | without the optional int-alg    |            |
|              | and snd-seq fields of the       |            |
|              | mic-header (i.e., the default   |            |
|              | I-ALG is used, and sequencing   |            |
|              | service is not used).           |            |
+-------------------------------------------------------------+

Note: the KRB5_* digests are allowed only when combined with the KRB5 authentication method (see below) (i.e., the initiator may offer one of these digests only if it also offers KRB5 as AuthMethod, and the target may respond with one of these digests only if it also responds with KRB5 as the AuthMethod). Similarly, the SPKM digest is allowed only when combined with the SPKM-1 or SPKM-2 authentication methods (see below).

Other and proprietary algorithms MAY also be negotiated. The none value is the only one that MUST be supported.

The following table details authentication methods:
+------------------------------------------------------------+
| Name         | Description                                 |
+------------------------------------------------------------+
| KRB5         | Kerberos V5                                 |
+------------------------------------------------------------+
| SPKM-1       | Simple Public-Key GSS-API Mechanism         |
+------------------------------------------------------------+
| SPKM-2       | Simple Public-Key GSS-API Mechanism         |
+------------------------------------------------------------+
| SRP          | Secure Remote Password                      |
+------------------------------------------------------------+
| CHAP         | Challenge Handshake Authentication Protocol |
+------------------------------------------------------------+
| none         | No authentication                           |
+------------------------------------------------------------+

KRB5 is defined in [RFC1510], SPKM-1 and SPKM-2 are defined in [RFC2025], Secure Remote Password is defined in [RFC2945] and CHAP is defined in [RFC1994].

Initiator and target MUST implement SRP.

02 Authentication

The authentication exchange authenticates the initiator to the target, and optionally the target to the initiator. Authentication is not mandatory and is distinct from the data integrity exchange.

The authentication methods to be used are KRB5, SPKM-1, SPKM-2, SRP, CHAP, or proprietary.

For KRB5 (Kerberos V5) [RFC1510], the initiator MUST use:

   KRB_AP_REQ=<KRB_AP_REQ>

where KRB_AP_REQ is the client message as defined in [RFC1510].

If the initiator authentication fails, the target MUST return an error. Otherwise, if the initiator has selected the mutual authentication option (by setting MUTUAL-REQUIRED in the ap-options field of the KRB_AP_REQ), the target MUST reply with:

   KRB_AP_REP=<KRB_AP_REP>

where KRB_AP_REP is the server's response message as defined in [RFC1510].

KRB_AP_REQ and KRB_AP_REP are large binaries encoded as hexadecimal strings.
For SPKM-1 and SPKM-2 [RFC2025], the initiator MUST use:

   SPKM-REQ=

where SPKM-REQ is the first initiator token as defined in [RFC2025].

[RFC2025] defines situations where each side may send an error token,
which may cause the peer to regenerate and resend its last token.
This scheme is followed in iSCSI, and the error token syntax is:

   SPKM-ERROR=

However, SPKM-DEL tokens that are defined by [RFC2025] for fatal
errors will not be used by iSCSI. If the target needs (by [RFC2025])
to send an SPKM-DEL token, it will, instead, send a Login "login
reject" message and terminate the connection. If the initiator needs
to send an SPKM-DEL token, it will just abort the connection.

In the following, we assume that no SPKM-ERROR tokens are required.

If the initiator authentication fails, the target MUST return an
error. Otherwise, if the AuthMethod is SPKM-1 or if the initiator has
selected the mutual authentication option (by setting the
mutual-state bit in the options field of the REQ-TOKEN in the
SPKM-REQ), the target MUST reply with:

   SPKM-REP-TI=

where SPKM-REP-TI is the target token as defined in [RFC2025].

If mutual authentication was selected and target authentication
fails, the initiator MUST abort the connection. Otherwise, if the
AuthMethod is SPKM-1, the initiator MUST continue with:

   SPKM-REP-IT=

where SPKM-REP-IT is the second initiator token as defined in
[RFC2025].

All the SPKM-* tokens are large binaries encoded as hexadecimal
strings.

For SRP [RFC2945], the initiator MUST use:

   U= TargetAuth=yes   /* or TargetAuth=no */

The target MUST either return an error or reply with:

   N= g= s=

The initiator MUST continue with:

   A=

The target MUST either return an error or reply with:

   B=

The initiator MUST either abort or continue with:

   M=

If the initiator authentication fails, the target MUST return an
error.
Otherwise, if the initiator sent TargetAuth=yes in the first message
(requiring target authentication), the target MUST reply with:

   HM=

where U, N, g, s, A, B, M and H(A | M | K) are defined in [RFC2945].
U is a text string; N, g, s, A, B, M and H(A | M | K) are numbers.

For CHAP [RFC1994], the initiator MUST use:

   A=

where A1,A2... are proposed algorithms, in order of preference.

The target MUST either return an error or reply with:

   A= I= C=

where A is one of A1,A2... that were proposed by the initiator.

The initiator MUST continue either with:

   N= R=

or, if it requires target authentication, with:

   N= R= I= C=

If the initiator authentication fails, the target MUST return an
error. Otherwise, if the initiator required target authentication,
the target MUST reply with:

   N= R=

where N, (A,A1,A2), I, C, R are (correspondingly) the Name,
Algorithm, Identifier, Challenge and Response as defined in
[RFC1994]. N is a text string; A, A1, A2 and I are numbers; and C and
R are large binaries encoded as hexadecimal strings.

For the Algorithm, as stated in [RFC1994], one value is required to
be implemented:

   5 (CHAP with MD5)

To guarantee interoperability, initiators SHOULD always offer it as
one of the proposed algorithms.

03 Login Phase Examples

In the first example, the initiator and target authenticate each
other via Kerberos:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=KRB5_MD5,KRB5_DES_MAC,crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=SRP,KRB5,none

   T-> Login-PR HeaderDigest=KRB5_MD5
                DataDigest=crc-32C
                AuthMethod=KRB5

   (Login-PR stands for Login-Partial-Response)

   I-> Text KRB_AP_REQ=

   (krb_ap_req contains the Kerberos V5 ticket and authenticator
   with MUTUAL-REQUIRED set in the ap-options field)
If the authentication is successful, the target proceeds with:

   T-> Text KRB_AP_REP= SecurityContextComplete=yes

   (krb_ap_rep is the Kerberos V5 mutual authentication reply)

If the authentication is successful, the initiator proceeds:

   I-> Text SecurityContextComplete=yes
   T-> Text SecurityContextComplete=yes

From this point on, any Text command and each PDU thereafter has a
KRB5_MD5 digest for the header and a crc-32C for the data. The
initiator may proceed:

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

If the initiator authentication by the target is not successful, the
target responds with:

   T-> Login "login reject"

instead of the Text KRB_AP_REP message, and terminates the
connection.

If the target authentication by the initiator is not successful, the
initiator terminates the connection (without responding to the Text
KRB_AP_REP message).

In the next example only the initiator is authenticated by the target
via Kerberos:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=KRB5_MD5,KRB5_DES_MAC,crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=SRP,KRB5,none

   T-> Login-PR HeaderDigest=KRB5_MD5
                DataDigest=crc-32C
                AuthMethod=KRB5

   I-> Text KRB_AP_REQ=krb_ap_req SecurityContextComplete=yes

   (MUTUAL-REQUIRED not set in the ap-options field of krb_ap_req)

If the authentication is successful, the target proceeds with:

   T-> Text SecurityContextComplete=yes

From this point on, any Text command and each PDU thereafter MUST
have a KRB5_MD5 digest for the header and a crc-32C for the data.

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters
   . . .
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

In the next example, the initiator and target authenticate each other
via SPKM-1:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=KRB5_MD5,KRB5_DES_MAC,SPKM,crc-32C,none
             DataDigest=crc-32C,SPKM,none
             AuthMethod=SPKM-1,KRB5,none

   T-> Login-PR HeaderDigest=SPKM
                DataDigest=SPKM
                AuthMethod=SPKM-1

   I-> Text SPKM-REQ=

   (spkm-req is the SPKM-REQ token with the mutual-state bit in the
   options field of the REQ-TOKEN set)

   T-> Text SPKM-REP-TI=

If the authentication is successful, the initiator proceeds:

   I-> Text SPKM-REP-IT= SecurityContextComplete=yes

If the authentication is successful, the target proceeds with:

   T-> Text SecurityContextComplete=yes

From this point on, any Text command and each PDU thereafter has SPKM
digests for the header and data. The initiator may proceed:

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

If the target authentication by the initiator is not successful, the
initiator terminates the connection (without responding to the Text
SPKM-REP-TI message).

If the initiator authentication by the target is not successful, the
target responds with:

   T-> Login "login reject"

instead of the Text SecurityContextComplete=yes message, and
terminates the connection.
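The rules that decide which SPKM tokens appear in an exchange (SPKM-REP-TI when the method is SPKM-1 or mutual authentication was requested, SPKM-REP-IT only for SPKM-1) can be summarized in a small sketch. The function name and the ("I"/"T", key) tuple representation are hypothetical, chosen only for illustration, and the sketch assumes, as the text does, that no SPKM-ERROR tokens are needed.

```python
from typing import List, Tuple


def spkm_token_sequence(auth_method: str, mutual: bool) -> List[Tuple[str, str]]:
    """Return the (sender, key) pairs of a successful SPKM exchange.

    sender is "I" for initiator or "T" for target.
    mutual reflects the mutual-state bit in the REQ-TOKEN options field.
    """
    if auth_method not in ("SPKM-1", "SPKM-2"):
        raise ValueError("not an SPKM authentication method")

    # The initiator always opens with the first initiator token.
    sequence = [("I", "SPKM-REQ")]

    # The target token is sent for SPKM-1, or when mutual
    # authentication was selected by the initiator.
    if auth_method == "SPKM-1" or mutual:
        sequence.append(("T", "SPKM-REP-TI"))

    # Only SPKM-1 has a second initiator token.
    if auth_method == "SPKM-1":
        sequence.append(("I", "SPKM-REP-IT"))

    return sequence
```

For instance, spkm_token_sequence("SPKM-1", True) yields the three tokens of the SPKM-1 example, while spkm_token_sequence("SPKM-2", False) yields only the SPKM-REQ token.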
In the next example only the initiator is authenticated by the target
via SPKM-2:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=SPKM,crc-32C,none
             DataDigest=crc-32C,SPKM,none
             AuthMethod=SPKM-1,SPKM-2,none

   T-> Login-PR HeaderDigest=SPKM
                DataDigest=SPKM
                AuthMethod=SPKM-2

   I-> Text SPKM-REQ= SecurityContextComplete=yes

   (spkm-req is the SPKM-REQ token with the mutual-state bit in the
   options field of the REQ-TOKEN not set)

If the authentication is successful, the target proceeds with:

   T-> Text SecurityContextComplete=yes

From this point on, any Text command and each PDU thereafter has SPKM
digests for the header and data. The initiator may proceed:

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

In the next example, the initiator and target authenticate each other
via SRP:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=KRB5,SRP,none

   T-> Login-PR HeaderDigest=crc-32C
                DataDigest=crc-32C
                AuthMethod=SRP

   I-> Text U= TargetAuth=yes
   T-> Text N= g= s=
   I-> Text A=
   T-> Text B=
   I-> Text M=

If the initiator authentication is successful, the target proceeds:

   T-> Text HM= SecurityContextComplete=yes

If the target authentication is not successful, the initiator
terminates the connection. Otherwise it proceeds:

   I-> Text SecurityContextComplete=yes
   T-> Text SecurityContextComplete=yes

where N, g, s, A, B, M, and H(A | M | K) are defined in [RFC2945].

From this point on, any Text command and each PDU thereafter has a
crc-32C digest for the header and the data.

   I-> Text ... iSCSI parameters
   T-> Text ...
   iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters and F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

If the initiator authentication is not successful, the target
responds with:

   T-> Login "login reject"

instead of the T-> Text HM= message, and terminates the connection.

In the next example only the initiator is authenticated by the target
via SRP:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=KRB5,SRP,none

   T-> Login-PR HeaderDigest=crc-32C
                DataDigest=crc-32C
                AuthMethod=SRP

   I-> Text U= TargetAuth=no
   T-> Text N= g= s=
   I-> Text A=
   T-> Text B=
   I-> Text M= SecurityContextComplete=yes

If the initiator authentication is successful, the target proceeds:

   T-> Text SecurityContextComplete=yes

From this point on, any Text command and each PDU thereafter has a
crc-32C digest for the header and the data.

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters and F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

In the next example the initiator and target authenticate each other
via CHAP:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=KRB5,CHAP,none

   T-> Login-PR HeaderDigest=crc-32C
                DataDigest=crc-32C
                AuthMethod=CHAP

   I-> Text A=
   T-> Text A= I= C=
   I-> Text N= R= I= C=

If the initiator authentication is successful, the target proceeds:

   T-> Text N= R= SecurityContextComplete=yes

If the target authentication is not successful, the initiator aborts
the connection. Otherwise it proceeds:

   I-> Text SecurityContextComplete=yes
   T-> Text SecurityContextComplete=yes

From this point on, any Text command and each PDU thereafter has a
crc-32C digest for the header and the data.

   I-> Text ...
   iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters and F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

If the initiator authentication is not successful, the target
responds with:

   T-> Login "login reject"

instead of the Text R= SecurityContextComplete=yes message, and
terminates the connection.

In the next example only the initiator is authenticated by the target
via CHAP:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=KRB5,CHAP,none

   T-> Login-PR HeaderDigest=crc-32C
                DataDigest=crc-32C
                AuthMethod=CHAP

   I-> Text A=
   T-> Text A= I= C=
   I-> Text N= R= SecurityContextComplete=yes

If the initiator authentication is successful, the target proceeds:

   T-> Text SecurityContextComplete=yes

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters and F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

In the next example, the initiator does not offer any
security/integrity parameters, so it may offer iSCSI parameters on
the Login PDU with the F bit set to 1, and the target may respond
with a final Login Response PDU immediately:

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             ... iSCSI parameters

   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88
             ... iSCSI parameters

In the next example, the initiator does offer security/integrity
parameters on the Login PDU, but the target does not choose any
(i.e., chooses the "none" values):

   I-> Login InitiatorName=iqn.com.os.hostid.77
             TargetName=iqn.com.acme.diskarray.sn.88
             HeaderDigest=crc-32C,none
             DataDigest=crc-32C,none
             AuthMethod=KRB5,SRP,none

   T-> Login-PR HeaderDigest=none
                DataDigest=none
                AuthMethod=none

   I-> Text SecurityContextComplete=yes
   T-> Text SecurityContextComplete=yes

   I-> Text ... iSCSI parameters
   T-> Text ... iSCSI parameters

And at the end:

   I-> Text optional iSCSI parameters F bit set to 1
   T-> Login "login accept" TargetName=iqn.com.acme.diskarray.sn.88

Note that SecurityContextComplete=yes is required although no
security mechanism was chosen.

Appendix B. Examples

04 Read Operation Example

+------------------+-----------------------+----------------------+
|Initiator Function|    PDU Type           | Target Function      |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (READ)>>> |                      |
|     (read)       |                       |                      |
+------------------+-----------------------+----------------------+
|                  |                       | Prepare Data Transfer|
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data       |   Send Data          |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data       |   Send Data          |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data       |   Send Data          |
+------------------+-----------------------+----------------------+
|                  |   <<< SCSI Response   |Send Status and Sense |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+
05 Write Operation Example

+------------------+-----------------------+---------------------+
|Initiator Function|    PDU Type           | Target Function     |
+------------------+-----------------------+---------------------+
| Command request  |SCSI Command (WRITE)>>>| Receive command     |
|     (write)      |                       | and queue it        |
+------------------+-----------------------+---------------------+
|                  |                       | Process old commands|
+------------------+-----------------------+---------------------+
|                  |                       | Ready to process    |
|                  |   <<< R2T             | WRITE command       |
+------------------+-----------------------+---------------------+
|   Send Data      |   SCSI Data >>>       | Receive Data        |
+------------------+-----------------------+---------------------+
|                  |   <<< R2T             |                     |
+------------------+-----------------------+---------------------+
|                  |   <<< R2T             |                     |
+------------------+-----------------------+---------------------+
|   Send Data      |   SCSI Data >>>       | Receive Data        |
+------------------+-----------------------+---------------------+
|   Send Data      |   SCSI Data >>>       | Receive Data        |
+------------------+-----------------------+---------------------+
|                  |   <<< SCSI Response   |Send Status and Sense|
+------------------+-----------------------+---------------------+
| Command Complete |                       |                     |
+------------------+-----------------------+---------------------+

06 R2TSN/DataSN use examples

Output (write) data DataSN/R2TSN Example

+------------------+-----------------------+----------------------+
|Initiator Function|  PDU Type & Content   | Target Function      |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (WRITE)>>>| Receive command      |
|     (write)      |                       | and queue it         |
+------------------+-----------------------+----------------------+
|                  |                       | Process old commands |
+------------------+-----------------------+----------------------+
|                  |   <<< R2T             | Ready for data       |
|                  |   R2TSN = 0           |                      |
+------------------+-----------------------+----------------------+
|                  |
   <<< R2T             | Ready for more data  |
|                  |   R2TSN = 1           |                      |
+------------------+-----------------------+----------------------+
|   Send Data      |   SCSI Data >>>       | Receive Data         |
|   for R2TSN 0    |   DataSN = 0, F=0     |                      |
+------------------+-----------------------+----------------------+
|   Send Data      |   SCSI Data >>>       | Receive Data         |
|   for R2TSN 0    |   DataSN = 1, F=1     |                      |
+------------------+-----------------------+----------------------+
|   Send Data      |   SCSI Data >>>       | Receive Data         |
|   for R2TSN 1    |   DataSN = 0, F=1     |                      |
+------------------+-----------------------+----------------------+
|                  |   <<< SCSI Response   |Send Status and Sense |
|                  |   ExpDataSN = 0       |                      |
|                  |   ExpR2TSN = 2        |                      |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+

Input (read) data DataSN Example

+------------------+-----------------------+----------------------+
|Initiator Function|    PDU Type           | Target Function      |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (READ)>>> |                      |
|     (read)       |                       |                      |
+------------------+-----------------------+----------------------+
|                  |                       | Prepare Data Transfer|
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data       |   Send Data          |
|                  |   DataSN = 0, F=0     |                      |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data       |   Send Data          |
|                  |   DataSN = 1, F=0     |                      |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data       |   Send Data          |
|                  |   DataSN = 2, F=1     |                      |
+------------------+-----------------------+----------------------+
|                  |   <<< SCSI Response   |Send Status and Sense |
|                  |   ExpDataSN = 3       |                      |
|                  |   ExpR2TSN = 0        |                      |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+
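The numbering pattern in the two tables above (DataSN starts at 0 for each sequence, and F=1 marks the last PDU of the sequence) can be sketched as follows. The function name, the byte counts and the per-PDU payload limit are hypothetical values chosen for illustration; only the DataSN and F pattern is taken from the tables.

```python
from typing import List, Tuple


def number_data_pdus(sequence_length: int, max_payload: int) -> List[Tuple[int, int, int]]:
    """Split one data sequence into PDUs.

    Returns a list of (DataSN, F, payload_length) tuples, where DataSN
    counts from 0 and F is 1 only on the final PDU of the sequence.
    """
    if sequence_length <= 0 or max_payload <= 0:
        raise ValueError("lengths must be positive")

    pdus = []
    sent = 0
    data_sn = 0
    while sent < sequence_length:
        length = min(max_payload, sequence_length - sent)
        sent += length
        final = 1 if sent == sequence_length else 0
        pdus.append((data_sn, final, length))
        data_sn += 1
    return pdus


# Read example: one sequence carried in three Data PDUs.
read_pdus = number_data_pdus(3 * 8192, 8192)
# In the read table, ExpDataSN in the response equals the PDU count.
exp_data_sn = len(read_pdus)

# Write example: the data for R2TSN 0 takes two PDUs, R2TSN 1 takes one.
write_r2t0 = number_data_pdus(2 * 8192, 8192)
write_r2t1 = number_data_pdus(8192, 8192)
```

Numbering each R2T's data separately reproduces the write table, where DataSN returns to 0 for the data sent in answer to R2TSN 1.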
Bi-directional DataSN Example

+------------------+-----------------------+----------------------+
|Initiator Function|    PDU Type           | Target Function      |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command >>>       |                      |
|   (Read-Write)   |   Read-Write          |                      |
+------------------+-----------------------+----------------------+
|                  |                       | Process old commands |
+------------------+-----------------------+----------------------+
|                  |   <<< R2T             | Ready to process     |
|                  |   R2TSN = 0           | WRITE command        |
+------------------+-----------------------+----------------------+
| * Receive Data   |   <<< SCSI Data       |   Send Data          |
|                  |   DataSN = 0, F=0     |                      |
+------------------+-----------------------+----------------------+
| * Receive Data   |   <<< SCSI Data       |   Send Data          |
|                  |   DataSN = 1, F=1     |                      |
+------------------+-----------------------+----------------------+
| * Send Data      |   SCSI Data >>>       | Receive Data         |
|   for R2TSN 0    |   DataSN = 0, F=1     |                      |
+------------------+-----------------------+----------------------+
|                  |   <<< SCSI Response   |Send Status and Sense |
|                  |   ExpDataSN = 2       |                      |
|                  |   ExpR2TSN = 1        |                      |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+

*) Send Data and Receive Data may be transferred simultaneously, as
in an atomic Read-Old-Write-New, or sequentially, as in an atomic
Read-Update-Write (in the latter case the R2T may follow the received
data).

Unsolicited and immediate output (write) data with DataSN Example

+------------------+-----------------------+----------------------+
|Initiator Function|  PDU Type & Content   | Target Function      |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (WRITE)>>>| Receive command      |
|     (write)      |F=0                    | and data             |
|+ immediate data  |                       | and queue it         |
+------------------+-----------------------+----------------------+
| Send Unsolicited |   SCSI Write Data >>> | Receive more Data    |
|  Data            |   DataSN = 0, F=1     |                      |
+------------------+-----------------------+----------------------+
|                  |                       | Process old commands |
+------------------+-----------------------+----------------------+
|                  |   <<< R2T             | Ready for more data  |
|                  |   R2TSN = 0           |                      |
+------------------+-----------------------+----------------------+
|   Send Data      |   SCSI Write Data >>> | Receive Data         |
|   for R2TSN 0    |   DataSN = 0, F=1     |                      |
+------------------+-----------------------+----------------------+
|                  |   <<< SCSI Response   |Send Status and Sense |
|                  |   ExpDataSN = 0       |                      |
|                  |   ExpR2TSN = 1        |                      |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+

07 CRC Examples

N.B. all Values are Hexadecimal

   Byte:   0  1  2  3

      0:  01 a0 00 00
      4:  00 00 00 00
      8:  00 00 00 00
     12:  00 00 00 00
     16:  04 05 00 00
     20:  00 01 00 00
     24:  00 00 00 05
     28:  00 00 00 04
     32:  2a 00 00 00
     36:  00 00 00 00
     40:  80 00 00 00
     44:  00 00 00 00

    CRC:  db 51 70 93

32 bytes of zeroes:

   Byte:   0  1  2  3

      0:  00 00 00 00
        ...
     28:  00 00 00 00

    CRC:  8a 91 36 aa

32 bytes of ones:

   Byte:   0  1  2  3

      0:  ff ff ff ff
        ...
     28:  ff ff ff ff

    CRC:  21 44 df 1c

32 bytes of incrementing 00..1f:

   Byte:   0  1  2  3

      0:  00 01 02 03
        ...
     28:  1c 1d 1e 1f

    CRC:  46 dd 79 4e

Appendix C. Synch and Steering with Fixed Interval Markers

This appendix presents a simple scheme for synchronization (PDU
boundary retrieval). It uses markers including synchronization
information placed at fixed intervals in the TCP stream.
A Marker consists of:

Byte /    0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Next-iSCSI-PDU-start pointer - copy #1                        |
   +---------------+---------------+---------------+---------------+
  4| Next-iSCSI-PDU-start pointer - copy #2                        |
   +---------------+---------------+---------------+---------------+

The Marker indicates the offset to the next iSCSI PDU header. The
Marker is eight bytes in length, and contains two 32-bit offset
fields that indicate how many bytes to skip in the TCP stream in
order to find the next iSCSI PDU header. The offset is counted from
the marker end to the beginning of the next header. The marker uses
two copies of the pointer so that a marker spanning a TCP packet
boundary still leaves at least one valid copy in one of the packets.

The use of markers is negotiable. The initiator and target MAY
indicate their readiness to receive and/or send markers during login
separately for each connection. The default is NO. In certain
environments a sender not willing to supply markers to a receiver
willing to accept markers MAY suffer from a considerable performance
degradation.

08 Markers At Fixed Intervals

At fixed intervals in the TCP byte stream, a marker is inserted.
Each end of the iSCSI session specifies during login the interval at
which it is willing to receive the marker, or disables the marker
altogether. If a receiver indicates that it desires a marker, the
sender SHOULD agree (during negotiation) and provide the marker at
the desired interval.

The marker interval and the initial marker-less interval are counted
in terms of the TCP stream data. Anything counted in the TCP
sequence-number is counted for the interval and the initial
marker-less interval. Specifically, this includes any bytes
"inserted" in the TCP stream by a UFL.
When reduced to iSCSI terms, markers MUST indicate the offset to a
4-byte word boundary in the stream. The last 2 bits of each marker
word are reserved and are considered 0 for offset computation.

Padding iSCSI PDU payloads to 4-byte word boundaries simplifies
marker manipulation.

09 Initial Marker-less Interval

To enable the connection setup including the login phase negotiation,
marking (if any) is started only at the first marker interval after
the end of the login phase.

Appendix D. Login/Text Operational Keys

ISID and TSID collectively form the SSID (session id). A TSID of zero
indicates a leading connection.

Some session specific parameters MUST be carried only on the leading
connection and cannot be changed after the leading connection login
(e.g., MaxConnections, the maximum number of connections). This holds
even for a single connection session with regard to connection
restart. The keys that fall into this category have the use defined
as LO (Leading Only).

Keys that can be used only during login have the use defined as IO
(initialize only), while those that can be used in both the login
phase and full feature phase have the use defined as ALL.

Unless explicitly stated otherwise, all key=value pairs specified
here are session specific.

10 MaxConnections

Use: LO
Who can send: Initiator and Target

   MaxConnections=

Default is 8.

Initiator and target negotiate the maximum number of connections
requested/acceptable. The lower of the 2 numbers is selected.

11 SendTargets

Use: ALL
Who can send: Initiator

   SendTargets=

This key is used within a text command to request that a list of
targets and their addresses be sent back to the initiator in a text
response. This key must be the single key of a text command. A
detailed description is provided in a separate Appendix.

12 TargetAddress

Use: ALL/LO
Who can send: Target

   TargetAddress=domainname[:port][,portal-group-tag]

If the TCP port is not specified, it is assumed to be the
IANA-assigned default port for iSCSI.

If the TargetAddress is being returned in a login response as the
result of a redirect status, the comma and portal group tag are
omitted.

If the TargetAddress is being returned within a SendTargets response,
the portal group tag is required.

Examples:

   TargetAddress=10.0.0.1:5003,1
   TargetAddress=12.5.7.10.0.0.1,65
   TargetAddress=computingcenter.acme.com,23

The TargetAddress key is more fully described in the SendTargets
Appendix.

13 TargetName

Use: LO by initiator, ALL by target
Who can send: Initiator and Target

   TargetName=

Examples:

   TargetName=iqn.com.disk-vendor.diskarrays.sn.45678
   TargetName=eui.020000023B040506
   TargetName=iSCSI

This key MUST be provided by the initiator of the TCP connection to
the remote endpoint before the end of the login phase. The iSCSI
Target Name specifies the worldwide unique name of the target. The
non-unique default name "iSCSI" may be used to indicate whatever
default target exists at the address to which the connection was
made.

Some targets MAY require this key before authenticating.

The TargetName key may also be returned by the "SendTargets" text
command (and that is its only use when issued by a target).

14 InitiatorName

Use: LO
Who can send: Initiator

   InitiatorName=

Examples:

   InitiatorName=iqn.com.os-vendor.plan9.cdrom.12345
   InitiatorName=iqn.com.service-provider.users.customer235.host90
   InitiatorName=iSCSI

This key MUST be provided by the initiator of the TCP connection to
the remote endpoint before the end of the login phase. The
InitiatorName key enables the initiator to identify itself to the
remote endpoint. The use of the group name "iSCSI" is interpreted as
"other side of TCP connection".
The target may silently ignore this key if it does not support it,
and does not need to track or verify which initiators use it. A
target that supports this field may use it to allow or deny access to
an initiator.

15 TargetAlias

Use: ALL
Who can send: Target

   TargetAlias=

Examples:

   TargetAlias=Bob's Disk
   TargetAlias=Database Server 1 Log Disk
   TargetAlias=Web Server 3 Disk 20

If a target has been configured with a human-readable name or
description, this name MUST be communicated to the initiator during a
Login Response PDU. This string is not used as an identifier, but can
be displayed by the initiator's user interface in a list of targets
to which it is connected.

16 InitiatorAlias

Use: ALL
Who can send: Initiator

   InitiatorAlias=

Examples:

   InitiatorAlias=Web Server 4
   InitiatorAlias=spyalley.nsa.gov
   InitiatorAlias=Exchange Server

If an initiator has been configured with a human-readable name or
description, it may be communicated to the target during a Login
Request PDU. If not, the host name can be used instead. This string
is not used as an identifier, but can be displayed by the target's
user interface in a list of initiators to which it is connected.

This key SHOULD be sent by an initiator within the Login phase if
available.

17 TargetAddress

Use: ALL
Who can send: Target

   TargetAddress=domainname[:port]/iSCSI-Name

N.B. If the address contains an iSCSI-Name part then this is a LO
parameter.

Examples:

   TargetAddress=10.0.0.1/com.disk-vendor.diskarrays.sn.45678
   TargetAddress=12.5.7.10.0.0.1/com.gateways.yourtargets.24
   TargetAddress=computingcenter.acme.com/com.disk-vendor.diskarrays.sn.45678

The response to a SendTargets text command returns one or more target
addresses for each iSCSI Target Name it returns. This field is used
to indicate one of the known addresses of the target.
If the list cannot be delivered in a single text response PDU,
several lists can be sent using several text response PDUs, and the
list perceived by the initiator is a logical merge of the individual
lists.

18 AccessID

Use: ALL
Who can send: Initiator

   AccessID=

Delivers a SCSI AccessID to the target.

19 FMarker

Use: LO
Who can send: Initiator and Target

   FMarker=

This is a connection specific parameter.

Examples:

   I->FMarker=send-receive
   T->FMarker=send-receive

results in markers being used in both directions, while

   I->FMarker=send-receive
   T->FMarker=receive

results in markers being used from the initiator to the target but
not from the target to the initiator.

20 RFMarkInt

Use: LO
Who can send: Initiator and Target

   RFMarkInt=[,]

This is a connection specific parameter.

The receiver indicates the minimum and maximum interval (in 4-byte
words) at which it wants to receive the markers. If the receiver
wants only a specific value, only a single value has to be specified.
The sender selects a value within the minimum and maximum the
receiver requires (or the only value the receiver requires), or
indicates through the FMarker key=value its inability to set markers.
The interval is measured from the end of a marker to the beginning of
the next marker. For example, a value of 1024 means 1024 words (4096
bytes of "pure" payload between markers).

Default is 2048.

21 SFMarkInt

Use: LO
Who can send: Initiator and Target

   SFMarkInt=

This is a connection specific parameter.

Indicates the interval (in 4-byte words) at which the sender agrees
to send the markers. The number MUST be within the range required by
the receiver. The interval is measured from the end of a marker to
the beginning of the next marker. For example, a value of 1024 means
1024 words (4096 bytes of "pure" payload between markers).

Default is 2048.
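To make the interval arithmetic concrete, the sketch below computes where markers would fall in the stream for a given interval and packs or unpacks the eight-byte marker with its two pointer copies. Several things here are assumptions rather than statements of the specification: the function names are hypothetical, stream offset 0 is taken to be the first byte after the login phase, the first marker is assumed to follow one full interval of payload, and the 32-bit pointers are assumed to be in network (big-endian) byte order.

```python
import struct
from typing import List, Tuple

MARKER_LEN = 8   # two 32-bit copies of the next-PDU-start pointer
WORD = 4         # marker intervals are expressed in 4-byte words


def marker_positions(interval_words: int, count: int) -> List[int]:
    """Byte offsets at which the first `count` markers begin.

    The interval runs from the end of one marker to the start of the
    next, so each period is interval bytes plus the marker itself.
    """
    interval_bytes = interval_words * WORD
    return [interval_bytes + i * (interval_bytes + MARKER_LEN)
            for i in range(count)]


def build_marker(offset_to_next_header: int) -> bytes:
    """Pack both pointer copies (assumed big-endian)."""
    if offset_to_next_header < 0 or offset_to_next_header % WORD:
        raise ValueError("offset must be a non-negative multiple of 4")
    return struct.pack(">II", offset_to_next_header, offset_to_next_header)


def read_marker(marker: bytes) -> Tuple[int, int]:
    """Unpack both copies, treating the reserved low 2 bits as 0."""
    copy1, copy2 = struct.unpack(">II", marker)
    mask = 0xFFFFFFFC
    return copy1 & mask, copy2 & mask
```

With the example value of 1024 words, marker_positions(1024, 3) gives 4096, 8200 and 12304, i.e. 4096 bytes of payload between the end of one marker and the start of the next.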
22 InitialR2T

Use: ALL
Who can send: Initiator and Target

   InitialR2T=

Examples:

   I->InitialR2T=no
   T->InitialR2T=no

Default is yes.

The InitialR2T key is used to turn off the default use of R2T, thus
allowing an initiator to start sending data to a target as if it had
received an initial R2T with Buffer Offset=0 and Desired Data
Transfer Length=min (FirstBurstSize, Expected Data Transfer Length).

The default action is that R2T is required, unless both the initiator
and the target send this key-pair attribute specifying InitialR2T=no.
Once InitialR2T has been set to 'no', it cannot be set back to 'yes'.

Note that only the first outgoing data item (either immediate data or
a separate PDU) can be sent unsolicited, i.e., without a preceding
R2T.

23 BidiInitialR2T

Use: ALL
Who can send: Initiator and Target

   BidiInitialR2T=

Examples:

   I->BidiInitialR2T=no
   T->BidiInitialR2T=no

The BidiInitialR2T key is used to turn off the default use of
BiDiR2T, thus allowing an initiator to send data to a target without
the target having sent an R2T to the initiator for the output data
(write part) of a Bi-directional command (having both the R and the W
bits set). The default action is that R2T is required, unless both
the initiator and the target send this key-pair attribute specifying
BidiInitialR2T=no. Once BidiInitialR2T has been set to 'no', it
cannot be set back to 'yes'.

Note that only the first outgoing data burst (immediate data or
separate PDUs) can be sent unsolicited, i.e., without a preceding
R2T.

24 ImmediateData

Use: LO
Who can send: Initiator and Target

   ImmediateData=

Default is yes.

Initiator and target negotiate support for immediate data.

If ImmediateData is set to yes and InitialR2T is set to yes (default)
then only immediate data are accepted in the first burst.

If ImmediateData is set to no and InitialR2T is set to yes then the
initiator MUST NOT send unsolicited data and the target MUST reject
them with the corresponding response code.
If ImmediateData is set to no and InitialR2T is set to no then the
initiator MUST NOT send unsolicited immediate data but MAY send one
unsolicited burst of Data-OUT PDUs.

If ImmediateData is set to yes and InitialR2T is set to no then the
initiator MAY send unsolicited immediate data and/or one unsolicited
burst of Data-OUT PDUs.

This field sets the D field in the Disconnect-Reconnect mode page. A
value of one in the D field means ImmediateData=no.

The following table is a summary of unsolicited data options:

+----------+-------------+--------------------------------------+
|InitialR2T|ImmediateData| Result (up to FirstBurstSize)        |
+----------+-------------+--------------------------------------+
| no       | no          | Unsolicited data in data PDUs only   |
+----------+-------------+--------------------------------------+
| no       | yes         | Immediate & separate unsolicited data|
+----------+-------------+--------------------------------------+
| yes      | no          | Unsolicited data disallowed          |
+----------+-------------+--------------------------------------+
| yes      | yes         | Immediate unsolicited data only      |
+----------+-------------+--------------------------------------+

25 DataPDULength

Use: LO
Who can send: Initiator and Target

DataPDULength=

Default is 16 units.

Initiator and target negotiate the maximum data payload supported for
SCSI command or data PDUs in units of 512 bytes. The minimum of the 2
numbers is selected. This parameter sets the maximum-burst-size value
stored in the SCSI disconnect-reconnect mode page. The value can
subsequently be retrieved with the mode sense SCSI command. A value of
0 indicates no limit.

26 FirstBurstSize

Use: LO
Who can send: Initiator and Target

FirstBurstSize=

Default is 128 units.

Initiator and target negotiate the maximum length supported for
unsolicited data in units of 512 bytes. The minimum of the 2 numbers
is selected.
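
As an illustration only, the summary table of unsolicited data options
and the FirstBurstSize limit can be expressed as two small functions.
The function names are hypothetical; the only rules encoded are the
ones stated in the text (immediate data follows ImmediateData,
separate unsolicited Data-OUT PDUs require InitialR2T=no, and the
unsolicited length is bounded by min(FirstBurstSize, Expected Data
Transfer Length), with 0 meaning no limit).

```python
from typing import Tuple

UNIT_BYTES = 512  # FirstBurstSize is expressed in units of 512 bytes


def unsolicited_data_allowed(initial_r2t: bool,
                             immediate_data: bool) -> Tuple[bool, bool]:
    """Return (immediate_ok, separate_pdus_ok) for the negotiated keys."""
    immediate_ok = immediate_data      # ImmediateData=yes
    separate_pdus_ok = not initial_r2t  # InitialR2T=no
    return immediate_ok, separate_pdus_ok


def unsolicited_byte_limit(first_burst_units: int,
                           expected_length: int) -> int:
    """Upper bound, in bytes, on the unsolicited data for one command."""
    if first_burst_units == 0:  # 0 indicates no limit
        return expected_length
    return min(first_burst_units * UNIT_BYTES, expected_length)
```

With the default of 128 units, a 100000-byte write may carry at most
65536 bytes unsolicited; the remainder has to be solicited by R2Ts.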
This parameter sets the first-burst-size value stored in the SCSI
disconnect-reconnect mode page. The value can subsequently be
retrieved with the mode sense SCSI command. A value of 0 indicates no
limit.

27 LogoutLoginMinTime

Use: LO
Who can send: Initiator and Target

LogoutLoginMinTime=

Default is 1.

Initiator and target negotiate the minimum time in seconds after which
a Login may follow a Logout response or an Asynchronous Message
announcing disconnect. The maximum of the 2 values is selected.

28 LogoutLoginMaxTime

Use: LO
Who can send: Initiator and Target

LogoutLoginMaxTime=

Default is 3.

Initiator and target negotiate the maximum time in seconds during
which recovery is still possible after a logout or an Asynchronous
Message announcing disconnect. The minimum of the 2 values is
selected.

29 MaxOutstandingR2T

Use: LO
Who can send: Initiator and Target

MaxOutstandingR2T=

The default is 8.

Initiator and target negotiate the maximum number of outstanding R2Ts
per task.

30 DataOrder

Use: LO
Who can send: Initiator and Target

DataOrder=

The default is yes but targets MAY support no.

No is used by iSCSI to indicate that the data PDU Sequences can be in
any order (EMDP = 1). Yes is used to indicate that data PDU Sequences
have to be at continuously increasing addresses (EMDP = 0).

This is a SCSI ordering parameter. For Write it indicates that the
target should request the data in order (R2Ts in order). For Read it
indicates that Data-In PDU Sequences have to be in order (at
continuously increasing addresses). This also sets the
Disconnect-Reconnect mode page EMDP bit.

31 DataDeliveryOrder

Use: LO
Who can send: Initiator and Target

DataDeliveryOrder=

The default is yes but targets MAY support no.

No is used by iSCSI to indicate that the data PDUs within sequences
can be in any order.
Yes is used to indicate that data PDUs within sequences have to be at
continuously increasing addresses and overlays are forbidden.

32 CommandReplaySupport

Use: LO
Who can send: Initiator and Target

CommandReplaySupport=

Default is no.

CommandReplaySupport MAY be set to yes during the login phase
indicating the ability to use command replay. Either initiator or
target may initiate the negotiation.

33 CommandFailoverSupport

Use: LO
Who can send: Initiator and Target

CommandFailoverSupport=

Default is no.

CommandFailoverSupport MAY be set to yes during the login phase
indicating the ability of both target and initiator to continue
commands across connection failures. Either initiator or target may
initiate the negotiation.

Targets that set this key to yes MUST support data and status PDU
retransmission (of those required PDUs that were transmitted on the
original connection the command was allegiant to) to be able to
successfully switch the connection allegiance.

34 SessionType

Use: LO
Who can send: Initiator

SessionType=

Default is Normal.

The Initiator indicates the type of session it wants to create. The
target can accept or reject it.

A Boot Session indicates to the Target that the only purpose of this
Session is boot. The target MAY restrict the type of iSCSI requests it
accepts in such a Session to Logout, NOP-out, and SCSI read commands.
Accepting other commands in this type of session is vendor-dependent.
A target MAY reject a boot session.

A CopyManager session indicates to the Target that the only purpose of
this Session is a Copy Manager Function. The target MAY restrict the
type of SCSI requests it accepts in such a Session. A target MAY
reject a copy manager session.

A Discovery session indicates to the Target that the only purpose of
this Session is discovery. The only command accepted by a target in
this type of session is a text command with a SendTargets key.
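
The session-type restrictions above can be sketched as a small
permission check on the target side. This is an illustration under
stated assumptions, not a definition: the request labels ("logout",
"nop-out", "scsi-read", "scsi-write", "text"), the spelling of the
SessionType values, and the function name are hypothetical. The Boot
case models a target that chooses to apply the optional restriction,
and CopyManager is left unrestricted because the text does not
enumerate its permitted requests.

```python
from enum import Enum
from typing import Iterable


class SessionType(Enum):
    NORMAL = "Normal"
    BOOT = "Boot"
    COPY_MANAGER = "CopyManager"
    DISCOVERY = "Discovery"


# Requests a restrictive target might accept in a Boot session.
BOOT_REQUESTS = {"logout", "nop-out", "scsi-read"}


def request_permitted(session_type: SessionType, request: str,
                      text_keys: Iterable[str] = ()) -> bool:
    """Decide whether a request label is acceptable in this session."""
    if session_type is SessionType.BOOT:
        return request in BOOT_REQUESTS
    if session_type is SessionType.DISCOVERY:
        # Only a text command carrying a SendTargets key is accepted.
        return request == "text" and "SendTargets" in set(text_keys)
    # Normal and CopyManager: no restriction modeled in this sketch.
    return True
```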
35 OpParmReset

Use: IO
Who can send: Initiator and Target

OpParmReset=

OpParmReset enables an Initiator or Target to request the operational
parameters to be reset to the values they had before login. Either the
initiator or target may choose to do so but only after and only if a
SecurityContextComplete handshake is completed on the connection. The
resetting should involve only parameters that were set during login on
the connection in which the OpParmReset is issued.

Please note that since either initiator or target may request this
behavior there is no need to reply.

36 The Glen-Turner Vendor Specific Key Format

Use: ALL
Who can send: Initiator and Target

X-reversed.vendor.dns_name.do_something=

Keys with this format are used for vendor-specific purposes. These
keys always start with X- .

To identify the vendor it is suggested to use the reversed DNS-name as
a prefix to the key-proper.

Appendix E. SendTargets operation

To reduce the amount of configuration required on an initiator, iSCSI
provides the SendTargets text command. This command is sent by the
initiator to request a list of targets to which it may have access, as
well as the list of addresses (IP address and TCP port) on which these
targets may be accessed.

To make use of SendTargets, an initiator must first be logged in to
one of two types of targets. If the initiator is logged in to the
default target (target name "iSCSI"), the only session type that can
be used is a discovery session. If it logs in to any other target, the
session can be either a discovery session or a normal operational
session.

A system containing targets MUST support login to the default "iSCSI"
target, and MUST support the SendTargets command to this target.
A target MUST support the SendTargets command on operational sessions;
these will only return address information about the target to which
the session is connected, and do not return information about other
targets.

An initiator MAY make use of the SendTargets as it sees fit.

A SendTargets command consists of a single Text Command PDU. This PDU
contains exactly one text key and value. The text key shall be
SendTargets. The expected response depends upon the value, as well as
whether the session is a discovery or operational session.

The value must be one of:

all

   The initiator is requesting that information on all relevant
   targets known to the implementation be returned. This value MUST be
   supported on a discovery session, and MAY NOT be supported on an
   operational session.

   If an iSCSI target name is specified, the session should respond
   with addresses for only the named target, if possible. This value
   MUST be supported on discovery sessions. A discovery session MUST
   be capable of returning addresses for those targets that would have
   been returned had value=all been designated.

?

   If no target name is specified, the session should respond with
   addresses only for the target to which the session is logged in.
   This MUST be supported on operational sessions, and MAY NOT return
   targets other than the one to which the session is logged in.

The response to this command is a text response containing a list of
targets and their addresses. Each target is returned as a target
record. A target record begins with the TargetName text key, followed
by a list of TargetAddress text keys, and bounded by the end of the
text response or the next TargetName key, which begins a new record.
No text keys other than TargetName and TargetAddress are permitted
within a SendTargets response.
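
The record structure just described can be sketched as a small parser.
This is a minimal illustration, not the authors' implementation: the
function name is hypothetical, and the input is assumed to be the text
response already split into individual key=value strings. Address
values are kept as opaque strings.

```python
from typing import List, Tuple

TargetRecord = Tuple[str, List[str]]  # (target name, list of addresses)


def parse_send_targets(pairs: List[str]) -> List[TargetRecord]:
    """Group key=value strings of a SendTargets response into records."""
    records: List[TargetRecord] = []
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {pair!r}")
        if key == "TargetName":
            # A TargetName key always begins a new target record.
            records.append((value, []))
        elif key == "TargetAddress":
            if not records:
                raise ValueError("TargetAddress before any TargetName")
            records[-1][1].append(value)
        else:
            # No other keys are permitted within a SendTargets response.
            raise ValueError(f"unexpected key in response: {key!r}")
    return records
```

An empty input yields an empty list, which corresponds to a response
that contains no target records.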
A discovery session MAY respond to a SendTargets request with its
complete list of targets, or with a list of targets that is based on
the name of the initiator logged in to the session.

A SendTargets response MAY contain no target names, if there are no
targets for the requesting initiator to access.

Each target record returned includes zero or more TargetAddress
fields.

A SendTargets response MAY NOT contain iSCSI default target names.

Each target record starts with one text key of the form:

   TargetName=

Followed by zero or more address keys of the form:

   TargetAddress=[:],

The hostname-or-ipaddress and tcp port are as specified in the "Naming
and Addressing" section.

Each TargetAddress belongs to a portal group, identified by its
numeric, decimal portal group tag. The iSCSI target name, together
with this tag, constitutes the SCSI port identifier; the tag need be
unique only within a given target name's list of addresses.

iSCSI addresses with the same portal group tag support spanning
multiple-connection sessions across this set of addresses. iSCSI
addresses that do not support multiple-connection sessions with other
addresses must have their own unique portal group tag.

If a SendTargets response reports an iSCSI address for a target, it
SHOULD also report all other addresses in its portal group in the same
response.

A SendTargets text response can be longer than a single Text Response
PDU, and makes use of the long text responses as specified.

After obtaining a list of targets from the discovery target session,
an iSCSI initiator may initiate new sessions to log in to the
discovered targets for full operation. The initiator MAY keep the
session to a default target open, and MAY send subsequent SendTargets
commands to discover new targets.

Examples:

This example is the SendTargets response from a single target that has
no other interface ports.
Initiator sends text command containing:

   SendTargets=all

Target sends text response containing:

   TargetName=iqn.com.acme.diskarray.sn.8675309

Note that all it really had to return in the simple case was the
target name. It is assumed by the initiator that the IP address and
TCP port for this target are the same as used on the current
connection to the default iSCSI target.

The next example has two internal iSCSI targets, each accessible via
two different ports with different IP addresses. Here's the text
response:

   TargetName=iqn.com.acme.diskarray.sn.8675309
   TargetAddress=10.1.0.45:3000,1
   TargetAddress=10.1.1.45:3000,2
   TargetName=iqn.com.acme.diskarray.sn.1234567
   TargetAddress=10.1.0.45:3000,1
   TargetAddress=10.1.1.45:3000,2

Note that both targets share both addresses; the multiple addresses
are likely used to provide multi-path support. The initiator may
connect to either target name on either address. Each of the addresses
has its own portal group tag; they do not support spanning
multiple-connection sessions with each other. Keep in mind also that
the portal group tags for the two named targets are independent of one
another; portal group "1" on the first target is not necessarily the
same as portal group "1" on the second.

Also note that in the above example, a DNS host name could have been
returned instead of an IP address, and that an IPv6 address (5 to 16
dotted-decimal numbers) could have been returned as well.

The next text response shows a target that supports spanning sessions
across multiple addresses, indicating this using the portal group
tags:

   TargetName=iqn.com.acme.diskarray.sn.8675309
   TargetAddress=10.1.0.45:3000,1
   TargetAddress=10.1.1.46:3000,1
   TargetAddress=10.1.0.47:3000,2
   TargetAddress=10.1.1.48:3000,2
   TargetAddress=10.1.1.49:3000,3

In this example, any of the target addresses can be used to reach the
same target.
A single-connection session can be established to any of these TCP
addresses. A multiple-connection session could span addresses .45 and
.46, or .47 and .48, but cannot span any other combination. A
TargetAddress with its own tag (.49) cannot be combined with any other
address within the same session.

Note that this SendTargets response does not indicate whether .49
supports multiple connections per session; this is communicated via
the MaxConnections text key upon login to the target.

Appendix F. Algorithmic presentation of error recovery levels

This appendix illustrates the error recovery levels using a
pseudo-programming-language. The procedure names are chosen to be
obvious to most implementers, and each of the recovery levels
described has initiator procedures as well as target procedures.

Readers should note that these algorithms focus on outlining the
mechanics of error recovery levels, and ignore all other
aspects/cases. Examples of this approach are:

   - Handling for only certain Opcode types is shown.
   - Only certain reason codes (for example, Recovery in Logout
     command) are outlined.
   - Resultant cases like recovery of synchronization on a header
     digest error are considered out-of-scope in these algorithms. In
     this particular example, header digest error may lead to
     connection recovery if synch and steering layer is not
     implemented.

37 General Data structure and procedure description

This section defines the procedures and data structures that are
commonly used by all the error recovery algorithms. Please note that
the structures may not be exhaustive representations of what is
required for a typical implementation.
Data structure definitions -

struct TransferContext {
   int TargetTransferTag;
   int ExpectedDataSN;
};

struct TCB {
   Boolean SoFarInOrder;
   int ExpectedDataSN; /* used for both R2Ts, and Data */
   int MissingDataSNList[MaxMissingDPDU];
   Boolean FbitReceived;
   Boolean StatusXferd;
   Boolean CurrentlyAllegiant;
   int ActiveR2Ts;
   int Response;
   struct TransferContext TransferContextList[MaxOutstandingR2T];
   int InitiatorTaskTag;
   int CmdSN;
};

struct Connection {
   struct Session SessionReference;
   Boolean SoFarInOrder;
   int CID;
   int State;
   int NextCmdSN;
   int ExpectedStatSN;
   int MissingStatSNList[MaxMissingSPDU];
   Boolean PerformConnectionRecovery;
};

struct Session {
   int NumConnections;
   int ISID;
   int TSID;
   int MaxConnections;
   Boolean CommandReplaySupport;
   struct iSCSIEndpoint OtherEndInfo;
   struct Connection ConnectionList[MaxSupportedConns];
};

Procedure descriptions -

Receive-a-In-PDU(transport connection, inbound PDU);
check-basic-validity(inbound PDU);
Start-Timer(timeout handler, argument, timeout value);
Build-And-Send-Reject(transport connection, bad PDU, reason code);

38 Within-command error recovery algorithms

1 Procedure descriptions

Recover-Data-if-Possible(last required DataSN, task control block);
Build-And-Send-DSnack(task control block);
Build-And-Send-Abort(task control block);
SCSI-Task-Completion(task control block);
Build-And-Send-a-Data-Burst(transport connection, R2T PDU,
                            task control block);
Build-And-Send-R2T(transport connection, description of data,
                   task control block);
Build-And-Send-Status(transport connection, task control block);
Transfer-Context-Timeout-Handler(transfer context);

Implementation-specific tunables - InitiatorDataSNACKEnabled,
TargetDataSNACKSupported, TargetRecoveryR2TEnabled.

Notes:

   - Two procedures used in this section - Recover-Status-if-Possible,
     Handle-Status-SNACK-request - are defined in Within-connection
     recovery algorithms.
   - The Response processing pseudo-code shown in the target
     algorithms applies to all solicited PDUs carrying StatSN - SCSI
     Response, Text Response etc.

2 Initiator algorithms

Recover-Data-if-Possible(LastRequiredDataSN, TCB)
{
   if (InitiatorDataSNACKEnabled) {
      if (# of missing PDUs is trackable) {
         Note the missing DataSNs in TCB.
         Build-And-Send-DSnack(TCB);
      } else {
         TCB.Response = DeliveryFailure;
      }
   } else {
      TCB.Response = DeliveryFailure;
   }
   if (TCB.Response = DeliveryFailure) {
      Clear the missing PDU list in the TCB.
      Build-And-Send-Abort(TCB);
   }
}

Receive-a-In-PDU(Connection, CurrentPDU)
{
   check-basic-validity(CurrentPDU);
   if (Header-Digest-Bad) discard, return;
   Retrieve TCB for CurrentPDU.InitiatorTaskTag.
   if ((CurrentPDU.type = Data) or (CurrentPDU.type = R2T)) {
      if (Data-Digest-Bad) {
         send-data-SNACK = TRUE;
         LastRequiredDataSN = CurrentPDU.DataSN;
      } else {
         if (TCB.SoFarInOrder = TRUE) {
            if (current DataSN is expected) {
               Increment TCB.ExpectedDataSN.
            } else {
               TCB.SoFarInOrder = FALSE;
               send-data-SNACK = TRUE;
            }
         } else {
            if (current DataSN was considered missing) {
               remove current DataSN from missing PDU list.
            } else if (current DataSN is higher than expected) {
               send-data-SNACK = TRUE;
            } else {
               discard, return;
            }
            Adjust TCB.ExpectedDataSN if appropriate.
         }
         LastRequiredDataSN = CurrentPDU.DataSN - 1;
      }
      if (current PDU has F-bit set) {
         TCB.FbitReceived = TRUE;
      }
      if (send-data-SNACK is TRUE and
          task is not already considered failed) {
         Recover-Data-if-Possible(LastRequiredDataSN, TCB);
      }
      if (missing data PDU list is empty) {
         TCB.SoFarInOrder = TRUE;
      }
      if (CurrentPDU.type = R2T) {
         Increment ActiveR2Ts for this task.
         Build-And-Send-a-Data-Burst(Connection, CurrentPDU, TCB);
      }
   } else if (CurrentPDU.type = Response) {
      if (Data-Digest-Bad) {
         send-status-SNACK = TRUE;
      } else {
         TCB.StatusXferd = TRUE;
         Store the status information in TCB.
         if (ExpDataSN does not match) {
            TCB.SoFarInOrder = FALSE;
            Recover-Data-if-Possible(current DataSN, TCB);
         }
         if (missing data PDU list is empty) {
            TCB.SoFarInOrder = TRUE;
         }
         if (Connection.SoFarInOrder is TRUE) {
            if (current StatSN is the expected) {
               Increment Connection.ExpectedStatSN.
            } else {
               Connection.SoFarInOrder = FALSE;
               send-status-SNACK = TRUE;
            }
         } else {
            if (current StatSN was considered missing) {
               remove current StatSN from the missing list.
            } else {
               if (current StatSN is higher than expected) {
                  send-status-SNACK = TRUE;
               } else {
                  discard, return;
               }
            }
            Adjust Connection.ExpectedStatSN if appropriate.
            if (missing StatSN list is empty) {
               Connection.SoFarInOrder = TRUE;
            }
         }
      }
      if (send-status-SNACK = TRUE)
         Recover-Status-if-Possible(Connection, CurrentPDU);
   } else {
      /* REST UNRELATED TO WITHIN-COMMAND-RECOVERY, NOT SHOWN */
   }
   if (TCB.SoFarInOrder is TRUE) {
      if (TCB.StatusXferd is TRUE and
          (TCB.FbitReceived is TRUE or
           task is already considered failed)) {
         SCSI-Task-Completion(TCB);
      }
   }
}

3 Target algorithms

Receive-a-In-PDU(Connection, CurrentPDU)
{
   check-basic-validity(CurrentPDU);
   if (Header-Digest-Bad) {
      Build-And-Send-Reject(Connection, CurrentPDU,
                            Header-Digest-Error);
      discard, return;
   }
   Retrieve TCB for CurrentPDU.InitiatorTaskTag.
   if (CurrentPDU.type = Data) {
      Retrieve TContext from CurrentPDU.TargetTransferTag;
      if (Data-Digest-Bad) {
         Build-And-Send-Reject(Connection, CurrentPDU,
                               Payload-Digest-Error);
         Note the missing data PDUs in MissingDataRange[].
         send-recovery-R2T = TRUE;
      } else {
         if (current DataSN is not expected) {
            Note the missing data PDUs in MissingDataRange[].
            send-recovery-R2T = TRUE;
         }
         Increment TContext.ExpectedDataSN.
         if (CurrentPDU.Fbit = TRUE) {
            Decrement TCB.ActiveR2Ts.
         }
      }
      if (send-recovery-R2T is TRUE and
          task is not already considered failed) {
         if (TargetRecoveryR2TEnabled is TRUE) {
            Increment TCB.ActiveR2Ts.
            Build-And-Send-R2T(Connection, MissingDataRange, TCB);
         } else {
            TCB.Response = DeliveryFailure;
         }
      }
      if (TCB.ActiveR2Ts = 0) {
         Build-And-Send-Status(Connection, TCB);
      }
   } else if (CurrentPDU.type = SNACK) {
      if (this is data retransmission request) {
         if (TargetDataSNACKSupported) {
            if (the request is satisfiable) {
               Build-And-Send-a-Data-Burst(Connection, CurrentPDU,
                                           TCB);
            } else {
               TCB.Response = SNACKRejected;
            }
         } else {
            TCB.Response = SNACKRejected;
         }
         if (TCB.Response = SNACKRejected) {
            Build-And-Send-Reject(Connection, CurrentPDU,
                                  Data-SNACK-Reject);
            Build-And-Send-Status(Connection, TCB);
         }
      } else {
         Handle-Status-SNACK-request(Connection, CurrentPDU);
      }
   } else {
      /* REST UNRELATED TO WITHIN-COMMAND-RECOVERY, NOT SHOWN */
   }
}

Transfer-Context-Timeout-Handler(TContext)
{
   Retrieve TCB and Connection from TContext.
   Decrement TCB.ActiveR2Ts.
   if (TargetRecoveryR2TEnabled is TRUE and
       task is not already considered failed) {
      Note the missing data PDUs in MissingDataRange[].
      Build-And-Send-R2T(Connection, MissingDataRange, TCB);
   } else {
      TCB.Response = DeliveryFailure;
      if (TCB.ActiveR2Ts = 0) {
         Build-And-Send-Status(Connection, TCB);
      }
   }
}

39 Within-connection recovery algorithms

4 Procedure descriptions

Recover-Status-if-Possible(transport connection,
                           currently received PDU);
Retransmit-Command-if-Possible(transport connection, CmdSN);
Build-And-Send-SSnack(transport connection);
Build-And-Send-Command(transport connection, task control block,
                       Retrybit);
Command-Acknowledge-Timeout-Handler(task control block);
Status-Expect-Timeout-Handler(transport connection);
Build-And-Send-Nop-Out(transport connection);
Handle-Status-SNACK-request(transport connection, status SNACK PDU);
Retransmit-Status-Burst(status SNACK, task control block);
Is-Acknowledged(beginning StatSN, run size);

Implementation-specific tunables - InitiatorCommandRetryEnabled,
InitiatorStatusExpectNopEnabled, InitiatorProactiveSNACKEnabled,
InitiatorStatusSNACKEnabled, TargetStatusSNACKSupported.

Notes:

   - The initiator algorithms deal only with unsolicited Nop-In PDUs
     for generating status SNACKs. A solicited Nop-In PDU has an
     assigned StatSN which, when out-of-order, could trigger the
     out-of-order StatSN handling in Within-command algorithms, again
     leading to Recover-Status-if-Possible.
   - The pseudo-code shown may result in retransmission of
     unacknowledged commands in more cases than is necessary. This
     will not however affect the correctness of operation since the
     target is required to discard the duplicate CmdSNs.
   - The procedure Build-And-Send-Async is defined in Within-session
     recovery algorithms.
   - The procedure Status-Expect-Timeout-Handler describes how
     initiators may proactively attempt to retrieve Status if they
     choose to. This procedure is assumed to be triggered much before
     the standard ULP timeout.

1. Initiator algorithms

Recover-Status-if-Possible(Connection, CurrentPDU)
{
   if ((Connection.State = LOGGED_IN) and
       connection is not already considered failed) {
      if (InitiatorStatusSNACKEnabled) {
         if (# of missing PDUs is trackable) {
            Note the missing StatSNs in Connection;
            Build-And-Send-SSnack(Connection);
         } else {
            Connection.PerformConnectionRecovery = TRUE;
         }
      } else {
         Connection.PerformConnectionRecovery = TRUE;
      }
      if (Connection.PerformConnectionRecovery is TRUE) {
         Start-Timer(Connection-Recovery-Handler, Connection, 0);
      }
   }
}

Retransmit-Command-if-Possible(Connection, CmdSN)
{
   if (InitiatorCommandRetryEnabled) {
      Retrieve the InitiatorTaskTag, and thus TCB for the CmdSN.
      Build-And-Send-Command(Connection, TCB, Retrybit);
   }
}

Receive-a-In-PDU(Connection, CurrentPDU)
{
   check-basic-validity(CurrentPDU);
   if (Header-Digest-Bad) discard, return;
   Retrieve TCB for CurrentPDU.InitiatorTaskTag.
   if (CurrentPDU.type = Nop-In) {
      if (the PDU is unsolicited) {
         if (current StatSN is not expected) {
            Recover-Status-if-Possible(Connection, CurrentPDU);
         }
         if (current ExpCmdSN is not our NextCmdSN) {
            Retransmit-Command-if-Possible(Connection,
                                           CurrentPDU.ExpCmdSN);
         }
      }
   } else if (CurrentPDU.type = Reject) {
      if (it is a data digest error on immediate data) {
         Retransmit-Command-if-Possible(Connection,
                                 CurrentPDU.BadPDUHeader.CmdSN);
      }
   } else {
      /* REST UNRELATED TO WITHIN-CONNECTION-RECOVERY,
       * NOT SHOWN */
   }
}

Command-Acknowledge-Timeout-Handler(TCB)
{
   Retrieve the Connection for TCB.
   Retransmit-Command-if-Possible(Connection, TCB.CmdSN);
}

Status-Expect-Timeout-Handler(Connection)
{
   if (InitiatorStatusExpectNopEnabled) {
      Build-And-Send-Nop-Out(Connection);
   } else if (InitiatorProactiveSNACKEnabled) {
      if ((Connection.State = LOGGED_IN) and
          connection is not already considered failed) {
         Build-And-Send-SSnack(Connection);
      }
   }
}

2. Target algorithms

Handle-Status-SNACK-request(Connection, CurrentPDU)
{
   if (TargetStatusSNACKSupported) {
      if (request for an acknowledged run) {
         Build-And-Send-Reject(Connection, CurrentPDU,
                               Protocol-Error);
      } else if (request for an untransmitted run) {
         discard, return;
      } else {
         Retransmit-Status-Burst(CurrentPDU, TCB);
      }
   } else {
      Build-And-Send-Async(Connection, DroppedConnection, 0,
                           TargetConnectionRecoveryTimeout);
   }
}

5 Within-session recovery algorithms

3. Procedure descriptions

Build-And-Send-Async(transport connection, reason code,
                     minimum time, maximum time);
Pick-A-Logged-In-Connection(session);
Build-And-Send-Logout(transport connection,
                      logout connection identifier, reason code);
PerformImplicitLogout(transport connection,
                      logout connection identifier,
                      target information);
PerformLogin(transport connection, target information);
CreateNewTransportConnection(target information);
Build-And-Send-Command(transport connection, task control block,
                       bits to set);
Connection-Recovery-Handler(transport connection);
Connection-Resource-Timeout-Handler(transport connection);
Quiesce-And-Prepare-for-New-Allegiance(session, task control block);
Build-And-Send-Logout-Response(transport connection,
                               CID of connection in recovery,
                               reason code);
Establish-New-Allegiance(task control block, transport connection);
Schedule-Command-To-Continue(task control block);
Schedule-Command-For-Replay(task control block);

Notes:

   - Transport exception conditions such as unexpected connection
     termination, connection reset, hung connection while the
     connection is in the full-feature phase, are all assumed to be
     asynchronously signaled to iSCSI layer using the
     Transport_Exception_Handler procedure.

4. Initiator algorithms

Receive-a-In-PDU(Connection, CurrentPDU)
{
   check-basic-validity(CurrentPDU);
   if (Header-Digest-Bad) discard, return;
   Retrieve TCB from CurrentPDU.InitiatorTaskTag.
   if (CurrentPDU.type = Async) {
      if ((CurrentPDU.iSCSIEvent = LogoutRequest) or
          (CurrentPDU.iSCSIEvent = ConnectionDropped)) {
         Retrieve the AffectedConnection for CurrentPDU.Parameter1.
         AffectedConnection.State = ASYNC_MSG_RCVD;
         AffectedConnection.PerformConnectionRecovery = TRUE;
         Start-Timer(Connection-Recovery-Handler,
                     AffectedConnection, CurrentPDU.Parameter2);
      }
   } else if (CurrentPDU.type = LogoutResponse) {
      Retrieve the RecoveryConnection for CurrentPDU.CID.
      if (CurrentPDU.Response = failure) {
         RecoveryConnection.State = BUSY;
         Start-Timer(Connection-Resource-Timeout-Handler,
                     RecoveryConnection, InitiatorRecoveryTimeout);
      } else {
         RecoveryConnection.State = FREE;
      }
   } else if (CurrentPDU.type = LoginResponse) {
      if (this is a response to an implicit Logout) {
         Retrieve the RecoveryConnection.
         if (successful) {
            RecoveryConnection.State = FREE;
            Connection.State = LOGGED_IN;
         } else {
            RecoveryConnection.State = BUSY;
            DestroyTransportConnection(Connection);
            Start-Timer(Connection-Resource-Timeout-Handler,
                        RecoveryConnection,
                        InitiatorRecoveryTimeout);
         }
      }
   } else {
      /* REST UNRELATED TO WITHIN-SESSION-RECOVERY,
       * NOT SHOWN */
   }
   if (RecoveryConnection.State = FREE) {
      for (each command that was active on RecoveryConnection) {
         NewConnection = Pick-A-Logged-In-Connection(Session);
         Build-And-Send-Command(NewConnection, TCB, Retrybit);
      }
   }
}

Connection-Recovery-Handler(Connection)
{
   Retrieve Session from Connection.
   if (Connection can still exchange iSCSI PDUs) {
      NewConnection = Connection;
   } else {
      if (there are other logged-in connections) {
         NewConnection = Pick-A-Logged-In-Connection(Session);
      } else {
         NewConnection =
            CreateNewTransportConnection(Session.OtherEndInfo);
         Initiate an implicit Logout on NewConnection for
            Connection.CID.
         return;
      }
   }
   Build-And-Send-Logout(NewConnection, Connection.CID,
                         RecoveryRemove);
}

Transport_Exception_Handler(Connection)
{
   Connection.PerformConnectionRecovery = TRUE;
   if (the event is an unexpected transport disconnect) {
      Connection.State = XPT_CLEANUP;
   } else {
      Connection.State = BUSY;
   }
   Start-Timer(Connection-Recovery-Handler, Connection, 0);
}

5. Target algorithms

Receive-a-In-PDU(Connection, CurrentPDU)
{
   check-basic-validity(CurrentPDU);
   if (Header-Digest-Bad) {
      Build-And-Send-Reject(Connection, CurrentPDU,
                            Header-Digest-Error);
      discard, return;
   } else if (Data-Digest-Bad) {
      Build-And-Send-Reject(Connection, CurrentPDU,
                            Payload-Digest-Error);
      discard, return;
   }
   Retrieve TCB and Session.
   if (CurrentPDU.type = Logout) {
      if (CurrentPDU.ReasonCode = RecoveryRemove) {
         Retrieve the RecoveryConnection from CurrentPDU.CID.
         for (each command active on RecoveryConnection) {
            Quiesce-And-Prepare-for-New-Allegiance(Session, TCB);
            TCB.CurrentlyAllegiant = FALSE;
         }
         Cleanup-Connection-State(RecoveryConnection);
         if ((quiescing successful) and (cleanup successful)) {
            Build-And-Send-Logout-Response(Connection,
                                 RecoveryConnection.CID, Success);
         } else {
            Build-And-Send-Logout-Response(Connection,
                                 RecoveryConnection.CID, Failure);
         }
      }
   } else if (CurrentPDU.type = Command) {
      if (current PDU has X-bit set) {
         if (task received first-time) {
            Start regular processing for the task.
         } else if (task is currently not allegiant) {
            Establish-New-Allegiance(TCB, Connection);
            TCB.CurrentlyAllegiant = TRUE;
            Schedule-Command-To-Continue(TCB);
         } else if (status had already been transferred) {
            if (Session.CommandReplaySupport = TRUE) {
               Schedule-Command-For-Replay(TCB);
            } else {
               Build-And-Send-Reject(Connection, CurrentPDU,
                                     ReplayReject);
            }
         } else {
            Build-And-Send-Reject(Connection, CurrentPDU,
                                  CommandInProgress);
         }
      }
   } else {
      /* REST UNRELATED TO WITHIN-SESSION-RECOVERY,
       * NOT SHOWN */
   }
}

Transport_Exception_Handler(Connection)
{
   Connection.PerformConnectionRecovery = TRUE;
   if (the event is an unexpected transport disconnect) {
      Connection.State = XPT_CLEANUP;
   } else {
      Connection.State = BUSY;
   }
   Start-Timer(Connection-Resource-Timeout-Handler, Connection,
               TargetConnectionRecoveryTimeout);
   if (this Session has full-feature phase connections left) {
      DifferentConnection = Pick-A-Logged-In-Connection(Session);
      Build-And-Send-Async(DifferentConnection, DroppedConnection, 0,
                           TargetConnectionRecoveryTimeout);
   }
}

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved. This
document and translations of it may be copied and furnished to others,
and derivative works that comment on or otherwise explain it or assist
in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as
by removing the copyright notice or references to the Internet Society
or other Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be followed,
or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."