Internet Draft                                              Yaron Klein
                                                                 SANRAD
Expires May 2001                                       November 2, 2000

Storage Virtualization with iSCSI Protocol

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026 [3].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Storage virtualization refers to hiding or masking the physical storage device from the host server or application. This draft complements the iSCSI draft [2] by defining the mechanism to implement storage virtualization in an iSCSI environment. The virtualization architecture and terminology are taken from IEEE 1244.6 [1].

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [4].

1. Introduction

A central assumption of storage virtualization is that virtual storage is to a network what virtual memory is to a single system. With virtual memory management, complex applications become easy to implement. The same can be said of virtual storage management, except that the rewards are greater. Solutions to storage management problems have high business value.
Virtualization of persistent storage gives storage management functions great flexibility in distributing data throughout a network of diverse storage devices and in reconfiguring network storage as needs change.

Klein [page 1] iSCSI Virtualization November 2000

Some of the benefits of storage virtualization include the ability to:

* Isolate applications from underlying physical devices.
* Improve availability and maintainability of systems.
* Expand storage on the fly, including volumes and files.
* Reduce downtime for backup and other maintenance functions.
* Migrate data from systems and applications.
* Support large, high performance storage applications.
* Mix and match various storage devices for investment protection.
* Support advanced storage applications such as replication.
* Support larger storage devices than are physically possible.

In this draft we define the architecture, flow and mechanism for virtualization in the iSCSI protocol. We show that with minor optional changes, powerful virtualization can be accomplished in storage networks with the iSCSI protocol.

This draft is organized as follows: Chapter 2 describes the considerations for defining virtualization in the iSCSI protocol. Chapter 3 describes the general flow of the protocol. Chapter 4 describes an example of virtualization. Chapter 5 describes in detail the protocol modifications for virtualization.

2. Considerations

Virtualization in iSCSI can be accomplished in the following ways:

1. Implementing virtualization in the host, in a layer above iSCSI.
2. Using an external server that interacts with the physical storage nodes.
3. Using third-party add-ons or proprietary protocols.
4. Defining virtualization in the iSCSI protocol.

Implementing a virtualization layer in the host above iSCSI may bring some benefits in single-host environments. However, in multi-host networks it will suffer from coherency problems. Thus, in most cases, this solution is not relevant.
An external server that manages the storage nodes can implement virtualization in the current iSCSI framework. However, this solution has two major disadvantages: data is transferred twice in each transaction (from node to server and from server to host), and the server thus becomes a bottleneck.

Third-party or proprietary protocols can implement virtualization. However, to gain interoperability in the iSCSI world, it is preferable to use a standard protocol.

2.1 iSCSI virtualization

The last option suggests defining elements in the iSCSI protocol that will enable the implementation of virtualization in a storage network based on the iSCSI standard. This sub-section describes the pros and cons of this alternative.

2.1.1. Standard

The SCSI standard group (T10) has not yet defined a standard for virtualization. Thus, the iSCSI implementation of virtualization may coincide with a future standard of the T10 group. Members of the iSCSI group suggested waiting for the SCSI standard and then including it in the iSCSI protocol. However, the IEEE has already defined virtualization considerations, which we can take as a reference for the new draft.

2.1.2 Implementation

Putting the virtualization framework in the iSCSI protocol will prevent implementers from using proprietary or existing methods. Those implementers may prefer third-party protocols.

2.1.3. Interoperability

Standardization in the iSCSI protocol will enable interoperability between systems. That is, stores and managers become plug and play in any network, regardless of their vendors.

2.1.4 OS Independence

Implementing virtualization within the framework of the transport level (iSCSI) is OS independent. Thus, it will enable the integration of different machines in a cross-platform network. Note that any higher-level implementation will be OS dependent and will thus prevent integrating different NFS or OS.
2.1.5 Upgrade of existing systems

Implementing virtualization within the iSCSI framework will make it possible to upgrade existing systems to support virtualization. The virtualization can be added to the system as a software add-on.

2.1.6 Conclusions

Although the issues in 2.1.1 and 2.1.2 argue for a non-standard approach, or for not standardizing at this time, we believe that the reasons in 2.1.3, 2.1.4 and 2.1.5 are strong enough to overcome them. We therefore believe that inserting virtualization options in the iSCSI protocol will benefit the storage community, and we expect that the working group will accept the following draft.

3. Implementation

This chapter defines the iSCSI protocol support for the implementation of the virtualization requirements.

3.1. System's Elements

The system is illustrated in Figure 3-1.

               ----------                         ----------
               | host 1 |                         | host 2 |
               |        |                         |        |
               ----------                         ----------
                   |                                   |
                   |                                   |
   ---------------------------------------------------------------------
        |          |           |            |          |           |
        |          |           |            |          |           |
        |          |           |            |          |           |
   -----------  --------    --------   -----------  --------    --------
   | manager |  | Disk |    | Disk |   | manager |  | Disk |    | Disk |
   |    A    |  |  A1  | .. |  An  |   |    B    |  |  B1  | .. |  Bn  |
   -----------  --------    --------   -----------  --------    --------

Figure 3-1. System architecture

The system includes the following elements:

3.1.1 iSCSI Store:

An iSCSI store is a physical storage element (a disk or a gateway to disks) that is attached to the network with the iSCSI protocol. It has a linear address space and is defined by:

   Store Identifier: provides a unique (global) identifier to the store.
   Metadata: describes properties of the store.
   Class of service: specifies availability, cost, performance and security.

3.1.2 Storage Group:

A storage group is a collection of one or more stores. A storage group may recursively include sub-groups. It is defined by:

   Group Identifier: provides a unique (global) identification of the storage group.
   Metadata: consists of membership list, access control lists, group properties, etc.
   Members: stores in the group.

3.1.3. Storage Manager:

The storage manager is a software entity that is attached to the network and provides data access and management control for one or more storage groups.

3.2. Architecture

All elements in the system are connected via the iSCSI protocol. The elements in the system have the following interfaces to each other:

* The host has an (iSCSI) initiator interface to the manager and to the stores.
* The manager has a target interface to the host and an initiator interface to the stores.
* The stores have target interfaces toward the manager and the host.

3.3. Login Phase

The Login environment includes connection and disconnection of the host to the group. The following subsections describe them.

3.3.1. Connection

Upon startup, the host is aware of the managers (TBD in the Discovery WG). The host starts the environment establishment by sending an iSCSI login request to the manager. In the login phase, the host and manager MAY authenticate each other and negotiate security and other parameters. If the login phase ends successfully, a new session is established (or a TCP connection is added) between the host and manager.

The end of the login phase is marked with a login response, sent from the manager to the host. The login response includes a header and attached data. The header sets the login status (accepted, rejected) and the attached data includes various parameters for the session. The header should include the number of stores in the group. The data should contain a list of all the stores in the group (store names and their IP addresses).

The host MUST establish login sessions with all the stores in the group. That is, it sends a login request to each store, then negotiates, authenticates, authorizes and establishes an iSCSI session.
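As an illustration of the host side of this step, the following is a minimal sketch of how a host might turn the attached data of the manager's login response into the list of stores it has to log in to. It assumes the "name:ip address" entry form given in section 5.1 and that entries are separated by whitespace, which the draft does not state. The names Store and parse_store_list are hypothetical and not part of any iSCSI implementation.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Store:
    """One member of the storage group (hypothetical helper type)."""
    name: str
    address: str


def parse_store_list(data: bytes, store_count: int) -> list[Store]:
    """Parse the attached data of the manager's login response.

    Assumption: entries are "name:ip" strings in UTF-8, separated by
    whitespace. `store_count` is the number of stores announced in the
    login response header.
    """
    stores = []
    for entry in data.decode("utf-8").split():
        name, sep, addr = entry.partition(":")
        if not sep or not name:
            raise ValueError(f"malformed store entry: {entry!r}")
        ip_address(addr)  # raises ValueError if the address is invalid
        stores.append(Store(name, addr))
    if len(stores) != store_count:
        raise ValueError(
            f"header announced {store_count} stores, data lists {len(stores)}"
        )
    return stores


# Example input taken from the form shown in section 5.1.
stores = parse_store_list(b"Disk_A:192.168.223.207 Disk_B:192.168.223.208", 2)
for store in stores:
    # The host would now open an iSCSI login session with each store.
    print(f"login to {store.name} at {store.address}")
```

Cross-checking the header count against the parsed list is a design choice of this sketch. It lets the host detect a truncated list before it starts logging in to only part of the group.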
Each store completes the login phase with a login response message, sent to both the host and the manager, thus giving the manager complete knowledge of and control over the group's connectivity.

Figure 3-2 illustrates the login flow.

                     --------                            -----------
                     | host |       login request        | manager |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------       login accept+        -----------
                     | host |       list of stores       | manager |
                     |      |     <---------------       |         |
                     --------                            -----------

                     --------                            -----------
                     | host |       login request        | Disk A1 |
                     |      |     ---------------->      |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        login accept        | Disk A1 |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------
                                          .
                                          .
                                          .

Figure 3-2. Login phase

3.3.2 Disconnection

Upon disconnection, the host sends a FIN message to the manager. It is the responsibility of the host to close all its records of the group. The manager, however, closes all paths from the host to the group.

3.4. Event

The iSCSI event notification is conducted via Asynchronous Event messages. Events can be classified into two classes:

1. Events that concern a specific host, or
2. Events that concern the entire group.

The first class can relate to a failure that occurred during a session with some host and thus concerns that host only. The second class can be a general failure of a store.

In the first case, the store should send an Asynchronous Event message to the related host and to the manager. The message to the manager should include, in the attached header, the name of the host that the message was sent to.

In the second case, the Asynchronous Event message is sent to the manager only. It is up to the manager to decide whether to forward it to the hosts.

3.5. Manager Interface

The manager interface includes SCSI commands and data. The interface should address the issues raised in the guiding principles. The iSCSI protocol encapsulates SCSI commands and responses from initiator to the target and vice versa.
The host MUST initiate SCSI commands only to the manager. The manager replies with an iSCSI status message response that includes a header and attached data. The attached data contains the iSCSI commands that the host MUST issue and the stores they are addressed to. At the end of each phase, the store MUST send the status message to the host and the manager.

Figure 3-3 illustrates the message flow in the case of a SCSI read command.

                     --------                            -----------
                     | host |        SCSI command        | manager |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------        SCSI status+        -----------
                     | host |       command for A        | manager |
                     |      |      command for B...      |         |
                     --------     <---------------       -----------

                     --------                            -----------
                     | host |        SCSI command        | Disk A  |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------                            -----------
                     | host |         SCSI Data          | Disk A  |
                     |      |     <----------------      |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        SCSI Status         | Disk A  |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------
                                          .
                                          .
                                          .

Figure 3-3. SCSI commands flow

3.5.1. Securing the Interface

If the system is integrated and safe, the above procedure will result in a successful SCSI transaction. However, in an unsafe environment, the host (or an impostor) can send direct commands to the stores without the knowledge and control of the manager. There are several levels of protection:

1. No security. In this environment, the network is a safe one (a local network) and the hosts are reliable. Therefore, no precautions are taken.
2. Authentication. An iSCSI authentication (or other, e.g., IPsec) is used, providing data integrity.
3. Double authentication. Together with the iSCSI commands, the manager sends the host a key for each command.

In the last case, the store MUST check the key attached to the SCSI command from the host and compare it to the key assigned by the manager. The command should be executed only if the keys match.
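The store-side check for double authentication can be sketched as follows. This is an illustration under stated assumptions, not a mechanism defined by this draft: the draft does not say how the store learns the key assigned by the manager, how long a key is, or whether a key may be reused. The sketch assumes that the manager registers a random 16-byte key with the store for each command, identified by a task tag, and that a key is valid for one command only. The class name StoreKeyTable and its methods are hypothetical.

```python
import hmac
import secrets


class StoreKeyTable:
    """Per-command keys known to a store (hypothetical helper)."""

    def __init__(self) -> None:
        self._keys: dict[int, bytes] = {}

    def assign(self, task_tag: int) -> bytes:
        """Record the key the manager assigns to one command.

        Assumption: the manager registers the key with the store and
        sends the same key to the host with the command.
        """
        key = secrets.token_bytes(16)  # key length is an assumption
        self._keys[task_tag] = key
        return key

    def authorize(self, task_tag: int, presented: bytes) -> bool:
        """Return True only if the host presents the assigned key."""
        expected = self._keys.get(task_tag)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking key bytes via timing.
        if not hmac.compare_digest(expected, presented):
            return False
        del self._keys[task_tag]  # single use: an assumption of this sketch
        return True


table = StoreKeyTable()
key = table.assign(task_tag=7)
print(table.authorize(7, key))          # key matches: execute the command
print(table.authorize(7, key))          # key already used: reject
print(table.authorize(8, b"\x00" * 16)) # no key assigned: reject
```

With this shape, a host that bypasses the manager has no valid key for the store and its command is not executed, which is the property the third protection level asks for.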
This security environment suits the case in which the group is in a safe local network and the host is connected via an unsecured network, e.g., the Internet.

                     --------                            -----------
                     | host |        SCSI command        | manager |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------        SCSI status+        -----------
                     | host |     command+key for A      | manager |
                     |      |     command+key for B      |         |
                     --------     <---------------       -----------

                     --------                            -----------
                     | host |           Key A            | Disk A  |
                     |      |     ---------------->      |         |
                     --------                            -----------
                                          .
                                          .
                                          .

Figure 3-4. Double security

Note that this does not affect other security elements as defined in the iSCSI or other protocols.

3.6. Management

The iSCSI protocol does not define management. However, since all elements are in the IP world, standard management tools (SNMP and more) can be used for managing each element in the system. It is most desirable that configuration and management of the group be done with the full knowledge and control of the manager.

3.7. Data Access

Direct data access is not desired in the system since it bypasses the manager, causing coherency and other problems. Furthermore, if double authentication is performed, it prevents the host from issuing direct commands. However, the implementer can choose to allow such access. Note that in that case the manager should activate sniffing, snooping or another mechanism in order to stay coherent with the group.

3.7.1 Inter-Store Transactions

Transactions between stores, such as copying from one store to another, are fundamental in many applications, for example mirroring and backup. In order to accomplish them without latency, it is preferable that the stores initiate the commands to one another. This draft will not address those issues.

4. Virtualization Example

Consider the following sub-system:

               ----------
               | host 1 |
               |        |
               ----------
                   |
                   |
   -------------------------------------------------------
        |          |           |           |
        |          |           |           |
   -----------  --------    --------    --------   ----
   | manager |  | Disk |    | Disk |    | Disk |
   |    A    |  |  A1  |    |  A2  |    |  A3  |   1000 blocks
   -----------  --------    --------    --------   ----

Figure 4-1. Example system

A virtual group A is constructed from a manager and three stores: Disk A1, Disk A2 and Disk A3. Each store contains 1000 blocks. Thus, the virtual group reflected to the host contains 3000 blocks.

4.1. Login

The first phase is the login phase. The host is only aware of the manager. It sends a login request according to the iSCSI draft, as if the manager were a simple target. The host and manager establish a new session (negotiating parameters, authenticating each other and so on). If the login phase ends successfully, the manager sends a login response message with "login accept" as a status. This message has an attachment in the data part that includes the list of stores in the group and their IP addresses.

                     --------                            -----------
                     | host |       login request        | manager |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------       establishment,       -----------
                     | host |        negotiation         | manager |
                     |      |      <------------->       |         |
                     --------                            -----------

                     --------    login accept, A1-IP,    -----------
                     | host |        A2-IP, A3-IP        | manager |
                     |      |     <----------------      |         |
                     --------                            -----------

Figure 4-2. Login phase example

This ends the login phase between the host and manager. The host should now establish a login session with each store in the group.
                     --------                            -----------
                     | host |       login request        | Disk A1 |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------       establishment,       -----------
                     | host |        negotiation         | Disk A1 |
                     |      |      <------------->       |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        login accept        | Disk A1 |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------

                     --------                            -----------
                     | host |       login request        | Disk A2 |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------       establishment,       -----------
                     | host |        negotiation         | Disk A2 |
                     |      |      <------------->       |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        login accept        | Disk A2 |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------

                     --------                            -----------
                     | host |       login request        | Disk A3 |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------       establishment,       -----------
                     | host |        negotiation         | Disk A3 |
                     |      |      <------------->       |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        login accept        | Disk A3 |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------

Figure 4-3. Login with the group

Each login phase is separate. At the end of each phase, the store sends the login response message to the host and also to the manager.

4.2. SCSI Command

Assume that the host wishes to read blocks 500-599 from the virtual volume. This virtual segment is physically distributed in the following way:

      Virtual Volume
         --------
         |      |
         |      |
         |      |
         |      |   /---  Disk A1  100  109
         |      |  /      Disk A2  200  209
      500|------|/        Disk A1  140  169
         |      |         Disk A3  400  419
         |      |         Disk A1  800  809
         |      |         Disk A2  300  309
      600|------|\        Disk A2  400  409
         |      | \----
         |      |
         |      |
         |      |
         |      |
         --------
        3000 blocks

Figure 4-4. Allocation example

The host sends an iSCSI command message to the manager with the CDB expressed in virtual addresses, i.e., a read starting at address 500 with size 100.
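The translation the manager performs for this request can be sketched as follows. The physical ranges are those of Figure 4-4. The virtual start of each extent is an assumption: the figure gives only the order of the extents, so the sketch lays them out back to back starting at virtual block 500. The names Extent, FIGURE_4_4 and translate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A run of virtual blocks mapped to one store (hypothetical type)."""
    virtual_start: int
    store: str
    physical_start: int
    length: int


# Physical ranges from Figure 4-4; virtual starts are assumed contiguous.
FIGURE_4_4 = [
    Extent(500, "Disk A1", 100, 10),
    Extent(510, "Disk A2", 200, 10),
    Extent(520, "Disk A1", 140, 30),
    Extent(550, "Disk A3", 400, 20),
    Extent(570, "Disk A1", 800, 10),
    Extent(580, "Disk A2", 300, 10),
    Extent(590, "Disk A2", 400, 10),
]


def translate(extents, start, length):
    """Split a virtual read into (store, physical start, size) commands."""
    end = start + length
    commands = []
    for ext in extents:
        # Overlap of the request with this extent, in virtual blocks.
        lo = max(start, ext.virtual_start)
        hi = min(end, ext.virtual_start + ext.length)
        if lo < hi:
            offset = lo - ext.virtual_start
            commands.append((ext.store, ext.physical_start + offset, hi - lo))
    if sum(size for _, _, size in commands) != length:
        raise LookupError("request is not fully covered by the extent map")
    return commands


# The host's request: read from virtual address 500, size 100.
for store, physical_start, size in translate(FIGURE_4_4, 500, 100):
    print(f"{store}: read from {physical_start} size {size}")
```

For the request above the sketch yields one command per extent of Figure 4-4, starting with a read from block 100 of size 10 on Disk A1 and a read from block 200 of size 10 on Disk A2.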
The manager replies with an iSCSI status message that contains in its data fields 7 iSCSI commands with the appropriate CDBs to be executed. The host executes the iSCSI commands in the order in which they are arranged.

                     --------     SCSI command, CDB:     -----------
                     | host |   read from 500 size 100   | manager |
                     |      |     ---------------->      |         |
                     --------                            -----------

                     --------     SCSI status, SCSI      -----------
                     | host | command, ... SCSI command  | manager |
                     |      |      <-------------        |         |
                     --------                            -----------

                     --------     SCSI Command, CDB:     -----------
                     | host |   read from 100 size 10    | Disk A1 |
                     |      |      ------------->        |         |
                     --------                            -----------

                     --------                            -----------
                     | host |         SCSI data          | Disk A1 |
                     |      |      <-------------        |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        SCSI Status         | Disk A1 |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------

                     --------     SCSI Command, CDB:     -----------
                     | host |   read from 200 size 10    | Disk A2 |
                     |      |      ------------->        |         |
                     --------                            -----------

                     --------                            -----------
                     | host |         SCSI data          | Disk A2 |
                     |      |      <-------------        |         |
                     --------                            -----------

   -----------       --------                            -----------
   | manager |       | host |        SCSI Status         | Disk A2 |
   |         |       |      |     <----------------      |         |
   -----------       --------                            -----------
                                          .
                                          .
                                          .

Figure 4-5. Command flow example

The status of each transfer is sent by the store to both the host and the manager.

5. Protocol Specification

This chapter describes in detail the changes in the iSCSI protocol definitions. The key point of the definition is that virtualization is optional. That is, an iSCSI implementation without virtualization will not be affected.

5.1. Login Phase

As described in section 3.3.1, the host establishes a session with the manager. When a session is established, the manager sends a login response to the host.
   -------------------------------------------------------
   | opcode      | reserved (0)                          |
   -------------------------------------------------------
   | length                                              |
   -------------------------------------------------------
   | storage size| reserved (0)                          |
   -------------------------------------------------------
   | reserved                                            |
   -------------------------------------------------------
   | ISID                     | TSID                     |
   -------------------------------------------------------
   | InitStatRN or 0                                     |
   -------------------------------------------------------
   | reserved                                            |
   -------------------------------------------------------
   | ExpCmdRN                                            |
   -------------------------------------------------------
   | MaxCmdRn                                            |
   -------------------------------------------------------
   | Status      | reserved                              |
   -------------------------------------------------------
   | reserved                                            |
   -------------------------------------------------------

Figure 5-1. Login response header

Storage Size:

   0     - This is a target only and not a manager (for non-virtualization or for login to the stores).
   1-255 - Defines the number of stores in the group.

If the storage size indicates a managed group, the attached data contains the information on the stores. The information is in the following form (UTF-8 Unicode format):

   Disk_A:192.168.223.207
   Disk_B:192.168.223.208
   ...

That is, name:ip address for all the stores in the group. The host MUST save this information and conduct a login session with all the stores in the list.

5.2. Event

The iSCSI event notification is conducted via Asynchronous Event messages. As described in section 3.4, the event can be sent to the manager alone or to both the manager and the host. Each event has a header and attached data.
The header is as follows:

   -------------------------------------------------------
   | opcode      |                                       |
   -------------------------------------------------------
   | length                                              |
   -------------------------------------------------------
   | Logical Unit Number                                 |
   |                                                     |
   -------------------------------------------------------
   | Event type                                          |
   -------------------------------------------------------
   | Reserved                                            |
   -------------------------------------------------------
   | StatRN                                              |
   -------------------------------------------------------
   | ExpCmdRN                                            |
   -------------------------------------------------------
   | MaxCmdRn                                            |
   -------------------------------------------------------
   | SCSI Event  | iSCSI event |                         |
   -------------------------------------------------------
   | reserved                                            |
   -------------------------------------------------------

Figure 5-2. Event Header

where the "Event type" field is defined as:

   0 - The event is reported to the manager alone (or to the initiator alone in a non-virtualization environment).
   1 - The event is reported to the host and the manager.

If the field is set, the attached data MUST include the following phrase:

   Group_event:host ip

indicating to the manager that the event was also sent to the specified host.

5.3. Manager Interface

The manager interface includes SCSI commands and data. The iSCSI protocol encapsulates SCSI commands and responses from initiator to the target and vice versa. The host MUST initiate SCSI commands only to the manager. The manager replies with an iSCSI response that includes a header and attached data.
The iSCSI response header is as follows:

   -------------------------------------------------------
   | opcode      | U           |             |           |
   -------------------------------------------------------
   | length                                              |
   -------------------------------------------------------
   | reserved                                            |
   |                                                     |
   -------------------------------------------------------
   | initiator task tag                                  |
   -------------------------------------------------------
   | Residual Count                                      |
   -------------------------------------------------------
   | StatRN                                              |
   -------------------------------------------------------
   | ExpCmdRN                                            |
   -------------------------------------------------------
   | MaxCmdRn                                            |
   -------------------------------------------------------
   | comm. status| iSCSI status|                         |
   -------------------------------------------------------
   | res len                  | sense len                |
   -------------------------------------------------------

Figure 5-3. Response header

where the "iSCSI status" is as follows:

   0 - Good status
   1 - iSCSI check
   2 - Attached group commands

If the field value is 2, the data part includes the headers of the iSCSI commands that the host MUST execute. The data part of the response includes the following statements:

   Group_command:Disk_A,0
   Group_command:Disk_B,48

where the first word is the store name and the second is the pointer (in ASCII format) to the command in the data field of the response.

The host MUST execute the commands as ordered, directly to the stores. Each store receives the command and executes it (either read, write or other) according to the iSCSI protocol (RTT management and so on). At the end of the SCSI command, the store sends the status (of the local transaction) to the host and the manager, thus giving the manager full control and knowledge about the group.

6. Authors' Addresses

Yaron Klein
SANRAD
24 Raul Valenberg St.
Tel-Aviv, 69719 Israel
Phone +972-3-7659998
Email: klein@sanrad.com

7. References

[1] Virtual Storage Architecture Guide (VSAG) in Virtual Storage System (VSS) (P1244.6), http://www.ssswg.org/public_documents/swd/vsag_1.ps

[2] iSCSI (Internet SCSI), Julian Satran et al, in http://www.ietf.org/internet-drafts/draft-satran-iscsi-01.txt

[3] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.