Network Working Group                                          G. Gibson
Internet-Draft                            Panasas Inc. & Carnegie Mellon
Expires: April 18, 2005                                         B. Welch
                                                            Panasas Inc.
                                                              G. Goodson
                                                              P. Corbett
                                                  Network Appliance Inc.
                                                        October 18, 2004


          Parallel NFS Requirements and Design Considerations
                     draft-gibson-pnfs-reqs-00.txt

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 18, 2005.

Copyright Notice

Copyright (C) The Internet Society (2004).

Abstract

This draft specifies the requirements that should be satisfied in the definition of a parallel NFS protocol and the considerations recommended for its designs. It responds to the scalable bandwidth problem described in the pNFS Problem Statement, draft-gibson-pnfs-problem-statement-01.txt.
In the interest of a timely adoption of scalable bandwidth file service, parallel NFS is proposed to be an NFSv4 minor extension for communicating file layout available through existing and future storage subsystem protocols such as other NFSv4 file servers (NFS), block-based SCSI subsystems (SBC), and object-based SCSI (OSD) subsystems.

Table of Contents

   1.  Introduction
   2.  NFSv4 Minor Extension
   3.  Scalability
     3.1  Scalable Bandwidth
     3.2  Scalable Capacity
   4.  Interoperability
     4.1  NFSv4 Interoperability
     4.2  Storage Protocol Interoperability
     4.3  Separability of Storage Protocols
   5.  Concurrent Sharing
     5.1  Shared Direct Access to Storage
     5.2  Attribute Updates
     5.3  Client caching
   6.  Recovery
   7.  Security Considerations
     7.1  File Storage Access Protocols
     7.2  Object Storage Access Protocols
     7.3  Block Storage Access Protocols
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1.  Introduction

In many application areas, single system servers are rapidly being replaced by clusters of inexpensive commodity computers. As clustering technology has improved, the barriers to running application codes on very large clusters have been lowered. Examples of application areas that are seeing the rapid adoption of scalable client clusters are data intensive applications such as genomics, seismic processing, data mining, content and video distribution, and high performance computing.

The aggregate storage I/O requirements of a cluster can scale proportionally to the number of computers in the cluster. It is not unusual for clusters today to make bandwidth demands that far outstrip the capabilities of traditional file servers. A natural solution to this problem is to enable file service to scale as well, by increasing the number of server nodes that are able to service a single file system to a cluster of clients.

Scalable bandwidth can be claimed by simply adding multiple independent servers to the network. Unfortunately, this leaves to file system users the task of spreading data across these independent servers. Because the data processed by a given data-intensive application is usually logically associated, users routinely co-locate this data in a single file system, directory or even a single file.

The NFSv4 protocol currently requires that all the data in a single file system be accessible through a single exported network endpoint, constraining access to be through a single NFS server. A better way of increasing the bandwidth to a single file system is to enable access to be provided through multiple endpoints in a coordinated or coherent fashion. Separation of control and data flows provides a straightforward framework to accomplish this, by allowing transfers of data to proceed in parallel from many clients to many data storage endpoints.
Control and file management operations, inherently more difficult to parallelize, can remain the province of a single NFS server, inheriting the simple management of today's NFS file service, while offloading data transfer operations allows bandwidth scalability. Data transfer may be done using NFS or other protocols, such as iSCSI, under the control of an NFSv4 server with parallel NFS extensions. Such an approach protects the industry's large investment in NFS, since the bandwidth bottleneck no longer needs to drive users to adopt a proprietary alternative solution, and leverages SAN storage infrastructures, all within a common architectural framework.

This document sets requirements for extensions to the NFSv4 protocol, the parallel NFS extensions, to enable the extended NFSv4 server to manage clients that are enabled to directly access storage.

2.  NFSv4 Minor Extension

This document includes the definition of the requirements for protocol extensions to implement Parallel NFS. It is believed that this extension can fit within the minor-versioning framework of the NFSv4 protocol presented in RFC 3530.

NFSv4's minor-versioning requirement specifies that no changes are to be made to an existing operation's arguments or results (with the exception of GETATTR4). Also, new operations may only be added to the COMPOUND and CB_COMPOUND procedures.

Minor-versioning also requires that the Parallel NFS extension be compatible with all preceding NFSv4 minor versions. Accordingly, until a minor extension is accepted, its requirements may be impacted by the approval of another minor extension, although an impact like this by one minor extension on another is typically to be avoided.

3.  Scalability

3.1  Scalable Bandwidth

A principal purpose of parallel NFS is to enable clients of an NFS service to achieve individual and aggregate file and file system bandwidths that can scale with storage device, storage networking and client resources. The core point in the parallel NFS problem statement [1] is that bandwidth scaling is not provided by the existing NFS approach of forwarding all data through a single network endpoint associated with the NFS file server.

Parallel NFS must enable high bandwidth access by single clients and aggregates of clients, especially clusters of clients, into one file system, into possibly small and arbitrary collections of files, and into just one file.

Moreover, a parallel NFS solution for scalable bandwidth must enable an NFS client to directly and in parallel access a file, a possibly small and arbitrary collection of files, or a file system that is spread over multiple distinct network endpoints. That is, it must be possible for single files and collections of related files to be "striped" over physically different storage subsystems, each with its own network endpoint.

3.2  Scalable Capacity

Parallel NFS must enable the capacity of a single file, a possibly small and arbitrary collection of files, and a single file system to grow in proportion to the available storage resources. This reflects a recognition that when bandwidth scales, the size of the file(s) accessed should be expected to grow proportionately, and that striping over network endpoints is not required to be effective with arbitrarily small amounts of data residing at a single network endpoint.

This requirement does not supersede file and file system limitations on the size of an individual file or file system.

4.  Interoperability

4.1  NFSv4 Interoperability

Parallel NFS is an optional minor extension of NFSv4. Accordingly, any client capable of using the parallel NFS extensions must also be able to interoperate with an NFSv4 server that is not capable of using the parallel NFS extensions, and any NFSv4 server that is capable of using the parallel NFS extensions must also be able to provide full service for an NFSv4 client that is not capable of using the parallel NFS extensions.

4.2  Storage Protocol Interoperability

The protocols used by parallel NFS capable clients to directly access storage must be well defined, standards-based storage protocols.

In the interest of wider applicability of parallel NFS, the extensions to NFSv4 that enable and manage a client's opportunity to directly access storage subsystems must be agnostic to the actual storage protocol employed, and it must be possible for new storage protocols to be added to the set that a parallel NFS server supports.

It is anticipated that parallel NFS storage protocols will be defined using (possibly) non-parallel NFSv4 as a storage protocol, using block-based SCSI (SBC) as a storage protocol and using object-based SCSI (OSD) as a storage protocol. SBC and OSD SCSI storage protocols, in at least some implementations, are anticipated to employ an iSCSI storage transport protocol.

4.3  Separability of Storage Protocols

The interpretation of a layout, the bits a parallel NFS server gives to a parallel NFS client to enable the client to know how and where to directly access a file or file system striped over multiple storage network endpoints, is not needed for correct execution of the parallel NFS extension operations.

At least one instance of a parallel NFS layout format and storage access protocol must be fully specified and multiply implemented.

5.  Concurrent Sharing

5.1  Shared Direct Access to Storage

The parallel NFS extension should support shared access to storage by many clients. This includes access to the same storage devices by multiple clients, as well as access to the same files stored on one or more storage devices. The result extends the basic shared file system abstraction provided by NFS, giving clients direct access to storage devices under the overall control of an NFS server responsible for authorizing such direct access and delimiting its scope and duration.

The parallel NFS extension should allow clients to specify points in time at which updates must be made visible to other clients. Such a requirement is conducive to optimizations that can lead to high performance. It also complements the programming model used by parallel applications. In this model, individual clients compute independently, generate results, and then synchronize with the overall computation. When storing results to shared storage, it may be necessary to communicate with the NFS server to ensure that updates are visible to other clients. When making these updates visible, it is important for efficiency to limit the need for separate interactions with the server to those points that are truly required by the demands of the application.

5.2  Attribute Updates

File updates include changes to associated attributes, including the file size (i.e., end-of-file position), file modify time, file access time, and file change time. The parallel NFS extension allows updates to these attributes to follow the same model as data updates, in which updates are only guaranteed to be visible to other clients in response to explicit operations performed by the modifying client. The values of these attributes at other times may not be strictly defined.

The parallel NFS extension acknowledges that some implementations may provide looser semantics for file access time.
As well, the extension does not mandate strict implementation of the file access time attribute.

5.3  Client caching

The parallel NFS extension does not address issues around client caching and the coherency of data stored in different client caches. The extension assumes that the existing mechanisms that NFS clients use to manage their cached data apply equally when they use parallel NFS. Likewise, this extension should not prevent the implementation of a richer/stronger set of caching and coherency semantics.

6.  Recovery

Error recovery is often the aspect of a protocol in which interoperability is most difficult to achieve. For this reason these requirements place the most stringent demands on parallel NFS servers. But in the interests of performance and scalability, these requirements leave it open for client implementations to more fully participate in error recovery.

Specifically, it should be possible for client implementations using parallel NFS extensions to have very simple recovery actions, albeit probably with lowered performance, when coping with errors on the storage access protocols. Simple clients are envisioned to respond to errors on storage access protocols by immediately notifying the managing parallel NFS server of the error. Upon completion of the NFS server's recovery, simple clients should be able to complete the action causing the error by re-execution. To make this especially simple, it must be possible for a simple parallel NFS client to re-execute using only NFSv4 operations.

As a consequence of this recovery model, an operation, composed of one or more component actions, applied by parallel NFS clients directly on storage must be idempotent at the client level.
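
As an illustration of the simple-client recovery path described above, the following Python sketch tries a direct storage write once and, on error, notifies the server and re-executes the whole write through the ordinary NFSv4 path. Every name in it (StorageError, direct_write, report_error, nfs_write) is hypothetical and is not defined by this draft; report_error is assumed to return only after the server's recovery has completed.

```python
class StorageError(Exception):
    """Hypothetical error raised when the direct storage path fails."""


def write_with_simple_recovery(direct_write, report_error, nfs_write,
                               offset, data):
    """Try the direct path once; on failure, notify the managing server
    and re-execute the whole client-level write using only NFSv4."""
    try:
        direct_write(offset, data)
        return "direct"
    except StorageError as err:
        report_error(err)        # assumed to block until recovery is done
        nfs_write(offset, data)  # same offset, same bytes: a re-execution
        return "nfsv4"


# Toy stand-ins for the storage device and the NFSv4 server.
store = bytearray(8)
reported = []


def failing_direct_write(offset, data):
    # Simulate a partial transfer followed by a storage error.
    half = len(data) // 2
    store[offset:offset + half] = data[:half]
    raise StorageError("storage endpoint unreachable")


def nfs_write(offset, data):
    # Stand-in for an ordinary NFSv4 WRITE through the server.
    store[offset:offset + len(data)] = data


path = write_with_simple_recovery(failing_direct_write, reported.append,
                                  nfs_write, 2, b"abcd")
```

In this toy run the direct path leaves a half-written range behind, yet the stored bytes after recovery are exactly what a successful write would have produced. Re-execution is safe in the sketch only because the client-level write is idempotent.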
This is not a requirement for atomicity or transactions of the storage access protocol, only that it be possible to re-execute the client-level operation that experienced the error, possibly using different component operations directly on storage or through the parallel NFS server, and achieve the same transformation on stored information.

7.  Security Considerations

The parallel NFS extension must provide a level of security that is comparable to that defined in the NFSv4 specification. NFSv4 mandates end-to-end mutual authentication. All existing NFSv4 security mechanisms apply to the operations introduced by the parallel NFS extension. In all cases, this extension allows use of the direct NFSv4 path of sending both metadata and data requests through the metadata server.

The security model provided by all specified parallel NFS storage access protocols must be well documented. Various storage access protocols will have different security mechanisms that protect against different types of attacks. Access protocols that rely on trusted environments should not be foreclosed. However, protocols that provide strong security guarantees will be available.

7.1  File Storage Access Protocols

A file storage access protocol may have the same security mechanism between the client and metadata server as between the client and data server. ACLs set at the metadata server are effective at the data servers and need not be visible (via getattr) at the data servers.

7.2  Object Storage Access Protocols

An object storage access protocol may rely on a cryptographically secure capability to control accesses at the data servers. These capabilities can be generated by the metadata server after it checks access control for a client. They are returned to the client and passed to the object storage device, which verifies that the capability allows the requested operation.
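
One way to picture such a capability is as a set of fields protected by a keyed MAC. The Python sketch below assumes a secret shared between the metadata server and the object storage device, and a capability made of an object identifier, a set of permitted operations, and an expiry time. These fields, the key handling, and all function names are illustrative assumptions, not the format defined by the OSD standard or by this draft.

```python
import hashlib
import hmac

# Hypothetical secret shared by the metadata server and one storage device.
SHARED_KEY = b"example-key-shared-by-mds-and-osd"


def _signature(key, object_id, ops, expires):
    """Compute a keyed MAC over the capability fields."""
    message = "|".join([object_id, ",".join(sorted(ops)), str(expires)])
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def issue_capability(key, object_id, ops, expires):
    """Metadata server side: call only after access control has passed."""
    ops = frozenset(ops)
    return {
        "object_id": object_id,
        "ops": ops,
        "expires": expires,
        "mac": _signature(key, object_id, ops, expires),
    }


def verify_capability(key, cap, object_id, op, now):
    """Storage device side: check integrity, scope, and lifetime."""
    expected = _signature(key, cap["object_id"], cap["ops"], cap["expires"])
    if not hmac.compare_digest(expected, cap["mac"]):
        return False  # forged or altered capability
    if cap["object_id"] != object_id or op not in cap["ops"]:
        return False  # capability does not cover this request
    return now < cap["expires"]


# The client receives the capability and presents it with each request.
cap = issue_capability(SHARED_KEY, "obj-17", {"read"}, expires=1000)
read_allowed = verify_capability(SHARED_KEY, cap, "obj-17", "read", now=500)
write_allowed = verify_capability(SHARED_KEY, cap, "obj-17", "write", now=500)
```

In this sketch the client never holds the key, so it can present a capability but cannot widen its scope: changing the operation set or expiry invalidates the MAC. The storage device needs no per-client access control state, only the shared key.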

7.3  Block Storage Access Protocols

A block storage access protocol would rely on SAN-based security, and the trust that clients will only access the blocks they have been directed to use. There are LUN masking/unmapping and zone-based security schemes that can be manipulated to fence clients from each other's data. Block storage access protocols may provide no guarantee of data integrity, since any client can modify any data block to which it has physical access.

8.  IANA Considerations

The parallel NFS protocol extension provides for the naming of the specific storage access protocol. The storage access protocol's name is used by the client to interpret the layout information it receives from the metadata server. As well, the name specifies the storage access protocol to be used for accessing the data servers.

The namespace is separated into (at least) three ranges. First, a range of names reserved for future standards-based storage protocol specifications (e.g., a block, file, and object storage protocol standard). Second, a range of names reserved for vendor proprietary protocols. Third, a range of names reserved for non-approved protocols (e.g., custom in-house protocols or protocols used for testing).

Similar to NFSv4 named attributes, the parallel NFS protocol does not define the specific assignment of names to storage access protocols (nor does it define any specific storage access protocols). However, an IANA registry should be created for the registration of names in order to prevent collisions within the namespace. Along with the name, the format of the data layout and the storage access protocol should be well defined. The goal is to promote the interoperability of parallel NFS clients and servers.

9.  Acknowledgements

Many members of the pNFS informal working group have helped considerably. The authors would like to thank Andy Adamson, David Black, Gary Grider, Benny Halevy, Dean Hildebrand, Peter Honeyman, Dave Noveck, Julian Satran, and Tom Talpey.

10.  References

[1]  Gibson et al., "pNFS Problem Statement", July 2004.

Authors' Addresses

Garth Gibson
Panasas Inc. & Carnegie Mellon
1501 Reedsdale Street
Pittsburgh, PA 15233
USA

Phone: +1 412 323 3500
EMail: ggibson@panasas.com

Brent Welch
Panasas Inc.
6520 Kaiser Drive
Fremont, CA 94555
USA

Phone: +1 510 608 7770
EMail: welch@panasas.com

Garth Goodson
Network Appliance Inc.
495 East Java Drive
Sunnyvale, CA 94089
USA

Phone: +1 408 822 6847
EMail: goodson@netapp.com

Peter Corbett
Network Appliance Inc.
375 Totten Pond Road
Waltham, MA 02451
USA

Phone: +1 781 768 5343
EMail: peter@pcorbett.net

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.