nfsv4 D. Noveck Internet-Draft EMC Expires: September 30, 2011 P. Erasani L. Bairavasundaram NetApp P. Dai C. Karamonolis Vmware March 29, 2011 Storage Control Extensions for NFS Version 4 draft-dnoveck-nfsv4-storage-control-01 Abstract Developments in storage systems have made it important for applications to have control over the characteristics of the storage that will be used for their particular files. The development of pNFS has added to the usefulness of such control mechanisms as it has created the opportunity for the hierarchical organization of file names to be separated from the control of storage characteristics for individual files, including the assignment to storage locations to reflect the performance or other needs of those specific files. This document proposes extensions to NFS version 4 to allow storage requirements to be communicated to the NFS version 4 server. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 30, 2011. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. Noveck, et al. Expires September 30, 2011 [Page 1] Internet-Draft storage_ctl March 2011 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. Noveck, et al. Expires September 30, 2011 [Page 2] Internet-Draft storage_ctl March 2011 Table of Contents 1. Storage Control Issues . . . . . . . . . . . . . . . . . . . . 4 2. Storage Choice and API Definition . . . . . . . . . . . . . . 6 3. Modes of Storage Choice . . . . . . . . . . . . . . . . . . . 7 4. Assuring Extensability . . . . . . . . . . . . . . . . . . . . 8 4.1. Requirements for Extensability . . . . . . . . . . . . . . 8 4.2. XDR Encoding for Extensability . . . . . . . . . . . . . . 9 5. Storage Control . . . . . . . . . . . . . . . . . . . . . . . 11 5.1. Property Types . . . . . . . . . . . . . . . . . . . . . . 11 5.1.1. Informative Properties . . . . . . . . . . . . . . . . 11 5.1.2. Enforceable Properties . . . . . . . . . . . . . . . . 12 5.2. Base Property Specifications . . . . . . . . . . . . . . . 14 5.2.1. Storage Size . . . . . . . . . . . . . . . . . . . . . 15 5.2.2. Storage Use Duration . . . . . . . . . . . . . . . . . 16 5.2.3. Storage Device Failure Limit . . . . . . . . . . . . . 16 5.2.4. Storage System Failure Limit . . . . . . . . . . . . . 17 5.2.5. Storage System Failure RPO . . . . . . . . . . . . . . 17 5.2.6. Storage System Failure RTO Properties . . . . . . . . 17 6. Uses of the Attribute storage_ctl . . . . . . . . . . . . . . 19 6.1. Use of storage_ctl when creating a file . . . . . . . . . 19 6.2. Use of storage_ctl in SETATTR . . . . . . . . . . . . . . 20 6.3. Use of storage_ctl in GETATTR/READDIR . . . . . . . . . . 21 6.4. Use of storage_ctl in VERIFY/NVERIFY . . . . . . . . . . . 21 7. The FETCH_SCNOTE Operation . . . . . . . . . . . . . . . . . . 23 8. Attribute Extension . . . . . . . . . . . . . . . . . . . . . 25 8.1. Experimental and Other Non-standardized Extensions . . . . 25 8.2. Standardized Extensions . . . . . . . . . . . . . . . . . 26 8.3. The storage_ext attribute . . . . . . . . . . . . . . . . 26 9. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 9.1. Errors . . . . . . . . . . . . . . . . . . . . . . . . . . 27 9.2. Semantic constraints . . . . . . . . . . . . . . . . . . . 28 10. Possible Future Work . . . . . . . . . . . . . . . . . . . . . 30 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 31 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 32 Noveck, et al. Expires September 30, 2011 [Page 3] Internet-Draft storage_ctl March 2011 1. Storage Control Issues Storage to which files may be assigned can differ in a number of ways, raising the issue of how to control the choice of storage for specific files. The range of such choices is not static but can be expected to increase as flash memory becomes an option whose use needs to be controlled, or various choices of types of local caching need to be made. Although all files may well be helped by such approaches, the degree to which they will be helped will vary with the type of file and the typical application reference pattern for it. In addition, the value of improved access will differ with quick access to certain files being of much greater value, thereby justifying the allocation of more expensive storage resources to such files. The traditional way that user decisions regarding assignment of storage resources have been effected is by assigning specific file systems to specific disks or sets of disks. Files placed in that file system thereby get the storage characteristics assigned to that file system. Where file systems contain storage of various types, various heuristics are used to assign files or pieces thereof, to storage of various types, generally without any external input about application needs. The creation of pNFS modifies this pattern in that data and metadata are separated. Where pNFS is used, assigning a file to a specific file system now controls only where the metadata is located. Different files may have their data assigned to different sorts of storage, potentially located on different servers. This gives rise to the need for a means by which the storage choice for a particular file may be made. NFS version 4.1 contains a layouthint attribute but this does not really address the problem. The focus of the layouthint attribute is on the striping configuration, but there is a need to control storage characteristics other than this. This is the case even when there is only a single stripe (that is, no striping). Even though this is not "parallel NFS," using pNFS in this way to provide a separation of data and metadata, with the ability to choose locations for data based on its characteristics subject to later change in a user- transparent manner is very powerful, particularly if the storage location is subject to intelligent management. Additionally, more sophisticated storage management arrangements make it desirable to have a way to specify details for storage handling, even when pNFS is not used. When a file system contains different sorts of storage, input regarding desired or necessary storage characteristics can be used to make storage assignment choices more Noveck, et al. Expires September 30, 2011 [Page 4] Internet-Draft storage_ctl March 2011 in line with application needs. As a result, the ability to specify desired storage characteristics can provide benfits, both when pNFS is used and when it is not, although pNFS has the most immediate set of needs for means by which to control storage selection. Noveck, et al. Expires September 30, 2011 [Page 5] Internet-Draft storage_ctl March 2011 2. Storage Choice and API Definition It needs to be noted that existing API's may not provide means by which some of the storage characteristics described herein may be communicated to NFSv4 in-kernel clients and from there, to NFSv4 servers. Nevertheless, definition of a means by which these storage characteristics may be communicated to the NFSv4 server is still useful for a number of reasons: Embedded clients for particular applications may specify this information even without any API deinition. Client implementations may use various less-than-perfect ways of specifying storage characteristics, assigning storage chatcteristics based on file ownership or other nominally unrealated characteristics that that corelate well with customer intentions. Note that if the absence of a standard kernel API were sufficient to stop this work, it also probably be the case that the absence of a means to communicate the information to remote servers might make the definition of that API not worth the effort. By defining some storage characteristics and a general means of communicating them and others (via an extension mechanism) we allow for either: The later development of API's to specify these storage characteristics. The developemt of API's to specify different sets of storage characteristics that can then be easily assimilated to this mechanism as extensions. Noveck, et al. Expires September 30, 2011 [Page 6] Internet-Draft storage_ctl March 2011 3. Modes of Storage Choice There are a number of different ways in which storage choices may be indicated: o The specific file system location(s) might be specified. o Specific types of storage might be specified with selection of such choices as SSD, SATA, or fiber channel SAN drives being made by the client and effected by the MDS. o Desired characteristics of storage including speed (latency and/or throughput), amount of storage that will be needed, safety (raid- level). Available storage would be selected to meet the required characteristics and would be subject to active management as the environment changes. These different modes of storage choice are all useful in different environments. Specification of a specific file system imposes the least need for a storage management infrastructure but it requires user/application knowledge. The other modes imply a sequence of progressively greater infrastructure requirements to map specifications to specific storage systems and a correspondingly smaller need for user/application knowledge of the storage environment. However, such modes of operation are very different from existing storage management paradigms and the precise ways in which applications and storage might communicate are not fully understood. Noveck, et al. Expires September 30, 2011 [Page 7] Internet-Draft storage_ctl March 2011 4. Assuring Extensability 4.1. Requirements for Extensability As the examples of different modes of storage choice suggest, there are potentially a large number of specific items that might be specified in order to effect storage choice. Further, in many cases, expected future developments in the area of storage can be expected to extend and otherwise modify the characteristics which might be specified. The need for extensibility is important as one might expect many ongoing developments, including those in the areas of storage hardware, and file systems, to create corresponding needs to specify relevant storage chatacteristics. For example, local caching, including writeback caching using flash, creates the opportunity for greatly improved performance, at the risk of greater complexity in dealing with network failures. This raises the issue of allowing the user to make the choice of whether this greater performance is worth the risks and difficulties. Similarly, the development of distributed file systems raises many choices where performance will need to be balanced against various forms of safety issues, with specific choices reflecting the specific needs of applications dealing with the storage. These situations and others that we may not be able to predict, require that any attribute scheme in this area allow the specification of multiple storage characteristics with the ability to easily extend the specification so that it incorporates new characteristics to govern storage selection. Further, the need for actual use testing before incorporation in an IETF standard, imposes new requirements as far as organizing specification of the characteristics. Having "working code" to effect characteristic selection is not sufficient to demonstrate usefulness. The working code may be trivial while finding out whether this set of characteristics make sense for applications to use or requires extension or modification before assuming its final form is not trivial. This may require significant trial use among a large set of users running different applications, before the details are ready to be standardized. These factors increase the need for flexibility, including non- private use of characteristics not yet standardized. Accommodating this need for flexibility has the potential for unduly interfering with interoperability and the design of this feature will need to Noveck, et al. Expires September 30, 2011 [Page 8] Internet-Draft storage_ctl March 2011 avoid that. 4.2. XDR Encoding for Extensability While each storage property could conceivably be made its own attribute, the burden that this would place on the IETF process would be immense. There would be necessary co-ordination (and almost certain confusion) as individual experimental properties needed temporary attribute numbers and then had to shift them to other more permanent numbers. Further, and even more of an issue, storage property definition would seem to require a minor version, which seems too heavyweight. This would slow down the process beyond what should be for something which was its own standard-track RFC. In order to address these issues, individual properties will be treated as sub-attributes within a single storage_ctl attribute. To simplify assignment of sub-attribute numbers, mainly in support of experimental use, multiple sub-attribute spaces will be supported, to allow independent development of features each involving multiple storage properties. Once such a feature is standardized, the definition of the specific sub-atribute space could simply be made the subject of a standards-track RFC, with no change to those using it. typedef uin32_t spacenum_sc; /* Individual property space id. */ typedef uint32_t bitmap_sc<*>; /* Bit map for the presence or absence of individual properties using bit numbers assigned for the space. Like bitmap4. */ typedef opaque proplist_sc<*>; /* Data associated with each of the properties in the bitmap_sc. Like attrlist4. */ struct section_sc { spacenum_sc SpaceSection; /* Section number. */ bitmap_sc WhichProperties;/* Bit map of properties present. */ proplist_sc PropertyData; /* Data for each of the properties specified in this section. */ }; typedef section_sc fattr4_storage_ctl<*>; /* The attribute may have one or more property sections. */ This form of property encoding allows the property set to be extended without requiring a new minor version. Also, by allowing property Noveck, et al. Expires September 30, 2011 [Page 9] Internet-Draft storage_ctl March 2011 space numbers to be assigned, property sets can be developed independently, and converted to a standard state without undue interruption to those using the earlier form. Noveck, et al. Expires September 30, 2011 [Page 10] Internet-Draft storage_ctl March 2011 5. Storage Control Storage, along with compute, memory, and network, is an integral part of an application's resources. Much like the other types of resources consumed by an application, storage needs can be described using a set of properties. These properties may serve to describe the characteristics of the storage, the intended usage both temporal and spatial, quality of service expectations, physical layout over available storage media, data access locations, geographical distribution, just to name a few. The collection of such properties together define the control an application ultimately wants to have on storage; conversely, they enable the storage system to more effectively and dynamically meet the application's needs as specifically expressed, rather than inferred, based on fallible heuristics. Henceforth, we will use the term control to refer to the property collection. It is not difficult to conceive various storage properties. In fact, there are numerous of them, due to the diversity of applications and the corresponding workload characteristics, the ever increasing storage value-adds in the form of data services, and the fast changing business requirements. It is an impossible task to capture all of them here. Rather, the goal of this document is to define a framework in which new properties can be easily added and new semantics of the properties can be introduced as necessary without disruption. It is desired that they be capable of being used in more limited situations, refined as necessary. 5.1. Property Types There may be numerous storage properties as mentioned above. We need, however, to distinguish at least two types, namely, informative properties and enforceable properties. There may very well be other systems or criteria when it comes to the classification of storage properties; and extensibility shall apply in this case just as it does to adding new storage properties. However, there is a need to explicitly capture the distinctions between informative and enforceable properties in the data model, due to the impact on the storage protocol semantics. 5.1.1. Informative Properties An informative property, as the name suggests, provides some descriptive information about the storage in question. Such information is furnished in a single direction from the application to the storage system with absolutely no "contractual" implications. The storage system may use the information captured in such a property for storage optimization. But it is not obligated to do so. Noveck, et al. Expires September 30, 2011 [Page 11] Internet-Draft storage_ctl March 2011 More importantly, the application is not offered any transparency as to how the storage system may utilize this information. As such, the information flow is strictly one-way without the prospect for any feedback. Examples of informative properties are the access pattern of the storage in use, the expected capacity need, and the estimated growth rate. 5.1.2. Enforceable Properties In contrast, an enforceable property may have embedded in it varying degrees of binding effect. By that, it means the application specifying the property has expectations that the storage system not only acts upon but also conveys the action status back in some way. Unlike the case of an informative property, the information flow in this case is truly bi-directional, with the backward direction for monitoring property status, including information on whether a property has been satisfied or is in the process of being satisfied. In that sense, an enforceable property has a resemblance to an agreement, where one might monitor the performance of the other party. Applications seeking tighter control of the storage may resort to the enforceable properties. Examples of enforceable properties could include the type and speed of sorage but could also include the availability, reliability, and average throughput and latency. 5.1.2.1. Enforcement Level To allow varying degrees of control, an enforcement level may be associated with an enforceable property. There are two levels of control possible, namely, advisory and mandatory. Regardless of the level, the storage system should strive to fulfill an enforceable property. The difference lies in the treatment of an inability to do so. With an advisory enforcement level, the storage system shall continue to carry out the operation even if the property could not be fulfilled; whereas with mandatory, the storage shall fail the operation without making any modification. In any case, the failure to fulfill an enforceable property can be communicated to the application. 5.1.2.2. Compliance Status While control may suffice to describe the ultimate storage requirements, i.e., the intended behavior once it has been fully implemented, it does not by itself capture the dynamic aspects of the implementation process. This is encompassed by the concept of "compliance" which indicates the extent to which requested storage properties have or have not been provided or whether they are still Noveck, et al. Expires September 30, 2011 [Page 12] Internet-Draft storage_ctl March 2011 in the process of being provided. Note that the word "compliance" as used here has no connection with this word as used to describe issues conformance with a set of legal requirements for record-keeping, among other matters. Control implementation can be a fairly heavyweight process by nature due to the data intensity involved. This may be true whether it is during the initial provisioning of storage, or the subsequent change management, or the remediation of compliance violation. The data intensive nature of the control implementation process implies that the transition from non-compliance to compliance will not be instantaneous in the general case. In other words, the implementation process remains asynchronous relative to the operation that triggers it. The asynchronous nature of the control implementation process may be captured by the compliance status. The compliance status may have three different values, namely, Current, Complying, and Failed. The value Current represents a fully compliant state. The value Complying refers to a transient state in which the transition to current is in progress. The value Failed represents an indefinite state of non-compliance. In the last case, the storage system may have made the determination that it is unable to fulfill some or all of the storage properties given the physical resources available. The application will work without, but its performance may not be what is desired. The compliance status describes the state of the control fulfillment as it pertains to each property. It applies to an enforceable property only. Its presence is not a syntactic requirement as defined by the XDR specification. Depending on the operational context in which the enforceable property is specified, specification of compliance status may be either invalid, required, or optional with the specification of more that one such status values possible in some cases. 5.1.2.3. XDR Encoding for Enforceable Properties Enforceable properties contain a word which is of type enforce_sc and allows the enforcement level and compliance status to be specified. To allow greatest flexibility, all enforcement statuses and compliance status values are specified as bit values, allowing sets of enforcement levels and complicance status, to be specified, as appropriate. Noveck, et al. Expires September 30, 2011 [Page 13] Internet-Draft storage_ctl March 2011 typedef uint32_t enforce_sc; const enforce_sc ENFORCE_MANDATORY = 0x1; const enforce_sc ENFORCE_ADVISORY = 0x2; const enforce_sc ENFORCE_CURENT = 0x10; const enforce_sc ENFORCE_COMPLYING = 0x20; const enforce_sc ENFORCE_FAILED = 0x40; For most purposes, enforcement words should have a single enforcement level, either ENFORCE_MANDATORY ENFORCE_ADVISORY. Any enforcement word containing both bits will result in NFS4ERR_SCTL_BADENF being returned. Specification of an enforcement word containing neither will generally result in in NFS4ERR_SCTL_BADENF being returned. However, it may be specified, when doing a SETATTR that specifies a reserved empty parameter value to remove a property specification. Also, it may be specified when doing an VERIFY or NVERIFY to specify a property without a defined enforcement level. When specifying a storage property as part of a OPEN, CREATE. or SETATTR, no enforcement level bits should be specified. If they are, the error NFS4ERR_SCTL_BADENF is returned. For values returned by the server in response to GETATTR, enforcement words, containing exactly one compliance status bit will be returned. When using storage properties as part of VERIFY or NVERIFY compliance words containing no compliance bits or any subset of the valid compliance status bits may be specified. 5.2. Base Property Specifications The goal for initial inclusion in an NFS version 4 minor version is to define a small set of property specifications that are generally useful and do not require a large management infrastructure to implement. The following are the three property specifications that fit the description. Noveck, et al. Expires September 30, 2011 [Page 14] Internet-Draft storage_ctl March 2011 const spacenum_sc SCNUM_BASE = 1; /* Base property space id for all properties in this group. */ const uint32_t SCBASE_SIZE = 0; /* Informative property for size. */ const uint32_t SCBASE_DURATION = 1; /* Informative property for duration. */ const uint32_t SCBASE_DEVFAIL = 2; /* Enforceable property for a device failure limit. */ const uint32_t SCBASE_SYSFAIL = 3; /* Enforceable property for a system failure limit. */ const uint32_t SCBASE_FAIL_RPO = 4; /* Enforceable property for a recovery point objective in the event of failure. */ const uint32_t SCBASE_SFAIL_RTO = 5;/* Enforceable property for a recovery time objective in the event of system failure. */ const uint32_t SCBASE_DLOSS_RTO = 6;/* Enforceable property for a recovery time objective in the event of data loss. */ const uint32_t SCBASE_DISASTER_RTO = 7;/* Enforceable property for a recovery time objective in the event of disaster. */ 5.2.1. Storage Size The storage size is an informative property that allows the specification of the expected amount of storage to be needed. It may be used by the server in seeing if appropriate space is available and in reserving space. It is specified as a 64-bit unsigned value giving a quantity of storage expressed in bytes. typedef uint64_t propbase_size; This value may be different from the expected file size. Areas not allocated, because of holes for example, are not included. This amount of storage may not be required immediately if the file starts small and grows. Any derating of specified values is purely a matter of server implementation choice and will typically reflect the ability to move data to respond to storage overcommitment. A value of zero is invalid and would result in the error NFS4ERR_SCTL_BADPARM when used in an OPEN or CREATE. When used in SETATTR, it causes deletion of a previous storage size specification. Noveck, et al. Expires September 30, 2011 [Page 15] Internet-Draft storage_ctl March 2011 5.2.2. Storage Use Duration The storage use duration is an informative property that allows the specification of the amount of time that the storage is expected to be needed. It may be used in assigning files to storage so that space conflicts are reduced. It is specified as a 64-bit unsigned value giving a duration in milliseconds. typedef uint64_t propbase_duration; This allows times from 1 millisecond up to approximately 500 million years to be specified. A value of zero is invalid and would result in the error NFS4ERR_SCTL_BADPARM when used in an OPEN or CREATE. When used in SETATTR, it causes deletion of a previous storage duration specification. 5.2.3. Storage Device Failure Limit The storage device failure limit is an enforceable property that allows the specification of a number of disk drives (or other devices) that can fail simultaneously with no data loss and that incurs zero recovery time. It must be the case that any set of devices of the specified can fail without data loss and with zero recovery time. Even though there is no recovery time, there may be a significant recovery period of modestly reduced performance while adaptation to the failure is done and until the completion of which, additional device failures will be considered simultaneous. The limit is specified as a 32-bit unsigned value giving the minimum count of simultaneous failures that can result in data loss to clients accessing the file. Storage is assigned which either matches this specification or provides a greater value. When pNFS is involved the specification applies to storage for the MDS and each DS. typedef uint32_t prop_dev_fail_lim; struct propbase_device_failure_limit { enforce_sc DflEnforce; prop_dev_fail_lim DflLimit; }; This allows values from zero to approximately 4 billion to be specified. A value of zero is valid and specifies that data loss is tolerable in the event of single device failure. (e.g. RAID-0) Noveck, et al. Expires September 30, 2011 [Page 16] Internet-Draft storage_ctl March 2011 5.2.4. Storage System Failure Limit The storage system failure limit is an enforceable property that allows the specification of the number of storage systems that must be able to fail simultaneously without complete data loss. Storage is assigned which either matches this specification or provides a greater value. When pNFS is involved the specification applies to storage for the MDS and DS's as a unit. typedef uint32_t prop_sys_fail_lim; struct propbase_system_failure_limit { enforce_sc SflEnforce; prop_sys_fail_lim SflLimit; }; This allows values from zero to approximately four billion to be specified. A value of zero is valid and specifies data loss in the event of a single storage system failure is tolerable. 5.2.5. Storage System Failure RPO The recovery point objective (RPO) is the age of files that must be recovered from backup storage for normal operations to resume if a computer, system, device, or network failure results in data loss. The RPO is expressed backward in time (that is, into the past) from the instant at which the failure occurs, and can be specified in seconds. It is an important consideration in disaster recovery planning. typedef uint64_t prop_sys_fail_RPO; struct propbase_system_failure_RPO { enforce_sc SfrpoEnforce; prop_sys_fail_RPO SfrpoTime; }; This allows values from zero seconds to a value far beyond the age of the universe to be specified. A value of zero is valid and indiactes that a real-time backup that reflects changes immediately as made is required. 5.2.6. Storage System Failure RTO Properties Recovery time objective (RTO) properties specify is the maximum tolerable length of time that storage assigned may be unavailable in the event of various classes of failures. There are three associated properties, each of which specifies this value for a particular class Noveck, et al. Expires September 30, 2011 [Page 17] Internet-Draft storage_ctl March 2011 of failure: The system failure RTO property, with the property id SCBASE_SFAIL_RTO, defines the recovery time objective in the event of failures that do not not involve data loss or data corruption. The data loss RTO property, with the property id SCBASE_DLOSS_RTO, defines the recovery time objective in the event of failures that do not not involve the occurrence of a disaster, defined as a major environmental event such as a hurricane, earthquake, or flood, etc. The system failure RTO property, with the property id SCBASE_DISASTER_RTO, defines the recovery time objective in the event of any falure including disasters. The actual RTO is a function of the extent to which the interruption disrupts normal operations and the provisions made to ameliorate this situation. The desired RTO is a function of the urgency to re- establish operations and the consequences of failure to promptly do so. It is an important consideration in recovery planning. typedef uint64_t propbase_sys_fail_RTO; struct propbase_system_failure_RTO { enforce_sc SfrtoEnforce; prop_sys_fail_RTO SfrtoTime; }; RTO values for all of these properties is specified as a 64-bit integer which specifies a number of microseconds. Although sub- second RTO values may be difficult, the specification allows small values which might be useful in the future. The maximum value is approximately five-hundred thousand years. Noveck, et al. Expires September 30, 2011 [Page 18] Internet-Draft storage_ctl March 2011 6. Uses of the Attribute storage_ctl There are four occasions in which the storage_ctl attribute is referred to as part of an fattr4 when the storage_ctl mask is present. o As an attribute specified when creating a file or similar object by means of an OPEN or CREATE operation, in order to specify the specific storage properies to control the locations on which the data is to be put and other associated properties. o As an attribute set in a SETATTR operation to change the requested location properties. Servers may or may not have the ability to change locations on request, but the operation structure will indicate whether the server has or doesn't have this ability when it is requested. o As an attribute read in a GETATTR or READDIR operation to determine the currently requested storage properties and the degree to which they are current being complied with. o As an attribute specified in VERIFY or NVERIFY to test for current location property compliance status. In addition to the above, a fattr4_storage_ctl of the of the same structure as storage_ctl attribute (although not within an fattr) also appears within the response data in the following situations. For the OPEN, CREATE, and SETATTR operations, when the error returned is NFS4ERR_SCTL_FAIL. (See Use of storage_ctl when creating a file and Use of storage_ctl in SETATTR for details). For the response to the FETCH_SCNOTE operation, when there is a pending storage control note to be reported. For most purposes, a fattr4_storage_ctl which appears in OPEN, CREATE, and SETATTR requests are handled the same and a fattr4_storage_ctl which appears in the responses for OPEN, CREATE, and SETATTR are handled similarly, while the VERIFY and NVERIFY requests form a third similarity group. 6.1. Use of storage_ctl when creating a file When the storage_ctl attribute is specified when creating a file, it helps decide on the location selected for the file data. If all enforceable properties can be immediately satisfied, then the operation proceeds normally. Noveck, et al. Expires September 30, 2011 [Page 19] Internet-Draft storage_ctl March 2011 If an enforceable property specified as with the manadatory enforcement level cannot be satisfied then the operation fails with the error NFS4ERR_SCTL_FAIL. The response contains, for the case NFS4ERR_SCTL_FAIL, a fattr4_storage_ctl value which consists all such enforceable properties which could not be satisfied. If there is a situation which is not as serious as the failure above, but still of note, then information relevant to that situation is stored as a pending storage control note, where it can be fetched (in the same COMPOUND) by the FETCH_SCNOTE operation. The following three classes of items are included in situations leading to a pending storage control note being created. o An enforceable property of the advisory enforcment level which could not be satisfied, i.e its compliance status is indicated as failed. o An enforceable property of the advisory enforcement level which could not be immediately satisfied, i.e. its compliance status is indicated as Complying. o An enforceable property of the mandatory enforcement level which could not be immediately satisfied, i.e. its compliance status is indicated as Complying. 6.2. Use of storage_ctl in SETATTR A value of the storage_ctl attribute with a structure similar to the OPEN case is used to change properties for an existing file. Existing elements properties, not changed by the storage_ctl attribute remain in effect. An enforceable property type and the same enforcement level status is overridden by a corresponding one in the new attributes. To delete such an enforceable property element without setting a new one, an enforceable property with no parameter values is used. Similarly, an informative property will override an existing one of the same type and use of the that property specification with no parameters is used to delete an existing informative propety specification without replacing it. Failures and notifications are indicated via the error code NFS4ERR_SCTL_FAILED and creation of pending storage control notes, just as in the case of OPEN. Noveck, et al. Expires September 30, 2011 [Page 20] Internet-Draft storage_ctl March 2011 6.3. Use of storage_ctl in GETATTR/READDIR When the storage_ctl attribute is requested as part of GETATTR or READDIR, the fattr4_storage_ctl returned within the file attributes reflects the current informative properties together with the enforceable properties and together with its current compliance status. The order of the elements need not reflect that used when the attribute was first set. When enforceable properties specify a range of multiple possible values, the one returned in the attribute will reflect the value actually assigned. 6.4. Use of storage_ctl in VERIFY/NVERIFY The storage_ctl attribute presented to VERIFY or NVERIFY is interpreted as a series of properties each of which results in a truth value. When the truth value for all properties presented is true, VERIFY succeeds and NVERIFY fails. Conversely when not all properties have that truth value, VERIFY fails and NVERIFY succeeds. When informative properties are present they are compared to the value set at OPEN, CREATE, or the last SETATTR. If no such value had been previously set, the result is treated as non-matching. Enforceable properties are classified according to three criteria: o Whether they have parameters that indicate specific values (With-P) or are the special values defined for that purpose for each parameter, which are treated as without parameters (Non-P) where the parameter values taken are those specified in the corresponding property within the file's attributes. o Whether they are, an enforcement level specified (With-Enf) or not (Non-Enf). o Whether they are together with one or more compliance level levels specified (With-Comp) or not (Non-Comp). Given the above classifications, the following sets of characteristics for enforceable properties in the context of storage_ctl for VERIFY, NVERIFY are treated as errors and should cause the return of the error NFS4ERR_SCTL_BAD. o Non-Comp/Non-Enf/Non-P o Non-Comp/Non-Enf/With-P Noveck, et al. Expires September 30, 2011 [Page 21] Internet-Draft storage_ctl March 2011 o With-Comp/non-Enf/Non-P o With-Comp/With-Enf/With-P Given the above classifications, the following sets of characteristics for enforceable properties in the context of storage_ctl for VERIFY, NVERIFY are handled as discussed below. Non-Comp/With-Enf/Non-P: is true iff there exists an enforceable property containing elements of the associated enforcement status as part of the storage_ctl attribute of the file. Non-Comp/With-Enf/With-P: is true iff the enforceable proeprty specified is compatible with the corresponding enforceable property of the associated enforcement level, i.e. if it is possible to satisfy both at the same time, without reference to whether both or either actually is satisfied. With-Comp/Non-Enf/With-P: is true iff the enforceable property (including a set of of property specifications of the same type) which appear in the storage_ctl attribute passed to the op is consistent with the set of compliance levels (often a single level but sometimes two) in the specification. That is, the actual compliance level must be one of the ones that is specified. With-CompB/With-Enf/Non-P: is true iff the enforceable property designated by this specification (i.e. that being of the same type of specification and the same enforcement level) is consistent with the set of compliance levels (often a single level but sometimes two) in this specification. That is, the actual compliance level must be one of the ones that is specified. Noveck, et al. Expires September 30, 2011 [Page 22] Internet-Draft storage_ctl March 2011 7. The FETCH_SCNOTE Operation 7.1. SYNOPSIS (cfh) -> note_pres, note_fattr 7.2. ARGUMENT /* CURRENT_FH: */ void; 7.3. RESULT enum SCFres_type { SCFres_ABSENT = 0, SCFres_PRESENT = 1 }; union SCFresok switch (SCFres_type note_pres) { case FETCH_PRES: fattr4_storage_ctl note_attr; case FETCH_ABS: void; }; union FETCHres switch (nfsstat4 status) { case NFS4_OK: /* CURRENT_FH: opened file */ FETCH4resok resok4; default: void; }; 7.4. DESCRIPTION The FETCH_SCNOTE operation is used to fetch a pending storage control note for a specified file handle (the current file handle). Note that these notes are stored according to the current file handle when the operation which gave rise to them was executed. Thus it will be the directory on (most) OPENs, and the specific file in the event of SETATTR. This operation uses the current filehandle value to identify the storage control note being sought. The operation returns an indication of whether the note is present Noveck, et al. Expires September 30, 2011 [Page 23] Internet-Draft storage_ctl March 2011 and if it is a fattr4_storage_ctl value which consists all enforceable properties where there is a lack of adequate compliance to be noted. The use of the the enum scnote_respval rather than a boolean value allows later extension. If the note is present, it ceases to be so once the operation is executed. 7.5. IMPLEMENTATION Storage control note items are maintained on a per-COMPOUND-request basis and cease to exist when a COMPOUND fails due to completion or an the occurrence of an error. This makes it desirable to place the FETCH_SCNOTE operation close to, generally immediately after the operation capable of generating the storage control note. Noveck, et al. Expires September 30, 2011 [Page 24] Internet-Draft storage_ctl March 2011 8. Attribute Extension 8.1. Experimental and Other Non-standardized Extensions In order to support development of extensions to allow control of new file system support attributes, extensions may be defined, each with their own proper space id. The goal is to allow quick deployment of new features, including those that are vendor-specific at the time with the definitions of extensions being publicly available. Each such extension set should be registered with IANA. The registration will include o A short name (a few words) by which the extension will be known. o The name or corporate identity of the owner of the extension. o Data for the first version of the namespace extension, as described below. IANA will assign a space id by which the extension will be known. Successive versions of spaceid properties should be registered by the owner of the extension. The registration should include: o The namespace name and number. o The namespace version number. The version number is in the form a series of small (< 256) integers. The length of the series will probably be restricted to something between four and six. The version numbers will not be checked for order but only that they are unique for a given extension. o A document in the form of an internet draft with information on the namespace elements paralleling this one. The document will contain definitions and property numbers with the space id for all of properties within the extension. Successive version may add properties but may not delete them, clarifications to the semantics of existing properties may be made but substantive changes in their semantics should not be made. Existing properties may not be defines as invalid or mandatory-to- not-implement but they may be defined as incompatible with some set of new properties. The definitional document should be subject to expert review but the purpose of the review is to ensure that the document describes the Noveck, et al. Expires September 30, 2011 [Page 25] Internet-Draft storage_ctl March 2011 extension adequately. It should not be rejected simply because the expert would do things differently or believe the specified properties are useful. 8.2. Standardized Extensions Storage properties may be extended via a standards-track document in a number of ways. Such an extension may be part of a new minor version, but may also be done independent of in a standards-track document other than for a new NFSv4 minor version. When the extension occurs in a new minor version the document should make clear whether the additional properties are recommended (as is normally the case) or mandatory. The following forms of extension are all valid options: Adding additional properties to existing standardized property set such as PROP_BASE. Creating a new property set its own property set id. Converting a previous experimental property set to standards-track status based on the publication of the RFC [Need to clarify any possible transfer of ownership issues.] 8.3. The storage_ext attribute The storage_ext attribute is a per-fs attribute which contains information on the storage_ctl extensions suported by the server when used on the associated file system. Servers will often report the same value of the storage_ext attribute for all file systems, but client should not assume that this is the case. struct section_se { spacenum_sc SpaceSction; /* Section number. */ bitmap_sc WhichProperties;/* Supported properties. */ }; typedef section_se fattr4_storage_ext<�>; The storage_ext attribute consists of section_se arrays, each of which specify the supported properties for a specific space_id. The section_se arrays should be reported in ascending numeric order of spacenum_sc values. Noveck, et al. Expires September 30, 2011 [Page 26] Internet-Draft storage_ctl March 2011 9. Summary This chapter serves a reference guide to things discussed above. For a more discursive treatment, with less attention due syntax details, see above. 9.1. Errors This proposal would involve adding the following new errors to the NFS version 4 minor version in which it is included. NFS4ERR_SCTL_BADPROP Returned when the storage_ctl attribute contains properties with a space id unknown to the server, or with property bits whose diplacement in the bitmap corresponds to property numbers not known to the server as being associated with the current space id. This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and NVERIFY. NFS4ERR_SCTL_BADPARM Returned when the storage_ctl attribute contains parameters defined as not valid in connection with the current property. This includes situations in which multiple properties contain values that are defined as inconsistent (as opposed to not being satisfiable). This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and NVERIFY. NFS4ERR_SCTL_BADENF Returned when the the storage_ctl attribute contains a enforceable property whose enforce_sc is invalid, in that it contain multiple enforcement level bits, contains no enforcement level bits, in a context in which that is not allowed or contains a set of compliance specification bits that is not appropriate in the current context. This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and NVERIFY. NFS4ERR_SCTL_BADDATA Returned when the storage_ctl contains a section_sc whose PropertyData array does not match the length of the properties specified in the associated WhichProperties. This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and NVERIFY. Noveck, et al. Expires September 30, 2011 [Page 27] Internet-Draft storage_ctl March 2011 NFS4ERR_SCTL_FAIL Returned when a required storage_ctl element cannot be satisfied. This is as opposed to the case in which it is not being able to be satisfied immediately but is in the process of being satisfied. This error is returnable by OPEN, CREATE, and SETATTR only. 9.2. Semantic constraints This section lists the semantic contraints on property specifications. We will have situations in which the attribute will fully match specified XDR specification but the specification will not be in line with appropriate contextual constraints. This section will list those constraints, in order to complement the XDR definition above. There are four categories of constraints that need to be dealt with: o Whether the properties have the associated parameters specified. o Whether the properties have an associated enforcement level specified. o Whether the properties have associated compliance level(s) specified. o Constraints that involve the validity of combinations of what are otherwise allowed situations with regard to the above. Each property specifies a particuar value which is invalid and is to be treated as inicateing the absence of property parameters (zero values, zero-length arays, etc.). Specification of the parameters associated with storage properties are generally required and so these special value result in NFS4ERR_SCTL_BADPARM being returned. The only exceptions are SETATTR, for which a storage property without parameters serves to delete the corresponding storage propery in the existing attribute, and VERIFY/NVERIFY where it is allowed under some circumstances, to be discussed below. Specification of the enforcement level is generally required for enforceable properties. The only exception is VERIFY/NVERIFY where it is allowed under some circumstances, to be discussed below. Specification of the compliance status for enforceable properties depends on the context in which the properties appears. For OPEN, CREATE, and SETATTR, specification of compliance status is not allowed. VERIFY/NVERIFY specification of multiple compliance status values is allowed, subject to the specific combination constraints Noveck, et al. Expires September 30, 2011 [Page 28] Internet-Draft storage_ctl March 2011 appropriate to VERIFY and NVERIFY as listed below. For all other contexts, whether in GETATTR, READDIR, the responses in the NFS4ERR_SCTL_FAIL case, or in the response to the FETCH_SCNOTE operation, specification of compliance status is required but only a single compliance status must appear. In addition to the constraints listed above, in the case of a storage_ctl attribute within VERIFY/NVERIFY, the properties within the attribute must meet the additional constraints described in the section Use of storage_ctl in VERIFY/NVERIFY When sending responses to GETATTR, READDIR, OPEN, CREATE, and SETATTR, the server MUST obey these constraints. When receiving OPEN, SETATTR, VERIFY, and NVERIFY requests that contain the storage_ctl attribute, the server MUST return the error NFS4ERR_SCTL_BADENF if the attribute does not follow the specified constraints and is otherwise valid (matching the XDR property deinition). These constraints apply to properties introduced by extensions to the storage_ctl attirbute unless explicitly overridden in the document defining the extension. Such a document may add other contextual constraints that apply to the properties defined by that extension. Noveck, et al. Expires September 30, 2011 [Page 29] Internet-Draft storage_ctl March 2011 10. Possible Future Work This document describes a basic framework for storage control and a basic set of properties. It is a base for development of this feature and could have considerable additions before incorporation in NFSv4 an minor version. On the other hand, the feature is intended to be defined with sufficient flexibility that many of these additions to the feature might be done as subsequent extensions, after the basic feature is made part of an NFSv4 minor version. The question of which additions are required for an initial version of the feature, which are best deferred to later and which proposed extensions don't really belong is a complex one and will be a major subject of the development of the feature. The following list, illustrates some of the possible additions that have had some preliminary discussion. It is not intended to be exhaustive, and the examination of other additions not yet thought of is definitely part of the work to be done: Addition of other properties to those in this document, that make sense as a basic set of properties, both informative and enforceable, for an initial set to be part of an NFSv4 minor version. Mechanisms to allow a set of properties to be applied to a large set of files, including those that are directory-based (with inheritance a possible part of the mix), by bulk attribute change on a client-specified set of files, or by allowing the client to store some set of properties as a persistent object in file system, and allowing subsequent storage control attributes to reference that persistent object. Mechanisms to enable the client to determine possible choices (or ranges) for some properties within the context of a given server. This would be to simplify and streamline property negotation. Mechanisms by which a server could advertise various possible sets of property choices to deal with environments where there only exists a small set of possible choices each effecting a particular choice for many properties, as opposed to a case where multiple independent property choices are possible. Noveck, et al. Expires September 30, 2011 [Page 30] Internet-Draft storage_ctl March 2011 11. Acknowledgments Mike Eisler reviewed early drafts of this work and made important contributions in helping define the direction of the effort. David Black reviewed many drafts of this work and made many helpful suggestions that improved the quality of the result. Noveck, et al. Expires September 30, 2011 [Page 31] Internet-Draft storage_ctl March 2011 Authors' Addresses David Noveck EMC 228 South St. Hopkinton, MA 01748 US Phone: +1 508 249 5748 Email: david.noveck@emc.com Pranoop R. Erasani NetApp 48980 Oat Grass Terrace Fremont, CA 94539 US Phone: +1 408 822 3282 Email: pranoop@netapp.com Lakshmi N. Bairavasundaram NetApp 475 East Java Drive Sunnyvale, CA 94089 US Phone: +1 408 419 5616 Email: lakshmib@netapp.com Peng Dai Vmware 5 Cambridge Center Cambridge, MA 02142 US Phone: +1 617 528 7592 Email: pdai@vmware.com Noveck, et al. Expires September 30, 2011 [Page 32] Internet-Draft storage_ctl March 2011 Christos Karamonolis Vmware 3401 Hillview Ave. Palo Alto, CA 94304 US Phone: +1 650 427 2329 Email: ckaramonolis@vmware.com Noveck, et al. Expires September 30, 2011 [Page 33]