INTERNET DRAFT              EXPIRES JULY 1998              INTERNET DRAFT

Network Working Group                                       M. McKinlay
Internet Draft                            Cumulus Data Systems (UK) Ltd
Category: Informational                                    January 1997

Proposal for the object-oriented, cross-platform filesystem (OFS)

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Table of Contents

1. Introduction
2. Overview of the object-oriented filesystem
3. Structure of the sub-filesystem (SFS) layer
   3.1 Filesystem identification
   3.2 Filesystem information
   3.3 Bootstrap code
   3.4 How objects are allocated using the SFS layer
   3.5 Example
   3.6 Note
4. Object storage layer (OSL) structure
   4.1 ID
   4.2 Parent
   4.3 Owner
   4.4 Group
   4.5 Start
   4.6 Flags
      4.6.1 Object type
         4.6.1.1 Normal object
         4.6.1.2 Directory object
         4.6.1.3 Block and character devices
         4.6.1.4 Link
         4.6.1.5 User and group
         4.6.1.6 Class
      4.6.2 Access modifiers
   4.7 Name
   4.8 How the object table expands and shrinks
5. Compatibility and cross-platform issues
6. Filesystem maintenance tools
   6.1 Filesystem check
   6.2 Defragmenter
7. Security considerations
8. The virtual OFS
9. Author's address

Proposal for the object-oriented,                          January 1998
cross-platform filesystem (OFS)

1. Introduction

This memo is being distributed to members of the Internet community in order to solicit their reactions to the proposals contained in it. While the issues discussed may not be directly relevant to the research problems of the Internet, they may be interesting to a number of researchers and implementers.

This memo attempts to lay out the structure and programming methodology required to implement the object-oriented filesystem (OFS) on a range of differing computing platforms.

Currently, there are many different methods of depositing objects within computer storage media, on a variety of computing platforms.
Each has its own advantages and disadvantages, but it is very rare to find a single type of filesystem used on a range of different platforms.

Since the Internet and intranets have recently gained in popularity in home, academic and corporate use, it has become far more common to transfer objects between different types of computer system. Because of this, the OFS, as defined by this proposal, was designed with cross-platform compatibility in mind.

This document covers the following areas:

   o Overview of the object-oriented filesystem
   o Structure of the sub-filesystem (SFS) layer
   o Object storage layer (OSL) structure
   o Compatibility and cross-platform issues
   o Filesystem maintenance
   o Security considerations
   o The virtual OFS

2. Overview of the object-oriented filesystem

The object-oriented filesystem (OFS) is fundamentally split into two sections: the sub-filesystem (SFS) layer, and the object storage layer (OSL), which is the area of the computer media where the actual objects are stored.

Because the two layers do not overlap, it is not strictly necessary for them to be kept distinct. Having said this, it can be useful, for the sake of modularity and flexibility, to keep them separate.

The SFS layer handles only block allocation, filesystem identification and bootstrap code. It is therefore possible to implement a completely different type of filesystem on top of the sub-filesystem layer of the OFS. Similarly, it is possible to make use of the OSL with a completely different sub-filesystem layer underneath it.

This inherent modularity allows developers to be very flexible whilst still retaining compatibility, even when the SFS layer is customised to work with different platforms.

3.
Structure of the sub-filesystem (SFS) layer

The SFS layer, as briefly explained previously, is split into the following sections:

   o Filesystem identification and information
   o Block allocation
   o Platform-specific bootstrap code

3.1 Filesystem identification

The filesystem is identified by a sixteen-byte identification block located at the very beginning of the storage media. The first eight bytes are ignored for identification purposes; the last eight mark the filesystem as OFS-compatible.

The first eight bytes should usually contain a platform-specific instruction to jump to the start of the bootstrap code and execute it. The eight bytes that follow them contain the following:

   -------------------------------------------------------
   Byte  |  8  |  9  |  A  |  B  |  C  |  D  |  E  |  F  |
   ------|-----|-----|-----|-----|-----|-----|-----|-----|
   Value | 'M' | 'i' | 'm' | 'a' | 's' | 'F' | 'S' | NUL |
   -------------------------------------------------------

This is simply the string 'MimasFS' followed by a single null byte (ASCII code 00). This string must be located on the media starting at the ninth byte (byte 8), and ending at the sixteenth byte (byte 15, or byte Fh).

3.2 Filesystem information

The filesystem information table immediately follows the identification block and has the following structure:

   Byte   Contents                                 Size
   --------------------------------------------------------------------
   10h    Filesystem size (in 4 KB blocks)         32 bits (4 bytes)
   14h    Start of second layer (in 4 KB blocks)   32 bits (4 bytes)
   18h    Platform-specific data                   64 bits (8 bytes)
   --------------------------------------------------------------------

The filesystem size (in blocks) is calculated by taking the size of the filesystem (in kilobytes) and dividing by four. For example, for a 2000 KB filesystem, the filesystem size value would be calculated as:

   Filesystem size = 2000 / 4 = 500 blocks

Hence the value entered into the first field would be 500.
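
The identification check of section 3.1 and the size calculation above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the use of floor division for sizes that are not a multiple of four kilobytes is an assumption, since this memo does not define that case.

```python
# Illustrative sketch only; names are hypothetical, not defined by this memo.

OFS_SIGNATURE = b"MimasFS\x00"   # bytes 8..15 of the media (section 3.1)
BLOCK_SIZE_KB = 4                # one block is 4 kilobytes


def is_ofs(media: bytes) -> bool:
    """Return True if bytes 8..15 hold 'MimasFS' followed by a null byte.

    Bytes 0..7 are deliberately not examined: they normally hold a
    platform-specific jump to the bootstrap code.
    """
    return len(media) >= 16 and media[8:16] == OFS_SIGNATURE


def filesystem_size_blocks(size_kb: int) -> int:
    """Filesystem size field: size in kilobytes divided by four.

    Assumption: sizes that are not a multiple of 4 are rounded down.
    """
    return size_kb // BLOCK_SIZE_KB
```

With these definitions, filesystem_size_blocks(2000) gives 500, matching the worked example, and is_ofs() accepts any media whose ninth to sixteenth bytes carry the signature, whatever the first eight bytes contain.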
The block number of the start of the second layer is calculated by adding together:

   o the size of the identification table;
   o the size of the information table;
   o the number of blocks multiplied by four (the space taken by the block table, at one 32-bit entry per block);
   o the size of the platform-specific bootstrap code.

This total is divided by 4096 and rounded up to the nearest whole number. The result is then incremented by one, because blocks are numbered from 1, not 0 (0 indicates an unused block in the allocation table).

The remaining 64 bits are reserved for platform-specific use that is not defined by this memo.

3.3 Bootstrap code

The bootstrap code is entirely platform-specific and, when executed, should normally perform one of the following tasks:

   o display a message to the user that the media contains no operating system; or
   o load the operating system from the media into the computer's memory.

3.4 How objects are allocated using the SFS layer

This section describes how the sub-filesystem layer handles the allocation of blocks for object storage.

Fundamentally, the block table stores the information relating to the 'chains' of blocks which comprise each object. As an example, the first few entries in an empty (excluding a 1-block SFS layer) block table would look like this:

   Entry number        Value
   ------------------------------------------
   00000001            00000001
   00000002            00000000
   00000003            00000000
   00000004            00000000
   00000005            00000000
      :                   :
   nnnnnnnn            00000000
   ------------------------------------------

The first entry in the table, relating to the first block of the filesystem, has its value set to its own block number. This signifies that it is the last block in an object chain. In this case, it is the only entry in the chain (it is a special entry, marked as allocated, to prevent operating systems from overwriting the SFS layer).

The remaining entries in the table have the value 00000000, indicating that they are unused.
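
The start-of-second-layer calculation from section 3.2 and the empty block table shown above can be sketched in Python as follows. The function names and the dictionary representation are hypothetical. The sketch assumes that each block-table entry occupies four bytes and that an SFS layer spanning more than one block is marked as a single chain; this memo only shows the one-block case.

```python
import math

# Illustrative sketch only; names and representation are hypothetical.

BLOCK_SIZE = 4096        # bytes per block
ID_TABLE_SIZE = 16       # identification block, bytes 0..15 (section 3.1)
INFO_TABLE_SIZE = 16     # 4 + 4 + 8 bytes (section 3.2)
ENTRY_SIZE = 4           # assumed: one 32-bit block-table entry per block
UNUSED = 0               # value of an unused block-table entry


def second_layer_start(num_blocks: int, bootstrap_size: int) -> int:
    """Block number at which the second layer starts (blocks count from 1)."""
    sfs_bytes = (ID_TABLE_SIZE + INFO_TABLE_SIZE
                 + num_blocks * ENTRY_SIZE + bootstrap_size)
    return math.ceil(sfs_bytes / BLOCK_SIZE) + 1


def empty_block_table(num_blocks: int, sfs_blocks: int = 1) -> dict:
    """Block table of an empty filesystem, as {entry number: value}.

    The blocks holding the SFS layer are marked as allocated; every other
    entry is unused. Assumption: a multi-block SFS layer forms one chain
    1 -> 2 -> ... whose last block points to itself.
    """
    table = {n: UNUSED for n in range(1, num_blocks + 1)}
    for n in range(1, sfs_blocks):
        table[n] = n + 1
    table[sfs_blocks] = sfs_blocks
    return table
```

For a 500-block filesystem with, say, 512 bytes of bootstrap code, the SFS layer needs 16 + 16 + 2000 + 512 = 2544 bytes, which rounds up to one block, so the second layer starts at block 2 and the resulting table matches the one shown above.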
The number of blocks needed to store an object is calculated by taking the size of the object, in bytes, dividing by 4096, and then rounding up to the nearest whole number.

Each block is allocated by the operating system searching the block table for an unused entry (value 00000000). There are various methods of doing this, some dependent on the size of the object, but they are not defined by this memo.

Each entry in the block table normally contains the number of the next block in its respective chain. The exceptions are the last block in a chain, whose value is its own block number, and an unused block, whose value is 00000000.

No block may be a member of more than one chain, and a chain cannot contain the same block more than once. See section 6.1, Filesystem check, for more information on the rules governing blocks and chains.

3.5 Example

As an example, let us say that we have a 6 kilobyte object. The number of blocks required to store this object would be 2 (6 / 4 = 1.5, rounded up = 2).

The operating system allocates two blocks. For this example, let us say that blocks 00000012 and 00000015 were allocated by the operating system for use by this object. After the appropriate modifications to the object table have been made, the resultant block table would look similar to this:

   Entry number        Value
   ------------------------------------------
   00000001            00000001
      :
   00000012            00000015
      :
   00000015            00000015
      :
   nnnnnnnn            00000000
   ------------------------------------------

The value in entry 00000012 indicates that the next block in the chain is block 00000015. The value in entry 00000015 is the same as its entry number, indicating that it is the last block in the chain.

Let us say that the object grows by 12 kilobytes (and hence three blocks). The operating system allocates three new blocks, numbers 00000024, 00000032 and 00000047.
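
The allocation rules of section 3.4 and the two allocation steps of this example can be sketched in Python as follows. The function names are hypothetical, the block table is modelled as a dictionary, the first-fit search is just one of the many possible strategies that this memo leaves open, and the entry numbers are assumed to be hexadecimal.

```python
import math

# Illustrative sketch only; names and representation are hypothetical.

BLOCK_SIZE = 4096
UNUSED = 0


def blocks_needed(size_bytes: int) -> int:
    """Object size in bytes divided by 4096, rounded up."""
    return math.ceil(size_bytes / BLOCK_SIZE)


def find_unused(table: dict, count: int) -> list:
    """First-fit search for unused entries (one possible strategy)."""
    free = [n for n in sorted(table) if table[n] == UNUSED]
    if len(free) < count:
        raise OSError("not enough unused blocks")
    return free[:count]


def link_chain(table: dict, blocks: list, tail=None) -> None:
    """Link 'blocks' in order; if 'tail' is given, append them to the
    chain that currently ends at 'tail'."""
    if not blocks:
        return
    if tail is not None:
        if table[tail] != tail:
            raise ValueError("tail is not the last block of a chain")
        table[tail] = blocks[0]
    for current, following in zip(blocks, blocks[1:]):
        table[current] = following
    table[blocks[-1]] = blocks[-1]     # last block points to itself


def walk_chain(table: dict, start: int) -> list:
    """Return the block numbers of a well-formed chain, in order."""
    chain = [start]
    while table[chain[-1]] != chain[-1]:
        chain.append(table[chain[-1]])
    return chain


# The example above: a 6 KB object in blocks 12h and 15h, which then
# grows by 12 KB into blocks 24h, 32h and 47h.
table = {n: UNUSED for n in range(1, 0x50)}
table[1] = 1
link_chain(table, [0x12, 0x15])
link_chain(table, [0x24, 0x32, 0x47], tail=0x15)
```

Here the block numbers are passed in explicitly so that the sketch follows the example; an operating system would normally obtain them from a search such as find_unused().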
After the appropriate entries have been modified, the block table would look similar to this:

   Entry number        Value
   ------------------------------------------
   00000001            00000001
      :
   00000012            00000015
      :
   00000015            00000024
      :
   00000024            00000032
      :
   00000032            00000047
      :
   00000047            00000047
      :
   nnnnnnnn            00000000
   ------------------------------------------

Notice that the chain has grown, and that block 00000047 is now the last block in the chain. Previously, entry 00000015 contained the value indicating that it was the last block in the chain; it now contains the number of the next block in the chain, 00000024.

So far, we have seen how to create an object and allocate blocks for its storage as it grows, but what about the mechanism used when an object shrinks, or is deleted?

As an example, let us say that the object described above shrinks by one block. All that has to be done is to mark the penultimate block as the last block, and mark what was the last block as unused. After doing this, the entries would look like this:

   Entry number        Value
   ------------------------------------------
   00000001            00000001
      :
   00000012            00000015
      :
   00000015            00000024
      :
   00000024            00000032
      :
   00000032            00000032
      :
   00000047            00000000
      :
   nnnnnnnn            00000000
   ------------------------------------------

Notice that block 00000032 is now the last block in the chain, and that block 00000047 is unused.

To completely delete an object, the operating system should simply mark all of the blocks in the object's chain as unused (i.e. value 00000000).

3.6 Note

Please note that the SFS layer does not handle the following aspects of the object storage mechanism:

   o The actual size of the object (in bytes, not the number of blocks used to store it).
   o Locating the start of each object chain.
   o The hierarchical filesystem structure.

These aspects are handled by the object storage layer, as described below.

4.
Object storage layer (OSL) structure

The object storage layer works in tandem with the SFS layer to provide a means of locating files by name and to give the filesystem a structure.

The OSL is based around a single object, the object storage table (OST), the size of which can grow or shrink depending on the number of objects in the table. The means by which the table changes size is described in section 4.8.

Each record in the OST has the following structure:

   Field    Contents                                      Size
   --------------------------------------------------------------------
   ID       A unique value identifying the object.        32 bits
   Parent   The ID of the object's parent.                32 bits
   Owner    The user ID of the object's owner.            32 bits
   Group    The group ID of the object's group.           32 bits
   Start    The block number of the start of the          32 bits
            object's block chain.
   Flags    Various attributes that describe an           12 bits
            object's type, visibility, and so on.
   Name     The name of the object.                       256 bytes
   --------------------------------------------------------------------

Each of the fields in the record structure is discussed below.

4.1 ID

The ID field is simply a unique value, assigned by the operating system, which identifies the object within the filesystem. The developer may choose any method to generate the ID, provided that:

   a) The value is greater than zero.
   b) The value fits into a 32-bit unsigned word.

If the value is zero, or is already assigned to another object, the entry is invalid.

4.2 Parent

The parent field contains the ID of the parent object, i.e. the object that this object is contained within. If the object does not have a parent (i.e. it is situated at root level), the value of this field should be zero.

4.3 Owner

The value of this field can be one of the following, depending on how the operating system handles ownership:

   a) The ID of the user that owns the object (the ID is assigned by the operating system).
   b) The ID of the user object that represents the user that owns the object (see section 4.6.1.5).

The value 0 is taken to mean the operating system's root (Administrator) user, which has full control over all objects in the filesystem, regardless of owner and flags settings.

4.4 Group

The group field works in exactly the same way as the owner field, except that it specifies the ID of a group of users instead of a single user. The flags may be set so that all members of this group have full access to this object. This is usually the primary group of the user specified by the owner field.

4.5 Start

This specifies the first block (relative to the start of the filesystem, not the OSL) in the chain of blocks that make up this object. A value of zero indicates that the object is 'empty'. This field cannot point to a block which is:

   a) Unused
   b) Part of a different chain

4.6 Flags

The flags field specifies various attributes which may or may not be set on an object. They describe the object's core type, and who can access the object in which ways.

   Bits    Contents
   --------------------------------------------------------------------
   0-2     Object type
   3-5     World access modifier
   6-8     Group access modifier
   9-11    Owner access modifier
   --------------------------------------------------------------------

4.6.1 Object type

   Binary    Decimal    Meaning
   --------------------------------------------------------------------
   000       00         Normal object
   001       01         Directory
   010       02         Block device
   011       03         Character device
   100       04         Link
   101       05         User
   110       06         Group
   111       07         Class
   --------------------------------------------------------------------

4.6.1.1 Normal object

A normal object is simply an object that stores data, and optionally other objects. It may only store one type of data, but can act as an 'index' for the objects it contains, specifying positioning information, and so on.
The index may be in any format (for example, HTML), but every object starts with a header, followed by a single blank line, then the object's data itself.

The header is made up of any number of fields (most of which are optional). Each field is written by specifying the field name, a colon (':'), a space (ASCII 32), the value of the field, and then a carriage return.

The 'Content-type' field must always be present, to identify the format of the object's content. The 'Class-type' field must also be present; it specifies the object ID of the class that defines the object. Any other fields are operating system and application specific, and any 'unknown' fields should be ignored.

When an object is read by the operating system, its header fields should be skipped, and accessed via a separate interface.

The 'Content-type' field should specify the type as a MIME (Multipurpose Internet Mail Extensions) type, as used by HTTP to transmit web pages over the Internet and intranets.

4.6.1.2 Directory object

A directory object optionally contains other objects, but has no data of its own. Directory objects work in the same way as folders on the Apple Macintosh platform and directories under DOS and UNIX.

4.6.1.3 Block and character devices

The two device object types represent I/O devices. The actual information relating to device types, driver parameters, and so on, is operating system specific, and is stored in the data associated with the object.

The developer may decide to only allow devices to be stored in a certain location within the filesystem (e.g. within an object called 'Devices'). This is entirely operating system specific and so is not covered by this memo.

4.6.1.4 Link

A link object should work in the same way as a symbolic link under UNIX, a shortcut under Windows 95 or an alias on the Macintosh.
A link acts as a 'shadow' of an object: some operations performed on the link affect the target of the link, whereas others only affect the link itself.

The data stored with the link should simply be a header field specifying the ID of the target object. Any other data is operating system specific and should be ignored.

The following table describes which operations affect the link object itself, and which affect the target:

   Operation     Affects
   ------------------------------------------------------------------
   Read          Target
   Write         Target
   Execute       Target
   Delete        Link
   Rename        Link
   Copy          Can be either (defaults to the target)
   Move          Link
   ------------------------------------------------------------------

4.6.1.5 User and group

The user and group object types define users and groups of users respectively. The data associated with these objects is operating system specific.

The developer may decide to only allow user and group objects to be stored within certain other objects (e.g. a top-level object called 'Users'), and to prevent other objects from being stored within them. This is entirely operating system specific, and is to be decided by the developer.

4.6.1.6 Class

Class objects define the different classes of objects available within a system. Classes should support calling of methods, setting of properties, and inheritance. The other objects stored within the filesystem are all instances of the various class objects.

How each class object reads the data associated with each object, and how classes themselves are handled, is to be decided by the operating system developer and is not covered by this memo.

4.6.2 Access modifiers

Access modifiers specify what access different users (or groups of users) have to an object.

The 'owner' access modifier specifies what access the owner of the object has to it (read, write, execute, or a combination of the three).
The owner, and the 'root' user defined by the operating system, can always change the access modifiers set on an object, whatever they are currently set to. For example, if a user accidentally sets an object so that nobody has access, then although they cannot read, write or execute the object, they can still change the access modifiers to give themselves access.

The 'group' access modifier specifies the access that users other than the owner, who are in the same group as the owner, have to an object. This is typically set so that members of the group have execute and read access, but not write access.

The 'world' access modifier covers anybody not covered by the previous two modifiers, i.e. users who are not in the same group as the owner.

The values of each access modifier are shown below. They are normally written in octal: owner access first, group second, world last. For example, 750 would mean that the owner has an access modifier of 7, the group has an access modifier of 5, and everybody else has an access modifier of 0. The meanings of these values are given in the table below:

   Binary    Decimal    Meaning
   ------------------------------------------------------------------
   000       00         No access
   001       01         Execute access
   010       02         Write access
   011       03         Execute and write access
   100       04         Read access
   101       05         Read and execute access
   110       06         Read and write access
   111       07         Read, write and execute access (full)
   ------------------------------------------------------------------

4.7 Name

The name field stores the name of each object. The name may be up to 256 characters long, and shorter names must be padded with null (ASCII 00) characters.

The name field may not contain the colon (':') or forward-slash ('/') characters.

4.8 How the object table expands and shrinks

The object table expands and shrinks in exactly the same way as any other object stored within the filesystem would.
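
As an aside on the records that make up this table, the OST record of section 4 and the flag encoding of sections 4.6 and 4.6.2 can be sketched in Python. Several details here are assumptions that this memo does not settle: the byte order (little-endian), the storage of the 12-bit flags field in a 16-bit word, bit 0 being the least significant bit, and ASCII names. All function names are hypothetical.

```python
import struct

# Illustrative sketch only. Assumed layout: five 32-bit fields, the
# 12-bit flags in a 16-bit word, then the 256-byte name, little-endian.
RECORD = struct.Struct("<5IH256s")

OBJECT_TYPES = ["normal object", "directory", "block device",
                "character device", "link", "user", "group", "class"]


def make_flags(object_type: int, mode: int) -> int:
    """Combine an object type (0-7) with an octal mode such as 0o750.

    Assuming bit 0 is the least significant bit, the type sits in bits
    0-2 and the owner/group/world digits of the mode in bits 9-11, 6-8
    and 3-5 respectively.
    """
    return ((mode & 0o777) << 3) | (object_type & 0b111)


def split_flags(flags: int) -> tuple:
    """Return (object type, owner access, group access, world access)."""
    return (flags & 7, (flags >> 9) & 7, (flags >> 6) & 7, (flags >> 3) & 7)


def describe_access(access: int) -> list:
    """Expand one access modifier (0-7) into the rights it grants."""
    rights = [("read", 4), ("write", 2), ("execute", 1)]
    return [name for name, bit in rights if access & bit]


def pack_record(obj_id, parent, owner, group, start, flags, name) -> bytes:
    """Build one OST record; the name is null-padded to 256 bytes."""
    raw = name.encode("ascii")
    if obj_id == 0:
        raise ValueError("ID must be greater than zero")
    if len(raw) > 256 or b":" in raw or b"/" in raw:
        raise ValueError("invalid object name")
    return RECORD.pack(obj_id, parent, owner, group, start, flags, raw)


def unpack_record(data: bytes) -> tuple:
    """Inverse of pack_record(); strips the null padding from the name."""
    *fields, raw = RECORD.unpack(data)
    return (*fields, raw.rstrip(b"\x00").decode("ascii"))
```

Under these assumptions a directory with access modifiers 750 has the flags value make_flags(1, 0o750), and split_flags() recovers the type and the three modifiers from it. Choosing bit 0 as the least significant bit means the three access modifiers read, from most to least significant, in the same owner, group, world order in which they are written in octal.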
The object table is itself stored within the filesystem, using the block-allocation method described earlier (section 3). It is up to the developer whether the object table keeps an entry for itself.

The object-table chain always starts at the block succeeding the last block in the boot-block chain.

5. Compatibility and cross-platform issues

There are several areas of this memo which deliberately leave decisions to the developer's own initiative. This is mainly because certain things (such as device drivers) are handled in completely different ways by different operating systems. Similarly, the bootstrap code, because it is executable code, is by definition platform-specific.

Certain other parts of the specification have deliberately been made optional, such as the user and group objects, which may also be restricted in certain ways. These types of object, along with device drivers, would not usually be copied between filesystems, except under special circumstances.

Despite this large amount of flexibility, it is still viable for objects to be shared easily between different platforms, without the need for cumbersome FTP transfers to a computer that is within arm's reach.

6. Filesystem maintenance tools

There are two types of filesystem maintenance tool that would normally be used on an OFS filesystem:

   o Filesystem checkers
   o Defragmenters

6.1 Filesystem check

The filesystem check should perform the following tasks:

   a) Ensure that the signature and information of the filesystem are valid and correct.
   b) Check that the block table contains no invalid entries, such as allocated blocks that are not part of any chain, chains which loop back on themselves, cross-linked chains, and broken chains.
   c) Ensure that the object table is not corrupt, and does not contain any invalid entries.
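
Task b) can be sketched in Python as a single pass over the block table. This is a minimal illustration, not a complete filesystem checker: the function name is hypothetical, the block table is modelled as a dictionary, and the list of chain starts is assumed to have been gathered beforehand from the Start fields of the object table plus the chain that holds the SFS layer.

```python
# Illustrative sketch only; names and representation are hypothetical.

UNUSED = 0


def check_block_table(table: dict, chain_starts: list) -> list:
    """Return a list of problems found in the block table.

    'table' maps entry numbers to values; 'chain_starts' lists the first
    block of every chain that is supposed to exist.
    """
    problems = []
    owner = {}                    # block -> start of the chain using it
    for start in chain_starts:
        if start == 0:            # Start of zero means an empty object
            continue
        seen = set()
        block = start
        while True:
            if block not in table or table[block] == UNUSED:
                problems.append(
                    f"broken chain from {start:08X} at {block:08X}")
                break
            if block in seen:
                problems.append(
                    f"loop in chain from {start:08X} at {block:08X}")
                break
            if block in owner:
                problems.append(
                    f"cross-link at {block:08X}: chains "
                    f"{owner[block]:08X} and {start:08X}")
                break
            seen.add(block)
            owner[block] = start
            if table[block] == block:     # last block of the chain
                break
            block = table[block]
    for block in sorted(table):
        if table[block] != UNUSED and block not in owner:
            problems.append(
                f"orphan: allocated block {block:08X} is in no chain")
    return problems
```

An empty result means that every allocated block belongs to exactly one chain and that every chain ends in a block whose entry holds its own number. Blocks that follow a break in a chain are still marked as allocated, so they are reported as orphans in addition to the broken chain itself.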
6.2 Defragmenters

Defragmenters reorganise the blocks so that objects are stored contiguously, preferably at the start of the filesystem, leaving the free space at the end.

After heavy use of a filesystem, the block chains can become very fragmented, with chains 'jumping' all over the filesystem. A defragmenter re-organises the chains so that the block numbers are in sequence, and can even go so far as moving executable objects closer to the start of the filesystem (meaning they can usually be accessed more quickly).

7. Security considerations

This memo does not address any of the security issues that would arise when implementing a filesystem; at this stage, they are left to the operating system developer. This may change in the future.

8. The virtual OFS

The virtual OFS is simply a complete OFS filesystem stored within a file on another filesystem. It could even be mounted as a virtual disk drive within the operating system.

This has the advantage that it is possible to create a 'hybrid', that is, a mix between an OFS and a non-OFS filesystem.

9. Author's address

   Mo McKinlay
   Cumulus Data Systems (UK) Limited
   St. Albans Road
   Stafford
   Staffordshire
   England
   ST16 3DS

   Telephone: +44 (0) 1785 236416
   Fax:       +44 (0) 1785 249339
   EMail:     cirrus@io.soc.staffs.ac.uk