INTERNET DRAFT		EXPIRES JULY 1998		INTERNET DRAFT

Network Working Group                                        M. McKinlay
Internet Draft		                   Cumulus Data Systems (UK) Ltd
Category: Informational                                     January 1998


    Proposal for the object-oriented, cross-platform filesystem (OFS)
  		<draft-rfced-info-mckinlay-00.txt>

Status of This Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

To learn the current status of any Internet-Draft, please check
the "1id-abstracts.txt" listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Table of Contents

   1.       Introduction . . . . . . . . . . . . . . . . . . . . . . . 2 
   2.       Overview of the object-oriented filesystem . . . . . . . . 2
   3.       Structure of the sub-filesystem (SFS) layer  . . . . . . . 3
   3.1        Filesystem identification  . . . . . . . . . . . . . . . 3
   3.2        Filesystem information . . . . . . . . . . . . . . . . . 4
   3.3        Bootstrap code . . . . . . . . . . . . . . . . . . . . . 4
   3.4        How objects are allocated using the SFS layer  . . . . . 5
   3.5        Example  . . . . . . . . . . . . . . . . . . . . . . . . 6
   3.6        Note . . . . . . . . . . . . . . . . . . . . . . . . . . 7
   4.       Object storage layer (OSL) structure . . . . . . . . . . . 8
   4.1        ID . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
   4.2        Parent . . . . . . . . . . . . . . . . . . . . . . . . . 9
   4.3        Owner  . . . . . . . . . . . . . . . . . . . . . . . . . 9
   4.4        Group  . . . . . . . . . . . . . . . . . . . . . . . . . 9
   4.5        Start  . . . . . . . . . . . . . . . . . . . . . . . . . 9
   4.6        Flags  . . . . . . . . . . . . . . . . . . . . . . . .  10
   4.6.1        Object type  . . . . . . . . . . . . . . . . . . . .  10
   4.6.1.1        Normal object  . . . . . . . . . . . . . . . . . .  10
   4.6.1.2        Directory object . . . . . . . . . . . . . . . . .  11
   4.6.1.3        Block and character devices  . . . . . . . . . . .  11
   4.6.1.4        Link . . . . . . . . . . . . . . . . . . . . . . .  11
   4.6.1.5        User and group . . . . . . . . . . . . . . . . . .  12
   4.6.1.6        Class  . . . . . . . . . . . . . . . . . . . . . .  12
   4.6.2        Access modifiers . . . . . . . . . . . . . . . . . .  12
   4.7        Name . . . . . . . . . . . . . . . . . . . . . . . . .  13
   4.8        How the object table expands and shrinks . . . . . . .  14
   5.       Compatibility and cross-platform issues  . . . . . . . .  14
   6.       Filesystem maintenance tools . . . . . . . . . . . . . .  14
   6.1        Filesystem check . . . . . . . . . . . . . . . . . . .  14
   6.2        Defragmenter . . . . . . . . . . . . . . . . . . . . .  15
   7.       Security considerations  . . . . . . . . . . . . . . . .  15
   8.       The virtual OFS  . . . . . . . . . . . . . . . . . . . .  15
   9.       Author's address . . . . . . . . . . . . . . . . . . . .  15

McKinlay                                                        [Page 1]

                Proposal for the object-oriented,        January 1998
                    cross-platform filesystem (OFS)

1. Introduction

   This memo is being distributed to members of the Internet community in
   order to solicit their reactions to the proposals contained in it.
   While the issues discussed may not be directly relevant to the
   research problems of the Internet, they may be interesting to a
   number of researchers and implementers.
   

   This memo attempts to lay out the structure and programming
   methodology required to implement the object-oriented filesystem
   (OFS) on a range of differing computing platforms.
   
   Currently, there are many different methods of depositing objects
   within computer storage media, on a variety of computing platforms.
   Each has its own advantages and disadvantages, but it is very rare
   to find a single type of filesystem used on a range of different
   platforms.

   Since the Internet and intranets have recently gained in popularity
   in home, academic and corporate use alike, it has become far more
   common to transfer objects between different types of computer
   system.
   Because of this, the OFS, as defined by this proposal, was designed
   with cross-platform compatibility issues in mind.

   This document covers the following areas:

   o   Overview of the object-oriented filesystem

   o   Structure of the sub-filesystem (SFS) layer

   o   Object storage layer (OSL) structure

   o   Compatibility and cross-platform issues

   o   Filesystem maintenance

   o   Security Considerations

   o   The Virtual OFS
  

2. Overview of the object-oriented filesystem

   The object-oriented filesystem (OFS) is fundamentally split into two
   sections: the sub-filesystem (SFS) layer and the object storage
   layer (OSL), which is the area of the computer media where the
   actual objects are stored.


   Because the two layers do not overlap, it is not strictly necessary
   for an implementation to treat them as distinct components. Even
   so, keeping them separate can be useful for the sake of modularity
   and flexibility.


   The SFS layer simply handles block allocation, filesystem
   identification and bootstrap code, so it is possible to implement a
   completely different type of filesystem based on the sub-filesystem
   layer of the OFS.

   Similarly, it is possible to make use of the OSL, using a completely
   different sub-filesystem layer as the basis for the OFS.

   This inherent modularity gives developers considerable flexibility:
   the SFS layer can be customised to work with different platforms
   while still retaining compatibility.


3. Structure of the sub-filesystem (SFS) layer

   The SFS layer, as briefly explained previously, is split into the
   following sections:

   o   Filesystem identification and information

   o   Block allocation

   o   Platform-specific bootstrap code


3.1 Filesystem identification

   The filesystem is identified by sixteen bytes, the first eight of
   which are ignored, that mark the filesystem as OFS-compatible. The
   identification block is located at the very beginning of the storage
   media.

   The first eight bytes should usually contain a platform-specific
   instruction to jump to the start of the bootstrap code and execute
   it. The eight bytes that succeed them contain the following:

          -------------------------------------------------
   Byte   |  8  |  9  |  A  |  B  |  C  |  D  |  E  |  F  |
          |-----|-----|-----|-----|-----|-----|-----|-----|
   Value  | 'M' | 'i' | 'm' | 'a' | 's' | 'F' | 'S' | NUL |
          -------------------------------------------------

   This is simply the string 'MimasFS' followed by a single null byte
   (ASCII code 00). This string must be located on the media starting
   at the ninth byte (byte 8), and ending at the sixteenth byte (byte
   15, or byte Fh).
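
   The identification check described above might be sketched as
   follows. This is only an illustration: the names OFS_SIGNATURE,
   SIGNATURE_OFFSET and is_ofs are hypothetical and are not defined by
   this memo.

```python
OFS_SIGNATURE = b"MimasFS\x00"   # 'MimasFS' followed by a single null byte
SIGNATURE_OFFSET = 8             # the string starts at the ninth byte (byte 8)


def is_ofs(first_bytes):
    """Return True if the first sixteen bytes of the media carry the signature.

    Bytes 0 to 7 are ignored, since they normally hold a platform-specific
    jump instruction.
    """
    if len(first_bytes) < 16:
        return False
    return first_bytes[SIGNATURE_OFFSET:16] == OFS_SIGNATURE
```

   A caller would read the first sixteen bytes of the media and pass
   them to is_ofs before interpreting anything else on it.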


3.2 Filesystem information

   The filesystem information table has the following structure:


   Byte  Contents                                 Size
   (hex)
   --------------------------------------------------------------------
   10    Filesystem size (in 4kb blocks)          32 bits (4 bytes)

   14    Start of second layer (in 4kb blocks)    32 bits (4 bytes)

   18    Platform-specific data                   64 bits (8 bytes)
   --------------------------------------------------------------------

   The filesystem size (in blocks) is calculated by taking the size of
   the filesystem (in kilobytes) and dividing by four. For example, for
   a 2000kb filesystem, the filesystem size value would be calculated
   as:

   Filesystem size         =         2000 / 4
                           =          500 blocks

   Hence the value entered into the first field would be 500.

   The block number of the start of the second layer is calculated as:


        the size of the identification table

   plus the size of the information table

   plus the number of blocks multiplied by four

   plus the size of the platform-specific bootstrap code,

   with all sizes taken in bytes.

   This total is divided by 4096 and rounded up to the nearest whole
   number. The result is then incremented by one, as blocks are
   numbered from 1 rather than 0 (a value of 0 indicates an unused
   block in the allocation table).

   The remaining 64 bits are left 'reserved' for platform-specific use
   that is not defined by this memo.
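
   The two calculations in this section can be sketched as below. The
   function and constant names are illustrative only, and the 512-byte
   bootstrap size used in the example is an arbitrary assumption, since
   this memo does not fix the size of the bootstrap code.

```python
BLOCK_SIZE = 4096        # bytes per block
ID_TABLE_SIZE = 16       # identification block: sixteen bytes
INFO_TABLE_SIZE = 16     # two 32-bit fields plus 64 reserved bits
ENTRY_SIZE = 4           # "the number of blocks multiplied by four"


def filesystem_size_blocks(size_kb):
    """Filesystem size in 4kb blocks, given its size in kilobytes."""
    return size_kb // 4


def second_layer_start(n_blocks, bootstrap_bytes):
    """Block number at which the second layer starts."""
    sfs_bytes = (ID_TABLE_SIZE + INFO_TABLE_SIZE
                 + n_blocks * ENTRY_SIZE + bootstrap_bytes)
    whole_blocks = (sfs_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up
    return whole_blocks + 1   # blocks are numbered from 1, not 0


print(filesystem_size_blocks(2000))     # 500, as in the example above
print(second_layer_start(500, 512))     # 2
```

   For the 2000kb filesystem with the assumed 512-byte bootstrap, the
   first layer totals 2544 bytes, which fits in one block, so the
   second layer starts at block 2.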


3.3 Bootstrap code

   The bootstrap code is entirely platform specific, and, when
   executed, should normally perform one of the following tasks:


   o   Display a message to the user that the media contains no
       operating system;

       or

   o   load the operating system from the media into the computer's
       memory.


3.4 How objects are allocated using the SFS layer

   This section describes how the sub-filesystem layer handles the
   allocation of blocks for object storage. Fundamentally, the block
   table stores the information relating to the 'chains' of blocks
   which comprise each object. As an example, the first few entries
   in an empty (excluding a 1-block SFS layer) block table would
   look like this:

   Entry number              Value
   ------------------------------------------
   00000001                  00000001
   00000002                  00000000
   00000003                  00000000
   00000004                  00000000
   00000005                  00000000
   :
   :
   nnnnnnnn                  00000000
   ------------------------------------------

   The first entry in the table, relating to the first block in the
   OSL, has its value set to its own block number. This signifies that
   it is the last block in an object chain. In this case, it is the
   only entry in the chain (it is a special entry that is marked as
   allocated, to prevent operating systems from overwriting the SFS
   layer). The remaining entries in the table have the value 00000000,
   indicating that they are unused.

   The number of blocks that are needed to store an object is
   calculated by taking the size of an object, in bytes, dividing by
   4096, and then rounding up to the nearest whole number.
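
   As a small sketch of this rule, using integer arithmetic so that no
   floating-point rounding is involved (the name blocks_needed is
   illustrative only):

```python
BLOCK_SIZE = 4096   # bytes per block


def blocks_needed(size_bytes):
    """Number of 4096-byte blocks needed to hold size_bytes, rounded up."""
    return (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


print(blocks_needed(6 * 1024))   # 2
```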

   Each block is allocated by the operating system searching the
   block table for an unused entry (value 00000000). There are various
   methods of doing this, some dependent on the size of the object,
   but they are not defined by this memo.

   Each entry in the block table normally contains the number of the
   next block in its respective chain. The exceptions are the last
   block in a chain, whose value is its own block number, and an
   unused block, whose value is 00000000.

   No block may be a member of more than one chain, and a chain cannot
   contain the same block more than once. See section 6.1, Filesystem
   check, for more information on the rules governing blocks and chains.
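
   The rules above suggest a simple way of following a chain. The
   sketch below assumes that the block table has been read into a
   Python list indexed by entry number (index 0 is never used), and
   that the entry numbers shown in this memo are hexadecimal. Both are
   assumptions made for illustration, as is the name walk_chain.

```python
def walk_chain(table, start):
    """Return the block numbers of the chain that begins at 'start'.

    Raises ValueError if the chain reaches an unused or out-of-range
    block, or if it visits the same block twice.
    """
    chain = []
    seen = set()
    block = start
    while True:
        if not 0 < block < len(table) or table[block] == 0:
            raise ValueError(f"chain reaches unused or invalid block {block}")
        if block in seen:
            raise ValueError(f"chain loops back to block {block}")
        seen.add(block)
        chain.append(block)
        if table[block] == block:      # own number marks the last block
            return chain
        block = table[block]


table = [0] * 0x50
table[0x01] = 0x01                     # special entry protecting the SFS layer
table[0x12] = 0x15
table[0x15] = 0x15
print([f"{b:08X}" for b in walk_chain(table, 0x12)])
```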

3.5 Example

   As an example, let us say that we have a 6 kilobyte object. The
   number of blocks required to store this object would be 2
   (6 / 4 = 1.5, rounded up = 2).

   The operating system allocates two blocks. For this example, let us
   say that blocks 00000012 and 00000015 were allocated by the operating
   system for use by this object.

   After the appropriate modifications have been made, the resultant
   block table would look similar to this:

   Entry number              Value
   ------------------------------------------
   00000001                  00000001
   :
   00000012                  00000015
   :
   00000015                  00000015
   :
   nnnnnnnn                  00000000
   ------------------------------------------

   The value in entry 00000012 indicates that the next block in the
   chain is block 00000015. The value in entry 00000015 is the same
   as its entry number, indicating that it is the last block in the
   chain.

   Let us say that the object grows by 12 kilobytes (and hence three
   blocks). The operating system allocates three new blocks, numbers
   00000024, 00000032, 00000047. After the appropriate entries have
   been modified, the block table would look similar to this:

   Entry number              Value
   ------------------------------------------
   00000001                  00000001
   :
   00000012                  00000015
   :
   00000015                  00000024
   :
   00000024                  00000032
   :
   00000032                  00000047
   :
   00000047                  00000047
   :
   nnnnnnnn                  00000000
   ------------------------------------------


   Notice that the chain has grown, and that block 00000047 is now the
   last block in the chain. Previously, entry 00000015 contained the
   value marking it as the last block; it now contains the number of
   the next block in the chain, 00000024.
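
   Growing a chain in this way might be sketched as follows. As
   before, the list representation of the block table, the reading of
   entry numbers as hexadecimal, and the name append_blocks are
   assumptions made for illustration.

```python
def append_blocks(table, last, new_blocks):
    """Link new_blocks onto a chain whose final block is 'last'.

    Returns the number of the new final block.
    """
    if table[last] != last:
        raise ValueError(f"block {last} is not the last block of a chain")
    if len(set(new_blocks)) != len(new_blocks):
        raise ValueError("a chain cannot contain the same block twice")
    for block in new_blocks:
        if table[block] != 0:
            raise ValueError(f"block {block} is already allocated")
    prev = last
    for block in new_blocks:
        table[prev] = block    # the old tail now points at the new block
        table[block] = block   # the new block becomes the tail
        prev = block
    return prev


table = [0] * 0x50
table[0x01] = 0x01
table[0x12] = 0x15
table[0x15] = 0x15
append_blocks(table, 0x15, [0x24, 0x32, 0x47])
```

   After the call, the entries match the table shown above: 00000015
   points to 00000024, and 00000047 holds its own number.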

   So far, we have seen how to create an object and allocate blocks for
   its storage as it grows. The same table also handles the case where
   an object shrinks or is deleted.

   As an example, let us say that the object described above shrinks by
   one block. All that has to be done is to mark the penultimate block
   as the last block, and mark what was the last block as unused. After
   doing this, the entries would look like this:

   Entry number              Value
   ------------------------------------------
   00000001                  00000001
   :
   00000012                  00000015
   :
   00000015                  00000024
   :
   00000024                  00000032
   :
   00000032                  00000032
   :
   00000047                  00000000
   :
   nnnnnnnn                  00000000
   ------------------------------------------

   Notice that block 00000032 is now the last block in the chain, and
   that block 00000047 is unused.
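
   Shrinking a chain by one block might be sketched as follows, under
   the same assumptions as before (a list-based block table and
   hexadecimal entry numbers). The name drop_last_block is
   illustrative only.

```python
def drop_last_block(table, start):
    """Free the final block of the chain beginning at 'start'.

    Returns the number of the freed block. If the chain held only one
    block, the caller is left with an empty object and should record
    that fact wherever the start of the chain is stored.
    """
    prev, block = None, start
    for _ in range(len(table)):        # bound the walk in case of a loop
        nxt = table[block]
        if nxt == 0:
            raise ValueError(f"chain reaches unused block {block}")
        if nxt == block:               # found the last block
            break
        prev, block = block, nxt
    else:
        raise ValueError("chain never terminates")
    table[block] = 0                   # what was the last block is now unused
    if prev is not None:
        table[prev] = prev             # the penultimate block becomes the last
    return block


table = [0] * 0x50
table[0x01] = 0x01
for entry, value in [(0x12, 0x15), (0x15, 0x24), (0x24, 0x32),
                     (0x32, 0x47), (0x47, 0x47)]:
    table[entry] = value
drop_last_block(table, 0x12)
```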

   To completely delete an object, the operating system should simply
   mark all of the blocks in the object's chain as unused (i.e. value
   00000000).
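
   Deletion might be sketched as follows, again assuming a list-based
   block table; the name free_chain is illustrative only.

```python
def free_chain(table, start):
    """Mark every block in the chain beginning at 'start' as unused.

    Returns the number of blocks freed. Because entries are zeroed as
    the walk proceeds, a damaged chain that loops back on itself still
    terminates when an already-freed entry is reached.
    """
    freed = 0
    block = start
    while 0 < block < len(table) and table[block] != 0:
        nxt = table[block]
        table[block] = 0
        freed += 1
        if nxt == block:       # that was the last block in the chain
            break
        block = nxt
    return freed
```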


3.6  Note

Please note that:

   The SFS layer does not handle the following aspects of the object
   storage mechanism:

   o   The actual size of the object (in bytes, not the number of blocks
       used to store it).

   o   Locating the start of each object chain.


   o   The hierarchical filesystem structure.

   These aspects are handled by the object storage layer, as described
   below.


4.  Object storage layer (OSL) structure

   The object storage layer works in tandem with the SFS layer to
   provide a means of locating files by name and to provide a structure
   to the filesystem.
   
   The OSL is based around a single object, the object storage table
   (OST), the size of which can grow or shrink depending on the number
   of objects in the table. The means by which the table can change
   size will be described later on in this memo.


   Each record in the OST has the following structure:


   Field     Contents                             Size
   --------------------------------------------------------------------
   ID        A unique value identifying the       32 bits
             object.

   Parent    The ID of the object's parent.       32 bits
 
   Owner     The user ID of the object's owner.   32 bits

   Group     The group ID of the object's group.  32 bits

   Start     The block number of the start of the 32 bits
             object's block chain.

   Flags     Various attributes that describe an  12 bits
             object's type, visibility, and so
             on.

   Name      The name of the object.              256 bytes
   --------------------------------------------------------------------
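
   One possible in-memory and on-media encoding of this record is
   sketched below. Several details are assumptions, because this memo
   does not specify them: little-endian byte order, storing the 12-bit
   flags field in a 16-bit word with the upper four bits clear, and
   ASCII names. The names Record, pack_record and unpack_record are
   illustrative only.

```python
import struct
from collections import namedtuple

# Five 32-bit fields, a 16-bit word holding the 12 flag bits, 256-byte name.
OST_RECORD = struct.Struct("<5IH256s")

Record = namedtuple("Record", "id parent owner group start flags name")


def pack_record(rec):
    """Encode a Record as bytes; struct pads the name with null bytes."""
    name = rec.name.encode("ascii")
    if len(name) > 256:
        raise ValueError("name longer than 256 characters")
    return OST_RECORD.pack(rec.id, rec.parent, rec.owner, rec.group,
                           rec.start, rec.flags & 0x0FFF, name)


def unpack_record(data):
    """Decode bytes produced by pack_record back into a Record."""
    *numbers, name = OST_RECORD.unpack(data)
    return Record(*numbers, name.rstrip(b"\x00").decode("ascii"))


rec = Record(id=7, parent=0, owner=0, group=0, start=0x12,
             flags=0o750 << 3, name="readme")
assert unpack_record(pack_record(rec)) == rec
```

   Under these assumptions each record occupies 278 bytes. A different
   alignment or byte order would be equally consistent with the table
   above.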

   Each of the fields in the record structure is discussed below:


4.1  ID

   The ID field is simply a unique value, assigned by the operating
   system, which identifies the object within the filesystem. The
   developer may choose any method to generate the ID, providing that:

   a)   The value is greater than zero

   b)   The value fits into a 32-bit unsigned word

   If the value is zero, or already assigned to an object, then the
   entry is invalidated.

4.2  Parent

   The parent field contains the ID of the parent object, i.e. the
   object that this object is contained within. If the object does
   not have a parent (i.e. it is situated at root level), then the value
   of this field should be zero.


4.3  Owner

   The value of this field can be one of the following, depending on how
   the operating system handles ownership:

   a)   The ID of the user that owns the object (the ID is assigned by
        the operating system).

   b)   The ID of the user object, that represents the user that owns
        the object (see section 4.6.1.5).

   The value of 0 is taken to mean the operating system's root
   (Administrator) user, which has full control over all objects in the
   filesystem, regardless of owner and flags settings.


4.4  Group

   The group field works in exactly the same way as the owner field,
   except that it specifies the ID of a group of users, instead of a
   single user. The flags may be set so that all members of this
   group have full access to this object. This is usually the primary
   group that the user specified by the owner field is a member of.


4.5  Start

   This specifies the first block (relative to the start of the
   filesystem, not the OSL), in the chain of blocks that make up this
   object.

   A value of zero indicates that this object is 'empty'.

   This field cannot point to a block which is:

   a)   Unused

   b)   Part of a different chain


4.6  Flags

   The flags field holds the attributes that describe an object's core
   type and determine who can access the object, and in what ways.

   Bits  Contents
   --------------------------------------------------------------------
   0-2   Object type                              

   3-5   World access modifier                    

   6-8   Group access modifier                    

   9-11  Owner access modifier                    
   --------------------------------------------------------------------

4.6.1  Object type

   Binary  Decimal        Meaning            
   --------------------------------------------------------------------
   000       00           Normal object           

   001       01           Directory               

   010       02           Block device            

   011       03           Character device

   100       04           Link

   101       05           User

   110       06           Group

   111       07           Class
   --------------------------------------------------------------------
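
   The bit layout and type codes above might be decoded as follows.
   The sketch assumes that bit 0 is the least significant bit of the
   flags value, which this memo does not state explicitly; the names
   OBJECT_TYPES and split_flags are illustrative only.

```python
OBJECT_TYPES = ("normal object", "directory", "block device",
                "character device", "link", "user", "group", "class")


def split_flags(flags):
    """Split a 12-bit flags value into its four 3-bit fields."""
    return {
        "type":  flags & 0b111,          # bits 0-2
        "world": (flags >> 3) & 0b111,   # bits 3-5
        "group": (flags >> 6) & 0b111,   # bits 6-8
        "owner": (flags >> 9) & 0b111,   # bits 9-11
    }


fields = split_flags((0o750 << 3) | 1)
print(OBJECT_TYPES[fields["type"]], fields)
```

   With this bit ordering, shifting the flags right by three bits
   leaves the owner, group and world modifiers in that order.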


4.6.1.1  Normal object

   A normal object is simply an object that stores data, and optionally
   other objects. It may only store one type of data, but can act as an
   'index' for the objects it contains, specifying positioning
   information, and so on. The index may be in any format, for example,
   HTML, but every object starts with a header, followed by a single
   blank line, then the object's data itself.

   The header is made of any number of fields (most of which are
   optional). Each field is written by specifying the field name,
   a colon, ':', a space (ASCII 32), the value of the field, and then
   a carriage return.

   The 'Content-type' field must always be present, to identify the
   format of the object's content. Any other fields are operating system
   and application specific, and any 'unknown' fields should be ignored.

   The 'Class-type' field must also be present; it specifies the
   object ID of the class that defines that object.

   When an object is read by the operating system, its header fields
   should be skipped; they are accessed via a separate interface.

   The 'Content-type' field should specify the type as a MIME
   (Multipurpose Internet Mail Extensions) type, as used by HTTP when
   transferring web pages over the Internet and intranets.
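
   Separating the header from the data of a normal object might be
   sketched as follows. The sketch assumes that each field ends with a
   single carriage return (ASCII 13) and that the blank line is a lone
   carriage return, so that the header ends at two consecutive
   carriage returns. It also assumes ASCII field text. The name
   split_object is illustrative only.

```python
REQUIRED_FIELDS = ("Content-type", "Class-type")


def split_object(raw):
    """Split an object's bytes into (header fields, data)."""
    head, blank, data = raw.partition(b"\r\r")
    if not blank:
        raise ValueError("no blank line after the header")
    fields = {}
    for line in head.split(b"\r"):
        name, sep, value = line.partition(b": ")   # name, colon, space, value
        if not sep:
            raise ValueError(f"malformed header field: {line!r}")
        fields[name.decode("ascii")] = value.decode("ascii")
    for required in REQUIRED_FIELDS:
        if required not in fields:
            raise ValueError(f"missing required field {required}")
    return fields, data


fields, data = split_object(
    b"Content-type: text/plain\rClass-type: 42\r\rhello")
print(fields["Content-type"], data)
```

   Fields that the caller does not recognise simply remain in the
   returned dictionary and can be ignored.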


4.6.1.2  Directory object

   A directory object optionally contains other objects, but has no
   data of its own. Directory objects work in the same way as folders
   on the Apple Macintosh platform and directories under DOS and UNIX.


4.6.1.3  Block and character devices

   The two device object types represent I/O devices. The actual
   information relating to device types, driver parameters, and so on,
   is operating system specific, and is stored in the data associated
   with the object.

   The developer may decide to only allow devices to be stored in a
   certain location within the filesystem (e.g. within an object called
   'Devices'). This is entirely operating system specific and so is not
   covered by this memo.


4.6.1.4  Link

   A link object should work in the same way that a symbolic link
   works under UNIX, a shortcut works under Windows 95 or an alias
   works on the Macintosh.

   Basically, a link acts as a 'shadow' of an object - some operations
   performed on the link affect the target of the link, whereas some
   operations only affect the link itself.

   The data stored with the link should simply be a header field
   specifying the ID of the target object. Any other data should be
   ignored, and is operating system specific.

   The following table describes which operations affect the link object
   itself, and which affect the target:

   Operation                Affects
   ------------------------------------------------------------------
   Read                     Target

   Write                    Target

   Execute                  Target

   Delete                   Link

   Rename                   Link

   Copy                     Can be either (defaults to the target)

   Move                     Link
   ------------------------------------------------------------------


4.6.1.5  User and group

   The user and group object types define users and groups of users
   respectively. The data associated with these objects is operating
   system specific.

   The developer may decide to only allow the user and group objects
   to be stored within certain other objects (e.g. a top-level object
   called 'Users'), and prevent other objects from being stored within
   them. This is entirely operating system specific, and is to be
   decided by the developer.

4.6.1.6  Class

   Class objects define the different classes of objects available
   within a system. Classes should support calling of methods,
   setting properties, and inheritance.

   Basically, the other objects stored within the filesystem are all
   instances of the various class objects. How each class object reads
   the data associated with each object, and how classes themselves
   are handled, is to be decided by the operating system developer and
   is not covered by this memo.


4.6.2   Access modifiers

   Access modifiers specify what access different users (or groups of
   users) have to an object.


   The 'owner' access modifier specifies what access the owner of the
   object has to it (read, write, execute, or a combination of the
   three). The owner and the operating system defined 'root' user can
   change the access modifiers set on an object, whatever they are
   currently set to. For example, if a user accidentally sets an object
   so that nobody has access, they can no longer read, write or execute
   the object, but they can still change the access modifiers to give
   themselves access.


   The 'group' access modifier specifies the access that users other
   than the owner, who are in the same group as the owner, have to the
   object. This is typically set so that members of the group have
   execute and read access, but not write access.

   The 'world' access modifier covers anybody else not covered by the
   previous two modifiers; i.e. users who aren't in the same group as
   the owner.

   The values of each access modifier are shown below, and are normally
   written in octal, owner access first, group second, world last. For
   example, 750 would mean the owner had an access modifier of 7, the
   group would have an access modifier of 5, and everybody else would
   have an access modifier of 0. The meanings of these values are
   given in the table below:

   Binary       Decimal        Meaning
   ------------------------------------------------------------------
   000            00           No access
   001            01           Execute access
   010            02           Write access
   011            03           Execute and write access
   100            04           Read access
   101            05           Read and execute access
   110            06           Read and write access
   111            07           Read, write and execute access (full)
   ------------------------------------------------------------------
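
   The table above amounts to three independent bits per modifier. A
   minimal sketch of reading the octal notation follows; the names
   parse_mode and describe, and the 'rwx' display form, are
   illustrative choices that are not part of this memo.

```python
READ, WRITE, EXECUTE = 4, 2, 1   # binary 100, 010 and 001


def parse_mode(text):
    """Split a three-digit octal string into (owner, group, world)."""
    if len(text) != 3 or any(c not in "01234567" for c in text):
        raise ValueError(f"not a three-digit octal mode: {text!r}")
    owner, group, world = (int(c) for c in text)
    return owner, group, world


def describe(modifier):
    """Render one access modifier as a three-character string."""
    return (("r" if modifier & READ else "-")
            + ("w" if modifier & WRITE else "-")
            + ("x" if modifier & EXECUTE else "-"))


print([describe(m) for m in parse_mode("750")])   # ['rwx', 'r-x', '---']
```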


4.7  Name

   The name field stores the name of each object. The name may be up
   to 256 characters long, and shorter names must be padded with null
   (ASCII 00) characters to fill the field. The name field may not
   contain the colon, ':', or forward-slash, '/', characters.
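
   These rules might be enforced as sketched below. The sketch assumes
   ASCII names and additionally rejects embedded null characters,
   since they could not be told apart from padding; that extra check
   and the name encode_name are assumptions, not requirements of this
   memo.

```python
NAME_SIZE = 256
FORBIDDEN = ":/"


def encode_name(name):
    """Return the 256-byte, null-padded form of an object name."""
    if any(c in FORBIDDEN for c in name):
        raise ValueError("name contains ':' or '/'")
    if "\x00" in name:
        raise ValueError("name contains a null character")  # assumption
    raw = name.encode("ascii")
    if len(raw) > NAME_SIZE:
        raise ValueError("name longer than 256 characters")
    return raw.ljust(NAME_SIZE, b"\x00")
```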


4.8  How the object table expands and shrinks

   The object table expands and shrinks in exactly the same way as any
   other object stored within the filesystem would. Using the
   block-allocation method, as described earlier (section 3), the
   object table is itself stored within the filesystem. It is up to the
   developer as to whether the object table keeps an entry for itself.

   The object-table chain always starts at the block succeeding the last
   block in the boot-block chain.


5.  Compatibility and cross-platform issues

   There are several areas of this memo which deliberately leave
   sections to the developer's own initiative. These are mainly due to
   the fact that certain things (such as device drivers) are handled in
   completely different ways by different operating systems. Similarly,
   the bootstrap code, because it is executable code, is by definition
   platform-specific.

   Certain other parts of the specification have been deliberately
   left optional, such as the user and group objects, which also may be
   restricted in certain ways. These types of object, along with the
   device drivers would not usually be copied between filesystems,
   except under special circumstances.


   Despite the large amounts of flexibility, it is still viable for
   objects to be shared easily between different platforms, without the
   need for cumbersome FTP to a computer that is within arm's reach.


6. Filesystem maintenance tools

   There are two types of filesystem maintenance tool that would
   normally be used on an OFS filesystem:

   o   Filesystem checking

   o   Defragmenters


6.1  Filesystem check

   The filesystem check should perform the following tasks:

   a)   Ensure that the signature and information of the filesystem are
        valid and correct.


   b)   Check that the block table contains no invalid entries, such as
        allocated blocks that are not part of any chain, chains which
        loop back on themselves, cross-linked chains, and broken
        chains.

   c)   Ensure that the object table is not corrupt, and also does not
        contain any invalid entries.
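
   The block-table part of such a check might be sketched as follows.
   The sketch assumes a list-based block table indexed by entry number
   and a list of chain start blocks gathered from the object table,
   together with the start of the chain that protects the SFS layer.
   The name check_block_table and the wording of the messages are
   illustrative only.

```python
def check_block_table(table, starts):
    """Return a list of problems found; an empty list means none."""
    problems = []
    claimed = {}                       # block number -> index of its chain
    for index, start in enumerate(starts):
        block = start
        while True:
            if not 0 < block < len(table) or table[block] == 0:
                problems.append(
                    f"broken chain from {start} at block {block}")
                break
            if block in claimed:
                kind = "loop" if claimed[block] == index else "cross-link"
                problems.append(
                    f"{kind} in chain from {start} at block {block}")
                break
            claimed[block] = index
            if table[block] == block:  # last block of this chain
                break
            block = table[block]
    for block in range(1, len(table)):
        if table[block] != 0 and block not in claimed:
            problems.append(f"allocated block {block} is in no chain")
    return problems
```

   A real tool would go on to repair what it finds; how it does so is
   left to the developer.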


6.2  Defragmenters

   Defragmenters simply reorganise the blocks so that objects are stored
   contiguously; preferably at the start of the filesystem, leaving the
   free space at the end.

   After heavy use of a filesystem, the block chains can become very
   fragmented, with chains 'jumping' all over the filesystem. A
   defragmenter re-organises the chains so that the block numbers
   are in sequence, and can even go so far as moving executable objects
   closer to the start of the filesystem (which usually makes them
   quicker to access).


7.  Security considerations


   This memo does not address any of the security issues that would
   arise when implementing a filesystem; that is, at this stage, left to
   the operating-system developer. This may change in the future.


8.  The virtual OFS

   The virtual OFS is simply a complete OFS filesystem, stored within a
   file on another filesystem, and could even be mounted as a virtual
   disk drive within the operating system. This has advantages in that
   it is possible to create a 'hybrid', that is, a mix between an OFS,
   and non-OFS filesystem.

9.  Author's address

   Mo McKinlay
   Cumulus Data Systems (UK) Limited
   St. Albans Road
   Stafford
   Staffordshire
   England ST16 3DS

   Telephone:    +44 (0) 1785 236416
   Fax:          +44 (0) 1785 249339
   EMail:        cirrus@io.soc.staffs.ac.uk

