IETF media feature registration WG                          Graham Klyne
Internet draft                                  Content Technologies/5GM
                                                             8 July 1998
                                                   Expires: January 1999


              An algebra for describing media feature sets


Status of this memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress''.

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Copyright (C) 1998, The Internet Society

Abstract

A number of Internet application protocols have a need to provide
content negotiation for the resources with which they interact [1]. A
framework for such negotiation is described in [2]. Part of this
framework is a way to describe the range of media features which can
be handled by the sender, recipient or document transmission format of
a message. A format for a vocabulary of individual media features and
procedures for registering media features are presented in [3].

This document describes an algebra and syntax that can be used to
define feature sets which are formed from combinations and relations
involving individual media features. Such feature sets are used to
describe the media feature handling capabilities of message senders,
recipients and file formats.

This document also outlines an algorithm for feature set matching.
(In its present form, it needs considerable refinement, but does
indicate the availability of a deterministic solution.)

Table of contents

1. Introduction
   1.1 Structure of this document
   1.2 Discussion of this document
   1.3 Amendment history
   1.4 Unfinished business
2. Terminology and definitions
3. Media feature values
   3.1 Complexity of feature algebra
   3.2 Sufficiency of simple types
      3.2.1 Unstructured data types
      3.2.2 Cartesian product
      3.2.3 Discriminated union
      3.2.4 Array
      3.2.5 Powerset
      3.2.6 Sequence
4. Feature set predicates
   4.1 An algebra for data file format selection
      4.1.1 Describing feature sets
         4.1.1.1 Feature value ranges
         4.1.1.2 Feature value combinations
         4.1.1.3 Using meta-features to group features
      4.1.2 Content, sender and recipient capabilities
   4.2 Conclusion and proposal
5. Indicating preferences
   5.1 Combining preferences
   5.2 Representing preferences
6. Feature set representation
   6.1 Textual representation of predicates
   6.2 Arithmetic expressions
   6.3 Named and auxiliary predicates
      6.3.1 Defining a named predicate
      6.3.2 Invoking named predicates
      6.3.3 Auxiliary predicates in a filter
   6.4 Feature set definition examples
      6.4.1 Single predicate
      6.4.2 Predicate with auxiliary predicate
   6.5 ASN.1 representation
7. Content negotiation protocol processing
   7.1 Matching feature sets
      7.1.1 Feature set matching strategy
      7.1.2 Formulating the problem
      7.1.3 Simplifying primitive predicates
         7.1.3.1 Primitive predicate properties
      7.1.4 Conversion to canonical form
      7.1.5 Grouping of feature predicates
      7.1.6 Remove simultaneous equality constraints
      7.1.7 Merge single-feature constraints
      7.1.8 Test simultaneous feature constraints
         7.1.8.1 Convex sets
         7.1.8.2 Testing for satisfiability
   7.2 Effect of named predicates
   7.3 Overlapping features
   7.4 Unknown feature value data types
   7.5 Worked example
   7.6 Algorithm source code
8. Security considerations
9. Copyright
10. Acknowledgements
11. References
12. Author's address

1. Introduction

A number of Internet application protocols have a need to provide
content negotiation for the resources with which they interact [1]. A
framework for such negotiation is described in [2]. A part of this
framework is a way to describe the range of media features which can
be handled by the sender, recipient or document transmission format of
a message.

Descriptions of media feature capabilities need to be based upon some
underlying vocabulary of individual media features. A format for such
a vocabulary and procedures for registering media features are
presented in [3].

This document defines an algebra which can be used to describe feature
sets which are formed from combinations and relations involving
individual media features. Such feature sets are used to describe the
media handling capabilities of message senders, recipients and file
formats.

This document also outlines a syntax for describing feature sets, and
an algorithm for feature set matching.

The feature set algebra is built around the principle of using feature
set predicates as "mathematical relations" which define constraints on
feature handling capabilities. The idea is that the same form of
feature set expression can be used to describe sender, receiver and
file format capabilities. This has been loosely modelled on the way
that the Prolog programming language uses Horn Clauses to describe a
set of result values.

In developing the algebra, examples are given using notation drawn
from the Prolog programming language. Later, a syntax for expressing
feature predicates is suggested, based on LDAP search filters.
Finally, an algorithm for feature set matching is described.

1.1 Structure of this document

The main part of this draft addresses the following areas:

Section 2 introduces and references some terms which are used with
special meaning.

Section 3 discusses constraints on the data types allowed for
individual media feature values.

Section 4 introduces and describes the algebra used to construct
feature set descriptions with expressions containing media features.
The first part of this section contains a development of the ideas,
and the second part contains the conclusions and proposed algebra.

Section 5 introduces and describes extensions to the algebra for
indicating preferences between different feature sets.

Section 6 contains a description of recommended representations for
describing feature sets based on the previously-described algebra.

1.2 Discussion of this document

Discussion of this document should take place on the content
negotiation and media feature registration mailing list hosted by the
Internet Mail Consortium (IMC).

Please send comments regarding this document to:

    ietf-medfree@imc.org

To subscribe to this list, send a message with the body 'subscribe' to
"ietf-medfree-request@imc.org".

To see what has gone on before you subscribed, please see the mailing
list archive at:

    http://www.imc.org/ietf-medfree/

1.3 Amendment history

00a 11-Mar-1998
    Document initially created.

01a 05-May-1998
    Mainly-editorial revision of sections describing the feature types
    and algebra. Added section on indicating preferences. Added
    section describing feature predicate syntax. Added to security
    considerations (based on fax negotiation scenarios draft).

01b 25-Jun-1998
    New Internet draft boilerplate in 'status' preface. Review and
    rationalization of sections on feature combinations. Added numeric
    expressions, named predicates and auxiliary predicates as options
    in the syntax. Added examples of text string predicate
    representation.

02a 08-Jul-1998
    Added chapter on protocol processing considerations, and in
    particular outlined an algorithm for feature set matching. Added
    restrictions to the form of arithmetic expression to allow
    deterministic feature set matching.

1.4 Unfinished business

. Array values: are they needed? (section 3.2.4)

. Matching feature sets (section 7.1). As described, the algorithm is
  rather clunky and ad-hoc, and would bear considerable refinement.
  There is also much work to tidy it up, as noted in comments
  throughout that section.

. Discuss determination of qvalues in the feature set matching
  algorithm.

. Use of unknown data types for feature values (section 7.4)

. Add source code for feature matching implementation.

. Should ASN.1 representation be pursued? If so, should it be aligned
  with LDAP filter representation? (section 6.5)

2. Terminology and definitions

Feature Collection
    A collection of different media features and associated values.
    This might be viewed as describing a specific rendering of a
    specific instance of a document or resource by a specific
    recipient.

Feature Set
    A set of zero, one or more feature collections.

Feature set predicate
    A function of an arbitrary feature collection value which returns
    a Boolean result. A TRUE result is taken to mean that the
    corresponding feature collection belongs to some set of media
    feature handling capabilities defined by this predicate.

Other terms used in this draft are defined in [2].

3. Media feature values

This document assumes that individual media feature values are simple
atomic values:

. Boolean values

. Enumerated values

. Text string values (treated as atomic entities, like enumerated
  value tokens).

. Numeric values

More complex media feature values might be accommodated, but they
would (a) be undesirable because they would complicate the algebra,
and (b) are not necessary.
These statements are justified in the following sub-sections.

NOTE
    The following sub-sections are not part of the algebraic framework
    description, and may be skipped by readers not concerned with
    design rationale.

3.1 Complexity of feature algebra

Statement (a) above is justified as follows: predicates constructed as
expressions containing media feature values must ultimately resolve to
a logical combination of feature value tests.

A full range of simple tests for all of the data types listed above
can be performed based on just two fundamental operations: equality
and less-than. All other meaningful tests can be constructed as
predicates incorporating these two basic tests. For example:

    ( a != b )  iff  !( a == b )
    ( a <= b )  iff  !( b < a )
    ( a > b )   iff   ( b < a )
    ( a >= b )  iff  !( a < b )

If additional (composite) data types are introduced, then additional
operators must be introduced to test their component parts: the
addition of just one further comparison operator increases the number
of such operators by 50%.

3.2 Sufficiency of simple types

To justify statement (b), let us first review the range of composite
data types that might reasonably be considered.

In 1972, a paper "Notes on data structuring" by C. A. R. Hoare was
published in the book "Structured Programming" [4]. This was an early
formalization of data types used in programming languages, and its
content has formed a sufficient basis for describing the data types in
almost every programming language which has been developed. This gives
good grounds to believe that the type framework is also sufficient for
media features.

The data types covered by Hoare's paper are:

. Unstructured data types: (integer, real, enumeration, ordered
  enumeration, sub-ranges).

. Cartesian product (e.g. C 'struct').

. Discriminated union (e.g. C 'union').

. Array.

. Powerset (e.g. Pascal 'SET OF').

. Sequence (e.g. C string, Pascal 'FILE OF').

To demonstrate sufficiency of simple types for media features we must
show that the feature-set defining properties of these composite types
can be captured using predicates on the simple types described
previously.

3.2.1 Unstructured data types

The unstructured data types noted correspond closely to, and can be
represented by, the proposed simple value types for media features.

3.2.2 Cartesian product

A Cartesian product value (e.g. resolution=[x,y]) is easily captured
as a collection of two or more separately named media features (e.g.
x-resolution=x, y-resolution=y).

3.2.3 Discriminated union

A discriminated union value is an either/or type of choice. For
example, a given workstation might be able to display 16K colours at
1024x768 resolution, OR 256 colours at 1280x1024 resolution.

These possibilities are captured by a logical-OR of predicates:

    ( ( x-resolution <= 1024 ) &&
      ( y-resolution <= 768 ) &&
      ( colours <= 16384 ) )
    ||
    ( ( x-resolution <= 1280 ) &&
      ( y-resolution <= 1024 ) &&
      ( colours <= 256 ) )

3.2.4 Array

An array represents a mapping from one data type to another. For
example, the availability of pens in a pen plotter might be
represented by an array which maps a pen number to a colour.

If the array index which forms the basis for defining a feature set is
assumed to be a constant, then each member can be designated by a
feature name which incorporates the index value. For example:
pen-1=black, pen-2=red, etc.

Another example where an array might describe a media feature is a
colour palette: an array is used to associate a colour value (in terms
of RGB or some other colour model) with a colour index value. In this
case it is possible to envisage a requirement for a particular colour
to be loaded in the palette without any knowledge of the index which
maps to it.
In this case, the colour might be treated as a named Boolean
attribute: if TRUE then that colour is deemed to be available in the
palette.

Feature selection based on a variable array index is more difficult,
but it is believed that this is not required for media selection.

[[I cannot think of any example of feature selection which involves a
variable index into an array. If such a feature is presented, an array
type could be added to the set of allowable media feature types, and
an array selection operator added to the algebra.]]

3.2.5 Powerset

A powerset is a collection of zero, one or more values from some base
set of values. A colour palette may be viewed as a powerset of colour
values, or the fonts available in a printer as a powerset of all
available fonts.

A powerset is very easily represented by a separate Boolean-valued
feature for each member of the base set. The value TRUE indicates that
the corresponding value is a member of the powerset value.

3.2.6 Sequence

A sequence is a list of values from some base set of values, which are
accessed sequentially.

A sequence can be modelled by an array if one assumes integer index
values starting at (say) 1 and incrementing by 1 for each successive
element of the sequence. Thus, the considerations described above
relating to array values can be considered as also applying (in part)
to sequence values. That is, if arrays are deemed to be adequately
handled, then sequence values too can be handled.

4. Feature set predicates

A model for data file selection is proposed, based on relational set
definition and subset selection, using elements of the Prolog
programming language [5] as a descriptive notation for this purpose.

NOTE
    The use of Prolog as a syntax for feature description is NOT being
    proposed; rather, the Prolog-like notation is used to develop the
    semantics of an algebra. These semantics could be mapped to any
    convenient syntax.

For the purposes of developing this algebra, examples are drawn from
the media features described in "Media Features for Display, Print,
and Fax" [6], which in summary are:

    pix-x=n       (Image size, in pixels)
    pix-y=m
    res-x=n       (Image resolution, pixels per inch)
    res-y=m
    UA-media=     screen|stationary|transparency|envelope|
                  continuous-long
    papersize=    na-letter|iso-A4|iso-B4|iso-A3|na-legal
    color=n       (Colour depth in bits)
    grey=n        (Grey scale depth in bits)

4.1 An algebra for data file format selection

The basic idea proposed here is that a feature capability of the
original content, sender, data file format or recipient is represented
as a predicate applied to a collection of feature values. Under
universal quantification (i.e. selecting all possible values that
satisfy it), a predicate indicates a range of possible combinations of
feature values.

This idea is inherent in Prolog clause notation, which is used in the
example below to describe a predicate 'acceptable_file_format(File)',
which yields a set of possible file transfer formats, using other
predicates which indicate the file formats available to the sender and
the feature capabilities of the file format, original content and
recipient:

    acceptable_file_format(File) :-
        sender_available_file_format(File),
        match_format(File).

(Read this statement as: 'File' is an acceptable file format IF it is
a format available to the sender AND it matches the requirements
defined by 'match_format'.)

    match_format(File) :-
        pix_x(File,Px),      content_pix_x(Px),     recipient_pix_x(Px),
        pix_y(File,Py),      content_pix_y(Py),     recipient_pix_y(Py),
        res_x(File,Rx),      content_res_x(Rx),     recipient_res_x(Rx),
        res_y(File,Ry),      content_res_y(Ry),     recipient_res_y(Ry),
        colour(File,C),      content_colour(C),     recipient_colour(C),
        grey(File,G),        content_grey(G),       recipient_grey(G),
        ua_media(File,M),    content_ua_media(M),   recipient_ua_media(M),
        papersize(File,P),   content_papersize(P),  recipient_papersize(P).

(Read this as: 'File' satisfies the constraints of 'match_format' IF
there is some 'pix_x' feature value 'Px' that is supported by the file
format AND is suitable for representing the message content to be sent
AND can be displayed by the message recipient, AND there is some
'pix_y' feature ...)

Essentially, 'acceptable_file_format' selects a set of file transfer
formats from those available (as defined by
'sender_available_file_format'), choosing any whose feature
capabilities have a non-empty intersection with the feature
capabilities of the original content and the recipient.

4.1.1 Describing feature sets

The above framework suggests file format capabilities are described as
enumerated feature sets. As an abstract theory, this works fine but
for practical use it has a couple of problems:

(a) description of features with a large number of possibilities

(b) describing features which are supported in specific combinations

A typical case of (a) would be where a feature (e.g. size of image in
pixels) can take any value from a range. To present and test each
value separately is not a practical proposition, even if it were
possible. (A guide here as to what constitutes a practical approach is
to make a judgement about the feasibility of writing the corresponding
Prolog program.)

A typical case of (b) would be where different values for certain
features can occur only in combinations (e.g. allowable combinations
of resolution and colour depth on a given video display). If the
features are treated independently as suggested by the framework
above, all possible combinations would be allowed, rather than the
specifically allowable combinations.

4.1.1.1 Feature value ranges

The first issue can be addressed by considering the type of value
which can represent the allowed features of a data file format.

The features of a specific data file are represented as values from an
enumeration (e.g. ua_media, papersize), or as numeric values (integer
or rational). The description of allowable file format features needs
to represent all the allowable values.

The Prolog clauses used above to describe file format features already
allow for multiple enumerated values. Each acts as a mathematical
relation to select a subset of the set of file values allowed by the
preceding predicates. Section 3 of this document describes proposed
media feature value types.

For numeric feature values, a sequence of two numbers to represent a
closed interval is suggested, where either value may be replaced by an
empty list to indicate no limiting value. Thus:

    [m,n]  => { x : m <= x <= n }
    [m,[]] => { x : m <= x }
    [[],n] => { x : x <= n }

The following Prolog could be used to describe such range matching:

    feature_match(X,[[],[]]).
    feature_match(X,[L,[]]) :- L =< X.
    feature_match(X,[[],H]) :- X =< H.
    feature_match(X,[L,H])  :- L =< X, X =< H.
    feature_match(X,X).

(This example stretches standard Prolog, which does not support
non-integer numbers. The final clause allows 'feature_match' to deal
with equality matching for the normal enumerated and text string value
case.)

Similar constructs might be used with enumeration-valued and
string-valued features for which an ordering relationship is defined.
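
The range matching just described can also be sketched outside Prolog. The Python fragment below is only an illustration, not part of the proposal: the function name mirrors the Prolog predicate, and the use of a two-element tuple with None standing in for the empty list (no limit) is an assumption made for this sketch.

```python
from typing import Any


def feature_match(value: Any, constraint: Any) -> bool:
    """Return True if `value` satisfies `constraint`.

    `constraint` is either a two-element (low, high) tuple describing a
    closed interval, where None on either side means "no limit"
    (an assumed stand-in for the empty list in the Prolog notation),
    or a plain value, in which case an equality match is performed.
    """
    if isinstance(constraint, tuple) and len(constraint) == 2:
        low, high = constraint
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
        return True
    # Enumerated and text string values fall through to equality.
    return value == constraint


# A display handling any resolution between 72dpi and 600dpi.
resolution_ok = feature_match(300, (72, 600))
paper_ok = feature_match("iso-A4", "iso-A4")
```

Unlike the Prolog clauses, this sketch only tests a known value against a constraint; it does not enumerate values that satisfy the constraint.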

4.1.1.2 Feature value combinations

The approach to representing allowed combinations of feature values
presented here is to use additional predicates to describe
relationship constraints between them.

For example, consider a display capable of handling any x- and y-
resolution between 72dpi and 600dpi. This might be represented by the
constraint clauses:

    feature_match(Rx,[72,600]),
    feature_match(Ry,[72,600])

If x- and y- resolutions are to be further constrained to square or
semi-square aspect-ratios, the following additional predicate clause
might be added to the feature set description:

    ( feature_match(Rx,Ry)   ;
      feature_match(Rx,2*Ry) ;
      feature_match(2*Rx,Ry) )

(Read this clause as: (Rx = Ry) OR (Rx = 2*Ry) OR (2*Rx = Ry).)

Another example might be a display which supports 640x480, 800x600 and
1024x768 image pixel sizes:

    ( ( feature_match(Px,640),  feature_match(Py,480) ) ;
      ( feature_match(Px,800),  feature_match(Py,600) ) ;
      ( feature_match(Px,1024), feature_match(Py,768) ) )

(This is based on the predicates 'pix_x(File,Px)', 'pix_y(File,Py)',
'res_x(File,Rx)' and 'res_y(File,Ry)' from the initial framework
above.)

4.1.1.3 Using meta-features to group features

The expression of feature combinations can sometimes be simplified by
introducing "meta-features", or auxiliary predicates, to describe
relationship constraints between feature values.

Developing the previous examples, given:

    pix_x(File,Px),
    pix_y(File,Py),
    res_x(File,Rx),
    res_y(File,Ry),

we can define meta-features 'pix' and 'res':

    pix(File,[Px,Py]) :- pix_x(File,Px), pix_y(File,Py).
    res(File,[Rx,Ry]) :- res_x(File,Rx), res_y(File,Ry).

Then the additional constraints:

    pix(File,[640, 480]).
    pix(File,[800, 600]).
    pix(File,[1024,768]).

serve to define three allowable pixel image sizes, and:

    res(File,[Rx,Ry]) :-
        feature_match(Rx,[72,600]),
        feature_match(Ry,[72,600]),
        ( feature_match(Rx,Ry)   ;
          feature_match(Rx,2*Ry) ;
          feature_match(2*Rx,Ry) ).

serves to represent the allowable resolutions described in the
previous section.

4.1.2 Content, sender and recipient capabilities

It has been shown that feature set predicates can be used to describe
the capabilities of a particular content file format, and also to
represent constraints of feature value combinations.

This use of feature predicates is equally applicable to describing the
feature handling capabilities of senders, recipients and other
elements in a message transmission.

4.2 Conclusion and proposal

The previous sections show that file format capabilities can be
described by feature set predicates: arbitrary logical expressions
using AND, OR, NOT logical combining operators, and media feature
value matching. Data file features, original content features, sender
features and recipient features (and user features) can all be
represented in this way.

A key insight, which points to this conclusion, is that a collection
of feature values can be viewed as describing a specific
representation of the content of a specific document (for example,
when rendered by a given recipient). The capabilities that we wish to
describe, be they sender, file format, recipient or other
capabilities, are sets of such feature collections, with the potential
to be displayed using any of the feature value collections in the set.

This raises a terminology problem, because the term "feature set" can
be used to mean either a collection of specific feature values or a
range of possible feature values. This is the reason for the more
restricted definitions of "feature collection" and "feature set" which
appear in the terminology section of this document.
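
The relationship between feature collections, feature sets and predicates can be illustrated with a small sketch. It is written in Python purely for illustration and rests on several assumptions that are not part of this document: a feature collection is modelled as a dictionary, a predicate as a function returning a Boolean, and the helper names (at_most, all_of, any_of, negate) are hypothetical. The display capability is the workstation example from section 3.2.3.

```python
from typing import Callable, Dict, Union

FeatureValue = Union[bool, int, float, str]
FeatureCollection = Dict[str, FeatureValue]          # one specific rendering
Predicate = Callable[[FeatureCollection], bool]      # defines a feature set


def at_most(tag: str, limit: float) -> Predicate:
    """Primitive test: the named feature value is <= limit.
    Assumes the collection supplies a value for `tag`."""
    return lambda fc: fc[tag] <= limit


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND of predicates."""
    return lambda fc: all(p(fc) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR of predicates."""
    return lambda fc: any(p(fc) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a predicate."""
    return lambda fc: not pred(fc)


# 16K colours at 1024x768, OR 256 colours at 1280x1024.
display = any_of(
    all_of(at_most("x-resolution", 1024), at_most("y-resolution", 768),
           at_most("colours", 16384)),
    all_of(at_most("x-resolution", 1280), at_most("y-resolution", 1024),
           at_most("colours", 256)),
)

# Hypothetical content constraint: at most 256 colours.
content = at_most("colours", 256)

# A collection lies in the intersection of two feature sets exactly
# when it satisfies both predicates.
usable = all_of(display, content)
```

Under these assumptions, the same predicate form describes the display and the content, and their common feature set is obtained by a logical AND, with no special-case treatment for either party.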

Original content, data files and recipients (and users) all embody the
potential capability to deal with a "feature set".

One of the aims of content negotiation is to select an available data
file format (availability being circumscribed by the original content
and sender capabilities) whose feature set intersection with the
recipient feature set is non-empty. (The further issue of preference
is deferred for later consideration.)

The concept of a mathematical relation as a subset defined by a
predicate can be used to define feature sets, using universal
quantification (i.e. using the predicate to select from some notional
universe of all possible feature collections).

Thus, a common framework of predicates can be used to represent the
feature capabilities of original content, data file formats,
recipients and any other participating entity which may impose
constraints on the usable feature sets.

Within this framework, it is sufficient to represent individual
feature values as Booleans, enumerated values, text strings or numeric
ranges. The thesis in section 3 of this document, combined with a
study of "Media Features for Display, Print, and Fax" [6], indicates
that more complex media feature values can be handled by predicates.

5. Indicating preferences

5.1 Combining preferences

The general problem of describing and combining preferences among
feature sets is very much more complex than simply describing
allowable feature sets. For example, given two feature sets:

    {A1,B1}
    {A2,B2}

where:

    A1 is preferred over A2
    B2 is preferred over B1

Which of the feature sets is preferred? In the absence of additional
information or assumptions, there is no generally satisfactory answer
to this.

The proposed resolution of this issue is simply to assert that
preference information cannot be combined.
Applied to the above example, any preference information about A1 in
relation to A2, or B1 in relation to B2, is not presumed to convey
information about preference of {A1,B1} in relation to {A2,B2}.

In practical terms, this restricts the application of preference
information to top-level predicate clauses. A top-level clause
completely defines an allowable feature set; clauses combined by
logical-AND operators cannot be top-level clauses.

5.2 Representing preferences

A convenient way to represent preferences is by numeric "quality
values", as used in HTTP "Accept" headers, etc. (see RFC 2068 [9],
section 3.9).

It has been suggested that numeric quality values, as used in some
HTTP negotiations, are misleading and are really just a way of ranking
options. Attempts to perform arithmetic on quality values do seem to
degenerate into meaningless juggling of numbers.

Numeric quality values in the range 0 to 1 (as defined by RFC 2068
[9], section 3.9) are used to rank feature sets according to
preference. Higher values are preferred over lower values, and equal
values are presumed to be equally preferred. Beyond this, the actual
number used has no significance, and should not be used as a basis for
any arithmetic operation.

In the absence of any explicitly applied quality value, a value of "1"
is assumed, suggesting an "ideal" option that is equally or more
preferred than any other.

This approach can be represented by extending the Prolog-based
framework of an earlier example as follows:

    match_format(File,Qvalue) :-
        match_format(File), Qvalue=1.

    match_format(File) :-
        pix(File,[1024,768]),
        res(File,[Rx,Ry]).

    match_format(File,Qvalue) :-
        pix(File,[800, 600]),
        res(File,[Rx,Ry]), Qvalue=0.9.

    match_format(File,Qvalue) :-
        pix(File,[640, 480]),
        res(File,[Rx,Ry]), Qvalue=0.8.

    res(File,[Rx,Ry]) :-
        feature_match(Rx,[72,600]),
        feature_match(Ry,[72,600]),
        ( feature_match(Rx,Ry)   ;
          feature_match(Rx,2*Ry) ;
          feature_match(2*Rx,Ry) ).

This example applies image preference ranking based solely on the size
of the image, provided that the resolution constraints are satisfied.

6. Feature set representation

The foregoing sections have described a framework and semantics for
defining feature sets with predicates applied to feature collections.
This section proposes some concrete representations for these feature
set predicates.

Rather than invent an all-new notation, this proposal adapts a
notation already defined for directory access [7,8]. Observe that a
feature collection is similar to a directory entry, in that it
consists of a collection of named values. Further, the semantics of
the mechanism for selecting feature collections from a feature set is
in most respects identical to selection of directory entries from a
directory.

Differences between directory selection (per [7]) and feature set
selection described previously are:

. Directory selection provides substring-, approximate- and
  extensible- matching for attribute values. Directory selection may
  also be based on the presence of an attribute without regard to its
  value.

. Directory selection provides for matching rules which are dependent
  upon the declared data type of an attribute value.

. Feature selection provides for the association of a quality value
  with a feature predicate as a way of ranking the selected value
  collections.

. Feature selection contains provisions for defining relationships
  between feature values.

The idea of substring matching does not seem to be relevant to feature
set selection, and is excluded from these proposals.
The ideas of extensible matching and of matching rules dependent upon
data types are facets of a problem not addressed by this memo, but
which do not necessarily affect the feature selection syntax. An
aspect which might have a bearing on the syntax would be a requirement
to specify a matching rule explicitly as part of a selection
expression.

Testing for the presence of a feature may be useful in some
circumstances, but does not sit comfortably within the semantic
framework. Feature sets are described by implied universal
quantification over predicates, and the absence of reference to a
given feature means the set is not constrained by that feature.
Against this, it is difficult to define what might be meant by
"presence" of a feature, so this option is not included in these
proposals. An effect similar to testing for the presence of a feature
can be achieved by defining a Boolean-valued feature.

6.1 Textual representation of predicates

The text representation of a feature set is based on RFC 2254 "The
String Representation of LDAP Search Filters" [8], excluding those
elements not relevant to feature set selection (discussed above), and
adding elements specific to feature set selection (e.g. options to
associate quality values with predicates).

The format of a feature predicate is defined by the production for
"filter" in the following, using the syntax notation of [10]:

   filter     = "(" filtercomp *( ";" parameter ) ")"
   parameter  = "q" "=" qvalue
              / ext-param "=" ext-value
   qvalue     = ( "0" [ "." 0*3DIGIT ] )
              / ( "1" [ "." 0*3("0") ] )
   ext-param  = ALPHA *( ALPHA / DIGIT / "-" )
   ext-value  =
   filtercomp = and / or / not / item
   and        = "&" filterlist
   or         = "|" filterlist
   not        = "!" filter
   filterlist = 1*filter
   item       = simple / set
   set        = attr "=" "[" setentry *( "," setentry ) "]"
   setentry   = value / range
   range      = value ".." value
   simple     = attr filtertype value
   filtertype = equal / greater / less
   equal      = "="
   approx     = "~="
   greater    = ">="
   less       = "<="
   attr       = ftag
   value      = fvalue
   ftag       =
   fvalue     =

(Subject to constraints imposed by the protocol that carries a feature
predicate, whitespace characters may appear between any pair of syntax
elements or literals that appear on the right hand side of these
productions.)

As described, the syntax permits parameters (including quality values)
to be attached to any "filter" value in the predicate (not just
top-level values). Only top-level quality values are recognized. If no
explicit quality value is given, a value of '1.0' is applied.

   NOTE
   The flexible approach to quality values and other parameter values
   in this syntax has been adopted for two reasons: (a) to make it
   easy to combine separately constructed feature predicates, and (b)
   to provide an extensible tagging mechanism (for example, to
   incorporate a conceivable requirement to explicitly specify a
   matching rule).

6.2 Arithmetic expressions

It can be observed that some feature value constraints (e.g. the
aspect ratio constraints in section 4.1.1.2) depend upon being able to
express equivalence between an arithmetic expression of some feature
value (or values) and some other value. To capture this idea, our
predicate representation needs to represent arithmetic expressions in
addition to just logical combinations of feature comparisons.

The syntax for this extends that given previously for "value":

   value  =/ expr
   expr   = [ "+" / "-" ] term *( ( "+" / "-" ) term )
   term   = factor *( ( "*" / "/" ) factor )
   factor = ftag / number / "(" expr ")"
   number = 1*DIGIT [ "." 1*DIGIT ]

The form of an 'expr' is further constrained by the requirement that
it is a linear function of a single feature tag value.
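The linearity requirement on 'expr' can be illustrated with a minimal sketch in Python. It is not part of the proposal. The tree representation (numbers, feature tag strings, and '(op, left, right)' tuples) and the name 'linear_form' are assumptions made for illustration only. The function reduces an expression to an offset and a coefficient on at most one feature tag, and rejects anything that is not linear in a single tag.

```python
from fractions import Fraction


def linear_form(expr):
    """Reduce an expression tree to (a, b, tag), meaning a + b*tag.

    Hypothetical tree shapes: a number, a feature tag (str), or a tuple
    (op, left, right) with op one of '+', '-', '*', '/'.
    tag is None when the expression is a constant.
    Raises ValueError if the expression is not linear in a single tag.
    """
    if isinstance(expr, (int, float, Fraction)):
        return (Fraction(expr), Fraction(0), None)
    if isinstance(expr, str):
        return (Fraction(0), Fraction(1), expr)
    op, left, right = expr
    a1, b1, t1 = linear_form(left)
    a2, b2, t2 = linear_form(right)
    if op in ("+", "-"):
        if t1 and t2 and t1 != t2:
            raise ValueError("more than one feature tag")
        sign = 1 if op == "+" else -1
        a, b = a1 + sign * a2, b1 + sign * b2
        return (a, b, (t1 or t2) if b != 0 else None)
    if op == "*":
        if t1 and t2:
            raise ValueError("product of feature tags is not linear")
        if t2 is None:
            a, b, tag = a1 * a2, b1 * a2, t1   # (a1 + b1*f) * a2
        else:
            a, b, tag = a1 * a2, a1 * b2, t2   # a1 * (a2 + b2*f)
        return (a, b, tag if b != 0 else None)
    if op == "/":
        if t2 is not None:
            raise ValueError("division by a feature tag is not linear")
        if a2 == 0:
            raise ValueError("division by zero")
        return (a1 / a2, b1 / a2, t1)
    raise ValueError("unknown operator: %r" % (op,))


# 'Res_y*2' reduces to offset 0 and coefficient 2 on Res_y.
print(linear_form(("*", "Res_y", 2)))
```

An expression such as 'Res_x*Res_y' is rejected by this sketch, which matches the stated constraint that only linear functions of one feature tag are allowed.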
NOTE The linearity constraint is imposed to allow an easy algorithm for matching feature sets with simultaneous constraints involving multiple feature tags. The single variable constraint is to avoid the need to solve simultaneous equations in more than two variables. The allowable forms could be extended at some cost in algorithmic complexity. At the time of writing, there are no likely feature matching scenarios known to the author that would require more complex forms. Klyne [Page 19] Internet draft 8 July 1998 An algebra for describing media feature sets An "expr" appears only on the right hand side of a filter. The left hand side is always a feature tag. The main reason for this is to follow the general style of the LDAP filter string upon which the predicate syntax is based, but it also appears that it may simplify some aspects of feature set matching. 6.3 Named and auxiliary predicates Named and auxiliary predicates can serve two purposes: (a) making complex predicates easier to write and understand, and (b) providing a possible basis for naming and registering feature sets. 6.3.1 Defining a named predicate A named predicate definition has the following form: named-pred = "(" fname *pname ")" ":-" filter fname = pname = 'fname' is the name of the predicate. 'pname' is the name of a formal parameter which may appear in the predicate body, and which is replaced by some supplied value when the predicate is invoked. 'filter' is the predicate body. It may contain references to the formal parameters, and may also contain references to feature tags and other values defined in the environment in which the predicate is invoked. References to formal parameters may appear anywhere where a reference to a feature tag ('ftag') is permitted by the syntax for 'filter'. The only specific mechanism defined by this memo for introducing a named predicate into a feature set definition is the "auxiliary predicate" described later. 
Specific negotiating protocols or other memos may define other
mechanisms.

   NOTE
   There has been some discussion of creating a registry for feature
   sets as well as individual feature values. Such a registry might be
   used to introduce named predicates corresponding to these feature
   sets into the environment of a capability assertion. Detailed
   discussion of this idea is beyond the scope of this memo.

6.3.2 Invoking named predicates

Assuming a named predicate has been introduced into the environment of
some other predicate, it can be invoked by a filter "item" of the
form:

   item  =/ fname *param
   param = expr

The number of parameters must match the definition of the named
predicate that is invoked.

6.3.3 Auxiliary predicates in a filter

An auxiliary predicate is attached to a filter definition by the
following extension to the "filter" syntax:

   filter =/ "(" filtercomp *( ";" parameter ) ")"
             "where" 1*( named-pred ) "end"

The named predicates introduced by "named-pred" are visible from the
body of the "filtercomp" of the filter to which they are attached, but
are not visible from each other. They all have access to the same
environment as "filter", plus their own formal parameters. (Normal
scoping rules apply: a formal parameter with the same name as a value
in the environment of "filter" effectively hides the environment value
from the body of the predicate to which it applies.)

   NOTE
   Recursive predicates are not permitted. The scoping rules should
   ensure this.

6.4 Feature set definition examples

This section re-casts the Prolog example given in section 5.2 using
the textual form syntax described in sections 6.1-6.3.
6.4.1 Single predicate

   (| (& (Pix_x=1024)
         (Pix_y=768)
         (Res_x=[72..600])
         (Res_y=[72..600])
         (| (Res_x=Res_y) (Res_x=Res_y*2) (Res_y=Res_x*2) ) )
      (& (Pix_x=800)
         (Pix_y=600)
         (Res_x=[72..600])
         (Res_y=[72..600])
         (| (Res_x=Res_y) (Res_x=Res_y*2) (Res_y=Res_x*2) ) );q=0.9
      (& (Pix_x=640)
         (Pix_y=480)
         (Res_x=[72..600])
         (Res_y=[72..600])
         (| (Res_x=Res_y) (Res_x=Res_y*2) (Res_y=Res_x*2) ) );q=0.8 )

6.4.2 Predicate with auxiliary predicate

   (| (& (Pix_x=1024) (Pix_y=768) (Res Res_x Res_y) )
      (& (Pix_x=800) (Pix_y=600) (Res Res_x Res_y) );q=0.9
      (& (Pix_x=640) (Pix_y=480) (Res Res_x Res_y) );q=0.8 )
   where
   (Res Res_x Res_y) :-
      (& (Res_x=[72..600])
         (Res_y=[72..600])
         (| (Res_x=Res_y) (Res_x=Res_y*2) (Res_y=Res_x*2) ) )
   end

Note that the formal parameters of "Res", namely "Res_x" and "Res_y",
prevent the body of the named predicate from referencing
similarly-named feature values.

6.5 ASN.1 representation

Should it be required, the LDAP search filter model provides the basis
for an ASN.1 representation of a feature predicate.

The following ASN.1 is adapted from RFC 2251 "Lightweight Directory
Access Protocol (v3)" [7] (also contained in RFC 2254 "The String
Representation of LDAP Search Filters" [8]) to mirror the adaptation
of the string representation presented above.

[[The following ASN.1 fragment does not include provision for quality
value (and possibly other parameter values). Also, if using an
ASN.1-derived representation it would seem appropriate to use an ISO
object identifier for the feature tag, and an ASN.1 type for the
feature value.
Such changes would remove any semblance of compatibility with LDAP,
but that may not matter.]]

   Filter ::= CHOICE {
       and             [0] SET OF Filter,
       or              [1] SET OF Filter,
       not             [2] Filter,
       equalityMatch   [3] AttributeValueAssertion,
       greaterOrEqual  [5] AttributeValueAssertion,
       lessOrEqual     [6] AttributeValueAssertion }

   AttributeValueAssertion ::= SEQUENCE {
       featureTag      OCTET STRING,
       featureValue    OCTET STRING }

7. Content negotiation protocol processing

This section addresses some issues that may arise when using feature
set predicates as part of some content negotiation or file selection
protocol.

7.1 Matching feature sets

Matching a feature set to some given feature collection is essentially
very straightforward: the feature set predicate is simply evaluated
for the given feature collection. The result indicates whether the
feature collection matches the capabilities, and the associated
quality value can be used for selecting among alternative feature
collections.

Matching a feature set to some other feature set is less
straightforward. Here, the problem is to determine whether or not
there is at least one feature collection that matches both feature
sets (e.g. is there an overlap between the feature capabilities of a
given file format and the feature capabilities of a given recipient?)

This feature set matching is accomplished by a combination of logical
expression manipulation and partial evaluation of the predicate
constraints.

For this procedure to work reliably, the predicates must be reduced to
a canonical form. One such form is "clausal form", and procedures for
converting general expressions in predicate calculus to this form are
given in [5] (section 10.2), [11] (section 2.13), [13] (chapter 4) and
[14] (section 5.3.2).

"Clausal form" for a predicate is similar to "conjunctive normal form"
for a proposition, which consists of a conjunction (logical ANDs) of
disjunctions (logical ORs).
A related form that is better suited to feature set matching is
"disjunctive normal form", which consists of a logical disjunction of
conjunctions. In this form, it is sufficient to show that at least one
of the conjunctions can be satisfied by some feature collection.

A syntax for disjunctive normal form is:

   filter   = orlist
   orlist   = "(" "|" andlist ")" / term
   andlist  = "(" "&" termlist ")" / term
   termlist = 1*term
   term     = "(" "!" simple ")" / simple

where "simple" is as described previously in section 6.1. Thus, the
canonicalized form has at most three levels: an outermost "(|...)"
disjunction of "(&...)" conjunctions of possibly negated feature value
tests.

   NOTE (a theoretical diversion):

   Is this consideration of "clausal form" really required? After all,
   the feature predicates are just Boolean expressions, aren't they?

   Well, no. A feature predicate is a Boolean expression containing
   primitive feature value tests (comparisons), represented by 'item'
   in the feature predicate syntax. If these tests could all be
   assumed to be independently 'true' or 'false', then each could be
   regarded as an atomic proposition, and the whole predicate could be
   dealt with according to the (relatively simple) rules of the
   Propositional Calculus.

   But, in general, the same feature tag may appear in more than one
   predicate 'item', so the tests cannot be regarded as independent.
   Indeed, interdependence is needed in any meaningful application of
   feature set matching, and it is important to capture these
   dependencies (e.g. does the set of resolutions that a sender can
   supply overlap the set of resolutions that a recipient can
   handle?). Thus, we have to deal with elements of the Predicate
   Calculus, with its additional rules for algebraic manipulation.

   This section aims to show that these additional rules are more
   unfamiliar than complicated.
In practice, it seems that the way feature predicates are constructed and used actually avoids some of the complexity of dealing with fully-generalized Predicate Calculus. Klyne [Page 24] Internet draft 8 July 1998 An algebra for describing media feature sets 7.1.1 Feature set matching strategy The overall strategy for matching feature sets, expanded in the following sections, is: 1. Formulate the hypothesis. 2. Reduce the feature predicates to just <= and >=. 3. Reduce the hypothesis to a canonical form (disjunctive normal form). 4. For each of the disjunctions, attempt to show that it can be satisfied by some feature collection. Any that cannot be satisfied are discarded. 4.1 Separate the feature value tests into the smallest possible independent groups, such that each group contains all tests involving one or more feature values. That is: no group contains a predicate involving any feature tag that also appears in a predicate in some other group. 4.2 Within each group, process simultaneous equality constraints, to eliminate them from the overall set of constraints. The result of this step is that simultaneous inequality constraints are not dependent upon equality constraints. 4.3 For each group, merge the various constraints involving just a single feature tag to a minimum form. 4.4 For each group, ensure that all the constraints on the different feature values can be satisfied simultaneously. This involves enumerating and evaluating combinations of predicates within the group, hence the previous step to separate predicates into small groups. 5. If the remaining disjunction is non-empty, then the constraints are shown to be satisfiable. Further, it can be used as a statement of the resulting feature set for possible further matching operations. 
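The strategy above can be sketched in Python for a deliberately restricted case. This is a minimal illustration and not the full algorithm. It assumes that both feature sets are already in disjunctive normal form, that every test compares a feature tag with a numeric constant using '<=' or '>=', and that there are no negations or inter-feature relations. The names 'merge_bounds' and 'match_feature_sets' and the tuple representation are hypothetical.

```python
def merge_bounds(conjunction):
    """Merge ('<=' or '>=', tag, constant) tests into one closed range
    per feature tag (step 4.3, for constant-only tests).

    Returns {tag: (low, high)}, or None if some range is empty.
    """
    ranges = {}
    for op, tag, value in conjunction:
        low, high = ranges.get(tag, (float("-inf"), float("inf")))
        if op == "<=":
            high = min(high, value)
        elif op == ">=":
            low = max(low, value)
        else:
            raise ValueError("unsupported test: %r" % (op,))
        if low > high:
            return None          # contradictory bounds on this tag
        ranges[tag] = (low, high)
    return ranges


def match_feature_sets(p, q):
    """p and q are lists of conjunctions (disjunctive normal form).

    The hypothesis (P AND Q) expands to every pairwise conjunction;
    unsatisfiable ones are discarded (step 4) and the survivors
    describe the resulting feature set (step 5).
    """
    survivors = []
    for cp in p:
        for cq in q:
            merged = merge_bounds(cp + cq)
            if merged is not None:
                survivors.append(merged)
    return survivors


sender = [[(">=", "Res_x", 72), ("<=", "Res_x", 300)]]
recipient = [[(">=", "Res_x", 200), ("<=", "Res_x", 600)],
             [("<=", "Res_x", 50)]]
print(match_feature_sets(sender, recipient))
```

A non-empty result means the two feature sets overlap. Grouping of interdependent features and the handling of equality and simultaneous constraints (steps 4.1, 4.2 and 4.4) are outside the scope of this sketch.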
7.1.2 Formulating the problem

A formal statement of the problem we need to solve can be given as:
given two feature set predicates, '(P x)' and '(Q x)', where 'x' is
some feature collection, we wish to establish the truth or otherwise
of the proposition:

   EXISTS(x) : (P x) AND (Q x)

i.e. does there exist a feature collection that satisfies both
predicates, 'P' and 'Q'?

The predicates are derived from the predicate syntax described
previously, and contain a number of primitive predicate functions such
as '=', '<=', '>=' as well as logical connectives. The primitive
predicate functions have a number of well known properties that we
shall need to exploit in order to reach a useful conclusion; e.g.

   (A = B)  == (B = A)
   (A <= B) == (B >= A)
   (A = B) & (B = C)   => (A = C)
   (A <= B) & (B <= C) => (A <= C)

and so on (where '==' is used to mean "is equivalent to", and '=>' to
mean "implies").

These rules form a core body of logic statements against which the
goal predicate can be evaluated. The form in which these statements
are expressed is important to realizing an effective predicate
matching algorithm (i.e. one that doesn't loop or fail to find a valid
result).

The first step in formulating these rules is to simplify the framework
of primitive predicates.

7.1.3 Simplifying primitive predicates

The primitive predicates from which feature set definitions are
constructed are '=', '<=' and '>='.

Observe that, given any pair of feature values, the relationship
between them must be exactly one of the following:

   (LT a b): 'a' is less than 'b'.
   (EQ a b): 'a' is equal to 'b'.
   (GT a b): 'a' is greater than 'b'.
   (NR a b): 'a' is not equal and not related to 'b'.

(The final case arises when two values are compared for which no
ordering relationship is defined, and the values are not equal; e.g.
two unequal string values.)
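The four possible relationships can be illustrated with a short Python sketch. It assumes, for illustration only, that finite numbers are the only ordered values and that all other values (such as strings) support equality but no ordering. The function name 'relationship' is hypothetical.

```python
from numbers import Real


def relationship(a, b):
    """Classify a pair of feature values as 'LT', 'EQ', 'GT' or 'NR'.

    Assumption: only finite real numbers are ordered; any other pair
    of unequal values is treated as "not related".
    """
    if a == b:
        return "EQ"
    if isinstance(a, Real) and isinstance(b, Real):
        return "LT" if a < b else "GT"
    return "NR"


print(relationship(72, 600))         # ordered numeric values
print(relationship("A4", "letter"))  # unequal, unordered values
```

Exactly one of the four results is produced for any pair of values, which is the property the matching rules rely on.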
These four cases can be captured by a pair of primitive predicates:

   (LE a b): 'a' is less than or equal to 'b'.
   (GE a b): 'a' is greater than or equal to 'b'.

The four cases described above are represented by the following
combinations of primitive predicate values:

   (LE a b)  (GE a b) | relationship
   ----------------------------------
     TRUE      FALSE  | (LT a b)
     TRUE      TRUE   | (EQ a b)
     FALSE     TRUE   | (GT a b)
     FALSE     FALSE  | (NR a b)

Thus, the original 3 primitive predicates can be translated to
combinations of just LE and GE, reducing the number of additional
relationships that must be subsequently captured:

   (<= a b)  -->  (LE a b)
   (>= a b)  -->  (GE a b)
   (= a b)   -->  (& (LE a b) (GE a b) )

7.1.3.1 Primitive predicate properties

This section captures and codifies a set of rules that express the
additional properties of the primitive predicates. The form of rules
presented is intended to be used for performing feature set matching
computations.

Each rule is presented as:

   . one or two constraint predicates,

   . a single replacement constraint predicate or the value TRUE or
     FALSE,

   . optionally, a condition to be satisfied for the replacement to be
     valid.

The relations '==' and '!=' are used here to denote equality and
inequality in an unordered data type.

A 'FALSE' or '(!TRUE)' result corresponds to failure
(non-satisfiability) of any conjunction that contains it. A 'TRUE' or
'(!FALSE)' result can be deleted from any conjunction that contains
it.
Klyne [Page 27] Internet draft 8 July 1998 An algebra for describing media feature sets (<= f f) --> TRUE (<= f a) (<= f b) --> (<= f a), a<=b, a==b (<= f b), a>b FALSE, a!=b (*1) (<= f a) (>= f b) --> FALSE, a FALSE, a<=b, a==b (*3) (<= f a), a!=b (<= f a) (!(>= f b)) --> (<= f a), a= f b)), a>=b FALSE a==b (>= f f) --> TRUE (>= f a) (>= f b) --> (>= f a), a>=b, a==b (>= f b), a= f a) (!(>= f b)) --> FALSE, b<=a, a==b (*3) (>= f a), a != b (>= f a) (!(<= f b)) --> (>= f a), a>b, a!=b (*4) (!(<= f b)), a<=b, FALSE, a==b (!(<= f a)) (!(<= f b)) --> (! (<= f a) ), a>=b, a==b (! (>= f b) ), a= f b)) --> FALSE, a>=b, a==b (!(>= f a)) (!(>= f b)) --> (! (>= f a) ), a<=b, a==b (! (<= f b) ), a>b NOTES (*1) these cases correspond to there being no ordering relationship on the value of 'f', hence can only be satisfied if '(f=a) & (f=b) & (a<>b)', which is always FALSE. (*2) these cases correspond to inconsistent range constraints applied, with the lower bound greater than the upper bound. This can also correspond to inconsistent equality constraints (where equality is represented by a conjunction of '<=' and '>='). (*3) (!(<=...)) is equivalent to '>' or 'nr'. This is a variation of (*2) except that the interval is open at its lower bound. (*4) these cases include there being no ordering relationship on the value of 'f', hence can only be satisfied if '(f=a) & (f<>b)'. The latter term is always FALSE when 'a<>b', so the it reduces to just '(f=a)'. Klyne [Page 28] Internet draft 8 July 1998 An algebra for describing media feature sets Open boundaries ('<' and '>') are represented by a negation of '>=' and '<=' respectively. The rules here assume conjunction, so these are treated separately. [[[TODO: revise the above table to present rules for ordered and unordered features separately. Also, use '<', '>' instead of logical negations. 
This will make it much easier to verify correctness by inspection.]]]

[[[TODO: Work through this whole chapter to use consistent notation
for describing the predicates being transformed.]]]

[[[TODO: model the above system to confirm that it is complete and
does indeed work properly in all cases.]]]

7.1.4 Conversion to canonical form

The steps below (except step 0) are adapted from standard textbooks on
logic and logic programming [5,11]. All steps mentioned for a textbook
conversion to clausal form are indicated here for completeness, but
some, indicated in parentheses, are not applicable in this situation.

Step 0 is not a textbook step, but is applied to get the predicate
into a standard Boolean expression form.

0. Replace all "set" instances with "simple" forms:

      T = [ E1, E2, ... En ]  =>  (| (T=[E1]) (T=[E2]) ... (T=[En]) )
      (T=[R1..R2])            =>  (& (T>=R1) (T<=R2) )
      (T=[E])                 =>  (T=E)

1. (Remove implications and equivalences. Our predicate form is
   constructed without these, so there is nothing to do here.)

2. Move negation inwards, by application of De Morgan's Rule:

      (! (& A1 A2 ... An ) )  =>  (| (! A1 ) (! A2 ) ... (! An ) )
      (! (| A1 A2 ... An ) )  =>  (& (! A1 ) (! A2 ) ... (! An ) )
      (! (! A ) )             =>  A

   Repeat application of this to the inner expressions A1,A2,...An
   until all negations apply directly to "simple" forms.

3. (Remove existential quantifiers. This involves replacing
   existentially quantified variables with "Skolem constants" and/or
   "Skolem functions" in a procedure called "Skolemizing". Feature
   predicates do not use these so there is nothing to do here.)

   NOTE
   Recall that the original goal was formulated using an existential
   quantifier:

      EXISTS(x) : (P x) AND (Q x)

   Applying Skolemization here means that every occurrence of the
   variable 'x' is replaced by some unknown "Skolem constant", which
   is distinct from all other variables and constants in the
   predicate.
In this case, 'x' represents some (unknown) feature collection, which in turn is a set of feature tag and feature value associations. The feature predicate syntax and examples have feature values referenced by their feature tags; each feature tag can be regarded as representing a component value of a feature collection. If we treat every feature tag reference as being implicitly qualified by the feature collection to which the predicate is applied, the predicate as presented is already Skolemized (noting that, by construction, feature tags satisfy the uniqueness requirements for Skolem constants; and, in the absence of universal quantification, Skolem functions are not required). 4. (Pull out universal quantifiers. Again, feature predicates do not use these so there is nothing to do here.) 5. Expand bracketed disjunctions, and flatten bracketed conjunctions and disjunctions: (& (| A1 A2 ... Am ) B1 B2 ... Bn ) => (| (& A1 B1 B2 ... Bn ) (& A2 B1 B2 ... Bn ) : (& Am B1 B2 ... Bn ) ) (& (& A1 A2 ... Am ) B1 B2 ... Bn ) => (& A1 A2 ... Am B1 B2 ... Bn ) (| (| A1 A2 ... Am ) B1 B2 ... Bn ) => (| A1 A2 ... Am B1 B2 ... Bn ) The result is a "disjunctive normal form", a disjunction of conjunctions: (| (& S11 S12 ... ) (& S21 S22 ... ) : (& Sm1 Sm2 ... Smn ) ) Klyne [Page 30] Internet draft 8 July 1998 An algebra for describing media feature sets where the "Sij" elements are either "simple" forms, or negations of "simple" forms. Each term within the top-level "(&...)" construct represents a single possible feature set that satisfies the goal. Note that the order of entries within the top-level '(|...)', and within each '(&...)', is immaterial. From here on, each conjunction '(&...)' is processed separately. Only one of these needs to be satisfiable for the original goal to be satisfiable. (A textbook conversion to clausal form uses slightly different rules to yield a "conjunctive normal form".) 
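The applicable steps (0, 2 and 5) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation. Predicates are represented as nested tuples: '("&", ...)', '("|", ...)', '("!", p)', simple tests '(op, tag, value)', and sets '("set", tag, entries)' in which a range is written as a '(low, high)' pair and other entries are assumed to be scalar values. All function names are hypothetical.

```python
from itertools import product


def expand_sets(e):
    """Step 0: rewrite set-valued tests as '=', '<=' and '>=' tests."""
    op = e[0]
    if op == "set":
        _, tag, entries = e
        alternatives = []
        for entry in entries:
            if isinstance(entry, tuple):      # (low, high) range
                low, high = entry
                alternatives.append(
                    ("&", (">=", tag, low), ("<=", tag, high)))
            else:                             # single value
                alternatives.append(("=", tag, entry))
        return ("|",) + tuple(alternatives)
    if op in ("&", "|", "!"):
        return (op,) + tuple(expand_sets(x) for x in e[1:])
    return e


def push_not(e, negate=False):
    """Step 2: move negations inwards using De Morgan's rules."""
    op = e[0]
    if op == "!":
        return push_not(e[1], not negate)
    if op in ("&", "|"):
        if negate:
            op = "|" if op == "&" else "&"
        return (op,) + tuple(push_not(x, negate) for x in e[1:])
    return ("!", e) if negate else e


def to_dnf(e):
    """Step 5: return a list of conjunctions, each a list of possibly
    negated simple tests."""
    op = e[0]
    if op == "|":
        return [c for x in e[1:] for c in to_dnf(x)]
    if op == "&":
        parts = [to_dnf(x) for x in e[1:]]
        return [sum(choice, []) for choice in product(*parts)]
    return [[e]]


def canonical(e):
    return to_dnf(push_not(expand_sets(e)))


pred = ("&", ("=", "Pix_x", 800),
             ("|", ("=", "Res_x", 72), ("=", "Res_x", 144)))
for conjunction in canonical(pred):
    print(conjunction)
```

Each inner list corresponds to one '(&...)' conjunction of the canonical form, so later stages can process the conjunctions one at a time.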
7.1.5 Grouping of feature predicates NOTE: remember that from here on, each disjunction is treated separately. Each simple feature predicate contains a "left-hand" feature tag and may also contain another feature tag on the "right-hand" side. Thus, each simple predicate reduces to one of the forms: (<= f a) (<= f a+b*f1) (>= f a) (>= f a+b*f1) where "f" and "f1" are feature tags, and "a" and "b" are literal (i.e. known) constant values. To arrange these into independent groups: 1. Group all simple predicates according to their left hand feature tag ('f'). 2. Locate all predicates with feature tags ('f1') on their "right- hand" side, and combine the feature tag group in which they appear with the group that has that value as a "left-hand" tag. (e.g. given a predicate of the form '(<= pix_x 2*pix_y)', combine the groups containing predicates of the form '(?= pix_x ...)' and '(?= pix_y ...)', noting that they may already be combined.) One pass through all of the predicates should ensure that any interdependent predicates have been grouped together. (The act of joining predicate groups can never separate any feature tags, and at the end of one pass, every feature tag occurrence has been joined in a group with any other tags that appear in the same predicate.) Klyne [Page 31] Internet draft 8 July 1998 An algebra for describing media feature sets 7.1.6 Remove simultaneous equality constraints The purpose of this step is to remove all dependence of simultaneous inequality constraints on any equality constraint. This means that the only simultaneous constraints remaining are inequality constrains (i.e. '<=' or '>='). An equality constraint occurs when '(<= f e)' and '(>= f e)' are asserted in the same conjunction with the same values for 'f' and 'e', where 'e' may be any value or expression in the limited linear form allowed (i.e. 'a+b*f1'). 
In the remainder of this section, the expression '(= x y)' is used as a shorthand for both '(<= x y)' and '(>= x y)' appearing in the group with the same values for 'x' and 'y' and with neither being logically negated. For each equality constraint in the group ('(= f1 e)', consisting of '(<= f1 e)' and '(>= f1 e)' as noted above), scan all other constraints in the same group and apply the following rules: [A] (= f a ), 'a' is a literal constant, --> replace all right-hand occurrences of 'f' in the group with 'a'. [B] (= f1 a1+b1*f2), (= f1 a2+b2*f2) --> solve for 'f1', 'f2' using the formulae: f1 = (a1 b2 - a2 b1)/(b2 - b1) f2 = (a1 - a2)/(b2 - b1) replace the predicates with equivalents that represent the solutions, and replace all right-hand occurrences of 'f1', 'f2' in the group with these solutions. If the solution is degenerate (i.e. 'b2=b1') then: (a) if 'a1=a2', discard one of the constraints and continue processing with the other. (b) if 'a1<>a2', fail the current conjunction as there are no values of 'f1' and 'f2' that can satisfy both of these predicates. Klyne [Page 32] Internet draft 8 July 1998 An algebra for describing media feature sets [C] (= f1 a1+b1*f2), (= f2 a2+b2*f1) --> solve for 'f1', 'f2' using the formulae: f1 = (a1 + a2 b1)/(1 - b1 b2) f2 = (a2 + a1 b2)/(1 - b1 b2) replace the predicates with equivalents that represent the solutions, and replace all right-hand occurrences of 'f1', 'f2' in the group with these solutions. If the solution is degenerate (i.e. 'b1 b2=1') then: (a) if 'a1+a2 b1=0' (or 'a2+a1 b2=0'), discard one of the constraints and continue processing with the other. (b) if 'a1+a2 b1<>0', fail the current conjunction as there are no values of 'f1' and 'f2' that can satisfy both of these predicates. 
[D] (= f1 a1+b1*f2), (<= f1 a2+b2*f2) -->
    substitute for 'f1' in the second predicate, and reorganize to get
    the result into the required form:

       a1 + b1 f2 <= a2 + b2 f2

    hence

       (b1 - b2) f2 <= a2 - a1

    Now we need to consider 4 cases:

       (1) b1>b2, use:         (<= f2 (a2-a1)/(b1-b2))
       (2) b1<b2, use:         (>= f2 (a2-a1)/(b1-b2))
       (3) b1=b2, a2>=a1, use: TRUE
       (4) b1=b2, a2<a1, use:  FALSE

[E] (= f1 a1+b1*f2), (<= f2 a2+b2*f1) -->
    substitute for 'f1' in the second predicate, and reorganize to get
    the result into the required form:

       f2 <= a2 + b2 (a1 + b1 f2)

    hence

       f2 <= a2 + b2 a1 + b2 b1 f2
       (1 - b1 b2) f2 <= a2 + a1 b2

    Now we need to consider 4 cases:

       (1) b1 b2 < 1, use: (<= f2 (a2 + a1*b2)/(1 - b1*b2) )
       (2) b1 b2 > 1, use: (>= f2 (a2 + a1*b2)/(1 - b1*b2) )
       (3) b1 b2 = 1, (a2 + a1 b2) >= 0, use: TRUE
       (4) b1 b2 = 1, (a2 + a1 b2) < 0, use:  FALSE

[F] (= f1 a1+b1*f2), (>= f1 a2+b2*f2) -->
    is the same as rule [D] (above) except that '<=' and '>='
    predicates are exchanged.

[G] (= f1 a1+b1*f2), (>= f2 a2+b2*f1) -->
    is the same as rule [E] (above) except that '<=' and '>='
    predicates are exchanged.

When processing of an equality predicate is complete, remove it from
active consideration in the group (but keep it as part of the final
solution).

On completion of this stage, all simultaneous equality predicates have
been replaced by single feature value predicates, or by equality with
an expression containing a feature whose value range is determined by
one or more inequality predicates.

[[[TODO: discussion of near-degenerate cases. Numerical stability
issues, etc. How to handle?]]]

[[[TODO: if the theory is correct, this stage should be unnecessary,
though it may yield performance gains if there is a large number of
interdependent features (unlikely).
Model the algorithm without this step to increase confidence that it
works OK.]]]

7.1.7 Merge single-feature constraints

Within each group, apply the predicate simplification rules from
section 7.1.3 to eliminate redundant single-feature constraints. All
single-feature predicates are reduced to an equality, inequality or
range constraint on that feature.

If the constraints on any feature are found to be contradictory, the
current disjunction is removed from the feature set description.

The resulting description is a minimal form of the particular
disjunction of the feature set definition.

[[[TODO: move rule descriptions to here when they are revised.]]]

7.1.8 Test simultaneous feature constraints

The final stage is to examine any simultaneous inequality constraints
on the feature values to determine whether or not they can be
satisfied.

During this stage, the feature set description predicates are not
further modified. Previous processing has been directed toward
reducing the feature set descriptions to some minimal format; the
final stage is to examine the reduced rules to determine whether or
not they can be satisfied.

7.1.8.1 Convex sets

The algorithm for testing satisfiability of the simultaneous
inequalities is based on the theory of convex sets. These are
described in [16,17], or other good reference works on Linear Algebra
and/or Linear Programming.

The idea is very simple: a convex set is a set where any linear
combination of any two members with non-negative coefficients that sum
to 1 is also a member of the set. A simple example is a line between
two points: if P1 and P2 are points on that line, then (using normal
vector addition and multiplication by a scalar) '(a1 P1 + a2 P2)' is
also a point on that line, where 'a1+a2=1'. This idea generalizes
easily to higher dimensions.
CS1: A set of vectors 'S' is a "convex set" if 't V + (1-t) U' belongs
     to S whenever U and V belong to S and 0 <= t <= 1.

CS2: The solution of any set of linear inequalities is a convex set.
     [16, chapter 7]

Other definitions and properties of convex sets are:

CS3: A "convex combination" of points 'V1,V2,...,Vn' is the point
     'V = SUM(i=1,n; ti Vi)', where 'SUM(i=1,n; ti) = 1' and each
     'ti >= 0'.

CS4: A point 'U' in a convex set 'C' is called an "extreme point" or
     "vertex" if it cannot be expressed as a convex combination of two
     other distinct points in 'C'.

CS5: The "convex hull" 'C(S)' of a set of points 'S' is the set of all
     convex combinations of points from 'S'. The set 'S' is said to
     "span" 'C(S)'.

CS6: A "convex polygonal region" or "convex polyhedron" is a convex
     hull spanned by a finite number of points 'V1,V2,...,Vn'; that
     is, the set of all convex combinations of 'V1,...,Vn'.

CS7: A bounded solution to a set of linear inequalities is a convex
     polyhedron. [16, chapter 7]

CS8: A "simplex" is an n-dimensional convex polyhedron with exactly
     n+1 vertices. The boundary of a simplex contains simplices of
     lower dimension, which are called "simplicial faces". [17, ch 2,
     sect 3]

[[[TODO: above are textbook definitions. Below are a number of
definitions and hypotheses concerning convex sets that seem fairly
obvious to me, but for which I don't have proofs or citations. These
are currently tagged [GK]. Proofs or citations are needed.]]]

CS9: A convex set is spanned by its vertices. [GK]

CS10: The vertices of the convex polyhedron that is the solution of a
     set of linear inequalities are a subset of the solutions of the
     inequalities treated as equations. Specifically, the linear
     equation solutions that lie within the solution convex set are
     its vertices.
[GK]

[[[cf Simplex method slack variables]]]
[[[Unbounded solution sets]]]
[[[Vertices for unbounded sets]]]
[[[Develop open/closed boundaries]]]
[[[Boundary points -- adjoining points]]]
[[[Open and closed boundaries]]]
[[[Open and closed vertices]]]
[[[Relating open/closed vertices to inequalities]]]

7.1.8.2 Testing for satisfiability

Assume that our dependency group contains N features. Then the convex set that comprises the solution space contains N-vectors. Each vertex of this convex set will be obtained as the solution of N simultaneous linear equations.

As the original problem was constrained to allow a maximum of two features in any single primitive feature predicate, it will be possible to solve the simultaneous equations by direct substitution.

To test a set of inequalities for satisfiability:

1.  From the inequalities in the dependency group, select each combination that references all N interdependent features. (It is assumed that the total number of inequalities in a group will be quite small, so the total number of combinations should not get out of hand.)

2.  For each selected combination, treat the inequalities as equations and solve for the feature values. If the solution is degenerate, discard that combination.

3.  For each solution generated at step 2, test it against all of the constraints in the set. If it satisfies or adjoins each constraint, add that solution to the set of vertices. If the solution fully satisfies every constraint, note that it is a 'closed' vertex. Otherwise, if it only adjoins (rather than fully satisfies) at least one of the constraints, note that it is an 'open' vertex. Discovery of any closed vertex indicates at least one solution that satisfies the current dependency group.

4.
If all the vertices discovered are 'open' vertices, then construct a new point that is a convex combination of the vertices discovered, with all coefficients in the range 0 < [...]

[...] 'Res_x_dpcm >= 100' and another feature set specifies 'Res_x_dpi <= 100'. A simple conjunction of these yields the theorem:

   (& ( Res_x_dpcm >= 100 ) ( Res_x_dpi <= 100 ) )

which appears to be quite satisfiable, despite what we know about the relationship between them. But we can add an extra clause that captures our knowledge of this relationship:

   (& ( Res_x_dpcm >= 100 ) ( Res_x_dpi <= 100 )
      ( Res_x_dpi = Res_x_dpcm*2.54 ) )

The procedures described above for handling simultaneous feature constraints would cause this predicate to fail because there are no values for 'Res_x_dpcm' and 'Res_x_dpi' that simultaneously satisfy all the constraints.

7.4 Unknown feature value data types

[[Discuss issues of specific features which may have feature-specific comparison rules, as opposed to generic Booleans, enumerations, strings and numbers which use comparison rules independent of the feature concerned.]]

[[[TODO]]]

7.5 Worked example

[[[TODO]]]

7.6 Algorithm source code

[[[TODO]]]

8. Security considerations

Some security considerations for content negotiation are raised in [1,2,3].

The following are primary security concerns for capability identification mechanisms:

.  Unintentional disclosure of private information through the announcement of capabilities or user preferences.

.  Disruption to system operation caused by accidental or malicious provision of incorrect capability information.

.  A capability identification mechanism might be used to probe a network (e.g. by identifying specific hosts used, and exploiting their known weaknesses).

The most contentious security concerns are raised by mechanisms which automatically send capability identification data in response to a query from some unknown system.
Use of directory services (based on LDAP [7], etc.) seems to be less problematic because proper authentication mechanisms are available.

Mechanisms which provide capability information when sending a message are less contentious, presumably because some intention can be inferred that the person whose details are disclosed wishes to communicate with the recipient of those details. This does not, however, solve problems of spoofed supply of incorrect capability information.

The use of format converting gateways may prove problematic because such systems would tend to defeat any message integrity and authenticity checking mechanisms that are employed.

9. Copyright

Copyright (C) The Internet Society 1998. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

10. Acknowledgements

My thanks to Larry Masinter for demonstrating to me the breadth of the media feature issue, and encouraging me to air my early ideas.

Early discussions of ideas on the IETF-HTTP and IETF-FAX discussion lists also led to useful inputs from Koen Holtman, Ted Hardie and Dan Wing.

The debate later moved to the IETF conneg WG mailing list, where Al Gilman was particularly helpful in refining the feature set algebra. Several ideas for dealing with preferences were suggested by Larry Masinter.

I would also like to thank Content Technologies Ltd and 5th Generation Messaging Ltd for supporting this work.

11. References

[1]  "Scenarios for the Delivery of Negotiated Content"
     T. Hardie, NASA Network Information Center
     Internet draft: Work in progress, November 1997.

[2]  "Requirements for protocol-independent content negotiation"
     G. Klyne, Integralis Ltd.
     Internet draft: Work in progress, March 1998.

[3]  "Content feature tag registration procedures"
     Koen Holtman, TUE
     Andrew Mutz, Hewlett-Packard
     Ted Hardie, NASA
     Internet draft: Work in progress, November 1997.

[4]  "Notes on data structuring"
     C. A. R. Hoare, in "Structured Programming"
     Academic Press, APIC Studies in Data Processing No. 8
     ISBN 0-12-200550-3 / 0-12-200556-2
     1972.

[5]  "Programming in Prolog" (2nd edition)
     W. F. Clocksin and C. S. Mellish, Springer Verlag
     ISBN 3-540-15011-0 / 0-387-15011-0
     1984.
[6]  "Media Features for Display, Print, and Fax"
     Larry Masinter, Xerox PARC
     Koen Holtman, TUE
     Andrew Mutz, Hewlett-Packard
     Dan Wing, Cisco Systems
     Internet draft: Work in progress, January 1998.

[7]  RFC 2251, "Lightweight Directory Access Protocol (v3)"
     M. Wahl, Critical Angle Inc.
     T. Howes, Netscape Communications Corp.
     S. Kille, Isode Limited
     December 1997.

[8]  RFC 2254, "The String Representation of LDAP Search Filters"
     T. Howes, Netscape Communications Corp.
     December 1997.

[9]  RFC 2068, "Hypertext Transfer Protocol -- HTTP/1.1"
     R. Fielding, UC Irvine
     J. Gettys, J. Mogul, DEC
     H. Frystyk, T. Berners-Lee, MIT/LCS
     January 1997.

[10] RFC 2234, "Augmented BNF for Syntax Specifications: ABNF"
     D. Crocker (editor), Internet Mail Consortium
     P. Overell, Demon Internet Ltd.
     November 1997.

[11] "Logic, Algebra and Databases"
     Peter Gray
     Ellis Horwood Series: Computers and their Applications
     ISBN 0-85312-709-3/0-85312-803-3 (Ellis Horwood Ltd)
     ISBN 0-470-20103-7/0-470-20259-9 (Halsted Press)
     1984.

[12] "Introduction to Expert Systems"
     Peter Jackson
     Addison Wesley, International computer science series
     ISBN 0-201-14223-6
     1986.

[13] "Elementary Logics: A procedural Perspective"
     Dov Gabbay
     Prentice Hall, Series in computer science
     ISBN 0-13-726365-1
     1998.

[14] "Logic and its Applications"
     Edmund Burke and Eric Foxley
     Prentice Hall, Series in computer science
     ISBN 0-13-030263-5
     1996.

[15] "Metalogic: An Introduction to the Metatheory of Standard First Order Logic"
     Geoffrey Hunter
     University of California Press
     ISBN 0-520-02356-0
     1971.

[16] "Elementary Linear Algebra"
     Paul C Shields
     Worth Publishers Inc.
     ISBN 0-87901-121-1
     1968, 1973, 1980.

[17] "Linear Programming"
     Saul I Gass, McGraw-Hill Inc.
     Library of Congress Catalog Card no 68-55267 (no ISBN)
     1958, 1964, 1969.

12.
Author's address

Graham Klyne

Content Technologies Ltd.      5th Generation Messaging Ltd.
Forum 1                        5 Watlington Street
Station Road                   Nettlebed
Theale                         Henley-on-Thames
Reading, RG7 4RA               RG9 5AB
United Kingdom                 United Kingdom.

Telephone: +44 118 930 1300    +44 1491 641 641
Facsimile: +44 118 930 1301    +44 1491 641 611

E-mail: GK@ACM.ORG