SIMPLE                                                    H. Schulzrinne
Internet-Draft                                               Columbia U.
Expires: January 11, 2006                                  July 10, 2005

                     Composing Presence Information
                draft-schulzrinne-simple-composition-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 11, 2006.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

Composition creates a presence document from multiple components published by one or more sources. This document describes use cases for presence composition and identifies sources of information that a composer might draw on. The composing function can be complex, so we intentionally restrict the discussion to cases that are likely to be common across many users of presence systems.

Schulzrinne             Expires January 11, 2006                [Page 1]
Internet-Draft                 Composition                     July 2005

Table of Contents

1. Introduction
2. Scoping Composition
   2.1 Device tuples
   2.2 Service tuples
   2.3 Person tuples
3. Types of Information sources
4. Information conflict
   4.1 Sources of Information Conflict
   4.2 Detecting information conflicts
5. Composition Operations
   5.1 Default Policy: Union
   5.2 Tuple Discarding
   5.3 Combine Identical Contacts
   5.4 Ambiguity Resolution
   5.5 Tuple Merging
   5.6 Multi-Person Composition
6. Security Considerations
7. IANA Considerations
8. References
   8.1 Normative References
   8.2 Informative References
Author's Address
A. Acknowledgments
Intellectual Property and Copyright Statements

1. Introduction

Composition combines multiple presence or event sources into one view, which is then delivered, after various filtering operations, to watchers [6] [7]. Composition is required whenever there are several sources contributing information about a single presentity or event.

[Note: The content in this draft overlaps with the Processing Model draft and needs to be reconciled. Here, the emphasis is on developing the foundations for a composition policy language and on dealing with merging tuples.]
For notational simplicity, and since most of the discussion has focused on presence rather than general events, we restrict our attention to presence information using the Presence Information Data Format (PIDF) [3] and extensions such as the Rich Presence Information Data Format (RPID) [4], keeping in mind that other types of events or status may well be able to use many of the same mechanisms.

We assume that a presentity is a single human being. There are other presentities, such as the collection of customer service agents in a call center, where consistency is much harder to define.

We assume that the composition operation does not depend on the watcher identity, as there seems to be little functional gain from introducing per-watcher composing operations. The composed document contains the maximum set of information, i.e., no watcher can obtain more information than is contained in the composed raw presence document. Composition at the presence agent is just one component of providing useful and correct information to the watcher.

We assume that composition is algorithmic, although manual composition by the presentity is theoretically possible. Given the automated nature of composition, there may well be situations where the best course of action is to expose the underlying data to the watcher, even though it may be contradictory. Indeed, in many cases, a mechanical composer may not even be able to detect whether information is contradictory or not.

The goal of composition is to remove information that is either stale, contradictory or redundant. Stale information has been superseded by other, newer information. Contradictory information makes two statements about the presentity that cannot both be true. Redundant presence information provides information that is no longer of interest.
For example, a presentity may decide to drop information about services whose status is closed if there are open services and may drop a service record referring to another person via a element if the presentity itself is available.

Composition is not designed to reduce the size of notification messages or to protect information for privacy. Various compression schemes and partial notification [10] are better suited to reduce message sizes. Privacy filtering [8] has the role of tailoring information to individual recipients, based on the presentity's privacy policy.

We assume that composition primarily affects <person> and <tuple> elements and that each device only publishes one <device> element. A composer may discard device elements that are not referenced by any service <tuple>.

In our model, the composer is reactive. In other words, it only creates a new presence document if one of the publishers updates parts of the presence document. An active composer could, for example, generate a new presence document after a certain time interval has elapsed or when timed presence [5] information is transitioning from the future to the present.

The goal of this document is to outline options and then to derive a composition policy language that addresses the most useful aspects of composition. Alternatively, a presence composition language can focus on the XML document and its components. Such a general presence composition language would have to be a full programming language, as it would need to support standard programming constructs such as conditionals, operations on XML elements in a document object model, history and external sources. This document focuses on content-aware policies rather than simple tools for mechanical transformations of XML presence documents.

2.
Scoping Composition

In our model, composition takes a presence document, made up of a set of <person>, <tuple> and <device> tuples, each tuple consisting of one or more elements, and creates another valid presence document based on this information.

First, we describe possible operations on these tuples, proceeding from the highest granularity to sub-element operations. In all cases, operations can be concatenation (union), merging (combining elements from several tuples into one) or selection (one-of-N). Within the same composition, operations for different tuple types may differ. Since the issues differ between tuple types, we discuss each in turn.

In the tuple-level approach to composition, the integrity of individual tuples is maintained. The two possible operations are union and selection. For a union operation, tuples from different presence sources, of the same kind and with different tuple identifiers, are simply all copied to the composed presence document. This is the default composition policy described in the data model documents. Also, it appears likely that <device> elements for different devices are simply collected, possibly removing unreferenced devices, i.e., devices that are not referred to by any service (<tuple>).

For selection, only a subset of the tuples of each type is copied to the composed document, with all others being discarded. It is also conceivable that tuples are concatenated, but that elements from some of the tuples are removed, e.g., because they are considered less reliable.

For element-level operations, one can either include multiple instances of the same element type, where allowed, or create a new element that combines values from multiple elements.

Some elements can contain timing information indicating the range of time that the information is believed to be valid. It is probably not a good idea to combine elements that cover different, although maybe overlapping, time intervals.
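
The tuple-level operations described above can be sketched as follows. This is only an illustration in Python and is not part of PIDF, RPID or the data model: the Element record and the functions union, select and drop_unreferenced_devices are hypothetical names introduced for this example, and each tuple is reduced to a kind, an identifier and an optional device reference.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


# Hypothetical, simplified stand-in for a person, service or device tuple.
@dataclass(frozen=True)
class Element:
    kind: str                         # "person", "service" or "device"
    id: str                           # identifier chosen by the publishing source
    device_ref: Optional[str] = None  # device a service tuple refers to, if any


def union(*documents: Iterable[Element]) -> List[Element]:
    """Concatenate the tuples of all sources, keeping each tuple intact."""
    return [e for doc in documents for e in doc]


def select(elements: Iterable[Element], kind: str,
           keep: Callable[[Element], bool]) -> List[Element]:
    """Keep only the tuples of `kind` accepted by `keep`; pass other kinds through."""
    return [e for e in elements if e.kind != kind or keep(e)]


def drop_unreferenced_devices(elements: Iterable[Element]) -> List[Element]:
    """Remove device tuples that no service tuple refers to."""
    elements = list(elements)
    referenced = {e.device_ref for e in elements
                  if e.kind == "service" and e.device_ref is not None}
    return [e for e in elements if e.kind != "device" or e.id in referenced]


# Two sources: a phone publishing a service and its device, a laptop
# publishing only a device tuple.
phone = [Element("service", "s1", device_ref="d1"), Element("device", "d1")]
laptop = [Element("device", "d2")]
composed = drop_unreferenced_devices(union(phone, laptop))
```

In this sketch the laptop's device tuple is dropped because no service refers to it. Whether such pruning is desirable is a policy choice, since a device tuple may still carry useful user activity information.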
2.1 Device tuples

Device tuples are relatively easy to compose, as each device should publish one tuple for itself. Thus, later tuples simply replace earlier tuples. In addition, device tuples may be removed during composition if no service tuple (<tuple>) references the device, as a device tuple by itself has limited usefulness since it cannot be reached without service information. (This is not completely true, as user activity information may still be useful even if the device itself is not part of any published service.)

2.2 Service tuples

Composing service tuples differs depending on whether the <contact> element has the same or a different URI. (We leave aside for now the difficulty of deciding whether two URIs that are not lexically identical are indeed functionally the same.)

Unlike for other tuple types, depending on the scope of composition, new service tuples may actually be created from a variety of service tuple inputs. For example, a composer could take a variety of services (e.g., cell phone, home phone, office phone, PC IM client) available under different contact URIs and create a new service tuple that "hides" all these services behind a single SIP URI.

When composing tuples with the same contact URI and service class, there are likely a few common rules for current RPID elements. Such a case is most likely to occur if the same service is being provided by a variety of presentity-operated devices.

class: A single value needs to be chosen.

device: If a service is offered by multiple devices, it makes sense to enumerate all the device identifiers.

privacy: Since the caller cannot select the device that satisfies specific privacy requirements, the appropriate choice is to provide the most conservative indication of the privacy to be expected, i.e., the least privacy indicated among all the tuples for the contact URI.
[TBD: Note, however, that there are proposals [11] to allow remote parties to indicate privacy requirements.]

relationship: If two tuples with the same contact URI differ in their relationship, the relationship element needs to be dropped.

status icon: It is a local choice whether to present all status icons, as they may reflect specific capabilities, or to choose one.

user input: In a combined , it makes sense to reflect the most recent user input.

2.3 Person tuples

In the current data model, each presence document only reports the status of one person. Below, we investigate the issues for RPID elements, as they illustrate the various issues and sensible options.

activities: As noted earlier, activities reported by a variety of sources can easily all be true, as they may report on different facets of a person. For example, for a driver talking on a cell phone, a car sensor may report "steering", while the phone can report "on-the-phone", and the calendar can indicate "appointment". Given the difficulties of deciding which activities truly rule out each other, there are likely to be only a few viable heuristics, such as only reporting recent activities.

class: As discussed for above.

mood: This is similar to .

place-is: Unless a tuple is considered unreliable, selecting the least favorable value for each media type appears to be a safe default operation.

place-type: Place type values are largely mutually exclusive, with exceptions such as "outdoors", "stadium" and "public", so that a selection needs to be made, maybe with a rule engine that prefers more specific values over more general ones. (In our example, "stadium" might win over "public".)

privacy: As discussed for above.

sphere: This element is also largely single-valued, so that a selection needs to be made. As explained in Section 5, certain sphere values may be preferred over others.

status-icon: As discussed for above.
time-offset: Given the one-person-in-one-place assumption, the most likely value needs to be chosen.

user-input: As discussed for above.

As other elements are added, they are likely to fall either into the category of elements where collecting all (recent) values makes most sense, such as activities and mood above, or into the category where a choice among sources needs to be made. The choice can be made either on a per-tuple basis, so that all elements in the composed document come from the same source tuple, or on an element-by-element basis.

3. Types of Information sources

Presence information can be contributed by many different sources, either directly, by publishers using PUBLISH requests or by a presence agent acting as a watcher receiving NOTIFY requests. We describe each mode of delivery operation in the following.

In direct mode, the composer has direct access, without presence protocol mediation, to this information, e.g., via REGISTER requests or layer-2 operations or access to user keyboard activity. Secondly, sources can use SIP PUBLISH requests to update presence information. Finally, presence agents can in turn subscribe to presence information and receive NOTIFY requests.

However, the mechanism of data delivery is likely to be less important than the original data source and how the information was derived. Thus, to the extent possible, information about the original source should be preserved, as otherwise information might become more credible simply because it has been re-published. We focus here on the semantic source of the data, i.e., how it was derived, not how it was injected into the presence system.

For simplicity, we do not try to assess the veracity of the presence document. In order to evaluate the usefulness of a presence document, we only care whether the presentity would want the information to appear that way, not whether this corresponds to observable facts.
Thus, a presence document is correct in that sense if it indicates that the presentity is in a meeting even though the presentity has actually gone fishing, provided the presentity would like the rest of the world to believe that he is at work. It may, however, well be the case that composition policies find it easier to maintain the truth than to keep lies consistent across sources of presence information.

We can distinguish the following sources of presence data:

Reported current: Reported current information has been provided by the presentity within processing time delays of the current time. A presentity can update status information manually, by setting any of the elements in a presence document. We assume that this information is correct when entered, but the trustworthiness of the information is likely to decay as time goes on, given that most human users will find it difficult to continuously keep presence information up-to-date.

Reported scheduled: For reported scheduled information, a presentity indicates its plans for the future rather than the present, e.g., in a calendar. The reliability of this information depends largely on the diligence of the user in updating calendars and similar sources.

Measured device information: Measured device information uses observed user behavior on communication devices, such as the act of placing or receiving calls or typing. The main source of error is that a device may not be able to tell whether the presentity itself is using the device or some other person.

Measured by sensors: Presence information measured by sensors reflects the status of the presentity, e.g., its location, type of location, activity or other environmental factors. Examples of sensors include Global Positioning System (GPS) information for location or a Bluetooth beacon that announces the type of location, such as "theater", that a person is in.
Sensors have the advantage that they do not rely on humans to keep the information up-to-date, but sensors are naturally subject to measurement errors. In particular, in quantum mechanical fashion, it is sometimes difficult to ascertain both the measured variable and the identity of the presentity. For example, a passive infrared (PIR) sensor can detect that somebody is in the office of the presentity, but cannot detect whether this is the presentity himself, cleaning staff or a dog. A GPS sensor cannot detect whether the cell phone is being used by the presentity or has been borrowed by the presentity's spouse.

Derived: Presence information might be derived indirectly from other sources of data. For example, the basic open/closed status might be algorithmically derived from a variety of other elements, whether watcher-visible or not.

4. Information conflict

4.1 Sources of Information Conflict

The fundamental problem addressed by composition is information conflict, i.e., multiple sources (publishers) have different views of the presentity, some of which may be outdated or incorrect. Information can be incorrect for any number of reasons, but some examples include:

Location divergence: The publisher collecting the information may not be colocated with the presentity at this particular time. For example, Alice's home PC may report that the user is idle (not typing), but Alice is using the office PC.

Update diligence: Some sources, particularly those updated manually, are prone to only approximate reality. For example, few users record all appointments or meetings in their calendar or, conversely, remove all canceled meetings. This is particularly true for regularly scheduled activities such as meals or commute times.

Sensor failure: Sources that report their information differentially are subject to silence ambiguity.
If such a source does not report new data, the receiver cannot tell whether the sensor is malfunctioning or whether the information last received is still current. This can be partially mitigated by requiring sources to report when they are no longer confident of the data. However, this does not deal with sudden source failures. Thus, some form of keep-alive mechanism may well be needed that overrides differential notification mechanisms. Even with keep-alive, there is likely to be a substantial period of time between source failure and failure detection, causing stale information.

4.2 Detecting information conflicts

For mechanical composition, we would like to be able to detect information conflicts, so that appropriate processing logic can remove inaccurate information. Information conflicts can be classified as to whether they are detectable in a single element or only across elements and how easy it is to detect them.

We distinguish single-element from multi-element conflicts. Single-element conflicts occur if two elements, say in RPID, in two sources cannot both be true or are highly unlikely to be true, without having to inspect any other element. A multi-element conflict occurs if only the combination of multiple elements indicates a conflict.

Multi-element conflicts often have location, and properties known for this location, as the common element. For example, certain geospatial locations are known not to contain certain types of places. Thus, both the location and the information are, by themselves, each credible and possible, but are detectably wrong once considered together. These conflicts can be detected if location or time can be mapped to reliable information from external sources.

We distinguish three types of information conflict: obvious, probable and undetectable, described in turn below.
For some pieces of presence information, information conflicts are obvious and readily detectable. For example, under the one-person-per-presentity assumption and common assumptions of physics, a single presentity can only be in one place at a time. Thus, if two sources report location information that differs by more than the margin of error, one must be wrong. In RPID, the , , , , , and elements have exclusive values, although in some cases, below the element level. For example, the field has information for both audio and video, and thus two sources may report different information for and still both be correct as long as they refer to different media types.

For other types of information, an automaton can guess with some probability that two sources of information contradict each other, but this may well depend on the values themselves. For example, the combination of away, appointment, in-transit, meeting, on-the-phone, steering incrementally reported by different sources may well reflect the activity of the typical Wall Street commuter in the Lincoln Tunnel, speaking on his cell phone. One would hope, however, that combinations such as "steering, sleeping" are rarely true, although "sleeping, meeting" indicates that there are few activities that completely rule out others.

Thirdly, undetectable information conflicts are those where a machine lacking human intelligence cannot reliably detect that the two pieces of information cannot both be true. For example, an automaton is unlikely to be able to decide which of several notes or free-text fields is valid, without basing this on other information in the tuple, person or device element.

5. Composition Operations

Based on the exploration of common presence document operations earlier, we now discuss a variety of practical semantic composition operations and policies that may be useful. They are worded as possible choices or operations that can be applied, in a roughly sensible order of execution.
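
One way to picture such an ordered sequence of operations is as a list of stages, each taking and returning a list of tuples. The following Python sketch is only an illustration under simplifying assumptions: the ServiceTuple record, the stage functions and compose are hypothetical names, tuples carry a single numeric timestamp, and 'id' values are assumed not to collide between sources. It chains a union with replacement and two discarding steps of the kind discussed in the subsections of this section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical, simplified service tuple; real PIDF tuples carry much more.
@dataclass(frozen=True)
class ServiceTuple:
    id: str           # 'id' chosen by the source (assumed unique across sources)
    contact: str      # contact URI
    basic: str        # basic status: "open" or "closed"
    timestamp: float  # publication time in seconds


Stage = Callable[[List[ServiceTuple]], List[ServiceTuple]]


def union_with_replacement(published: List[ServiceTuple]) -> List[ServiceTuple]:
    """Keep every tuple, but let a later tuple replace an earlier one with the same id."""
    latest: Dict[str, ServiceTuple] = {}
    for t in published:          # `published` is in publication order
        latest[t.id] = t
    return list(latest.values())


def discard_closed(tuples: List[ServiceTuple]) -> List[ServiceTuple]:
    """Drop service tuples whose basic status is 'closed'."""
    return [t for t in tuples if t.basic != "closed"]


def discard_older_than(max_age: float, now: float) -> Stage:
    """Build a stage that drops tuples older than `max_age` seconds at time `now`."""
    def stage(tuples: List[ServiceTuple]) -> List[ServiceTuple]:
        return [t for t in tuples if now - t.timestamp <= max_age]
    return stage


def compose(published: Iterable[ServiceTuple], stages: Iterable[Stage]) -> List[ServiceTuple]:
    """Apply the stages in the given order."""
    result = list(published)
    for stage in stages:
        result = stage(result)
    return result


published = [
    ServiceTuple("a", "sip:alice@example.com", "open", 100.0),
    ServiceTuple("b", "sip:alice@pc.example.com", "closed", 190.0),
    ServiceTuple("a", "sip:alice@example.com", "open", 195.0),   # replaces the first "a"
    ServiceTuple("c", "sip:alice@home.example.com", "open", 50.0),
]
stages = [union_with_replacement, discard_closed, discard_older_than(60.0, now=200.0)]
composed = compose(published, stages)
```

With these inputs, only the re-published tuple "a" survives: "b" is closed and "c" is older than the threshold. Representing the policy as an ordered list makes the order of execution explicit and lets individual steps be omitted.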
5.1 Default Policy: Union

The default composition policy is designed to lose no information, at the expense of presenting possibly contradictory information to watchers. This composition policy performs a union with replacement. Newly published elements replace earlier elements with the same 'id' attribute. We assume that each source chooses its own 'id' values.

Other than this, all elements are simply enumerated as is, sorted by type (person, tuple, device). Elements within the <person>, <tuple> and <device> elements are not modified at all, except possibly annotated with a source description (and timestamp?).

This policy can also be seen as providing input to the following steps.

5.2 Tuple Discarding

The next step discards whole tuples, based on zero or more of the criteria below:

Closed contacts: All <tuple>s (service) with a basic status of 'closed' are removed.

Old tuples: Tuples with a timestamp or time range older than a given threshold are discarded.

5.3 Combine Identical Contacts

Next, a composer MAY combine all s with identical contacts, where identical is defined for each URI scheme. It MAY also create a new tuple with a SIP AOR, representing several s.

5.4 Ambiguity Resolution

For elements in or where only one value makes sense, but there are a number of source tuples, there are a number of possible approaches:

Choose recent tuple: The composer chooses elements from the most recent tuple only or from tuples no more than a certain age. Simply choosing the most recently published value is likely to cause flip-flopping between dueling publishers.

Choose trustworthy tuple: The composer chooses values from the most trustworthy tuple, typically based on the source or type of reporting. For example, a composer might rank the sources of Section 3 in the order "reported current", "measured device information", "measured by sensors", "reported scheduled", and finally "derived".
Omit contradictions: A conservative approach omits any information where two source tuples contradict each other. Only the intersection of enumerated sets, with their associated notes, is used.

Choose by sphere: Tuples belonging to a certain sphere may be given precedence.

Value precedence: Tuples containing certain values may be selected over others. For example, an activity indication of "meeting" might win over "sleeping" or a tuple for the person itself over one for a relationship. More specific values may displace less specific ones, e.g., for activities or location information.

Location-based: If the location information is trustworthy, it may be possible to rule out certain values, e.g., for . However, this appears to be generally unlikely.

5.5 Tuple Merging

As long as different sources report non-contradictory information, these can be merged into one tuple. Such merging hides their origin from multiple sources, which can be considered both a feature and a drawback.

5.6 Multi-Person Composition

While this document focuses on single-person presentities, it should be noted that the same problem also exists for aggregate entities, such as the contact representing a call center. Examples of policies include composing for maximum availability ("at least one person is available") or commonality ("everybody is in the US").

6. Security Considerations

Composition itself does not create or reveal any new data. Thus, the security considerations are the same as those for the constituent presence information elements.

7. IANA Considerations

This document does not request any IANA actions.

8. References

8.1 Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[2] Day, M., Rosenberg, J., and H.
Sugano, "A Model for Presence and Instant Messaging", RFC 2778, February 2000.

[3] Sugano, H., Fujimoto, S., Klyne, G., Bateman, A., Carr, W., and J. Peterson, "Presence Information Data Format (PIDF)", RFC 3863, August 2004.

8.2 Informative References

[4] Schulzrinne, H., "RPID: Rich Presence Extensions to the Presence Information Data Format (PIDF)", draft-ietf-simple-rpid-07 (work in progress), June 2005.

[5] Schulzrinne, H., "Timed Presence Extensions to the Presence Information Data Format (PIDF) to Indicate Status Information for Past and Future Time Intervals", draft-ietf-simple-future-04 (work in progress), June 2005.

[6] Rosenberg, J., "A Data Model for Presence", draft-ietf-simple-presence-data-model-02 (work in progress), February 2005.

[7] Rosenberg, J., "A Processing Model for Presence", draft-rosenberg-simple-presence-processing-model-00 (work in progress), February 2005.

[8] Schulzrinne, H., "A Document Format for Expressing Privacy Preferences", draft-ietf-geopriv-common-policy-04 (work in progress), February 2005.

[9] Peterson, J., "A Presence-based GEOPRIV Location Object Format", draft-ietf-geopriv-pidf-lo-03 (work in progress), September 2004.

[10] Lonnfors, M., "Session Initiation Protocol (SIP) extension for Partial Notification of Presence Information", draft-ietf-simple-partial-notify-05 (work in progress), May 2005.

[11] Shacham, R., "Specifying Media Privacy Requirements in the Session Initiation Protocol (SIP)", draft-shacham-sip-media-privacy-00 (work in progress), June 2005.

Author's Address

Henning Schulzrinne
Columbia University
Department of Computer Science
450 Computer Science Building
New York, NY 10027
US

Phone: +1 212 939 7004
Email: hgs+simple@cs.columbia.edu
URI: http://www.cs.columbia.edu

Appendix A.
Acknowledgments

This document is based on discussions within the IETF SIMPLE working group.

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.