Applications Area Working Group F.E. Echtler
Internet-Draft Munich Univ. of Applied Sciences
Intended status: Informational March 11, 2011
Expires: September 12, 2011

GISpL: Gestural Interface Specification Language
draft-echtler-gispl-specification-01

Abstract

This document introduces GISpL, the Gestural Interface Specification Language. GISpL enables the unambiguous description of gestures used in human-computer interfaces. This includes gestures on touch and multi-touch screens, with digital pens, with hand-held controllers or in free air. A matching engine analyzes motion data produced by the input device(s) and triggers registered gestures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 12, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

This document describes GISpL (Gestural Interface Specification Language), a formal language for describing human-computer interfaces which use gestures. The term "gesture" is used in a very wide sense here, meaning any motion of the user which can be captured by an input device.

1.1. Motivation

As novel types of human-computer interfaces such as multi-touch screens, digital pens, hand-held controllers or even free-air gestures become more and more common, the number of special-purpose applications built to interact with these devices grows accordingly.

Using a dedicated formal language to describe the various types of gestural interaction which are possible with an application has advantages for several distinct groups:

Users
End users of gestural applications benefit in two ways: the behavior of such applications becomes more consistent across different platforms and the users are given the option to modify the behavior to suit their needs.
Researchers
When the behavior of a gestural interface is specified formally, it is easier for user interface researchers to compare different interfaces. Also, prototypical implementations of such interfaces are easier to modify and adapt during the design, development and evaluation cycle.
Developers
Since the formal specification enables a dedicated matching engine to perform the actual detection and matching of gestures, developers are spared tedious tasks such as constant reimplementation of basic gesture matching algorithms.

1.2. Design Considerations

GISpL serves two purposes: it is used to specify the gestures that an application wants to react to (gesture specifications), and it is used to describe the motions that a user has actually performed (gesture events, see Section 2.3).

Moreover, GISpL should be usable across a very wide range of platforms, even unconventional ones. One important example is the Firefox web browser, which has recently acquired the capability to deliver multi-touch events to web applications.

Consequently, the requirements for GISpL are a compact representation, readability for humans, and the availability of parsers for a wide range of programming languages and platforms.

Therefore, the decision was made to base GISpL on JSON [RFC4627]. Compared to XML, JSON offers a better balance between readability for humans and code size, and it is supported in nearly all programming languages and platforms.

2. Specification

GISpL consists of three core elements: regions, gestures and features.

For the full formal ABNF specification, please see Appendix A. This section will use a relaxed syntax for easier readability, with rule names enclosed in <angle brackets>. JSON-related markup ( { } [ ] : , " ) will be reproduced verbatim.

2.1. Definitions

In the context of GISpL, the terms input object and input event will be used. An input object is any physical object which is detectable by the input hardware, such as a mouse, a Wiimote or the user's hand. Input objects are classified into one of 32 categories as given in Appendix B. Every input object is assigned an ID number according to the capabilities of the hardware: a uniquely identifiable tangible object always has the same ID, while anonymous touch points on an interactive surface are assigned IDs such that each touch point is uniquely identifiable for the duration of the touch. An input event is a single spatial measurement regarding the position (and optionally, orientation, dimensions etc.) of an input object as captured by one or more of the sensors used.

2.2. Region

Regions define spatial areas in which a certain set of gestures is valid and which capture motion data that falls within their boundaries. Regions are defined in one of two reference coordinate systems:

Regions are managed in an ordered list. Arriving input events are checked against this list, starting from the first element. If the position of the input event falls within the boundaries of a region and if the bit in the region's filter bitmask which corresponds to the category of the input event is set, then the input event is captured by that region. Captured events and their history are subsequently used to check whether one or more of the gestures attached to this region have been triggered.

point     = [ <number>, <number>, <number> ]
pointlist = [ null/<point> *( , <point> ) ]

regionflags = "poly" / "hull"

region = {
  "id":<string>,
  "flags":<regionflags>,
  "filters":<number>,
  "points":<pointlist>,
  "gestures":<gesturelist>
}
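
The following non-normative example sketches a possible region definition: a rectangular "poly" region spanning the unit square (assuming normalized screen coordinates), which captures only generic touch events (bit 1 set, i.e. a <filters> value of 2, cf. Appendix B) and carries the "drag" gesture from Section 4.1. The "id" value and the corner points are purely illustrative assumptions.

{
  "id":"canvas",
  "flags":"poly",
  "filters":2,
  "points":[[0,0,0],[1,0,0],[1,1,0],[0,1,0]],
  "gestures":[
    {
      "name":"drag",
      "flags":"sticky",
      "features":[
        {
          "type":"Motion",
          "filters":2,
          "constraints":[],
          "result":[]
        }
      ]
    }
  ]
}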
				

2.3. Gesture

Gestures appear in two variants: either as gesture specifications, describing what motions the user has to execute for a certain effect, or as gesture events, describing (a) the fact that a matching motion has just taken place and (b) the motion data which triggered the event.

Whether a gesture object is a gesture event can be determined by looking at its flags: if they contain the string "result", the object is a gesture event and the features composing this gesture carry valid data in their "result" array. Otherwise, the object is a gesture specification and the features carry valid data in their "constraints" array.

When a gesture has the "oneshot" flag set, it can only be triggered once by a given set of input IDs. Repeated triggering is only possible after the set of IDs captured by the containing region has changed.

When a gesture has the "sticky" flag set, then once this gesture has been triggered for the first time, all input events with participating IDs continue to be sent to the original region, even if they subsequently leave that region's boundaries.

When a gesture has the "default" flag set, it is added to a pool of default gestures. When a gesture specification with an empty feature list is encountered, its name is looked up in the default gesture pool; if a match is found, the feature list is copied from the default gesture, otherwise the empty gesture specification is ignored.

gesturelist  = [ null/<gesture> *( , <gesture> ) ]
gestureflag  = "oneshot" / "sticky" / "default" / "result"
gestureflags = [ null/<gestureflag> *( , <gestureflag> ) ]

gesture = {
  "name":<string>,
  "flags":<gestureflags>,
  "features":<featurelist>
}
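
As a non-normative illustration of the "default" mechanism described above, the first gesture below adds a two-finger "pinch" specification to the default gesture pool, while the second one references it merely by name through an empty feature list and would therefore receive a copy of the pooled feature list. The gesture name, the filter value, the Scale limits and the empty flags array of the second specification are illustrative assumptions.

{
  "name":"pinch",
  "flags":"default",
  "features":[
    {
      "type":"Count",
      "filters":2,
      "constraints":[2,2],
      "result":[]
    },{
      "type":"Scale",
      "filters":2,
      "constraints":[0.5,0.99],
      "result":[]
    }
  ]
}

{
  "name":"pinch",
  "flags":[],
  "features":[]
}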
				

2.4. Feature

Features are the building blocks of gestures; they are atomic mathematical properties of the raw motion data, such as the average motion vector or the diameter of the convex hull of all input object locations.

featurelist = [ null/<feature> *( , <feature> ) ]
featureitem = <point>/<number>
itemlist    = [ null/<featureitem>  *( , <featureitem>  ) ]

feature = {
  "type":<string>,
  "filters":<number>,
  "constraints":<itemlist>,
  "result":<itemlist>
}
				

A feature is characterized by its type, a filter bitmask for input events as described in Section 2.2, a type-specific list of constraints, and a type-specific list of results. Depending on whether the containing gesture is a "result" gesture or not (see Section 2.3), either the constraint list or the result list will be empty. All temporal relations (such as the motion vector) are expressed relative to a time unit of one sensor frame, i.e. for a sensor setup running at 60 Hz, the temporal unit will be 16.66 ms.
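
To illustrate this distinction, the following non-normative sketch shows how a triggered "drag" gesture (cf. Section 4.1) might be reported by a matching engine: the "result" flag is set and the Motion feature carries its result value, while the constraint list is empty. The concrete motion vector is, of course, only an assumed example value.

{
  "name":"drag",
  "flags":"result",
  "features":[
    {
      "type":"Motion",
      "filters":131071,
      "constraints":[],
      "result":[[0.01,0.02,0]]
    }
  ]
}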

Features are divided into two groups. The first group consists of single-match features, which generate only a single result instance regardless of the number of input objects.

Motion
   Constraints: <point>, <point>
   Result:      <point>
   Average motion vector; lower and upper limits per component.

Rotation
   Constraints: <number>, <number>
   Result:      <number>
   Relative rotation angle in rad; lower and upper limits.

Scale
   Constraints: <number>, <number>
   Result:      <number>
   Relative size change; lower and upper limits.

Path
   Constraints: <point> *( , <point> )
   Result:      <number>
   Match for a template path given by the constraint points.

Count
   Constraints: <number>, <number>
   Result:      <number>
   Number of input objects; lower and upper limits.

Delay
   Constraints: <number>, <number>
   Result:      <number>
   Number of frames since the first object entered; lower and upper limits.

Single-Match Features
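
As a non-normative sketch, the following gesture specification combines two of the single-match features above to describe a two-finger rotation; the gesture name, the filter value of 2 (generic touch) and the angle limits of 0.1 to 3.14 rad are illustrative assumptions.

{
  "name":"two_finger_rotate",
  "flags":"oneshot",
  "features":[
    {
      "type":"Count",
      "filters":2,
      "constraints":[2,2],
      "result":[]
    },{
      "type":"Rotation",
      "filters":2,
      "constraints":[0.1,3.14],
      "result":[]
    }
  ]
}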

The second group consists of multi-match features, which can generate one result instance for each input object (depending on the constraint values).

ObjectID
   Constraints: <number>, <number>
   Result:      <number>
   Match a range of unique IDs (e.g. for tangible objects).

ObjectParent
   Constraints: <number>, <number>
   Result:      <number>
   Match a range of unique parent IDs (e.g. for fingers belonging to a specific user).

ObjectPosition
   Constraints: <point>, <point>
   Result:      <point>
   Position; lower and upper limits per component.

ObjectDimension
   Constraints: <point>, <point>, <point>, <point>, <number>, <number>
   Result:      <point>, <point>, <number>
   Object dimensions (major axis, minor axis, size); lower and upper limits per component.

ObjectGroup
   Constraints: <number>, <number>, <number>
   Result:      <number>, <point>
   Match a group of input objects (minimum count, maximum count, maximum radius); the result contains count and centroid.

Multi-Match Features
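
As a further non-normative sketch, the following feature uses ObjectGroup to match a cluster of two to five touch objects within a maximum radius; the concrete values, in particular the radius of 0.2, are illustrative assumptions.

{
  "type":"ObjectGroup",
  "filters":2,
  "constraints":[2,5,0.2],
  "result":[]
}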

3. Runtime Behavior

This section details the behavior of the gesture matching engine.

The following steps are performed every time an input event has been received:

  1. ...
  2. ...

The following steps are performed each time a full set of input events has been received:

  1. ...
  2. ...

4. Examples

For illustration purposes, this section gives some usage examples of GISpL.

4.1. Motion

The following GISpL example describes a simple drag gesture with an arbitrary pointing device. The <filters> value of 131071 sets bits 0 through 16 and therefore allows any type ID representing a pointing device or touch input event (cf. Appendix B).

{
  "name":"drag",
  "flags":"sticky",
  "features":[
    {
      "type":"Motion",
      "filters":131071,
      "constraints":[],
      "result":[]
    }
  ]
}
				

4.2. Horizontal two-finger swipe

The following GISpL example describes a quick horizontal swipe with two fingers. The <filters> token contains a value of 2, representing a bitmask filtering for "generic touch" events.

{
  "name":"two_finger_swipe",
  "flags":"oneshot",
  "features":[
    {
      "type":"Count",
      "filters":2,
      "constraints":[2,2],
      "result":[]
    },{
      "type":"Motion",
      "filters":2,
      "constraints":[[100,0,0],[10000,10,10]],
      "result":[]
    }
  ]
}
				

4.3. React to tap by specific user

The following GISpL example describes a tap event which can only be triggered by a specific user with numeric ID 123. For this to work, the input sensor hardware has to support user identification. An example is the DiamondTouch system.

{
  "name":"tap_by_user123",
  "flags":"oneshot",
  "features":[
    {
      "type":"ObjectParent",
      "filters":2,
      "constraints":[123,123],
      "result":[]
    }
  ]
}
				

4.4. React to tagged object

The following GISpL example describes an event which is triggered by one of two tagged objects with ID 320 or 321. The <filters> value of 262144 sets only bit 18 ("tagged object", cf. Appendix B), so that only tagged objects are considered for this feature.

{
  "name":"object321",
  "flags":"oneshot",
  "features":[
    {
      "type":"ObjectID",
      "filters":262144,
      "constraints":[320,321],
      "result":[]
    }
  ]
}
				

5. Security Considerations

Gesture events, if intercepted while in transit over the network, may disclose private data about the users' actions. Consequently, they should preferably be transmitted over secure channels such as private networks, VPNs or SSL-encrypted links.

6. IANA Considerations

This document has no actions for IANA.

7. References

7.1. Normative References

[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

7.2. Informative References

[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006.

Appendix A. Formal GISpL Specification

The following formal specification of GISpL is given in ABNF [RFC5234]. JSON-related rule definitions (e.g. <number>, <string> ...) are to be found in [RFC4627].

quote = quotation-mark

point = begin-array
        number value-separator number value-separator number
        end-array

item  = point / number

itemlist    = begin-array item    *( value-separator item    ) end-array 
pointlist   = begin-array point   *( value-separator point   ) end-array
gesturelist = begin-array gesture *( value-separator gesture ) end-array
featurelist = begin-array feature *( value-separator feature ) end-array

filters     = quote "filters" quote name-separator number value-separator
regionid    = quote "id" quote name-separator string value-separator 
regionflags = quote "flags" quote name-separator
              ( ( quote "poly" quote ) / ( quote "hull" quote ) )
              value-separator

gesturename  = quote "name" quote name-separator string value-separator
gestureflag  = ( quote "oneshot" quote ) / ( quote "sticky" quote ) /
               ( quote "default" quote ) / ( quote "result" quote )
gestureflags = quote "flags" quote name-separator
               begin-array gestureflag
               *( value-separator gestureflag )
               end-array value-separator

featuretype  = quote "type" quote name-separator string value-separator
results      = quote "result" quote name-separator itemlist
constraints  = quote "constraints" quote name-separator
               itemlist value-separator

feature      = begin-object
               featuretype filters constraints results
               end-object

gesture      = begin-object
               gesturename gestureflags
               quote "features" quote name-separator featurelist
               end-object

region       = begin-object
               regionid regionflags filters
               quote "points" quote name-separator pointlist
               value-separator
               quote "gestures" quote name-separator gesturelist
               end-object
				

Appendix B. Input Event Types

This list of type IDs is taken from the TUIO 2.0 specification. Consequently, TUIO 2.0 events can be fed directly into the gesture recognition engine. A <filters> token contains an integer representing a bitmask in which the bit corresponding to each input type that is to be captured is set; an example follows the list below.

 0     undefined or unknown pointer
 1     default ID for an unknown finger (= right index finger ID)
 1-5   fingers of the right hand, starting with the index finger
       (index, middle, ring, little, thumb)
 6-10  same sequence for the left hand
 11    stylus
 12    laser pointer
 13    mouse
 14    trackball
 15    joystick
 16    WiiMote or similar device
 17    generic object
 18    tagged object (uniquely identifiable)
 21    right hand pointing
 22    right hand open
 23    right hand closed
 24    left hand pointing
 25    left hand open
 26    left hand closed
 27    right foot
 28    left foot
 29    head
 30    person
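
As a non-normative example of how such a bitmask is constructed, a feature that should capture both stylus input (type 11) and mouse input (type 13) would use a <filters> value of 2^11 + 2^13 = 10240. The choice of the Motion feature type below is arbitrary and serves only as an illustration.

{
  "type":"Motion",
  "filters":10240,
  "constraints":[],
  "result":[]
}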

Author's Address

Florian Echtler
Munich Univ. of Applied Sciences
Lothstr. 64
Munich, 80336
Germany

Phone: +49 89 1265 3736
EMail: floe@butterbrot.org
URI:   http://floe.butterbrot.org/
