HTML Tables                                          7th July 1995

INTERNET DRAFT                                   Dave Raggett, W3C
Expires in six months                            email:

HTML Tables

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of the Internet Engineering Task Force (IETF), its areas and its working groups. Note that other groups may also distribute working information as Internet drafts.

Internet drafts are draft documents valid for a maximum of six months and can be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet draft please check the "1id-abstracts.txt" listing contained in the Internet drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West coast). Further information about the IETF can be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to the HTML working group (HTML-WG) of the Internet Engineering Task Force (IETF) at . Discussions of this group are archived at URL: http://www.acl.lanl.gov/HTML-WG/archives.html.

Abstract

The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. This specification extends HTML to support a wide variety of tables. The model is designed to work well with associated style sheets, but does not require them. It also supports rendering to braille or speech, and exchange of tabular data with databases and spreadsheets.

The html table model embodies certain aspects of the CALS table model, e.g.
the ability to group table rows into thead, tbody and tfoot sections, plus the ability to specify cell alignment compactly for groups of cells according to the context.

Note that this is a preliminary draft, produced at short notice for discussion at the Stockholm IETF meeting of the HTML working group. A more detailed revision will be produced as soon as practical to fill in any gaps.

Introduction

This document sets out a revised proposal for the HTML table model, following discussions at the Danvers IETF meeting and discussions over email. The proposal is backwards compatible with the HTML+ specification and the implementation in the Netscape 1.1 browser. The HTML 3.0 specification is currently being rewritten to take into account recent discussions, and will incorporate the new table model in the next revision to the HTML 3.0 Internet Draft.

The proposal goes beyond the earlier model and folds in some of the features of the CALS table model. The proposal has been designed with a view to the effective use of associated style sheets for richer control over appearance.

Design Influences

The html table model has evolved from studies of existing SGML table models, the treatment of tables in common word processing packages, and a survey of a wide range of tabular layouts in magazines, books and other paper-based documents. The model was chosen to allow simple tables to be expressed simply, with extra complexity only when needed. This makes it practical to create the markup for html tables with everyday text editors and reduces the learning curve for getting started. This feature has been very important to the success of html to date.

Increasingly people are using filters from other document formats or direct WYSIWYG editors for html. It is important that the html table model fits well with these routes for authoring html.
This affects how the representation handles cells which span multiple rows or columns, and how alignment and other presentation properties are associated with groups of cells.

A major consideration for the html table model is that the fonts, window sizes etc. in use with browsers are not under the author's control. This makes it impractical to rely on column widths specified in terms of absolute units such as picas or pixels. Instead, tables are dynamically sized to match the current window size and fonts. Authors can provide guidance as to the relative widths of columns, but user agents are expected to ensure that columns are wide enough to avoid clipping cell contents.

This proposal extends the html table model to support dynamic display of table contents as the table data arrives from the net. This feature requires the author to specify the number of columns, and includes provision for control of table width and the relative widths of different columns.

The Department of Defense's work on CALS has established a de facto standard for SGML table models. The html table model builds upon experience gained with CALS, whilst avoiding some of the complexity associated with a simple-minded adoption of the CALS model.

Both models share the row-major treatment of tables, i.e. treating tables as a sequence of rows, which in turn consist of a sequence of cells. This has worked well in practice. An alternative would be to represent the table as a sequence of columns, which in turn consist of a sequence of cells. Both this and hybrid techniques were considered early on as candidates for html.

In CALS you can group table rows into head, body and foot sections. This is frequently used to repeat table head and foot rows when breaking tables across page boundaries. CALS further allows you to use repeated groups of head/body/foot sections.
This is generally thought to be unnecessary, and can be avoided by using a separate table for each such group. This proposal allows authors to use the head, body and foot sections for html tables only when needed.

There are many potential properties relating to the presentation style of table cells, for instance, the border width, margins, vertical and horizontal alignment within cells, foreground and background colors and textures. The html proposal uses the same alignment properties as CALS, but separates off the representation of vertical and horizontal alignment using a more flexible treatment than CALS. The html table model also simplifies the treatment of borders, as this can be better handled along with other rendering properties through associated style sheets.

The html table model permits arbitrary nesting of tables, unlike CALS. The processing time and memory requirements for laying out tables are linear in the depth of nesting and the size of table contents. The widespread deployment of Netscape 1.1 has provided an effective test of the core model, robustly handling a very wide range of tables.

For the visually impaired, html offers the possibility of setting to rights the damage caused by the adoption of window-based graphical user interfaces. The html table model includes attributes for labelling each cell, to support high quality text to speech conversion. The same attributes can also be used to support automated import and export of table data to databases or spreadsheets.

An Introduction to HTML Tables

Tables start with an optional caption followed by one or more rows. Each row is formed by one or more cells, which are differentiated into header and data cells. Cells can be merged across rows and columns, and include attributes assisting rendering to speech and braille, or for exporting table data into databases.
The model provides little direct support for control over appearance, for example border styles and margins, as these can be handled via subclassing and associated style sheets.

Tables can contain a wide range of content, such as headers, lists, paragraphs, forms, figures, preformatted text and even nested tables. When the table is flush left or right, subsequent elements will be flowed around the table if there is sufficient room. This behaviour is disabled when the noflow attribute is given or the table align attribute is center (the default), or justify.

Example
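
    <table>
    <caption>A test table with merged cells</caption>
    <tr><th rowspan=2><th colspan=2>Average
        <th rowspan=2>other<br>category<th>Misc
    <tr><th>height<th>weight
    <tr><th>males<td>1.9<td>0.003
    <tr><th>females<td>1.7<td>0.002
    </table>
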
This would be rendered something like:

              A test table with merged cells
    /--------------------------------------------------\
    |          |      Average      |  other   |  Misc  |
    |          |-------------------| category |--------|
    |          | height  | weight  |          |        |
    |-----------------------------------------|--------|
    | males    |   1.9   |  0.003  |          |        |
    |-----------------------------------------|--------|
    | females  |   1.7   |  0.002  |          |        |
    \--------------------------------------------------/

There are several points to note:

  * By default, header cells are centered while data cells are flush left. This can be overridden by the ALIGN attribute for the cell or a matching HSPEC element.

  * Cells may be empty.

  * Cells spanning rows contribute to the column count on each of the spanned rows, but only appear in the markup once (in the first row spanned).

  * If the column count for the table is greater than the number of cells for a given row (after including cells for spanned rows), the missing cells are treated as occurring on the right-hand side of the table, and rendered as empty cells.

  * The row count is determined by the TR elements - any rows implied by cells spanning rows beyond this should be ignored.

  * The user agent should be able to recover from a missing <TR> tag prior to the first row as the TH and TD elements can only occur within the TR element.

  * It is invalid to have cells overlap, see below for an example. In such cases, the rendering is implementation dependent.

An example of an invalid table:

    <table>
    <tr><td rowspan=2>1
        <td>2<td>3<td>4<td>5
    <tr><td rowspan=2>6
    <tr><td colspan=2>7<td>8
    </table>

which looks something like:

    /-------------------\
    | 1 | 2 | 3 | 4 | 5 |
    |   |---------------|
    |   | 6 |   |   |   |     The cells labelled 6 and 7 overlap!
    |---|...|-----------|
    | 7 :   | 8 |   |   |
    \-------------------/

Some more features

The next few sections introduce some more features of the html table model, leaving the advanced features to later on.

Table Captions

The CAPTION element can be used to define a table caption, and if present, must occur immediately following the TABLE start tag. You can specify the position of the caption relative to the table with the ALIGN attribute, e.g. the table caption

The ALIGN attribute can take the values: TOP, BOTTOM, LEFT or RIGHT. The attribute value is case insensitive. Note that the end tag is optional.

Head, Body and Foot table sections

The THEAD, TBODY and TFOOT elements can be used to group table rows into corresponding head, body and foot sections. This gives user agents a better handle on rendering long tables, e.g. if the table has a large number of rows in the body section, the user agent could use a scrolling region to render the table compactly.

When rendering to a paged output device, tables will often have to be broken across page boundaries. The thead, tbody and tfoot elements allow the user agent to repeat the table foot at the bottom of the current page and the table head at the top of the new page before continuing on, where it left off, with the body section.

Another motivation for using thead, tbody and tfoot elements is the ability they give authors to more easily control the border style and the horizontal and vertical alignment of cell contents.
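
    <table>
    <caption>A test table with merged cells</caption>
    <thead>
    <tr><th rowspan=2><th colspan=2>Average
        <th rowspan=2>other<br>category<th>Misc
    <tr><th>height<th>weight
    <tbody>
    <tr><th>males<td>1.9<td>0.003
    <tr><th>females<td>1.7<td>0.002
    </table>
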
Note: the end tags for THEAD, TBODY and TFOOT can always be omitted, but if you have a THEAD, then you will need to include a TBODY start tag too. Otherwise, the parser won't be able to distinguish body rows from head rows.

Border Styles

The BORDER attribute on the TABLE element can be used to select which borders are drawn from several common categories. More detailed control may be obtained by using an associated style sheet. The default behaviour is to render the table without borders.

    border=none    suppress borders - useful with graphics etc.
    border=frame   outer border around table only
    border=basic   horizontal border between THEAD and TBODY, and between TBODY and TFOOT
    border=rows    like border=basic with an outer border, plus horizontal borders between rows
    border=cols    like border=basic with an outer border, plus vertical borders between columns
    border=all     borders around all cells, including an outer border

Borderless tables are useful for layout purposes as well as their traditional role for tabular data, for instance with fill-out forms:

           name: [John Smith        ]
    card number: [4619 693523 20851 ]
        expires: [03] / [97]
      telephone: [212 873 2739      ]

This can be represented as a table with one row and two columns. The first column is right aligned, while the second is left aligned. This example could be marked up as:
name:
card number:
expires:
telephone:


/

The use of such techniques is one of the motivations for using nested tables, where borderless tables are used to lay out cell contents for an enclosing table.

Table Layout Options

Normally, tables are rendered in a two stage process. The first stage determines suitable column widths based on cell content, while the second stage actually renders the table using these widths.

For large tables or slow network connections, you can give the user agent a chance to start displaying the table incrementally as the data arrives from the network. To do this use the COLS attribute on the TABLE element to specify the number of columns, e.g. .

For the incremental display mode, the default table width is the current window size (the space between the left and right margins). You can set the table width relative to this default using the WIDTH attribute, e.g. width="50%". Note that the WIDTH attribute takes the form of a positive integer followed by a percent sign. By default, all columns have the same width in the incremental display mode.

You can specify the relative widths of some or all of the columns using the COLW element, e.g. <colw col=3 width=2.5>. This example specifies a relative width of two and a half for column 3. The default relative width is 1.0. Note that the incremental mode may result in columns that are too small in some cases. The user agent can then choose to redraw the table with more appropriate column widths once all of the table data has been received.

Both the COL and the WIDTH attributes are required for the COLW element. The WIDTH attribute takes the form of a positive number and may include a decimal point for floating point values. The COL attribute is a positive integer value. COL=1 denotes the first column, COL=2 the second column and so on, where columns are numbered from left to right. Simple column ranges are also permitted, e.g. COL=3-7 which matches columns 3, 4, 5, 6 and 7.
The range is limited to the form lower-upper, where both values are positive integers separated by a hyphen, and upper should be greater than lower.

An example for an incremental mode table with 4 columns:
-- table data follows --
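
The COL and WIDTH rules for COLW described above can be illustrated with a short Python sketch. It is not part of the proposal: the function names parse_col and incremental_widths are hypothetical, borders and intercell margins are ignored, and sharing the table width in simple proportion to the relative widths is an assumption, since the text only states that the widths are relative and default to 1.0.

```python
def parse_col(value):
    """Return the 1-based column numbers matched by a COL value.

    Accepts a single positive integer ("3") or a range of the form
    lower-upper ("3-7"). Treating a range whose upper bound is not
    greater than its lower bound as an error is an assumption; the
    draft only says that upper "should be" greater than lower.
    """
    lower, sep, upper = value.partition("-")
    first = int(lower)
    if first < 1:
        raise ValueError(f"invalid COL value: {value!r}")
    if not sep:
        return [first]
    last = int(upper)
    if last <= first:
        raise ValueError(f"invalid COL range: {value!r}")
    return list(range(first, last + 1))


def incremental_widths(cols, table_width, colw=()):
    """Split table_width among `cols` columns by relative width.

    colw is a sequence of (COL value, relative width) pairs, one per
    COLW element. Columns that no COLW element mentions keep the
    default relative width of 1.0.
    """
    relative = [1.0] * cols
    for col_value, width in colw:
        for col in parse_col(col_value):
            if col > cols:
                raise ValueError(f"column {col} exceeds COLS={cols}")
            relative[col - 1] = float(width)
    total = sum(relative)
    return [table_width * r / total for r in relative]


# Four columns, column 3 given a relative width of 2.5.
print(parse_col("3-7"))
print(incremental_widths(4, 550, [("3", 2.5)]))
```

With no COLW entries every column receives the same share, which matches the stated default for the incremental display mode.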
You can also specify relative widths for the auto layout mode. The same goes for the table width attribute. The sizing algorithm does its best to meet your suggestions, but ensures that all the columns are large enough for the cell contents. Further details on the algorithm are given in a later section.

Horizontal Alignment of Cell Contents

By default, data cells (TD) are left aligned while header cells (TH) are centered. You can override the default alignment with the ALIGN attribute on the table cell, e.g. . There are several attributes associated with horizontal alignment:

ALIGN
    This can be one of: LEFT, CENTER, RIGHT, JUSTIFY and CHAR. User agents may treat JUSTIFY as left alignment if they lack support for text justification. ALIGN=CHAR is used for aligning cell contents on a particular character. The attribute value for ALIGN is case insensitive.

CHAR
    This is used to specify an alignment character for use with align=char, e.g. char=":". The default character is "."

CHAROFF
    This is used with align=char to specify the relative offset of the alignment character with respect to the width of the cell. The attribute value takes the form of a positive integer in the range 1 to 100 followed by a percent sign, e.g. charoff="50%".

The earlier example of aligning form fields can be more simply achieved using align=char and using the CHAR attribute to set the alignment character to a convenient character, for example:
name:
card number:
expires: /
telephone:
Each line in the table is then indented so that all the colons are positioned under one another.

Vertical Alignment of Cell Contents

By default, cell contents are vertically aligned at the middle of each cell. The VALIGN attribute can be used with TH or TD to override this default. It can take one of the following values: TOP, MIDDLE or BOTTOM, e.g. . Note that the attribute value is case insensitive.

Note: the ability to ensure several cells on the same row share the same baseline has been left out of this specification owing to difficulties in providing an adequate definition of this feature.

More advanced ways of specifying alignment

If there are lots of cells, it rapidly becomes tedious and inefficient to explicitly specify the horizontal and vertical alignment attributes on each cell. A more compact alternative is to use the HSPEC and VSPEC elements to specify alignment properties for groups of matching cells.

HSPEC and VSPEC elements specify alignment properties for table cells and act like IF-THEN rules. The IF part is a conjunction of the following optional parts:

  1. whether the cell is in thead, tbody or tfoot (rowgroup)
  2. the class attribute of the current row (rowclass)
  3. whether the cell is a header or data cell (celltype)
  4. the class attribute of the cell itself (cellclass)
  5. the cell's row and/or column number (row or col)

If the cell straddles two or more rows or columns, the number of the first row/column is used for evaluating the match.

The THEN part sets the horizontal or vertical alignment for the cell's contents. The class attribute of the hspec or vspec element matching a cell can also be used by style sheets to attach additional rendering properties to groups of cells.

Conflict resolution is simple:

  1. properties defined as attributes on cells always override hspec or vspec

  2. hspec and vspec are lexically ordered from general to specific, i.e.
the last matching hspec or vspec element sets the cell's alignment properties

Some simple examples:

This example sets different alignments for cells in column 1 depending on whether they are header or data cells. When several HSPEC elements match a given cell, the last one wins. Note that an explicit ALIGN attribute set on the cell itself always wins over any HSPEC elements.

This example sets different alignments for cells in rows 1 to 3 depending on whether they are in the THEAD or in the TBODY. When several VSPEC elements match a given cell, the last one wins. Note that an explicit VALIGN attribute set on the cell itself always wins over any VSPEC elements.

Autolayout Table Sizing Algorithm

The layout algorithm for the incremental display mode has already been presented in a previous section. If the COLS attribute is missing from the table start tag, then the user agent should use the autolayout sizing algorithm, which uses two passes through the table data.

In the first pass, word wrapping is disabled, and the user agent keeps track of the minimum and maximum width of each cell. The maximum width is given by the widest line. As word wrap has been disabled, paragraphs are treated as long lines unless broken by <BR>
elements. The minimum width is given by the widest word or image etc., taking into account leading indents and list bullets etc. In other words, if you were to format the cell's content in a window of its own, determine the minimum width you could make the window before things begin to be clipped.

The minimum and maximum cell widths are then used to determine the corresponding minimum and maximum widths for the columns. These in turn are used to find the minimum and maximum width for the table. Note that cells can contain nested tables, but this doesn't complicate the code significantly. The next step is to assign column widths according to the current window size (more accurately, the width between the left and right margins).

The table borders and intercell margins need to be included in the assignment step. There are three cases:

  1. The minimum table width is equal to or wider than the available space. In this case, assign the minimum widths and allow the user to scroll horizontally. For conversion to braille, it will be necessary to replace the cells by references to notes containing their full content. By convention these appear before the table.

  2. The maximum table width fits within the available space. In this case, set the columns to their maximum widths.

  3. The maximum width of the table is greater than the available space, but the minimum table width is smaller. In this case, find the difference between the available space and the minimum table width; let's call it W. Let's also call D the difference between the maximum and minimum width of the table. For each column, let d be the difference between the maximum and minimum width of that column. Now set the column's width to the minimum width plus d times W over D. This makes columns with lots of text wider than columns with smaller amounts.

This assignment step is then repeated for nested tables.
In this case, the width of the enclosing table's cell plays the role of the current window size in the above description. This process is repeated recursively for all nested tables.

If the table width is specified with the WIDTH attribute, the user agent attempts to set column widths to match. The WIDTH attribute should be disregarded if this results in columns having less than their minimum widths.

If relative widths are specified with the COLW element, the algorithm is modified to increase column widths over the minimum width to meet the relative width constraints.

HTML Table DTD

The DTD or document type definition provides a formal definition of the allowed syntax for html tables.

        %cell.halign;              -- horizontal alignment --
        %cell.valign;              -- vertical alignment --
        axis    CDATA   #IMPLIED   -- defaults to cell content --
        axes    CDATA   #IMPLIED   -- list of axis names --
        >
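
As an informal illustration of the three cases in the autolayout sizing algorithm described earlier, the assignment step can be sketched in Python. This is a minimal sketch, not a reference implementation. The names column_extents and assign_column_widths are hypothetical, spanning cells, borders, intercell margins, nested tables and the WIDTH and COLW adjustments are all left out, and taking a column's minimum and maximum as the largest values among its cells is an assumption, as the text does not spell out that step.

```python
def column_extents(rows):
    """Derive per-column minimum and maximum widths from cell extents.

    rows is a list of rows; each row is a list of (min_width,
    max_width) pairs, one per cell, with no spanning cells. Each
    column takes the largest minimum and the largest maximum found
    among its cells (an assumption of this sketch).
    """
    cols = max(len(row) for row in rows)
    col_min = [0] * cols
    col_max = [0] * cols
    for row in rows:
        for i, (lo, hi) in enumerate(row):
            col_min[i] = max(col_min[i], lo)
            col_max[i] = max(col_max[i], hi)
    return col_min, col_max


def assign_column_widths(col_min, col_max, available):
    """Assign column widths following the three cases in the text."""
    table_min = sum(col_min)
    table_max = sum(col_max)
    if table_min >= available:
        # Case 1: the table cannot fit; use minimum widths and let
        # the user scroll horizontally.
        return list(col_min)
    if table_max <= available:
        # Case 2: everything fits at maximum width.
        return list(col_max)
    # Case 3: share the spare space W among the columns in
    # proportion to d, each column's (maximum - minimum) difference.
    # D is non-zero here because table_max > available > table_min.
    W = available - table_min
    D = table_max - table_min
    return [lo + (hi - lo) * W / D for lo, hi in zip(col_min, col_max)]


rows = [
    [(10, 30), (20, 100)],
    [(8, 25), (15, 60)],
]
col_min, col_max = column_extents(rows)
print(assign_column_widths(col_min, col_max, 90))
```

In case 3 the assigned widths always add up to the available space, because the d values sum to D. For a nested table the same function would be called again with the enclosing cell's assigned width as the available space.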