Accessible tables in DocBook
This is the requested explanation for the DocBook Publishers RFE regarding the inclusion of the HTML table model in the Publishers schema.
One of the issues in making tabular data accessible to blind people using a screen reader lies in providing the appropriate markup that will allow a correct correlation of table data with its headings.
Where sighted people can intuitively assign a header placed atop a table column or leftmost in a row (or any combination thereof), these cues through visual placement are not available to blind users.
- First, I'll give a quick overview of the different mechanisms available in HTML for achieving the correlation of data with their headers,
- then I'll look at the abilities of the CALS table model by means of an example,
- closing with a comparative summary of the two table models.
Fully accessible tables in HTML output
HTML provides three mechanisms for attributing the respective headers to tabular data that provide screen readers with the necessary cues to replace visual placement with explicit verbalizations thereof.
They enable the screen reader to repeat the associated header(s) as it reads through the data, thus expressing the semantic relationships between the information provided (as opposed to reading out the table's contents linearly, which invariably results in confusing gibberish):
On its simplest level, HTML provides two elements, th and td, to distinguish between headers and data. The Techniques section of the WCAG 2.0 actually recommends the bare use of these for simple tables, since some screen readers still seem to have issues in matching the correct headers with the table data via scope (I'm not sure how up-to-date this is, however).
The scope attribute is added to table headers (i.e., th, which can be column or row headers alike) but is only viable for simple tables (with simple header:data relationships).
A simple example to demonstrate its use:
<table>
  <caption>Shelly's Daughters</caption>
  <tr>
    <th scope="col">Name</th>
    <th scope="col">Age</th>
    <th scope="col">Birthday</th>
  </tr>
  <tr>
    <th scope="row">Jackie</th>
    <td>5</td>
    <td>April 5</td>
  </tr>
  <tr>
    <th scope="row">Beth</th>
    <td>8</td>
    <td>January 14</td>
  </tr>
</table>
which translates to:
Shelly's Daughters
|Name|Age|Birthday|
|Jackie|5|April 5|
|Beth|8|January 14|
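To make the effect of scope a little more tangible, here is a minimal Python sketch of how a screen reader might pair each data cell with its headers. It is only an illustration under my own assumptions: the function name announce_with_scope and the output wording are invented for this example, real screen readers differ in what they verbalize, and the sketch ignores rowspan and colspan entirely.

```python
import xml.etree.ElementTree as ET

# The simple example table from above (well-formed, so an XML parser can read it).
SIMPLE_TABLE = """
<table>
  <caption>Shelly's Daughters</caption>
  <tr><th scope="col">Name</th><th scope="col">Age</th><th scope="col">Birthday</th></tr>
  <tr><th scope="row">Jackie</th><td>5</td><td>April 5</td></tr>
  <tr><th scope="row">Beth</th><td>8</td><td>January 14</td></tr>
</table>
"""


def announce_with_scope(table_markup):
    """Return one 'row header, column header: value' string per data cell.

    Hypothetical helper: handles only simple tables without spanned cells.
    """
    table = ET.fromstring(table_markup.strip())
    rows = [list(tr) for tr in table.iter("tr")]

    # Column position -> header text, taken from th cells with scope="col".
    col_headers = {}
    for row in rows:
        for index, cell in enumerate(row):
            if cell.tag == "th" and cell.get("scope") == "col":
                col_headers[index] = cell.text

    announcements = []
    for row in rows:
        # The row header is the th with scope="row" in the same row, if any.
        row_header = next(
            (c.text for c in row if c.tag == "th" and c.get("scope") == "row"),
            None,
        )
        for index, cell in enumerate(row):
            if cell.tag == "td":
                parts = [h for h in (row_header, col_headers.get(index)) if h]
                announcements.append(f"{', '.join(parts)}: {cell.text}")
    return announcements


for line in announce_with_scope(SIMPLE_TABLE):
    print(line)
```

The lookup is purely positional (same row, same column), which is exactly why scope stops being sufficient once cells span rows or headers come in more than one level.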
The id and headers attributes form a duo that is useful for complex tables in which a complex set of headers is to be attributed to the individual table data, and it is recommended for complex tables in the WCAG 2.0 Techniques. In this case, the id is added to the header cells, while each data cell references the ids of its relevant headers via the headers attribute.
Here is the example from above, with an added differentiation for more complexity:
<table>
  <caption>Shelly's Daughters</caption>
  <tr>
    <td> </td>
    <th id="name">Name</th>
    <th id="age">Age</th>
    <th id="birthday">Birthday</th>
  </tr>
  <tr>
    <th rowspan="2" id="birth">daughters by birth</th>
    <th id="jackie" headers="birth name">Jackie</th>
    <td headers="birth jackie age">5</td>
    <td headers="birth jackie birthday">April 5</td>
  </tr>
  <tr>
    <th id="beth" headers="birth name">Beth</th>
    <td headers="birth beth age">8</td>
    <td headers="birth beth birthday">January 14</td>
  </tr>
  <tr>
    <th id="step">daughters by marriage</th>
    <th id="jenny" headers="step name">Jenny</th>
    <td headers="step jenny age">12</td>
    <td headers="step jenny birthday">Feb 12</td>
  </tr>
</table>
which translates to:
Shelly's Daughters
| |Name|Age|Birthday|
|daughters by birth|Jackie|5|April 5|
| |Beth|8|January 14|
|daughters by marriage|Jenny|12|Feb 12|
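The explicit pointers can be followed mechanically, which is what makes this mechanism robust for complex tables. The following Python sketch resolves each data cell's headers attribute into the text of the referenced header cells. Again, this is an illustration under my own assumptions: announce_with_headers is a made-up name, and the phrasing of the output is not what any particular screen reader produces.

```python
import xml.etree.ElementTree as ET

# The complex example table from above.
COMPLEX_TABLE = """
<table>
  <caption>Shelly's Daughters</caption>
  <tr>
    <td> </td>
    <th id="name">Name</th><th id="age">Age</th><th id="birthday">Birthday</th>
  </tr>
  <tr>
    <th rowspan="2" id="birth">daughters by birth</th>
    <th id="jackie" headers="birth name">Jackie</th>
    <td headers="birth jackie age">5</td>
    <td headers="birth jackie birthday">April 5</td>
  </tr>
  <tr>
    <th id="beth" headers="birth name">Beth</th>
    <td headers="birth beth age">8</td>
    <td headers="birth beth birthday">January 14</td>
  </tr>
  <tr>
    <th id="step">daughters by marriage</th>
    <th id="jenny" headers="step name">Jenny</th>
    <td headers="step jenny age">12</td>
    <td headers="step jenny birthday">Feb 12</td>
  </tr>
</table>
"""


def announce_with_headers(table_markup):
    """Return 'header, header, ...: value' for every td that has a headers attribute.

    Hypothetical helper: it only follows id references and ignores cell positions.
    """
    table = ET.fromstring(table_markup.strip())

    # Map each header cell's id to its text.
    text_by_id = {
        th.get("id"): th.text for th in table.iter("th") if th.get("id")
    }

    announcements = []
    for td in table.iter("td"):
        ids = (td.get("headers") or "").split()
        if not ids:
            continue  # e.g. the empty corner cell, which references no headers
        labels = [text_by_id[i] for i in ids if i in text_by_id]
        announcements.append(f"{', '.join(labels)}: {td.text}")
    return announcements


for line in announce_with_headers(COMPLEX_TABLE):
    print(line)
```

Note that the spanned "daughters by birth" cell causes no difficulty here: Beth's cells name it explicitly, so nothing has to be inferred from the layout.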
None of these mechanisms is a one-size-fits-all solution. Which one to implement must be decided from case to case, and their usefulness may change over time depending on screen reader developments.
The capabilities of the CALS table model
With regard to DocBook tables, I found that the CALS table model does not provide sufficient "hooks" that an XSL transformation could use to output, case by case, the best possible HTML for making table data understandable via screen readers.
Here is the example table from above in CALS markup:
<table frame='all' rowheader='firstcol'>
  <title>'Shelly's Daughters' in CALS Markup</title>
  <tgroup cols='4'>
    <colspec colname='provenience'/>
    <colspec colname='Name'/>
    <colspec colname='Age'/>
    <colspec colname='Birthday'/>
    <thead>
      <row>
        <entry> </entry>
        <entry>Name</entry>
        <entry>Age</entry>
        <entry>Birthday</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry morerows="1" xml:id="Natural">Daughters by birth</entry>
        <entry xml:id="Jackie">Jackie</entry>
        <entry>5</entry>
        <entry>April 5</entry>
      </row>
      <row>
        <entry xml:id="Beth">Beth</entry>
        <entry>8</entry>
        <entry>January 14</entry>
      </row>
      <row>
        <entry xml:id="Step">Daughters by marriage</entry>
        <entry xml:id="Jenny">Jenny</entry>
        <entry>12</entry>
        <entry>Feb 12</entry>
      </row>
    </tbody>
  </tgroup>
</table>
CALS does allow for the implicit establishment of simple semantic relationships via thead (column headers) and the rowheader attribute (here the shortcomings begin: only for the first column). For very simple tables, this would allow XSL transformation of the thead entries and those of the first column into the th elements needed to express their "header" nature in HTML.
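As a rough illustration of what such a transformation amounts to for the simple case, here is a sketch in Python rather than XSLT (chosen only so that it can be run directly; it is not how the DocBook stylesheets do it). The simplified CALS input, the function name cals_to_html, and the choice to emit scope attributes are all my own assumptions, and the markup is un-namespaced like the example above.

```python
import xml.etree.ElementTree as ET

# A simplified CALS version of the simple table (illustrative input, no spans).
SIMPLE_CALS = """
<table rowheader='firstcol'>
  <title>Shelly's Daughters</title>
  <tgroup cols='3'>
    <thead>
      <row><entry>Name</entry><entry>Age</entry><entry>Birthday</entry></row>
    </thead>
    <tbody>
      <row><entry>Jackie</entry><entry>5</entry><entry>April 5</entry></row>
      <row><entry>Beth</entry><entry>8</entry><entry>January 14</entry></row>
    </tbody>
  </tgroup>
</table>
"""


def cals_to_html(cals_markup):
    """Convert a simple CALS table into an HTML table element.

    Hypothetical helper: thead entries become th scope="col"; the first entry
    of each body row becomes th scope="row" when rowheader='firstcol'.
    Spans (morerows, namest/nameend) are deliberately not handled.
    """
    cals = ET.fromstring(cals_markup.strip())
    first_col_is_header = cals.get("rowheader") == "firstcol"

    html = ET.Element("table")
    title = cals.find("title")
    if title is not None:
        ET.SubElement(html, "caption").text = title.text

    for section in cals.find("tgroup"):
        if section.tag not in ("thead", "tbody"):
            continue  # skip colspec and anything else
        for row in section.findall("row"):
            tr = ET.SubElement(html, "tr")
            for index, entry in enumerate(row.findall("entry")):
                if section.tag == "thead":
                    cell = ET.SubElement(tr, "th", scope="col")
                elif first_col_is_header and index == 0:
                    cell = ET.SubElement(tr, "th", scope="row")
                else:
                    cell = ET.SubElement(tr, "td")
                cell.text = entry.text
    return html


print(ET.tostring(cals_to_html(SIMPLE_CALS), encoding="unicode"))
```

Everything this sketch needs is derived from the position of an entry (inside thead, or first in its row). There is nothing in the CALS source that it could consult to produce id and headers pairs for a more complex table.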
However, it stops there. As soon as relationships become more complex, either by two-level asymmetric header relationships as in the example above, or because of spanned rows or columns, CALS does not offer the means to express these. Regarding the example:
- there is no implicit way of identifying the second column's contents as row (sub-)headers (along the lines of rowheader, or of the possibility of including more than one row of column headers within thead),
- entries can be unambiguously labelled via xml:id, or perhaps even (partially) via colname (whereby a uniform method would surely be preferable?),
- but there is no way in CALS to associate them with one another (e.g. that "Jackie" and "Beth" are both "Daughters by birth": a screen reader typically would have problems especially with the second if there is no explicit pointer in place), except perhaps by constructing a hack,
- and there is no way to reference the headers from the data entries (e.g. that "Daughter by birth", "named" "Beth", is "Age" "8"). This approach of explicit attribution becomes necessary as soon as relationships are non-linear, in order to keep the meaning of complex data clear. Sighted readers intuit these relationships by means of visual cues, but screen readers only see code, which can quickly become ambiguous with increasing complexity.
In going through the many options the CALS model offers for table markup, it becomes clear that these (especially those available to the entry element) predominantly target visual representation: size, spans, positioning, etc. CALS does not provide for the expression of semantic relationships between tabular data.
Providing table accessibility via DocBook
In a nutshell: CALS shows limitations regarding the semantic attribution of headers to complex data.
|Column headers||marked via thead
|Row headers||marked via attribute rowheader
|multiple header attributions||perhaps by customization???
So, although CALS does include the possibility of indicating which table cells are headers, this only covers the simplest of cases. In all other cases (such as the 'Overview' table above with its row-spanned cell), CALS does not of itself provide the necessary structures.
In the end, one is faced with the alternative of expanding or at least hacking the CALS model - or simply turning to the HTML model, which already comes with all the required features and possibilities of choice built-in and ready-for-use.
An easy decision, I would think :)
Thus my request to add the HTML table model to the DocBook Publishers schema, considering that the Publisher's realm does cover types of publications that will contain complex tables too.
2012-04-18, added CALS example 2012-06-19