mod_accessibility for Apache 2
This section of the site features articles published between 2002 and 2004. They remain here for reference purposes and may contain information that is out of date.
Technical level: Intermediate || Date: 15th August 2003 || Author: Nick Kew
Jakob Nielsen discusses accessibility in terms of dimensionality. He points out that an important difference between visual and non-visual presentations is that the former is inherently two-dimensional, whereas the latter is limited to one dimension. He argues that this is altogether more fundamental than the issue of inherently visual contents (such as images), which can in most cases be omitted or replaced by ALT texts without substantial loss. He concludes that optimal usability for users with disabilities requires new approaches and user interfaces to overcome the limitations of a one-dimensional view.
It is probably pure coincidence that this article's publication date is only just over a month after one such approach went live. mod_accessibility addresses the problem by offering different "views" onto web content. It can transform web contents for accessibility in the manner pioneered by the BBC "Betsie" program, but more importantly it generates fundamentally _different_ views, including metadata comparable to the table of contents, index and references in traditional publishing, all available at a keyclick to the user. But unlike traditional publishing, mod_accessibility does not require proactive cooperation from authors. And unlike many accessibility-enhanced browsers, mod_accessibility places no financial or technological burden on the disabled user. Only the web server or proxy administrator need be concerned with it.
HTML (including XHTML, which for the purposes of this article is the same thing) is fundamentally an accessible medium. At its heart is text. Coupled to that, it supports navigation (hyperlinks), multimedia content (<img>, <object>, etc), interactive systems and controls (forms and script) and presentational detail. The fundamental principle of HTML accessibility is graceful fallback: narrative text is accessible, and in situations where other content is problematic (eg for a range of disabled users - most obviously the blind), non-textual content can be presented as simple text without substantial loss to contents, functionality or navigation.
However, there are some common problems regarding Web accessibility:
- Missing contents. In some of the worst cases, there is nothing meaningful in HTML; only Flash content, a Word document, or a series of images. At worst this is inaccessible at any price, and at best it relies on the user having expensive specialist equipment available.
- Poor markup that fails to use, or indeed actively abuses, HTML structure. This is widespread and happens for a number of reasons, including defective authoring and publishing software, ignorance amongst developers, and problems of communication between developers and their managers or clients.
- Markup that is unavoidably compromised to work around limitations of the HTML medium or popular browsers. A typical example arises where navigational devices (like a toolbar) are embedded in page contents. Further examples can be found in layout tables and clientside scripts.
- Contents that are inherently challenging. A complex document may never be accessible to a person without the relevant educational background, but can nevertheless be improved by presenting a good overview and navigation.
- Limitations in the abilities of web browsers to present the information available in webpages effectively to users.
Automated Transformation of Markup
One technique that may help improve accessibility is automatic on-the-fly transformation of markup. This can be implemented anywhere in the Web processing chain: at publication, on the Server, at a Proxy, or in a Browser. For example:
- At publication: static documents may be processed from a common repository through a tool such as AccessValet or Tidy. Compliance with WCAG is a good outcome.
- In a browser, any document will be parsed and presented according to the needs of its users and the capabilities of the medium (visual, text, audio, etc).
- A program on a server or proxy can transform markup. This approach was pioneered by the BBC "Betsie", a CGI program to linearise documents and strip presentational markup. Another such program was this author's Accessibility Proxy, an experimental service that ran for several months at Site Valet.
The key advantage of processing for accessibility at a server or proxy is that the benefits become available to users of any browser, including those affordable to the economically disadvantaged. A second advantage is that it can perform operations that would be disproportionately expensive for a browser, and share the results between all users.
mod_accessibility is designed as a fully-automatic drop-in solution to many of the problems of HTML accessibility. It cannot do anything about missing contents, but it serves to deal effectively with the other problems discussed above in a wide range of cases. It sits between the server and the user, offering the latter a choice of presentations of web contents. Each presentation is implemented with emphasis on different accessibility and usability techniques. Presenting the user with such a choice means that when a page causes problems in one view, there is always an instant switch to another option. It is not limited to serving users with special needs, but may also improve usability for able-bodied users with full-featured desktop browsers.
Technically, mod_accessibility works as an output filter for the Apache webserver. That means it has no effect on the production of the page, and it is automatically compatible not only with static pages, but also with dynamic contents such as CGI, PHP or XML/XSLT. It is controlled by the user through their browser, and optionally rewrites and enhances HTML contents as it leaves the server or proxy.
An important design decision was the use of a SAX-based parser. This means documents are parsed in a single pass from start to end, and are never loaded into memory in their entirety. SAX is (by far) the fastest and most efficient way to process markup, and scales to handle large documents without increasing memory requirements for processing. It means mod_accessibility is not subject to the performance and scalability limitations of a tool such as Betsie, which loads an entire document into memory and uses generic Perl text manipulation to process it. The downside to this decision is that processing based on document tree scanning, lookahead or backtracking is not possible.
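The single-pass, event-driven style described above can be illustrated with a minimal Python sketch. It uses the standard library's html.parser rather than a true SAX parser, and it is not the module's own code: the class name StripPresentation and the two lists of presentational tags and attributes are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists for illustration; the article does not give the module's real ones.
PRESENTATIONAL_TAGS = {"font", "center", "blink", "marquee"}
PRESENTATIONAL_ATTRS = {"align", "bgcolor", "color", "face"}


class StripPresentation(HTMLParser):
    """Write cleaned markup as each parse event arrives; no tree is built."""

    def __init__(self, write):
        # convert_charrefs=False keeps entities as separate events so they
        # can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.write = write  # callback receiving output fragments

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL_TAGS:
            return  # drop the tag itself, keep its contents
        parts = [tag]
        for name, value in attrs:
            if name in PRESENTATIONAL_ATTRS:
                continue
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.write("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag not in PRESENTATIONAL_TAGS:
            self.write(f"</{tag}>")

    def handle_data(self, data):
        self.write(data)

    def handle_entityref(self, name):
        self.write(f"&{name};")

    def handle_charref(self, name):
        self.write(f"&#{name};")


# Input may arrive in arbitrary chunks, as it would in an output filter.
fragments = []
parser = StripPresentation(fragments.append)
parser.feed('<p align="center"><fo')
parser.feed('nt color="red">Hi &amp; bye</font></p>')
parser.close()
print("".join(fragments))  # prints: <p>Hi &amp; bye</p>
```

Because each handler sees only the current event, the sketch cannot look ahead or revisit earlier output, which is the trade-off noted above. Comments and doctype declarations are simply dropped here to keep the example short.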
The Basic Philosophy
Much of the developer's previous work has involved markup analysis and reporting, including tools for formal validation, WCAG and Section508 compliance testing. He is therefore extremely familiar with the techniques and problems involved. However, mod_accessibility takes a different approach: it is based on the spirit rather than the letter of accessibility. Its processing makes no direct reference to any accessibility guidelines beyond the requirement to avoid deprecated and bogus presentational markup (which it strips out).
In summary, it addresses the practical question "What *can* we do within the constraints on us?" rather than the more utopian "What *should* we do in an ideal world?"
Bearing this in mind, mod_accessibility can do four things:
- Transform contents in ways likely to improve accessibility.
- Offer users a choice of fundamentally different transforms.
- Offer the user summary details: automatic TOC and navigation help.
- Collect relevant external data and present them to the user.
The first three capabilities of course overlap somewhat with browser functions. But browsers that offer significant similar capabilities impose a serious technical and economic burden on the end-user, which may be acceptable in an office environment (where an employer bears the costs on behalf of employees), but not on the Web at large. The fourth would make little sense in a browser, as the overhead of collecting data can only really be justified when data will be shared between the (many) users of a server or proxy.
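The economics of the fourth capability can be sketched in a few lines of Python. This is only an illustration of sharing collected data between users: the class TitleCache and the injected fetch_title callable are hypothetical names, and a real server would also need expiry and thread safety.

```python
class TitleCache:
    """Share looked-up link titles between all users of a server or proxy."""

    def __init__(self, fetch_title):
        self._fetch = fetch_title  # hypothetical callable: url -> page title
        self._titles = {}
        self.fetches = 0  # counts how often the expensive lookup really ran

    def title_for(self, url):
        if url not in self._titles:
            self.fetches += 1
            self._titles[url] = self._fetch(url)
        return self._titles[url]


def fake_fetch(url):
    # Stand-in for a network request, used so the sketch runs offline.
    return "Title of " + url


cache = TitleCache(fake_fetch)
for _ in range(100):  # one hundred users following the same link
    cache.title_for("http://www.example.com/")
print(cache.fetches)  # prints: 1
```

A browser serving one user would pay the lookup cost for that user alone, whereas here the cost is paid once and amortised over every later request.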
The mod_accessibility Views
In an ideal world, all web contents would be fully accessible, structured and indexed, and of course searchable. In the real world, we have to work with a range of markup, from the well-structured through to the purely presentational. Broadly speaking, accessibility is best served by leaving well-structured markup as-is, and transforming presentational markup to plain text. A fully-automated processor therefore has to make difficult judgements about the nature of the material it is processing, knowing that most real-world markup falls somewhere between the two extremes.
To take a case in point, consider HTML tables. It is widely accepted that structural tables should be preserved while layout tables are best linearised. It may be possible to guess heuristically which category a table falls into by examining it for structural elements, but such techniques will never be 100% reliable. Betsie simply linearises all tables. mod_accessibility instead gives users the choice: the "betsie" view linearises all tables, while other views leave them untouched. So you can switch between tabular and linear views at a single click (or equivalent).
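The choice between a linear and a tabular presentation can be sketched as follows. This is a simplified illustration rather than the module's implementation: the view names "betsie" and "asis" follow the article, while the Lineariser class, the render function and the decision to end each cell with a line break are assumptions.

```python
from html.parser import HTMLParser

STRUCTURE = {"table", "thead", "tbody", "tfoot", "tr"}  # dropped entirely
CELLS = {"td", "th", "caption"}  # dropped, but each one ends a line


class Lineariser(HTMLParser):
    """Drop table markup in one pass, leaving cell contents in source order."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in STRUCTURE and tag not in CELLS:
            self.out.append(self.get_starttag_text())  # pass through verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in CELLS:
            self.out.append("<br>\n")
        elif tag not in STRUCTURE:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def render(html, view):
    """Return html unchanged unless the user picked the linearising view."""
    if view != "betsie":
        return html
    parser = Lineariser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = "<table><tr><td>A</td><td><b>B</b></td></tr></table>"
print(render(page, "betsie"))
print(render(page, "asis"))
```

Leaving the decision to the user avoids the need for a heuristic that classifies each table as structural or layout.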
Extending this principle of user choice, mod_accessibility offers a menu of different views, according to what has been enabled by the administrator. The user is presented with a menu of views, though not with the level of confusing detail that would require a specific decision about particular elements such as tables! The recommended default view (called "Asis") leaves markup mostly untouched, except for cleaning up in a manner comparable to Tidy or AccessValet's cleanup options. Other options linearise transcluded content (frames), and can insert TITLE attributes into all links to improve navigation.
In addition to the "views", mod_accessibility offers meta-views: a Page Outline built from important structural elements (headings and tables), and a links list. The first of these (duplicated in some browsers) is particularly useful for navigating long and complex but structured pages, such as many of those at www.w3.org, as it overcomes the problems of a one-dimensional presentation in navigating a long text (and extra data such as navigation bars, where it provides a "skip navigation" function as a by-product). The second is particularly well-suited to a proxy or server, as it can provide more data than a browser without excessive cost by retrieving and presenting page titles from links.
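As a rough illustration of the two meta-views, the following Python sketch collects a page outline and a links list in a single pass. The class name MetaViews and the use of a table's summary attribute as its outline label are assumptions for the example. Retrieving titles for the linked pages is left out.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MetaViews(HTMLParser):
    """Collect an outline (headings and tables) and a list of links."""

    def __init__(self):
        super().__init__()  # character references are decoded into the text
        self.outline = []  # entries such as ("h2", "Heading text")
        self.links = []  # entries such as ("/page.html", "Link text")
        self._heading = None  # text fragments of the heading being read
        self._link = None  # (href, text fragments) of the link being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._heading = []
        elif tag == "table":
            # Assumption: label a table by its summary attribute if present.
            self.outline.append(("table", dict(attrs).get("summary") or "(table)"))
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._link = (href, [])

    def handle_data(self, data):
        if self._heading is not None:
            self._heading.append(data)
        if self._link is not None:
            self._link[1].append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._heading is not None:
            self.outline.append((tag, "".join(self._heading).strip()))
            self._heading = None
        elif tag == "a" and self._link is not None:
            href, parts = self._link
            self.links.append((href, "".join(parts).strip()))
            self._link = None


views = MetaViews()
views.feed('<h1>Guide</h1><p><a href="/a.html">First</a></p>'
           '<table summary="Prices"><tr><td>1</td></tr></table>'
           '<h2>Part &amp; parcel</h2>')
views.close()
print(views.outline)
print(views.links)
```

An outline of this kind gives a one-dimensional reader direct jumps to each section, which is also what yields the "skip navigation" effect mentioned above.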
Perspectives on mod_accessibility
- The End User
- The Content Provider
- The Server or Proxy Administrator
- The Manager
- The ISP
Server Administrator Perspective
In reviewing different perspectives on mod_accessibility, we'll start with the server administrator, who is responsible for installing and configuring mod_accessibility. This may be the least directly interesting, but the administrator's choices determine the details of how everyone else is affected.
At the most basic level, all the administrator needs to do is to add a directive to the Apache configuration file to load mod_accessibility. There are then several configuration options to choose, such as what views are enabled on the server, whether they will be accessible by AccessKey and/or Tabindex to end users, whether and where mod_accessibility should present its own toolbar on pages, and whether authors should be able to override the administrator's defaults by means of .htaccess files.
The above is ample for a proxy, and sufficient for a server. However, an additional step is recommended for most servers: mod_accessibility should be made optional! A typical configuration for this is to create two virtual servers:
- content served as-is without mod_accessibility
- content processed on demand by mod_accessibility
This offers the best of both worlds to both authors and users.
Content Provider Perspective
The content provider is concerned with presenting contents and functionality to the user. Content providers are often also concerned with the fine-detail of the visual presentation to a class of readers regarded as normal or typical. Reconciling this with accessibility concerns may be considered an additional burden that is not always welcome.
A key design goal of mod_accessibility is that authors need not even be aware of it, and can safely ignore its existence. However, as ever things can be improved by using the available tools proactively, and the author may usefully take advantage of mod_accessibility.
A filter that transforms webpages could have various implications for authors. On the plus side is the basic purpose of mod_accessibility: it can enhance accessibility and usability for users. Against that, it risks disrupting the author's painstaking presentational work in the browsing situations the author "supports". Recommended practice for authors is to go ahead and develop their sites, and simply treat mod_accessibility as an "accessible version" of the site, with a simple pair of navigation links of the form:
<a href="http://www.example.com/">Main Site</a>
<a href="http://access.example.com/">Accessibility Options</a>
Majority users - those with no wish for accessibility help - will see exactly what the author wrote without interference, whereas users who follow the Accessibility Options link enjoy its benefits.
An author having mod_accessibility available can actively take advantage of it in limited ways. At the simplest level, viewing pages through mod_accessibility may serve to highlight in a visual browser where accessibility problems are likely to arise. In particular, the "betsie" view, an emulation of the transforming done by the BBC "Betsie" program, gives a fully linearised presentation of the page.
More controversially, an author may be able to relax compliance with accessibility standards, where mod_accessibility brings a page into compliance. Areas where this may be feasible include use of deprecated presentational markup such as frames, layout tables, fonts and colours. Since mod_accessibility offers a range of navigational aids, authors may also sometimes be able to reduce the amount of effort they devote to navigation. In using mod_accessibility in this way, it is important not to lose sight of the basic principles, or treat mod_accessibility as a panacea: it can of course never substitute for good, well-structured content, nor for due care and attention on the part of an author.
Finally, authors may use it as a component in a publishing system. This is rather different from its primary purpose, but since mod_accessibility parses outgoing markup, it can at no extra cost perform variable substitutions, for example, to insert a site toolbar into every page. This is directly equivalent to Server-Side Includes (SSI), and although it does not currently support all SSI directives, future versions are likely to move towards full emulation. The reason for this is purely practical: both mod_accessibility and mod_include (Apache's SSI module) parse outgoing documents, and if we have to use both of them, we double the processing overhead by parsing each document twice!
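To show why substitution costs almost nothing once the markup is being parsed anyway, here is a small Python sketch that replaces a directive during the same pass. The article does not give the directive syntax mod_accessibility accepts, so the sketch borrows the shape of SSI's echo directive as an assumption. The names Substituting and substitute, and the variable name toolbar, are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed directive shape, modelled on SSI: <!--#echo var="name" -->
ECHO = re.compile(r'#echo\s+var="([^"]+)"\s*$')


class Substituting(HTMLParser):
    """Copy markup through unchanged, expanding known echo directives."""

    def __init__(self, variables):
        super().__init__(convert_charrefs=False)
        self.variables = variables
        self.out = []

    def handle_comment(self, data):
        match = ECHO.match(data)
        if match and match.group(1) in self.variables:
            self.out.append(self.variables[match.group(1)])
        else:
            self.out.append(f"<!--{data}-->")  # ordinary comment: keep it

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def substitute(html, variables):
    parser = Substituting(variables)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(substitute('<body><!--#echo var="toolbar" --><p>Hi</p></body>',
                 {"toolbar": '<div id="nav">Home</div>'}))
```

The directive is handled by the same event stream that drives the accessibility transforms, so no second parse of the document is needed.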
End User Perspective
For the end user, we must make some distinctions. The first is between two distinct cases:
- The "intranet" situation, where mod_accessibility is known to be available on the server(s). The user is fully in control, and can switch to or from mod_accessibility and between the different views at any time simply by following a link.
- Browsing the Web at large, where mod_accessibility may be available but there is no expectation of it. The user may select or unselect a mod_accessibility-enabled proxy, but this is now a browser configuration option and loses the convenience of simply following a link. Users wanting accessibility-enhanced browsing will see everything through mod_accessibility. The different views are of course still always available.
Secondly, the user may have a Smart Client or a conventional browser. With a conventional browser, mod_accessibility presents options to users as an additional menu in each page (the presentation of this menu is determined by the server configuration and page authors). A smart client is one that supports HTTP negotiation using additional headers defined by mod_accessibility, and can present the mod_accessibility options to the user in its own way - for example a right-click or hotkey menu. When this is available, the mod_accessibility menu can be suppressed from pages served, so it appears to the user as an additional function of the browser.
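The server-side decision described above might look something like the following Python sketch. The article does not name the additional headers, so X-Accessibility-Client is a purely hypothetical header name invented for this example, as is the function should_embed_menu.

```python
# Hypothetical header name, invented for this sketch only; the article does
# not say which headers mod_accessibility actually defines.
SMART_CLIENT_HEADER = "X-Accessibility-Client"


def should_embed_menu(request_headers):
    """Return True when the in-page menu should be added to the response."""
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in request_headers}
    return SMART_CLIENT_HEADER.lower() not in names


print(should_embed_menu({"User-Agent": "Lynx"}))
print(should_embed_menu({"User-Agent": "SmartBrowser", "X-Accessibility-Client": "1"}))
```

A conventional browser sends no such header and so receives the menu in the page, while a smart client announces itself and renders the options in its own interface.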
Manager Perspective
The manager has two primary concerns. Firstly, an organisation's own web-based systems should be accessible, for all the usual business and legal reasons. Secondly, an organisation's employees, including disabled employees, should be able to carry out their work effectively. These concerns will have to be reconciled with other issues such as corporate branding, and met without excessive cost.
We have already discussed how deploying mod_accessibility on the server helps improve the accessibility of pages served. Deploying it on a proxy may bring greater benefits, particularly where employees may need to access the 'net at large. And unlike specialist browsers (which meet the needs of one user only), just a single mod_accessibility installation gives unlimited service to all users.
ISP Perspective
The ISP is under no obligation to offer an "accessibility enhanced" experience to users. However, mod_accessibility may be deployed on a proxy as a value-added service, in the manner of spam filtering, family filters, mail-by-web, and other services of interest to some section of the population.