Structural Semantics for Accessibility and Device Independence
Presentation to the Information Management Group (IMG) describing the SADIe transcoding platform.
SADIe: Structural-Semantics for Accessibility and Device Independence
Darren Lunn
The Web…
• The World’s largest repository of information
• Designed with a focus on presenting information in a visual manner
  ● Images
  ● Animations
  ● JavaScript
• Some knowledge is only available implicitly from how the page looks
Implicit Knowledge
[Figure: a Web page whose regions are visually identifiable as Advertisement, Banner, Main Content and Menu]
Assistive Technologies
• Visually impaired users use assistive technologies, e.g. screen readers
  ● Render pages sequentially in audio
  ● Achieved by accessing the underlying HTML code
• But a focus on visual presentation rather than content hampers this
  ● Particularly if attention is not paid to coherent design
  ● Tags and markup can be abused (e.g. using <h2> to get large, bold text rather than to mark a heading)
  ● Subtleties of visual presentation can be lost
CNN Example
Assistive Technologies
• Traversal of content is in a serial “top-to-bottom”, “left-to-right” manner
  ● Based on the underlying HTML code
• Important information may not be encountered until later on.
• Also, information such as menus or navigation may be repeated on every page of a site
  ● This can prove tiresome if the user has to wait for the reader to read the menu each time a new page is visited
• Chunked pages and non-linear presentation further complicate matters
Existing Solution: Transcoding
• A method of adapting and reformatting Web content so that it is suitable for a wide range of client devices
• Heuristic Transcoding - Uses general rules and heuristics to find areas of the web page
• Semantic Transcoding - Uses annotations to add metadata to the Web page in order to explicitly state the meaning of the elements
Heuristic Transcoding
• Use general rules and heuristics to find areas of the web page
• Once an area is found, then modify it in some way
• E.g. if (row is at the top of a table && number of characters between <a> tags > number of characters between <p> tags)
  then (element is a page menu, so transform it)
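The rule above can be sketched in Python as follows. This is a minimal illustration, not SADIe's code or any published heuristic transcoder: `FirstRowStats` and `top_row_is_menu` are hypothetical names, and the sketch only inspects the first `<tr>` it meets in the fragment, counting text characters inside `<a>` and `<p>` tags with the standard-library `html.parser`.

```python
from html.parser import HTMLParser


class FirstRowStats(HTMLParser):
    """Counts text characters inside <a> and <p> tags within the first <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows_seen = 0
        self.in_first_row = False
        self.open_links = 0       # number of <a> elements currently open
        self.open_paragraphs = 0  # number of <p> elements currently open
        self.link_chars = 0
        self.paragraph_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows_seen += 1
            self.in_first_row = self.rows_seen == 1
        elif tag == "a":
            self.open_links += 1
        elif tag == "p":
            self.open_paragraphs += 1

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_first_row = False
        elif tag == "a" and self.open_links:
            self.open_links -= 1
        elif tag == "p" and self.open_paragraphs:
            self.open_paragraphs -= 1

    def handle_data(self, data):
        if not self.in_first_row:
            return
        n = len(data.strip())
        if self.open_links:
            self.link_chars += n
        if self.open_paragraphs:
            self.paragraph_chars += n


def top_row_is_menu(table_html: str) -> bool:
    """Heuristic: the top row is a menu if it holds more link text than paragraph text."""
    stats = FirstRowStats()
    stats.feed(table_html)
    stats.close()
    return stats.link_chars > stats.paragraph_chars
```

If the first row is an advertisement paragraph rather than the link row, the function returns `False`, which illustrates how such rules break when the layout shifts slightly.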
Heuristic Transcoding
<table cellspacing="0" cellpadding="0" border="0" class="cnnCeilnav">
  <tr valign="middle" height="22">
    <td><a href="/">Home</a></td>
    <td><a href="/WORLD/">World</a></td>
    <td><a href="/US/">U.S.</a></td>
    <td><a href="/WEATHER/">Weather</a></td>
    <td><a href="http://money.cnn.com/index.html">Business</a> . . .
  </tr>
</table>
can be transcoded into a simple list:
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/WORLD/">World</a></li>
  <li><a href="/US/">U.S.</a></li>
  <li><a href="/WEATHER/">Weather</a></li>
  <li><a href="http://money.cnn.com/index.html">Business</a></li>
  . . .
</ul>
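A minimal sketch of the table-to-list rewrite on this slide, assuming the table has already been identified as a menu (e.g. by a rule like the one on the previous slide). `LinkCollector` and `table_menu_to_list` are hypothetical names; the sketch keeps only each link's `href` and text and discards the table's presentational attributes.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element in a fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None        # href of the link currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Valueless attributes come through as None, so fall back to "".
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def table_menu_to_list(table_html: str) -> str:
    """Rewrite a layout-table menu as a plain <ul> of links."""
    collector = LinkCollector()
    collector.feed(table_html)
    collector.close()
    items = "\n".join(
        f'  <li><a href="{escape(href)}">{escape(text)}</a></li>'
        for href, text in collector.links
    )
    return f"<ul>\n{items}\n</ul>"
```

A list is read by a screen reader as a single group of related items, whereas a layout table can be announced cell by cell.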
Heuristic Transcoding
• General enough to be applied to a large number of web pages
  ● All CNN pages follow this pattern, as do other pages that have a similar layout template
• Can be inaccurate if a page differs slightly from what the rules expect
  ● E.g. CNN inserts an additional row containing advertisements, so the menu is no longer the row at the top of the table
Semantic Transcoding
• Uses annotations to add metadata to the Web page in order to explicitly state the meaning of the elements
• E.g.
<menu>
  <table cellspacing="0" cellpadding="0" border="0" class="cnnCeilnav">
    <tr valign="middle" height="22">
      <td><a href="/">Home</a></td>
      <td><a href="/WORLD/">World</a></td>
      <td><a href="/US/">U.S.</a></td>
      <td><a href="/WEATHER/">Weather</a></td>
      <td><a href="http://money.cnn.com/index.html">Business</a> . . .
    </tr>
  </table>
</menu>
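With explicit annotations the transcoder needs no layout heuristics; it only has to look for the annotation element. The sketch below is a hypothetical illustration (`AnnotatedLinks` and `annotated_links` are not from SADIe) that collects the text of every link enclosed in a `<menu>` annotation, as in the example above. Extra rows or surrounding markup do not change the result as long as the annotation remains.

```python
from html.parser import HTMLParser


class AnnotatedLinks(HTMLParser):
    """Collects the text of links that appear inside a given annotation element."""

    def __init__(self, annotation: str) -> None:
        super().__init__()
        self.annotation = annotation
        self.depth = 0            # > 0 while inside the annotation element
        self.in_link = False
        self.links: list[str] = []
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.annotation:
            self.depth += 1
        elif tag == "a" and self.depth:
            self.in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self.annotation and self.depth:
            self.depth -= 1
        elif tag == "a" and self.in_link:
            self.links.append("".join(self._buf).strip())
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self._buf.append(data)


def annotated_links(page_html: str, annotation: str = "menu") -> list[str]:
    """Return the link texts found inside the annotation element(s)."""
    parser = AnnotatedLinks(annotation)
    parser.feed(page_html)
    parser.close()
    return parser.links
```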
Semantic Transcoding
• Very accurate
  ● We can modify the page layout, but as long as the annotations remain, the transcoding will work
• Every Web page must be annotated, which limits the number of pages that can be transcoded
  ● Time consuming
  ● Issues of document ownership
CSS
• Cascading Style Sheets support the separation of presentation from content
  ● Information about fonts, colour, positioning, etc. is held in the style sheet
• Style sheets often carry some implicit semantics
  ● These semantics are encoded in the names of the elements rather than in some formal structure
  ● Use of terms like header, footer or nav
  ● Layout and presentation can add implicit meaning
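To illustrate how such names carry implicit meaning, here is a deliberately naive keyword scan over a style sheet's class and id selectors. It is an assumption-laden sketch, not how SADIe assigns roles (SADIe annotates the CSS with an ontology, and automating those mappings is listed as further work): `ROLE_HINTS` and `implicit_roles` are hypothetical, and the regular expressions ignore CSS comments and other edge cases.

```python
import re

# Hypothetical keyword hints; real style sheets use many other naming schemes.
ROLE_HINTS = {"nav": "navigation", "header": "header", "footer": "footer"}

# A class (.name) or id (#name) inside selector text.
SELECTOR_NAME = re.compile(r"[.#](-?[_a-zA-Z][\w-]*)")
# Selector text is whatever precedes each "{"; declarations are never followed by "{".
SELECTOR_TEXT = re.compile(r"([^{}]+)\{")


def implicit_roles(css_text: str) -> dict[str, str]:
    """Guess a role for each class/id name whose name contains a hint keyword."""
    roles: dict[str, str] = {}
    for selectors in SELECTOR_TEXT.findall(css_text):
        for name in SELECTOR_NAME.findall(selectors):
            lowered = name.lower()
            for hint, role in ROLE_HINTS.items():
                if hint in lowered:
                    roles.setdefault(name, role)
                    break
    return roles
```

Names such as `cnnBodyText` get no guess at all, which is one reason an explicit mapping is more dependable than keyword matching.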
SADIe
• Semantics are implicitly encoded within the visual presentation of the Web page
• Cascading Style Sheets define the visual presentation of the pages within a Website
• Defining the role of a Cascading Style Sheet element therefore also defines, by association, the role of every Web page element styled by it
• Gain the best of both worlds
  ● Accurate transcoding in the same manner as Semantic Transcoding
  ● Element definitions of a single CSS can be applied to multiple Web pages in a manner similar to Heuristic Transcoding
Annotating The CSS
[Figure: the cnnCSS style sheet elements cnnCeilnav, cnnBodyText and cnnBottomNav linked to the Upper Level Ontology and an Extended Ontology, as used by the SADIe Application]
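The annotation step might be modelled as below. This is a hypothetical sketch: the concept names (`Menu`, `MainContent`, `TopNavigation` and so on) and the assignment of `cnnCeilnav`, `cnnBottomNav` and `cnnBodyText` to them are assumptions chosen for illustration, not SADIe's actual ontology. The ontology is held as a plain taxonomy (child → parent), and any element whose `class` attribute names an annotated style sheet element inherits that element's concept.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical upper-level ontology, kept as a simple taxonomy (concept -> parent).
UPPER_LEVEL = {
    "PageElement": None,
    "Menu": "PageElement",
    "MainContent": "PageElement",
    "Advertisement": "PageElement",
    "Banner": "PageElement",
}
# Hypothetical site-specific extension refining upper-level concepts for CNN.
EXTENDED = {
    "TopNavigation": "Menu",
    "FooterNavigation": "Menu",
    "StoryBody": "MainContent",
}
TAXONOMY = {**UPPER_LEVEL, **EXTENDED}

# Hypothetical annotations of cnnCSS style sheet elements with ontology concepts.
CSS_ANNOTATIONS = {
    "cnnCeilnav": "TopNavigation",
    "cnnBottomNav": "FooterNavigation",
    "cnnBodyText": "StoryBody",
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """Walk up the taxonomy from `concept`, checking whether `ancestor` is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = TAXONOMY.get(concept)
    return False


def concept_for(class_attr: Optional[str]) -> Optional[str]:
    """Return the concept of the first annotated class in a class attribute."""
    for name in (class_attr or "").split():
        if name in CSS_ANNOTATIONS:
            return CSS_ANNOTATIONS[name]
    return None


class RoleTagger(HTMLParser):
    """Records (tag, concept) for every element whose class has been annotated."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        concept = concept_for(dict(attrs).get("class"))
        if concept is not None:
            self.found.append((tag, concept))
```

Because the annotation lives in the style sheet, every page that uses `cnnCeilnav` is covered at once; `is_a` lets a transcoder ask generic questions such as "is this a Menu?" about site-specific concepts.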
SADIe Implementation
• Implemented as a proxy
  ● All browsing requests from the client pass through the proxy, where transformation takes place
  ● The proxy rewrites HTML pages to provide an accessible version of the content
• Allows users to:
  ● Defluff – remove non-essential elements
  ● Re-order – promote elements that are considered important to the top of the page
  ● Toggle menus – show/hide navigational menus
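Once page elements carry roles, the three user options reduce to simple list operations. The sketch below leaves out the proxy plumbing (receiving requests, fetching and re-serialising pages) and works on a hypothetical list of `Block` objects; which roles count as non-essential or important (`NON_ESSENTIAL`, `IMPORTANT`) and the order in which the options are applied are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    role: str  # upper-level concept, e.g. "Menu" or "MainContent"
    html: str  # the markup for this part of the page


NON_ESSENTIAL = {"Advertisement", "Banner"}  # hypothetical choice of "fluff"
IMPORTANT = {"MainContent"}                  # hypothetical choice of what to promote


def defluff(blocks: list[Block]) -> list[Block]:
    """Remove non-essential elements."""
    return [b for b in blocks if b.role not in NON_ESSENTIAL]


def reorder(blocks: list[Block]) -> list[Block]:
    """Promote important elements to the top; sorted() is stable, so relative order is kept."""
    return sorted(blocks, key=lambda b: b.role not in IMPORTANT)


def toggle_menus(blocks: list[Block], show: bool) -> list[Block]:
    """Show or hide navigational menus."""
    return list(blocks) if show else [b for b in blocks if b.role != "Menu"]


def transcode(blocks: list[Block], *, do_defluff: bool = True,
              do_reorder: bool = True, show_menus: bool = False) -> list[Block]:
    """Apply the user's chosen options in a fixed order (an assumption of this sketch)."""
    if do_defluff:
        blocks = defluff(blocks)
    if do_reorder:
        blocks = reorder(blocks)
    return toggle_menus(blocks, show_menus)
```

With all options on, a screen reader starts at the main content and is not forced through the menu on every page.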
SADIe Application
SADIe Transcoder
SADIefied CNN
Evaluation
• We want to show that using SADIe decreases the time it takes to find information on the page
• Four methods of testing information retrieval on Web pages:
  ● Simple Fact Question: involves the user finding a fact on the Website that is either true or false
  ● Judgement Question: involves the user viewing a Website and providing a judgement
  ● Comparison of Fact Questions: involves the user finding a series of facts and then answering a question that is either true or false
  ● Comparison of Judgement Questions: involves the user viewing a Website, comparing the facts and reaching a conclusion
Evaluation Hypothesis
• H0: The time it takes to complete a fact-based task on a Web page is the same regardless of whether the page used is SADIefied.
• H1: The time it takes to complete a fact-based task on a SADIefied Web page is less than the time it takes on a non-SADIefied page.
Evaluation Methodology
• 20 pages with similar, predominantly text-based content
  ● News, e.g. CNN, BBC, New York Times…
  ● Blogs, e.g. Blogger, Xanga…
• Asked the user to find facts that were as similar as possible across pages
  ● E.g. for news sites: “What is the headline of the first story?”
• The user was presented with the pages one at a time, some of which were SADIefied
• We timed how long it took the user to answer the question
Evaluation Results
• So far we have evaluated SADIe with a single user
• Results are encouraging and statistically significant under a randomization test…
• … but we would like more users to support our results.
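A randomization (permutation) test for the one-sided hypothesis H1 can be sketched as follows. This is a generic Monte Carlo version under stated assumptions, not the exact procedure used in the study: `randomization_test` is a hypothetical name, the statistic is the difference in mean completion times, and the SADIefied/non-SADIefied labels are reshuffled `n_resamples` times with a fixed seed.

```python
import random
from statistics import fmean
from typing import Sequence


def randomization_test(
    sadie_times: Sequence[float],
    plain_times: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided p-value for H1: tasks on SADIefied pages are completed faster.

    Under H0 the labels are exchangeable, so the pooled times are shuffled,
    re-split into groups of the original sizes, and we count how often the
    shuffled difference is at least as large as the observed one.
    """
    if not sadie_times or not plain_times:
        raise ValueError("both groups need at least one timing")
    observed = fmean(plain_times) - fmean(sadie_times)
    pooled = list(sadie_times) + list(plain_times)
    k = len(sadie_times)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if fmean(pooled[k:]) - fmean(pooled[:k]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value means a gap in mean times this large would rarely arise if the labels were arbitrary; with a single participant, though, it says nothing about how other users would fare.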
Further Work
• This is still preliminary work, and much remains to be done
• Analysis of coping strategies
  ● Informing our transformations and transcodes
• (Semi)-Automation of mappings for stylesheets
• Richer upper level ontology
  ● Currently the ontology is essentially a taxonomy
• More User Evaluations
Conclusions
• Browsing the Web can be difficult for those who are visually impaired
• SADIe can apply transcoding by using implicit information extracted from the CSS
• Initial results from a single-user evaluation are promising and suggest that SADIe can help visually impaired users reach content more quickly
• More work still needs to be done
Questions?
http://www.cs.manchester.ac.uk/img/sadie