All Presentation Material Copyright Eurostep Group AB
The Semantic Web Made Simple
David Price, December 2004
Agenda
• The Current Web
  – and its technologies
  – How’s it work now?
• The Semantic Web
  – is adding semantics
  – How’s it going to work in the future?
The Current Web
• Web core concepts
  – People read Web pages
  – Web page authors can control basic layout
  – Web pages need to link to each other
  – Web pages need to link to online media
    • that people read, view, listen to or interpret
  – People use tools that search/recall Web content (Yahoo, Google, Lycos, their own bookmarks)
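What’s on a Web Page?
[Annotated screenshot of a Web page from the NY Times site, with these callouts:]
• Text that’s actually graphics
• Categories of articles
• A photograph
• Online shopping link
• Article title and link
• Article abstract
• Date and time
• Location, temperature and unit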
What we saw
• Things on the NY Times site
  – Text that’s actually a graphic
  – Categories of articles
  – Online shopping link
  – A photograph
  – Article title and link
  – Article abstract
  – Date and time
  – Location, temperature and unit
• How did we know that?
  – Because we are humans who can read English and who can interpret what we see
What did the editors do?
• Determined the layout of the pages as a whole
  – Should it look like a real paper? Should there be advertising?
• Wrote the text
• Decided on navigation
  – Article categories called “International”, “National”, “Sports”, etc.
    • Each item in the category list links to a separate page for that category, with a list of its articles
  – Users will have to scroll down the page to see the headline articles
  – Article titles will link directly to a separate page for each article
How did they do that?
• They used HTML and graphic images
• Hypertext Markup Language (HTML) allows editors to
  – control presentation and layout
    • Paragraph, Bold, Table/Column/Row
  – add links to other pages
    • Hyperlink Reference
  – show graphics
    • Images of many types are natively supported by browsers
  – link to other media that have software to present them
    • music, video, PDF, documents, presentations
A peek under the covers
How does that work?
• HTML is a standard language
  – The World Wide Web Consortium standardized it
• Companies have written software that reads HTML and presents it to you
  – These are Web browsers
• The presentation capabilities of HTML, the related media and browsers are pretty powerful
How does HTML really work?
• What do the browsers understand?
  – <P>This is a paragraph.</P>
    • Present the text “This is a paragraph.” as a new paragraph
  – <A HREF="newsitems.html">News</A>
    • Present a hyperlink with the text “News” and, if it’s selected, present a new page from the file “newsitems.html”
  – <TR><TD>dog</TD><TD>cat</TD></TR>
    • In the current row of the table, present the text “dog” in column 1 and the text “cat” in column 2
  – <IMG SRC="p1.jpg" />
    • Present an image from whatever is in the file named “p1.jpg”
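To make this concrete, here is a minimal Python sketch of what software sees when it reads HTML like the fragments above. It uses the standard library `html.parser` module rather than a real browser engine, and the `TagLister` class name is made up for this illustration. The parser reports tags, attributes and text, and nothing about what any of them mean.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Hypothetical helper: records the tags, attributes and text it is given."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names.
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        # Keep only text that is not pure whitespace.
        if data.strip():
            self.events.append(("text", data.strip()))


page = (
    '<P>This is a paragraph.</P>'
    '<A HREF="newsitems.html">News</A>'
    '<IMG SRC="p1.jpg" />'
)

lister = TagLister()
lister.feed(page)
lister.close()

for event in lister.events:
    print(event)
```

The recorded events say "a link to newsitems.html with the text News", but nothing in them says that the link leads to news articles.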
So, What’s the problem?
• Only a human being can read a Web page and extract any meaning from it
  – The Web browser does understand paragraph, image, link
  – The Web browser does not know it’s linking to a “News Article” or the image is a “picture of photographs”
• It’s the meaning that’s really important
• Wouldn’t it be powerful if computers could get some of the meaning out of Web pages?
Why is it a powerful idea?
• Using our NY Times/newspaper site example…
  – Suppose you were an Environmental Group
  – Suppose you want to monitor news stories about the environment or pollution
  – You could write a program that searches the Web media outlets
  – That program could trigger a notification about articles on environmental issues
  – Or, it could contact members of your group in specific locations when it finds legislation related to pollution in particular US states
  – This would save your members a lot of time searching for themselves, wouldn’t it?
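The monitoring program described above could be sketched roughly as follows in Python. It assumes something the current Web does not provide: that each article already comes with machine-readable subject and location information. The `Article` and `Member` records, the sample data and the `find_notifications` function are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    subjects: frozenset  # e.g. {"pollution", "legislation"}
    state: str           # US state the article is about


@dataclass(frozen=True)
class Member:
    name: str
    state: str


def find_notifications(articles, members,
                       watched=frozenset({"environment", "pollution"})):
    """Pair each member with the watched-subject articles about their own state."""
    notifications = []
    for article in articles:
        if not (article.subjects & watched):
            continue  # not an environmental story
        for member in members:
            if member.state == article.state:
                notifications.append((member.name, article.title))
    return notifications


# Hypothetical sample data.
articles = [
    Article("New emissions bill debated", frozenset({"pollution", "legislation"}), "Ohio"),
    Article("Local team wins final", frozenset({"sports"}), "Ohio"),
]
members = [Member("Alice", "Ohio"), Member("Bob", "Texas")]

print(find_notifications(articles, members))
```

The filtering itself is trivial; the hard part is getting the subject and state of each article in a form software can read, which is the gap the rest of this presentation addresses.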
The Semantic Web
• Figuring out how to get meaning out of things on the Web using software is what “The Semantic Web” is all about
  – “using software” means “without humans doing the interpretation”
• How would one do that?
  – Clearly, HTML is not sufficient, so more powerful languages are required
  – Clearly, we cannot replace everything already on the Web, so ways to add meaning are required
  – Need to combine better languages/communication, computer science and the study of what things mean
Semantics
• People have been studying what things exist and what they mean for centuries
  – This is called Philosophy
• People have been studying how people communicate for decades
  – This is called Linguistics
• People have been studying how computers can “learn” for a few decades
  – This is called Artificial Intelligence
The Semantic Web
• Vision of Web “inventor” Tim Berners-Lee and others
  – Wrote an article in Scientific American in 2001
• Goals
  – Go beyond processing by human beings
  – Make Web content computer processable
• How?
  – Add semantics using ontologies
  – Use inference/reasoning over ontologies
Ontologies
• Ontology
  – A big word from philosophy, linguistics, and computer science
  – A formal, machine readable specification of a domain of interest
    • Names things and adds knowledge about and constraints on the things
    • Allows relationships between terms within and between different ontologies
• Semantic Web researchers and W3C have been working on ontologies for several years now
OWL History
• US researchers produced DAML-ONT in 2000
  – DARPA Agent Markup Language – Ontology Language
• European researchers produced OIL about the same time
  – Ontology Inference Layer
• The two were merged to produce DAML+OIL, which was submitted as a Note to W3C; the W3C WebOnt group was formed in 2001
• W3C WebOnt Group produced OWL in 2003
  – OWL is now a W3C Recommendation
• This is not really that important for our purposes… just remember that OWL didn’t appear overnight
What is OWL?
• The World Wide Web Consortium (W3C) created the HTML and XML standards
• OWL is a next-generation W3C Web standard
  – its purpose is to add “semantics” to the Web
• Therefore, it can be distributed and is Web-enabled and does not assume a single source for everything
  – In concept, it is very much like other data modelling languages (it calls models or schemas “ontologies”)
    • class, subclass, property, property type, instance/individual
  – supports set theory and logic-based statements about the classes and individuals
  – it has more than one syntax, XML being one
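As a rough illustration of the class, subclass and individual vocabulary and of a logic-based statement about them, the Python sketch below infers every class an individual belongs to by following subclass links. It is plain Python, not OWL syntax, and the class and individual names are made up.

```python
# Hypothetical subclass statements: child class -> parent class.
SUBCLASS_OF = {
    "NewsArticle": "Article",
    "Article": "Document",
}

# Hypothetical individuals and the class each one is directly stated to be in.
INSTANCE_OF = {
    "story42": "NewsArticle",
}


def inferred_classes(individual):
    """Return the stated class plus every ancestor reached via subclass links."""
    classes = []
    seen = set()
    current = INSTANCE_OF.get(individual)
    # Walk up the subclass chain, stopping at the top or if a cycle is found.
    while current is not None and current not in seen:
        classes.append(current)
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return classes


print(inferred_classes("story42"))
```

Nobody stated that `story42` is a `Document`; the software concluded it from the subclass statements. That small step from stated facts to derived facts is the kind of reasoning the ontology languages are designed to support.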
RDF underlies OWL
• RDF is another W3C standard, the Resource Description Framework
  – RDF is simple in concept but sufficient for many basic Semantic Web tasks (e.g. who created this presentation?)
  – It allows you to assign a property with a value to a Web page (or any Web resource)

Resource: http://www.eurostep.com/TheSemanticWeb.ppt
Property: Creator
Value: David Price
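The resource/property/value statement above can be written down as a simple three-part record. The Python sketch below does that and prints it in the N-Triples text form of RDF. The slide only names the property “Creator”; mapping it to the Dublin Core `creator` URI is an assumption made here for illustration, as are the `Triple` and `to_ntriples` names.

```python
from collections import namedtuple

# One RDF statement: a resource, a property of it, and the property's value.
Triple = namedtuple("Triple", ["resource", "prop", "value"])

# Assumption: the slide's "Creator" is taken to mean the Dublin Core creator property.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

statement = Triple(
    resource="http://www.eurostep.com/TheSemanticWeb.ppt",
    prop=DC_CREATOR,
    value="David Price",
)


def to_ntriples(triple):
    """Format a triple whose value is a plain text literal as one N-Triples line."""
    # Backslashes and double quotes must be escaped inside a literal.
    escaped = triple.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.resource}> <{triple.prop}> "{escaped}" .'


print(to_ntriples(statement))
```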
RDF underlies OWL
• RDF is another W3C standard, the Resource Description Framework
  – RDF is simple in concept but sufficient for many basic Semantic Web tasks (e.g. who created this presentation?)
  – RDF is often represented by nodes and arcs

[Diagram: node http://www.eurostep.com/TheSemanticWeb.ppt, arc labelled “Creator”, node “David Price”]
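One way to picture the nodes-and-arcs view in code is a small graph in which each node maps its outgoing arc labels to the nodes they point at. This Python sketch uses only the single statement from the slide; the `add_arc` and `follow` helper names are invented for illustration.

```python
from collections import defaultdict

# graph[node][arc_label] -> list of target nodes
graph = defaultdict(lambda: defaultdict(list))


def add_arc(source, label, target):
    """Draw a labelled arc from one node to another."""
    graph[source][label].append(target)


def follow(source, label):
    """Return the nodes reached from `source` along arcs with this label."""
    if source not in graph:
        return []
    return list(graph[source].get(label, []))


add_arc("http://www.eurostep.com/TheSemanticWeb.ppt", "Creator", "David Price")

# "Who created this presentation?"
print(follow("http://www.eurostep.com/TheSemanticWeb.ppt", "Creator"))
```

Answering the question is then a matter of starting at the presentation's node and following the “Creator” arc.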
Back to the NY Times
What we saw… again
• Things on the NY Times site
  – Text that’s actually a graphic
  – Categories of articles
  – Online shopping link
  – A photograph
  – Article title and link
  – Article abstract
  – Date and time
  – Location, temperature and unit
• How did we know that?
  – Because we are humans who can read English and who can interpret what we see
A peek under the semantic covers
Newspaper ontology
Article Authors
Date
Article title
Article Subjects
Without using an editor …
Now these are semantics a software application can understand… Articles and Authors
On Annotating the Web
• You might ask: But what about the current Web content? We’re not going to rewrite it all, are we?
• And we’d answer: Of course not, but you can “annotate” it to add semantics.
• What this means is:
  – Descriptive ontologies like the one for Newspapers are being developed
  – Descriptions are then linked to already existing Web pages, including any multimedia content (e.g. video)
  – The Semantic Web community calls this “annotating a Web resource”
  – You’ll also hear people use the term “metadata”
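A minimal Python sketch of the idea: the descriptions live in a separate store keyed by the address of the existing page or video, so the original content is never edited. The URLs, property names and the `annotate` helper are hypothetical.

```python
# Annotations are kept apart from the Web resources they describe.
annotations = {}


def annotate(resource_url, prop, value):
    """Attach one descriptive property to an existing Web resource."""
    annotations.setdefault(resource_url, {}).setdefault(prop, []).append(value)


# Hypothetical resources: an existing HTML page and a video clip.
annotate("http://example.org/news/story1.html", "type", "Article")
annotate("http://example.org/news/story1.html", "subject", "pollution")
annotate("http://example.org/media/clip7.mpg", "type", "Video")
annotate("http://example.org/media/clip7.mpg", "subject", "pollution")

# Software can now find resources by meaning, whatever their media type.
about_pollution = sorted(
    url for url, props in annotations.items()
    if "pollution" in props.get("subject", [])
)
print(about_pollution)
```

Both the page and the video turn up in the result even though neither file was touched, which is why annotation works for content that already exists.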
So, How does OWL Work?
• An Ontology
  – is a formal description of a field of interest
  – defines Classes – the kinds of things of interest
    • Article, Person, etc.
  – defines Properties – the relationships and characteristics related to Classes
    • Article is WrittenBy Person, Person has Name
• Then, based on the Ontology, people create content
  – An author writes articles using software that understands the Newspaper Ontology
  – The Publisher gathers all the articles, classifieds, etc. and links them into the online version of the NY Times
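The split between an ontology and the content created against it can be sketched in a few lines of Python. This is not OWL syntax and only mirrors the idea; a real OWL reasoner would use such definitions as a basis for inference rather than the simple validation shown here. The class and property names come from the slide, while the `check` function and the sample data are made up for illustration.

```python
# The "ontology": classes, and properties with the classes they connect.
CLASSES = {"Article", "Person"}
PROPERTIES = {
    # property name: (class it applies to, kind of value it expects)
    "WrittenBy": ("Article", "Person"),
    "Name": ("Person", "text"),
}

# The "content": individuals, each declared to be of some class.
individuals = {
    "article1": "Article",
    "author1": "Person",
}
statements = [
    ("article1", "WrittenBy", "author1"),
    ("author1", "Name", "A. Writer"),
]


def check(statement):
    """Return True if a statement uses a property the way the ontology defines it."""
    subject, prop, value = statement
    if prop not in PROPERTIES:
        return False
    subject_class, value_kind = PROPERTIES[prop]
    subject_type = individuals.get(subject)
    if subject_type not in CLASSES or subject_type != subject_class:
        return False
    if value_kind == "text":
        return isinstance(value, str)
    return individuals.get(value) == value_kind


# Statements that follow the ontology.
print([check(s) for s in statements])
# The same property used the wrong way round.
print(check(("author1", "WrittenBy", "article1")))
```

Because the rules live in the ontology rather than in the authoring tool, any software that reads the same ontology can interpret the same content.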
But how does that help?
• If everyone, or at least a reasonably large community, agreed on an ontology for Newspapers
  – then sharing articles between sites is possible
  – presentation can be layered on top of the semantic content of the articles
  – Web robots, only smarter than Google, can find and relate content about specific subjects, by specific authors, etc.
• The key is getting agreement on the ontologies
  – This is ongoing in various standards bodies, consortia, etc. but remains a major issue for the Semantic Web
In Conclusion
• The Semantic Web goal is to make semantic content of Web pages available for software applications
• Work has been ongoing for several years
  – Building on decades of research
• The OWL language is a key development
  – As are the languages upon which it is based, such as RDF Schema
• But that’s for another day…