1
Developing an Ontolog Ontology
Denise A. D. Bedford
April 13, 2006
2
Presentation Goals
The primary purpose of today's presentation is to:
Establish a framework for developing an ontology that focuses on the current and future content of the Ontolog community and supports a range of uses of Ontolog and Ontolog-referenced content by both Ontolog members and non-members
Provide a sustainable foundation for future variations in content, use and users, one that is extensible without radical re-engineering going forward
Provide a framework against which a basic set of functional architecture requirements can be defined (June discussion)
Provide a framework against which various semantic technologies might be positioned to support Ontolog (April and June discussions)
3
Presentation Goals
The secondary purpose of today's presentation is to:
Provide a basis for a case study in collaborative-practice domain ontology development and management
Provide, along the way, a comparison of the various ontology reference models
If the group wishes, provide the community, along the way, with guidance in positioning semantic solutions vis-à-vis semantic problems
4
The goal is not to…
Advocate one particular semantic approach over others, because they all serve different purposes
Survey or evaluate the individual technologies on the market today
Suggest that any one person has a solution that works for everyone
Rather, the goal is to discuss a strategy or approach for addressing the problem
5
Some Basic Questions
How can we anchor our ontology? That is, where do we start?
How do we know whether we need one ontology or many?
How do we know whether we need to create one, or whether we can borrow/adapt one from someone else?
Let’s take as a starting point a framework with three essential components that any ontology we define needs to address:
Content
Users
Use/processes
These basic reference points should give us enough scenarios to understand the basic functional requirements our ontology will have to satisfy
6
The Context for an Ontolog Ontology
[Diagram: Users, Use or Function, Information (Document), Context]
7
Users
This may seem like the easiest dimension to address, but we need to make sure we have the same goals for the Ontolog ontology
Do we assume that only active Ontolog members will be served by the ontology?
Or do we support all members and the general public who might be interested in joining the community, or who might find the wiki content a valuable resource for learning?
Are we assuming only Ontolog sophisticates, or do we include general managers, novices and the interested general public?
8
User Community
Who | Domain Knowledge | Roles
Ontolog Member | Wiki | Wiki Manager
Ontolog Member/Non-Member | Ontology research & development | Researchers, discussants, presenters, novices
Ontolog Member/Non-Member | Computational linguistics | Researchers, discussants, presenters, novices
Ontolog Member/Non-Member | Standards development work | Participants, vendors, observers, implementors
Ontolog Member/Non-Member | Metadata | Creators, users, semantics developers, computational linguists
Ontolog Member/Non-Member | Taxonomies | Creators, designers, users, semantics developers, computational linguists
Ontolog Member/Non-Member | Information Architecture | Engineers, information scientists
Ontolog Member/Non-Member | Semantic Technologies | Developers, users, implementors, linguists, novices
9
Use and Context
It is challenging for people who are so familiar with ontology development and semantic technologies to step back and think about how an ontology would actually support our use of the Ontolog content
But this is a critical first step: without understanding the use and context, we cannot establish a baseline ontology
Without understanding use and context, we will argue forever about which model works best, which tools work best and who should do what; in fact, there is room for variation and negotiation here
The following tables are the result of brainstorming and observations from the Ontolog community itself
10
Possible Uses of Ontolog Content
Doing | What
Find | Person who knows something about an issue
Browse | Issues that Ontolog has discussed
Find | All people who participated in a discussion
Learn about | Reference models discussed by Ontolog
Get list of | Problems Ontolog identified that need attention
Browse | Collections by topic
Search | Future conference call topics
11
Possible Uses of Ontolog Content
Doing | What
Search | Next scheduled call
Search | Specific email message
Find | List of all members of Ontolog
Find | Specific Ontolog member
Find | Reference to ontology standards
Find | Book references
Find | Organizations working in this area
12
Possible Uses of Ontolog Content
Doing | What
Find | Upcoming conferences & participants
Generate | Knowledge map of who knows what in ontologies
Generate | Map of the social network in Ontolog
Publish | Review of a new book
Start | Discussion of a new topic
Annotate/summarize | Discussion thread
Others? | Others?
13
Content
Let’s do a simple exercise of defining the kinds of content that the ontology has to cover. This may seem like the easiest component to define, although much of the content is not as obvious as we might think
When we began our work with semantic technologies at the World Bank five years ago, we started from a content model perspective: all of our content types have data models (see next slide); note the difference between ‘concepts’ and ‘instances’
The content models, combined with the nature of their use and the type of user, helped us identify the kinds of semantic problems we would encounter
You can then evaluate semantic technologies vis-à-vis your semantic problems; without this analysis, you may end up creating a situation you cannot manage or sustain
Let’s extract the set of content objects from the previous tables, and then see what else people expect from the Ontolog community wiki
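To make the extraction exercise concrete, the sketch below pairs a few ‘Doing / What’ uses with content types from the ‘First Cut at Ontolog Content’ list, collects the distinct content objects, and flags uses that have no content type yet. The mapping is a hypothetical illustration, not something the community has agreed, and only a handful of uses are included.

```python
# Illustrative only: a hypothetical mapping from a few "Doing / What" uses
# to content types taken from the first-cut content list.
USES = [
    ("Find", "Person who knows something about an issue"),
    ("Browse", "Issues that Ontolog has discussed"),
    ("Search", "Next scheduled call"),
    ("Search", "Specific email message"),
    ("Find", "Book references"),
    ("Publish", "Review of a new book"),
    ("Generate", "Map of the social network in Ontolog"),
]

USE_TO_CONTENT = {
    "Person who knows something about an issue": {
        "Ontolog people profiles/pages", "Ontolog discussion threads"},
    "Issues that Ontolog has discussed": {"Ontolog discussion threads"},
    "Next scheduled call": {"Ontolog activity calendar"},
    "Specific email message": {"Email messages"},
    "Book references": {"Books on ontology topics"},
    "Review of a new book": {"Reviews of books on ontologies"},
}


def content_objects(uses, mapping):
    """Return (distinct content types needed, uses with no content type yet)."""
    needed, unmapped = set(), []
    for doing, what in uses:
        types = mapping.get(what)
        if types:
            needed |= types
        else:
            unmapped.append((doing, what))
    return needed, unmapped


needed, gaps = content_objects(USES, USE_TO_CONTENT)
print(sorted(needed))
print(gaps)  # [('Generate', 'Map of the social network in Ontolog')]
```

The unmapped uses are the interesting output: they point to content the wiki does not yet hold or that no one has named.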
15
First Cut at Ontolog Content
Ontolog people profiles/pages
Ontolog presentations
Ontolog discussion threads
Ontolog concepts
Ontolog activity calendar
Ontolog conference call notes
Ontolog conference call agendas
Ontolog conference call minutes
Ontolog conference call transcripts
Email messages
Discussion threads/forums
Wiki search logs
Professional conference schedules & announcements
Professional conference representation
Books on ontology topics
Published articles on ontology topics
Reviews of books on ontologies
Ontology standards
Professional organizations
Research institutions
16
Ontology Architecture Begins to Emerge
[Diagram: relationships among Content Entity, Definition, Content Elements, Content Model, Aggregation Levels, Metadata Profile, Business Rule, User, Use, and Contextual Matrix & Sensing, together with the reference sources Ontolog Topic Class Scheme, Authority Control – Member Names, Thesaurus of Ontolog Concepts, Areas of Expertise and Authority Control – Organizations. Relationship labels include ‘has’, ‘has values’, ‘uses’, ‘contains’, ‘has relationship to’, ‘has meaning in’ and ‘understood in’.]
17
Functional Requirements Begin to Emerge
We begin to see how all of the components of the semantic architecture fit together:
Metadata schema
Different kinds of taxonomies (controlled lists, rings, hierarchies, concept networks)
Semantic analysis tools to support metadata capture
Metadata encoding options (XML, RDF, etc.)
Metadata storage options (e.g., embedded in the document, a distinct database, etc.)
A search system that supports attribute searching and leverages reference sources
Browse structure
Reporting
Data mining and clustering
Other, more sophisticated inference and reasoning options (shall we try to discover or test some standard axioms for ontologies?)
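As an illustration of the metadata encoding options, the sketch below serializes one record both as XML and as RDF N-Triples using only the Python standard library. The Dublin Core terms namespace is real; the record identifier under example.org and the choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

# Hypothetical metadata record for one Ontolog content object
record = {
    "id": "http://example.org/ontolog/presentation/2006-04-13",
    "title": "Developing an Ontolog Ontology",
    "creator": "Denise A. D. Bedford",
    "date": "2006-04-13",
}
FIELDS = ("title", "creator", "date")


def to_xml(rec):
    """Encode the record as an XML element whose children use Dublin Core names."""
    ET.register_namespace("dcterms", DCTERMS)
    root = ET.Element("record", {"about": rec["id"]})
    for field in FIELDS:
        ET.SubElement(root, f"{{{DCTERMS}}}{field}").text = rec[field]
    return ET.tostring(root, encoding="unicode")


def to_ntriples(rec):
    """Encode the same record as RDF N-Triples, one statement per line."""
    lines = []
    for field in FIELDS:
        # escape backslashes and double quotes inside the literal
        literal = rec[field].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{rec["id"]}> <{DCTERMS}{field}> "{literal}" .')
    return "\n".join(lines)


print(to_xml(record))
print(to_ntriples(record))
```

The same attribute values appear in both encodings; the choice between them is largely a question of which downstream tools (search, reasoning, storage) will consume the metadata.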
18
Metadata Schema and Taxonomies
The schema needs to cover all the kinds of content we’ve identified
We need to identify at least the basic attributes of the content; keep it simple and purposeful, and targeted to use and users
Discuss which attributes need to be managed and which do not
Keep the horse in front of the cart: what needs to be managed should be analyzed in terms of its data structures, syntax and semantics before we can specify the type of ontology that is needed
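One way to keep the managed/unmanaged distinction explicit is to record it in the schema itself. The sketch below is a minimal, hypothetical schema fragment: managed attributes must take a value from a controlled list, while unmanaged ones are accepted as free text. The attribute names and controlled values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    managed: bool                       # True: value must come from a reference source
    allowed: frozenset = frozenset()    # controlled values, used only when managed


# Hypothetical schema fragment; names and controlled values are illustrative only
SCHEMA = [
    Attribute("title", managed=False),
    Attribute("topic", managed=True,
              allowed=frozenset({"Metadata", "Taxonomies", "Semantic Technologies"})),
    Attribute("content_type", managed=True,
              allowed=frozenset({"Presentation", "Email message", "Call transcript"})),
]


def validate(record, schema=SCHEMA):
    """List problems: missing attributes, or managed values outside the controlled list."""
    problems = []
    for attr in schema:
        value = record.get(attr.name)
        if value is None:
            problems.append(f"missing {attr.name}")
        elif attr.managed and value not in attr.allowed:
            problems.append(f"{attr.name}={value!r} not in controlled list")
    return problems


print(validate({"title": "Developing an Ontolog Ontology",
                "topic": "Metadata", "content_type": "Presentation"}))  # []
print(validate({"title": "Untitled", "topic": "metadata stuff"}))
```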
19
Faceted taxonomy at the center, with other types as controlling sources (distinct ontologies):
Concept networks
Ontolog topics
Names
Any one value might have many synonyms (ring)
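A synonym ring treats a set of terms as equivalent for retrieval without making any one of them the preferred form. A minimal sketch, assuming hypothetical rings, of how a search system might expand a query term to its whole ring:

```python
# Hypothetical rings: each set is a group of terms treated as equivalent
RINGS = [
    {"ontology", "ontologies", "formal ontology"},
    {"taxonomy", "taxonomies", "classification scheme"},
    {"metadata", "meta-data", "meta data"},
]


def build_index(rings):
    """Map every (lower-cased) term to the full ring it belongs to."""
    index = {}
    for ring in rings:
        members = frozenset(t.lower() for t in ring)
        for term in members:
            index[term] = members
    return index


INDEX = build_index(RINGS)


def expand(query_term):
    """Expand a search term to all its synonyms; unknown terms map to themselves."""
    term = query_term.lower()
    return INDEX.get(term, frozenset({term}))


print(sorted(expand("Meta-Data")))  # ['meta data', 'meta-data', 'metadata']
```

Unlike an authority file, a ring has no authorized form, which is why it suits search expansion better than tagging.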
20
Ontolog Metadata Coverage & Strategies
Attribute | Semantic Challenge | Solution
People names, institution names, organization names | Variations | Harmonization through concept extraction
Ontolog topics | Distill the topics of interest, maintain | Automated categorization
Concepts | Breadth of coverage, variations | Concept extraction, harmonization through clustering
People skills & competencies | Distill a list, maintain | Concept extraction, harmonization through categorization
Domain knowledge | Distill the list of domains, map to topics | Categorization, harmonization
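Several of the solutions above rely on harmonizing variant forms. The sketch below swaps in a very simple key-collision (‘fingerprint’) grouping in place of the concept extraction and clustering tools the table refers to: names that normalize to the same key are grouped as probable variants for a person to review. The stopword list and sample names are assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "of", "and"}  # assumption: tokens ignored when comparing names


def fingerprint(name):
    """Normalize a name to a key: lower-case, punctuation to spaces,
    drop stopwords, then de-duplicate and sort the tokens."""
    tokens = re.sub(r"[^\w]+", " ", name.lower()).split()
    return " ".join(sorted(set(tokens) - STOPWORDS))


def cluster(names):
    """Group name variants that share the same fingerprint key."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [g for g in groups.values() if len(g) > 1]


variants = ["Denise A. D. Bedford", "Bedford, Denise A.D.",
            "The World Bank", "World Bank", "Ontolog"]
for group in cluster(variants):
    print(group)
```

A key collision only proposes candidates; deciding which form is authorized is still a governance step.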
21
Semantic Technologies
The most primitive level of semantic discovery and harmonization is the human brain and language
But a human approach to metadata, based on our experience, is neither scalable nor practical: it can help you discover what your reference sources are, but it won’t sustain tagging
Cleanup, disconnects and the technical resources needed to compensate for unmanaged ‘human semantics’ can be costly and resource-intensive to support
Rather, leverage the human semantics to inform the semantics, not the other way around
The question, then, is how to leverage the semantic tools to support the ontology:
Where do the tools fit? What functions do they support?
What resources are needed to sustain them?
22
Categorizing Content – Real World Example
The World Bank adopted an automated solution for ‘tagging’ content (all kinds of content), which is now operational in its systems
Let’s take selected attributes as examples and illustrate how we’re categorizing our content to this structure automatically:
Topic classification, geographical region assignment, keywording examples
This approach can be applied to any kind of content, as long as you have some electronic content to work with (electronic information about or from people can be used to generate people profiles)
This enables us to build a robust metadata repository model, with strong metadata quality, to move towards semantic interoperability (SI) at the functional level
Also note that we can do this across many languages
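Commercial categorizers use linguistic analysis well beyond this, but the basic idea of assigning content to a structure automatically can be sketched with weighted term profiles. The topic profiles, weights and threshold below are hypothetical and are not taken from the World Bank system.

```python
import re

# Hypothetical category profiles: each topic has indicative terms with weights.
PROFILES = {
    "Metadata": {"metadata": 2.0, "schema": 1.0, "attribute": 1.0},
    "Taxonomies": {"taxonomy": 2.0, "hierarchy": 1.0, "thesaurus": 1.0},
    "Semantic Technologies": {"ontology": 2.0, "rdf": 1.0, "inference": 1.0},
}


def categorize(text, profiles=PROFILES, threshold=2.0):
    """Score each category by summed term weights; keep those >= threshold,
    highest score first."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {}
    for category, terms in profiles.items():
        score = sum(terms.get(w, 0.0) for w in words)
        if score >= threshold:
            scores[category] = score
    return sorted(scores, key=scores.get, reverse=True)


doc = "The metadata schema defines each attribute; a taxonomy supplies values."
print(categorize(doc))  # ['Metadata', 'Taxonomies']
```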
23
Sidebar – What is Teragram?
Semantic analysis tools that support concept extraction, categorization, summarization and pattern-matching rules engines
Teragram works in 23 languages
Use categorization to capture Topics, Business Activities, Regions, Sectors, Themes, etc.
Use concept extraction to capture keywords
Use the rules engine to capture Loan #, Credit #, Project ID, Trust Fund #, etc.
Use summarization to generate a ‘gist’ of the content
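Rules-engine capture of identifiers can be approximated with pattern rules. In the sketch below, the identifier formats (for example, a project ID as ‘P’ followed by six digits) are assumptions for illustration; real formats would have to come from the owners of those identifiers, and this is not Teragram’s rule syntax.

```python
import re

# Hypothetical identifier formats; real formats must come from the data owners.
RULES = {
    "project_id": re.compile(r"\bP\d{6}\b"),
    "loan_no": re.compile(r"\bLoan\s+(?:No\.\s*)?(\d{4}-[A-Z]{2,3})\b"),
    "credit_no": re.compile(r"\bCredit\s+(?:No\.\s*)?(\d{4}-[A-Z]{2,3})\b"),
    "trust_fund_no": re.compile(r"\bTF\d{5,6}\b"),
}


def capture(text):
    """Apply each pattern rule and collect matches per attribute."""
    found = {}
    for attribute, pattern in RULES.items():
        # use the captured group when the rule defines one, else the whole match
        matches = [m.group(1) if pattern.groups else m.group(0)
                   for m in pattern.finditer(text)]
        if matches:
            found[attribute] = matches
    return found


sample = "Project P012345 is financed by Loan No. 1234-AB and Trust Fund TF054321."
print(capture(sample))
```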
24
Use of Semantic Technologies – Example
Sample structure: Topics Classification Scheme (hierarchical taxonomy)
Oracle data classes are used to represent the Topic Classification scheme:
the hierarchical taxonomy serves as the reference source for the attribute Topic
it is used for browse, search, content syndication and personalization
The 1st challenge is to architect the hierarchy correctly:
3 distinct data classes, not a tree structure with inheritance
This allows you to use the three data classes for distinct functions across systems but still enforce relationships across the classes
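A sketch of the ‘three distinct data classes’ idea, using SQLite in place of Oracle and assuming (the slide does not say) that the three classes are the three levels of the Topics hierarchy. Each level is a separate table that other systems can use on its own, while foreign keys still enforce the relationships across levels. Table names and sample topics are hypothetical.

```python
import sqlite3

# Assumption: the three data classes are the three levels of the Topics
# hierarchy. Each level is its own table; a foreign key (not inheritance)
# ties a term to its parent in the level above.
DDL = """
CREATE TABLE topic_l1 (id TEXT PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE topic_l2 (id TEXT PRIMARY KEY, label TEXT NOT NULL,
                       parent TEXT NOT NULL REFERENCES topic_l1(id));
CREATE TABLE topic_l3 (id TEXT PRIMARY KEY, label TEXT NOT NULL,
                       parent TEXT NOT NULL REFERENCES topic_l2(id));
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


conn = make_db()
conn.execute("INSERT INTO topic_l1 VALUES ('T1', 'Information Management')")
conn.execute("INSERT INTO topic_l2 VALUES ('T1.1', 'Metadata', 'T1')")
conn.execute("INSERT INTO topic_l3 VALUES ('T1.1.1', 'Metadata Schemas', 'T1.1')")

try:
    # Rejected: 'T9.9' does not exist at level 2
    conn.execute("INSERT INTO topic_l3 VALUES ('T9.9.1', 'Orphan', 'T9.9')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Each level can still be queried (and reused by other systems) on its own
print(conn.execute("SELECT label FROM topic_l2").fetchall())  # [('Metadata',)]
```

Because each level is its own table, a browse interface could read only the top level while a tagging system validates against the bottom level, and the database still rejects an orphaned term.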
30
Example of use of authority control to capture country names but extract the ‘authorized’ version of the country name
Example of use of a gazetteer + concept extraction + rules engine to support semantic interoperability
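A minimal sketch of authority control combined with gazetteer lookup: variant country names found in text are mapped to a single authorized form. The authority entries below are hypothetical examples, not an actual authority file.

```python
import re

# Hypothetical authority file: authorized form -> known variant forms
AUTHORITY = {
    "Côte d'Ivoire": {"Ivory Coast", "Cote d'Ivoire"},
    "Russian Federation": {"Russia"},
    "Korea, Republic of": {"South Korea", "Republic of Korea", "Korea, Rep."},
}

# Invert once: every variant (and the authorized form itself) -> authorized form
LOOKUP = {}
for authorized, variants in AUTHORITY.items():
    for name in variants | {authorized}:
        LOOKUP[name.casefold()] = authorized

# Gazetteer pattern: longest names first so longer variants win;
# lookarounds instead of \b so names ending in "." still match
PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(n) for n in sorted(LOOKUP, key=len, reverse=True))
    + r")(?!\w)",
    re.IGNORECASE,
)


def authorize(name):
    """Return the authorized form of a single country name, or None if unknown."""
    return LOOKUP.get(name.strip().casefold())


def extract_countries(text):
    """Find country names in free text and return their authorized forms."""
    return sorted({LOOKUP[m.group(1).casefold()] for m in PATTERN.finditer(text)})


print(authorize("ivory coast"))  # Côte d'Ivoire
print(extract_countries("Projects in Ivory Coast and South Korea were reviewed."))
```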
32
Caution Regarding Tools
Not all tools will do what we are describing here
You need an underlying semantic engine that can perform semantic analysis; Bayesian/statistical data mining approaches will not work in this way
You need a semantic engine in multiple languages, because semantics vary by language
You need access to the programs through a user-friendly interface so you can adapt them to your environment without needing programming knowledge
You need several different kinds of technologies to do what I’m describing here
Not all the tools on the market today support this work
34
Semantic Analysis Basics
Once you have made some sense of the sentence, reconstruct entities for information extraction (compose):
Identify names and other fixed-form expressions (people, organizations, conferences)
Identify basic noun groups, verb groups, prepositions and other grammatical elements
Use the exposed grammars to construct rules for targeted entity extraction (noun groups and verb groups)
Identify event structures
Identify and associate common elements
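The cascade above can be illustrated with a deliberately tiny example: a first stage recognizes fixed-form expressions from a gazetteer, and a later rule combines them into a simple event structure (who did what to whom). It skips real noun- and verb-group analysis, and the gazetteer entries, verb list and rule are all hypothetical.

```python
import re

# Hypothetical gazetteer of fixed-form expressions (first stage)
GAZETTEER = {
    "Denise Bedford": "PERSON",
    "World Bank": "ORGANIZATION",
    "Ontolog": "ORGANIZATION",
}

# Hypothetical verbs that signal a simple "who did what to whom" event
EVENT_VERBS = {"presented", "spoke", "joined"}


def find_names(sentence):
    """Stage 1: locate known fixed-form expressions and their types, in text order."""
    found = []
    for name, kind in GAZETTEER.items():
        for m in re.finditer(re.escape(name), sentence):
            found.append((m.start(), name, kind))
    return sorted(found)


def extract_events(sentence):
    """Later stage: PERSON ... event verb ... ORGANIZATION becomes an event triple."""
    names = find_names(sentence)
    events = []
    for i, (start, name, kind) in enumerate(names):
        if kind != "PERSON":
            continue
        for start2, name2, kind2 in names[i + 1:]:
            between = sentence[start + len(name):start2].split()
            verb = next((w for w in between if w in EVENT_VERBS), None)
            if kind2 == "ORGANIZATION" and verb:
                events.append((name, verb, name2))
                break
    return events


print(extract_events("Denise Bedford presented to Ontolog on April 13."))
# [('Denise Bedford', 'presented', 'Ontolog')]
```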
35
Enterprise Profile Development & Maintenance
[Diagram: Enterprise Metadata Profile]
Concept extraction technology: Country, Organization Name, People Name, Series Name/Collection Title, Author/Creator, Title, Publisher, Standard Statistical Variable, Version/Edition
Categorization technology: Topic, Business Function, Region, Sector and Theme categorization
Rule-based capture: Project ID, Trust Fund #, Loan #, Credit #, Series #, Publication Date, Language
Summarization
e-CDS reference sources for Country, Region, Topics, Business Function, Keywords, Project ID, People, Organization
Data governance process for Topics, Business Function, Country, Region, Keywords, People, Organizations, Project ID
Teragram team
TK240 Client, ISP, IRIS, ImageBank, Factiva, JOLIS, E-Journals
Enterprise profile creation and maintenance
UCM service requests
Update & change requests
36
Next Steps – Discussion
The purpose of this presentation is to try to frame the discussion for the Ontolog community going forward
Next week we will have a panel of speakers who will talk about aspects of the challenge of developing and applying an ontology for the Ontolog content
In the time remaining today, might we discuss what other issues need to be added to the framework?