1

Developing an Ontolog Ontology

Denise A. D. Bedford

April 13, 2006

2

Presentation Goals

Primary purpose of the presentation today is to:

Establish a framework for developing an ontology that will focus on the current and future content of the Ontolog community and support a range of uses of the Ontolog and Ontolog-referenced content by Ontolog members and non-members

Provide a sustainable foundation for future variations in content, use and users, one that is extensible without radical re-engineering going forward

Provide a framework against which a basic set of functional architecture requirements can be defined (June discussion)

Provide a framework against which various semantic technologies might be positioned to support Ontolog (April and June discussions)

3

Presentation Goals

Secondary purpose of the presentation today is to:

Provide a basis for a case study in collaborative practice domain ontology development and management

Provide a comparison, along the way, of the various ontology reference models

If the group wishes, provide the community along the way with guidance in positioning semantic solutions vis-à-vis semantic problems

4

Goal is not to…

Advocate one particular semantic approach over others, because they all serve different purposes

Provide a survey of, or evaluate, the individual technologies on the market today

Suggest that any one person has a solution that works for everyone

Rather, the goal is to discuss a strategy or approach for addressing the problem

5

Some Basic Questions

How can we anchor our ontology? I.e., where do we start?

How do we know if we need one ontology or many?

How do we know if we need to create one or if we can borrow/adapt one from someone else?

Let’s take as a starting point a framework with three essential components that need to be addressed by any ontology we define:
  Content
  Users
  Use/processes

These basic reference points should give us sufficient scenarios to understand the basic functional requirements our ontology will have to satisfy

6

The Context for an Ontolog Ontology

[Diagram: Users, Use or Function, and Information (Document), set within Context]

7

Users

May seem like the easiest dimension to address, but we need to make sure we have the same goals for the Ontolog ontology

Do we assume that only active Ontolog members will be served by the ontology?

Or do we support all members and the general public who might be interested in joining the community, or who might find the wiki content a valuable resource for learning?

Are we assuming only Ontolog sophisticates, or do we include general managers, novices and general public interest?

8

User Community

| Who | Domain Knowledge | Roles |
|---|---|---|
| Ontolog Member | Wiki | Wiki Manager |
| Ontolog Member/Non-Member | Ontology research & development | Researchers, discussants, presenters, novices |
| Ontolog Member/Non-Member | Computational linguistics | Researchers, discussants, presenters, novices |
| Ontolog Member/Non-Member | Standards development work | Participants, vendors, observers, implementors |
| Ontolog Member/Non-Member | Metadata | Creators, users, semantics developers, computational linguists |
| Ontolog Member/Non-Member | Taxonomies | Creators, designers, users, semantics developers, computational linguists |
| Ontolog Member/Non-Member | Information Architecture | Engineers, information scientists |
| Ontolog Member/Non-Member | Semantic Technologies | Developers, users, implementors, linguists, novices |

9

Use and Context

It is challenging for people who are so familiar with ontology development and semantic technologies to step back and think about how an ontology would actually support our use of the Ontolog content

But this is a critical first step: without understanding the use and context, we cannot establish a baseline ontology

Without understanding use and context we will forever argue about which model works best, which tools work best and who should do what; actually, there is room for variation and negotiation here

The following tables are the result of some brainstorming and of observations from the Ontolog community itself

10

Possible Uses of Ontolog Content

| Doing | What |
|---|---|
| Find | Person who knows something about an issue |
| Browse | Issues that Ontolog has discussed |
| Find | All people who participated in a discussion |
| Learn about | Reference models discussed by Ontolog |
| Get list of | Problems Ontolog identified that need attention |
| Browse | Collections by topic |
| Search | Future conference call topics |

11

Possible Uses of Ontolog Content

| Doing | What |
|---|---|
| Search | Next scheduled call |
| Search | Specific email message |
| Find | List of all members of Ontolog |
| Find | Specific Ontolog member |
| Find | Reference to ontology standards |
| Find | Book references |
| Find | Organizations working in this area |

12

Possible Uses of Ontolog Content

| Doing | What |
|---|---|
| Find | Upcoming conferences & participants |
| Generate | Knowledge map of who knows what in ontologies |
| Generate | Map of the social networking in Ontolog |
| Publish | Review of a new book |
| Start | Discussion of a new topic |
| Annotate/summarize | Discussion thread |
| Others?? | Others?? |
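As a rough illustration of how two of the uses listed above ("find all people who participated in a discussion" and "browse collections by topic") translate into operations over described content, here is a minimal Python sketch. The `ContentItem` class, its attribute names and the sample data are assumptions invented for this example; they are not part of the Ontolog wiki or of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One piece of content with a few descriptive attributes (all hypothetical)."""
    title: str
    content_type: str                      # e.g. "discussion thread", "presentation"
    topics: set = field(default_factory=set)
    participants: set = field(default_factory=set)


def people_in_discussion(items, title):
    """'Find all people who participated in a discussion.'"""
    people = set()
    for item in items:
        if item.content_type == "discussion thread" and item.title == title:
            people |= item.participants
    return sorted(people)


def browse_by_topic(items):
    """'Browse collections by topic': group item titles under each topic."""
    index = {}
    for item in items:
        for topic in item.topics:
            index.setdefault(topic, []).append(item.title)
    return index


# Invented sample data, for illustration only.
items = [
    ContentItem("Upper ontologies", "discussion thread", {"reference models"}, {"Ann", "Raj"}),
    ContentItem("Taxonomy basics", "presentation", {"taxonomies"}, {"Lee"}),
]
```

Both functions only work if every item carries consistent attribute values, which is the argument the later slides make for a managed metadata schema.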

13

Content

Let’s do a simple exercise of defining the kinds of content that the ontology has to cover. This may seem like the easiest component to define, although a lot of the content is not as obvious as we might think

When we began our work with semantic technologies at the World Bank five years ago, we started from a content model perspective: all of our content types have data models (see next slide), with a difference between ‘concepts’ and ‘instances’

The content models, combined with the nature of their use and the type of user, helped us to identify the kinds of semantic problems we would encounter

You can then evaluate semantic technologies vis-à-vis your semantic problems. Without this analysis, you may end up creating a situation you cannot manage or sustain

Let’s extract the set of content objects from the previous tables, then let’s see what else people expect from the Ontolog community wiki

14

Content Data Model Example – Event, Communique

15

First Cut at Ontolog Content

Ontolog people profiles/pages
Ontolog presentations
Ontolog discussion threads
Ontolog concepts
Ontolog activity calendar
Ontolog conference call notes
Ontolog conference call agendas
Ontolog conference call minutes
Ontolog conference call transcripts
Email messages
Discussion threads/forums
Wiki search logs
Professional conference schedules & announcements
Professional conference representation
Books on ontology topics
Published articles on ontology topics
Reviews of books on ontologies
Ontology standards
Professional organizations
Research institutions

16

Ontology Architecture Begins to Emerge

[Diagram. Node labels: Content Entity, Definition, Content Model, Content Elements, Aggregation Levels, Content, Metadata Profile, Ontolog Topic Class Scheme, Authority Control – Member Names, Authority Control – Organizations, Thesaurus of Ontolog Concepts, Areas of Expertise, User, Profile, Use, Business Rule, Contextual Matrix & Sensing. Relationship labels: Has, Has values, Uses, Contains, Has relationship to, Has meaning in, Understood in.]
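One way to read the emerging architecture is that a content entity has a metadata profile, and that some attributes in the profile take their values from a controlling reference source (a topic scheme, an authority list, a thesaurus). The Python sketch below is only an interpretation under that assumption. The class names, the `validate` method and the sample values are invented for illustration and are not taken from the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReferenceSource:
    """A controlling source such as a topic scheme or a name authority list."""
    name: str
    values: frozenset


@dataclass
class Attribute:
    """One metadata attribute; it 'uses' a reference source when it is managed."""
    name: str
    source: Optional[ReferenceSource] = None

    def accepts(self, value):
        # Unmanaged attributes accept anything; managed ones only listed values.
        return self.source is None or value in self.source.values


@dataclass
class ContentEntity:
    """A content type whose metadata profile 'has' attributes."""
    name: str
    profile: list = field(default_factory=list)

    def validate(self, record):
        """Return the names of attributes whose values are not authorized."""
        return [a.name for a in self.profile
                if a.name in record and not a.accepts(record[a.name])]


# Hypothetical values; the real Ontolog topic scheme is not given in the slides.
topics = ReferenceSource("Ontolog Topic Class Scheme", frozenset({"taxonomies", "metadata"}))
presentation = ContentEntity("Ontolog presentation",
                             [Attribute("title"), Attribute("topic", topics)])
```

For example, `presentation.validate({"title": "Intro", "topic": "cooking"})` flags `topic`, because "cooking" is not a value in the controlling scheme.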

17

Functional Requirements Begin to Emerge

We begin to see how all of the components of the semantic architecture fit together…

Metadata schema
Different kinds of taxonomies (controlled lists, rings, hierarchies, concept networks)
Semantic analysis tools to support metadata capture
Metadata encoding options (XML, RDF, etc.)
Metadata storage options (e.g. embedded in document, distinct database, etc.)
Search system which supports attribute searching and which leverages reference sources
Browse structure
Reporting
Data mining and clustering
Other more sophisticated inference and reasoning options (shall we try to discover or test some standard axioms for ontologies?)
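To make the "metadata encoding options" item concrete, the sketch below serializes one metadata record as plain XML using Python's standard `xml.etree.ElementTree` module. The element names (`record`, `title`, `topic`) are assumptions chosen for this example; the presentation does not prescribe a schema, and an RDF encoding would look different.

```python
import xml.etree.ElementTree as ET


def encode_record(record):
    """Encode a dict of attribute -> value (or list of values) as an XML string."""
    root = ET.Element("record")
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            # Repeatable attributes become repeated child elements.
            child = ET.SubElement(root, name)
            child.text = v
    return ET.tostring(root, encoding="unicode")


xml_text = encode_record({
    "title": "Developing an Ontolog Ontology",
    "topic": ["ontologies", "metadata"],
})
```

The same record could instead be embedded in the document itself or kept in a separate database; the encoding choice is independent of the storage choice listed above.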

18

Metadata Schema and Taxonomies

Schema needs to cover all kinds of content we’ve identified

We need to identify at least the basic attributes of the content. Keep it simple, purposeful and targeted to use and users

Discuss which attributes need to be managed and which do not

Keep the horse in front of the cart: what needs to be managed should be analyzed in terms of its data structures, syntax and semantics before we can specify the type of ontology that is needed

19

Faceted taxonomy at center – other types as controlling sources – distinct ontologies

Concept networks

Ontolog Topics Names

Any one value might have many synonyms (ring)
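A synonym ring, as mentioned above, treats a set of variant terms as equivalent and resolves each of them to one controlled value. The following Python sketch shows the idea under simple assumptions (case-insensitive exact matching only). The `SynonymRing` class and the sample terms are hypothetical.

```python
class SynonymRing:
    """Maps every variant in a ring to a single preferred value."""

    def __init__(self):
        self._preferred = {}   # normalized variant -> preferred value

    def add_ring(self, preferred, synonyms):
        # The preferred value is itself a member of its ring.
        for term in [preferred, *synonyms]:
            self._preferred[term.casefold()] = preferred

    def resolve(self, term):
        """Return the preferred value for a term, or None if it is unknown."""
        return self._preferred.get(term.strip().casefold())


rings = SynonymRing()
rings.add_ring("Taxonomies", ["taxonomy", "classification schemes"])
```

With this in place, a search for "taxonomy" or "classification schemes" can be routed to content tagged with the single value "Taxonomies".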

20

Ontolog Metadata Coverage & Strategies

| Attribute | Semantic Challenge | Solution |
|---|---|---|
| People names, institution names, organization names | Variations | Harmonization through concept extraction |
| Ontolog topics | Distill the topics of interest, maintain | Automated categorization |
| Concepts | Breadth of coverage, variations | Concept extraction, harmonization through clustering |
| People skills & competencies | Distill a list, maintain | Concept extraction, harmonization through categorization |
| Domain knowledge | Distill the list of domains, map to topics | Categorization, harmonization |

21

Semantic Technologies

Most primitive level of semantic discovery and harmonization is human brain and language

But a human approach to metadata, based on our experience, is neither scalable nor practical. It can help you to discover what your reference sources are, but it cannot be sustained for ‘tagging’

Cleanup, disconnects and the amount of technical resources needed to compensate for unmanaged ‘human semantics’ can be costly and resource-intensive to support

Rather, leverage the human semantics to inform the semantics, not the other way around

Question then is how to leverage the semantic tools to support the ontology

Where do the tools fit? What functions do they support?

What resources are needed to sustain them?

22

Categorizing Content – Real World Example

World Bank adopted an automated solution for ‘tagging’ content, all kinds of content, which is now operational in systems

Let’s take selected attributes as examples and illustrate how we’re categorizing our content to this structure automatically

Topic classification, geographical region assignment, keywording examples

This approach can be applied to any kind of content, as long as you have some electronic content to work with (electronic information about or from people can be used to generate people profiles)

Enables us to build a robust metadata repository model, with strong metadata quality, to move towards SI at the functional level

Also note that we can do this across many languages

23

Sidebar – What is Teragram?

Semantic analysis tools which support concept extraction, categorization, summarization and pattern matching rules engines

Teragram works in 23 languages

Use categorization to capture Topics, Business Activities, Regions, Sectors, Themes, etc.

Use Concept Extraction to capture keywords

Use Rules Engine to capture Loan #, Credit #, Project ID, Trust Fund #, etc.

Use Summarization to generate a ‘gist’ of the content
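The rules-engine idea above (capturing identifiers such as Loan # or Project ID by pattern) can be approximated with ordinary regular expressions. The sketch below is not Teragram's rule language, and the identifier formats are assumptions made up for the example: a project ID is taken to be "P" followed by six digits, and a loan number to be four or five digits after "Loan No." or "Loan #".

```python
import re

# Hypothetical identifier formats; real formats would come from the business rules.
RULES = {
    "project_id": re.compile(r"\bP\d{6}\b"),
    "loan_number": re.compile(r"\bLoan\s+(?:No\.|#)\s*(\d{4,5})\b"),
}


def capture(text):
    """Apply every rule to the text and collect the captured values per attribute."""
    found = {}
    for name, pattern in RULES.items():
        # Use the capture group when the rule defines one, else the whole match.
        matches = [m.group(m.lastindex or 0) for m in pattern.finditer(text)]
        if matches:
            found[name] = matches
    return found


sample = "Project P075941 is financed under Loan No. 4521 and Loan #7788."
captured = capture(sample)
```

Pattern rules of this kind suit fixed-form values; topics and keywords need the categorization and concept extraction functions listed above.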

24

Use of Semantic Technologies – Example

Sample structure: Topics Classification Scheme (hierarchical taxonomy)

Oracle data classes used to represent Topic Classification scheme
  hierarchical taxonomy as reference source for the attribute Topic
  used for Browse, Search, Content Syndication, Personalization

1st challenge is to architect the hierarchy correctly
  3 distinct data classes, not a tree structure with inheritance
  Allows you to use the three data classes for distinct functions across systems but still enforce relationships across the classes
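The design choice described above (three distinct data classes with enforced relationships, rather than one inheritance tree) can be sketched in Python as three flat collections linked by parent keys. The level names used here (topic, subtopic, concept) and the sample entries are assumptions for illustration; the slides do not show the actual Oracle definitions.

```python
class TopicScheme:
    """Three separate collections; relationships are enforced on insert."""

    def __init__(self):
        self.topics = {}      # topic_id -> label
        self.subtopics = {}   # subtopic_id -> (label, topic_id)
        self.concepts = {}    # concept_id -> (label, subtopic_id)

    def add_topic(self, topic_id, label):
        self.topics[topic_id] = label

    def add_subtopic(self, subtopic_id, label, topic_id):
        if topic_id not in self.topics:
            raise KeyError(f"unknown topic: {topic_id}")
        self.subtopics[subtopic_id] = (label, topic_id)

    def add_concept(self, concept_id, label, subtopic_id):
        if subtopic_id not in self.subtopics:
            raise KeyError(f"unknown subtopic: {subtopic_id}")
        self.concepts[concept_id] = (label, subtopic_id)

    def path(self, concept_id):
        """Walk the cross-class relationships from a concept up to its topic."""
        label, subtopic_id = self.concepts[concept_id]
        sub_label, topic_id = self.subtopics[subtopic_id]
        return [self.topics[topic_id], sub_label, label]


scheme = TopicScheme()
scheme.add_topic("T1", "Semantic Technologies")
scheme.add_subtopic("S1", "Taxonomies", "T1")
scheme.add_concept("C1", "Synonym rings", "S1")
```

Because each level is its own collection, one system can use only the topic list for browsing while another uses the concept list for keywording, and the parent keys still keep the levels consistent.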

25

3 Oracle data classes

26

Relationships across data classes

27

Subtopics

Domain concepts or controlled vocabulary

28

Extensive operators allow us to write grammatical rules to manage typical semantic problems

29

Concept-based rules engine allows us to define patterns to capture other kinds of data

30

Example of use of Authority Control to capture country names but extract ‘authorized’ version of country name

Example of use of a gazetteer + concept extraction + rules engine to support semantic interoperability

31

Use of concept extraction + rules engine to capture Loan #, Credit #, Project ID#
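The authority-control example on slide 30 (recognize any variant of a country name, but record only the authorized version) can be sketched as a small gazetteer lookup. This is a simplified stand-in for the concept extraction and rules engine described in the slides. The authority entries below are illustrative assumptions, not the World Bank's actual list.

```python
import re

# Authorized form -> known variants (hypothetical sample entries).
AUTHORITY = {
    "Kyrgyz Republic": ["Kyrgyzstan", "Kirghizia"],
    "Lao PDR": ["Laos"],
}


def build_matcher(authority):
    """Build one regex over all name forms plus a variant -> authorized lookup."""
    lookup = {}
    for authorized, variants in authority.items():
        for form in [authorized, *variants]:
            lookup[form.casefold()] = authorized
    # Longest forms first so a longer name wins over a shorter overlapping one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_countries(text, authority=AUTHORITY):
    """Return authorized country names in order of first mention, without repeats."""
    pattern, lookup = build_matcher(authority)
    seen = []
    for match in pattern.finditer(text):
        name = lookup[match.group(0).casefold()]
        if name not in seen:
            seen.append(name)
    return seen


countries = extract_countries("Roads projects in Laos and KYRGYZSTAN, then Laos again")
```

Storing only the authorized form is what lets different systems exchange the country attribute without reconciling spellings afterwards.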

32

Caution Regarding Tools

Not all tools will do what we are describing here

You need to have an underlying semantic engine which can perform semantic analysis. Bayesian/statistical data mining approaches will not work in this way

You need to have a semantic engine in multiple languages, because semantics vary by language

You need to have access to the programs through a user-friendly interface so you can adapt them to your environment without having to have programming knowledge

You need to have several different kinds of technologies to do what I’m describing here

Not all the tools on the market today support this work

33

How does semantic analysis work?

34

Semantic Analysis Basics

Once you have made some sense of the sentence, reconstruct entities for information extraction (compose)

Identify names and other fixed form expressions: people, organizations, conferences

Identify basic noun groups, verb groups, presentations, other grammatical elements

Use exposed grammars to construct rules for targeted entity extraction (noun groups and verb groups)

Identify event structures

Identify common elements and associate

35

Enterprise Profile Creation and Maintenance

[Diagram: Enterprise Profile Development & Maintenance. Recoverable labels:]

Enterprise Metadata Profile

Concept Extraction Technology: Country, Organization Name, People Name, Series Name/Collection Title, Author/Creator, Title, Publisher, Standard Statistical Variable, Version/Edition

Categorization Technology: Topic Categorization, Business Function Categorization, Region Categorization, Sector Categorization, Theme Categorization

Rule-Based Capture: Project ID, Trust Fund #, Loan #, Credit #, Series #, Publication Date, Language

Summarization

e-CDS Reference Sources for Country, Region, Topics, Business Function, Keywords, Project ID, People, Organization

Data Governance Process for Topics, Business Function, Country, Region, Keywords, People, Organizations, Project ID

Teragram Team

Source systems: TK240 Client, ISP, IRIS, ImageBank, Factiva, JOLIS, E-Journals

UCM Service Requests

Update & Change Requests

36

Next Steps – Discussion

Purpose of this presentation is to try to frame the discussion for the Ontolog community going forward

Next week we will have a panel of speakers who will talk about aspects of the challenge of developing and applying an ontology for the Ontolog content

In the time that we have remaining today, might we discuss what other issues need to be added to the framework?

37

Thank You!