![Page 1: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/1.jpg)
Snapshot of Semantic Web Commercial State of the Art
(presented at Science on the Semantic Web, Rutgers, October 2002)
Amit Sheth
CTO, Semagix Inc. Large Scale Distributed Information Systems (LSDIS) Lab
University Of Georgia; http://lsdis.cs.uga.edu
October 24, 2002 © Amit Sheth
Based on Keynote
CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002
![Page 2: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/2.jpg)
I am not selling any product here.
It is interesting to note that SW = Software has moved to SW = Semantic Web
![Page 3: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/3.jpg)
Fundamental Issue
• Ontology creation and maintenance
– Human consensus + automatic KB (assertion) extraction
• Automatic semantic annotation
• Extremely fast computations exploiting semantic metadata
– Especially named relationships
![Page 4: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/4.jpg)
Central Role of Metadata
• Where is the content? Whose is it? (Produce, Aggregate)
• What is this content about? (Catalog/Index)
• What other content is it related to? (Integrate, Syndicate)
• What is the right content for this user? (Personalize)
• What is the best way to monetize this interaction? (Interactive Marketing)
Broadcast, Wireline, Wireless, Interactive TV
Semantic Metadata
Applications, Back End
"A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
"Metadata increases content value in each step of content value chain." - Amit Sheth
![Page 5: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/5.jpg)
A Metadata Classification
Data (Heterogeneous Types/Media)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Content Dependent Metadata (size, max colors, rows, columns...)
Direct Content Based Metadata (inverted lists, document vectors, LSI)
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
Ontologies, Classifications, Domain Models
User
More semantics for relevance to tackle information overload!!
![Page 6: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/6.jpg)
Semantic Metadata Extraction, Semantic Annotation
Sources:
• WWW, Enterprise Repositories
• Nexis, UPI, AP Feeds/Documents
• Digital Maps
• Digital Audios
• Data Stores
• Digital Videos
• Digital Images
• . . .
METADATA EXTRACTORS
Key challenge: Create/extract as much (semantic) metadata automatically as possible
![Page 7: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/7.jpg)
![Page 8: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/8.jpg)
Semantic Content Organization and Retrieval Engine (SCORE) technology
• Automatically aggregates and extracts information from disparate sources and multiple formats
• Automatically tags/annotates and categorizes content
• Automatically creates relevant associations
- Maps content topics and their relationships
• Semantic query engine relates information and knowledge both internal and external to the organization into a single view
![Page 9: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/9.jpg)
Semagix Freedom Product Components
![Page 10: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/10.jpg)
Knowledge Sources Used
Market Guide (MG)
ZDNet (ZD)
Hoover’s (H)
Data supplied from NASA (DPL)
Federation of American Scientists (FAS)
Central Intelligence Agency (CIA)
The Interdisciplinary Center (ICT)
Federal Bureau of Investigation (FBI)
Capital Advantage (CA)
Office of Foreign Assets Control (OFAC)
Entity Classes and Relationships populated by these knowledge sources:
PERSON (OFAC, FBI, DPL)
-politician (OFAC, FBI, CIA, CA)
politician associated with politicalOrganization
politician held politicalOffice
politician associated with politicalOffice
-terrorist (OFAC, FBI, DPL)
terrorist memberOf organization
terrorist appears on watchList
-companyExecutive (MG)
companyExecutive holdsOffice companyPosition
person has permanent address address (OFAC, FBI)
person has dob (date of birth) (OFAC, FBI)
person has pob (place of birth) (OFAC, FBI)
THING
-event (ICT)
terroristOrganization participated in terroristSponsoredEvent (ICT)
-politicalOffice (CIA, CA)
politicalOffice office(s) within govtOrganization
politicalOffice associated with organization
-watchList (OFAC, FBI, DPL)
terroristOrganization appears on watchList (OFAC, FBI, DPL)
-organization (OFAC, FBI, FAS, ICT, CA, CIA)
organization appears on watchList
organization memberOf suborganization
-company
company manufactures product (ZD)
company identifiedBy tickerSymbol (H)
companyPosition position in company (MG)
company memberOf industry (H)
-tickerSymbol (H)
tickerSymbol memberOf exchange (H)
PLACE
-organization located in place (H, OFAC)
-religiousAffiliation practiced in place (CIA)
-company headquarters in city (H)
JIVA
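The entity classes and named relationships listed on this slide can be pictured as subject/relationship/object triples. The following is only a minimal sketch of that idea, not the Semagix implementation: the `TripleStore` class is hypothetical, the relationship names (`memberOf`, `appearsOn`, `identifiedBy`) are taken from the slides, and all instance names are invented placeholders.

```python
class TripleStore:
    """Hypothetical in-memory store of (subject, relationship, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, relation, obj):
        self._triples.add((subject, relation, obj))

    def match(self, subject=None, relation=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# Placeholder instances; only the relationship names come from the slide.
store.add("Person A", "isA", "terrorist")
store.add("Person A", "memberOf", "Organization X")
store.add("Organization X", "appearsOn", "Watchlist W")
store.add("Company C", "identifiedBy", "Ticker CCC")
store.add("Company C", "memberOf", "Industry I")

# All memberOf relationships, regardless of entity class.
memberships = store.match(relation="memberOf")
# Everything known about one entity.
about_person = store.match(subject="Person A")
```

Querying by relationship name rather than by table is what makes named relationships first-class in such a model.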
![Page 11: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/11.jpg)
Video with Editorialized Text on the Web
Auto-Categorization
Semantic Metadata
Automatic Categorization & Metadata Tagging (unstructured text)
![Page 12: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/12.jpg)
Extraction Agent
Enhanced Metadata Asset
Semantic Metadata Extraction/Annotation: Semi-structured source
Web Page
![Page 13: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/13.jpg)
Semantic Metadata
Syntax Metadata
Semantic Content Enhancement Workflow
![Page 14: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/14.jpg)
Enabling powerful linking of actionable information and facilitating important semantic applications such as knowledge discovery and link analysis
(the user’s task of manually retrieving all the information needed is greatly reduced, leaving more time for making effective decisions)
Content Tags
Semantic Metadata
Company: Cisco Systems, Inc.
Classification: Channel Partners, E-Business Solutions
Channel Partner: Siemens Network
Channel Partner: Voyager Network
Channel Partner: Wipro Group
E-Business Solution: CIS-1270 Security
E-Business Solution: CIS-320 Learning
E-Business Solution: CIS-6250 Finance
E-Business Solution: CIS-1005 e-Market
Ticker: CSCO
Industry: Telecommunication, . . .
Sector: Computer Hardware
Executive: John Chambers
Competition: Nortel Networks
Syntactic Metadata
Producer: BusinessWire
Source: Bloomberg
Date: Sept. 10 2001
Location: San Jose, CA
URL: http://bloomberg.com/1.htm
Media: Text
XML content item with enriched semantic tagging, ready to be queried
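To make "XML content item with enriched semantic tagging, ready to be queried" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`contentItem`, `semanticMetadata`, `syntacticMetadata`, `tag`, `name`) are assumptions made for illustration; the slides do not show the actual schema. The tag values are a subset of those on the slide.

```python
import xml.etree.ElementTree as ET

# Tag values copied from the slide; the XML layout itself is hypothetical.
SEMANTIC_TAGS = [
    ("Company", "Cisco Systems, Inc."),
    ("Channel Partner", "Siemens Network"),
    ("Channel Partner", "Voyager Network"),
    ("Channel Partner", "Wipro Group"),
    ("Ticker", "CSCO"),
    ("Sector", "Computer Hardware"),
    ("Competition", "Nortel Networks"),
]
SYNTACTIC_TAGS = [
    ("Producer", "BusinessWire"),
    ("Source", "Bloomberg"),
    ("Media", "Text"),
]


def build_content_item(semantic, syntactic):
    """Wrap both kinds of metadata in a single XML element tree."""
    item = ET.Element("contentItem")
    for group_name, pairs in (("semanticMetadata", semantic),
                              ("syntacticMetadata", syntactic)):
        group = ET.SubElement(item, group_name)
        for name, value in pairs:
            tag = ET.SubElement(group, "tag", name=name)
            tag.text = value
    return item


item = build_content_item(SEMANTIC_TAGS, SYNTACTIC_TAGS)
xml_text = ET.tostring(item, encoding="unicode")

# Query the enriched item: which channel partners were tagged?
partners = [t.text for t in
            item.findall("./semanticMetadata/tag[@name='Channel Partner']")]
```

Keeping semantic and syntactic metadata in separate groups mirrors the distinction the slide draws between the two.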
[Figure: E-Business Solution Ontology. Nodes: Cisco Systems, Voyager Network, Siemens Network, Wipro Group, Ulysys Group, CIS-1270 Security, CIS-320 Learning, CIS-6250 Finance, CIS-1005 e-Market. Node groups: Channel Partner, Ticker, Industry, Competition, Executives, Sector. Relationship labels: belongs to, represented by, channel partner of, competes with, provider of, works for.]
Semantic Enhancement: uniquely exploiting real-world semantic associations in the right context
Semantic Metadata Extraction (also syntactic)
Content Tags
Semantic Metadata
Classification: Channel Partners, E-Business Solutions
Company: Cisco Systems, Inc.
Syntactic Metadata
Producer: BusinessWire
Source: Bloomberg
Date: Sept. 10 2001
Location: San Jose, CA
URL: http://bloomberg.com/1.htm
Media: Text
Classification: Channel Partners, E-Business Solutions
Content Tags
Semantic Metadata
Classification: Channel Partners, E-Business Solutions
Classification Committee: Knowledge-base, Machine Learning & Statistical Techniques
Content Asset Index Evolution
![Page 15: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/15.jpg)
• Focused relevant content organized by topic (semantic categorization)
• Automatic content aggregation from multiple content providers and feeds
• Related relevant content not explicitly asked for (semantic associations)
• Competitive research inferred automatically
• Automatic 3rd party content integration
Semantic Application Example – Analyst Workbench
![Page 16: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/16.jpg)
Related Stock News
Semantic Web – Intelligent Content
Industry News
Technology Products
COMPANY
SEC, EPA Regulations
Competition
COMPANIES in Same or Related INDUSTRY
COMPANIES in INDUSTRY with Competing PRODUCTS
Impacting INDUSTRY or Filed By COMPANY
Important to INDUSTRY or COMPANY
Intelligent Content = What You Asked for + What you need to know!
![Page 17: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/17.jpg)
Syntax Metadata
Semantic Metadata
led by
Same entity
Human-assisted inference
Knowledge-based & Manual Associations
![Page 18: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/18.jpg)
Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)
![Page 19: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/19.jpg)
Innovations that affect User Experience
• BSBQ: Blended Semantic Browsing and Querying
– Ability to query and browse relevant desired content in a highly contextual manner
• Seamless access/processing of Content, Metadata and Knowledge
– Ability to retrieve relevant content, view related metadata, access relevant knowledge and switch between all the above, allowing user to follow his train of thought
• dACE: dynamic Automatic Content Enhancement
– Ability to provide enhanced annotation features, allowing the user to retrieve relevant knowledge about significant pieces of content during content consumption
• Semantic Engine APIs with XML output
– Ability to create customized APIs for the Semantic Engine involving Semantic Associations with XML output to cater to any user application
![Page 20: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/20.jpg)
Visionics, AcSys
Security Portal
Check-in
Interrogation
Boarding Gate
Airport, Airspace
Semagix: Ontology, Metabase
Threat Scoring
Gov’t Watchlists
News Media
Web Info
LexisNexis RiskWise
Passenger Records
Reservation Data
Airline Data
Airport Data
Airline and Airport Data
Future and Current Risks
Airport LEO
ARC AvSec Manager
Data Management
Data Mining
IPG
![Page 21: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/21.jpg)
Sources Used
Knowledge Sources:
FBI - Most Wanted Terrorists
Denied Persons Lists
Terrorism Files
ICT
Office of Foreign Asset Control (OFAC)
Hamas terrorists
CNN Locations
FAA_Airport_Codes
About.com
Comtex_International
Hindustan Times
JerusalemPost
CNN
Newstrove_Hamas
Content Sources:
Africa News Service
AFX News – Asia/UK/Europe
AP Worldstream
Asia Pulse
BusinessWire
ComputerWire (CTW)
EFE News Services
FWN Select
Itar-TASS
Knight Ridder News (Open)
Knight-Ridder Open
M2 - International
M2 Airline Industry Information
New World Publishing
PR Newswire
PRLine (PRL)
Resource News International
RosBusiness
United Press International
UPI Spotlights
![Page 22: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/22.jpg)
Semagix’s Semantic Technology enables flight authorities to:
- take a quick look at the passenger’s history
- check quickly if the passenger is on any official watchlist
- interpret and understand passenger’s links to other organizations (possibly terrorist)
- verify if the passenger has boarded the flight from a “high risk” region
- verify if the passenger originally belongs to a “high risk” region
- check if the passenger’s name has been mentioned in any news article along with the name of a known bad guy
Interrogation Kiosk – Unique Advantages of Semagix
SmithJohn
![Page 23: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/23.jpg)
SmithJohn
Threat Score Components
LEXIS NEXIS ANNOTATION
Action: Information about or related to the passenger returned by Lexis Nexis is enhanced by linking important entities to Semagix’s rich ontology
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, and automatically correlate them with other data in the ontology to present a clear picture of the passenger to the flight official
Flight Country Check 45 0.15
Person Country Check 25 0.15
Nested Organizations Check 75 0.8
Aggregate Link Analysis Score: 17.7
LINK ANALYSIS
Action: Semantic analysis of the various components (watchlist, Lexis Nexis, ontology search, metabase search, etc.) to come up with an aggregate threat score for the passenger
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, automatically correlate them with other data in the ontology, and search for relevant content to present an overall idea of the threat level of the passenger, allowing the flight official to take quick action
appearsOn watchList:
FBI
ONTOLOGY SEARCH
Action: Semagix’s rich ontology is searched for this name, and associated information such as position, aliases, and relationships (past or present) of this name to other organizations, watchlists, countries, etc. is retrieved
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge about a passenger and automatically correlate it with other data in the ontology to present a visual association picture to the flight official
METABASE SEARCH
Action: Semagix’s rich metabase is searched for this name and associated content stories mentioning the passenger’s name are retrieved
Ability Proven: Ability to automatically aggregate and retrieve relevant content stories, field reports, etc. about the passenger that can be used by flight officials to determine if the passenger has any connections with known bad people or organizations
WATCHLIST ANALYSIS
Action: Semagix’s rich ontology is automatically searched for the possible appearance of this name on any of the watchlists
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, automatically correlate it, and rank the threat factors to indicate the threat level of the passenger on the watchlist front
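The link-analysis checks described above (direct, organization, and nested organization watchlist matches) amount to following named relationships outward from the passenger. The sketch below illustrates that traversal with a breadth-first search. It is an assumption-laden illustration, not the Semagix algorithm: the function `watchlist_paths`, the `max_depth` parameter, and all entity names are hypothetical, and only the `memberOf` and `appearsOn` relationship names come from the slides.

```python
from collections import deque

# Placeholder data: who is a member of what, and what appears on which list.
MEMBER_OF = {
    "Passenger P": ["Organization A"],
    "Organization A": ["Organization B"],
}
APPEARS_ON = {
    "Organization B": ["Watchlist W"],
}


def watchlist_paths(start, member_of, appears_on, max_depth=3):
    """Follow memberOf edges breadth-first; return (path, watchlist) hits."""
    hits = []
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        # Direct match: this entity itself appears on a watchlist.
        for watchlist in appears_on.get(node, []):
            hits.append((path, watchlist))
        # Stop expanding once the path already has max_depth edges.
        if len(path) - 1 >= max_depth:
            continue
        for parent in member_of.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, path + [parent]))
    return hits


hits = watchlist_paths("Passenger P", MEMBER_OF, APPEARS_ON)
```

Each hit carries the path that produced it, so an official could see why a passenger was flagged (here, through a nested organization) rather than only that a match occurred.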
![Page 24: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/24.jpg)
What it would take for an RDBMS to support the flight security application
| Link Analysis Component | # Queries (Voquette) | # Queries (RDBMS) | Time (Voquette) | Time (RDBMS) |
|---|---|---|---|---|
| Direct Watchlist Match (person name) | | | | |
| lookup person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec. |
| retrieve person's relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| Organization Watchlist Match (person name, organization name) | | | | |
| lookup person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec. |
| retrieve person's relationships to organizations | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| look up organization entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec. |
| retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| Nested Organization Watchlist Match (person name, organization name) | | | | |
| look up organization entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec. |
| retrieve the organization's relationships to organizations | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| Flight Origin (country name) | | | | |
| retrieve country entity | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| see if country is on a list containing "high-risk" countries | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| Person Origin (person name) | | | | |
| lookup person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec. |
| retrieve person's home country | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| retrieve the organization's relationships to lists containing "high-risk" countries | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| Field Report Search (person name) | | | | |
| perform SSE query for field reports that mention this person | 1 SSE Request | 2 SQL Queries | .03 sec | 5-30 sec |
| retrieve a list of people associated with these field reports | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| determine which people are on watchlists, terrorists, etc… | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec |
| Total | 18 requests | 39-64 SQL Queries | .33 sec | 30-80 sec. |
Query Comparison: Semagix vs. RDBMS
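The query-count totals in the comparison can be re-derived from the per-step figures. The short sketch below does that arithmetic; the per-step costs (an entity lookup is 1 CACS request versus 5-10 SQL queries, the field report search is 1 SSE request versus 2 SQL queries, every other step is 1 SQL query on both sides) are taken from the table, while the variable names and the grouping into tuples are just one convenient way to tally them.

```python
# Per component: (entity lookups via CACS, SSE requests, plain SQL steps),
# counted from the rows of the comparison table.
COMPONENTS = {
    "Direct Watchlist Match": (1, 0, 1),
    "Organization Watchlist Match": (2, 0, 3),
    "Nested Organization Watchlist Match": (1, 0, 2),
    "Flight Origin": (0, 0, 2),
    "Person Origin": (1, 0, 2),
    "Field Report Search": (0, 1, 2),
}

cacs = sum(c for c, _, _ in COMPONENTS.values())  # entity lookups
sse = sum(s for _, s, _ in COMPONENTS.values())   # field report searches
sql = sum(q for _, _, q in COMPONENTS.values())   # single-query steps

# Semantic engine side: every step is exactly one request.
semagix_requests = cacs + sse + sql

# RDBMS side: each entity lookup costs 5-10 SQL queries, the field report
# search costs 2, and the remaining steps cost 1 each.
rdbms_min = cacs * 5 + sse * 2 + sql
rdbms_max = cacs * 10 + sse * 2 + sql
rdbms_range = (rdbms_min, rdbms_max)

print(semagix_requests, rdbms_range)
```

The tally reproduces the table's totals of 18 requests and 39-64 SQL queries, and shows that the entity lookups account for nearly all of the gap.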
![Page 25: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/25.jpg)
Performance
Population/update rate in an ontology with 1 million entities/relationships: > 10,000 entities/relationships per hr.
Incremental Index Update Frequency: 1 minute (near real-time)
Query Response Time (64 concurrent users): 65 ms
Query Response Time (light load): 1 - 10 ms
Queries per server per hour: > 1,980,000
![Page 26: Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc.Semagix](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d635503460f94a463f1/html5/thumbnails/26.jpg)
More at www.semagix.com and
http://lsdis.cs.uga.edu/lib/presentations.html