How to Reveal Hidden Relationships in Data and Risk Analytics
TRANSCRIPT
Presentation Outline
• Discovery and analytics case
• Data integration and FIBO mapping
• Discovery and analytics examples
• Future work
Apr 2016 Hidden Relationships in Data and Risk Analytics
Relation Discovery Case
• Find suspicious relationships like:
− Company in USA controls
− Another company in USA
− Through a company in an off-shore zone
• Show news relevant to them
What It Takes to Make It Work?
• Database of locations with sub-region info
• Database with companies and control relations
• Define the semantics of the relevant relationships (using FIBO)
− sub-region and control are transitive relationships
− located-in is transitive over sub-region
• Define suspicious relationships
CONSTRUCT { ?orgA my:suspiciousLink ?orgB } WHERE {
    ?orgA ptop:locatedIn ?x ; fibo:controls ?y .
    ?y fibo:controls ?orgB ; ptop:locatedIn ?z .
    ?orgB ptop:locatedIn ?x .
    ?z a ptop:OffshoreZone .
}
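The suspicious-link pattern can also be pictured outside a triplestore. The following is a minimal Python sketch, not the presenters' implementation: all company and region names are hypothetical toy data, control is treated as transitive, and located-in is propagated over sub-region, as the bullets describe.

```python
# Toy data: every company and region name here is hypothetical.
SUB_REGION = {"Delaware": "USA", "Texas": "USA", "GrandCayman": "CaymanIslands"}
OFFSHORE_ZONES = {"CaymanIslands"}
LOCATED_IN = {"AcmeHolding": "Delaware", "ShellCo": "GrandCayman", "AcmeOil": "Texas"}
CONTROLS = {("AcmeHolding", "ShellCo"), ("ShellCo", "AcmeOil")}


def transitive_closure(pairs):
    """Treat the relation as transitive: add (a, c) whenever (a, b) and (b, c) hold."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def regions_of(org):
    """Regions an organization is located in, propagated upward over sub-region."""
    regions = set()
    region = LOCATED_IN.get(org)
    while region is not None and region not in regions:
        regions.add(region)
        region = SUB_REGION.get(region)
    return regions


def suspicious_links(controls):
    """Pairs (a, b): a controls y, y controls b, y is off-shore, a and b share a region."""
    closed = transitive_closure(controls)
    links = set()
    for a, y in closed:
        if not regions_of(y) & OFFSHORE_ZONES:
            continue
        for y2, b in closed:
            if y2 == y and regions_of(a) & regions_of(b):
                links.add((a, b))
    return links


print(suspicious_links(CONTROLS))
```

In a triplestore the closure and the region propagation would come from inference rather than from explicit loops; the sketch only makes the matching logic visible.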
The Web of Linked Data in 2007
[Figure: Linked Data cloud diagram with callouts: structured database version of Wikipedia; database of all locations on Earth; product reviews; semantic synonym dictionary]
Note: Each bubble represents a dataset.
Arrows represent mappings across datasets; e.g. dbpedia:Paris owl:sameAs geo:2988507
The Web of Linked Data is Gaining Mass
• 2013 stats: 2 289 public datasets
− http://stats.lod2.eu/
• Growing exponentially
− see the dotted trend line
• Structured markup
− Schema.org; semantic SEO
• Enables better semantic tagging!
− As there are more concepts and richer descriptions to refer to
Linked Data Datasets (per year)
Year      2007  2008  2009  2010  2011  2012  2013
Datasets    27    43    89   162   295   822  2,289
Data Integration and Loading
• DBpedia (the English version only): 496M statements
• Geonames (all geographic features on Earth): 150M statements
− owl:sameAs links between DBpedia and Geonames: 471K statements
• Company registry data (GLEI): 3M statements
• News metadata (from NOW): 128M statements
• Total size: 986M statements
− 667M explicit statements + 318M inferred statements
− RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-region constraints
Global Legal Entity Identifier (GLEI) data
• Global Markets Entity Identifier (GMEI) Utility data
− The Global Markets Entity Identifier (GMEI) utility is DTCC's legal entity identifier solution offered in collaboration with SWIFT
− We downloaded a data dump from https://www.gmeiutility.org/
• RDF-ized company records
− Fields: LEI#, legal name, ultimate parent, registered country
− 3M explicit statements for 211 thousand organizations
▪ For comparison, there are 490 000 organizations in DBPedia, and D&B covers above 200 million
− 10,821 ultimate parent relationships and 1632 ultimate parents
− About 2 800 organizations from the GLEI dump mapped to DBPedia
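To make "RDF-ized company records" concrete, here is a minimal Python sketch that turns one record into subject/predicate/object triples. It is an illustration under assumptions, not the actual conversion: the dictionary keys and the lei:ultimateParent property are hypothetical names, while lei:lei, lei:legalName, lei:registeredCountry, the data: prefix, and the example values follow the ABN AMRO sample shown later in the deck.

```python
def literal(value):
    """Render a plain string as an xsd:string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:string'


def record_to_triples(record):
    """Turn one company record (a dict with hypothetical keys) into triples."""
    subject = f"data:{record['lei']}"
    triples = [
        (subject, "lei:lei", literal(record["lei"])),
        (subject, "lei:legalName", literal(record["legal_name"])),
        (subject, "lei:registeredCountry", literal(record["registered_country"])),
    ]
    parent = record.get("ultimate_parent_lei")
    if parent:
        # lei:ultimateParent is an assumed property name, not taken from the dump.
        triples.append((subject, "lei:ultimateParent", f"data:{parent}"))
    return triples


abn_amro = {
    "lei": "BFXS5XCH7N0Y05NIXW11",
    "legal_name": "ABN AMRO Bank N.V.",
    "registered_country": "NL",
}
for s, p, o in record_to_triples(abn_amro):
    print(f"{s} {p} {o} .")
```

Linking the ultimate parent as a resource rather than as a string is what lets parent/child counts be computed with a plain graph query.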
GLEI Company Data Sample: ABN-AMRO
lei:businessRegistry "Kamer van Koophandel"^^xsd:string
lei:businessRegistryNumber "34334259"^^xsd:string
lei:duplicateReference data:549300T5O0D0T4V2ZB28
lei:entityStatus "ACTIVE"^^xsd:string
lei:headquartersCity "Amsterdam"^^xsd:string
lei:headquartersState "Noord-Holland"^^xsd:string
lei:legalForm "NAAMLOZE VENNOOTSCHAP"^^xsd:string
lei:legalName "ABN AMRO Bank N.V."^^xsd:string
lei:lei "BFXS5XCH7N0Y05NIXW11"^^xsd:string
lei:registeredCity "Amsterdam"^^xsd:string
lei:registeredCountry "NL"^^xsd:string
lei:registeredPostCode "1082 PP"^^xsd:string
lei:registeredState "Noord-Holland"^^xsd:string
Global Legal Entity Identifier (GLEI) data
Rank  Ultimate parent                  Children  Country
1     The Goldman Sachs Group, Inc.    1 851     US
2     United Technologies Corporation  427       US
3     Honeywell International Inc.     341       US
4     Morgan Stanley                   228       US
5     Cargill, Incorporated            217       US
6     1832 Asset Management L.P.       202       CA
7     Aegon N.V.                       174       NL
8     Union Bancaire Privée, UBP SA    138       CH
9     Citigroup Inc.                   135       US
10    State Street Corporation         128       US

Rank  Country             Companies
1     dbr:United_States   103 548
2     dbr:Canada          17 425
3     dbr:Luxembourg      13 984
4     dbr:Sweden          7 934
5     dbr:United_Kingdom  7 421
6     dbr:Belgium         6 868
7     dbr:Ireland         4 762
8     dbr:Australia       4 385
9     dbr:Germany         3 039
10    dbr:Netherlands     2 561
Quick news-analytics case
• Our Dynamic Semantic Publishing platform already offers linking of text with big open data graphs
• One can navigate from text to concepts, get trends, related entities and news
• Try it at http://now.ontotext.com
News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform
− Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 10k news/month
− ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
News Metadata
Category                Count
International           52 074
Science and Technology  23 201
Sports                  20 714
Business                15 155
Lifestyle               11 684
Total                   122 828

Mentions / entity type  Count
Keyphrase               2 589 676
Organization            1 276 441
Location                1 260 972
Person                  1 248 784
Work                    309 093
Event                   258 388
RelationPersonRole      236 638
Species                 180 946
Class Hierarchy Map (by number of instances)
[Figure: class hierarchy map. Left: the big picture. Right: dbo:Agent class (2.7M organizations and persons)]
Loading FIBO
• FIBO = Financial Industry Business Ontology
• We loaded FIBO Foundations and Business Entities (BE) in GraphDB
− About 55 RDF files from the “foundations-14-11-30” and “business-eneitites-15-02-23” packages
• Reasoning switched to OWL 2 RL
− Loading takes 3-4 seconds
• Number of explicit statements: 5 433
• Number of total statements: 20 646
− Of which inferred and materialized: 15 213
Mapping FIBO to DBPedia
• We mapped FIBO to DBPedia Ontology
− Minimalistic approach: we mapped only as much as we needed
dbo:Organization rdfs:subClassOf fibo-fnd-org-fm:FormalOrganization.
dbo:Company rdfs:subClassOf fibo-be-le-cb:Corporation.
dbo:Person rdfs:subClassOf fibo-fnd-aap-ppl:Person.
dbo:subsidiary rdfs:subPropertyOf fibo-fnd-rel-rel:controls.
• Methodological notes
− Note that fibo-fnd-rel-rel:controls is not transitive
− We mapped more specific DBPedia primitives to more general FIBO ones, so that the data becomes “visible” through FIBO
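How the mapping makes DBPedia data "visible" through FIBO can be shown with a tiny forward-chaining sketch in Python. It applies only two rules (type propagation over rdfs:subClassOf and statement propagation over rdfs:subPropertyOf), which are part of what an OWL 2 RL reasoner does; it does not reproduce GraphDB's rule set. The four mapping statements are the ones above; the two data statements are toy examples.

```python
MAPPING = {
    ("dbo:Organization", "rdfs:subClassOf", "fibo-fnd-org-fm:FormalOrganization"),
    ("dbo:Company", "rdfs:subClassOf", "fibo-be-le-cb:Corporation"),
    ("dbo:Person", "rdfs:subClassOf", "fibo-fnd-aap-ppl:Person"),
    ("dbo:subsidiary", "rdfs:subPropertyOf", "fibo-fnd-rel-rel:controls"),
}
# Toy instance data, used only for illustration.
DATA = {
    ("dbr:Volkswagen_Group", "rdf:type", "dbo:Company"),
    ("dbr:Volkswagen_Group", "dbo:subsidiary", "dbr:Audi"),
}


def materialize(triples):
    """Return the triples inferred by the two rules, iterating until nothing new appears."""
    explicit = set(triples)
    total = set(explicit)
    while True:
        sub_class = {(s, o) for s, p, o in total if p == "rdfs:subClassOf"}
        sub_prop = {(s, o) for s, p, o in total if p == "rdfs:subPropertyOf"}
        new = set()
        for s, p, o in total:
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for sub, sup in sub_class if sub == o}
            new |= {(s, sup, o) for sub, sup in sub_prop if sub == p}
        new -= total
        if not new:
            return total - explicit
        total |= new


inferred = materialize(MAPPING | DATA)
for triple in sorted(inferred):
    print(triple)
```

After materialization, a query written purely in FIBO terms (for example on fibo-fnd-rel-rel:controls) matches the DBPedia statement without mentioning dbo:subsidiary.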
Semantic Press-Clipping
• We can trace references to a specific company in the news
− This is pretty much standard; however, we can deal with syntactic variations in the names, because state-of-the-art Named Entity Recognition technology is used
− More importantly, we correctly distinguish which mention of “Paris” refers to which of the following: Paris (the capital of France), Paris in Texas, Paris Hilton, or Paris (the Greek hero)
• We can trace and consolidate references to daughter companies
• We have comprehensive industry classification
− The one from DBPedia, but refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank will also be considered classified as dbr:FinancialServices)
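The refined industry classification can be pictured as a two-step lookup: first normalize an identifier variation to one canonical industry, then add every broader industry above it. The Python sketch below is only an illustration; the variant identifiers dbr:Banking and dbr:Banks are hypothetical, and only the dbr:Bank to dbr:FinancialServices specialization comes from the slide.

```python
# Hypothetical identifier variations mapped to one canonical industry.
VARIANT_TO_CANONICAL = {"dbr:Banking": "dbr:Bank", "dbr:Banks": "dbr:Bank"}
# Specialization: a narrower industry implies the broader one.
BROADER = {"dbr:Bank": "dbr:FinancialServices"}


def expand_industries(raw_industries):
    """Normalize each raw industry and add all broader industries above it."""
    result = set()
    for raw in raw_industries:
        current = VARIANT_TO_CANONICAL.get(raw, raw)
        while current is not None and current not in result:
            result.add(current)
            current = BROADER.get(current)
    return result


print(sorted(expand_industries({"dbr:Banking"})))
```

With this expansion, a company tagged with any banking variant is counted both under banking and under financial services.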
Mentions of related entities
select distinct ?news ?title ?date ?rel_entity
from onto:disable-sameAs
where {
    BIND( dbr:Volkswagen_Group as ?entity )
    { ?entity fibo-fnd-rel-rel:controls ?rel_entity }
    UNION
    { BIND(?entity as ?rel_entity) }
    ?news pub-old:containsMention / pub-old:hasInstance / pub:exactMatch ?rel_entity .
    ?news pub-old:creationDate ?date ; pub-old:title ?title .
    FILTER ( (?date > "2015-04-01T00:02:00Z"^^xsd:dateTime)
          && (?date < "2015-05-01T00:02:00Z"^^xsd:dateTime) )
}
Industry distribution
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
select distinct ?top_industry (count(?company) as ?companies)
where {
    ?company dbo:industry ?industry .
    ?industrySum ff-map:industryVariant ?industry ;
                 ff-map:industryCenter ?top_industry .
}
group by ?top_industry
order by desc(?companies)
Most popular companies per industry
select distinct ?pub_entity ?label (count(?news) as ?news_count)
where {
    ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
    ?pub_entity pub:exactMatch ?entity ; pub:preferredLabel ?label .
    ?entity dbo:industry ?industry .
    dbr:Automotive ff-map:industryVariant ?industry .
}
group by ?pub_entity ?label
order by desc(?news_count)
Most popular companies, including children
select distinct ?parent (count(?news) as ?news_count)
where {
    { select distinct ?parent ?entity {
        BIND(dbr:Software as ?industry)
        ?industry ff-map:industryVariant ?industryVar .
        ?parent dbo:industry ?industryVar .
        ?parent a dbo:Company .
        FILTER NOT EXISTS { ?parent dbo:parent / dbo:industry / ff-map:industryVariant ?industry }
        { ?entity dbo:parent ?parent . } UNION
        { BIND(?parent as ?entity) }
    } }
    ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
    ?pub_entity pub:exactMatch ?entity .
    ?news pub-old:creationDate ?date .
}
group by ?parent
order by desc(?news_count)
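The roll-up idea behind this query, crediting a parent with the news about its children, can be sketched in a few lines of Python. This is a simplified illustration with toy data and hypothetical helper names: it follows only one level of parent links and counts each article at most once per company, which is a choice of this sketch rather than a statement about how the query counts.

```python
from collections import Counter

# Toy parent links (child -> parent), standing in for dbo:parent.
PARENT = {"dbr:Chevrolet": "dbr:General_Motors", "dbr:Audi": "dbr:Volkswagen_Group"}
# Toy (article, mentioned company) pairs.
MENTIONS = [
    ("news1", "dbr:Chevrolet"),
    ("news2", "dbr:General_Motors"),
    ("news3", "dbr:Audi"),
    ("news3", "dbr:Volkswagen_Group"),
    ("news4", "dbr:Chevrolet"),
]


def news_counts(mentions, parent_of, include_children=True):
    """Count distinct articles per company, optionally crediting children to their parent."""
    seen = set()
    for news, entity in mentions:
        owner = parent_of.get(entity, entity) if include_children else entity
        seen.add((news, owner))
    return Counter(owner for _, owner in seen)


print(news_counts(MENTIONS, PARENT, include_children=False).most_common())
print(news_counts(MENTIONS, PARENT, include_children=True).most_common())
```

Comparing the two calls mirrors the two columns of the popularity tables: direct mentions only versus mentions including controlled companies.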
News Popularity Ranking: Automotive
Rank  Company                    News #   Rank  Company incl. mentions of controlled  News #
1     General Motors             2722     1     General Motors                        4620
2     Tesla Motors               2346     2     Volkswagen Group                      3999
3     Volkswagen                 2299     3     Fiat Chrysler Automobiles             2658
4     Ford Motor Company         1934     4     Tesla Motors                          2370
5     Toyota                     1325     5     Ford Motor Company                    2125
6     Chevrolet                  1264     6     Toyota                                1656
7     Chrysler                   1054     7     Renault-Nissan Alliance               1332
8     Fiat Chrysler Automobiles  1011     8     Honda                                 864
9     Audi AG                    972      9     BMW                                   715
10    Honda                      717      10    Takata Corporation                    547
News Popularity: Finance
Rank  Company          News #   Rank  Company incl. mentions of controlled  News #
1     Bloomberg L.P.   3203     1     Intra Bank                            261667
2     Goldman Sachs    1992     2     Hinduja Bank (Switzerland)            49731
3     JP Morgan Chase  1712     3     China Merchants Bank                  38288
4     Wells Fargo      1688     4     Alphabet Inc.                         22601
5     Citigroup        1557     5     Capital Group Companies               4076
6     HSBC Holdings    1546     6     Bloomberg L.P.                        3611
7     Deutsche Bank    1414     7     Exor                                  2704
8     Bank of America  1335     8     Nasdaq, Inc.                          2082
9     Barclays         1260     9     JP Morgan Chase                       1972
10    UBS              694      10    Sentinel Capital Partners             1053
Note: Including investment funds, stock exchanges, agencies, etc.
News Popularity: Banking
Rank  Company          News #   Rank  Company incl. mentions of controlled  News #
1     Goldman Sachs    996      1     China Merchants Bank *                38288
2     JP Morgan Chase  856      2     JP Morgan Chase                       1972
3     HSBC Holdings    773      3     Goldman Sachs                         1030
4     Deutsche Bank    707      4     HSBC                                  966
5     Barclays         630      5     Bank of America                       771
6     Citigroup        519      6     Deutsche Bank                         742
7     Bank of America  445      7     Barclays                              681
8     Wells Fargo      422      8     Citigroup                             630
9     UBS              347      9     Wells Fargo                           428
10    Chase            126      10    UBS                                   347
Note: including investment funds, stock exchanges, agencies, etc.
Regional exposition of a company
select distinct ?country (count(*) as ?count)
from onto:disable-sameAs
where {
    { select distinct ?related_entity {
        BIND ( dbr:Toyota as ?entity )
        { ?related_entity ff-map:agentRelation ?entity . } UNION
        { BIND(?entity as ?related_entity) }
    } }
    ?news pub-old:containsMention / pub-old:hasInstance
          / pub:exactMatch ?related_entity .
    ?news pub:country ?country .
}
group by ?country
order by desc(?count)
Regional exposition – normalized
select distinct ?country (count(*) as ?count) (?count / ?country_score as ?score)
from onto:disable-sameAs
where {
    { select distinct ?related_entity {
        BIND ( dbr:BP as ?entity )
        { ?related_entity ff-map:agentRelation ?entity . } UNION
        { BIND(?entity as ?related_entity) }
    } }
    ?news pub-old:containsMention / pub-old:hasInstance
          / pub:exactMatch ?related_entity .
    ?news pub:country ?country .
    ?country ff-map:countryPopularityScore ?country_score .
}
group by ?country ?country_score
having (?count > 20)
order by desc(?score)
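The normalization step is simple arithmetic: divide a company's mention count in a country by that country's overall popularity score, drop countries with 20 or fewer mentions, and rank by the resulting ratio. The Python sketch below uses invented numbers and placeholder country codes; how ff-map:countryPopularityScore is actually computed is not stated on the slides, so the scores here are assumptions.

```python
def normalized_exposure(mention_counts, country_popularity, min_count=20):
    """Rank countries by mentions divided by popularity, keeping counts above min_count."""
    rows = []
    for country, count in mention_counts.items():
        if count <= min_count:
            continue  # mirrors the query's "having (?count > 20)"
        popularity = country_popularity.get(country)
        if not popularity:
            continue  # no score available, so no normalized value
        rows.append((country, count, count / popularity))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented numbers, for illustration only.
counts = {"US": 400, "UK": 120, "AZ": 30, "FR": 15}
popularity = {"US": 1000.0, "UK": 200.0, "AZ": 10.0, "FR": 50.0}
for country, count, score in normalized_exposure(counts, popularity):
    print(country, count, score)
```

Without normalization the most newsworthy countries dominate every ranking; dividing by popularity brings forward countries where the company is mentioned unusually often relative to the overall news volume.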
Relationships discovery examples
• Companies that control other companies across countries
• Companies that control other companies in the same country through a company in another country
• Companies that control other companies in the same country through a company in an off-shore zone
Analytics with relations extracted from text
Subject                         Object                                       Count
dbr:Chrysler                    dbr:Fiat_Chrysler_Automobiles                455
dbr:NASA                        dbr:Goddard_Space_Flight_Center              69
dbr:Time_Warner_Cable           dbr:Comcast                                  44
dbr:National_Football_League    dbr:New_England_Patriots                     40
dbr:DirecTV                     dbr:AT&T                                     33
dbr:Alcatel-Lucent              dbr:Nokia                                    31
dbr:AOL                         dbr:Verizon_Communications                   30
dbr:University_of_Pennsylvania  dbr:Perelman_School_of_Medicine_at_... UPEN  29
dbr:Time_Warner_Cable           dbr:Charter_Communications                   27
dbr:Continental_Airlines        dbr:United_Airlines                          26
Note: relation types: "RelationOrganizationAffiliatedWithOrganization", "RelationAcquisition", "RelationMerger"
Future Work
• Comprehensive mapping of LEI data
• Experiments on Ultimate Parent discovery
• Partnership with commercial data providers
• Organizations, related in the news, but not in other datasets
• Organizations, co-occurring in the news, but not in other datasets
• Construct a profile of related entities for an organization
Wrap up
• We allow Open Data to be accessed via FIBO
− It took just a few days to clean up DBPedia’s industry classifications and control relationships
• Integrating more data sources is easy (e.g. GLEI)
− We can integrate proprietary and 3rd party data within days or weeks
• We can perform analytics on metadata
− Regional exposition, popularity of entities, relation extraction
• All integrated in proven products and solutions
− GraphDB triplestore, OpenPolicy, Dynamic Semantic Publishing platform
Thank you!
Experience the technology with NOW: Semantic News Portal
http://now.ontotext.com
Start using GraphDB and text-mining with S4 in the cloud
http://s4.ontotext.com
Learn more at our website or simply get in touch
[email protected], @ontotext