

Names are the linchpin that connects data points in financial compliance, anti-fraud, government intelligence, law enforcement, and identity verification. Yet names are hard to connect because they vary so much, through misspellings, nicknames, initials, and titles. In international databases, a single name may also appear in many languages!

Rosette® Name Indexer (RNI) solves these challenges with a linguistic, knowledge-based system that compares and matches names of people, places, and organizations despite their many variations. RNI is unrivalled in its ability to match names because it analyzes the linguistic structure of each name rather than relying on lists of known variants.

With deep expertise at the intersection of language and technology, Basis Technology continually improves the Rosette product family with new languages, feature updates, and the latest innovations from the academic world. Find out how your organization can use this pioneering technology to match the names of entities with extraordinary results.

Accurate fuzzy name matching in many languages

14 Supported Languages

KEY FEATURES

- Component of the Rosette SDK

- Simple API

- Fast and scalable

- Industrial-strength support

- Easy installation

- Flexible and customizable

- Java

- Unix, Linux, Mac, or Windows

- Matches names of people, places, and organizations

- Increases name search accuracy

- Ranks results by relevancy with a similarity score

- Built to work with Apache™ Solr and Elasticsearch


www.basistech.com

+1 617-386-2090

Start using RNI today: try our free product evaluation.

[Figure: RNI matching name variants against the record “Franklin D. Roosevelt, 32nd U.S. President, ID: USPRES32, DOB: Jan. 30, 1882”. Variants shown include 富兰克林·罗塞费尔特, Gov. Franklin Roosevelt, Frank Delano Roosevelt, Franklin Rosenvelt, President Roosevelt, Рузвельт, Франклин, and F. D. R., with similarity scores ranging from 73% to 97%.]

Rosette® Big Text Analytics

- Rosette Language Identifier (RLI): Identify languages and encodings

- Rosette Base Linguistics (RBL): Search many languages with high accuracy

- Rosette Entity Extractor (REX): Tag names of people, places, and organizations

- Rosette Name Indexer (RNI): Match names between many variations

- Rosette Name Translator (RNT): Translate foreign names into English

- Rosette Categorizer (RCA): Categorize everything in sight

- Rosette Sentiment Analyzer (RSA): Detect the sentiments of your text

- Rosette Entity Resolver (RES): Make real-world connections in your data


Our knowledge-based system draws on the latest advances in Natural Language Processing (NLP) to match names intelligently, according to their linguistic and cultural structures and norms.

Legacy solutions are more expensive and less accurate because they are driven by lists of thousands of spelling variants of known names. RNI instead analyzes the intrinsic structure of each name component and compares components using advanced linguistic algorithms.

Our approach is not limited to a particular list of variants and reduces the likelihood of both “false positives” (wrong matches) and “false negatives” (missed matches).

List-driven systems cannot match RNI on never-before-seen names or on mis-segmented names (Mary Ellen vs. MaryEllen).
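A small Java sketch can make the segmentation point concrete. It is a toy stand-in and not RNI's algorithm. The class name `SegmentationAwareMatch` is hypothetical, and so is the scoring. The scoring lowercases each name, strips accents, spaces, and hyphens, and then computes a normalized Levenshtein ratio. These choices are assumptions made for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Toy illustration only (hypothetical, not RNI's algorithm): segmentation-insensitive comparison. */
public class SegmentationAwareMatch {

    // Lowercase, strip accents, and keep only letters, so "Mary-Ellen" and "Mary Ellen" both become "maryellen".
    static String squash(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "")    // drop combining accent marks
                         .replaceAll("[^\\p{L}]", "") // drop spaces, hyphens, dots, digits
                         .toLowerCase(Locale.ROOT);
    }

    // Standard two-row dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds the latest row
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means identical after squashing.
    static double similarity(String a, String b) {
        String x = squash(a), y = squash(b);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) return 1.0;
        return 1.0 - (double) levenshtein(x, y) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("Mary Ellen", "MaryEllen"));  // prints 1.0
        System.out.println(similarity("Mary-Ellen", "Mary Ellen")); // prints 1.0
        System.out.println(similarity("Kearns", "Kerns"));          // about 0.83
        System.out.println(similarity("Cairns", "Kerns"));          // prints 0.5
    }
}
```

The same toy rates Cairns vs. Kerns at only 0.5, because character edits know nothing about pronunciation. That gap is what a phonetic, knowledge-based comparison is meant to close.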

Available Languages and Scripts

- Arabic scripts: Arabic, Persian, Pashto, Urdu

- Cyrillic: Russian

- Hangul: Korean

- Hanzi (Simplified & Traditional): Chinese

- Kanji, Katakana, Hiragana: Japanese

- Roman scripts: English, Spanish, French, Italian, German, Portuguese

RNI matches names in these languages whether they are transliterated into English or written in their native scripts.
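When names can arrive in any of these scripts, a useful first step is to detect which script a name is written in. The sketch below uses only the JDK's `Character.UnicodeScript` (Java 7 and later). The class and method names are hypothetical, and this is not how RNI itself is configured.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: which Unicode scripts does a name use? Uses only the JDK. */
public class ScriptDetector {

    // Collects the distinct scripts of the name's characters, in order of first appearance.
    static Set<Character.UnicodeScript> scriptsOf(String name) {
        Set<Character.UnicodeScript> scripts = new LinkedHashSet<>();
        name.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            // Spaces, punctuation, and digits are COMMON or INHERITED, so skip them.
            if (s != Character.UnicodeScript.COMMON && s != Character.UnicodeScript.INHERITED) {
                scripts.add(s);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        String[] names = {"Mao Zedong", "Мао Цзэдун", "毛泽东", "Рузвельт, Франклин"};
        for (String n : names) {
            System.out.println(n + " -> " + scriptsOf(n));
        }
        // Mao Zedong -> [LATIN]
        // Мао Цзэдун -> [CYRILLIC]
        // 毛泽东 -> [HAN]
        // Рузвельт, Франклин -> [CYRILLIC]
    }
}
```

A Japanese name that mixes Kanji and Kana would report more than one script (for example HAN and KATAKANA). This is one reason the list above groups several scripts under Japanese.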

Name Matching Capabilities


- Same name in multiple languages: Mao Zedong ↔ Мао Цзэдун ↔ 毛泽东

- Phonetic spelling differences: Cairns ↔ Kearns ↔ Kerns

- Transliteration spelling differences: Abdul Rasheed ↔ Abd-al-Rasheed ↔ Abdulrashid

- Nicknames: William ↔ Will ↔ Bill ↔ Billy

- Initials: J. E. Smith ↔ James Earl Smith

- Titles and honorifics: Dr. ↔ Mr. ↔ Ph.D.

- Out-of-order name components: Diaz, Carlos Alfonzo ↔ Carlos Alfonzo Diaz

- Missing name components: Phillip Charles Carr ↔ Phillip Carr

- Missing spaces or hyphens: MaryEllen ↔ Mary Ellen ↔ Mary-Ellen

- Truncated name components: McDonalds ↔ McD ↔ McDonald

- Name split inconsistently across database fields: Dick • Van Dyke ↔ Dick Van • Dyke (• marks the field boundary)
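Several of these variation types can be approximated with simple rules. The Java sketch below is hypothetical and deliberately small. The `TITLES` and `NICKNAMES` tables are illustrative assumptions, not RNI data. The position-by-position token comparison is also a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical rule-based normalizations for a few variation types; tables are illustrative only. */
public class NameNormalizer {

    // Illustrative titles/honorifics to ignore (compared after removing dots, so "Ph.D." -> "phd").
    static final Set<String> TITLES = Set.of("dr", "mr", "mrs", "ms", "phd");

    // Illustrative nickname table: variant -> canonical given name.
    static final Map<String, String> NICKNAMES =
            Map.of("will", "william", "bill", "william", "billy", "william");

    // Reorders "Last, First Middle", drops titles, and canonicalizes nicknames.
    static List<String> normalize(String raw) {
        String name = raw.trim();
        int comma = name.indexOf(',');
        if (comma >= 0) {
            name = name.substring(comma + 1).trim() + " " + name.substring(0, comma).trim();
        }
        List<String> tokens = new ArrayList<>();
        for (String t : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = t.replace(".", "");
            if (word.isEmpty() || TITLES.contains(word)) continue; // skip blanks and titles
            tokens.add(NICKNAMES.getOrDefault(word, word));       // map nicknames to canonical form
        }
        return tokens;
    }

    // Tokens are compatible if equal, or if one is a single-letter initial of the other.
    static boolean compatible(String a, String b) {
        if (a.equals(b)) return true;
        if (a.length() == 1) return b.startsWith(a);
        if (b.length() == 1) return a.startsWith(b);
        return false;
    }

    // Names match only if their normalized tokens line up one-to-one.
    static boolean matches(String x, String y) {
        List<String> a = normalize(x);
        List<String> b = normalize(y);
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!compatible(a.get(i), b.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("Diaz, Carlos Alfonzo", "Carlos Alfonzo Diaz")); // true
        System.out.println(matches("J. E. Smith", "James Earl Smith"));            // true
        System.out.println(matches("Dr. Bill Smith", "William Smith"));            // true
        System.out.println(matches("Phillip Charles Carr", "Phillip Carr"));       // false
    }
}
```

The last case shows the limit of fixed rules. "Phillip Charles Carr" vs. "Phillip Carr" fails because the token counts differ. This is one reason a matcher that returns a graded similarity score, rather than yes or no, handles missing name components better.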

© 2015 Basis Technology Corporation. “Basis Technology Corporation”, “Rosette”, and “Highlight” are registered trademarks of Basis Technology Corporation. “Big Text Analytics” is a trademark of Basis Technology Corporation. All other trademarks, service marks, and logos used in this document are the property of their respective owners. (2015-06-29-RNI)

WEST COAST

1700 Montgomery St., San Francisco, CA 94111

FEDERAL

2553 Dulles View Dr., Suite 450, Herndon, VA 20171

HEADQUARTERS

One Alewife Center, Cambridge, MA 02140

EUROPE

Furzeground Way, Middlesex UB11 1BD, UK

ASIA

9-6 Nibancho, Chiyoda-ku, Tokyo 102-0084, Japan


The Rosette Advantage

Financial Compliance

Financial institutions use RNI to manage and update watchlists that block terrorist access to funds, avoiding compliance violations and protecting their reputations at the same time. Applications also include fraud detection, anti-money laundering, and document triage.

Government Intelligence

Names are often the most critical data point in intelligence, law enforcement, and border control. RNI is being adopted throughout the U.S. government to address the challenge of matching names in all their variations, particularly names from languages written in non-Latin scripts, such as Arabic, Russian, Chinese, Korean, and Persian.

Identity Verification in the Sharing Economy

Trust is foundational to the sharing economy. Whether members are booking room rentals, rides, or odd jobs, there must be ways to connect the online and offline worlds that reinforce that trust and confidence.

Name matching is a key component of verifying online identities against real-world documents such as passports and driver’s licenses. Sharing-economy companies such as Airbnb rely on RNI to match names originating from all over the world, including names written in alphabets other than the Roman A-to-Z.

Integration Options

Rosette® Name Indexer integrates easily into Apache Solr™ as a plug-in, or into applications as a Java library, to support its main use cases. RNI can also be adapted to the needs of each application.

Apache Solr

Apache Solr™-based search systems can add high-quality fuzzy name matching to every search simply by adding name fields. RNI provides a special Solr field type for names, so Solr can index documents with multiple name fields, each holding multiple values (e.g., an “alias” field may contain more than one name). Each document can also contain non-name fields, such as dates or plain text.

<add>
  <doc>
    <field name="primary">Muhammad Ali</field>
    <field name="alias">Cassius Clay Jr</field>
    <field name="alias">The Greatest</field>
    <field name="dob">1/7/1942</field>
  </doc>
</add>

A single query can then give different weights to the various fields. For example, one query can find movies starring “Binedict Cumberbund” with screenplays by “Giyermo Diltoro” (deliberately misspelled) that were released around 2014.

Java Library

Any application that needs name matching can integrate a Java library directly. The library takes care of storing watchlists in-process, without incurring the overhead of a web-service call.
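The idea of one query that weights several fields differently can be sketched in plain Java. This is a hypothetical illustration, not the RNI Solr field type or any Solr API. The `Clause` class, the weights, and the edit-distance stand-in for RNI's name score are all assumptions. So is the five-year window used for "around" a date. The sketch reuses the `primary`, `alias`, and `dob` fields from the sample document. In a real Solr deployment, per-field weights would more likely be expressed in the query itself, for example with boosts.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical weighted multi-field scoring; not an RNI or Solr API. */
public class WeightedFieldQuery {

    // One query clause: the field to search, the value to look for, its weight, and how to score it.
    static final class Clause {
        final String field;
        final String value;
        final double weight;
        final BiFunction<String, String, Double> scorer;

        Clause(String field, String value, double weight, BiFunction<String, String, Double> scorer) {
            this.field = field;
            this.value = value;
            this.weight = weight;
            this.scorer = scorer;
        }
    }

    // Stand-in name score in [0, 1]: 1 minus normalized Levenshtein distance, case-insensitive.
    static double nameSimilarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT), y = b.toLowerCase(Locale.ROOT);
        int[][] d = new int[x.length() + 1][y.length() + 1];
        for (int i = 0; i <= x.length(); i++) d[i][0] = i;
        for (int j = 0; j <= y.length(); j++) d[0][j] = j;
        for (int i = 1; i <= x.length(); i++)
            for (int j = 1; j <= y.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (x.charAt(i - 1) == y.charAt(j - 1) ? 0 : 1));
        int longest = Math.max(x.length(), y.length());
        return longest == 0 ? 1.0 : 1.0 - (double) d[x.length()][y.length()] / longest;
    }

    // "Around" a year: 1.0 for the same year, falling linearly to 0 at five years apart.
    // Assumes both values end in a four-digit year, e.g. "1/7/1942" or "1943".
    static double yearProximity(String a, String b) {
        int ya = Integer.parseInt(a.substring(a.length() - 4));
        int yb = Integer.parseInt(b.substring(b.length() - 4));
        return Math.max(0.0, 1.0 - Math.abs(ya - yb) / 5.0);
    }

    // Weighted average of clause scores; a multi-valued field scores by its best value.
    static double score(Map<String, List<String>> doc, List<Clause> query) {
        double total = 0.0, weights = 0.0;
        for (Clause c : query) {
            double best = 0.0;
            for (String v : doc.getOrDefault(c.field, List.of())) {
                best = Math.max(best, c.scorer.apply(c.value, v));
            }
            total += c.weight * best;
            weights += c.weight;
        }
        return weights == 0.0 ? 0.0 : total / weights;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of(
                "primary", List.of("Muhammad Ali"),
                "alias", List.of("Cassius Clay Jr", "The Greatest"),
                "dob", List.of("1/7/1942"));
        List<Clause> query = List.of(
                new Clause("primary", "Mohamed Ali", 2.0, WeightedFieldQuery::nameSimilarity),
                new Clause("alias", "Cassius Clay", 1.0, WeightedFieldQuery::nameSimilarity),
                new Clause("dob", "1943", 0.5, WeightedFieldQuery::yearProximity));
        System.out.printf("score = %.3f%n", score(doc, query));
    }
}
```

Scoring a multi-valued field by its best value means a match on any one alias is enough. The weights let a primary-name hit count for more than an alias hit.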

Customize To Your Needs

- Set the minimum similarity-score threshold to balance the precision and recall of returned search results.

- Ignore a given list of words (“stopwords”), such as titles and honorifics, when matching.

- Force two name words to always match with a given score (e.g., “Elizabeth” and “Lisbeth” always match at 90%).

- Force two names to always match with a given score (e.g., “John Doe” and “Joe Bloggs” always match at 95%).

- Link multiple names to a single individual (e.g., queries for "Marilyn Monroe" and "Norma Jeane Mortensen" return the same person).
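These options can be pictured as settings on an in-process watchlist, in the style of the Java-library integration. The sketch below is hypothetical throughout: `CustomizableWatchlist` and its methods are invented for illustration and are not RNI's API. Its default token comparison is deliberately crude (exact match only) so that the effect of each override stays visible. The 90% and 95% scores and the example names come from the list above.

```java
import java.util.*;

/** Hypothetical watchlist showing the customization options above; not RNI's API. */
public class CustomizableWatchlist {
    private final double threshold;                                      // minimum score to report
    private final Set<String> stopwords = new HashSet<>();               // ignored words, e.g. titles
    private final Map<String, Double> tokenOverrides = new HashMap<>();  // word pair -> forced score
    private final Map<String, Double> nameOverrides = new HashMap<>();   // full-name pair -> forced score
    private final Map<String, List<String>> namesByEntity = new LinkedHashMap<>(); // linked names

    public CustomizableWatchlist(double threshold) { this.threshold = threshold; }

    public void addStopword(String w) { stopwords.add(w.toLowerCase(Locale.ROOT)); }
    public void overrideTokens(String a, String b, double s) { tokenOverrides.put(pairKey(a, b), s); }
    public void overrideNames(String a, String b, double s) { nameOverrides.put(pairKey(a, b), s); }

    // All names added under one ID are treated as the same individual.
    public void addEntity(String id, String... names) {
        namesByEntity.computeIfAbsent(id, k -> new ArrayList<>()).addAll(Arrays.asList(names));
    }

    // Case-insensitive, order-independent key for a pair of strings.
    private static String pairKey(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT), y = b.toLowerCase(Locale.ROOT);
        return x.compareTo(y) <= 0 ? x + "|" + y : y + "|" + x;
    }

    // Lowercase, drop dots, split on whitespace, and remove stopwords.
    private List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String t : name.toLowerCase(Locale.ROOT).replace(".", "").split("\\s+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) out.add(t);
        }
        return out;
    }

    // Toy token comparison: a forced score if one exists, else exact match only.
    private double tokenScore(String a, String b) {
        Double forced = tokenOverrides.get(pairKey(a, b));
        return forced != null ? forced : (a.equals(b) ? 1.0 : 0.0);
    }

    // Name-level override first; otherwise align tokens by position (extra tokens count as 0).
    double nameScore(String query, String candidate) {
        Double forced = nameOverrides.get(pairKey(query, candidate));
        if (forced != null) return forced;
        List<String> q = tokens(query), c = tokens(candidate);
        int n = Math.max(q.size(), c.size());
        if (n == 0) return 0.0;
        double sum = 0.0;
        for (int i = 0; i < Math.min(q.size(), c.size()); i++) sum += tokenScore(q.get(i), c.get(i));
        return sum / n;
    }

    // Entity IDs whose best-scoring linked name meets the threshold, best first.
    public List<Map.Entry<String, Double>> search(String query) {
        List<Map.Entry<String, Double>> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : namesByEntity.entrySet()) {
            double best = 0.0;
            for (String name : e.getValue()) best = Math.max(best, nameScore(query, name));
            if (best >= threshold) hits.add(Map.entry(e.getKey(), best));
        }
        hits.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return hits;
    }

    public static void main(String[] args) {
        CustomizableWatchlist wl = new CustomizableWatchlist(0.8);     // similarity threshold
        wl.addStopword("Dr");                                          // ignore this title
        wl.overrideTokens("Elizabeth", "Lisbeth", 0.90);               // word-level override
        wl.overrideNames("John Doe", "Joe Bloggs", 0.95);              // name-level override
        wl.addEntity("E1", "Marilyn Monroe", "Norma Jeane Mortensen"); // linked names
        wl.addEntity("E2", "Elizabeth Taylor");
        wl.addEntity("E3", "Joe Bloggs");
        System.out.println(wl.search("Norma Jeane Mortensen")); // [E1=1.0]
        System.out.println(wl.search("Dr. Lisbeth Taylor"));    // E2, score about 0.95
        System.out.println(wl.search("John Doe"));              // [E3=0.95]
    }
}
```

Queries for "Norma Jeane Mortensen" and "Marilyn Monroe" both return entity E1. "Dr. Lisbeth Taylor" reaches E2 only because "Dr" is ignored as a stopword and the Elizabeth/Lisbeth override supplies 0.9. "John Doe" returns E3 through the name-level override. Raising the threshold trades recall for precision.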
