system combination for hlt
Post on 16-Jul-2015
Challenges and Opportunities for HLT System Combination
David Murgatroyd @dmurga
VP, Engineering, Basis Technology
Outline
● Why Combine?
● What to Combine?
● When to Combine?
● Where to Combine?
● How to Combine?
[Diagram: Existing System, New System]
Added a name from a list
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
Querying the systems
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt
Should these match?
John Jacob Jingleheimer Schmidt v. John Jinglhiemer Schmidt
John: SAME
Jacob: (DELETED)
Jingleheimer v. Jinglhiemer: MINOR TYPOS
Schmidt: SAME
An Example
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt
Desire: Positive
An Example
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt 90
True Positive
Should these match?
John Jacob Jingleheimer Schmidt v. John Jinglhiemer Schmidt
John: SAME
Jacob: (DELETED)
Jingleheimer v. Jinglhiemer: CONFLICT
Schmidt: SAME
An Example
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt 90 75
False Negative
An Example
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt 90 75
John Cobby Schmidt 75 90
An Example
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt 90 75
John Cobby Schmidt 75 90
Juan Herrero
Should these match?
John Jacob Jingleheimer Schmidt v. Juan Herrero
John v. Juan: CLOSE COGNATE
Jacob: (DELETED)
Jingleheimer: (DELETED)
Schmidt v. Herrero: FAR COGNATE
Should these match?
John Jacob Jingleheimer Schmidt v. Juan Herrero
John v. Juan: CLOSE COGNATE
Jacob: (DELETED)
Jingleheimer: (DELETED)
Schmidt v. Herrero: CONFLICT
An Example
Existing System
John Jacob Jingleheimer Schmidt
New System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt 90 75
John Cobby Schmidt 75 90
Juan Herrero 75 85
False Positive
Combined System: New Fills Holes of Old
Combined System
John Jacob Jingleheimer Schmidt
John Jinglhiemer Schmidt match
John Cobby Schmidt match
Juan Herrero no-match
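The example above can be replayed in a few lines. This is a minimal sketch, not the presenter's implementation. The score pairs are copied from the slides, but the transcript does not say which column belongs to which system, so they are simply called `a` and `b` here. The single-system cutoff of 80 and the combination rule (match if either score is at least 90) are hypothetical values, chosen only because they reproduce the labels shown on the slides. The desired outcome for "John Cobby Schmidt" is assumed from the combined-system slide.

```python
# Score pairs as they appear on the slides. Which column belongs to the
# existing system and which to the new one is NOT stated in the transcript.
SCORES = {
    "John Jinglhiemer Schmidt": (90, 75),
    "John Cobby Schmidt": (75, 90),
    "Juan Herrero": (75, 85),
}

# Desired outcomes, read off the slide labels (assumption for "Cobby").
DESIRED = {
    "John Jinglhiemer Schmidt": True,
    "John Cobby Schmidt": True,
    "Juan Herrero": False,
}

SINGLE_CUTOFF = 80  # hypothetical: one system alone reports a match at >= 80
EITHER_CUTOFF = 90  # hypothetical: combined system matches if either is >= 90


def outcome(predicted: bool, desired: bool) -> str:
    """Name the confusion-matrix cell for one decision."""
    if predicted:
        return "true positive" if desired else "false positive"
    return "false negative" if desired else "true negative"


def single_system(score: int) -> bool:
    return score >= SINGLE_CUTOFF


def combined(a: int, b: int) -> bool:
    return max(a, b) >= EITHER_CUTOFF


def report() -> dict:
    """Per name: how each system alone fares, and what the combination says."""
    rows = {}
    for name, (a, b) in SCORES.items():
        want = DESIRED[name]
        rows[name] = {
            "a": outcome(single_system(a), want),
            "b": outcome(single_system(b), want),
            "combined": "match" if combined(a, b) else "no-match",
        }
    return rows


if __name__ == "__main__":
    for name, row in report().items():
        print(name, row)
```

Under these assumed cutoffs each system alone makes at least one error, while the combination returns the match / match / no-match outcome from the slide.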
Why Combine
● Reduce errors by using newer technology
● Minimize risk of destabilizing system
What to Combine
● address the same task
● new system should take a different approach
● old system improvement not feasible
“Cobby” v. “Jacob”
“Jinglhiemer” v. “Jingleheimer”
What to Combine (cont’d)
● systems with rich output for rich combination
● new adaptable to compensate for existing
● new can be integrated like existing
[Diagram: Existing System, New System; outputs: match / no match, 0 … 100]
When to Combine
● existing has known error types
● new easily turned on/off
● new’s effect can be reviewed without commitment
● budget for integration, resource and license costs
When to Combine (cont’d)
● Balance hits added vs. hits removed
● Keep workload of consumers the same
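One way to make "hits added vs. hits removed" concrete is to diff the hit sets returned before and after the new system is switched on. The sketch below is an illustration only; the function name `hit_delta` and the `workload_change` field are hypothetical, not part of any product mentioned in the talk.

```python
def hit_delta(before, after):
    """Compare the hits returned before and after enabling the new system."""
    before, after = set(before), set(after)
    return {
        "added": sorted(after - before),      # hits only the combined setup returns
        "removed": sorted(before - after),    # hits the combined setup suppresses
        "workload_change": len(after) - len(before),  # net items to review
    }
```

A `workload_change` of zero means the people reviewing hits see the same number of items as before, even though the set of hits has changed.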
Where to Insert New System?
● In parallel on all inputs
[Diagram: Existing System and New System in parallel; input JJS, JCS; each system receives JJS, JCS; output JJS, JCS]
Where to Combine
● In parallel on all inputs
● In parallel on some inputs
[Diagram: Existing System and New System in parallel; input JJS, JCS; one system receives JJS, the other JJS, JCS; output JJS, JCS]
Where to Combine
● In parallel on all inputs
● In parallel on some inputs
● In series as a post-filter for false positive suppression
[Diagram: Existing System and New System in series; JJS, JCS, JH → JCS, JH → JCS]
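The three placements listed above can be sketched as follows. This is an illustration under stated assumptions, not the architecture of any particular product: each system is modeled as a hypothetical scorer function taking a query and a candidate, and the names `parallel_all`, `parallel_some`, `series_post_filter`, `use_new`, and `veto_below` are invented for this sketch.

```python
from typing import Callable, Iterable, List

# Assumed interface: a scorer takes (query, candidate) and returns a score.
Scorer = Callable[[str, str], float]


def parallel_all(query: str, candidates: Iterable[str],
                 existing: Scorer, new: Scorer, threshold: float) -> List[str]:
    """Both systems score every candidate; a hit from either one is kept."""
    return [c for c in candidates
            if existing(query, c) >= threshold or new(query, c) >= threshold]


def parallel_some(query: str, candidates: Iterable[str],
                  existing: Scorer, new: Scorer, threshold: float,
                  use_new: Callable[[str], bool]) -> List[str]:
    """Existing scores everything; new only scores inputs chosen by use_new."""
    hits = []
    for c in candidates:
        hit = existing(query, c) >= threshold
        if not hit and use_new(c):
            hit = new(query, c) >= threshold
        if hit:
            hits.append(c)
    return hits


def series_post_filter(query: str, candidates: Iterable[str],
                       existing: Scorer, new: Scorer,
                       threshold: float, veto_below: float) -> List[str]:
    """Existing proposes hits; new sees only those hits and may suppress them."""
    proposed = [c for c in candidates if existing(query, c) >= threshold]
    return [c for c in proposed if new(query, c) >= veto_below]
```

Note the design trade-off: the series layout can only remove hits, so it addresses false positives but cannot recover anything the existing system missed, while the parallel layouts can add hits as well.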
How to Derive a Decision from Results?
● Experimentation to produce hand-tuned rules, e.g.,
  ● if Old or New > 0.95: MATCH
  ● if Old and New > 0.85: MATCH
  ● else: NO-MATCH
● Annotation to produce a machine-learned model
  ● Need hand-annotated data and measurement discipline (separate tuning & testing data sets)
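The hand-tuned rules above translate directly into a small function. This sketch assumes both systems report scores on a 0 to 1 scale (the earlier example slides use 0 to 100, so those scores would need dividing by 100 first). The function name `decide` and its parameter names are hypothetical; the two thresholds are the ones given on the slide.

```python
def decide(old: float, new: float,
           either: float = 0.95, both: float = 0.85) -> str:
    """Combine two match scores with the slide's hand-tuned rules."""
    # Rule 1: one very confident system is enough.
    if old > either or new > either:
        return "MATCH"
    # Rule 2: two moderately confident systems agree.
    if old > both and new > both:
        return "MATCH"
    # Otherwise reject.
    return "NO-MATCH"
```

For example, `decide(0.96, 0.10)` returns `"MATCH"` through the first rule, `decide(0.90, 0.90)` returns `"MATCH"` through the second, and `decide(0.90, 0.80)` returns `"NO-MATCH"`. Exposing the thresholds as parameters keeps them easy to retune on a tuning set while reporting results on a separate test set.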
Any Questions? Some suggestions...
● Have others combined name matchers?
● How much error reduction is targeted?
● What other HLT tasks have used combined systems?
● What if I don’t have lots of annotated data to measure with?
● Is there academic literature on this?
● How can Basis’s name matcher (RNI) be adapted to different use cases?