© Nube Technologies
Real Time Fuzzy Matching With Spark and ElasticSearch
About Us
The only way to do great work is to love what you do.
- Steve Jobs
The problem - lake or swamp?
Duplicates
Challenges
● Quadratic problem
● No standard notion of similarity
● Omissions, typos and other issues
● Different languages
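To make the first two challenges concrete, here is a minimal Python sketch. It is not taken from the slides: `pair_count`, `trigrams` and `jaccard` are hypothetical helper names, and character-trigram Jaccard is only one of many possible similarity measures, chosen here for illustration.

```python
def pair_count(n: int) -> int:
    """Unordered record pairs a naive all-against-all comparison must check."""
    return n * (n - 1) // 2


def trigrams(text: str) -> set:
    """Character trigrams of a lowercased, space-padded string."""
    padded = "  " + text.lower().strip() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: str, b: str) -> float:
    """Share of trigrams two strings have in common (1.0 = identical sets)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # Pair counts grow quadratically with the number of records.
    for n in (1_000, 1_000_000):
        print(n, "records ->", pair_count(n), "pairs")

    # A typo keeps the score well above that of an unrelated name.
    print(jaccard("Jon Smith", "John Smith"))
    print(jaccard("Jon Smith", "Mary Jones"))
```

Any threshold applied to such a score is a manual rule, and a measure tuned for one language or field rarely carries over to another.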
Use Case - Customer Record Dedup
Use Case - Shopping Site Comparison
Other Use Cases
● Cross selling
● Financial Credit Ratings
● Fraud Analytics
● Catalog and inventory management
● Household and individual level analytics
Let's start wishing...
● Data variety
● Scalable
● No manual configuration of rules or algorithms
● Multi language
● Real time
Reifier - learn
Real Time
Spark + ElasticSearch
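The slides do not show how the real-time lookup is wired up. As a hedged illustration only, the sketch below builds the body of an Elasticsearch `match` query with `fuzziness` set to `AUTO`, which is one common way to retrieve candidate matches for an incoming record. The function name `fuzzy_match_query`, the field name `name` and the default `size` are assumptions made for this example, not details from the talk.

```python
import json


def fuzzy_match_query(field: str, value: str, size: int = 5) -> dict:
    """Build an Elasticsearch search body for a typo-tolerant match.

    `field` and `size` are illustrative assumptions; the slides do not
    name an index layout or a result limit.
    """
    return {
        "size": size,  # return only the top candidates
        "query": {
            "match": {
                field: {
                    "query": value,
                    # Edit distance allowed per term, scaled by term length
                    "fuzziness": "AUTO",
                }
            }
        },
    }


if __name__ == "__main__":
    # The resulting JSON would be sent to an index's _search endpoint.
    print(json.dumps(fuzzy_match_query("name", "Jon Smith"), indent=2))
```

In a setup like this, the search would narrow an incoming record down to a handful of candidates, leaving the final match decision to a separate scoring step.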
Spark Benefits
● Distributed
● Scalable
● Fast
● Machine Learning
● Sampling
● No need to orchestrate multiple jobs
Thank You!
www.nubetech.co [email protected]