Reifier Spark Summit 2014 slides
DESCRIPTION
Presentation on Reifier, fuzzy matching using Apache Spark and machine learning
TRANSCRIPT
© Nube Technologies
Fuzzy Matching With Spark
About Us
ALICE: This is impossible!
THE MAD HATTER: Only if you believe it is.
The problem
According to Gartner, businesses are losing up to 25% of their potential revenue due to the lack of a holistic, multichannel view of their data.
Challenges
● Quadratic nature of the problem
● No standard notion of similarity
● Omissions, typos and other issues
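The first two challenges can be made concrete with a small sketch. It is not taken from the slides and does not show how Reifier works: the record strings are invented, and the two similarity measures (a character-level ratio from Python's `difflib` and a word-set Jaccard overlap) are simply two common choices picked for illustration. The sketch counts the pairs a naive all-against-all comparison would examine, and shows that different measures can disagree on the same pair.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered record pairs a naive all-vs-all comparison examines."""
    return n * (n - 1) // 2


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity from difflib, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Word-set overlap (Jaccard index), in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical records, invented for illustration only.
records = [
    "Nube Technologies",
    "Nube Technologes",   # typo
    "Technologies Nube",  # reordered words
    "Acme Corp",
]

# Every unordered pair is scored; this is the quadratic part.
for a, b in combinations(records, 2):
    print(f"{a!r} vs {b!r}: "
          f"char={char_similarity(a, b):.2f} "
          f"jaccard={token_jaccard(a, b):.2f}")

# Pair counts grow with the square of the record count.
for n in (1_000, 1_000_000):
    print(n, "records ->", pair_count(n), "pairs")
```

In this sketch the typo pair scores high on the character measure but low on the word measure, while the reordered pair does the opposite, which is one way to read "no standard notion of similarity".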
Use case - Cross and Upselling
Lead Generation
BFSI
● Personal Credit Ratings
● Fraud detection
Other Use Cases
● Yellow Pages
● Catalog and Inventory Management
Wishlist
● Works with any kind of data
● Scalable
● No manual configuration of rules or algorithms
Spark Advantages
● Distributed
● Scalable
● In memory
● Machine Learning
● Sampling
● No need to orchestrate multiple jobs
Reifier - Label
Are these duplicates?(Y/N)
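The slide shows only the prompt, so what follows is a guess at the general shape of such a labeling step, not a description of Reifier itself. As a minimal sketch, assuming candidate pairs are shown one at a time and the Y/N replies are collected as labeled examples, the bookkeeping might look like this. The names `parse_answer` and `collect_labels`, the candidate pairs, and the replies are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, str]


def parse_answer(answer: str) -> Optional[bool]:
    """Map a Y/N reply to True/False; return None for anything else."""
    cleaned = answer.strip().lower()
    if cleaned in ("y", "yes"):
        return True
    if cleaned in ("n", "no"):
        return False
    return None


def collect_labels(pairs: Iterable[Pair],
                   answers: Iterable[str]) -> List[Tuple[Pair, bool]]:
    """Attach each parsed reply to its pair, skipping unparseable replies."""
    labeled = []
    for pair, answer in zip(pairs, answers):
        label = parse_answer(answer)
        if label is not None:
            labeled.append((pair, label))
    return labeled


# Hypothetical candidate pairs and replies, invented for illustration only.
candidates = [
    ("Nube Technologies", "Nube Technologes"),
    ("Nube Technologies", "Acme Corp"),
    ("Acme Corp", "ACME Corporation"),
]
replies = ["Y", "n", "maybe"]  # the last reply is neither Y nor N

labels = collect_labels(candidates, replies)
for (left, right), is_duplicate in labels:
    print(f"{left!r} / {right!r} -> duplicate={is_duplicate}")
```

Replies that are neither Y nor N are dropped here rather than guessed at; an interactive tool would more likely ask again.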
Reifier Output
Thank You!