#abbyysummit15 (1/10): the future of information, the explosion of unstructured data

© 2015 Experian Information Solutions, Inc. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Information Solutions, Inc. Other product and company names mentioned herein are the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian.

The Future of Information

Kevin Chen
Chief Data Scientist
Experian DataLabs, North America

Upload: abbyy-usa

Post on 08-Feb-2017


Category: Software





People go to

but NOT

Like? _________________


Explosion of Unstructured Data

•  800%+ growth in data volume within the next 5 years

•  The amount of unstructured data is growing 62% faster

•  80% of data will be unstructured in 2019

Organizations have little awareness of the volume, composition, risk and business value of their unstructured data (Gartner)


Big Data Analytics

Information is the oil of the 21st century, and analytics is the combustion engine – Peter Sondergaard


Living in a World of Unstructured Data

Structured data:

•  Well-studied

•  Columnar + relational

•  Interval/categorical/ordinal

Unstructured data:

•  Diverse types, many inputs

•  Text, audio, image, video, metadata, health records, etc.

•  Need to be able to search, compare, understand, and predict


Challenges and Opportunities

Key Questions:

•  How do we capture and understand unstructured data?

•  How do we represent unstructured data (words, sentences, phrases, concepts, objects) and use it in predictive modeling?

•  What are the applications?
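One common way to approach the representation question is to turn each piece of text into a numeric vector that a predictive model can consume. The sketch below is a minimal illustration, not the method used in the talk: it assumes a bag-of-words model with smoothed TF-IDF weights, and the sample merchant descriptions and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty text
        vectors.append([
            (counts[word] / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word in vocabulary
        ])
    return vocabulary, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical merchant descriptions, used only for illustration.
documents = [
    "Coffee shop downtown",
    "Coffee and espresso bar",
    "Auto repair garage",
]
vocabulary, vectors = tfidf_vectors(documents)
coffee_vs_coffee = cosine_similarity(vectors[0], vectors[1])
coffee_vs_garage = cosine_similarity(vectors[0], vectors[2])
print(coffee_vs_coffee > coffee_vs_garage)
```

Once text is in vector form, the same vectors can feed search (nearest neighbors), comparison (cosine similarity, as here), or a downstream predictive model. Learned embeddings are a frequent alternative to TF-IDF when word order or meaning matters.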


Machine Learning on Unstructured Data: Automatic Image Captioning

Google, Microsoft, Baidu, Stanford, Berkeley, CMU & UCLA all reported significant progress in automatic image captioning using deep learning in 2014


Transforming big data

Experian breathes big data

•  19 credit information and 13 business information bureaus

•  Credit data on 600 million consumers & 60 million businesses

•  Demographic data on 260+ million households

•  Online behavior data for 25 million users across 5 million websites


Experian DataLabs

Understanding consumer behavior through big data analytics

Debit/Credit Card Transaction Data

Social Media Data

Mobile/Geolocation Information

Business Entity Data

Credit Bureau Data

Online Behavior


Understanding Consumer Transactions Using Machine Learning

Structured & Unstructured Machine Learning Algorithms

Behavior Profiles & Lifestyle Segments

Merchant Name

Merchant Location

Product / SKU Purchased

$ Amount

Transaction Time

Card Present?


Customer Lifestyle Segmentation
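As a rough illustration of how transaction records could be turned into behavior profiles and then into segments, the sketch below builds a spend-share profile per customer and groups the profiles with a small k-means loop. The slides do not name the algorithm actually used, so k-means is an assumption here, and the transactions, category names, and starting centroids are all made up for the example.

```python
from collections import defaultdict

# Hypothetical transactions: (customer_id, merchant_category, amount in dollars).
TRANSACTIONS = [
    ("c1", "travel", 900.0), ("c1", "dining", 100.0),
    ("c2", "travel", 700.0), ("c2", "dining", 300.0),
    ("c3", "grocery", 450.0), ("c3", "dining", 50.0),
    ("c4", "grocery", 400.0), ("c4", "travel", 100.0),
]
CATEGORIES = ["travel", "dining", "grocery"]


def spend_profiles(transactions, categories):
    """Map each customer to the share of their total spend in each category."""
    totals = defaultdict(lambda: defaultdict(float))
    for customer, category, amount in transactions:
        totals[customer][category] += amount
    profiles = {}
    for customer, by_category in totals.items():
        total = sum(by_category.values())
        profiles[customer] = [by_category[c] / total for c in categories]
    return profiles


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


profiles = spend_profiles(TRANSACTIONS, CATEGORIES)
customers = sorted(profiles)
points = [profiles[c] for c in customers]
# Assumption: seed with two customers' profiles so the run is deterministic.
labels, centroids = kmeans(points, [points[0], points[2]])
segments = dict(zip(customers, labels))
print(segments)
```

In this toy data the two travel-heavy customers fall into one segment and the two grocery-heavy customers into the other. A real pipeline would use many more features (merchant location, transaction time, card-present flag) and would choose the number of segments from the data.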


Applications

Opportunity

•  Offer card best reflecting customer spend choices

Targeting the right card for a customer based on lifestyle segmentation promotes spend and deeper customer loyalty

Current card


Future of Hadoop - Spark

From Hadoop Stack to Spark Stack

•  Comprehensive support for ETL, SQL, Machine Learning, Graphs, and Streaming

•  Faster with in-memory calculation

•  Easier to use with flexible APIs such as join, union, intersection, etc.

•  Tight integration with Python and R

•  Extensive machine-learning/data-mining libraries


The Future of Information Is Here

Data Scientists

Structured & Unstructured Data

Big Data Analytics Platform

Machine Learning


Questions
