a market place where everyone is exchanging idea and...

Post on 09-Jun-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Assoc. Prof. Dr. Tiranee Achalakul

A c a d e m i c D i r e c t o r

A market place where everyone is exchanging

idea and practice around Big data technology; to

create a practical realization of Big data in

Thailand

BIG DATA

BIG DATA

VOLUME

Online & Offline

Automatically generated

Manually created

VARIETY

Unstructured

Structured

VELOCITY

Speed of Generation

Rate of Analysis

VERACITY

Untrusted

Uncleansed

Unclear

Behavioral data - server logs,

clickstream, ATM log

Images & sounds -

photographs, videos, Google

street views images, medical

images, handwriting images,

voice recordings

Languages - text messages,

tweets, web content

Records - medical records,

large automated survey, tax

Sensors - temperature,

accelerometer, geolocation

BIG !

Ref: Mushroom Networks, Deep Blue Analytics, MBAOnline, IBM, Gartner

Data from 2014

Predictive analytics mine data for insights that lead to profitable actions.

Raw Data

(External + Internal)

Aggregated Data

Intelligence

Insights

Decisions

Operational Impact

Financial Outcomes

Value Creation

Data Scientists Business Personnel

BIG DATA ADOPTION GOALS

Creates new revenue sources

Improves operational efficiency

and drive productivity

Business sustainability

through customer

satisfaction

Improves profit through cost

reduction

BIG DATA ANALYTICS

A set of fundamental concepts/principles that underlie techniques for

extracting useful knowledge from large datasets containing a variety of

data types. To uncover hidden patterns, unknown correlations, market

trends, customer preferences, and other useful business information

THE ANALYTICS

Big data provides fact-based, math-based decision support (Data-Driven Decision)

Descriptive Analytics

tell you what happened;

how many; how often;

where critical events occur.

Predictive Analytics

project what will happen

next; provide indications

of possible outcomes if

trends continue.

Prescriptive Analytics

Synthesizes big data to

make predictions and then

suggests decision options to

take advantage of the

predictions

http://fisherdaniel.com/

Analytics Capability

Pro

du

ct C

apab

ility

BIG DATA MATURITY

BUSINESS PROBLEMS AND DATA SCIENCE SOLUTIONS

DATA DRIVEN BUSINESS

• Data driven business decision making problem is unique

• But… there are sets of common tasks that underlie most problems.

• Data scientist critical skills

Decompose a business problem into subtasks

Recognize familiar problems and their solutions

(Don’t try to re-invent the wheel)

Identify subtasks that have not been automated

and use creativity and intelligence to design a process

1. Classification

predict, for each individual in a population, which of a set of classes this individual

belongs to.

• Among the customers of Telco, which are likely to respond to a given offer ?

(Classes: will respond, will not respond)

2. Regression

produce a model that, given an individual, estimates the value of the particular

variable specific to that individual.

• How much will a given customer use the service? (variable: service usage)

3. Similarity matching

identify similar individuals based on data know about them.

Similarity underlie solutions to other tasks.

• Finding people who are similar to you in terms of products they have purchased.

COMMON TASKS

4. Clustering

group individuals in a population by their similarity (not driven by any specific

purpose).

• Do our customers form natural groups or segments?

5. Co-occurrence grouping

find associations between entities based on transactions involving them.

• What items are commonly purchased together?

6. Profiling

characterize the typical behavior of an individual, group, or population.

• What is the typical cell phone usage of this customer segment ?

• Used to establish behavior norms for anomaly detection (fraud detection)

COMMON TASKS

7. Link Prediction

predict connections between data items (Link should exist at what strength)

• Social network based applications, such as, friends suggestions and

recommendations.

8. Data Reduction

take a large set of data and replace it with a smaller set that contains much of the

important information in the larger set.

COMMON TASKS

ANSWERING

BUSINESS QUESTIONS

• Who are the most profitable customers?

– A straightforward database query, if “profitable” can be defined clearly.

• Is there really a difference between the profitable customers and the

average customer?

– Statistical Hypothesis testing

• But who really are these customers? Can I characterize them?

– Automated pattern finding

• Will some new customer be profitable ? How much revenue can I

expect?

– Predictive model of profitability

DATA SCIENCE

PROCESS

Data Aggregation

Data Selection,

cleaning and

filtering

Exploratory

Analysis

Insight Derivation

Model Feasibility

Model Construction

Deployment

Prediction

Validation

Ref: https://practicalanalytics.wordpress.com/2015/05/25/big-data-analytics-use-cases/

KNOWLEDGE

Data MiningThe Computational process of discovering patterns in large data sets

involving methods at the intersection of statistics, machine learning, and

database systems.

Text MiningThe process of deriving high-quality information from text. High-quality

information is typically derived through the devising of patterns and

trends through means such as statistical pattern learning.

Machine LearningThe science of getting computers to learn from data without having to be

explicitly programmed by humans. Machine model can teach themselves

to grow and change when exposed to new data.

METHODS

MACHINE LEARNING Learn from data and make predictions about data by

using statistics to develop self learning algorithm

MACHINE LEARNING

Machine learning is surrounding you

• Google search

• Auto Facebook photo tagging

• Email Spamming

• Games

• Chat bot

• Recommender

Machine Learning

“The science of getting computers to learn from data

without having to be explicitly programmed by humans.”

ML BASIC UNDERSTANINDING

• It’s all about taking in the ‘input’, pushing out the

‘output’ prediction

– Example: given the robot’s sensor and camera input,

the algorithm pushes out the appropriate movement

command.

– Example: given the search engine terms as input, the

algorithm output predictions of what the person is

looking for.

• It’s all about letting computer learns what ‘input’

is associated to what ‘output’.

HOW IS MACHINE PERCEPTION DONE?

Image Vision features Detection

Images/video

Audio Audio features Speaker ID

Audio

Text

Text Text features

Text classification,

Machine translation,

Information retrieval, ....

Slide courtesy of Andrew Ng, Stanford University

Early Use Cases

Image Classification, Object Detection, Localization Face Recognition

Speech & Natural Language Processing

Medical Imaging & Interpretation

Seismic Imaging & Interpretation Recommendation

THERE ARE SO MAY MODEL AROUND

Ref: machinelearningmastery.com

GLOBAL DEMAND

• The global market for smart machines is growing by

almost 20% annually.

• The total worth of the intelligent systems is $6.2 bn in

2014 and will likely be $ 41.2 bn in 2024

TEXT MINING AND NLP Deriving high-quality information from text by devising of patterns

and trends through means such as statistical pattern learning.

TEXT CLASSIFICAITON &

CLUSTERING

• Classification To assign a document

to one or more classes or categories.

• Clustering: The application of

cluster analysis to textual documents

SENTIMENT ANALYSIS

• To determine the attitude of a writer with respect to

some topic or the overall contextual polarity of a

document.

• Widely applied to reviews and social media for a

variety of applications, ranging from marketing to

customer service

NAMED ENTITY RECOGNITION (NER)

To locate and classify named entities in text into pre-

defined categories such as:

• Names of persons

• Organizations

• Locations

• Expressions of times

• etc.

TOPIC DISCOVERY

• Characterizes document according to topicsDiscover topics mentioned about “ประชามต”ิ on the social network

Discover topics mentioned about “พร้อมเพย์” on the social network

“Brexit Impact”

https://quid.com/feed/brexit-immediate-impacts

INFLUENCER ANALYSIS

• An influencer is an individual who has above-average impact on a specific niche process.

• On the social network, a influencer can referred to the most shaping a discussion about a brand or topic.

CASE STUDIES

Some examples of data-driven business

CUSTOMER 360

• A holistic approach that takes into account all available

and meaningful information about each customer

– Drive better engagement

– More revenue and long-term loyalty.

• Combines data exploration, data governance, data

access, data integration and data analytics to harness

volume, variety, and velocity of data.

Past: meaningful view of the customer’s history. activity, interaction history, recent product views,process history, etc.

Present: key customer information (who they are and how they relate to your organization) and context of recent activities and interaction.

Future: actions that can be initiated to guide the future of the relationship. churn? up-sell or cross-sell opportunities ?

Accurate

customer

data

John Smith

J. Smith

Same customer with

different identities

Relationships

among

customer

J. Smith – life insurance

M. Smith (wife) – auto insurance

J. Sr. Smith (data) – heath insurance

Customer details

across lines of

business

J. Smith buys

Life, auto, heath,

travel, properties

Understanding of

multi channel

communications

J. Smith

Mobile, email, fax,

Post mail, etc.

Analyzing

customers social

behavior

J. Smith

Facebook, twitter, blog,

Pinterest, youtube, etc.

Ref:

Intense Technologies Limited

John Smith

BIG DATA AND MARKETING

Targeted market Campaign combinations

for effective up-selling

Recommendations

for cross-selling

Customer behavior analysis: the key to marketing in this digital

era is contacting customers just

• When they wish to be reached

• When they are in the right location

• Then, engaging them with personalized real-time offers.

Predicting

customer

churn

BIG DATA IN MEDIA

Huffington Post

is using data improve the

UX from real-time

dashboards, social trends,

recommendations,

personalization, headline

optimization, and content

efficiency.

The Financial Times

builds customer

“signatures” of digital

consumption in order to

understand content

preferences, increase

communication efficiency,

personalize offerings, and

deploy the right

touchpoints.

.

CNN

uses big data as an early

warning system for

breaking news. CNN

listens to its audiences

with technology that

summarizes how viewers

are consuming news in

real-time, and how major

data sets are presented as

journalism stories.

Channel 4 (USA)

uses data by connecting to their millions of registered viewers, which allows them to

segment viewers into groups, create personalized emails, suggest tailored content

recommendations and serve relevant advertising.

BIG DATA AND LEARNINGThe measurement, collection, analysis and reporting of data about learners and their contexts.

• Focuses on applying techniques at

larger scales in instructional systems.

Track what students know or does not know

Monitor student behaviors through level of

engagement

Track individual student performance in each

class through opinions and scores

Track course outcomes and student

achievements

• Questions that can be answered: When are students ready to move on to the

next topic

When is a student at risk to not completing a

course

What grade is a student likely to receive

without intervention

Should a student be referred to a counselor

for help

BIG DATA FOR HR

• Predictive analytics is used for talent

acquisition, retention, placement,

promotion, compensation, or workforce

and succession planning.

• Analyzing the skills and attributes of high

performers in the present, then build a

template with quality hiring factors for

future hires.

• Non-traditional data gathering sources

– Social media channels where prospective

candidates usually leave their digital ‘thought

prints’.

• Statistical analysis of productivity and

turnover

– The data showed that old indicators (such as

GPA and education) were far less critical to

performance and retention. Factors like

experience is much more important.Ref: Forbe

TURNOVER PATTERN ANALYSIS

• A company studied the turnover and retention behavior of

employees based on pay raises.

• Traditional approach

• Pay based on a normal curve.

• Top performers get slightly higher raises than second-tier and so on.

• Findings

• “normal distribution” curve of pay is a mistake.

• Employees in the 2nd and 3rd quintile of performance would stay

even if their raise was as low as 91% of average increases in their

job class.

• Top performers would leave unless they received 115-120% of the

average pay increase for their job class. Money should go here.

http://www.bersin.com/talent-analytics-journey

The data analysis is not

testing a simple hypothesis,

but the data are explored

with the hope that

something useful will be

discovered.

Principles and techniques for

understanding phenomena via the

analysis of data.

Accessing and

processing of

massive-scale data

flexibly and

efficiently with Big

Data technologies

DDD = practice of basing decision

on the analysis of data, rather

than intuition

TO BUILD A BIG DATA PLATFORM

Facebook.com / BigDataExperience

CONTACT

08 9789 5224, 09 8258 8261

contact@BigDataExperience.org

JOIN

www.BigDataExperience.org / join

MEET

www.BigDataExperience.org / request

top related