a market place where everyone is exchanging idea and...
Post on 09-Jun-2020
1 Views
Preview:
TRANSCRIPT
Assoc. Prof. Dr. Tiranee Achalakul
A c a d e m i c D i r e c t o r
A market place where everyone is exchanging
idea and practice around Big data technology; to
create a practical realization of Big data in
Thailand
BIG DATA
BIG DATA
VOLUME
Online & Offline
Automatically generated
Manually created
VARIETY
Unstructured
Structured
VELOCITY
Speed of Generation
Rate of Analysis
VERACITY
Untrusted
Uncleansed
Unclear
Behavioral data - server logs,
clickstream, ATM log
Images & sounds -
photographs, videos, Google
street views images, medical
images, handwriting images,
voice recordings
Languages - text messages,
tweets, web content
Records - medical records,
large automated survey, tax
Sensors - temperature,
accelerometer, geolocation
BIG !
Ref: Mushroom Networks, Deep Blue Analytics, MBAOnline, IBM, Gartner
Data from 2014
Predictive analytics mine data for insights that lead to profitable actions.
Raw Data
(External + Internal)
Aggregated Data
Intelligence
Insights
Decisions
Operational Impact
Financial Outcomes
Value Creation
Data Scientists Business Personnel
BIG DATA ADOPTION GOALS
Creates new revenue sources
Improves operational efficiency
and drive productivity
Business sustainability
through customer
satisfaction
Improves profit through cost
reduction
BIG DATA ANALYTICS
A set of fundamental concepts/principles that underlie techniques for
extracting useful knowledge from large datasets containing a variety of
data types. To uncover hidden patterns, unknown correlations, market
trends, customer preferences, and other useful business information
THE ANALYTICS
Big data provides fact-based, math-based decision support (Data-Driven Decision)
Descriptive Analytics
tell you what happened;
how many; how often;
where critical events occur.
Predictive Analytics
project what will happen
next; provide indications
of possible outcomes if
trends continue.
Prescriptive Analytics
Synthesizes big data to
make predictions and then
suggests decision options to
take advantage of the
predictions
http://fisherdaniel.com/
Analytics Capability
Pro
du
ct C
apab
ility
BIG DATA MATURITY
BUSINESS PROBLEMS AND DATA SCIENCE SOLUTIONS
DATA DRIVEN BUSINESS
• Data driven business decision making problem is unique
• But… there are sets of common tasks that underlie most problems.
• Data scientist critical skills
Decompose a business problem into subtasks
Recognize familiar problems and their solutions
(Don’t try to re-invent the wheel)
Identify subtasks that have not been automated
and use creativity and intelligence to design a process
1. Classification
predict, for each individual in a population, which of a set of classes this individual
belongs to.
• Among the customers of Telco, which are likely to respond to a given offer ?
(Classes: will respond, will not respond)
2. Regression
produce a model that, given an individual, estimates the value of the particular
variable specific to that individual.
• How much will a given customer use the service? (variable: service usage)
3. Similarity matching
identify similar individuals based on data know about them.
Similarity underlie solutions to other tasks.
• Finding people who are similar to you in terms of products they have purchased.
COMMON TASKS
4. Clustering
group individuals in a population by their similarity (not driven by any specific
purpose).
• Do our customers form natural groups or segments?
5. Co-occurrence grouping
find associations between entities based on transactions involving them.
• What items are commonly purchased together?
6. Profiling
characterize the typical behavior of an individual, group, or population.
• What is the typical cell phone usage of this customer segment ?
• Used to establish behavior norms for anomaly detection (fraud detection)
COMMON TASKS
7. Link Prediction
predict connections between data items (Link should exist at what strength)
• Social network based applications, such as, friends suggestions and
recommendations.
8. Data Reduction
take a large set of data and replace it with a smaller set that contains much of the
important information in the larger set.
COMMON TASKS
ANSWERING
BUSINESS QUESTIONS
• Who are the most profitable customers?
– A straightforward database query, if “profitable” can be defined clearly.
• Is there really a difference between the profitable customers and the
average customer?
– Statistical Hypothesis testing
• But who really are these customers? Can I characterize them?
– Automated pattern finding
• Will some new customer be profitable ? How much revenue can I
expect?
– Predictive model of profitability
DATA SCIENCE
PROCESS
Data Aggregation
Data Selection,
cleaning and
filtering
Exploratory
Analysis
Insight Derivation
Model Feasibility
Model Construction
Deployment
Prediction
Validation
Ref: https://practicalanalytics.wordpress.com/2015/05/25/big-data-analytics-use-cases/
KNOWLEDGE
Data MiningThe Computational process of discovering patterns in large data sets
involving methods at the intersection of statistics, machine learning, and
database systems.
Text MiningThe process of deriving high-quality information from text. High-quality
information is typically derived through the devising of patterns and
trends through means such as statistical pattern learning.
Machine LearningThe science of getting computers to learn from data without having to be
explicitly programmed by humans. Machine model can teach themselves
to grow and change when exposed to new data.
METHODS
MACHINE LEARNING Learn from data and make predictions about data by
using statistics to develop self learning algorithm
MACHINE LEARNING
Machine learning is surrounding you
• Google search
• Auto Facebook photo tagging
• Email Spamming
• Games
• Chat bot
• Recommender
Machine Learning
“The science of getting computers to learn from data
without having to be explicitly programmed by humans.”
ML BASIC UNDERSTANINDING
• It’s all about taking in the ‘input’, pushing out the
‘output’ prediction
– Example: given the robot’s sensor and camera input,
the algorithm pushes out the appropriate movement
command.
– Example: given the search engine terms as input, the
algorithm output predictions of what the person is
looking for.
• It’s all about letting computer learns what ‘input’
is associated to what ‘output’.
HOW IS MACHINE PERCEPTION DONE?
Image Vision features Detection
Images/video
Audio Audio features Speaker ID
Audio
Text
Text Text features
Text classification,
Machine translation,
Information retrieval, ....
Slide courtesy of Andrew Ng, Stanford University
Early Use Cases
Image Classification, Object Detection, Localization Face Recognition
Speech & Natural Language Processing
Medical Imaging & Interpretation
Seismic Imaging & Interpretation Recommendation
THERE ARE SO MAY MODEL AROUND
Ref: machinelearningmastery.com
GLOBAL DEMAND
• The global market for smart machines is growing by
almost 20% annually.
• The total worth of the intelligent systems is $6.2 bn in
2014 and will likely be $ 41.2 bn in 2024
TEXT MINING AND NLP Deriving high-quality information from text by devising of patterns
and trends through means such as statistical pattern learning.
TEXT CLASSIFICAITON &
CLUSTERING
• Classification To assign a document
to one or more classes or categories.
• Clustering: The application of
cluster analysis to textual documents
SENTIMENT ANALYSIS
• To determine the attitude of a writer with respect to
some topic or the overall contextual polarity of a
document.
• Widely applied to reviews and social media for a
variety of applications, ranging from marketing to
customer service
NAMED ENTITY RECOGNITION (NER)
To locate and classify named entities in text into pre-
defined categories such as:
• Names of persons
• Organizations
• Locations
• Expressions of times
• etc.
TOPIC DISCOVERY
• Characterizes document according to topicsDiscover topics mentioned about “ประชามต”ิ on the social network
Discover topics mentioned about “พร้อมเพย์” on the social network
“Brexit Impact”
https://quid.com/feed/brexit-immediate-impacts
INFLUENCER ANALYSIS
• An influencer is an individual who has above-average impact on a specific niche process.
• On the social network, a influencer can referred to the most shaping a discussion about a brand or topic.
CASE STUDIES
Some examples of data-driven business
CUSTOMER 360
• A holistic approach that takes into account all available
and meaningful information about each customer
– Drive better engagement
– More revenue and long-term loyalty.
• Combines data exploration, data governance, data
access, data integration and data analytics to harness
volume, variety, and velocity of data.
Past: meaningful view of the customer’s history. activity, interaction history, recent product views,process history, etc.
Present: key customer information (who they are and how they relate to your organization) and context of recent activities and interaction.
Future: actions that can be initiated to guide the future of the relationship. churn? up-sell or cross-sell opportunities ?
Accurate
customer
data
John Smith
J. Smith
Same customer with
different identities
Relationships
among
customer
J. Smith – life insurance
M. Smith (wife) – auto insurance
J. Sr. Smith (data) – heath insurance
Customer details
across lines of
business
J. Smith buys
Life, auto, heath,
travel, properties
Understanding of
multi channel
communications
J. Smith
Mobile, email, fax,
Post mail, etc.
Analyzing
customers social
behavior
J. Smith
Facebook, twitter, blog,
Pinterest, youtube, etc.
Ref:
Intense Technologies Limited
John Smith
BIG DATA AND MARKETING
Targeted market Campaign combinations
for effective up-selling
Recommendations
for cross-selling
Customer behavior analysis: the key to marketing in this digital
era is contacting customers just
• When they wish to be reached
• When they are in the right location
• Then, engaging them with personalized real-time offers.
Predicting
customer
churn
BIG DATA IN MEDIA
Huffington Post
is using data improve the
UX from real-time
dashboards, social trends,
recommendations,
personalization, headline
optimization, and content
efficiency.
The Financial Times
builds customer
“signatures” of digital
consumption in order to
understand content
preferences, increase
communication efficiency,
personalize offerings, and
deploy the right
touchpoints.
.
CNN
uses big data as an early
warning system for
breaking news. CNN
listens to its audiences
with technology that
summarizes how viewers
are consuming news in
real-time, and how major
data sets are presented as
journalism stories.
Channel 4 (USA)
uses data by connecting to their millions of registered viewers, which allows them to
segment viewers into groups, create personalized emails, suggest tailored content
recommendations and serve relevant advertising.
BIG DATA AND LEARNINGThe measurement, collection, analysis and reporting of data about learners and their contexts.
• Focuses on applying techniques at
larger scales in instructional systems.
Track what students know or does not know
Monitor student behaviors through level of
engagement
Track individual student performance in each
class through opinions and scores
Track course outcomes and student
achievements
• Questions that can be answered: When are students ready to move on to the
next topic
When is a student at risk to not completing a
course
What grade is a student likely to receive
without intervention
Should a student be referred to a counselor
for help
BIG DATA FOR HR
• Predictive analytics is used for talent
acquisition, retention, placement,
promotion, compensation, or workforce
and succession planning.
• Analyzing the skills and attributes of high
performers in the present, then build a
template with quality hiring factors for
future hires.
• Non-traditional data gathering sources
– Social media channels where prospective
candidates usually leave their digital ‘thought
prints’.
• Statistical analysis of productivity and
turnover
– The data showed that old indicators (such as
GPA and education) were far less critical to
performance and retention. Factors like
experience is much more important.Ref: Forbe
TURNOVER PATTERN ANALYSIS
• A company studied the turnover and retention behavior of
employees based on pay raises.
• Traditional approach
• Pay based on a normal curve.
• Top performers get slightly higher raises than second-tier and so on.
• Findings
• “normal distribution” curve of pay is a mistake.
• Employees in the 2nd and 3rd quintile of performance would stay
even if their raise was as low as 91% of average increases in their
job class.
• Top performers would leave unless they received 115-120% of the
average pay increase for their job class. Money should go here.
http://www.bersin.com/talent-analytics-journey
The data analysis is not
testing a simple hypothesis,
but the data are explored
with the hope that
something useful will be
discovered.
Principles and techniques for
understanding phenomena via the
analysis of data.
Accessing and
processing of
massive-scale data
flexibly and
efficiently with Big
Data technologies
DDD = practice of basing decision
on the analysis of data, rather
than intuition
TO BUILD A BIG DATA PLATFORM
Facebook.com / BigDataExperience
CONTACT
08 9789 5224, 09 8258 8261
contact@BigDataExperience.org
JOIN
www.BigDataExperience.org / join
MEET
www.BigDataExperience.org / request
top related