From data to business advantage
Rafal Lukawiecki, Strategic Consultant
Project Botticelli Ltd
@rafaldotnet
Objectives
The information herein is for informational purposes only and represents the opinions and views of Project
Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors.
Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
Portions © 2014 Project Botticelli Ltd & entire material © 2014 Microsoft Corp unless noted otherwise. Some
slides contain quotations from copyrighted materials by other authors, as individually attributed or as already
covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other
product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The
information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of
the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli
cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli
makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
Introduction to BI & Big Data
DAX
MDX
Data Mining
Excel BI
projectbotticelli.com/ppt
Register on
projectbotticelli.com
[data + analytics + people] @ speed
Microsoft data dividend formula
Microsoft transformation
Mobile-first Cloud-first
Data-driven
Water pumps
Outbreaks
Public domain picture. Wikimedia. http://commons.wikimedia.org/wiki/File:Snow-cholera-map.jpg
Transformative opportunity
Business analytics
Good BI is key to business analytics
Integrate + Cleanse: Power Query
Model + Enrich: Power Pivot
Visualise: Power View & Power Map
Query: Power BI Q&A
Share: Power BI Sites + SharePoint
Power BI for cloud collaboration and new experiences
Excel as the BI tool for everyone
Self-service, cloud
IT
SharePoint + SQL/APS
IT scalability & control
Excel user-driven
Corporate self-service, on-prem
Power BI for cloud collaboration and new experiences
Hybrid
IT
SharePoint + SQL/APS
IT scalability & control
Big data, or just complex data?
velocity
variety
complexity
volume
Data
interpreting
preparing
Today’s big data, tomorrow’s little data
Complexity vs. current capabilities
FAA International Flight Service Station, Honolulu, Hawaii, 1964 (Public Domain Image)
So… what is big data?
Machine learning
Machine learning = data mining?
Best together
Data wrangling (munging), retrieval
+ storage
Data mining & machine learning
Statistics
Big data
Domain: Common big data scenarios
Financial services: Modeling true risk; Threat analysis and fraud detection; Trade surveillance; Credit scoring and analysis
Media & Entertainment: Recommendation engines; Ad targeting; Search quality; Abuse and click fraud detection
Retail: Point of sales transaction analysis; Customer churn analysis; Sentiment analysis
Telecommunications: Customer churn prevention; Network performance optimization; Call Detail Record (CDR) analysis; Network failure prediction
Government: Cyber security (botnets, fraud); Traffic congestion and re-routing; Environmental monitoring; Antisocial monitoring via social media
Healthcare: Genomics research; Cancer research; Health pandemics early detection; Air quality monitoring
Do you need it?
Process
Understand & change
data
Discover patterns, build & validate models
Change business
People
Data expert
Data scientist
Domain expert
Start of an engagement
Data sucks, slightly
Are there any useful patterns?
Unclear business goals
Example: fraud
Does enough data show examples of fraud?
Are there predictable patterns of fraud?
Can we reduce fraud?
What is fraud?
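The first question on this slide, whether enough data shows examples of fraud, can be checked by counting labelled cases before any modelling starts. The following is a minimal sketch in Python 3, not a method from the talk. The `is_fraud` field name, the sample records, and the threshold of 50 positive examples are all illustrative assumptions.

```python
from collections import Counter


def label_summary(records, label="is_fraud", min_positives=50):
    """Count labelled cases and report whether positives reach a chosen minimum.

    `label` and `min_positives` are illustrative assumptions, not values
    taken from the presentation.
    """
    counts = Counter(bool(r[label]) for r in records)
    positives = counts[True]
    total = sum(counts.values())
    rate = positives / total if total else 0.0
    return {
        "total": total,
        "positives": positives,
        "rate": rate,
        "enough": positives >= min_positives,
    }


# Made-up data: 1,000 transactions, 12 of them labelled as fraud.
sample = [{"is_fraud": i < 12} for i in range(1000)]
summary = label_summary(sample)
print(summary)
```

With only a dozen labelled cases, the sketch reports that the assumed minimum is not met. That is the situation in which the remaining questions on the slide, starting with what counts as fraud, need answering first.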
In-house intelligence
Understand & change
data
Discover patterns, build & validate models
Change business
What tools do data scientists use?
Purple = data analyst role
SQL 42%!
#1 data science tool
Chart from "2013 Data Science Salary Survey" (ISBN 978-1-491-94914-6)
© 2014 O'Reilly Media, used with permission.
For more info, and great titles on data science, visit oreilly.com
My analytical toolkit at Project Botticelli
main tools
secondary
only if I can’t avoid it
might try
My toolkit (chronologically)
○ SQL Server
    ○ DB engine
    ○ SSAS for data mining
○ Excel
    ○ now + Power Query
○ R and RStudio
    ○ Stats
    ○ Great charts
    ○ Curve fitting
    ○ Rattle for data mining
○ Mahout in Hadoop
    ○ HDP, HDInsight, or just *nix Hadoop
    ○ Evaluating H2O + Spark now
○ Python 3
    ○ PyCharm IDE on OS X
    ○ Visual Studio with Microsoft Python tools on Windows
○ Azure ML
○ Data mining
Algorithm: Description
Decision Trees: Finds the odds of an outcome based on values in a training set, presents visually
Association Rules: Identifies relationships between cases
Clustering: Classifies cases into distinctive groups based on any attribute sets
Naïve Bayes: Clearly shows the differences in a particular variable for various data elements
Sequence Clustering: Groups or clusters data based on a sequence of previous events
Time Series: Analyzes and forecasts time-based data combining the power of ARTXP (developed by Microsoft Research) for short-term predictions with ARIMA for long-term accuracy.
Neural Nets: Seeks to uncover non-intuitive relationships in data
Linear Regression: Determines the relationship between columns in order to predict an outcome
Logistic Regression: Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
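To make the logistic regression entry concrete: the algorithm turns a weighted sum of input columns into a probability that the target column holds a given state. The following is a minimal sketch in Python 3. The weights, intercept, and column names are made up for illustration; this is not a trained model and not the SSAS implementation.

```python
import math


def predict_probability(row, weights, intercept):
    """Apply the logistic function to a linear combination of column values."""
    z = intercept + sum(weights[name] * row[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients; a real model would learn these from training data.
weights = {"amount_thousands": 0.8, "is_new_customer": 1.2}
intercept = -3.0

row = {"amount_thousands": 2.0, "is_new_customer": 1}
p = predict_probability(row, weights, intercept)
print(f"Probability of the target state: {p:.3f}")
```

Linear regression uses the same weighted sum without the logistic step, so it predicts a numeric outcome rather than a probability.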
Algorithm: Description
Random Forests: Like decision trees, but can be more accurate and more difficult to understand
Boosting: Like random forests, but using any other algorithm (not just decision trees); “boosts” model accuracy for less frequent items
Survival Analysis: Finds the risk of an outcome given periods of time
Ensemble: Combination of multiple models (in SQL or R)
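The ensemble idea, combining multiple models, can be sketched with a simple majority vote. The slide mentions doing this in SQL or R; the sketch below uses Python 3 purely for illustration. The three rule-based "models" and their field names are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(models, case):
    """Return the label predicted by most of the models for one case."""
    votes = [model(case) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical stand-in models; each maps a case to "fraud" or "ok".
def by_amount(case):
    return "fraud" if case["amount"] > 1000 else "ok"


def by_country(case):
    return "fraud" if case["country"] != case["home_country"] else "ok"


def by_hour(case):
    return "fraud" if case["hour"] < 6 else "ok"


models = [by_amount, by_country, by_hour]

large_night = {"amount": 1500, "country": "CZ", "home_country": "CZ", "hour": 3}
small_night = {"amount": 20, "country": "CZ", "home_country": "CZ", "hour": 3}

print(majority_vote(models, large_night))  # two of three models vote "fraud"
print(majority_vote(models, small_night))  # two of three models vote "ok"
```

An odd number of models with two possible labels avoids ties. Weighted votes or averaged probabilities are common alternatives.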
Microsoft tech for big data
Prebuilt & performance-tuned appliance
Linear scale-out to petabytes of data
MPP design & in-memory columnstore for 100x speed improvement
Dedicated region for Hadoop
Joining relational & non-relational data with PolyBase
Analytics Platform System (APS)
MPP SQL Server
Hadoop
PolyBase
Apache Hadoop distribution
Developed by Hortonworks & Microsoft
Integrated with Microsoft BI
Microsoft HDInsight
Part 1: the job
[Diagram labels: big, fast, or complex data; HDInsight; APS + PolyBase; Tabular, OLAP, SQL; interaction, exploration, reporting, visualisation]
Hadoop cluster
Yahoo! Hadoop cluster, about 2007.
Source: http://developer.yahoo.com. Picture used with permission.
Hadoop cluster
Buster Cluster, an early research project by Miles Osborne, University of Edinburgh, School of Informatics.
Picture used with permission.
http://homepages.inf.ed.ac.uk/miles/
Cloud rent-a-Hadoop-cluster, or:
“Supercomputer for cents”
Windows Azure HDInsight
Processing logic in HDInsight 3.0 & 3.1
Hadoop 2.2/2.4: Interactive, online, stream, or batch
(Hadoop 1.x was batch process only)
Hadoop data science
Collaborative filtering,
recommenders, clustering,
singular value decomposition,
parallel frequent pattern mining,
naive Bayes, decision tree
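As a rough illustration of the first items in this list, collaborative filtering and recommenders, here is a single-machine sketch of item co-occurrence counting in Python 3. It is not Mahout's API and does not run on Hadoop. The baskets and item names are made up, and the scoring rule (the sum of co-occurrence counts) is one simple choice among many.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets:
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, owned, top_n=2):
    """Rank items the user does not own by co-occurrence with owned items."""
    scores = defaultdict(int)
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


# Made-up purchase histories, one list of items per user.
baskets = [["A", "B", "C"], ["A", "B"], ["B", "C", "D"], ["A", "D"]]
counts = cooccurrence(baskets)
print(recommend(counts, {"A"}))  # ['B', 'C']
```

On a cluster, the pair counting is the part that gets distributed, because each basket can be processed independently and the counts summed afterwards.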
Part 2: the results
Azure ML = data science in
Turning data into advantage
Summary
projectbotticelli.com
BI video tutorials, PPTs, and articles
15% Off: 15PRAGUE2014
Valid until end of November 2014
Follow: @rafaldotnet
Email: [email protected]
Discover: rafal.net