Driving Business Transformations with Big Data Analytics
DAMA SouthWest Ohio September 13, 2012
Key Business Trends
• Mega Trends: Socialization, Collaboration, Gamification, Mobile
• Micro Trends: Micro-Segmentation, Advanced Analytics
copyright @Sixth Sense Advisors Inc 2012 3
Crowdsourcing & Collaboration
GoldCorp
Within 1 month:
• More than 1,000 virtual prospectors
• 50 countries
• 110 new targets, 50% previously unidentified
• 80% yielded gold
• $575,000 prize money
• 400Mb of data
• 55,000 acres
Within a few years:
• From a $100 million company into a $9 billion juggernaut
Collaboration & Gamification
Gamification
Peer-to-Peer Collaboration
Crowdsourcing
Game Changer
• To move from competitor to leader and create an undisputed market presence, companies need to create new and vibrant business models
• These business models need a lot of research, ideation and execution (read: data, data and more data)
• Companies that can harvest data efficiently and effectively will emerge as the winners of the game, ultimately changing the game.
What Does It Take
A Growing Trend
Requirement | Expectations | Reality
Speed | Speed of the Internet | Speed = Infra + Arch + Design
Accessibility | Accessibility of a Smartphone | BI Tool licenses & security
Usability | iPad / Mobility | Web-enabled BI Tool
Availability | Google Search | Data & Report Metadata
Delivery | Speed of questions | Methodology & Signoff
Data | Access to everything | Structured Data
Scalability | Cloud (Amazon) | Existing Infrastructure
Cost | Cell phone or Free WiFi | Millions
Expectations for BI are changing without anyone telling us
Long Tail
[Figure: The Old Way (Pareto Principle, or 80/20 rule) compared with the New Way (with a bigger, longer tail) when Web 2.0 is applied. Source: http://en.wikipedia.org/wiki/The_Long_Tail]
2008 US Presidential Elections
$32 million raised from 275,000 people who gave $100 or less
[Figure: high-dollar-value donors form a small constellation; low-dollar-value donors form a larger constellation. Source: http://en.wikipedia.org/wiki/The_Long_Tail]
Web 2.0 significantly increases total value contributed/received by aggregating the “long tail” of smaller value donors.
Long Tail Example
Brand Management
Big Data
The Buzz
Data Disruptions
Porter Competitive Model
State of Data Today
Future of Data
Big Data
Big Data can be defined as data that grows in volume, velocity, variety and complexity at an unprecedented pace. This growth and complexity create challenges for capture, storage, management, analysis and visualization with the typical BI tool stack.
Tapping into the data
Big Data existing across the enterprise that can be made available to business
Structured data used today
Today we do Big or Small compute with Small and Large structured data sets
Big Data will mean Big or Small compute with Big data sets, not always available in structured or semi-structured formats
Business Infrastructure
Analytics
• Analytics is the key visualization technique to analyze and monetize Big Data
• The field of analytics is resurging with the advent of Big Data:
  – Social Analytics
  – Sensor Analytics
  – Text Analytics
  – Deep Data Mining
• Analytics needs metadata for integration
• Applications:
  – Fraud Detection
  – Campaign Optimization
  – Demand and Supply Optimization
  – Forecast Optimization
What’s so Big about Big Data
Velocity, Volume, Variety, Complexity, Ambiguity
What do we collect
• Facebook has an average of 30 billion pieces of content added every month
• YouTube receives 24 hours of video every minute
• 5 billion mobile phones were in use in 2010
• A leading retailer in the UK collects 1.5 billion pieces of information to adjust prices and promotions
• Amazon.com: 30% of sales come from its recommendation engine
• A Boeing jet engine produces 20 TB/hour for engineers to examine in real time to make improvements
Potential Business Insights
• Trends
• Brand Identity & Management
• Consumer Education
• Competitive Intelligence
• Micro-Targeting
• Leverage crowdsourcing-driven innovation for better products and services (DELL, InnoCentive (SAP, P&G))
• eDiscovery (legal trends and patterns, financial fraud)
• Pharmaceutical Companies
  – Patient Education
  – Physician Enriched Content Management
  – Reduce Clinical Trial Cycles and Errors
  – Pharmacovigilance
• Financial
  – Fraud
  – Customer Management
• Manufacturing
  – Supply Chain Optimization
  – Track & Trace
  – Compliance
Base Graph Courtesy – Dr. Richard Hackathorn
Why DWBI Fails Repeatedly
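[Figure: business value versus time after a business situation arises. Value is lost through data latency (until the data is ready), analysis latency (until information is available) and decision latency (until a decision is made); together these make up the action time, or action distance.]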
Lost value = Sum(Latencies) + Opportunity Cost, where the latencies are the data, analysis and decision latencies
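The formula adds time (latencies) to money (opportunity cost), so applying it requires some way to price delay. The Python sketch below assumes, purely for illustration, that value decays at a constant hypothetical `value_per_hour` rate while the organization waits; the class, function names and numbers are made up and are not part of the original model.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Delays, in hours, between the business situation and the decision."""
    data_hours: float      # situation -> data is ready
    analysis_hours: float  # data ready -> information is available
    decision_hours: float  # information available -> decision is made

    def total_hours(self) -> float:
        # Action time (action distance) is the sum of the three latencies.
        return self.data_hours + self.analysis_hours + self.decision_hours


def lost_value(lat: Latencies, value_per_hour: float, opportunity_cost: float) -> float:
    """Hypothetical pricing: linear value decay over the total latency,
    plus a fixed opportunity cost."""
    return lat.total_hours() * value_per_hour + opportunity_cost


if __name__ == "__main__":
    lat = Latencies(data_hours=24.0, analysis_hours=8.0, decision_hours=16.0)
    print(lat.total_hours())                 # 48.0
    print(lost_value(lat, 500.0, 10_000.0))  # 34000.0
```

Shrinking any one latency reduces the total, which is the argument for attacking data, analysis and decision delays together rather than only speeding up the warehouse.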
The Data Landscape
Data Transformation
[Figure: transactional systems and ODSs feed the Enterprise Data Warehouse, which in turn feeds datamarts and analytical databases serving reports, dashboards, analytic models and other applications.]
ACID Kills
• Atomic – All of the work in a transaction completes (commit) or none of it completes
• Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
• Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
• Durable – The results of a committed transaction survive failures
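Atomicity and consistency can be seen in a few lines of Python using the standard-library `sqlite3` module. This is a minimal sketch with a hypothetical `accounts` table: a transfer that would overdraw an account raises an error, and the whole transaction, including the debit already applied, is rolled back. Isolation and durability are not exercised here.

```python
import sqlite3


def make_db():
    """In-memory demo database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as a single transaction."""
    # The connection's context manager commits if the block succeeds
    # and rolls back if an exception escapes it (atomicity).
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Consistency rule for this sketch: no negative balances.
            raise ValueError("insufficient balance")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def try_transfer(conn, src, dst, amount):
    """Return True if the transfer committed, False if it was rolled back."""
    try:
        transfer(conn, src, dst, amount)
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

After `try_transfer(conn, "A", "B", 500)` fails, account A still shows its earlier balance: the debit was undone along with everything else in the transaction. Guarantees like this are exactly what many NoSQL systems relax in exchange for scale.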
Big Data Scenarios: Examples
To: [email protected]
Dear Mr. Collins,
This email is in reference to my bank account which has been efficiently handled by your bank for more than five years. There has been no problem till date until last week the situation went out of the hand. I have deposited one of my high amount cheque to my bank account no: 65656512 which was to be credited same day but due to your staff carelessness it wasn’t done and because of this negligence my reputation in the market has been tarnished. Furthermore I had issued one payment cheque to the party which was showing bounced due to “Insufficient balance” just because my cheque didn’t make on time. My relationship with your bank has matured with the time and it’s a shame to tell you about this kind of services are not acceptable when it is question of somebody’s reputation. I hope you got my point and I am attaching a copy of the same for further rapid procedures and remit into my account in a day.
Yours sincerely
Daniel Carter
Ph: 564-009-2311
Big Data Text Example
• We often imply additional information in spoken language by the way we place stress on words.
• The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it (the stressed word is marked with asterisks):
  – "*I* never said she stole my money" – Someone else said it, but I didn't.
  – "I *never* said she stole my money" – I simply didn't ever say it.
  – "I never *said* she stole my money" – I might have implied it in some way, but I never explicitly said it.
  – "I never said *she* stole my money" – I said someone took it; I didn't say it was she.
  – "I never said she *stole* my money" – I just said she probably borrowed it.
  – "I never said she stole *my* money" – I said she stole someone else's money.
  – "I never said she stole my *money*" – I said she stole something, but not my money.
• Depending on which word the speaker stresses, this sentence could have several distinct meanings.
Example source: Wikipedia
Pattern Detection
Reduction Techniques: Backward Elimination, Forward Selection, Attribute Removal, Principal Components
Clustering Techniques: K-Means, Maximin, Agglomerative, Divisive, Regression
Classification Techniques: Naive Bayes, Neural Networks, Back Propagation, Recursive Splitting, K-Nearest Neighbor, Minimum Distance
Utilities: Accuracy Measures, Range Filters, K-Fold Cross Validation, Merge & Subset, Vector Magnitude
Examples:
• Text: OCR, machine, digital
• Face recognition, verification, retrieval
• Fingerprint recognition
• Speech recognition
• Medical diagnosis: X-ray, EKG analysis
• Machine diagnostics data
• Geological data
• Automated Target Recognition (ATR)
• Image segmentation and analysis (recognition from aerial or satellite photographs)
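As an illustration of one of the clustering techniques listed, here is a minimal sketch of K-Means (Lloyd's algorithm: alternately assign each point to its nearest centroid and move each centroid to the mean of its points) on 2-D points in plain Python. The function name, random initialization and iteration cap are choices made for this sketch; production work would typically rely on a library implementation such as scikit-learn's.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = kmeans(pts, k=2)
    print(sorted(centroids))
```

On two well-separated groups like these, the algorithm converges in a few iterations to one centroid per group; on real data the result depends on initialization and on choosing k well.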
So you are about to start the Big Data Project
[Figure: Tools, Instructions, Data, Output]
The Normal Way Results In…
Performance
+ New Data Types
+ New volume
+ New Analytics
+ New Data Retention
+ New Data Workloads
Putting a Ferrari engine in a Yugo does not make the fastest race car.
Big Data
• Workload Demands
  – Process dynamic data content
  – Process unstructured data
  – Systems that can scale up and scale out with high-volume data
  – Perform complex operations within reasonable response time
• Infrastructure Needs
  – Scalable platform
  – Database independence
  – Fault tolerance
  – Supported by standard toolsets
Data Warehouse Appliance
High Availability
Standard SQL Interface
Advanced Compression
MPP
Leverages existing BI, ETL and OLTP investments
Hadoop & MapReduce Interface / Embedded
Minimal disk I/O bottleneck; simultaneously load & query
Auto Database Management
• A Data Warehouse (DW) Appliance is an integrated set of servers, storage, OS, database and interconnect specifically preconfigured and tuned for the rigors of data warehousing.
• DW appliances offer an attractive price/performance value proposition and frequently cost a fraction of traditional data warehouse solutions.
Hadoop
Hadoop & RDBMS Analogy
Hadoop: Cargo train
• Rough
• Missing a lot of “luxury”
• Slow to accelerate
• Carries almost anything
• Moves a lot of stuff very efficiently
RDBMS: Sports car
• Refined
• Has a lot of features
• Accelerates very fast
• Pricey
• Expensive to maintain
* Original slide author: Amr Awadallah, Cloudera
NoSQL
• Stands for Not Only SQL
• Based on the CAP Theorem / BASE
• Usually does not require a fixed table schema, nor does it use the concept of joins
• All NoSQL offerings relax one or more of the ACID properties
• Scalable replication and distribution
  – Potentially thousands of machines
  – Potentially distributed around the world
• Queries need to return answers quickly
• Mostly queries, few updates
• Asynchronous inserts & updates
• NoSQL databases come in a variety of flavors:
  – XML (myXMLDB, Tamino, Sedna)
  – Wide Column (Cassandra, HBase, BigTable)
  – Key/Value (Redis, Memcached with BerkeleyDB)
  – Graph (neo4j, InfoGrid)
  – Document store (CouchDB, MongoDB)
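"Scalable replication and distribution" comes down to deciding which machines hold each key. The Python sketch below uses simple hash-modulo placement with a hypothetical replication factor of 3; the function names are illustrative only. Systems such as Amazon Dynamo and Cassandra instead use consistent hashing on a token ring, which moves far fewer keys when nodes join or leave.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Primary node for a key, using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a digest is used to keep placement identical across machines.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def replicas_for(key: str, num_nodes: int, replication_factor: int = 3) -> list:
    """Primary node plus the next (replication_factor - 1) nodes in ring order."""
    primary = node_for(key, num_nodes)
    copies = min(replication_factor, num_nodes)  # cannot exceed the cluster size
    return [(primary + i) % num_nodes for i in range(copies)]


if __name__ == "__main__":
    for key in ["customer:42", "order:1001", "sensor:7"]:
        print(key, replicas_for(key, num_nodes=5))
```

Any node can compute where a key lives without a central lookup, which is one reason these stores scale out; writes can then be propagated to the replicas asynchronously, trading strict ACID consistency for availability.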
NoSQL Footprint
[Figure: NoSQL systems plotted by size and complexity: Amazon Dynamo, Google BigTable, Cassandra, Lotus Notes, HBase, Voldemort, Graph Theory]
MapReduce
• Technique for indexing and searching large data volumes
• Two phases, Map and Reduce
• Map
  – Extract sets of key-value pairs from underlying data
  – Potentially in parallel on multiple machines
• Reduce
  – Merge and sort sets of key-value pairs
  – Results may be useful for other searches
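The two phases can be sketched in a single Python process with the classic word-count example. The `shuffle` function stands in for the grouping and sorting that a framework such as Hadoop performs between Map and Reduce; in a real cluster the map calls would run in parallel on multiple machines. This is a teaching sketch, not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map call is independent, so these could run on separate machines.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    # Keys are processed in sorted order, mirroring the sort before Reduce.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Big data big", "data lake"]))  # {'big': 2, 'data': 2, 'lake': 1}
```

The same shape (emit key-value pairs, group by key, aggregate per key) underlies indexing, log parsing and many of the other workloads discussed here.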
Forest Rim Technology – Textual ETL Engine (TETLE) – is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools
Textual ETL Engine
• Textual ETL Engine provides a robust user interface to define rules (or patterns/keywords) to process unstructured or semi-structured data.
• The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords
• Easy to implement and easy to realize ROI
• Advantages
  – Simple to use
  – No MapReduce or coding required for text analysis and mining
  – Extensible by taxonomy integration
  – Works on standard and new databases
  – Produces a highly columnar key-value store, ready for metadata integration
• Disadvantages
  – Not integrated with Hadoop as a rules interface
  – Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
  – The current GA release does not handle distributed processing outside the Windows platform
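To illustrate how simple rules can turn text into key-value structure, here is a small Python sketch applied to a fragment resembling the bank email example. The rules, attribute names and `extract` function are hypothetical; this is not the Textual ETL Engine's interface, which is configured through its own user interface rather than code.

```python
import re

# Hypothetical pattern rules: attribute name -> regex with one capture group.
PATTERN_RULES = {
    "account_number": re.compile(r"account\s+no:?\s*(\d+)", re.IGNORECASE),
    "phone": re.compile(r"\b(\d{3}-\d{3}-\d{4})\b"),
}

# Hypothetical keyword rules: attribute name -> keywords that flag it.
KEYWORD_RULES = {
    "complaint_term": ["bounced", "negligence", "carelessness"],
}


def extract(doc_id, text):
    """Turn free text into (doc_id, attribute, value) rows."""
    rows = []
    for attr, pattern in PATTERN_RULES.items():
        for match in pattern.finditer(text):
            rows.append((doc_id, attr, match.group(1)))
    lowered = text.lower()
    for attr, keywords in KEYWORD_RULES.items():
        for kw in keywords:
            if kw in lowered:
                rows.append((doc_id, attr, kw))
    return rows


if __name__ == "__main__":
    sample = "My account no: 65656512 was not credited, so a cheque bounced. Ph: 564-009-2311"
    for row in extract("email-1", sample):
        print(row)
```

Each output row is a narrow (document, attribute, value) record, which can be loaded into an ordinary relational table and joined with warehouse data using standard BI tools.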
Integration
• All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension:
  – Oracle Exalytics / Big Data Appliance
  – Teradata Aster Appliance
  – EMC Greenplum Appliance
  – IBM BigInsights
  – Microsoft Windows Azure Integration
• There are multiple providers of Hadoop distributions:
  – Cloudera
  – Hortonworks
  – Hadapt
  – Zettaset
  – IBM
• Adapters from vendors to interface with the Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms.
Conceptual Solution Architecture
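[Figure: conceptual architecture combining, and/or, OLTP sources loaded through ETL / ELT / CDC into the Data Warehouse and DataMarts with Big Data content (email, docs) processed through Textual ETL into a Big Data DW, with Metadata, Taxonomy and MDM as supporting components]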
Which Tool
Application Hadoop NoSQL Textual ETL
Machine Learning x x
Sentiments x x x
Text Processing x x x
Image Processing x x
Video Analytics x x
Log Parsing x x x
Collaborative Filtering x x x
Context Search x
Email & Content x
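Collaborative filtering can be illustrated as "customers who bought X also bought Y" counting. The Python below is a minimal, single-machine item co-occurrence sketch on hypothetical shopping baskets; at Big Data scale the same pair counting is commonly distributed (for example as a MapReduce job), and real recommendation engines use more elaborate scoring than raw counts.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Map each item to a Counter of the items bought together with it."""
    counts = {}
    for basket in baskets:
        # Count each unordered pair once per basket, in both directions.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(item, counts, top_n=2):
    """Items most often bought with `item`, most frequent first."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


if __name__ == "__main__":
    baskets = [
        ["tent", "stove", "lantern"],
        ["tent", "stove"],
        ["tent", "sleeping bag"],
        ["stove", "fuel"],
    ]
    counts = co_occurrence(baskets)
    print(recommend("tent", counts, top_n=1))  # ['stove']
```

No product attributes are needed: the recommendations come purely from the collective behavior in the baskets, which is why this approach benefits from very large data volumes.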
Integration Tips
• The key to the castle in integrating Big Data is metadata
• Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
• Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
• Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can help
• Third-party data providers also supply keywords, trending tags and scores; these can provide a lot of integration support
• Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
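One way a taxonomy helps with Big Data quality is by mapping the many spellings and synonyms found in raw text to a single canonical term before integration. The sketch below uses a tiny hypothetical drug-name taxonomy (acetaminophen is sold as Tylenol and known internationally as paracetamol); real taxonomies and ontologies are far larger and are usually maintained as managed metadata rather than in code.

```python
# Hypothetical taxonomy: canonical term -> variants seen in raw text.
TAXONOMY = {
    "acetaminophen": {"paracetamol", "tylenol", "apap"},
    "ibuprofen": {"advil", "motrin"},
}


def build_lookup(taxonomy):
    """Invert the taxonomy into a case-insensitive variant -> canonical map."""
    lookup = {}
    for canonical, variants in taxonomy.items():
        lookup[canonical.lower()] = canonical
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


def normalize(term, lookup):
    """Return the canonical term, or None when the taxonomy has no match."""
    return lookup.get(term.strip().lower())


if __name__ == "__main__":
    lookup = build_lookup(TAXONOMY)
    for raw in [" Tylenol ", "Motrin", "aspirin"]:
        print(repr(raw), "->", normalize(raw, lookup))
```

Unmatched terms come back as `None`, which makes gaps in the taxonomy visible instead of silently passing unnormalized values into the warehouse.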
Success Stories
• Machine Learning & Recommendation Engines – Amazon, Orbitz
• CRM – Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity
• Finance – Fraud, Compliance
• Telco – CDR, Fraud
• Healthcare – Provider / Patient analytics, fraud, proactive care
• Life Sciences – clinical analytics, physician outreach
• Pharma – Pharmacovigilance, clinical trials
• Insurance – fraud, geospatial
• Manufacturing – warranty analytics, supplier quality metrics
Big Data Challenges
• Integration with the EDW is still an open issue: Big Data reduces to small metrics, which brings back the same current-state issues faced with EDW data
• Big Data requires a lot of taxonomy processing, especially in content-related search
• Several applications need high-performance memory architectures because the processing is compute intensive, for example image processing of brain scans
• Technology is improving by the day, but integration and deployment are becoming equally complex.
Data Science
[Figure: Data Science as art and science, spanning Big Data processing & ETL, business intelligence, advanced analytics, and data analytics on content, customers, products, behaviors and optimization]
Business Analysts, Data Analysts, Metadata Architects and Data Architects are all at some evolutionary stage of becoming a Data Scientist
Summary
With effective use of Big Data and Analytics, you can:
• Drive successful business transformations
• Create an agile environment for business decision processes
• Use the Data Warehouse for analytical processes, as it was originally designed
• Create predictive insights
• Practically “mine (explore)” any data from any source
• Create powerful dashboards from near-real-time data
• Reduce risk
• Increase competitiveness
Contact
Krish Krishnan
Twitter - @datagenius