Using SAS/Hadoop to Support Marketing Analytics with Big Data
Kerem Tomak, VP, Marketing Analytics,
Macys.com
Agenda
• Who is the customer?
• Life and death of a customer
• Data galore
• Crystal Ball
• What matters the most…
Who is the customer?
• .com
• stores
Life of a customer
• Present value of all future profits obtained from a customer over the life of his or her relationship with a firm.
• The CLV of a customer i is the discounted value of the future profits yielded by this customer
• Where
  – CF_{i,t} = net cash flow generated by customer i’s activity at time t
  – h = time horizon for estimating the CLV
  – d = discount rate
• The CLV is the value an individual customer adds to the company
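Assuming the usual discounted-cash-flow form, CLV_i = Σ_{t=1}^{h} CF_{i,t} / (1 + d)^t, with discounting starting at t = 1 (an assumption; some formulations start at t = 0), the calculation can be sketched in Python as follows. The function name is hypothetical.

```python
def customer_lifetime_value(cash_flows, d):
    """Discounted sum of one customer's net cash flows CF_{i,t}.

    cash_flows[k] is CF_{i,t} for t = k + 1, so the first entry is already
    discounted by one period (assumed convention). The horizon h is
    len(cash_flows) and d is the per-period discount rate.
    """
    if d <= -1:
        raise ValueError("discount rate must be greater than -1")
    # Sum CF_{i,t} / (1 + d)^t for t = 1..h
    return sum(cf / (1 + d) ** t for t, cf in enumerate(cash_flows, start=1))


# Three periods of 100 in net cash flow at a 10% discount rate
print(round(customer_lifetime_value([100, 100, 100], 0.10), 2))
```

With d = 0 the CLV reduces to the plain sum of the cash flows; a higher d shrinks the weight of profits further in the future.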
Customer Lifetime Value
Why is CLV important?
• By knowing the CLV of its customers, a company can:
  – Focus on groups of customers of equal wealth
  – Evaluate the budget of a marketing campaign
  – Measure the efficiency of a past marketing campaign by evaluating the CLV change it incurred
  – Focus on the most valuable customers, who deserve to be followed closely
  – Neglect the less valuable ones, to which the company should pay less attention
  – Use CLV to introduce new segmentation opportunities
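The value-based segmentation described above (closely following the most valuable customers, paying less attention to the least valuable) can be sketched as a simple rank-based tiering in Python. The function name, the tier labels and the 20% / 50% cut-offs are hypothetical, chosen only for illustration; the presentation does not specify how tiers are defined.

```python
def segment_by_clv(clv_by_customer, top_share=0.2, bottom_share=0.5):
    """Assign each customer a 'high', 'mid' or 'low' tier by CLV rank.

    top_share and bottom_share are illustrative cut-offs (assumptions),
    expressed as fractions of the customer base.
    """
    # Highest CLV first; ties keep their original dictionary order
    ranked = sorted(clv_by_customer, key=clv_by_customer.get, reverse=True)
    n = len(ranked)
    n_top = int(n * top_share)
    n_bottom = int(n * bottom_share)
    tiers = {}
    for rank, customer in enumerate(ranked):
        if rank < n_top:
            tiers[customer] = "high"   # candidates for close follow-up
        elif rank >= n - n_bottom:
            tiers[customer] = "low"    # candidates for less attention
        else:
            tiers[customer] = "mid"
    return tiers


print(segment_by_clv({"a": 500, "b": 40, "c": 120, "d": 10, "e": 300}))
```

Ranking rather than fixed CLV thresholds keeps the tier sizes stable as overall spending levels change; a real program might instead cut on CLV values or on a clustering of customer attributes.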
Tapping into the data
• Data Storage
• Reporting
• Analytics
• Advanced Analytics
  – Computing with big datasets is a fundamentally different challenge than doing “big compute” over a small dataset
[Figure: utilized data vs. unutilized data that can be made available to the business]
Hadoop & RDBMS Analogy
Cargo train:
• rough
• missing a lot of “luxury”
• slow to accelerate
• carries almost anything
• moves a lot of stuff very efficiently
Sports car:
• refined
• has a lot of features
• accelerates very fast
• pricey
• expensive to maintain
RDBMS & Hadoop are like car & train: RDBMS is the sports car, Hadoop is the cargo train
RDBMS & Hadoop Comparison*
| | Traditional RDBMS (Oracle, DB2) | Hadoop |
|---|---|---|
| Maximum Data Capacity | Up to 100’s of TBs | Up to 10’s of PBs (hundreds of times more) |
| Processing Capacity | Up to 10’s of TBs | Up to 10’s of PBs (thousands of times more) |
| Costs | High software, license and hardware/storage costs | Cost effective: commodity hardware + open source software |
| Transactional | Yes | No (batch process) |
| Update Patterns | Supported | Not Supported Yet |
| Schema Complexity | Structured (tables only) | Structured or Unstructured |
| Processing Freedom | SQL | MapReduce, SQL (Hive), Streaming, Pig, HBase, etc. |
| Scalability | Non-linear scaling | Fully distributed and linearly scalable |
| Reliability | Fault-tolerant at high cost, but without self-healing by design | Fault-tolerant and self-healing by design |
| Real Time Response | Yes | No (HBase required) |
* Cloudera comparison chart
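To make the “MapReduce” and “Streaming” entries in the table concrete, here is a minimal Python sketch in the Hadoop Streaming style (tab-separated key/value lines, with the reducer receiving its input sorted by key) that totals net cash flow per customer, a natural input to the CLV calculation. The input layout `customer_id,order_id,amount` and all function names are hypothetical; `run_locally` only simulates the map, shuffle/sort and reduce steps in one process so the logic can be checked without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit 'customer_id<TAB>amount' for each transaction line.

    Assumes a hypothetical comma-separated record: customer_id,order_id,amount
    """
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip malformed records instead of failing the whole job
        customer_id, _order_id, amount = fields
        yield f"{customer_id}\t{amount}"


def reducer(sorted_lines):
    """Reduce phase: sum the amounts for each customer.

    Relies on the input being sorted by key, so all records for one
    customer arrive consecutively and groupby sees them as one group.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for customer_id, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(amount) for _, amount in group)
        yield f"{customer_id}\t{total:.2f}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


print(run_locally(["c1,o1,10.00", "c2,o2,5.50", "c1,o3,2.50", "bad line"]))
```

In an actual Hadoop Streaming job the mapper and reducer would typically be separate scripts reading from standard input and writing to standard output, with the framework performing the sort between them; the per-key work stays the same whether there are three records or billions.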
Crystal Ball
Source: Forrester
Toolshed
What matters the most
• Building data infrastructure
  – Fast processing of large amounts of data and deployment of model scoring in the same environment
• Business task execution
  – Real-time optimization for customized offer management
• Planning tools
  – Give analytical guidelines to campaign management
• Strategic support
  – Develop robust analytics that look at the customer’s environment
[Figure: “Making sense out of models” and “Deploying in production”]
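The points on deploying model scoring in the same environment and on real-time optimization for customized offer management can be illustrated with a small Python sketch: each candidate offer has its own logistic response model, the customer is scored against every offer, and the offer with the highest expected margin (response probability times margin) is chosen. The offer names, feature names, coefficients and margins are all hypothetical; the presentation does not describe the actual models used.

```python
import math

# Hypothetical per-offer logistic models; every name and number is illustrative.
OFFER_MODELS = {
    "free_shipping": {"intercept": -2.0, "coef": {"visits_30d": 0.10, "clv": 0.002}},
    "pct_off_20": {"intercept": -1.5, "coef": {"visits_30d": 0.05, "clv": 0.001}},
}
# Hypothetical margin earned if the customer responds to each offer.
OFFER_MARGIN = {"free_shipping": 8.0, "pct_off_20": 5.0}


def response_probability(model, features):
    """Score one customer with a logistic model: p = 1 / (1 + exp(-z))."""
    z = model["intercept"] + sum(
        weight * features.get(name, 0.0) for name, weight in model["coef"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def best_offer(features):
    """Pick the offer that maximizes expected margin = P(response) * margin."""
    return max(
        OFFER_MODELS,
        key=lambda offer: response_probability(OFFER_MODELS[offer], features)
        * OFFER_MARGIN[offer],
    )


print(best_offer({"visits_30d": 10, "clv": 500}))
```

Because scoring is just a function of a customer's features and a fixed set of coefficients, the same function could run inside a batch job over the full customer base or be called per request when a customer arrives.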