Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Upload: corinium-global

Post on 13-Feb-2017


Page 1: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Richard Langlois Search, Big Data & Analytics Director

May 2016

Walking before you can run –

Building a solid foundation for future success

Page 2: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

About Yellow Pages

(Canada)

Page 3: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


About Yellow Pages (YP) - Canada

• Yellow Pages is a leading digital media and marketing solutions company in Canada

• We serve approximately 245 000 (Q4, 2015) local businesses through our nationwide sales force of media consultants

• YP also caters to the country’s largest national agencies and advertisers through Mediative, its national digital advertising and marketing solutions division


CAO Forum – West Coast

Page 4: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


The Building Blocks of Relevance

What is Content?

• Advertiser Content (merchants, products, deals …)
• Editorial Content (smart tips, top ten lists, …)
• User-Generated Content (reviews, ratings …)

Role: Provide the data that the user is looking for

What is User Data?

• User Characteristics (demographics, preferences, …)
• User Context (location, time, device, …)
• User Behavior (origin, session actions, …)

Role: Understand the user’s bias, intentions and propensity

What is Knowledge?

• Linguistic (synonyms, acronyms, multilingual, …)
• Geo Data (location names, polygons, …)

Role: Help the Search Engine to bridge the gap between user queries and content data (i.e., natural terms and language)


Advertisers Looking for Qualified Consumers

Consumers Looking for the “Best Merchant & Product”

YP’s Business Model is a Multi-Sided Platform

Page 5: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


The Building Blocks of Relevance

What is Search?

An engine that:
1) interprets user queries
2) extracts the proper result set
3) ranks the results

Role: It is at the heart of the user search process

What is Presentation?

An engine that builds the presentation layout based on the semantic value of content

Example: “Movie Theater” vs “Restaurant” Layouts

Role: Ensures that the user quickly gets the most out of the content presented and enhances conversion

What is Optimization?

Scenario Management, Tracking and Analytics

Provides the ability to test and measure the results of variations applied in the other building blocks

Role: Provides the “empirical / scientific” tools to optimize relevance, user engagement and conversion
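The three search steps (interpret the query, extract the result set, rank the results) can be illustrated with a minimal sketch. This is not YP's engine: the synonym table, the `Merchant` fields and the rank-by-rating rule are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical linguistic knowledge: maps a user term to a canonical content term.
SYNONYMS: Dict[str, str] = {
    "cinema": "movie theater",
    "movies": "movie theater",
    "resto": "restaurant",
}


@dataclass(frozen=True)
class Merchant:
    name: str
    category: str
    rating: float  # hypothetical average user rating (user-generated content)


def interpret(query: str) -> str:
    """Step 1: normalize the query and map it to a canonical term."""
    term = query.strip().lower()
    return SYNONYMS.get(term, term)


def extract(term: str, merchants: List[Merchant]) -> List[Merchant]:
    """Step 2: keep only the merchants whose category matches the term."""
    return [m for m in merchants if m.category == term]


def rank(results: List[Merchant]) -> List[Merchant]:
    """Step 3: order the result set, here simply by rating, highest first."""
    return sorted(results, key=lambda m: m.rating, reverse=True)


def search(query: str, merchants: List[Merchant]) -> List[Merchant]:
    return rank(extract(interpret(query), merchants))


MERCHANTS = [
    Merchant("Cine Plus", "movie theater", 4.1),
    Merchant("Chez Marie", "restaurant", 4.6),
    Merchant("Grand Screen", "movie theater", 4.7),
]
```

Calling `search(" Cinema ", MERCHANTS)` resolves the synonym first, so a query term that never appears in the content still reaches the two movie theaters. That is the "bridge the gap" role the Knowledge block plays.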

SEO keywords searched in Google, Yahoo & Bing are not provided anymore


Page 6: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


What is Relevance Optimization?

Relevance of an information retrieval process is …

… ensuring that, for a given query, the best answer is displayed at rank 1 in the search results

… and, ultimately, knowing the user’s intent even before it is expressed:

• Trending topics

• Autocomplete

• …
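One way to make "the best answer is displayed at rank 1" measurable is sketched below. The slide does not name a metric; success-at-1 and mean reciprocal rank are standard information-retrieval measures swapped in here for illustration, and the input format (a ranked list plus the known best answer) is an assumption.

```python
from typing import List, Optional, Tuple

# One run = (ranked result list, the answer judged best for that query).
Run = Tuple[List[str], str]


def rank_of_best(results: List[str], best: str) -> Optional[int]:
    """1-based rank of the best answer, or None if it was not returned."""
    return results.index(best) + 1 if best in results else None


def success_at_1(runs: List[Run]) -> float:
    """Share of queries whose best answer is displayed at rank 1."""
    hits = sum(1 for results, best in runs if rank_of_best(results, best) == 1)
    return hits / len(runs)


def mean_reciprocal_rank(runs: List[Run]) -> float:
    """Average of 1/rank of the best answer (0 when it is missing)."""
    total = 0.0
    for results, best in runs:
        position = rank_of_best(results, best)
        total += 1.0 / position if position else 0.0
    return total / len(runs)
```

Success-at-1 answers the slide's question directly; mean reciprocal rank additionally rewards a best answer that lands at rank 2 or 3 rather than being absent.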


Page 7: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Content is also syndicated on other platforms

NetSync

Page 8: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


About YP Digital Transformation Programs

Digital revenue represented approximately 62% of consolidated revenues (Q4, 2015)

Advertisers: Florists, electricians, restaurant owners… as well as large retailers such as Best Buy, Walmart, RONA

• Customer experience
• Sales effectiveness & efficiency
• Content Accuracy & Relevance
• Big Data & Analytics
• Third-party content & advertisement platforms
• Consumer engagement
• Operational excellence

Consumers: 464 million visits made across YP’s online and mobile properties & syndication partners (2015)


Page 9: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Understanding the Spectrum

of Analytics and Big Data

Page 10: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


What is Analytics? (The 4 Types of Analytics)

• Use Descriptive analytics to understand historical performance, alert you to events, spot trends, and monitor the performance of models and actions
  • Scorecards
  • Dashboards
  • Reports

• Use Diagnostic analytics to visualize and interact with the data, discover relationships, and explain why you are observing outcomes, events or trends
  • Query & Analysis (OLAP)
  • Data Discovery

• Use Predictive analytics to answer questions about the future or to determine the likelihood of unknown outcomes
  • Statistics
  • R and others

• Use Prescriptive analytics to optimize decisions, efficiently allocate resources or find the best customer treatments
  • Optimization
  • What-if
  • Statistics
  • Scoring Systems


Page 11: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Top Trends in Analytics


Page 12: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Which Problems Does Analytics Solve?

Page 13: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


What is Big Data?

A massive volume of both structured and unstructured data (variety), so large that it is difficult to process using traditional database and software techniques. Another aspect is velocity: the rate of change, the linking of data sets that arrive at different speeds, and bursts of activity rather than a habitual steady tempo.

Why the rise in popularity?

• Availability of much cheaper open source platforms (such as Hadoop) that make it possible to process Big Data

• Heavy algorithms used in Machine Learning can now run in a reasonable amount of time

• Availability of ‘new data’ from the IoE (Internet of Everything)


Page 14: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Big Data and Analytics Challenges and Priorities

Page 15: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Big Data & Analytics

At Yellow Pages

Hadoop is data's darling for a reason: it thoroughly disrupts the economics of data, analytics, and data-driven applications.

Enterprise adoption is mandatory for firms that wish to double-down on advanced analytics and create insights-driven applications to help them succeed in the age of the customer.

Forrester, January 2016

Page 16: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Top Trends in Analytics and YP CTR Squad

Search Relevance

Multiple metrics from partners are aggregated

Analytics providers (cloud)

Databricks

BigQuery

Hadoop - Cloudera (on premise)

MPN (Brain)

Events are captured in real-time

Processing (Daily Batch)

COMPASS (IBR)

Our BIGGEST Challenge

Google processes photos

Facebook

COMPASS, YPA, CMR

OLAP on Hadoop


Page 17: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

The Path to being Analytical Innovators

Collect & Integrate
• Current Situation: Data are gathered in silos (gaps, inconsistencies, no centralization, unstandardized dimensions)
• Analytical Innovators: Data are a core business asset (complete, consistent, centralized, standardized)

Store & Maintain
• Current Situation: Data have deficiencies (quality gaps, reliability, untimely, low governance, historical data is overridden)
• Analytical Innovators: Data are reliable, accurate, timely and adequate (quality standards and thresholds; clear, defined governance)

Manipulate & Analyse
• Current Situation: Data have fragmented usage (bad accessibility, limited technology & support, unstandardized business rules)
• Analytical Innovators: Data usage is widespread and used in a collaborative culture (high accessibility, right technology, support/experts)

Use / Derive Insights
• Current Situation: Data are used mostly to address tactical and operational issues
• Analytical Innovators: Data are used to create a competitive advantage and to fuel innovation (thought leadership, dissemination of insights, faster and more effective use of more data)

Understanding starting position and journey ahead!

Page 18: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Our Analytics Customers and Applications

Applications / Customers

CAA DB & Services

YP Analytics Digital Media Dashboards

Solution Provider

DAD & Services (includes AtScale)

CanPipe Hadoop Cluster

SAT Tool Tag Commander

Compass: ●
Mediative: ● ● ●
Advertisers: ● ●
Sales & Support teams: ● ●
Digital Analytics (consumers): ● ● ●
CTR Squad: ●
Corporate Performance: ●
Search Relevance: ● ● ● ●
Business Intelligence: ● ●
Marketing: ● ● ●
Executives: ●

● Current

● Future (in 2016)

Move Fast: ‘We have a saying: Move fast and break things. The idea is that if you never break anything, you’re probably not moving fast enough’


Page 19: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Ease of Access Health Check - YP

Page View: An instance of a consumer (user) visiting a particular page on a digital property

Impression: Each time an ad and/or merchant is fetched, it is counted as one impression

Click: A conversion metric. An event can be any logged or recorded action that has a specific date and time assigned to it by either the browser or the server

Platform: Analytics Data Source for a YP Digital media property
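To show how these definitions combine, here is a minimal sketch that counts impressions and clicks per platform and derives a click-through rate. Reading "CTR" as clicks divided by impressions is an assumption (the slides do not define it), and the event tuples and field order are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event record: (platform, merchant_id, event_type)
Event = Tuple[str, str, str]

EVENTS = [
    ("YP Web", "m1", "impression"),
    ("YP Web", "m1", "impression"),
    ("YP Web", "m1", "click"),
    ("YP iOS", "m2", "impression"),
    ("YP iOS", "m2", "page_view"),
]


def click_through_rate(events: Iterable[Event]) -> Dict[str, float]:
    """Return clicks / impressions for every platform that has impressions."""
    impressions: Counter = Counter()
    clicks: Counter = Counter()
    for platform, _merchant_id, event_type in events:
        if event_type == "impression":
            impressions[platform] += 1
        elif event_type == "click":
            clicks[platform] += 1
    # Platforms without impressions are skipped to avoid dividing by zero.
    return {p: clicks[p] / n for p, n in impressions.items()}
```

The ratio is only meaningful when both counts come from the same source with the same dimensions, which is what the health check tables for each platform are assessing.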

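Page views or Impressions

Platform | Source | MID | Asset | Search | Geo | User Agent | Platform Id | Granular | Good Health
YP Web | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Web | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✔ | ✔ | ✔ | ✖
YP Web | CanPipe | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Mobile Web | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Mobile Web | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖
YP Mobile Web | CanPipe | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP iOS | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP iOS | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖
YP iOS | CanPipe | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Android | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Android | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖
YP Android | CanPipe | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖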

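Clicks

Platform | Source | MID | Asset | Search | Geo | User Agent | Platform Id | Granular | Good Health
YP Web | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Web | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✔ | ✔ | ✔ | ✖
YP Web | CanPipe | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ | ✖
YP Mobile Web | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Mobile Web | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖
YP Mobile Web | CanPipe | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ | ✖
YP iOS | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP iOS | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖
YP iOS | CanPipe | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ | ✖
YP Android | Anametrix | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
YP Android | Analytics DB | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖
YP Android | CanPipe | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ | ✖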


Page 20: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Current Analytics Ecosystem


Page 21: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Where do we stand?

• Deploying user-friendly (and accessible) internal tools for ad-hoc reporting and analysis based on Tableau & AtScale

• Keep a large history of instantly accessible raw data (the Big Data -- 25 months rolling)

• Create an Analytical Data Service Layer allowing applications such as Compass to get the SAME data/information for both Consumer and Advertiser Analytics

• Integrating and replacing the current user behaviour tracking product … by …

• Document, streamline, and optimize the data loading processes (ETL) to facilitate rapid growth of data

• …
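The "25 months rolling" history mentioned above can be sketched as a retention check. This is only an illustration: counting the current month as one of the 25, and cutting at month boundaries, are assumptions, and the function names are hypothetical.

```python
from datetime import date


def rolling_cutoff(today: date, months: int = 25) -> date:
    """First day of the oldest month kept in a rolling window of `months`
    months, counting the current month as one of them (an assumption)."""
    # Work with a month index so that year boundaries need no special case.
    index = today.year * 12 + (today.month - 1) - (months - 1)
    return date(index // 12, index % 12 + 1, 1)


def within_window(event_day: date, today: date, months: int = 25) -> bool:
    """True when a raw event is still inside the rolling history."""
    return rolling_cutoff(today, months) <= event_day <= today
```

With `today = date(2016, 5, 15)` the cutoff is `date(2014, 5, 1)`, so May 2014 through May 2016 are retained, which makes 25 calendar months and allows comparisons against the same month in each of the two previous years.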

2012: Stop the Bleeding
2013: Lay the Foundation
2014: Seeing the Changes
2015: Adding functionalities
2016: 1 Digital Analytics DB


Business Challenges:

• Reliable Analytics (Enterprise KPI)

• Advertisers Self-Serve ROI Analytics (YP Analytics)

• COMPASS (Sales Cockpit)

• Campaign Management Automation (Optimization)

• Sales Recommendation Engine (Optimization)

• Providing extended Internal Analytics DB

• Providing proper environments for all types of Analytics

Page 22: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Digital Analytics DB – Current Data Architecture

• DAD Phase 1 objective: capture users’ raw events once, with a data architecture that allows analysis from the Advertiser, Consumer or Search perspective

• A new Campaign Dimension has been introduced, which will improve conformity once the existing fact tables point to it

• RAM is reclustered every day to guarantee good response time

• …


Page 23: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


YP Analytics – Much better now

• But still does not include entire YP products portfolio

• 2015
  • Supports SEM TP and SEO
  • Ad-hoc capability provided on Hadoop via Tableau & AtScale

• 2014
  • YPA is uncoupled, now calling data services
  • Adequate/on-boarding registration process
  • UI look and feel has been completely redesigned according to requirements
  • Uses Market Data
  • Incorporates Revenue Calculator functionality
  • Includes traffic from Merchant website

• 2013
  • Deployed a Commercial Off-The-Shelf BI tool (Webfocus)
  • Replaced unreliable Print Capture with Webfocus reports
  • Now with 25 months of data instead of 13

Won Big Data & Analytics Octas in 2015 for the 2014 YPA delivery!


Page 24: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Digital Data Collection Status

Data Collection

Analytics

Ad Servers

Retargeting

Conversion Testing

DMP

Automation

Highly critical function that aims to:

• Collect the digital data to feed analytics platforms
• Manage the deployment of the YP digital marketing ecosystem

Main projects since 2012:

• Standardized the data collection methodologies across the YP digital platforms
• Unified data layer to prevent the data fragmentation
• Introduced a data governance process and developed a data monitoring dashboard to improve the quality and the consistency of the data
• Introduced a new Server Side technology to reduce the impact of collecting data in the YP digital platforms as well as building a technological foundation to break the analytics silos

2016 challenges:

• …


Page 25: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


ETL (Extract, Transform & Load) Status

• All ETL jobs inherited from the outsourcer are written in Java

• New ETL jobs are being written in Java, Pig scripts, and SQL

• Will start using the existing SSIS tool for non-massive ETL jobs (SSIS running natively on Hadoop is not available)

• Spark Streaming is being tested

• Investigating whether we should use an ETL tool that can execute directly on the Hadoop cluster (tools such as Pentaho, iWay Big Data Integrator or Informatica)

• New sources are stored raw (data lakes) on Hadoop, in Avro format, to allow immediate user access (in this mode, the data is not certified)
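Storing raw sources in Avro means each file carries a schema written in JSON. The sketch below shows what such a schema could look like for a raw event; the record name, namespace and every field are hypothetical and are not taken from YP's data model. Only the schema syntax itself (record type, union with null, `timestamp-millis` logical type) follows the Avro specification.

```python
import json

# Hypothetical raw-event record; every field name here is an assumption.
RAW_EVENT_SCHEMA = {
    "type": "record",
    "name": "RawEvent",
    "namespace": "example.analytics",
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {
            "name": "timestamp_ms",
            "type": {"type": "long", "logicalType": "timestamp-millis"},
        },
        {"name": "platform", "type": "string"},
        # Optional free-form payload: a union with null, defaulting to null.
        {"name": "payload", "type": ["null", "string"], "default": None},
    ],
}


def schema_json() -> str:
    """Serialize the schema to the JSON text an Avro writer would be given."""
    return json.dumps(RAW_EVENT_SCHEMA, indent=2)
```

Keeping the payload as an optional string is one way to land a new source quickly without modelling it first, which matches the "not certified" status of data in this mode.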


Page 26: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Hadoop Clusters Status

• Lab, Dev, QA, and Production clusters have been deployed (no pre-prod. cluster)

• Dev and QA clusters need to be upgraded (in progress), and we are deploying OpenStack for spawning Hadoop clusters

• Cloudera EDH (Enterprise Data Hub) licenses on QA and Production clusters support Advanced Analytics (on Spark) and descriptive and diagnostic analytics using Impala, AtScale and the rest of the Hadoop stack

• Mediative will be moving away from Databricks when Cloudera Notebook becomes available

• Mediative migration from BigQuery to the Hadoop cluster is in progress

• Currently, the Hadoop Production Cluster is used for running ETL jobs and hosting the DAD DB (currently a replication of the CAA DB) with accessibility via AtScale and Tableau

• DAD project phase 1 (in progress) will host consumer analytics data on Hadoop which will make Hadoop our de facto Analytics Platform

• Real-time acquisition and processing (in progress)

• ML on Hadoop to start when Cloudera notebook is deployed


Page 27: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Big Data & Analytics

Strategy & Architecture at YP

Page 28: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


YP Digital Analytics Tomorrow

Single Source of Truth for
• Consumers | Searches | Advertising

Complete Source of Truth
• Assets based | Rich Context | Granular | All products/channels are collected | Veracity / Governed

Efficient Data Collection
• One Time Collection | No dependency on analytics solution supplier

Data Accessibility
• Volume | Variety | Good response time

Real Time
• Appropriate Velocity


Page 29: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Breaking Down Analytics Silos

Digital

Data

Consumer Analytics
• Total Digital Population (Direct & Owned, Paid, SEO)
• Engagement (incl. Profiling, Growth Hacker)
• Interactions (leads)
• Behaviour
• Social Media & Personalisation

Advertiser Analytics
• Advertiser ROI (incl. Lead box)
• Life-cycle
• Market Potential
• Campaign Management (incl. MPN / YP Lead) / Placement Performance
• Data Monetization

Search Relevancy
• Content
• Search Experience
• Consumer Profile Based Search

Business Intelligence
• Sales
• Financial
• Corporate Performance
• Segmentation & Pricing


Page 30: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

All Analytics Data on Hadoop!

BENEFITS:

• Enables a complete view of digital performance, from consumer behaviour to advertiser ROI

• Reduces data manipulations and allows better and faster analytics

• Data can be correlated without gymnastics

• Increases Data Quality and Data Governance (1 version of the truth)

• Data easily accessible through visualization tools such as Tableau

• Data Science and Machine Learning using the same data!

• Since existing Operational Analytics platforms perform well, let’s keep them to free up Hadoop resources

Operational

Analytics

CAA Services

Infobright DB

YP Analytics

COMPASS

CURRENTLY:
- Most Data Sources go directly to Infobright DB
- In order to populate Hadoop, we have been replicating data from Infobright to Hadoop since 2015

Digital Analytics

(Hadoop)

ETL jobs

(Map/Reduce)

Self-Serve

OLAP

Future (2016)

Discovery Data Mining,

ML (Spark)

Resources Management (YARN)

DATA (HDFS)

Future

Replicated data

Currently

Replication

Clustered data

CURRENTLY: - Only Big Data Analytics sources go through Hadoop

FUTURE: - All Data Analytics sources to go through Hadoop

Future (2016)

Operational

Analytics

, Excel, etc.

MPN

Search &

Consumer

relevance

Programming languages

(Scala, Python)

Data scientists

Be Bold: ‘We encourage everyone to make bold decisions, even if that means being wrong some of the time’

Page 31: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


All Digital Analytics Data on Hadoop post DAD


Page 32: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Application…

Application 2

Unified and Governed Digital Data Collection

• 1 single TMS for all YP
• TMS needs to be ‘Analytics solution neutral’
• Changing TMS is costly (even with Standard Data Layer)
• Need to retag
• It takes 6 months for mobile users to transfer
• It is acceptable, even recommended, to keep a hybrid environment where analytics based applications coexist with the Digital Analytics Database (DAD). Benefits:
  • Allows selection of best of breed solutions
  • Speedier time to market
  • Does not prevent future phase-out of solution when a DAD based solution becomes available
• Real-time correlation of the client-side data and the server-side data is an enabler (when it makes sense) to break analytics silos. Benefits:
  • Cost and maintenance efficiency
  • Significant improvement of the analytics capabilities
  • 1 uniquely identified event
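Correlating client-side and server-side data around one uniquely identified event amounts to a join on a shared identifier. The sketch below assumes both sides stamp each record with the same `event_id`; that key and the other field names are hypothetical, and a real-time implementation would run over streams rather than in-memory lists.

```python
from typing import Dict, List

Record = Dict[str, str]


def correlate(client_events: List[Record], server_events: List[Record]) -> List[Record]:
    """Join client-side and server-side records on a shared event id.

    Only events seen on both sides are returned. On a field-name conflict
    the client-side value wins (an arbitrary choice for this sketch).
    """
    server_by_id = {event["event_id"]: event for event in server_events}
    merged = []
    for client in client_events:
        server = server_by_id.get(client["event_id"])
        if server is not None:
            merged.append({**server, **client})
    return merged


CLIENT = [
    {"event_id": "e1", "device": "iOS"},
    {"event_id": "e3", "device": "web"},
]
SERVER = [
    {"event_id": "e1", "merchant_id": "m42"},
    {"event_id": "e2", "merchant_id": "m7"},
]
```

Here `correlate(CLIENT, SERVER)` yields a single record for `e1` carrying both the device (client context) and the merchant (server context), so neither side has to collect the other's fields.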

Back-End Data Layer

YP API

Front-End Data Layer

Data Hub

Global Digital Analytics DB

Application 1


Page 33: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Real-Time Data Acquisition

Page 34: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


We keep the Internal / External metrics divide

• Analytics and BI are similar in nature, but there are some differences:
  • Analytics is more closely associated with Big Data, since the volume of data captured is huge (Facebook, Twitter feeds, logs, etc.)
  • Analytics is closely related to Advanced Analytics, a.k.a. Machine Learning / Deep Machine Learning (AI), etc.
  • Analytics is closely associated with MPP technology like Hadoop

• At YP:
  • BI refers to internal metrics, and the BI group uses the Microsoft stack
  • Digital Analytics refers to external metrics, and the Big Data & Analytics group uses Hadoop-based technologies & components with additional visualization tools

• Please note that Internal Data could be moved inside the Digital Analytics DB (DAD) if required for performance/costs/requirement reasons

Operational Analytics

Business Intelligence

(BI)

Business Activity

Monitoring (BAM)

Digital Analytics

External Metrics Internal Metrics

Data

Governance

Long term persistence

Short term persistence

Integration

into


Page 35: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Moving toward Bi-Modal Big Data Analytics

Characteristics:
• Reliable
• Robust
• Monitored
• Single version of the Truth
• Slower to Implement
• Faster to query
• IT Centric
• Plan-driven
• Internal Platform (but does not have to be)

Characteristics:
• Flexibility
• Multiple Prototypes
• Faster to Implement
• Slower to query
• Business Centric
• Empirical, iterative
• Internal (the Industrial Analytic) & Multiple External platforms


Page 36: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Bi-Modal: Industrial Analytics & Discovery/Data Lakes

Discovery/Data Lakes

Need to explore new data sources to support new initiatives

• It should be exception-based and not the norm
• The new data sources to explore should be used in conjunction with industrialized data (blend)

Industrial Analytics

DATA (HDFS, Parquet)

DB definition (Hive)

SQL Engine (Impala)

Metadata/OLAP + Aggregate mgt. (AtScale)

SQL/MDX dialects + ODBC, JDBC or OLE/DB protocols (AtScale)

ETL Jobs

Hadoop (Cloudera)

Common semantic layer + aggregate management (adaptive cache)

• Shifting ETL transformations to metadata layer to reduce ETL delivery time
• Adding Hadoop based ETL tool

Light ETL (data lakes)

New Data Sources


Page 37: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


The Power of ‘Behind the Scenes’ Aggregates

Query performance was tested in YP’s Production Hadoop environment:

• AtScale was allocated 30% of the Hadoop cluster capacity

• Queries were written and executed manually through HUE

• The same queries were executed via Tableau with no AtScale aggregates

• The same queries were executed again (with a clean Tableau cache), this time with the AtScale engine hitting the aggregate table
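The idea behind "behind the scenes" aggregates is that a query over raw events can be answered from a much smaller pre-computed table with the same result. The sketch below illustrates that idea only: it uses an in-memory SQLite database as a stand-in for Impala on Hadoop, the table and column names are hypothetical, and the explicit `use_aggregate` switch replaces the automatic query rewrite that a tool such as AtScale performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw fact table: one row per batch of click events (hypothetical layout).
conn.execute("CREATE TABLE events (day TEXT, platform TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2016-05-01", "web", 3),
        ("2016-05-01", "ios", 1),
        ("2016-05-02", "web", 2),
        ("2016-05-02", "web", 4),
    ],
)

# Pre-computed aggregate, built once "behind the scenes".
conn.execute(
    "CREATE TABLE agg_clicks_by_platform AS "
    "SELECT platform, SUM(clicks) AS clicks FROM events GROUP BY platform"
)


def clicks_by_platform(use_aggregate: bool) -> dict:
    """Answer the same question from the raw table or from the aggregate."""
    if use_aggregate:
        sql = "SELECT platform, clicks FROM agg_clicks_by_platform"
    else:
        sql = "SELECT platform, SUM(clicks) FROM events GROUP BY platform"
    return dict(conn.execute(sql).fetchall())
```

Both paths return the same totals; the aggregate path reads one row per platform instead of scanning every event, which is where the speed-up at scale comes from.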


Page 38: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Yellow Pages Tableau Deployment

• Tableau Creators: 5%. They create Dashboards & Data Sandboxes for their Consumers
• Tableau Advanced Consumers: 15%. They can use Dashboards & Data Sandboxes
• Tableau Consumers: 80%. They can only use Dashboards


Page 39: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Big Data & Analytics

Services Offering

Page 40: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Big Data & Analytics Services Offering

Strategy, Planning and Architecture Services

Services Offering | Benefits | Modes
Big Data Analytics roadmaps | In a context of going faster, new silos will be created and will need to be industrialized back in the Hadoop Data Warehouse (HDW) | 1
Data Architecture | New data sources will require Data Architecture services to maintain integration and correlation of all digital events in HDW | 1
Semantic Data Layer | To support self-serve analysis via OLAP | 1, 2
Hadoop Capacity Planning | To plan growth of Hadoop clusters based on planned projects and actual usage | 1


Page 41: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Big Data & Analytics Services Offering

Development Services

Services Offering | Benefits | Modes
Data Collection | Allowing optimized (low impact on digital properties) collection of digital events (standardized and contextually rich) | 1, 2
Robust ETL development (error management, monitoring, scheduling, audit & control) | To maintain data freshness with proper SLA | 1
Development of operational analytics application | To build operational platforms such as YP Analytics | 1
Development of data services | Provide data services used by applications such as Compass and YP Analytics to ensure data consistency | 1
Development of analytics/BI reports and dashboards | For complex reports requiring more advanced expertise and production of corporate dashboards | 1, 2


Page 42: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast


Big Data & Analytics Services Offering

Enablement & Data Services

Services Offering | Benefits | Modes
Aggregates management and query rewrite capability | To shield analysts & data scientists from cumbersome data manipulation work which could represent up to 90% of their time | 1, 2
Provide profiling/discovery/light ETL tools | To use raw data sources (data lakes) | 1, 2
Data sciences | For business units without data scientists | 1, 2
Data Quality & Governance | To ensure right level of data quality | 1


Page 43: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Assessing needs and requirements

and learning from others

Page 44: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

On Requirements

• Meet all stakeholders (operations, managers, executives)

• Have them active in the project rather than just keeping them in the loop (product owner, sprint demo, project updates, etc.)

• Establish a common vocabulary/taxonomy
  • Use a Conceptual Data Model

• Decouple ‘final use’, which is what is being requested, from ‘atomic data’

• Establish a scalable MVP (Minimum Viable Product)

• Maintain the BRD (what is expected) as the Solution is being designed

Page 45: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

On Learning from Others

• Define your Service Offerings

• Ask other teams to demonstrate what they have built and how they did it

• Understand the gaps between your Service Offerings and what your customers need
  • Providing the data and processing capabilities was not sufficient for a group of Data Scientists

• Revalidate your architecture and roadmaps based on new information

• Be careful with acquired reflexes; you might have to relearn!

• Before dismissing a new approach/method, come back to it in a few months; you might have a new context which is now favorable to it

Page 46: Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast

Thank You!