Richard Langlois - Yellow Pages Canada at the Chief Analytics Officer Forum West Coast
TRANSCRIPT
Richard Langlois Search, Big Data & Analytics Director
May 2016
Walking before you can run –
Building a solid foundation for future success
About Yellow Pages
(Canada)
About Yellow Pages (YP) - Canada
• Yellow Pages is a leading digital media and marketing solutions company in Canada
• We serve approximately 245 000 (Q4, 2015) local businesses through our nationwide sales force of media consultants
• YP also caters to the country’s largest national agencies and advertisers through Mediative, its national digital advertising and marketing solutions division
A b o u t Y e l l o w P a g e s - C a n a d a
CAO Forum – West Coast
The Building Blocks of Relevance
What is Content?
• Advertiser Content (merchants, products, deals …)
• Editorial Content (smart tips, top ten lists, …)
• User-Generated Content (reviews, ratings …)
Role: Provide the data that the user is looking for
What is User Data?
• User Characteristics (demographics, preferences, …)
• User Context (location, time, device, …)
• User Behavior (origin, session actions, …)
Role: Understand the user’s bias, intentions and propensity
What is Knowledge?
• Linguistic (synonyms, acronyms, multilingual, …)
• Geo Data (location names, polygons, …)
Role: Help the Search Engine to bridge the gap between user queries and content data (i.e., natural terms and language)
Advertisers Looking for Qualified
Consumers
Consumers Looking for the “Best
Merchant & Product”
YP’s Business Model is a
Multi-Sided Platform
The Building Blocks of Relevance
What is Search?
An engine that:
1) interprets user queries
2) extracts the proper result set
3) ranks the results
Role: It is at the heart of the user search process
What is Presentation?
An engine that builds the presentation layout based on the semantic value of content
Example: “Movie Theater” vs “Restaurant” Layouts
Role: Ensures that the user quickly gets the most out of the content presented and enhances conversion
What is Optimization?
Scenario Management, Tracking and Analytics
Provides the ability to test and measure the results of variations applied in the other building blocks
Role: Provides the “empirical / scientific” tools to optimize relevance, user engagement and conversion
SEO keywords searched in Google, Yahoo & Bing are not provided anymore
What is Relevance Optimization?
Relevance of an information retrieval process is …
… ensuring that, for a given query, the best answer is displayed at rank 1 of the search results
… and, ultimately, knowing the user’s intent even before they express it:
• Trending topics
• Autocomplete
• …
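One common way to put a number on the "best answer at rank 1" goal is to score logged queries with precision at rank 1 and mean reciprocal rank (MRR). The sketch below is only an illustration: these metrics are not named on the slide, and the query strings, merchant ids and function names are hypothetical rather than taken from YP's systems.

```python
from typing import Dict, List


def precision_at_1(results: Dict[str, List[str]], best: Dict[str, str]) -> float:
    """Share of queries whose top-ranked result is the judged best answer."""
    hits = sum(1 for q, ranked in results.items() if ranked and ranked[0] == best[q])
    return hits / len(results)


def mean_reciprocal_rank(results: Dict[str, List[str]], best: Dict[str, str]) -> float:
    """Average of 1/rank of the best answer (counted as 0 when it is not returned)."""
    total = 0.0
    for q, ranked in results.items():
        if best[q] in ranked:
            total += 1.0 / (ranked.index(best[q]) + 1)
    return total / len(results)


# Hypothetical ranked result lists and hand-judged best answers.
results = {
    "pizza montreal": ["m1", "m2", "m3"],  # best answer is at rank 1
    "plumber": ["m9", "m7"],               # best answer is at rank 2
}
best = {"pizza montreal": "m1", "plumber": "m7"}

print(precision_at_1(results, best))        # 0.5
print(mean_reciprocal_rank(results, best))  # 0.75
```

Precision at rank 1 only rewards a perfect top result, while MRR also gives partial credit when the best answer is close to the top, so tracking both shows whether a ranking change is moving good answers upward.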
Content is also syndicated on other platforms
NetSync
About YP Digital Transformation Programs
Digital revenue represented approximately 62% of consolidated revenues (Q4, 2015)
Advertisers
Florists, electricians, restaurant owners… as well as large retailers such as Best Buy, Walmart, RONA
Customer experience
Sales effectiveness & efficiency
Content Accuracy & Relevance
Big Data & Analytics
Third-party content & advertisement platforms
Consumer engagement
Consumers
464 million visits made across YP’s online and mobile properties & syndication partners (2015)
Operational excellence
Understanding the Spectrum
of Analytics and Big Data
What is Analytics? (The 4 Types of Analytics)
• Use Descriptive analytics to understand historical performance, alert you to events, spot trends, and monitor the performance of models and actions
  • Scorecards
  • Dashboards
  • Reports
• Use Diagnostic analytics to visualize and interact with the data, discover relationships, and explain why you are observing outcomes, events or trends
  • Query & Analysis (OLAP)
  • Data Discovery
• Use Predictive analytics to answer questions about the future or to determine the likelihood of unknown outcomes
  • Statistics
  • R and others
• Use Prescriptive analytics to optimize decisions, efficiently allocate resources or find the best customer treatments
  • Optimization
  • What-if
  • Statistics
  • Scoring Systems
Top Trends in Analytics
Which Problems Does Analytics Solve?
What is Big Data?
Massive volumes of both structured and unstructured data (variety), so large that they are difficult to process using traditional database and software techniques. Another aspect is velocity: the rate of change, the linking of data sets that arrive at different speeds, and bursts of activity rather than a habitual steady tempo
Why the rise in popularity?
• Availability of much cheaper open source platforms (such as Hadoop) that make it possible to process Big Data
• Heavy algorithms used in Machine Learning can now run in a reasonable amount of time
• Availability of ‘new data’ from the IoE (Internet of Everything)
Big Data and Analytics Challenges and Priorities
Big Data & Analytics
At Yellow Pages
Hadoop is data's darling for a reason — it thoroughly disrupts the economics
of data, analytics, and data-driven applications
Enterprise adoption is mandatory for firms that wish to double-down on
advanced analytics and create insights-driven applications to help them
succeed in the age of the customer
Forrester, January 2016
Top Trends in Analytics and YP
CTR Squad
Search Relevance
Multiple metrics from partners are aggregated
Analytics providers (cloud)
Databricks
BigQuery
Hadoop - Cloudera (on premise)
MPN (Brain)
Events are captured in real-time
Processing (Daily Batch)
COMPASS (IBR)
Our BIGGEST Challenge
Google processes photos
COMPASS, YPA, CMR
OLAP on Hadoop
The Path to being Analytical Innovators
Stages: Collect & Integrate, Store & Maintain, Manipulate & Analyse, Use / Derive Insights
Current Situation
Collect & Integrate - Data are gathered in silos:
• Gaps
• Inconsistencies
• No Centralization
• Unstandardized dimensions
Store & Maintain - Data have deficiencies:
• Quality Gaps
• Reliability
• Untimely
• Low Governance
• Historical data is overridden
Manipulate & Analyse - Data have fragmented usage:
• Bad Accessibility
• Limited technology & support
• Unstandardized business rules
Use / Derive Insights - Data are used mostly to address tactical and operational issues
Analytical Innovators
Collect & Integrate - Data are a core business asset:
• Complete
• Consistent
• Centralized
• Standardized
Store & Maintain - Data are reliable, accurate, timely and adequate:
• Quality standards and thresholds
• Clear, defined Governance
Manipulate & Analyse - Data usage is widespread and used in a collaborative culture:
• High accessibility
• Right technology
• Support/Experts
Use / Derive Insights - Data are used to create a competitive advantage and to fuel innovation:
• Thought leadership
• Dissemination of insights
• Faster, more effective use of more data
Understanding starting position and journey ahead!
Our Analytics Customers and Applications
Applications / Customers
CAA DB & Services
YP Analytics Digital Media Dashboards
Solution Provider
DAD & Services (includes AtScale)
CanPipe Hadoop Cluster
SAT Tool Tag Commander
Compass ●
Mediative ● ● ●
Advertisers ● ●
Sales & Support teams ● ●
Digital Analytics (consumers) ● ● ●
CTR Squad ●
Corporate Performance ●
Search Relevance ● ● ● ●
Business Intelligence ● ●
Marketing ● ● ●
Executives ●
● Current
● Future (in 2016)
Move Fast: ‘We have a saying: “Move fast and break things.” The idea is that if you never break anything, you’re probably not moving fast enough.’
Ease of Access Health Check - YP
Page View: An instance of a consumer (user) visiting a particular page on a digital property
Impression: Each time an ad and/or merchant is fetched, it is counted as one impression
Click: A conversion metric. An event can be any logged or recorded action that has a specific date and time assigned to it by either the browser or the server
Platform: Analytics data source for a YP digital media property
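The definitions above translate directly into event counts per platform. As a minimal sketch, assuming each logged event is a record with a platform and a type (the field names and sample events are hypothetical), the snippet counts events and derives a click-through rate, taken here as clicks divided by impressions. That ratio is a standard derived metric and is not defined on the slide itself.

```python
from collections import Counter
from typing import Dict, Iterable


def count_events(events: Iterable[dict]) -> Dict[str, Counter]:
    """Count events per platform and per event type."""
    counts: Dict[str, Counter] = {}
    for event in events:
        counts.setdefault(event["platform"], Counter())[event["type"]] += 1
    return counts


def click_through_rate(counter: Counter) -> float:
    """Clicks divided by impressions; 0.0 when there are no impressions."""
    impressions = counter["impression"]
    return counter["click"] / impressions if impressions else 0.0


# Hypothetical event log: field names and values are illustrative only.
events = (
    [{"platform": "YP Web", "type": "page_view"}]
    + [{"platform": "YP Web", "type": "impression"}] * 4
    + [{"platform": "YP Web", "type": "click"}]
    + [{"platform": "YP iOS", "type": "impression"}] * 2
)

counts = count_events(events)
for platform, counter in counts.items():
    print(platform, dict(counter), click_through_rate(counter))
```

Keeping the raw counts per platform and deriving ratios afterwards means the same counts can feed page-view, impression and click reports without recounting.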
Page views or Impressions
Platform        Source         MID  Asset  Search  Geo  User Agent  Platform Id  Granular  Good Health
YP Web          Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Web          Analytics DB   ✔    ✖      ✖       ✖    ✔           ✔            ✔         ✖
YP Web          CanPipe        ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Mobile Web   Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Mobile Web   Analytics DB   ✔    ✖      ✖       ✖    ✖           ✖            ✖         ✖
YP Mobile Web   CanPipe        ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP iOS          Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP iOS          Analytics DB   ✔    ✖      ✖       ✖    ✖           ✖            ✖         ✖
YP iOS          CanPipe        ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Android      Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Android      Analytics DB   ✔    ✖      ✖       ✖    ✖           ✖            ✖         ✖
YP Android      CanPipe        ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
Clicks
Platform        Source         MID  Asset  Search  Geo  User Agent  Platform Id  Granular  Good Health
YP Web          Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Web          Analytics DB   ✔    ✖      ✖       ✖    ✔           ✔            ✔         ✖
YP Web          CanPipe        ✔    ✖      ✔       ✖    ✔           ✔            ✔         ✖
YP Mobile Web   Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Mobile Web   Analytics DB   ✔    ✖      ✖       ✖    ✖           ✖            ✖         ✖
YP Mobile Web   CanPipe        ✔    ✖      ✔       ✖    ✔           ✔            ✔         ✖
YP iOS          Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP iOS          Analytics DB   ✔    ✖      ✖       ✖    ✖           ✖            ✖         ✖
YP iOS          CanPipe        ✔    ✖      ✔       ✖    ✔           ✔            ✔         ✖
YP Android      Anametrix      ✔    ✖      ✔       ✔    ✔           ✔            ✔         ✖
YP Android      Analytics DB   ✔    ✖      ✖       ✖    ✖           ✖            ✖         ✖
YP Android      CanPipe        ✔    ✖      ✔       ✖    ✔           ✔            ✔         ✖
Current Analytics Ecosystem
Where do we stand?
• Deploying user-friendly (and accessible) internal tools for ad-hoc reporting and analysis based on Tableau & AtScale
• Keep a large history of instantly accessible raw data (the Big Data, 25 months rolling)
• Create an Analytical Data Service Layer allowing applications such as Compass to get the SAME data/information for both Consumer and Advertiser Analytics
• Integrating and replacing the current user behaviour tracking product … by …
• Document, streamline, and optimize the data loading processes (ETL) to facilitate rapid growth of data
• …
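A rolling 25-month history implies a retention cutoff that moves forward every month. The sketch below shows one way to compute it, assuming monthly partitions and assuming the window counts the current month as one of the 25; both assumptions, along with the function names, are illustrative and not stated on the slide.

```python
from datetime import date
from typing import List


def oldest_month_to_keep(today: date, months: int = 25) -> date:
    """First day of the oldest month in a rolling window that includes the current month."""
    # Work on a zero-based month index so year boundaries are handled by integer division.
    index = today.year * 12 + (today.month - 1) - (months - 1)
    return date(index // 12, index % 12 + 1, 1)


def partitions_to_drop(partitions: List[date], today: date, months: int = 25) -> List[date]:
    """Monthly partitions (identified by their first day) that fall outside the window."""
    cutoff = oldest_month_to_keep(today, months)
    return [p for p in partitions if p < cutoff]


today = date(2016, 5, 15)
print(oldest_month_to_keep(today))  # 2014-05-01
print(partitions_to_drop([date(2014, 4, 1), date(2014, 5, 1), date(2016, 5, 1)], today))
```

With May 2016 as the current month, the window runs from May 2014 to May 2016 inclusive, so only the April 2014 partition is eligible for removal.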
2012: Stop the Bleeding
2013: Lay the Foundation
2014: Seeing the Changes
2015: Adding functionalities
2016: 1 Digital Analytics DB
Business Challenges:
• Reliable Analytics (Enterprise KPI)
• Advertisers Self-Serve ROI Analytics (YP Analytics)
• COMPASS (Sales Cockpit)
• Campaign Management Automation (Optimization)
• Sales Recommendation Engine (Optimization)
• Providing an extended Internal Analytics DB
• Providing proper environments for all types of Analytics
Digital Analytics DB – Current Data Architecture
• DAD Phase 1 objective: capture users’ raw events once and have an appropriate data architecture that allows analysis from the Advertiser, Consumer or Search perspective
• A new Campaign Dimension has been introduced, which will improve conformity when existing fact tables point to it
• RAM is reclustered every day to guarantee good response time
• …
YP Analytics – Much better now
• But still does not include the entire YP product portfolio
• 2015
  • Supports SEM TP and SEO
  • Ad-hoc capability provided on Hadoop via Tableau & AtScale
• 2014
  • YPA is uncoupled, now calling data services
  • Adequate/on-boarding registration process
  • UI look and feel has been completely redesigned according to requirements
  • Uses Market Data
  • Incorporates Revenue Calculator functionality
  • Includes traffic from Merchant website
• 2013
  • Deployed a Commercial Off-The-Shelf BI tool (Webfocus)
  • Replaced unreliable Print Capture with Webfocus reports
  • Now with 25 months of data instead of 13
Won Big Data & Analytics Octas in 2015 for the 2014 YPA delivery!
Digital Data Collection Status
Data Collection
Analytics
Ad Servers
Retargeting
Conversion Testing
DMP
Automation
Highly critical function that aims to:
• Collect the digital data to feed analytics platforms
• Manage the deployment of the YP digital marketing ecosystem
Main projects since 2012:
• Standardized the data collection methodologies across the YP digital platforms
• Unified data layer to prevent data fragmentation
• Introduced a data governance process and developed a data monitoring dashboard to improve the quality and consistency of the data
• Introduced a new server-side technology to reduce the impact of collecting data on the YP digital platforms and to build a technological foundation for breaking the analytics silos
2016 challenges:
• …
ETL (Extract, Transform & Load) Status
• All ETL jobs inherited from the outsourcer are written in Java
• New ETL jobs are being written in Java, Pig scripts, and SQL
• Will start using the existing SSIS tool for non-massive ETL jobs (SSIS running natively on Hadoop is not available)
• Spark Streaming is being tested
• Investigating whether we should use an ETL tool that could execute directly on the Hadoop cluster (tools such as Pentaho, iWay Big Data Integrator or Informatica)
• New sources are stored raw (data lakes) on Hadoop, in Avro format, to allow immediate user access (in this mode, the data is not certified)
Hadoop Clusters Status
• Lab, Dev, QA, and Production clusters have been deployed (no pre-prod. cluster)
• Dev and QA clusters need to be upgraded (in progress) and we are deploying OpenStack for spawning Hadoop clusters
• Cloudera EDH (Enterprise Data Hub) licenses on QA and Production clusters to support Advanced Analytics (on Spark) and descriptive and diagnostic analytics using Impala, AtScale and the rest of the Hadoop stack
• Mediative will be moving away from Databricks when Cloudera Notebook becomes available
• Mediative migration from BigQuery to the Hadoop cluster is in progress
• Currently, the Hadoop Production Cluster is used for running ETL jobs and hosting the DAD DB (currently a replication of the CAA DB), accessible via AtScale and Tableau
• DAD project phase 1 (in progress) will host consumer analytics data on Hadoop, which will make Hadoop our de facto Analytics Platform
• Real-time acquisition and processing (in progress)
• ML on Hadoop to start when Cloudera Notebook is deployed
Big Data & Analytics
Strategy & Architecture at YP
YP Digital Analytics Tomorrow
Single Source of Truth for
• Consumers | Searches | Advertising
Complete Source of Truth
• Assets based | Rich Context | Granular | All products/channels are collected | Veracity / Governed
Efficient Data Collection
• One Time Collection | No dependency on analytics solution supplier
Data Accessibility
• Volume | Variety | Good response time
Real Time
• Appropriate Velocity
Breaking Down Analytics Silos
Digital Data
Consumer Analytics
• Total Digital Population (Direct & Owned, Paid, SEO)
• Engagement (incl. Profiling, Growth Hacker)
• Interactions (leads)
• Behaviour
• Social Media & Personalisation
Advertiser Analytics
• Advertiser ROI (incl. Lead box)
• Life-cycle
• Market Potential
• Campaign Management (incl. MPN / YP Lead) / Placement Performance
• Data Monetization
Search Relevancy
• Content
• Search Experience
• Consumer Profile Based Search
Business Intelligence
• Sales
• Financial
• Corporate Performance
• Segmentation & Pricing
All Analytics Data on Hadoop!
BENEFITS:
• Enables a complete view of digital performance, from consumer behaviour to advertiser ROI
• Reduces data manipulations and allows better and faster analytics
• Data can be correlated without gymnastics
• Increases Data Quality and Data Governance (1 version of the truth)
• Data easily accessible through visualization tools such as Tableau
• Data Science and Machine Learning using the same data!
• Since the existing Operational Analytics platforms perform well, let’s keep them to free up Hadoop resources
Operational
Analytics
CAA Services
Infobright DB
YP Analytics
COMPASS
CURRENTLY:
- Most data sources go directly to Infobright DB
- To populate Hadoop, we have been replicating data from Infobright to Hadoop since 2015
…
Digital Analytics
(Hadoop)
ETL jobs
(Map/Reduce)
Self-Serve
OLAP
Future (2016)
Discovery Data Mining,
ML (Spark)
Resources Management (YARN)
DATA (HDFS)
Future
Replicated data
Currently
Replication
Clustered data
CURRENTLY: - Only Big Data Analytics sources go through Hadoop
FUTURE: - All Data Analytics sources to go through Hadoop
…
Future (2016)
Operational
Analytics
, Excel, etc.
MPN
Search &
Consumer
relevance
Programming languages
(Scala, Python)
Data scientists
Be Bold: ‘We encourage everyone to make bold decisions, even if that means being wrong some of the time’
All Digital Analytics Data on Hadoop post DAD
Application…
Application 2
Unified and Governed Digital Data Collection
• 1 single TMS (tag management system) for all YP
• TMS needs to be ‘Analytics solution neutral’
• Changing TMS is costly (even with Standard Data Layer)
  • Need to retag
  • It takes 6 months for mobile users to transfer
• It is acceptable, even recommended, to keep a hybrid environment where analytics based applications coexist with the Digital Analytics Database (DAD). Benefits:
  • Allows selection of best of breed solutions
  • Speedier time to market
  • Does not prevent future phase-out of solution when a DAD based solution becomes available
• Real-time correlation of the client-side data and the server-side data is an enabler (when it makes sense) to break analytics silos. Benefits:
  • Cost and maintenance efficiency
  • Significant improvement of the analytics capabilities
  • 1 uniquely identified event
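The "1 uniquely identified event" point can be illustrated with a small join. The sketch below assumes that the client-side tag and the server-side log both carry the same event identifier. The field names (event_id, user_agent, merchant_id) are hypothetical, and the join is shown as a batch operation for simplicity, whereas a real-time version would apply the same matching to a stream.

```python
from typing import Dict, List


def correlate(client_events: List[dict], server_events: List[dict]) -> List[dict]:
    """Merge client-side and server-side records that share the same event_id."""
    server_by_id: Dict[str, dict] = {e["event_id"]: e for e in server_events}
    merged = []
    for client in client_events:
        server = server_by_id.get(client["event_id"])
        if server is not None:
            # Client-side fields win if both sides define the same key.
            merged.append({**server, **client})
    return merged


# Hypothetical records: only event "e1" is seen on both sides.
client_events = [
    {"event_id": "e1", "user_agent": "iOS app"},
    {"event_id": "e2", "user_agent": "Chrome"},
]
server_events = [
    {"event_id": "e1", "merchant_id": "m42"},
    {"event_id": "e3", "merchant_id": "m7"},
]

print(correlate(client_events, server_events))
```

Because both sides describe the same event under one identifier, context known only to the browser or app and context known only to the server end up in a single record instead of two separate analytics silos.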
Back-End Data Layer
YP API
Front-End Data Layer
Data Hub
Global Digital Analytics DB
Application 1
Real-Time Data Acquisition
We keep the Internal / External metrics divide
• Analytics and BI are similar in nature, but here are some differences:
  • Analytics is more closely associated with Big Data, since the volume of data captured is huge (Facebook, Twitter feeds, logs, etc.)
  • Analytics is closely related to Advanced Analytics, aka Machine Learning / Deep Machine Learning (AI), etc.
  • Analytics is closely associated with MPP technology like Hadoop
• At YP:
  • BI refers to internal metrics, and the BI group uses the Microsoft stack
  • Digital Analytics refers to external metrics, and the Big Data & Analytics group uses Hadoop-based technologies & components with additional visualization tools
• Please note that Internal Data could be moved inside the Digital Analytics DB (DAD) if required for performance/costs/requirement reasons
Operational Analytics
Business Intelligence
(BI)
Business Activity
Monitoring (BAM)
Digital Analytics
External Metrics Internal Metrics
Data
Governance
Long term persistence
Short term persistence
Integration
into
Moving toward Bi-Modal Big Data Analytics
Characteristics:
• Reliable
• Robust
• Monitored
• Single version of the Truth
• Slower to Implement
• Faster to query
• IT Centric
• Plan-driven
• Internal Platform (but does not have to be)
Characteristics:
• Flexibility
• Multiple Prototypes
• Faster to Implement
• Slower to query
• Business Centric
• Empirical, iterative
• Internal (the Industrial Analytic) & Multiple External platforms
Bi-Modal: Industrial Analytics & Discovery/Data Lakes
Discovery/Data Lakes
Need to explore new data sources to support new initiatives
• It should be exception based and not the norm
• The new data sources to explore should be used in conjunction with industrialized data (blend)
Industrial Analytics
DATA (HDFS, Parquet)
DB definition (Hive)
SQL Engine (Impala)
Metadata/OLAP + Aggregate mgt. (AtScale)
SQL/MDX dialects + ODBC, JDBC or OLE/DB protocols (AtScale)
…
ETL Jobs
Hadoop (Cloudera)
Common semantic layer + aggregate management (adaptive cache)
• Shifting ETL transformations to metadata layer to reduce ETL delivery time
• Adding Hadoop based ETL tool
Light ETL (data lakes)
…
New Data Sources
The Power of ‘Behind the Scenes’ Aggregates
Query performance was tested in YP’s Production Hadoop environment
• AtScale was allocated 30% of the Hadoop cluster capacity
• Queries written and executed manually through HUE
• Same queries executed via Tableau with no AtScale aggregates
• Same queries executed again (clean Tableau cache). AtScale engine hits the aggregate table
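The idea behind "hitting the aggregate table" can be sketched as a routing decision: if a pre-computed aggregate already contains every dimension a query groups by, the query can be answered from that much smaller table instead of the raw events. The code below is a conceptual illustration only and does not describe AtScale's actual implementation. The table names, dimensions and row counts are hypothetical, and it assumes the requested measures are additive (counts or sums).

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical aggregate catalogue: table name -> (dimensions covered, row count).
AGGREGATES: Dict[str, Tuple[FrozenSet[str], int]] = {
    "agg_day_platform": (frozenset({"day", "platform"}), 3_000),
    "agg_day_platform_merchant": (frozenset({"day", "platform", "merchant"}), 2_000_000),
}
RAW_TABLE = "events_raw"  # hypothetical name of the raw fact table


def choose_table(query_dims: Set[str]) -> str:
    """Pick the smallest aggregate whose dimensions cover the query, else the raw table."""
    candidates = [
        (rows, name)
        for name, (dims, rows) in AGGREGATES.items()
        if query_dims <= dims  # every queried dimension exists in the aggregate
    ]
    return min(candidates)[1] if candidates else RAW_TABLE


print(choose_table({"day"}))          # agg_day_platform
print(choose_table({"merchant"}))     # agg_day_platform_merchant
print(choose_table({"search_term"}))  # events_raw
```

The user-facing query stays the same in all three cases; only the table that answers it changes, which is why the analyst does not need to know that aggregates exist.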
Yellow Pages Tableau Deployment
Tableau Creators: 5%
They create Dashboards & Data Sandboxes for their Consumers
Tableau Advanced Consumers: 15%
They can use Dashboards & Data Sandboxes
Tableau Consumers: 80%
They can only use Dashboards
Big Data & Analytics
Services Offering
Big Data & Analytics Services Offering
Strategy, Planning and Architecture Services
• Services Offering: Big Data Analytics roadmaps
  Benefits: In a context of going faster, new silos will be created and will need to be industrialized back in the Hadoop Data Warehouse (HDW)
  Modes: 1
• Services Offering: Data Architecture
  Benefits: New data sources will require Data Architecture services to maintain integration and correlation of all digital events in HDW
  Modes: 1
• Services Offering: Semantic Data Layer
  Benefits: To support self-serve analysis via OLAP
  Modes: 1, 2
• Services Offering: Hadoop Capacity Planning
  Benefits: To plan growth of Hadoop clusters based on planned projects and actual usage
  Modes: 1
Image from: library.gatech.edu
Big Data & Analytics Services Offering
Development Services
• Services Offering: Data Collection
  Benefits: Allowing optimized (low impact on digital properties) collection of digital events (standardized and contextually rich)
  Modes: 1, 2
• Services Offering: Robust ETL development (error management, monitoring, scheduling, audit & control)
  Benefits: To maintain data freshness with proper SLA
  Modes: 1
• Services Offering: Development of operational analytics application
  Benefits: To build operational platforms such as YP Analytics
  Modes: 1
• Services Offering: Development of data services
  Benefits: Provide data services used by applications such as Compass and YP Analytics to ensure data consistency
  Modes: 1
• Services Offering: Development of analytics/BI reports and dashboards
  Benefits: For complex reports requiring more advanced expertise and production of corporate dashboards
  Modes: 1, 2
Image from: forum.khonkaenlink.info
Big Data & Analytics Services Offering
Enablement & Data Services
• Services Offering: Aggregates management and query rewrite capability
  Benefits: To shield analysts & data scientists from cumbersome data manipulation work, which could represent up to 90% of their time
  Modes: 1, 2
• Services Offering: Provide profiling/discovery/light ETL tools
  Benefits: To use raw data sources (data lakes)
  Modes: 1, 2
• Services Offering: Data sciences
  Benefits: For business units without data scientists
  Modes: 1, 2
• Services Offering: Data Quality & Governance
  Benefits: To ensure right level of data quality
  Modes: 1
Image from: www-03.ibm.com
Assessing needs and requirements
and learning from others
On Requirements
• Meet all stakeholders (operations, managers, executives)
• Have them active in the project rather than just keeping them in the loop (product owner, sprint demos, project updates, etc.)
• Establish a common vocabulary/taxonomy
  • Use a Conceptual Data Model
• Decouple ‘final use’ (what is being requested) from ‘atomic data’
• Establish a scalable MVP (Minimum Viable Product)
• Maintain the BRD (what is expected) as the solution is being designed
On Learning from Others
• Define your Service Offerings
• Ask other teams to demonstrate what they have built and how they did it
• Understand the gaps between your Service Offerings and what your customers need
  • Providing the data and processing capabilities was not sufficient for a group of Data Scientists
• Revalidate your architecture and roadmaps based on new information
• Be careful with acquired reflexes; you might have to relearn!
• Before dismissing a new approach/method for good, come back to it in a few months; a new context might now be favorable to it
Thank You!