Data Regions: Modernizing Your Company's Data Ecosystem

Copyright © 2015, SAS Institute Inc. All rights reserved. Data Regions: Modernizing Your Company’s Data Ecosystem. Evan Levy, Vice President, Data Management Programs, SAS. EvanJayLevy

Upload: dataworks-summithadoop-summit

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Data Regions: Modernizing your company's data ecosystem

Data Regions: Modernizing Your Company’s Data Ecosystem

Evan Levy, Vice President, Data Management Programs, SAS

EvanJayLevy

Page 2: Data Regions: Modernizing your company's data ecosystem

A 20 Year Old Paradigm

The Change Data Perspective

Traditional Assumption

• All data originates from internal systems
• The company runs on OLTP systems
• Users have the BI/DW to address their reporting and analysis needs
• The Data Warehouse is the data source

Today’s Reality

• Most data is internal; >35% is external
• Business Operations rely on OLTP, Data, and Analytics
• We have multiple analytical systems: data mining, exploration, sandboxes, etc.
• Users require data from many sources (and the quantity is growing)

Page 3: Data Regions: Modernizing your company's data ecosystem

Data Challenges…

“Why is all the data put into the warehouse? Only 3 people need to use the data.”

“Can you tell me what data we purchased from outside vendors?”

“Why will it take you 30 days to load data? I can cut and paste it into my server in 4 minutes.”

“We have to standardize business terminology. We’ve learned that data governance is critical.”

“Why do I have to work around the ‘infrastructure’? Shouldn’t it be built for my needs?”

“You send me a file from SalesForce every month, and the layout changes every month. And you don’t tell me.”

“We have data all over (systems, the cloud, external apps, etc.). Why don’t we have a catalog of the sources?”

“Finance wants all data reconciled. I can’t wait. Why do I have to suffer from their requirements?”

Page 4: Data Regions: Modernizing your company's data ecosystem

Data Characteristics

[Diagram: Data at the center, surrounded by five characteristics: Audience, Access, Structure, Domain, and Integrity]

Page 5: Data Regions: Modernizing your company's data ecosystem

Data Characteristics

Audience

The individual user (and their skills and data needs)

• Report Users: Reviewing data about known situations
• DW Developers: Using ETL tools to retrieve and load data
• Analytic Developers: Building analytical models to manipulate known data
• Data Scientists: Analyzing any available data to identify new trends
• BI Developers: Building reports using structured data
• Business Analysts: Analyzing data to form a new hypothesis
• Application Developers: Developing code to navigate any available data source

Page 6: Data Regions: Modernizing your company's data ecosystem

Data Characteristics

Access

The methods, interfaces, and tools used to access the data

• A business analyst running a report on DBMS tables
• Custom code navigating a flat file (to retrieve specific values)
• Code calling platform-specific APIs for data access
• A cloud application sending transactions
• An application listening for / receiving event streams
• A data scientist playing with data in a sandbox

Page 7: Data Regions: Modernizing your company's data ecosystem

Data Characteristics

Structure

The structure and organization of the data content

• Structured Data
• Semi-Structured Data
• Unstructured Data

Page 8: Data Regions: Modernizing your company's data ecosystem

Data Characteristics

Domain

The business context for data usage

• Enterprise
• Business Unit
• Organization
• Project
• Individual

Page 9: Data Regions: Modernizing your company's data ecosystem

Data Characteristics

Integrity

Client: John Smith
Username: Oracleuser
Request Date: 9/28/2000
Request Time: 23:59:07
Status Code: OK
Browser: Netscape

203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] "GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 "http://datawarehouse. oracle.co/contents.htm" "Mozilla/4.7 [en] (WinNT; I)"

P;ECalibri;M220;SB;L10 P;ECalibri;M220;L11 P;ECalibri;M220;SI;L24 P;ECalibri;M220;SB;L9 P;ECalibri;M220;L10 P;ESegoe UI;M200;L9 P;ESegoe UI;M200;SB;L9 P;ECalibri;M180;L9 F;P0;DG0G8;M300 B;Y12;X5;D0 0 11 4 O;L;D;V0;K47;G100 0.001 F;M495;R1 F;SM24;Y1;X1 C;K"name" F;SM24;X2 C;K"Shares" F;SM24;X3 C;K"Quote/ Price" F;SM24;X4 C;K"cost/ share" F;SM24;X5 C;K"total cost" F;SM24;Y2;X1 C;K"aapl" F;P4;FF2G;SM24;X2 C;K1454.4024 F;SM24;X3 C;K126.85 F;SM24;X4 C;K79.006952 F;P4;FF2G;SM24;X5 C;K114907.9 F;SM24;Y3;X1 C;K"axp" F;P4;FF2G;SM24;X2 C;K1454.4108 F;SM24;X3 C;K79.27 F;SM24;X4 …

name    Shares     Quote/Price  cost/share  total cost
aapl    1,454.40   126.85       79.006952   114,907.90
axp     1,454.41   79.27        84.671889   123,147.71
bmy     3,666.51   63.95        43.25259    158,586.21
brk.b   1,000      143.46       119.3527    119,352.70
celg    1,000      116.44       102.47094   102,470.94
chl     500        71.4         71.4179     35,708.95

The format, typing, and accuracy of the data
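The Integrity slide pairs a raw web server log line with the same request shown as named, typed fields. A minimal sketch of that step in Python follows, assuming the line uses the common "combined" access log layout; the function name `parse_log_line` and the output field names are illustrative and do not come from the deck. Values such as "John Smith" or "Netscape" would need reference data that is not in the line itself, so the sketch does not derive them.

```python
import re
from datetime import datetime

# Raw line as it appears on the slide (the referrer is kept exactly as extracted).
RAW = (
    '203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] '
    '"GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 '
    '"http://datawarehouse. oracle.co/contents.htm" '
    '"Mozilla/4.7 [en] (WinNT; I)"'
)

# Assumed layout: host ident user [timestamp] "request" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line: str) -> dict:
    """Turn one raw log line into named, typed fields."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed layout")
    fields = match.groupdict()
    when = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": fields["host"],
        "username": fields["user"],
        "request_date": when.date().isoformat(),
        "request_time": when.time().isoformat(),
        "status_code": int(fields["status"]),
        "bytes_sent": None if fields["size"] == "-" else int(fields["size"]),
        "user_agent": fields["agent"],
    }


record = parse_log_line(RAW)
```

Raising an error on a non-matching line, rather than returning partial fields, keeps format problems visible at the point where the data arrives.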

Page 10: Data Regions: Modernizing your company's data ecosystem

The 5 Characteristics of Data

[Diagram: Data at the center, surrounded by Audience, Access, Structure, Domain, and Integrity]
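One way to work with the five characteristics is to record them as a small profile per data environment and compare profiles. The sketch below is illustrative only: the class name `DataProfile` and the helper `differing_characteristics` are hypothetical, and the example values are paraphrased from the Sandbox and Reporting & BI slides later in the deck.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataProfile:
    """The five characteristics from the deck, one free-text value each."""
    audience: str
    access: str
    structure: str
    domain: str
    integrity: str


sandbox = DataProfile(
    audience="Advanced business users",
    access="SQL/table environment",
    structure="Structured, RDBMS oriented",
    domain="Any/all domains",
    integrity="Enterprise data standardized/corrected; other data addressed by user",
)

reporting_bi = DataProfile(
    audience="Business users focused on standard reports",
    access="Usually SQL-based; some tool-centric (e.g. OLAP cubes)",
    structure="Usually structured",
    domain="Likely enterprise data",
    integrity="Enterprise data standardized/corrected; other data addressed by user",
)


def differing_characteristics(a: DataProfile, b: DataProfile) -> list:
    """Return the names of the characteristics on which two profiles differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


differences = differing_characteristics(sandbox, reporting_bi)
```

Here the two profiles share the same integrity expectation and differ on the other four characteristics.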

Page 11: Data Regions: Modernizing your company's data ecosystem

Challenging the Existing Data Paradigm

Support numerous new data sources

Establish a shared source staging area

Allow “trial & error” analysis for all users

Support Self Service Data (ETL, report, analysis, etc.)

Support different levels of data acceptance

Page 12: Data Regions: Modernizing your company's data ecosystem

Data Regions

[Diagram: Inbound Data (Internal Applications, Cloud Applications, Data Streams, Files, Services, Messages) and the data regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling]

Page 13: Data Regions: Modernizing your company's data ecosystem

Data Regions

Addressing an Enterprise Data Need

[Diagram: Inbound Data (Internal Applications, Cloud Applications, Data Streams, Files, Services, Messages) and the data regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling]

• Create an environment that fits user needs (not IT convenience)
• Support data onboarding and distribution as a production need
• Support a diverse set of data usage needs
• Address the complexities of data movement
• Reduce resource/skill overlap across the company

Page 14: Data Regions: Modernizing your company's data ecosystem

Data Regions

Source Onboarding

Audience: Source Onboarding developers only; receiving area for the Source Data Repository
Access: Supports multiple delivery methods: transactions, messages, bulk formats
Structure: Data layout based on source system; likely dynamic and volatile
Domain: N/A. This detail is implicit with the data source and the supplier.
Integrity: N/A. Data details are defined by the data supplier.

• Manages the delivery of data from internal & external sources
• Holds data until acceptance is complete; data is then moved to the Source Data Repository
• Centralized support for sophisticated data capture methods (ESP, 3rd party data delivery, API/messaging, etc.)
• Productionalizes source data capture, identification, and sharing
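The "hold until acceptance is complete, then move" behavior described for Source Onboarding can be sketched as a small holding area. This is a minimal illustration, not the deck's design: the class names, the in-memory lists standing in for the two regions, and the `non_empty` acceptance check are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Delivery:
    source: str    # supplier or system the data came from
    method: str    # e.g. "transaction", "message", or "bulk"
    payload: bytes


@dataclass
class SourceOnboarding:
    held: List[Delivery] = field(default_factory=list)        # awaiting acceptance
    repository: List[Delivery] = field(default_factory=list)  # stand-in for the Source Data Repository

    def receive(self, delivery: Delivery) -> None:
        """Take delivery of inbound data and hold it."""
        self.held.append(delivery)

    def process(self, is_acceptable: Callable[[Delivery], bool]) -> None:
        """Move accepted deliveries to the repository; keep the rest on hold."""
        still_held = []
        for delivery in self.held:
            if is_acceptable(delivery):
                self.repository.append(delivery)
            else:
                still_held.append(delivery)
        self.held = still_held


def non_empty(delivery: Delivery) -> bool:
    # Hypothetical acceptance rule: the delivery must contain some data.
    return len(delivery.payload) > 0


onboarding = SourceOnboarding()
onboarding.receive(Delivery("crm_export", "bulk", b"id,name\n1,Acme\n"))
onboarding.receive(Delivery("event_feed", "message", b""))
onboarding.process(non_empty)
```

After `process`, only the accepted delivery has moved on; the empty one stays in the holding area until it passes the check.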

Page 15: Data Regions: Modernizing your company's data ecosystem

Data Regions

Source Data Repository

• Stores and retains all source data content; reduces enterprise storage requirements
• Establishes a centralized registry of available data sources
• Reflects a defined data layout (independent of source changes)
• Alleviates developers’ need to learn data navigation, layout, and naming conventions on dozens of source systems

Audience: Data Integration (Developers: DW, Application, Data Scientists, etc.)
Access: Usually file oriented (transaction and other access based on situation)
Structure: Company-centric, documented layout; includes structured & unstructured
Domain: N/A. Data reflects source
Integrity: Company-centric format; data quality and accuracy not addressed
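A registry entry with a defined layout is what shields downstream developers from source changes: when a supplier renames a column, only the mapping changes. The sketch below assumes a hypothetical catalog entry (`crm_accounts`) with invented column names; none of these identifiers come from the deck.

```python
# Hypothetical registry: one entry per source, with a documented company-centric layout.
CATALOG = {
    "crm_accounts": {
        "supplier": "hypothetical CRM export",
        "layout": ["customer_id", "customer_name", "region"],
        # Source column name -> company-centric column name.
        "column_map": {
            "AcctId": "customer_id",
            "Account Name": "customer_name",
            "Territory": "region",
        },
    }
}


def to_company_layout(source_name: str, record: dict) -> dict:
    """Re-express one source record in the documented company-centric layout."""
    entry = CATALOG[source_name]
    renamed = {
        entry["column_map"][name]: value
        for name, value in record.items()
        if name in entry["column_map"]
    }
    # Every documented column is always present; unknown source columns are dropped.
    return {column: renamed.get(column) for column in entry["layout"]}


row = to_company_layout(
    "crm_accounts",
    {"AcctId": "42", "Account Name": "Acme", "Extra": "x"},
)
```

Note that the sketch only renames and reorders fields; consistent with the Integrity row above, it makes no attempt to correct values.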

Page 16: Data Regions: Modernizing your company's data ecosystem

Data Regions

Data Exploration

• Supports one-off, in-depth business analysis using any data
  ─ Environment is permanent but resource usage is very transient
  ─ Does not support production application access or deployment
• Often a general purpose platform that can support numerous technologies (Big Data, files, RDBMS, advanced analytics, etc.)
• A walled-off, protected data scientist-centric environment

Audience: Data Scientists & Analytics Developers (unable to be supported by sandbox)
Access: All access methods due to the “from scratch” nature of environment
Structure: All data layouts. (Unstructured likely due to focus on new concept development)
Domain: Typically enterprise or line of business level
Integrity: Data transformed/standardized to streamline exploration efforts (often ignored for new or unknown data sources)

Page 17: Data Regions: Modernizing your company's data ecosystem

Data Regions

Enterprise View

• Contains multiple integrated subject areas (w/ long-term history)
• Content reflects enterprise trusted (and corrected) data
• Includes metadata (terms, definitions, lineage, etc.)
• Supports query processing and data provisioning
  ─ Online end-user queries and reporting
  ─ Data provisioning to analytical and transactional systems
  ─ Content continually updated (where possible)

Audience: All users. Most access will occur via query tools or data manipulation/ETL tools
Access: Usually query-based access (w/ existing tools). Unstructured requires APIs
Structure: Data is usually structured (unstructured requires special tools/extensions)
Domain: Enterprise level. Other domains may use content for provisioning purposes
Integrity: Reflective of enterprise terminology and value standards

Page 18: Data Regions: Modernizing your company's data ecosystem

Data Regions

Sandbox

• Allows users to extend their analysis with custom data
  ─ Supports structured data and queries using existing tools/technologies
  ─ Focused on supporting additional (external) data
• Environment is temporary; does not support production
  ─ Walled-off environment; reports or data not distributable
• Allows for business-level data discovery and exploration
  ─ Supports one-off user data needs

Audience: Advanced business users. Requires DBMS query and data integration skills
Access: Data is accessible via SQL/table environment
Structure: Data content is structured and RDBMS oriented (goal is data variety)
Domain: Any/All domains (enterprise to individual)
Integrity: Enterprise data is standardized/corrected. Other data must be addressed by user

Page 19: Data Regions: Modernizing your company's data ecosystem

Data Regions

Reporting and Business Intelligence

• Supports defined reporting and ad hoc analysis (departmental data marts)
• Supports an application- or tool-centric view of data
  ─ Simplifies tool access and data manipulation, or
  ─ Reflects unique business (organization) view of data details
• Requires additional technical staff resources
  ─ ETL processing for additional sources, aggregates, hierarchies, etc.
  ─ Query and usage support for non-enterprise data

Audience: Business users focused on using standard reports and content
Access: Usually SQL-based access. Some data may be tool-centric (e.g. OLAP cubes)
Structure: Usually structured data, organized as rows and columns
Domain: Likely to use enterprise data. Additional data may reflect different structure or domain as needed.
Integrity: Enterprise data is standardized/corrected. Other data must be addressed by user

Page 20: Data Regions: Modernizing your company's data ecosystem

Data Regions

Advanced Analytics & Modeling

• A processing environment that can support advanced analytics
  ─ Typically general purpose processing platforms with inexpensive directly attached storage
  ─ Data is structured and often stored in highly denormalized structures
  ─ Usually driven by a specialized tool or language
• Typically small, high-value user audience
• Production-supported environment. Data & results are distributed

Audience: Highly skilled technical staff (data scientists, developers with advanced analysis skills)
Access: Data accessed via specialized tools using standard and custom access methods
Structure: Data is usually structured; may process unstructured data into structured content
Domain: Typically enterprise-level data. Business drivers are often specific to organization
Integrity: Data is often cleansed and standardized

Page 21: Data Regions: Modernizing your company's data ecosystem

Data Services

[Diagram: the data regions (Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling) shown with the data services: Data Transformation, Data Quality, Data Governance, Metadata]

Page 22: Data Regions: Modernizing your company's data ecosystem

Getting Started, Moving Forward…

• Evaluate the diversity of audiences and domains
  − Understand the unique combinations; those dictate the complexity of your environment
  − Review the external data that is already in use
• Extend your environment one region at a time
  − Focus on adding (or remediating) regions based on business need
• Sharing data is not a courtesy; it’s a production need
  − Data provisioning and integration is a costly activity; it should be addressed with “economies-of-scale” methods
  − Establishing repositories (with card catalogs) to provide “raw” and “approved” data is a necessity
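The first step above, evaluating the unique audience and domain combinations, can start as a simple tally. The sketch below uses an invented usage inventory; the audience and domain labels follow the deck, but the entries themselves are hypothetical.

```python
from collections import Counter

# Hypothetical inventory: one (audience, domain) pair per observed data use.
usage = [
    ("BI Developers", "Enterprise"),
    ("Business Analysts", "Business Unit"),
    ("Data Scientists", "Enterprise"),
    ("Business Analysts", "Business Unit"),
    ("Data Scientists", "Project"),
]

# How often each combination occurs, and which distinct combinations exist.
combination_counts = Counter(usage)
unique_combinations = sorted(combination_counts)

for audience, domain in unique_combinations:
    print(f"{audience} / {domain}: {combination_counts[(audience, domain)]} use(s)")
```

The number of distinct pairs, rather than the number of users, is the rough indicator of how many different needs the environment has to serve.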

Page 23: Data Regions: Modernizing your company's data ecosystem

THANKS!

www.EvanJLevy.com
@EvanJayLevy

[email protected]