strata conference nyc 2013

Post on 26-Jan-2015




Taewook Eom Data Infrastructure Team SK planet 2014-01-28

Taewook Eom

http://www.flickr.com/photos/oreillyconf/10616622085/

Data Programmer
Plaster (Planet Master) of Big Data Infra
Pre-Assessor of Hiring Programmers
Mentor of 101 Startup Korea

Twitter: @taewooke
LinkedIn: http://kr.linkedin.com/in/taewookeom

http://strataconf.com/

by O’Reilly

Web 2.0 : Open, Sharing, Participation

Santa Clara : Technical

New York with Cloudera : Financial, Business

Europe : Privacy, Government

Boston : Medical

Big Data : Making Data Work

Change the World with Data.

Data

When hardware became commoditized, software was valuable. Now that software is being commoditized, data is valuable.

– Tim O’Reilly, 2011

Data is like the blood of the enterprise.

– Amr Awadallah, CTO at Cloudera, 2013

Big Data Architectural Patterns http://strataconf.com/stratany2013/public/schedule/detail/30397

What is Big Data?

All data that is not a fit for a traditional RDBMS, whether used for OLTP or Analytics purposes

http://blog.vitria.com/Portals/47881/images/3values-resized-600.png

Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data - Gartner, 2011

http://image-store.slidesharecdn.com/ae63030a-3d9b-11e3-9cff-22000a970267-original.jpg

Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS http://strataconf.com/stratany2013/public/schedule/detail/29968

Data Science

http://en.wikipedia.org/wiki/File:DataScienceDisciplines.png http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Big Data

http://mappingignorance.org/fx/media/2013/07/Figura-11.jpg

Open Mind!

Big Data

Gartner's 2013 Hype Cycle for Emerging Technologies (2013-08-19)

More than half of the technical sessions were presented by Chinese or Indian speakers

39 of 125 sessions are sponsored sessions

Big Data: 4 Approaches

Search-based Hadoop-based

RDB-based NoSQL

Real-time Processing

Real-time Recommendations for Retail: Architecture, Algorithms, and Design http://strataconf.com/stratany2013/public/schedule/detail/30217

Real-time Stream Processing

Apache Storm

Streaming

Apache Kafka Gathering

Processing

Querying Search-based

NoSQL

Stringer/Tez Shark SQL

… not yet Graph Processing

Big Data Space

No one tool is the right fit for all Big Data problems.
Do not be afraid to recommend the right solution for the problem over the popular solution.
To do this, you must be aware of the entire ecosystem.

Big Data Architectural Patterns http://strataconf.com/stratany2013/public/schedule/detail/30397

Practical Performance Analysis and Tuning for Cloudera Impala http://strataconf.com/stratany2013/public/schedule/detail/30551

Hadoop and the Relational Data Warehouse – When to Use Which? http://strataconf.com/stratany2013/public/schedule/detail/30964

Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS http://strataconf.com/stratany2013/public/schedule/detail/29968

Ignite

Signal Detection Theory: Man vs Machine

Co-Founder @VividCortex Kyle Redinger

http://www.youtube.com/watch?v=Fg6mN-jevds

(5 minutes 6 seconds)

http://www.slideshare.net/realkyleredinger/man-vs-machine-signal-detection-theory-and-big-data

Signal Detection Theory: Man vs Machine

Remove the obvious and look at what is important Remember: Less is more.

Towards Strata 2014

Director of market research at O’Reilly Media Roger Magoulas

http://www.youtube.com/watch?v=Ytd5VkEgQf8

(5 minutes 26 seconds)

http://strataconf.com/stratany2013/public/schedule/detail/31935

Keynote

http://www.oreilly.com/data/free/files/stratasurvey.pdf

Towards Strata 2014


Beyond R and Ph.D.s: The Mythology of Data Science Debunked Douglas Merrill (ZestFinance)

http://www.youtube.com/watch?v=J2sgObXbIWY (8 minutes 9 seconds)

Science is fundamentally about data, but data is not fundamentally about science

People

A data scientist is a data analyst who lives in California. – George Roumeliotis, (Intuit)

Scientists think they can code, software engineers think they are scientists. Team them up so they collaborate.

– Scott Sorenson (Ancestry.com) Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop

How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce http://strataconf.com/stratany2013/public/schedule/detail/30707

Data scientists spend their lives as data janitors instead of leveraging their skills

– Wes McKinney (DataPad) Building More Productive Data Science and Analytics Workflows

Keynote

Is Bigger Really Better? Predictive Analytics

with Fine-grained Behavior Data

Professor at the NYU Stern School of Business Foster Provost

http://www.youtube.com/watch?v=1jzMiAfLH2c

(10 minutes 16 seconds)

http://strataconf.com/stratany2013/public/schedule/detail/31685

Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data


Predictive does not mean actionable. – Scott Sorenson (Ancestry.com)

Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop

Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data

More data gives you more precision, not more prediction. Using multiple datasets to reduce errors when measuring values.

- Ravi Iyer (Ranker.com) Using Graphs of Data to Understand your Customers, Users, and Employees
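
One way to read the "more precision" half of this quote is through the standard error of a mean, which shrinks with the square root of the number of independent measurements. The sketch below is only an illustration under that assumption; `SIGMA` and the sample sizes are made-up values, not figures from the talk.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n independent measurements
    that each have standard deviation sigma."""
    return sigma / math.sqrt(n)

SIGMA = 10.0  # assumed per-measurement noise (hypothetical)

# 100x more data buys only 10x more precision on the measured value;
# it says nothing about how well that value predicts anything else.
errors = {n: standard_error(SIGMA, n) for n in (100, 10_000, 1_000_000)}

for n, err in errors.items():
    print(f"n={n:>9,d}  standard error={err:.3f}")
```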


Big Impact from Big Data

Head of Analytics at Facebook Ken Rudin

http://www.youtube.com/watch?v=RJFwsZwTBgg

(11 minutes 57 seconds)

http://strataconf.com/stratany2013/public/schedule/detail/31903

Keynote

Big Impact from Big Data

Designing Your Data-Centric Organization Josh Klahr (Pivotal)

http://www.youtube.com/watch?v=D86udfrVzrI (12 minutes)

Hadoop is a hammer, but you need other tools along with it.

Big Impact from Big Data

The way you organize information depends on the question you intend to ask of it.

- Richard Saul Wurman Building a Data Platform

HaDump : Loading data into Hadoop for no reason.

Data Science Without a Scientist http://strataconf.com/stratany2013/public/schedule/detail/31801

Big Impact from Big Data

Technical people still don't understand the business needs of business people! Business people don't know what a table is.

- Anurag Tandon (MicroStrategy) Inject Big Data into your Corporate DNA: Enable Every Employee to Make Data Driven Decisions

Ask the Right Questions

Organizations already have people who know their own data better than mystical data scientists.
Learning Hadoop is easier than learning the company’s business.

- Gartner, 2012

Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS http://strataconf.com/stratany2013/public/schedule/detail/29968

Non-linear Storytelling: Towards New Methods and Aesthetics for Data Narrative http://strataconf.com/stratany2013/public/schedule/detail/30207

Every Soldier is a Sensor: Countering Corruption in Afghanistan http://strataconf.com/stratany2013/public/schedule/detail/30828

Big Impact from Big Data


Useful < Usable < Actionable with Impact

If you can't answer "so what?", you only have facts, not insight

- Baron Schwartz (VividCortex Inc) Making Big Data Small

Descriptive (Easy) What happened?

Predictive (Medium) What will happen?

Prescriptive (Hard) What should we do about it?

Hadoop & Data Science for the Enterprise
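
The three levels above can be sketched on a toy series. This is a minimal illustration, not anything shown in the talk: the sales figures, the naive straight-line forecast, and the `STOCK_ON_HAND` reorder rule are all hypothetical.

```python
from statistics import mean

# Hypothetical daily sales figures (made-up numbers for illustration).
sales = [100, 110, 120, 130, 140]

# Descriptive: what happened?
average = mean(sales)

# Predictive: what will happen?
# Fit a least-squares line over the day index and extrapolate one day ahead.
n = len(sales)
xs = range(n)
x_bar = mean(xs)
slope = (
    sum((x - x_bar) * (y - average) for x, y in zip(xs, sales))
    / sum((x - x_bar) ** 2 for x in xs)
)
intercept = average - slope * x_bar
forecast = intercept + slope * n

# Prescriptive: what should we do about it?
# Hypothetical rule: reorder when the forecast exceeds the stock on hand.
STOCK_ON_HAND = 145
action = "reorder" if forecast > STOCK_ON_HAND else "hold"

print(average, forecast, action)
```

The descriptive step only summarizes, the predictive step needs a model, and the prescriptive step additionally needs a business rule, which is one reason it is labeled the hardest.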

Value of Data

Big Data is the first industry that was created by open source.

- Jack Norris (MapR Technologies) Separating Hadoop Myths from Reality

The Future of Hadoop : What Happened

& What's Possible?

Co-Founder of Hadoop Doug Cutting

http://www.youtube.com/watch?v=_WwuZI6AhN8

(14 minutes 41 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31591

Hadoop: the kernel of the OS for data.

Hadoop's Impact on the Future of Data Management Mike Olson (Cloudera)

http://www.youtube.com/watch?v=puHS2JNKgRM http://strataconf.com/stratany2013/public/schedule/detail/31380

Single :
- S/W & H/W system
- security model
- management model
- metadata model
- audit model
- resource management model

Common : storage & schema

http://www.slideshare.net/cloudera/enterprise-data-hub-the-next-big-thing-in-big-data

Last generation of data management is not sufficient
More copies, representations, transformations increase risk
Index once and reuse across workloads, lifecycle
NoSQL: indexing and updates for interactive apps
Hadoop: staging, persistence, and analytics

Data Governance for Regulated Industries Using Hadoop http://strataconf.com/stratany2013/public/schedule/detail/30738

Rethink How You See Data Sharmila Shahani-Mulligan (ClearStory Data)

http://www.youtube.com/watch?v=07hGulTOZGk (9 minutes 6 seconds) http://strataconf.com/stratany2013/public/schedule/detail/31742

Data Intelligence

?

Question Analysis & Discovery

Access Sampling Modeling Presentation

The Data Availability Problem

Insight

Data Prep – too slow!

Loading

Introducing a New Way to Interact with Insight http://strataconf.com/stratany2013/public/schedule/detail/31743

Information Supply Chain

Running Non-MapReduce Big Data applications on Apache Hadoop http://strataconf.com/stratany2013/public/schedule/detail/30755

What’s Next for Apache HBase: Multi-tenancy, Predictability, and Extensions. http://strataconf.com/stratany2013/public/schedule/detail/30857

Apache HBase for Architects http://strataconf.com/stratany2013/public/schedule/detail/30619

Securing the Apache Hadoop Ecosystem http://strataconf.com/stratany2013/public/schedule/detail/30302

An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB http://strataconf.com/stratany2013/public/schedule/detail/30959

Schema

Information does not exist until a schema is defined and data is stored in a relational database

- anonymous

Building a Data Platform http://strataconf.com/stratany2013/public/schedule/detail/31400

Lessons Learned From A Decade’s Worth of Big Data At The U.S. National Security Agency (NSA) http://strataconf.com/stratany2013/public/schedule/detail/30913

Managing a Rapidly Evolving Analytics Pipeline http://strataconf.com/stratany2013/public/schedule/detail/30635

SQL on/in Hadoop/HBase Solutions

Stringer/Tez Shark

Perception is Key: Telescopes, Microscopes and Data http://strataconf.com/strataeu2013/public/schedule/detail/32351

All SQL on Hadoop Solutions are Missing the Point of Hadoop

Every solution makes you define a schema
- SQL (Structured Query Language) is expressed over an assumed schema

Major reasons why Hadoop has taken off include:
- Ability to load data without defining a schema
- Ability to process data using schema-on-read instead of first defining a schema

Hadoop contains a lot of:
- Raw, granular data sets with potentially inconsistent schemas
- Data sets in JSON, key-value, and other self-describing (non-relational) models designed for schema-on-read processing

SQL on Hadoop solutions that make you first define a schema are missing a major part of Hadoop’s usage patterns
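
A minimal sketch of the schema-on-read idea, assuming raw JSON-lines input with inconsistent fields. The records, field names, and the `read_with_schema` helper are hypothetical and only illustrate the pattern; they are not taken from the talk or from any Hadoop API.

```python
import json

# Hypothetical raw event log: one JSON object per line, loaded as-is,
# with no schema declared up front and fields that vary per record.
RAW_LINES = [
    '{"user": "a", "action": "click", "ts": 1}',
    '{"user": "b", "action": "view"}',
    '{"user": "a", "item": 42, "ts": 3}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: project each record onto `fields`,
    filling in None where a record lacks a field."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}

# Two different schemas over the same raw data, each chosen for a question.
actions_view = list(read_with_schema(RAW_LINES, ["user", "action"]))
timeline_view = list(read_with_schema(RAW_LINES, ["user", "ts"]))

print(actions_view)
print(timeline_view)
```

The raw lines are never rewritten; each question brings its own projection, which is the usage pattern the slide argues schema-first SQL layers miss.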

Flexible Schema and the End of ETL http://strataconf.com/stratany2013/public/schedule/detail/31868

Lessons Learned

Hadoop Adventures At Spotify http://strataconf.com/stratany2013/public/schedule/detail/30570


Prototyping is key to overcoming resistance to change
Technical architecture is heavily influenced by people organization
Developing a team of experienced Hadoop users can often be done using internal employees
A culture of experimentation and innovation yields the best result

Quick prototyping is the fastest way to internal advocacy.
Ship It!
Cloud == Speed
We don’t always need a complicated solution.
KISS
Play to your differentiating strengths.
Experience >> Data
Bias towards impact.
It Takes a Village
EASE!! (Emulate, Analyze, Scale, Evaluate)

Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop http://strataconf.com/stratany2013/public/schedule/detail/30499

How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce http://strataconf.com/stratany2013/public/schedule/detail/30707

Questions? SELECT questions FROM audience;
