Big Data for One Big Family

Post on 22-Jun-2015

Category: Data & Analytics

DESCRIPTION

Presentation by Matt Asay (MongoDB) at the FamilySearch Developer Conference (2014), talking about how big data applies to family history.

TRANSCRIPT

MongoDB Inc. Proprietary and Confidential

Big Data for One Big Family

Matt Asay, VP, Community, MongoDB

2

What Genealogy Was: Neat and Tidy Data

3

Genealogy = Family Stories

4

Stories Aren’t Told in Spreadsheets

5

They’re Increasingly Told Like This

6

Modern, “Big” Data Is Messy

7

Data Now Looks Like This

8

It Looks Like People

The Big Data Unknown

10

Who’s Embracing Big Data?

Source: Gartner

11

Top Big Data Challenges?

Translation? Most struggle to know what Big Data is, how to manage it and who can manage it

Source: Gartner

12

Big Data: Volume Matters

•  More than 90% of today’s data was created in the last 2 years

•  Moore’s Law for data: Doubles at regular intervals
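
The two claims on this slide can be connected with a little arithmetic. The sketch below is not from the talk: it assumes data volume grows at a constant exponential rate, and the function name `doublingTimeYears` is hypothetical. Under that assumption, if 90% of all data is less than 2 years old, the total grew roughly tenfold in 2 years, which implies a doubling time of about 7 months.

```javascript
// Assumption (not stated in the talk): data volume grows exponentially
// at a constant rate, so a fixed doubling time exists.
// recentShare: fraction of all data created within the last windowYears.
function doublingTimeYears(recentShare, windowYears) {
  if (!(recentShare > 0 && recentShare < 1)) {
    throw new RangeError("recentShare must be strictly between 0 and 1");
  }
  // If 90% is new, the older 10% grew into 100%: a growth factor of 10.
  const growthFactor = 1 / (1 - recentShare);
  // Solve 2^(windowYears / T) = growthFactor for the doubling time T.
  return (windowYears * Math.log(2)) / Math.log(growthFactor);
}

const years = doublingTimeYears(0.9, 2);
console.log(`Doubling time: about ${(years * 12).toFixed(1)} months`);
```

Because the slide says "more than 90%", this figure is an upper bound on the doubling time under the same assumption.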

13

Big(ger) Is the New Normal

14

Volume Is Not Really the Problem

“Of Gartner's "3Vs" of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.”

- Forrester, 2014

What are the primary data issues driving you to consider Big Data?*

•  Data Variety (68%): diverse, streaming or new data types

•  Data Volume (15%): greater than 100TB

•  Other Data (17%): less than 100TB

* From Big Data Executive Summary of 50+ execs from F100, gov orgs

15

Compounding the Confusion

16

We Hire for Machines but…

Source: KDnuggets, 2014

17

Time to Rethink the Solution

18

NoSQL Born for Unstructured Data

[Chart: NoSQL Data Types (multiples allowed). Axis: 0.0% to 60.0%. Categories: Log data, Free-form text, Web or mobile content, Social media data, Geospatial data, Transactions, Mobile device data, Web sessions or caching data, Sensor data, Email/documents, Machine data, Images, Video, Audio]

Source: Gartner, 2014

Innovation As Iteration

“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

21

Back in 1970…Cars Were Great!

22

So Were Computers!

23

Including the Relational Database

24

Lots of Great Innovations Since 1970

25

Legacy Data Infrastructure Makes Development Hard

[Diagram labels: Relational Database, Object Relational Mapping, Application; Code, XML Config, DB Schema]

26

And Even Harder To Iterate

[Diagram: a table with columns Name, Pet, Phone, Email, annotated with "New Table" (twice) and "New Column" (twice)]

3 months later…

27

Scale and Flexibility Drive Choices

[Chart: What motivated you to use a NoSQL database over traditional alternatives? (multiples allowed). Axis: 0.0% to 90.0%. Categories: Scalability, Schema flexibility, Ease of development, Cost, Availability of cloud deployment options]

Source: Gartner, 2014

28

RDBMS

NoSQL Drives Agility

MongoDB

{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" }
  ]
}
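
To illustrate the agility point, here is a minimal sketch of why a document shape like the one above is easy to iterate on. It uses a plain in-memory array as a stand-in for a collection, not the MongoDB driver API. The second employee, the `pet` field, and the helper `plansOfType` are hypothetical and added only for illustration.

```javascript
// Hypothetical in-memory stand-in for a document collection.
// This is NOT the MongoDB driver API; it only shows the data shape.
const employees = [];

// The document from the slide (without the _id field).
employees.push({
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" },
  ],
});

// A later document adds a new field (pet) and omits others.
// No table or column has to be created first. This record is made up.
employees.push({
  employee_name: "Example, Pat",
  department: "Engineering",
  pet: "Dog",
  benefits: [{ type: "Health", plan: "PPO Plus" }],
});

// Query the embedded benefits array, tolerating documents that lack it.
function plansOfType(docs, type) {
  return docs.flatMap((doc) =>
    (doc.benefits || [])
      .filter((benefit) => benefit.type === type)
      .map((benefit) => ({ name: doc.employee_name, plan: benefit.plan }))
  );
}

console.log(plansOfType(employees, "Dental"));
```

In a relational layout, the benefits would typically live in a separate table and the new field would require a schema change, which is the iteration cost the preceding slides describe.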

29

Optimize for (Developer) Iteration

[Chart: Infrastructure Cost vs. Engineer Cost, 1985 to 2013]

30

So…Use Open Source

31

Big Data != Big Upfront Payment

32

Shouldn’t Be Penalized for Success

“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”

IBM Press Release 28 Aug, 2012

33

Cloud Fosters Experimentation

Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving - if you buy infrastructure it's almost immediately irrelevant to your business because it's frozen in time. It's solving a problem you may not have or care about any more.

- Matt Wood, GM of Data Science, Amazon Web Services

34

NoDoop: Not Only Hadoop

Source: Silicon Angle, 2012

35

The Data Scientist Is You

“Organizations already have people who know their own data better than mystical data scientists…. Learning Hadoop is easier than learning the company’s business.”

(Gartner, 2012)

@mjasay
