what is big data? - o'reilly mediaassets.en.oreilly.com/1/event/70/opening remarks_ the harsh...

Post on 21-Jul-2018

TRANSCRIPT

What is big data?

Big Data’s a nebulous term, like cloud computing. It’s not really clear what it means.

A series of emerging technologies to create, manipulate, and manage “very large data sets.”

Dan Kusnetzky, What is “Big Data?,” ZDNet (Feb. 16, 2010), http://www.zdnet.com/blog/virtualization/what-is-big-data/1708.

Dan Kusnetzky described it as the tools to manage very large amounts of information.

A movement to bring large-scale data analysis capabilities to the public by providing access to existing data sets, along with the ability to use this data in exciting new ways.

http://www.data.gov/.

Data.gov has a more utopian definition.

Datasets that grow so large that they become awkward to work with using on-hand database management tools.

http://en.wikipedia.org/wiki/Big_data

And Wikipedia agrees on this idea of “new tools needed.”

My definition.

Here’s my working definition:

Large amounts of information

Public and private

Easily linked and collected

Stored just because we can

Analyzed by algorithms

In near real time

Applied to business

Usable by everyone

Fed back into the system

Sure, big data is about large amounts of information. But increasingly, that information is both public and private: enterprise data warehouses connected to maps, social networks, government data, and so on. In fact, this much data exists because it has become easy to collect (through sensor networks, people, and a computerized society) and easy to link (by someone’s email, a barcode, and so on). Then, it’s about storing stuff just in case, because storage is free. It’s about letting machines chew on it to find hidden patterns, and producing results in or near real time. It’s also about using this stuff for business. Everyone’s a quant. We’re much more data-literate than we were. And finally, all these conclusions feed back into the system as a set of new data, or improvements to the collection tools, or better algorithms.

Why now?

[Chart: Revenue, Widgets, Things, and Stuff by year, 2007 to 2010]

In the past, when we collected information, we had a priori knowledge of how we’d use it. When we put information into databases, we knew what it was for. We knew we were collecting quarterly sales figures by store, by sales rep, and by product. Storage was expensive, data warehouses took time to manage, and what lived in our databases had structure.

From Elasticsoul on Flickr (http://www.flickr.com/photos/elasticsoul/19940431)

Big Data, on the other hand, is about unstructured information we collect on faith. We drink from the firehose. We don’t know how we’ll use it yet. We store it because we think it’ll be useful later. We have good reason to think so:

In 2020 a two-disk, 2.5” drive will store over 14 TB and will cost $40.

Magnetic disk areal storage density doubles annually. This estimate assumes that hard drives continue to progress at their current pace; it was first reported by PhysOrg, according to http://en.wikipedia.org/wiki/Mark_Kryder

It’s going to be really cheap. Imagine an iPad that can do BI work. The cost of storing and analyzing data is so low that we often assume we may as well keep everything.
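The storage estimate above is a compound-growth extrapolation, and the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the slide does not state a starting capacity or a starting year, so the baseline values in the example are hypothetical placeholders rather than figures from the talk.

```python
def project_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth.

    `annual_growth` is a fraction: 1.0 means doubling every year,
    0.4 means growing 40% per year.
    """
    return start_tb * (1.0 + annual_growth) ** years


def implied_annual_growth(start_tb: float, end_tb: float, years: int) -> float:
    """Annual growth rate needed to get from `start_tb` to `end_tb` in `years`."""
    return (end_tb / start_tb) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Hypothetical baseline: the talk gives only the 2020 target (14 TB),
    # not the drive size or the year the projection starts from.
    baseline_tb = 0.5   # assumed, not from the source
    years = 9           # assumed horizon, not from the source
    target_tb = 14.0    # from the slide

    rate = implied_annual_growth(baseline_tb, target_tb, years)
    print(f"Implied growth: {rate:.0%} per year")
    print(f"Check: {project_capacity(baseline_tb, rate, years):.1f} TB")
```

Changing the assumed baseline or horizon shows how sensitive a projection like this is to its starting point.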

Why it’s an advantage

“Any sufficiently advanced technology is indistinguishable from magic.” Arthur C. Clarke, Profiles of the Future, 1961 (Clarke’s third law)

Advancement is in the eye of the beholder

For traditional, non-technical businesses, big data is magic.

Nobody is immune.

http://www.flickr.com/photos/saucysalad/3640865387

A quick visit to a music store

http://www.flickr.com/photos/uggboy/4158150814

a DVD rental outlet,

http://www.flickr.com/photos/maladjusted/5207565912

or a travel agent will confirm this—if you can still find one.

http://www.flickr.com/photos/bobjagendorf/5130753552

Companies that aren’t using data to transform themselves will soon be the walking dead, unable to anticipate their markets. The web’s household names got where they are today by mining the information that their users generate and turning it into business advantage.

Why isn’t Blockbuster Netflix? They had data on what people watch and where they live. They just didn’t think the postal service was a substitute for retail outlets.

Big Data has already transformed many industries forever.

But it’s not just about new businesses. It’s about doing the boring things in new ways.

Finance & banking: “Quant”

Pharmaceuticals: Genomics expert

Energy: Quantitative geologist

Civic planning: Traffic pattern analyst

National defense: Simulations operator

...: ...

“Data scientist”

Many industries have had employees who worked with big data for decades. But until recently, they thought of themselves in terms of their industry; they didn’t realize they were part of a discipline that reached beyond the borders of their specific vertical. Today, we call these people “data scientists,” and the good ones are able to move between industries easily.

[Chart: Revenue/Employee (000s) for Amazon Q4 2010, Barnes & Noble Q4 2010, Netflix Q4 2009, Blockbuster Q4 2009, Dropbox Q2 2011, Groupon Q2 2011]

Revenue per capita, augmenting humans with data

[Chart: Market cap/employee, August 2011, for Amazon, Barnes & Noble, Netflix, Blockbuster, Dropbox, Groupon]

Who would you rather be?

Now, the leader is the one who knows what questions to ask

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”

Herbert A. Simon, “Designing Organizations for an Information-Rich World,” in Computers, Communications and the Public Interest, Martin Greenberger, ed., The Johns Hopkins Press, 1971, pages 40-41.

What would an MBA look like in a data-filled world?

Once, a leader convinced others in the absence of data.

Once, a leader was someone who could convince people to do things in the absence of data.

Now, a leader knows what questions to ask.

In an era of technology we'll be judging companies by their ability to augment people with technology.

Welcome to JumpStart.
