7/31/2019 Big Data Now Current Perspectives From OReilly Radar Copy
Big Data Now
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo
Table of Contents
Foreword
1. Data Science and Data Tools
2. Data Issues
3. The Application of Data: Products and Processes
4. The Business of Data
Foreword
CHAPTER 1
Data Science and Data Tools
What is data science?
What is data science?
Flu trends
Where data comes from
1956 disk drive
Working with data at scale
Making data tell its story
Data scientists
Hiring trends for data science
The SMAQ stack for big data
MapReduce
Hadoop MapReduce
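// Fragment of a WordCount driver class. Imports omitted: java.io.IOException,
// java.util.StringTokenizer, org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.*
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    // Emit (word, 1) for every token in the input line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sum the counts emitted for each word.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) { sum += val.get(); }
        context.write(key, new IntWritable(sum));
    }
}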
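The same word count can be sketched at the command line, which may help make the three phases concrete: `tr` plays the role of the map step (one word per line), `sort` stands in for the shuffle that brings identical keys together, and `uniq -c` acts as the reducer. This is an analogy on a single machine, not what Hadoop actually runs; the function name `word_count` and the sample sentences are invented for illustration.

```shell
#!/bin/sh
# word_count: read text on stdin, print "word count" pairs sorted by word.
# Hypothetical helper for illustration; it is not part of Hadoop.
word_count() {
    tr -s '[:space:]' '\n' |        # "map": emit one word per line
        grep -v '^$' |              # drop an empty line left by leading whitespace
        sort |                      # "shuffle": bring identical words together
        uniq -c |                   # "reduce": count each group of identical words
        awk '{print $2 " " $1}'     # normalize the layout to "word count"
}

# Sample input, made up for this sketch.
printf 'the quick brown fox\nthe lazy dog\n' | word_count
```

What the sketch leaves out is the point of MapReduce: in Hadoop each phase is spread across many machines, whereas here every step runs in one process on one host.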
Other implementations
Storage
Hadoop Distributed File System
HBase, the Hadoop Database
Hive
Cassandra and Hypertable
NoSQL database implementations of MapReduce
Integration with SQL databases
Integration with streaming data sources
Commercial SMAQ solutions
Query
Pig
input = LOAD 'input/sentences.txt' USING TextLoader();
words = FOREACH input GENERATE FLATTEN(TOKENIZE($0));
grouped = GROUP words BY $0;
counts = FOREACH grouped GENERATE group, COUNT(words);
ordered = ORDER counts BY $0;
STORE ordered INTO 'output/wordCount' USING PigStorage();
(defmapcatop split [sentence]
  (seq (.split sentence "\\s+")))

(?<- (stdout) [?word ?count]
  (sentence ?s) (split ?s :> ?word)
  (c/count ?count))
Search with Solr
Conclusion
Scraping, cleaning, and selling big data
Data hand tools
$ grep '599 [A-Z][A-Z]' rudx-log.txt | colrm 1 72 | head -2
VR
MO
...
$ grep '599 [A-Z][A-Z]' rudx-log.txt | colrm 1 72 | sort |\
    uniq | head -2
AD
AL
$ grep '599 [A-Z][A-Z]' rudx-log.txt | colrm 1 72 | sort | uniq | wc
38 38 342
$ grep '599 [A-Z][A-Z]' rudx-log.txt | awk '{print $2 " " $11}' |\
    sort | uniq
14000 AD
14000 AL
14000 AN
...
$ grep '599 [A-Z][A-Z]' rudx-log.txt | awk '{print $2 " " $11}' |\
    sort | uniq | grep 21000 | wc
20 40 180
$ grep '599 [A-Z][A-Z]' rudx-log.txt | awk '{print $2 " " $11}' |\
    sort | uniq | grep 14000 | wc
26 52 234
...
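The pipelines above can be tried without the original log. The sketch below builds a tiny sample file in the same column layout and counts the distinct oblast codes for one band. The function name `oblasts_on_band` and the sample contacts are invented for illustration. It also swaps the trailing `grep 14000` for an exact comparison on field 2 in `awk`, on the assumption that matching the band field alone is what is intended.

```shell
#!/bin/sh
# oblasts_on_band LOGFILE BAND: count the distinct oblast codes logged on BAND.
# Hypothetical helper; assumes the band is field 2 and the oblast is field 11.
oblasts_on_band() {
    grep '599 [A-Z][A-Z]' "$1" |                     # keep only completed contacts
        awk -v band="$2" '$2 == band {print $11}' |  # oblast codes for this band
        sort -u |                                    # same effect as sort | uniq
        wc -l |
        tr -d ' '                                    # some wc versions pad with spaces
}

# Sample log: invented contacts in the same column layout as rudx-log.txt.
log=$(mktemp)
cat > "$log" <<'EOF'
QSO: 14000 CW 2009-03-21 1225 W1JQ 599 0015 RG3K 599 VR
QSO: 14000 CW 2009-03-21 1230 W1JQ 599 0016 UA6YW 599 AD
QSO: 14000 CW 2009-03-21 1241 W1JQ 599 0017 RA3XX 599 VR
QSO: 21000 CW 2009-03-21 1302 W1JQ 599 0018 UA4ZZ 599 MO
EOF

oblasts_on_band "$log" 14000
oblasts_on_band "$log" 21000
rm -f "$log"
```

On this sample the two calls print 2 and 1: the repeated VR contact on 14000 is counted once.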
$ grep '599 [A-Z][A-Z]' `find . -name rudx-log.txt -print` |\
    awk '{print $2 " " $11}' | sort | uniq | grep 14000 | wc
48 96 432
...
./2008/rudx-log.txt:QSO: 14000 CW 2008-03-15 1526 W1JQ 599 0054 UA6YW 599 AD
./2009/rudx-log.txt:QSO: 14000 CW 2009-03-21 1225 W1JQ 599 0015 RG3K 599 VR
...
$ find . -name rudx-log.txt -print | xargs grep '599 [A-Z][A-Z]' |\
    awk '{print $2 " " $11}' | grep 14000 | sort | uniq | wc
48 96 432
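One caveat with `find ... -print | xargs`: a path containing a space is split into two arguments. The sketch below is a hedged variant using `find -print0` and `xargs -0`, which are available in the GNU and BSD tools; treat their availability as an assumption on other systems. It also passes `-h` to `grep` so that a file name with a space cannot shift the `awk` field numbers. The function name `count_band_oblast_pairs` and the directory names are invented. The two sample contacts are the 2008 and 2009 log lines quoted earlier in this section.

```shell
#!/bin/sh
# count_band_oblast_pairs DIR BAND: across every rudx-log.txt under DIR,
# count the distinct "band oblast" pairs for BAND. Hypothetical helper.
count_band_oblast_pairs() {
    find "$1" -name rudx-log.txt -print0 |              # NUL-terminated paths
        xargs -0 grep -h '599 [A-Z][A-Z]' |              # -h: no file name prefix
        awk -v band="$2" '$2 == band {print $2 " " $11}' |
        sort -u |
        wc -l |
        tr -d ' '
}

# Sample tree; note the space in one directory name.
root=$(mktemp -d)
mkdir -p "$root/2008" "$root/2009 contest"
echo 'QSO: 14000 CW 2008-03-15 1526 W1JQ 599 0054 UA6YW 599 AD' \
    > "$root/2008/rudx-log.txt"
echo 'QSO: 14000 CW 2009-03-21 1225 W1JQ 599 0015 RG3K 599 VR' \
    > "$root/2009 contest/rudx-log.txt"

count_band_oblast_pairs "$root" 14000
rm -rf "$root"
```

With the plain `-print | xargs` form, the path under `2009 contest` would reach `grep` as two separate arguments, so that log would not be read.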
$ find . -name rudx-log.txt -print | xargs grep '599 [A-Z][A-Z]' |\
    awk '{print $2 " " $11}' | pv | grep 14000 | sort | uniq | wc
3.41kB 0:00:00 [ 20kB/s] [
48 96 432
Hadoop: What it is, how it works, and what it can do
Four free data tools for journalists (and snoops)
bit.ly
The quiet rise of machine learning
Where the semantic web stumbled, linked data will succeed
Social data is an oracle waiting for a question
The challenges of streaming real-time data
There's no definition
Time for the community to rally
Why you can't really anonymize your data
CHAPTER 3
The Application of Data: Products and Processes
How the Library of Congress is building the Twitter archive
Data journalism and data tools
The newsroom stack
The data analysis path is built on curiosity, followed by action
How data and analytics can improve education
Data science is a pipeline between academic disciplines
Big data and open source unlock genetic secrets
Visualization deconstructed: Mapping Facebook's friendships
Mapping Facebook's friendships
Data science democratized
There's no such thing as big data
Big data and the innovator's dilemma
Building data startups: Fast, big, and focused
Setting the stage: The attack of the exponentials
Leveraging the big data stack
Fast data
Big analytics
Focused services
Democratizing big data
Data markets aren't coming: They're already here
An iTunes model for data
Data is a currency
Big data: An opportunity in search of a metaphor
Data and the human-machine connection