![Page 1: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/1.jpg)
Tweet about Enzee Universe using #enzee11
Hadoop and Netezza: Co-existence or competition?
Krishnan Parasuraman, CTO - Digital Media, Netezza
@kparasuraman
![Page 2: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/2.jpg)
The Buzz
![Page 3: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/3.jpg)
![Page 4: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/4.jpg)
Fuelling the debate
![Page 5: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/5.jpg)
A brief history of wannabe RDBMS killers
![Page 6: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/6.jpg)
Open Source Distributed Storage and Processing Engine
Self-healing, distributed storage
Commodity hardware – inexpensive storage
+
Fault-tolerant distributed processing
Abstraction for parallel computing
Manage complex data – relational and non-relational – in a single repository
Store source data forever and analyze as and when needed
Process at source – eliminate data movement
Oozie: Workflow
Sqoop: Integration
ZooKeeper: Service coordination
Flume, Chukwa, Scribe: Data collection
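The "abstraction for parallel computing" on this slide is MapReduce. The sketch below is a minimal, single-process illustration of the map, shuffle and reduce steps in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for each word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(["hadoop stores data", "netezza queries data"]))
```

On a real cluster the mappers and reducers would run on many nodes next to the data; only the shape of the computation is shown here.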
![Page 7: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/7.jpg)
Hadoop: Origin and evolution
2003 2004 2005 2006 2007 2008 2009 2010 2011
Google: GFS paper
Google: MapReduce paper
Apache: Lucene subproject
Google: Bigtable paper
Apache: Hadoop project
Yahoo: 10K core cluster
Apache: HBase project
Netezza: Hadoop Connector, MapReduce support
Phases: early research; open source dev momentum; initial success stories; commercialization
![Page 8: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/8.jpg)
Common Perceptions
Low cost
Cloud
Complex Analytics
Ad-hoc queries
Unstructured
Large Volumes
![Page 9: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/9.jpg)
Parallel data warehouse systems
[Diagram: hosts (host controllers) accepting SQL; network fabric; massively parallel compute nodes, each with FPGA, memory and CPU; storage units]
![Page 10: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/10.jpg)
Hadoop
[Diagram: master node running the Name Node and Job Tracker; network fabric; parallel compute nodes, each running a Data Node and Task Tracker, over storage units; MapReduce]
![Page 11: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/11.jpg)
The similarities
Highly Available
Scalable
Execute code & algorithms next to data
Massive parallelism
[Diagram: Hadoop cluster with Name Node and Job Tracker, three Data Node / Task Tracker workers, MapReduce]
![Page 12: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/12.jpg)
The differences
[Diagram: Hadoop cluster with Name Node and Job Tracker, three Data Node / Task Tracker workers, MapReduce]
Data loading = file copy (Look Ma, no ETL)
Schema on Read – Data loading is fast
Batch mode data access
Not intended for real time access
Doesn’t support Random Access
No joins, no query engine, no types, no SQL
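"Schema on read" means the raw file is stored untouched and a structure is applied only when the data is read. The sketch below illustrates the idea in plain Python. The sample data in `RAW`, the `(date, page, millis)` layout, and the function names are hypothetical and not taken from the slides.

```python
import csv
import io

# Hypothetical raw file contents: stored as-is, with no schema enforced at load time.
RAW = (
    "2011-06-20,home,120\n"
    "2011-06-20,search,not-a-number\n"
    "2011-06-21,home,95\n"
)


def read_with_schema(raw_text):
    """Apply an assumed (date, page, millis) schema while reading.

    Rows that do not fit the schema are skipped at read time, not at load time.
    """
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != 3:
            continue
        date, page, millis = row
        try:
            yield {"date": date, "page": page, "millis": int(millis)}
        except ValueError:
            continue


def total_millis_by_page(raw_text):
    """Aggregate the typed records produced by read_with_schema."""
    totals = {}
    for rec in read_with_schema(raw_text):
        totals[rec["page"]] = totals.get(rec["page"], 0) + rec["millis"]
    return totals


if __name__ == "__main__":
    print(total_millis_by_page(RAW))
```

The trade-off is visible even at this scale: loading costs nothing beyond the copy, but every reader pays for parsing and has to decide what to do with rows that do not fit.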
![Page 13: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/13.jpg)
Where does it work well?
1. Queryable Archive: Moving computation is cheaper than moving data
2. Exploratory analysis: Relationships not defined yet; Can’t put in a process for ETL; Evolving schema
3. Complex data: Parallel ETL in Java
![Page 14: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/14.jpg)
Imperatives for co-existence
• Fast data loading - flexible schema till we figure out what we want to do
• Expressiveness of SQL coupled with flexibility of procedural code, i.e. MapReduce
• Low cost of storing and analyzing not-so-hot data
• Parse and analyze complex data such as video and images
![Page 15: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/15.jpg)
Netezza-Hadoop: Co-existence use cases
unstructured data
semi-structured data
structured data
Create context (classification, text mining)
Analyze
Parse, aggregate
Analyze, report
Analyze, report
Active archival
Long running queries
![Page 16: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/16.jpg)
Pattern 1: Data ingestion
[Diagram: raw weblogs; Hadoop cluster (NameNode/JobTracker and three DataNode/TaskTracker workers); Netezza environment; flow steps numbered 1 to 4]
![Page 17: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/17.jpg)
Pattern 2: Low cost storage and dynamic provisioning
[Diagram: Amazon Cloud with Amazon S3 and Elastic MapReduce; flow steps numbered 1 to 3]
![Page 18: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/18.jpg)
Pattern 3: Queryable archive
[Diagram: data sources; flow steps numbered 1 and 2]
![Page 19: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/19.jpg)
Pattern 4: Support low interaction partners
[Diagram: data sources; flow steps numbered 1 to 3]
![Page 20: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/20.jpg)
Netezza and Hadoop integration
Hadoop/HDFS integration
High speed data loader (bidirectional)
weblogs
• Move data back and forth between Netezza and Hadoop cluster
• Use Hadoop for ingesting/parsing web logs, offline analytics
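As a rough illustration of the "ingesting/parsing web logs" step, the sketch below turns raw log lines into delimited rows of the kind a bulk loader could consume. It assumes the input is in Apache Common Log Format and uses a pipe separator; the pattern, the function names and the output layout are illustrative assumptions, not part of the Netezza connector.

```python
import re

# Assumed input format: Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return (host, time, method, path, status, size), or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    # A size of "-" means no bytes were sent; treat it as 0.
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return (
        m.group("host"),
        m.group("time"),
        m.group("method"),
        m.group("path"),
        int(m.group("status")),
        size,
    )


def to_delimited(lines, sep="|"):
    """Yield one delimited row per parseable log line, skipping malformed lines."""
    for line in lines:
        row = parse_line(line)
        if row is not None:
            yield sep.join(str(field) for field in row)


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    for out in to_delimited([sample, "malformed line"]):
        print(out)
```

In a Hadoop job this parsing would typically sit inside the map step, so that each node parses the log blocks it already holds.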
![Page 21: Hadoop and Netezza - Co-existence or Competition?](https://reader035.vdocuments.us/reader035/viewer/2022062701/553989da550346f02f8b4a41/html5/thumbnails/21.jpg)
Summary: Leveraging best of both worlds
1. Hadoop is not a replacement for a parallel data warehouse
2. Hadoop and Netezza are complementary technologies
3. Don’t let the hype drive the need
4. We have only solved the integration problem