Hadoop at Rakuten, 2011/07/06
Hadoop at Rakuten.
Rakuten Inc. Architect Group
Hamba Mitsuharu & Nakagawa Gen
2011/07/06 (Wed)
Today’s Agenda.
1. Our Profile.
2. What is Hadoop?
3. Our Current Hadoop System Overview.
4. Our Hadoop Usage.
5. Our Challenge.
6. Our Future Plan.
Our Profile.
From ACT Group: Nakagawa Gen, Hamba Mitsuharu
Our Profile.
Our Profile.
Our Mission
Enhancing Hadoop at Rakuten.
Our Profile.
Our Latest Tasks. Done.
1. Implementing Ganglia.
2. Implementing HA.
Our Profile.
Our Latest Tasks. Now Handing Over.
1. Keeping Up Our Hadoop Cluster.
2. Modifying Our Hadoop Configurations.
3. Implementing Scripts for Daily Chores.
Our Profile.
Our Latest Tasks. Concentrating on This!
1. Evaluating the Related Products.
What is Hadoop?
One of the Most Powerful Distributed Processing Frameworks for Large Data Sets.
What is Hadoop?
Distributions.
What is Hadoop?
Ecosystem.
What is Hadoop?
ETC...
What is Hadoop?
HDFS : Hadoop Distributed File System.
MapReduce : Map & Reduce (Includes Shuffle & Sort).
HDFS & MapReduce Constitute Hadoop.
What is Hadoop?
Source : http://horicky.blogspot.com/2008_11_01_archive.html
Input from HDFS. Process by MapReduce. Output to HDFS.
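The Map, Shuffle & Sort, and Reduce steps can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions, not Hadoop's API: word count is chosen as the example job, the input lines stand in for a file read from HDFS, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle & Sort: group the values by key and order the keys."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped: Iterable[Tuple[str, List[int]]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical input, standing in for lines read from a file on HDFS.
input_lines = ["hadoop at rakuten", "hadoop is distributed", "rakuten uses hadoop"]
word_counts = reduce_phase(shuffle_and_sort(map_phase(input_lines)))
print(word_counts)
```

In a real cluster the map and reduce functions run in parallel on many machines, and the framework performs the shuffle and sort between them.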
What is Hadoop?
Simple Example.
Source : http://techblog.yahoo.co.jp/cat207/cat209/hadoop/
What is Hadoop?
Source : http://horicky.blogspot.com/2008_11_01_archive.html
In the Common Case, Several Simple Jobs Are Combined.
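Combining jobs means the output of one job becomes the input of the next. The sketch below chains two small in-memory jobs: the first counts words, the second groups words by their count. The helper `run_job`, the mappers and reducers, and the sample queries are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple


def run_job(records: Iterable[Any],
            mapper: Callable[[Any], List[Tuple[Any, Any]]],
            reducer: Callable[[Any, List[Any]], Any]) -> List[Any]:
    """Run one map -> shuffle & sort -> reduce pass in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count how often each word appears.
def count_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return (word, sum(ones))


# Job 2: group the words by their count, using job 1's output as input.
def invert_mapper(pair):
    word, count = pair
    return [(count, word)]


def invert_reducer(count, words):
    return (count, sorted(words))


queries = ["camera lens", "camera bag", "lens cap"]  # hypothetical input
job1_output = run_job(queries, count_mapper, count_reducer)
job2_output = run_job(job1_output, invert_mapper, invert_reducer)
print(job1_output)
print(job2_output)
```

Each job stays simple on its own; the more complex result comes from the chain.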
What is Hadoop?
NameNode & DataNode Constitute HDFS.
Source : http://horicky.blogspot.com/2008_11_01_archive.html
What is Hadoop?
Read & Write on HDFS.
Source : http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes
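As a rough illustration of the read and write path, the toy model below keeps metadata in a NameNode object and block contents in DataNode objects. It is a sketch under assumptions, not HDFS code: the 4-byte block size, the round-robin placement, and every class and function name are made up for the example, and the replication factor of 3 is assumed (it matches the HDFS default).

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4    # bytes per block; tiny on purpose (real HDFS blocks are typically 64 MB or larger)
REPLICATION = 3   # copies kept of each block (assumed; also the HDFS default)


class DataNode:
    """Stores block contents, keyed by block id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}


class NameNode:
    """Holds metadata only: the blocks of each file and where the replicas live."""

    def __init__(self, datanodes: List[DataNode]) -> None:
        self.datanodes = datanodes
        self.block_map: Dict[str, List[Tuple[str, List[DataNode]]]] = {}

    def allocate(self, path: str, num_blocks: int) -> List[Tuple[str, List[DataNode]]]:
        """Choose target DataNodes for every block of a new file."""
        plan = []
        for index in range(num_blocks):
            # Simple round-robin placement; real HDFS placement is rack-aware.
            targets = [self.datanodes[(index + r) % len(self.datanodes)]
                       for r in range(REPLICATION)]
            plan.append((f"{path}#blk{index}", targets))
        self.block_map[path] = plan
        return plan

    def locate(self, path: str) -> List[Tuple[str, List[DataNode]]]:
        """Return the block ids of a file together with their replica locations."""
        return self.block_map[path]


def write_file(namenode: NameNode, path: str, data: bytes) -> None:
    """Client write: ask the NameNode where blocks go, then send data to the DataNodes."""
    num_blocks = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
    for index, (block_id, targets) in enumerate(namenode.allocate(path, num_blocks)):
        chunk = data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
        for node in targets:
            node.blocks[block_id] = chunk


def read_file(namenode: NameNode, path: str) -> bytes:
    """Client read: get block locations, then fetch each block from the first replica that still holds it."""
    chunks = []
    for block_id, replicas in namenode.locate(path):
        holder = next(node for node in replicas if block_id in node.blocks)
        chunks.append(holder.blocks[block_id])
    return b"".join(chunks)


datanodes = [DataNode(f"dn{i}") for i in range(4)]
namenode = NameNode(datanodes)
payload = b"hadoop at rakuten"
write_file(namenode, "/logs/sample.txt", payload)
datanodes[0].blocks.clear()   # simulate losing one DataNode
restored = read_file(namenode, "/logs/sample.txt")
print(restored)
```

The file survives the loss of one DataNode because every block has other replicas, while the single NameNode object holds all the metadata.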
What is Hadoop?
JobTracker & TaskTracker Constitute MapReduce.
Source : http://horicky.blogspot.com/2008_11_01_archive.html
What is Hadoop?
Good & Bad Points of Hadoop.
Bad... The NameNode Is a SPoF (Single Point of Failure).
Good! Easy to Scale Out the System. Easy to Implement Distributed Processing.
Our Current Hadoop System Overview.
Our Current Hadoop System Overview.
The Cluster Infrastructure. #1 For Instance.
Source : http://www.ibm.com/developerworks/linux/library/l-hadoop/
Our Current Hadoop System Overview.
The Cluster Infrastructure. #2 In Our Case.
[Diagram: four switches and twelve racks, with links labeled 1Gbps and a client.]
Masters: NN&JT (Active), NN&JT (Standby), SNN.
Slaves: six groups of DN&TT x10 and three groups of DN&TT x3.
Others: three groups of x18.
3 Masters & 69 Slaves.
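The totals can be checked against the diagram labels. The snippet assumes one reading of the figure: each of the six x10 labels and each of the three x3 labels is a group of DN&TT machines, and the three master roles are NN&JT Active, NN&JT Standby, and SNN.

```python
# Counts read off the diagram labels: (number of groups, DN&TT machines per group).
slave_groups = [(6, 10), (3, 3)]
masters = ["NN&JT Active", "NN&JT Standby", "SNN"]

total_slaves = sum(groups * per_group for groups, per_group in slave_groups)
print(len(masters), "masters,", total_slaves, "slaves")
```

Under that reading, 6 x 10 + 3 x 3 gives the 69 slaves stated on the slide.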
Our Current Hadoop System Overview.
The Monitoring System. Using Ganglia (& MRTG).
We Can Easily Check the Resource Usage at Any Time, Not Only for Each Machine but Also for the Cluster as a Whole.
Our Current Hadoop System Overview.
High Availability. Using DRBD & Heartbeat.
[Diagram: Active and Standby machines, each hosting NN and JT on /foo/drbd0 and /foo/drbd1, with eth0 and eth1 interfaces; v-host.rakuten.co.jp and a Client are also shown.]
DRBD Syncs the Changes.
Source : Gen
NN : NameNode, JT : JobTracker
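The active/standby idea can be illustrated with a toy model. This is not DRBD or Heartbeat code and does not use their configuration or APIs; it only mimics the two roles under simple assumptions: every write on the active node is mirrored to the standby, and the standby is promoted when the active node stops responding. All names are hypothetical.

```python
from typing import List


class Node:
    """One master machine; `disk` stands in for the replicated volume."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.disk: List[str] = []


class HaPair:
    """A toy active/standby pair."""

    def __init__(self, active: Node, standby: Node) -> None:
        self.active = active
        self.standby = standby

    def write(self, change: str) -> None:
        """Apply a change on the active node and mirror it (the role DRBD plays)."""
        self.active.disk.append(change)
        if self.standby.alive:
            self.standby.disk.append(change)

    def check_heartbeat(self) -> str:
        """Promote the standby if the active node is down (the role Heartbeat plays)."""
        if not self.active.alive and self.standby.alive:
            self.active, self.standby = self.standby, self.active
        return self.active.name


pair = HaPair(Node("master-a"), Node("master-b"))
pair.write("mkdir /logs")
pair.active.alive = False              # simulate a failure of the active node
serving = pair.check_heartbeat()
print(serving, pair.active.disk)
```

Because the change was mirrored before the failure, the promoted node already holds it when it takes over.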
Our Hadoop Usage.
Our Hadoop Usage.
Who Is Using Our Hadoop.
1. Generating Recommendation Engine Index.
2. Analyzing Redirect Log.
3. Calculating Ad Targeting Index.
4. Measuring Ad Effects.
5. Analyzing Ichiba Merchandise & Order Info.
6. Calculating Ichiba Product Ranking.
7. Analyzing Search Log.
8. Analyzing Rakuten Travel's Access Log. (Coming Soon...)
9. Analyzing Search Word N-gram. (Coming Soon...)
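For the search word N-gram item, a minimal sketch of what such an analysis might compute is shown below. The slides do not say whether character or word n-grams are meant; character bigrams are assumed here, and the sample search words and function name are hypothetical.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


search_words = ["hadoop", "hadoop hbase"]  # hypothetical search log entries
bigram_counts = Counter(
    gram for word in search_words for gram in char_ngrams(word, 2)
)
print(bigram_counts.most_common(3))
```

On a cluster, the per-word n-gram extraction fits the map step and the counting fits the reduce step.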
Our Hadoop Usage.
The Issues of The Previous System.
[Diagram: Previous System. Labels: Purchase, Shop, ITEM, Category, Marketing, Utility, Intermediate, File, NFS, Batch Server; steps: Unload, Load, Manipulate.]
1. High Cost to Keep Up the RDBMS.
2. Needs More and More Storage Space.
3. The System Cannot Handle So Many Job Requests Due to Low Performance.
Our Hadoop Usage.
The Effect of The New System.
[Diagram: New System! 1st Step. Labels: Purchase, Shop, ITEM, Category, Marketing, Utility, Intermediate, File, NFS, Batch Server with (image not reproduced); steps: Unload, Load, Manipulate.]
1. Get a Scalable System at Very Low Cost. (80% OFF as Storage.)
2. Transaction Time Is Dramatically Improved. (50-75% OFF.)
Our Hadoop Usage.
The Remaining Issues of The New System.
1. Still Halfway to the DWH We Are Aiming For.
2. The Negative Influence of Migrating from a Dedicated Environment to a Shared Environment.
   1. Security.
   2. Sharing Cluster Resources.
Our Challenge.
Our Challenge.
The Issues with Our Hadoop.
1. Likely to Use Up the HDFS Space.
2. Needs Much Electric Power.
3. Need to Share the Cluster Resources Efficiently.
4. Needs More Network Bandwidth.
Our Future Plan.
Our Future Plan.
Considering New Slave Machine.
Now Looking for a Machine Which Has...
Low Electric Power Consumption,
About 6-Core CPU x2,
About 10TB HDD,
About 96GB Memory,
& Naturally, Compatibility With Our Data Center.
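To relate the disk size to the HDFS space issue, here is a rough estimate of how much data one such machine would add. It is a sketch under assumptions: a replication factor of 3 (the default value of `dfs.replication`, not confirmed for this cluster), and no allowance for non-HDFS data or temporary files. The function name is hypothetical.

```python
RAW_TB_PER_SLAVE = 10     # "About 10TB HDD" from the slide
REPLICATION_FACTOR = 3    # assumption: the HDFS default for dfs.replication


def usable_tb(raw_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Data that fits in raw_tb of disk when every block is stored `replication` times."""
    return raw_tb / replication


per_slave = usable_tb(RAW_TB_PER_SLAVE)
per_ten_slaves = usable_tb(RAW_TB_PER_SLAVE * 10)
print(f"{per_slave:.2f} TB of data per new slave")
print(f"{per_ten_slaves:.1f} TB of data per 10 new slaves")
```

Under these assumptions, each new slave adds roughly a third of its raw disk size as usable HDFS capacity.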
Our Future Plan.
Upgrade from Apache to CDH3.
Source : http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation
Mr. Eric Sammer (Solution Architect at Cloudera) Described the Advantages of Hadoop from Cloudera on Quora.
1. A version of Hadoop that has frequent releases (quarterly) that include bug fixes and back ported features (append for HBase, Kerberos security from Y!, etc.).
2. Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and work as a cohesive system.
3. Simplified installation via Yum / Apt repositories.
4. Tighter integration with the OS (init scripts for daemons, installation of things in common paths, logs in their proper location).
5. A fixed release schedule.
6. Support available from Cloudera with SLAs.
Our Future Plan.
Evaluating HBase Using AWS.
Constructing an HBase Cluster on Amazon EC2.
Doing Evaluation & Verification This Summer!
We Need Many More Hadoopers! Come With Us!
Need Your Help!
Thank You.