Introduction to Apache Hadoop


Uploaded by semtech-solutions-ltd on 20 Aug 2015.


TRANSCRIPT

  1. Apache Hadoop
     - What is it?
     - Architecture
     - Related projects
     - Large users
  2. Hadoop: What Is It?
     - An open-source system written in Java
     - Supports very large data sets
     - Supports large clusters of servers
     - Designed to run on existing low-cost hardware
     - Distributes processing work across the cluster
     - Distributes data storage across the cluster
     - Provides resilience through automatic failure handling
  3. Hadoop Architecture
     Hadoop consists of:
     - Hadoop Common: common utilities that support the other Hadoop modules
     - Hadoop MapReduce: parallel processing of Hadoop data
     - Hadoop YARN: job scheduler and cluster resource manager
     - Hadoop Distributed File System (HDFS): a master/slave file system that spreads Hadoop data over a very large cluster of slave data nodes (DataNodes), controlled by a single master name node (NameNode)
  4. Hadoop Related Projects
  5. Hadoop Related Projects
     - Pig: a platform for analysing large data sets
     - Hive: a data warehouse system for Hadoop
     - Mahout: machine learning and data mining
     - Avro: a data serialization system
     - ZooKeeper: a coordination service that helps build distributed applications
     - Chukwa: data collection and analysis
  6. Hadoop Related Projects
     - Hue: a Hadoop user interface
     - Oozie: a workflow scheduler
     - Hama: a bulk synchronous parallel (BSP) framework for massive scientific computations
     - Nutch: a web crawler
     - HBase: a non-relational database
  7. Hadoop Large Users
     - Yahoo: a 10,000-core Linux cluster
     - Facebook: 100 petabytes, growing by 0.5 petabytes a day
     - Amazon: Hadoop can be run on Amazon's EC2 and S3
  8. Contact Us
     Feel free to contact us at www.semtech-solutions.co.nz.
     - We offer IT project consultancy.
     - We are happy to hear about your problems.
     - You pay only for the hours you need to solve them.
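Slide 2 says Hadoop spreads storage across the cluster and handles failures automatically, and slide 3 says HDFS has a single name node controlling many data nodes. The following is a minimal Python sketch of that idea, not Hadoop code: a file is split into fixed-size blocks, each block is copied to several data nodes, and when a node fails its blocks are re-copied from surviving replicas. All names (`ToyNameNode`, `split_into_blocks`, `dn1`…) are hypothetical. The 4-byte block size is only for the demo (HDFS defaults to 128 MB), a replication factor of 3 matches the HDFS default, and the round-robin placement is a deliberate simplification of HDFS's rack-aware policy.

```python
BLOCK_SIZE = 4    # bytes per block in this toy; HDFS defaults to 128 MB
REPLICATION = 3   # HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Hypothetical stand-in for a name node: records which data nodes hold each block."""

    def __init__(self, datanodes: list[str], replication: int = REPLICATION):
        self.datanodes = list(datanodes)
        self.replication = min(replication, len(datanodes))
        self.storage: dict[str, dict[int, bytes]] = {dn: {} for dn in datanodes}
        self.block_map: dict[int, list[str]] = {}
        self.num_blocks = 0

    def write(self, data: bytes) -> None:
        blocks = split_into_blocks(data)
        self.num_blocks = len(blocks)
        for block_id, block in enumerate(blocks):
            # Simplified round-robin placement (real HDFS is rack-aware).
            targets = [self.datanodes[(block_id + k) % len(self.datanodes)]
                       for k in range(self.replication)]
            for dn in targets:
                self.storage[dn][block_id] = block
            self.block_map[block_id] = targets

    def fail(self, datanode: str) -> None:
        """Simulate losing a data node, then re-replicate its blocks elsewhere."""
        self.datanodes.remove(datanode)
        del self.storage[datanode]
        for block_id, holders in self.block_map.items():
            if datanode not in holders:
                continue
            holders.remove(datanode)
            if not holders:
                raise RuntimeError(f"block {block_id} lost: no replicas left")
            # Copy from a surviving replica to a node that lacks this block.
            spare = next((dn for dn in self.datanodes if dn not in holders), None)
            if spare is not None:
                self.storage[spare][block_id] = self.storage[holders[0]][block_id]
                holders.append(spare)

    def read(self) -> bytes:
        """Reassemble the file from the first available replica of each block."""
        return b"".join(self.storage[self.block_map[b][0]][b]
                        for b in range(self.num_blocks))


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4", "dn5"])
nn.write(b"hello hadoop world")
nn.fail("dn2")
print(nn.read())  # b'hello hadoop world'
```

Because every block has three copies on different nodes, losing `dn2` costs no data, and re-replication brings each block back to three copies on the remaining nodes.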
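Slide 3 lists MapReduce as Hadoop's parallel processing layer. The sketch below shows the MapReduce pattern itself in plain Python, using the classic word-count task: a map phase that emits `(word, 1)` pairs per input split, a shuffle phase that groups values by key, and a reduce phase that sums each group. It is not the Hadoop API (real Hadoop jobs are typically written as Java `Mapper` and `Reducer` classes), and the function names are hypothetical. A thread pool stands in for map tasks running on different nodes; Python threads illustrate the structure rather than giving true CPU parallelism.

```python
from collections import defaultdict
from collections.abc import Iterable
from concurrent.futures import ThreadPoolExecutor


def map_words(split: str) -> list[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group all values emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: sum the counts for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # One map task per input split, run concurrently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, splits))
    all_pairs = (pair for part in mapped for pair in part)
    grouped = shuffle(all_pairs)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


splits = ["Hadoop stores data", "Hadoop processes data in parallel"]
print(word_count(splits))
# {'data': 2, 'hadoop': 2, 'in': 1, 'parallel': 1, 'processes': 1, 'stores': 1}
```

Each map task only sees its own split, so map tasks can run wherever the data lives; only the shuffle needs to bring matching keys together before reduction.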