Big-Data Hadoop Tutorials - MindScripts Technologies, Pune

Post on 27-Jan-2015

Category:

Education

DESCRIPTION

MindScripts Technologies is one of the leading Big-Data Hadoop training institutes in Pune, providing a complete Big-Data Hadoop course with Cloudera certification.

TRANSCRIPT

Why do I need Hadoop?

Business analytics

Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.

Problem : Too much data

Big Data!!

Velocity

How fast data is being produced and how fast the data must be processed to meet demand.

Have a look through analytics lens!

Variability

Data flows can be highly inconsistent, with periodic peaks.

Is something big trending in the social media?

Difference in Variety and Variability

Megabytes, Gigabytes… Terabyte: To put it in some perspective, a Terabyte could hold about 300 hours of good quality video. A Terabyte could hold 1,000 copies of the Encyclopedia Britannica.

Petabyte: It could hold 500 billion pages of standard printed text.

Exabyte: It has been said that 5 Exabytes would be equal to all of the words ever spoken by mankind.

Human Generated Data and Machine Generated Data

Challenges of Big Data

Sheer size of Big Data.

Big Data is unstructured or semi-structured.

No point in just storing big data if we can't process it.

Hadoop enables a computing solution that is:

Scalable: New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.

Cost effective: Hadoop brings massively parallel computing to commodity servers.

Flexible: Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources.

Fault tolerant: When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.

Power of MapReduce

Course Content

Introduction

Hadoop: Basic Concepts
  What is Hadoop?
  The Hadoop Distributed File System
  How Hadoop MapReduce Works
  Anatomy of a Hadoop Cluster

Hadoop daemons
  Master Daemons
    Name node
    Job tracker
    Secondary name node
  Slave Daemons
    Data node
    Task tracker

HDFS (Hadoop Distributed File System)
  Blocks and Splits
    Input Splits
    HDFS Splits
  Data Replication
  Hadoop Rack Awareness
  Data high availability
  Data Integrity
  Cluster architecture and block placement
  Accessing HDFS
    Java Approach
    CLI Approach

Programming Practices
  Developing MapReduce Programs in
    Local Mode: Running without HDFS and MapReduce
    Pseudo-distributed Mode: Running all daemons in a single node
    Fully distributed Mode: Running daemons on dedicated nodes
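As a rough illustration of the blocks and data replication topics above, the sketch below estimates how many HDFS blocks a file occupies and how much raw cluster storage its replicas use. The 128 MB block size and replication factor of 3 are assumptions chosen for the example, not values stated in this course outline; real clusters set both in configuration. The function name hdfs_footprint is hypothetical.

```python
import math

# Assumed values for illustration only; actual clusters configure these.
BLOCK_SIZE_MB = 128      # assumed HDFS block size
REPLICATION_FACTOR = 3   # assumed replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the cluster.
    replicas = blocks * replication
    # A partial last block only occupies the space of the data it holds,
    # so raw storage is file size times replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

Under these assumptions a 1000 MB file is split into 8 blocks, stored as 24 block replicas, and uses 3000 MB of raw storage.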

Writing a MapReduce Program
  Examining a Sample MapReduce Program
  With several examples
  Basic API Concepts
  The Driver Code
  The Mapper
  The Reducer
  Hadoop's Streaming API
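To give a feel for the Mapper, the Reducer and the Streaming API listed above, here is a minimal word count sketch in the Streaming style: the mapper emits tab-separated key/value lines, and the reducer receives those lines sorted by key. In actual Streaming use, the mapper and reducer would each be a separate script reading sys.stdin and printing to stdout; here they are plain functions, and run_locally is a hypothetical stand-in for the framework's shuffle and sort step so the example runs without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Hypothetical local stand-in for shuffle/sort: sort mapper output, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_locally(["big data", "Big Hadoop"]):
        print(out)
```

The reducer relies on sorted input so that all lines for one word arrive together, which is why groupby is enough to collect them.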

Common MapReduce Algorithms
  Sorting and Searching
  Indexing
  Classification/Machine Learning
  Term Frequency - Inverse Document Frequency
  Word Co-Occurrence
  Hands-On Exercise: Creating an Inverted Index
  Identity Mapper
  Identity Reducer
  Exploring well known problems using MapReduce applications
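The inverted index exercise named above can be sketched in a few lines. This is an in-memory illustration of the idea, not the course's own solution: the map step emits (word, document id) pairs, and the reduce step collects the distinct document ids for each word. The function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit a (word, doc_id) pair for every word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def reduce_postings(word, doc_ids):
    """Reduce step: collapse all doc ids seen for a word into a sorted list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run map, group by word (the shuffle), then reduce, all in memory."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, emitted_id in map_document(doc_id, text):
            grouped[word].append(emitted_id)
    return dict(reduce_postings(word, ids) for word, ids in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample documents for illustration.
    docs = {"d1": "hadoop stores data", "d2": "hadoop processes data fast"}
    print(build_inverted_index(docs))
```

On a cluster the grouping would be done by the framework between the map and reduce phases; the dictionary here only imitates that step.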

Debugging MapReduce Programs
  Testing with MRUnit
  Logging
  Other Debugging Strategies

Advanced MapReduce Programming
  A Recap of the MapReduce Flow
  The Secondary Sort
  Customized Input Formats and Output Formats
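The secondary sort topic above is usually explained as a composite-key pattern: the value is promoted into the key so the framework's sort orders it, while grouping still happens on the original (natural) key. The sketch below imitates that pattern in memory under those assumptions; in a real job the same roles would be played by a custom partitioner, sort comparator and grouping comparator. The function name secondary_sort is hypothetical.

```python
from itertools import groupby


def secondary_sort(records):
    """Yield (natural_key, values) with values in ascending order per key.

    records: iterable of (natural_key, value) pairs.
    """
    # Map side: promote the value into a composite key (natural_key, value).
    composite = [((key, value), value) for key, value in records]
    # Sort step: order by the full composite key, so values end up sorted
    # within each natural key.
    composite.sort(key=lambda kv: kv[0])
    # Grouping step: group on the natural key only, so each reduce call
    # sees one key with its values already in order.
    for natural_key, group in groupby(composite, key=lambda kv: kv[0][0]):
        yield natural_key, [value for _, value in group]


if __name__ == "__main__":
    data = [("b", 3), ("a", 2), ("b", 1), ("a", 5)]
    for key, values in secondary_sort(data):
        print(key, values)
```

The point of the pattern is that the reducer never has to buffer and sort values itself; ordering is delegated to the sort that already happens between map and reduce.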

Hadoop Ecosystem

HBase
  HBase concepts
  HBase architecture
    Region server architecture
    File storage architecture
  HBase basics
    Column access
    Scans
  HBase use cases
  Install and configure HBase on a multi node cluster
  Create database, develop and run sample applications
  Access data stored in HBase using clients like Java, Python and Perl
  HBase and Hive Integration
  HBase admin tasks
  Defining Schema and basic operation

Hive
  Hive concepts
  Hive architecture
  Install and configure Hive on cluster
  Create database, access it from Java client
  Buckets
  Partitions
  Joins in Hive
    Inner joins
    Outer joins
  Hive UDF
  Hive UDAF
  Hive UDTF
  Develop and run sample applications in Java/Python to access Hive

Pig
  Pig basics
  Install and configure Pig on a cluster
  Pig vs MapReduce and SQL
  Pig vs Hive
  Write sample Pig Latin scripts
  Modes of running Pig
    Running in Grunt shell
    Programming in Eclipse
    Running as Java program
  Pig UDFs
  Pig Macros

Flume
  Flume concepts
  Install and configure Flume on cluster
  Create a sample application to capture logs from Apache using Flume

Sqoop
  Getting Sqoop
  A Sample Import
  Database Imports
  Controlling the import
  Imports and consistency
  Direct-mode imports
  Performing an Export

Contact Us

Address: MindScripts Technologies, 2nd Floor, Siddharth Hall, Near Ranka Jewellers, Behind HP Petrol Pump, Karve Rd, Pune 411004

Address: MindScripts Technologies, C8, 2nd Floor, Sant Tukaram Complex, Pradhikaran, Above Savali Hotel, Opp Nigdi Bus Stand, Nigdi, Pune - 411044

Call 9595957557 8805674210 9764560238 9767427924 9881371828

www.mindscripts.com

info@mindscripts.com