about this training - s3-ap-southeast · pdf filewith writing real scripts on a hadoop system...
TRANSCRIPT
ABOUTTHISTRAINING:
TheworldofHadoopand“BigData"canbeintimidating-hundredsofdifferenttechnologieswithcrypticnamesformtheHadoopecosystem.
Thiscomprehensivetraininghasbeendesignedbyindustryexpertsconsideringcurrentindustryjobrequirementstoprovidein-depthlearningonbigdataandHadoopModules.
ThisindustryorientedprogramisacombinationofthetrainingcoursesinHadoopdeveloper,Hadoopadministrator,Hadooptesting,andanalytics.
ThisHadooptrainingwillalsoprepareyouforthe“BigDataCertificationofCloudera-CCPandCCA”.
Withthiscourse,you'llnotonlyunderstandwhatthosesystemsareandhowtheyfittogether-butyou'llgohands-onandlearnhowtousethemtosolverealbusinessproblems!
-Soit'snotjusttheory…
It'sfilledwithhands-onactivitiesandexercisesbackedwithplentydiscussionsofusecasesofdifferentclientsacrossdomains.Webelievethiswillhelpyoutobuildyourconfidenceonthetechnologyanditsusage.
You'llfindarangeofactivitiesinthiscourseforpeopleateverylevel.Ifyou'reaprojectmanagerwhojustwantstolearnthebuzzwords,therearewebUI'sformanyoftheactivitiesinthecoursethatrequirenoprogrammingknowledge.Ifyou'recomfortablewithcommandlines,
we'llshowyouhowtoworkwiththemtoo.Andifyou'reaprogrammer,we'llchallengeyouwithwritingrealscriptsonaHadoopsystemusingjava,Scala,PigLatin,andPython.
YOURTAKEAWAYFROMTRAINING:
You'llwalkawayfromthiscoursewithareal,deepunderstandingofHadoopanditsassociateddistributedsystems,andyoucanapplyHadooptoreal-worldproblems.
Plusavaluablecompletioncertificateiswaitingforyouattheend!
WHATYOUWILLLEARNINTHISBIGDATAHADOOPONLINETRAININGCOURSE?
1. DetailedunderstandingofBigDataanalytics2. MasterfundamentalsofHadoop2.8andYARNfordesigningdistributedsystemsthatmanages
"bigdata"usingHadoopandrelatedtechnologiesforstoringandanalyzingdataatscale.3. Understand the architecture of HDFS and MapReduce for parallel storage and parallel
processing.4. Understand the configuration choices you shouldmake for stability, reliability andoptimized
taskschedulingonyourdistributedsystem.5. AnalyzerelationaldatausingHiveandMySQL(ConnectingHadooptoOtherDBs)6. Analyzenon-relationaldatausingHBase7. UsePigandSparktocreatescriptstoprocessdataonaHadoopclusterinmorecomplexways.8. UnderstandhowHadoopclustersaremanagedbyYARN,ZookeeperandHue.9. KnowhowtoscheduleyourHadoopjobsusingOozie.10. Collectdatafromavarietyofsources toyourHadoopclusterusingSqoopandFlume11. MasterHadoopadministrationactivitieslikeclustermanaging,monitoring,administrationand
troubleshooting12. LearntestingHadoopapplicationsusingMRUnitandotherautomationtools.13. KnowbasicsofSpark,SparkRDD,Graphx,MLlibandwritingSparkapplications14. Practicereal-lifeprojectsusingHadoopandApacheSpark15. Discussiononindustryusecasesofdifferentclientsacrossdomains16. BeequippedtoclearBigDataHadoopCertification.(CCPandCCA)
RECOMMENDEDSKILLSPRIORTOTAKINGTHISCOURSE
There is no pre-requisite to take this Big data training and tomaster Hadoop. But basics ofUNIX,SQLandjava/Pythonwouldbegood.AtGraduIT,weprovidecomplimentarySelf-PacedunixandJavacoursewithourBigDataHadooptrainingtobrush-uptherequiredskillssothatyouaregoodonyourHadooplearningpath.
WHOSHOULDTAKETHISBIGDATAHADOOPTRAININGCOURSE?
1. ProgrammingDevelopersandSystemAdministrators2. Experiencedworkingprofessionals,Projectmanagers3. Big Data Hadoop Developers eager to learn other verticals like Testing, Analytics,
Administration4. BusinessIntelligence,DatawarehousingandAnalyticsProfessionals5. Graduates,undergraduateseagertolearnthelatestBigDatatechnologycantakethisBigData
HadoopCertificationonlinetraining
BITONHADOOP:
HadoopenablestoBUILDANINSIGHT-DRIVENBUSINESS.
Tobespecific,Hadoopisanopen-sourcesoftwareframeworkforstoringdataandrunningapplicationsonclustersofcommodityhardware.Itprovidesmassivestorageforanykindofdata,enormousprocessingpowerandtheabilitytohandlevirtuallylimitlessconcurrenttasksorjobs.
AlmosteverylargecompanyyoumightwanttoworkatusesHadoopinsomeway,includingAmazon,Ebay,Facebook,Google,LinkedIn,IBM,Spotify,Twitter,andYahoo!Andit'snotjusttechnologycompaniesthatneedHadoop;eventheNewYorkTimesusesHadoopforprocessingimages.
CURRICULUM:
Module1:UnderstandingBigDataandHadoop
• IntroductiontoBigDataAndHadoop• DiscussiononBigDataanditsSourcesandChallengesrelatedtoit• UnderstandingattributesofBigDataanddifferentdatavarieties• DiscussingUsesCaseson“OpportunityforBusiness”inBigData• DiscussionondifferentsolutionsforproblemsrelatedtoBigData• ComparisonofHadoopvstraditionalsystemsSolutions• UnderstandingCompleteSolutionarchitecturefromDataAcquisitiontoDataAnalysis
forBusiness• DiscussionontechnologiesgettingusedforEndtoEndsolutionforBigDataAnalysis
drivenbusiness• SessiononPYTHON,UNIXandJAVA(Self-pacedlearningvideos)
Module2:UnderstandingHadoopfromThousandfeetView
• BitonhistoryofHadoopanditsevolution• Understandingparallelprocessingandparallelstoragearchitecture• RelatingMAPREDUCEandHDFSArchitecturetoabove• OverviewofalltechnologystacksrelatedtoHadoopEcosystem.• DiscussiononDifferentDistributionsofHadoop,DifferentvendorsofHadoop• Installationandset-upofHadoopCluster(Clouderapreferably)
Module3:UnderstandingParallelStoragesolutionwithHDFS
• UnderstandingHDFSArchitectureindetail• UnderstandingHadoopMaster-SlaveArchitecture• UnderstandingNameNode,DataNode,SecondaryNameNode• DiscussiononBlocksandDataReplication• LearningcommonHDFScommandsandpracticingthem• DiscussiononTypicalProductionClusteranditsconfigurationsforstability,reliability
andOptimization• UnderstandingAnatomyoffileRead,WriteoperationsinHDFS• DiscussiononHadoop2.xClusterArchitecture-FederationandHighAvailability
Module4:UnderstandingParallelProcesssolutionwithMapReduce
• UnderstandingHadoop2.xMapReduceArchitecture(YARNArchitecture)• UnderstandingHadoop2.xMapReduceComponents• DiscussiononYARNMRApplicationExecutionFlow• DiscussiononAnatomyofMapReduceProgram–WorkFlow• DiscussiononbasicMapReduceAPIConcepts• WritingMapReduceDriver,Mappers,andReducersusingJAVA• DemoonMapReduceprogramexecution• UnderstandingInputSplitsanditsrelationshipwithHDFSBlocks• DiscussonAdvancedConcepts:
o DistributedCacheo CombinerandPartitionero Counterso MapsideandReducesideJoinso UseofCompressiontechniques(Snappy,LZOandZip)o AdvancedDatatypesinMapReduce(WritableandWritableComparable)
• Hands-onexercisesonMapReduceProgramExecution• DemoonTelecomDataset
Module 5: Learn to analyze relational data using Hive
• UnderstandingofHiveFramework&Components• DiscussiononrelationshipbetweenHiveandMapReduce(WhentochooseWhat)• Understandingandhandson
o HiveDataTypeso HiveDDL–Create/Show/DropDataBaseandTableso HiveDML–LoadFiles&InsertData o HiveSQL-Select,Filter,Join,GroupBy
• UnderstandingofInternalandExternalTables• UnderstandingofProgrammingstructureinUDF,PartitionsandBuckets• DiscussiononLimitationsofHive.• DiscussiononHIVEonSPARK.
Module6:LearnDataFlowETLScriptingLanguage“Pig”
• Introduction to Apache Pig• DiscussiononrelationshipbetweenHive,MapReduceandPig• Grunt• ShellandUtilitycomponents • Different data types in Pig
• ProgrammingStructureinPig • Modes Of Execution in Pig • Experiencing Pig Script by writing Evaluation, Filter, Load and Store functions • UDFs in Pig • Understand Integration of HBASE with Pig (After Completion of HBASE Module)
Module7:Learn to analyze non-relational data using HBase
• IntroductiontoNoSQLDatabases• UnderstandingHBasenon-relational distributedDatabaseanditsarchitecture• When/WhytouseHBase• UnderstandingHBaseClientAPI• KnowhowtoloadDataintoHBase• KnowhowtoqueryDatafromHBase• DiscussiononrelationshipbetweenHive,MapReduce,PigandHBase• Overviewofothernon-relationallikeCassandra,andMongoDBanditsadvantagesover
RDBMSandCareerpathinNOSQLDBs.
Module8:LearnhowtoCollectdatafromavarietyofsourcestoyourHadoopclusterusingSqoop,Flume
• UnderstandWhyandwhatisSQOOP• UnderstandSQOOPArchitecture• LearnhowtoImportingDataUsingSQOOPtoHDFS/HIVE/HBaseFromRDBMS• UnderstandWhyandwhatisFlume• UnderstandFlumeArchitecture• Learnhowtoefficientlycollect,aggregate,andmovelargeamountsofstreamingdataintothe
HadoopDistributedFileSystem(HDFS)
Module9:LearnhowtoscheduleHadoopjobsusingOozie
• IntroductiontoWorkFlowManagementandOozie• LearnOozieComponentsAndWorkflow• OzzieCommands• ExperienceschedulingjobsusingOozie
Module10:DiscussiononEmergingTechnologiesinBigDataLandscape
• Discussionon:o EmergingTrendsInBigDataLandscapeo IntroductiontoSPARK–InMemoryParallelProcessingFrameworkforReal
Time/NearRealTime(NRT)operations
• BestpracticesforHadoopDeveloper• DiscussiononInterviewpreparation