Big Data Hadoop Tutorials - MindScripts Technologies, Pune

Why do I need Hadoop?

Business analytics

Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.

Problem: Too much data

Big Data!!

Velocity

How fast data is being produced, and how fast it must be processed to meet demand.

Have a look through the analytics lens!

Variability

Data flows can be highly inconsistent, with periodic peaks.

Is something big trending on social media?

Difference between Variety and Variability

Megabytes, Gigabytes … Terabyte: To put it in perspective, a Terabyte could hold about 300 hours of good-quality video. A Terabyte could hold 1,000 copies of the Encyclopedia Britannica.

Petabyte: It could hold 500 billion pages of standard printed text.

Exabyte: It has been said that 5 Exabytes would be equal to all of the words ever spoken by mankind.

Human-Generated Data and Machine-Generated Data

Challenges of Big Data

Sheer size of Big Data.

Big Data is unstructured or semi-structured.

There is no point in just storing big data if we can't process it.

Hadoop enables a computing solution that is:

Scalable: New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.

Cost effective: Hadoop brings massively parallel computing to commodity servers.

Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.

Fault tolerant: When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.

Power of MapReduce
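To make the MapReduce idea concrete, here is a minimal in-memory sketch of its three phases (map, shuffle, reduce) applied to word counting. It is plain Python, not Hadoop code; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases in sequence over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "hadoop processes big data"])
```

Because each call to `map_phase` looks at one record only, and each call to `reduce_phase` looks at one key only, both steps can be spread across machines without coordination.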

Course Content

Introduction
  Hadoop: Basic Concepts
  What is Hadoop?
  The Hadoop Distributed File System
  How Hadoop MapReduce Works
  Anatomy of a Hadoop Cluster

Hadoop daemons
  Master Daemons
    NameNode
    JobTracker
    Secondary NameNode
  Slave Daemons
    DataNode
    TaskTracker

HDFS (Hadoop Distributed File System)

Blocks and Splits
  Input Splits
  HDFS Splits
Data Replication
  Hadoop Rack Awareness
  Data high availability
  Data Integrity
Cluster architecture and block placement
Accessing HDFS
  Java Approach
  CLI Approach
Programming Practices
  Developing MapReduce Programs in
    Local Mode: Running without HDFS and MapReduce
    Pseudo-distributed Mode: Running all daemons on a single node
    Fully distributed Mode: Running daemons on dedicated nodes
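As a rough illustration of how blocks and replication interact, the sketch below estimates how many blocks a file occupies and how much raw cluster storage it consumes. The 128 MB block size and replication factor of 3 are assumed values chosen for the example; both are configurable in HDFS and the defaults differ between Hadoop versions. The helper name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # assumed block size, configurable per cluster
REPLICATION_FACTOR = 3   # assumed replication factor, configurable per file

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # The file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different nodes.
    replicas = blocks * replication
    # A partial last block only uses the space it needs, so raw storage
    # is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_footprint(1024)  # a 1 GB file
```

Under these assumptions a 1 GB file becomes 8 blocks, stored as 24 block replicas that take 3 GB of raw disk space across the cluster.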

Writing a MapReduce Program
  Examining a Sample MapReduce Program
  With several examples
  Basic API Concepts
  The Driver Code
  The Mapper
  The Reducer
  Hadoop's Streaming API
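The Streaming API lets a mapper and reducer be written as ordinary programs that exchange tab-separated key/value lines. The sketch below shows that contract in Python for a hypothetical job that finds the maximum temperature per year. The input format (`year,temperature` per line) and the function names are assumptions made for illustration; in a real streaming job each function would read from standard input and print to standard output, and the framework would perform the sort between them.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'year<TAB>temperature' for each 'year,temperature' input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        year, temp = line.split(",")
        yield f"{year}\t{temp}"

def reducer(sorted_lines):
    """Input arrives sorted by key; emit the maximum temperature per year."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    # Consecutive lines with the same key belong to the same group.
    for year, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(temp) for _, temp in group)}"

sample = ["1950,22", "1950,31", "1951,18"]
mapped = sorted(mapper(sample))   # stands in for the framework's sort step
result = list(reducer(mapped))
```

The reducer relies on its input being sorted by key, which is why it can detect where one year ends and the next begins without holding all the data in memory.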

Common MapReduce Algorithms
  Sorting and Searching
  Indexing
  Classification/Machine Learning
  Term Frequency - Inverse Document Frequency
  Word Co-Occurrence
  Hands-On Exercise: Creating an Inverted Index
  Identity Mapper
  Identity Reducer
  Exploring well-known problems using MapReduce applications
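An inverted index maps each word to the documents that contain it. A minimal sketch of the idea in MapReduce style follows: the map step emits (word, document id) pairs, and the reduce step collects the ids for each word. This is an in-memory Python illustration, not the course's exercise solution; the names `map_document` and `build_inverted_index` and the sample documents are assumptions.

```python
from collections import defaultdict

def map_document(doc_id, text):
    """Map: emit (word, doc_id) once for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def build_inverted_index(documents):
    """Reduce: gather, for every word, the sorted list of documents using it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word, emitted_id in map_document(doc_id, text):
            postings[word].add(emitted_id)
    return {word: sorted(ids) for word, ids in postings.items()}

docs = {
    "doc1": "hadoop stores big data",
    "doc2": "pig and hive run on hadoop",
}
index = build_inverted_index(docs)
```

Looking up `index["hadoop"]` then returns both document ids, while a word that appears in only one document returns a single-element list.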

Debugging MapReduce Programs
  Testing with MRUnit
  Logging
  Other Debugging Strategies

Advanced MapReduce Programming
  A Recap of the MapReduce Flow
  The Secondary Sort
  Customized Input Formats and Output Formats
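The secondary sort pattern delivers the values for each key to the reducer in sorted order. The usual recipe is to build a composite key (natural key plus the value to sort on), partition and group on the natural key only, and sort on the full composite key. The sketch below simulates those three roles in plain Python; it is an assumption-laden illustration, with `partition` and `secondary_sort` as hypothetical names and a CRC32-based partitioner standing in for whatever a real job would use.

```python
import zlib
from itertools import groupby

def partition(natural_key, num_reducers):
    """Partition on the natural key only, so all its records reach one reducer."""
    return zlib.crc32(natural_key.encode("utf-8")) % num_reducers

def secondary_sort(records, num_reducers=2):
    """Return {natural_key: values in ascending order} using composite keys."""
    partitions = [[] for _ in range(num_reducers)]
    for natural_key, value in records:
        composite_key = (natural_key, value)   # key extended with the sort field
        target = partition(natural_key, num_reducers)
        partitions[target].append((composite_key, value))

    results = {}
    for part in partitions:
        # Sort comparator: order by the full composite key.
        part.sort(key=lambda kv: kv[0])
        # Grouping comparator: group by the natural key only.
        for natural_key, group in groupby(part, key=lambda kv: kv[0][0]):
            results[natural_key] = [value for _, value in group]
    return results

readings = [("1951", 18), ("1950", 31), ("1950", 22), ("1951", 25), ("1950", 27)]
ordered = secondary_sort(readings)
```

The sorting work is done during the shuffle rather than inside the reducer, which avoids buffering all values for a key in memory.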

Hadoop Ecosystem

HBase
  HBase concepts
  HBase architecture
  Region server architecture
  File storage architecture
  HBase basics
  Column access
  Scans
  HBase use cases
  Install and configure HBase on a multi-node cluster
  Create database, develop and run sample applications
  Access data stored in HBase using clients like Java, Python and Perl
  HBase and Hive Integration
  HBase admin tasks
  Defining Schema and basic operations

Hive
  Hive concepts
  Hive architecture
  Install and configure Hive on cluster
  Create database, access it from Java client
  Buckets
  Partitions
  Joins in Hive
    Inner joins
    Outer joins
  Hive UDF
  Hive UDAF
  Hive UDTF
  Develop and run sample applications in Java/Python to access Hive

Pig
  Pig basics
  Install and configure Pig on a cluster
  Pig vs MapReduce and SQL
  Pig vs Hive
  Write sample Pig Latin scripts
  Modes of running Pig
    Running in Grunt shell
    Programming in Eclipse
    Running as Java program
  Pig UDFs
  Pig Macros

Flume
  Flume concepts
  Install and configure Flume on cluster
  Create a sample application to capture logs from Apache using Flume

Sqoop
  Getting Sqoop
  A Sample Import
  Database Imports
  Controlling the import
  Imports and consistency
  Direct-mode imports
  Performing an Export

Contact Us

Address: MindScripts Technologies, 2nd Floor, Siddharth Hall, Near Ranka Jewellers, Behind HP Petrol Pump, Karve Rd, Pune 411004

Address: MindScripts Technologies, C8, 2nd Floor, Sant Tukaram Complex, Pradhikaran, Above Savali Hotel, Opp Nigdi Bus Stand, Nigdi, Pune - 411044

Call: 9595957557, 8805674210, 9764560238, 9767427924, 9881371828

www.mindscripts.com

info@mindscripts.com
