Data Science Training, Bangalore (ITPL, Whitefield), Aug/Sep 2015
On Saturday 22nd August 2015
From 10:00 AM to 12:00 PM
At InfoVision Solution India Pvt Ltd,
7th Floor, Discoverer, ITPL, Whitefield, Bangalore - 560066
Course Details
Duration: 100 Hours, One Month
Fees: INR 10,000/-
Location: InfoVision Solution India Pvt Ltd,
7th Floor, Discoverer, ITPL,
Whitefield, Bangalore - 560066
Schedule: 2-4 Hours per day, weekdays and weekend flexi hours
Outcome: Able to independently deliver data science projects
Certification: InfoVision Certified Data Scientist Level 1
Today’s competitive battleground is fueled by information.
Big Data solutions are designed to capture, process, store, and analyze data so that the right person gets the right information at the right time.
What is Big Data?
Big Data refers to datasets whose volume, velocity, variety, and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them. Big Data is a combination of different types of data:
• Unstructured: data communicated every day by email, phone, text, tweet, and video
• Semistructured: data generated by machines, for example, log files
• Structured: data traditionally stored in databases, such as account information and credit card transactions
The challenge of Big Data is to efficiently and effectively capture, store, manage, and analyze 100 percent of the data to drive business insight and timely decisions.
What is Apache Hadoop?
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
What is R Programming?
R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
What is Pentaho?
The Pentaho BI Project is an ongoing effort by the Open Source community to provide organizations with best-in-class solutions for their enterprise Business Intelligence (BI) needs.
Course Contents
Statistics and Business Analytics
40 Hours
Introduction to R Programming – RStudio, Shiny
Introduction and Data Analytics
Statistics – Mean, Mode, Median, Standard Deviation
Introduction to WEKA
Classification, Association Rules, Regression Analysis, Cluster Analysis
Algorithms - K-means, TwoStep, Kohonen net, Apriori and GRI
Decision Tree and Clustering
Projects: Patient Analytics, Automobile Analytics, Football Analytics, Stock Market Analytics
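The descriptive statistics named in this module (mean, mode, median, standard deviation) can be illustrated with a short sketch. The course itself works in R; Python is used here only for illustration, the `describe` helper is a hypothetical name, and the sample values are made up. The standard deviation shown is the population form, which is an assumption, since the outline does not say which form is taught.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Population standard deviation (divides by N, not N - 1).
        "std_dev": statistics.pstdev(values),
    }


# Made-up sample values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary)
```

For a sample drawn from a larger population, `statistics.stdev` (which divides by N - 1) would be the usual substitute.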
Data Visualization
20 Hours
Introduction
Pentaho – Installation
Pentaho Report Designer (PRD)
PostgreSQL connection to Pentaho Tools
Pentaho Data Integration
Saiku Analytics in Pentaho
Big Data & Hadoop
Cloudera 5.3 Installation
20 Hours
Introduction to Hadoop Distributed File System (HDFS)
Understanding MapReduce Basics
Sqoop / ZooKeeper
HBase
Pig
Hive
Flume
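The MapReduce basics listed in this module can be sketched as a word count, the usual introductory example. This is a single-process illustration in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the input lines are made up. On a real cluster, the map and reduce steps would run in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Made-up input, for illustration only.
lines = ["big data big insight", "data drives decisions"]
counts = word_count(lines)
print(counts)
```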
Case Studies & Projects
20 Hours
Case Studies
Supply Chain Optimization
Genome Analysis
Project Work