Data Science Concept by Raj Krishna Paul

Data Science Concept. Raj Krishna Paul, B S Engg (USA), Team Lead, Verizon Data Service, India. Email: [email protected]. Subir Paul, B Tech, M Tech, Ph.D, Professor, Faculty of Engg, Jadavpur University, India. Email: [email protected]

Upload: subir-paul

Post on 20-Mar-2017


TRANSCRIPT

Page 1: Data science concept by Raj Krishna Paul

Data Science Concept

Raj Krishna Paul, B S Engg (USA), Team Lead, Verizon Data Service, India

Email: [email protected]

Subir Paul, B Tech, M Tech, Ph.D, Professor, Faculty of Engg, Jadavpur University, India

Email: [email protected]

Page 2: Data science concept by Raj Krishna Paul

Data Science Visualization

Page 3: Data science concept by Raj Krishna Paul

Why Data Science

• The mathematical relationship between an output and its several input parameters is often unknown, e.g. in stock markets, health data, human activity and mobile activity

• The relationship is very complicated because several interrelated parameters are involved

• With the advent of Big Data, statistics and programming, we form a hypothesis, test it, and train the model until the output is predicted with minimum error, which yields a predictive relationship

• As a rule, the larger the volume of available data, the higher the accuracy

• It is an emerging area of study, as Big Data is now available in all spheres of science, engineering, economics and social affairs

Page 4: Data science concept by Raj Krishna Paul

What it can Predict

• Stock market share prices by date and time, industry type, commodity, people, country and city

• People's behavior and buying trends for commodities, mobile data plan usage and investment

• Damage and loss due to natural calamities

• The relationship between bank products and types of people in different regions, countries and cities

• Life prediction of big structures in corrosive environments

Page 5: Data science concept by Raj Krishna Paul

How does it help Big Industries

• Guides big entrepreneurs in planning and deciding which way to go, and on which products they can raise prices and still make a profit

• Indicates measures a government can take to reduce the loss of life and property due to natural calamities

• Supports the development of future high-strength, resistant materials to prevent unpredictable failures of structures such as aeroplanes, bridges and ships

Page 6: Data science concept by Raj Krishna Paul

Data Science: How it is done

• Collection of Big Data from the web and various other data sources

• Data integrity: manage missing data, duplicate data, out-of-date data and inconsistent data (e.g. multiple addresses for one person, negative salary), and convert dates and times from character format to numeric

• Data cleaning: handling missing files, smoothing data, filtering, sampling
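The integrity checks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`name`, `salary`, `joined`), the date format and the choice of the median as the fill value are all hypothetical and do not come from the slides.

```python
from datetime import datetime, timezone
from statistics import median

# Hypothetical raw records; names and values are invented for illustration.
raw_records = [
    {"name": "A", "salary": 52000.0, "joined": "2016-03-01 09:30"},
    {"name": "A", "salary": 52000.0, "joined": "2016-03-01 09:30"},  # duplicate
    {"name": "B", "salary": -100.0, "joined": "2015-11-15 14:00"},   # negative salary
    {"name": "C", "salary": None, "joined": "2017-01-20 08:45"},     # missing value
    {"name": "D", "salary": 61000.0, "joined": "2014-07-09 17:10"},
]


def clean(records):
    """Return de-duplicated records with repaired salaries and numeric timestamps."""
    # Drop exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for rec in records:
        key = (rec["name"], rec["salary"], rec["joined"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # Treat a negative salary as inconsistent, i.e. as a missing value.
    for rec in unique:
        if rec["salary"] is not None and rec["salary"] < 0:
            rec["salary"] = None

    # Fill missing salaries with the median of the valid ones (an assumed policy).
    valid = [rec["salary"] for rec in unique if rec["salary"] is not None]
    fill_value = median(valid)
    for rec in unique:
        if rec["salary"] is None:
            rec["salary"] = fill_value

    # Convert date-time strings to numeric seconds since the epoch (UTC assumed).
    for rec in unique:
        parsed = datetime.strptime(rec["joined"], "%Y-%m-%d %H:%M")
        rec["joined"] = parsed.replace(tzinfo=timezone.utc).timestamp()
    return unique


cleaned = clean(raw_records)
```

Whether a bad value should be filled, flagged or dropped depends on the data set; the median fill here is just one common option.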

Page 7: Data science concept by Raj Krishna Paul

“Big Data” Sources

Every: click, ad impression, billing event, fast forward, pause, …, server request, transaction, network message, fault, …

User Generated (Web & Mobile)

…..

Internet of Things / M2M

Health/Scientific Computing

It’s All Happening On-line

Page 8: Data science concept by Raj Krishna Paul

• Make a subset of the Big Data that is important to, and of interest for investigation by, the party or firm

• Randomly select a sample of data frames

• Apply statistical laws and equations to fit the scattered data to a known distribution, e.g. Normal or Poisson

• Make graphics and visual representations of the results to study and find linear or non-linear relationships

• Make a hypothesis relating input and output

• Test the hypothesis with data; if it fails, modify it
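The sample, fit and test steps above might look like the following minimal sketch. The data are synthetic, and the hypothesised mean of 50 and the 1.96 cut-off (roughly a 5% two-sided level) are assumptions chosen for illustration only.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "big" data set: 100,000 synthetic measurements.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]

# Randomly select a sample from the larger data set.
sample = random.sample(population, 1_000)

# Fit a Normal distribution to the sample (estimates its mean and standard deviation).
fitted = NormalDist.from_samples(sample)

# Hypothesis: the population mean equals 50.
hypothesised_mean = 50.0
standard_error = fitted.stdev / len(sample) ** 0.5
z_score = (fitted.mean - hypothesised_mean) / standard_error

# Reject the hypothesis if the sample mean is too many standard errors away.
reject_hypothesis = abs(z_score) > 1.96

print(f"fitted mean={fitted.mean:.2f}, sd={fitted.stdev:.2f}, z={z_score:.2f}")
print("modify the hypothesis" if reject_hypothesis else "hypothesis retained")
```

If the hypothesis is rejected, it is modified and tested again, as the last bullet describes.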

Page 9: Data science concept by Raj Krishna Paul

Statistical Tests

• t test, Chi-square tests, identity of samples

• Distribution: Normal, Binomial, Poisson

• Mean, Mode, Median, Variance, SD

• Correlation, Regression

• ANOVA/MANOVA: fit a model
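A few of these quantities can be computed directly with Python's standard library. The sketch below uses small made-up data sets. The helper names (`pearson_r`, `fit_line`, `welch_t`) are hypothetical, and the t statistic shown is Welch's unequal-variance form, chosen here as one common variant.

```python
from statistics import mean, median, mode, stdev, variance

# Hypothetical data, invented for illustration.
scores = [3, 5, 5, 6, 8]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.5, 5.7, 5.3]

# Descriptive statistics: mean, mode, median, variance, standard deviation.
summary = {
    "mean": mean(scores),
    "mode": mode(scores),
    "median": median(scores),
    "variance": variance(scores),
    "sd": stdev(scores),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares regression line; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return slope, my - slope * mx


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
t_stat = welch_t(group_a, group_b)
```

A t statistic far from zero suggests the two groups differ; turning it into a p-value needs the t distribution, which the standard library does not provide.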

Page 10: Data science concept by Raj Krishna Paul

Tools for Statistical Modeling

• Genetic Algorithm (GA): the input parameters are the genes and each candidate solution is a chromosome; the output is the chromosome that combines the best genes to produce a product

• Artificial Neural Network (ANN): a model is trained with, say, 60% of the data, tested with 20% and used to predict the remaining 20%. At each iteration the error between the predicted and actual values is reduced by modifying the NN weights and architecture until a minimum, ideally the global minimum, is reached
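The 60/20/20 split and the error-reduction loop can be illustrated with a deliberately tiny example. It is a sketch under strong assumptions: a single linear neuron stands in for a full network, only the weights (not the architecture) are adjusted, and the data, learning rate and epoch count are all invented for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data following y = 2x + 1.
data = [(i / 100.0, 2.0 * (i / 100.0) + 1.0) for i in range(100)]
random.shuffle(data)

# Split: 60% for training, 20% for testing, 20% held out for prediction.
n = len(data)
train_set = data[: n * 60 // 100]
test_set = data[n * 60 // 100 : n * 80 // 100]
predict_set = data[n * 80 // 100 :]


def mean_squared_error(samples, w, b):
    """Average squared gap between the neuron's output and the actual value."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


# One linear neuron: output = w * x + b, trained by batch gradient descent.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in train_set) / len(train_set)
    grad_b = sum(2.0 * (w * x + b - y) for x, y in train_set) / len(train_set)
    w -= learning_rate * grad_w  # each step moves the weights downhill on the error
    b -= learning_rate * grad_b

test_error = mean_squared_error(test_set, w, b)
predictions = [w * x + b for x, _ in predict_set]
print(f"w={w:.3f}, b={b:.3f}, test error={test_error:.2e}")
```

For this convex toy problem gradient descent reaches the global minimum. A real multi-layer network may settle in a local minimum, which is one reason the architecture is also varied.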

Page 11: Data science concept by Raj Krishna Paul

Data Science Programming

• R Programming

• Python Programming

• SAS Programming

Page 12: Data science concept by Raj Krishna Paul

Final Delivery

• Finally, a predictive model is delivered

• It accurately predicts the output for a commodity or product

• It helps the production and marketing units of a company take the right steps to carry the business forward with higher profits