Data Science Concept by Raj Krishna Paul
![Page 1: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/1.jpg)
Data Science Concept
Raj Krishna Paul, B S Engg (USA), Team Lead, Verizon Data Service, India
Email: [email protected]
Subir Paul, B Tech, M Tech, Ph.D, Professor, Faculty of Engg, Jadavpur University, India
Email: [email protected]
![Page 2: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/2.jpg)
Data Science Visualization
![Page 3: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/3.jpg)
Why Data Science
• The mathematical relationship between an output and its several input parameters is often not known, e.g. stock market, health data, human activity, mobile activity
• The relationship is very complicated because several interrelated parameters are involved
• With the advent of Big Data, statistics and programming, we model a hypothesis, test it, and train it until the output is predicted with minimum error, developing a predictive relationship
• The larger the volume of data available, the higher the accuracy tends to be
• It is an emerging area of study, as Big Data is available in all spheres of science, engineering, economics and social affairs
![Page 4: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/4.jpg)
What it can Predict
• Stock market share prices by date and time, industry type, commodity, people, country and city
• People's behavior and buying trends for commodities, mobile data plan usage, investment
• Damage and loss due to natural calamities
• Relationship between bank products and types of people in different regions, countries and cities
• Life prediction of big structures in corrosive environments
![Page 5: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/5.jpg)
How does it help Big Industries
• Guides big entrepreneurs in planning and deciding which way to go, and on which products they can increase the price and still make a profit
• Identifies measures a government can take to reduce the loss of people and property due to natural calamities
• Helps develop high-strength, resistant future materials to prevent unpredictable failures of structures such as aeroplanes, bridges and ships
![Page 6: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/6.jpg)
Data Science: How it is done
• Collection of Big Data from the web and various data sources
• Data Integrity: manage missing data, duplicate data, out-of-date data and inconsistent data (e.g. multiple addresses for one person, negative salary), and convert date & time from character format to numeric
• Data Cleaning: missing files, smoothing data, filtering, sampling
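The integrity and cleaning steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the records, the field names (`id`, `salary`, `joined`) and the choice to fill missing salaries with the mean of the valid ones are all assumptions made for the example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw records; every field name and value is invented for illustration.
raw_records = [
    {"id": 1, "salary": 52000.0, "joined": "2015-03-01 09:30:00"},
    {"id": 1, "salary": 52000.0, "joined": "2015-03-01 09:30:00"},  # duplicate row
    {"id": 2, "salary": None, "joined": "2016-07-15 14:00:00"},     # missing salary
    {"id": 3, "salary": -100.0, "joined": "2017-01-20 08:45:00"},   # negative salary
    {"id": 4, "salary": 61000.0, "joined": "2018-11-05 17:10:00"},
]


def clean(records):
    """Return a cleaned copy of the records (assumed rules, see comments)."""
    # 1. Remove duplicates: keep the first record seen for each id.
    seen, unique = set(), []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(dict(rec))

    # 2. Inconsistent data: a negative salary is treated as missing.
    for rec in unique:
        if rec["salary"] is not None and rec["salary"] < 0:
            rec["salary"] = None

    # 3. Missing data: fill with the mean of the valid salaries (one possible choice).
    fill_value = mean(r["salary"] for r in unique if r["salary"] is not None)
    for rec in unique:
        if rec["salary"] is None:
            rec["salary"] = fill_value

    # 4. Date & time in character format converted to a numeric timestamp.
    for rec in unique:
        parsed = datetime.strptime(rec["joined"], "%Y-%m-%d %H:%M:%S")
        rec["joined_ts"] = parsed.timestamp()
    return unique


cleaned = clean(raw_records)
for rec in cleaned:
    print(rec["id"], rec["salary"], rec["joined_ts"])
```

Filling with the mean is the simplest option; dropping the affected rows or using the median are equally valid choices depending on the data.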
![Page 7: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/7.jpg)
“Big Data” Sources
Every:
• Click
• Ad impression
• Billing event
• Fast forward, pause, …
• Server request
• Transaction
• Network message
• Fault
• …
User Generated (Web & Mobile)
…..
Internet of Things / M2M
Health/Scientific Computing
It’s All Happening On-line
![Page 8: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/8.jpg)
• Make a subset of the Big Data that is important and of interest to the investigating party or firm
• Randomly select samples of data frames
• Apply statistical laws and equations to fit the scattered data to a known distribution, e.g. Normal or Poisson
• Make graphics and visual representations of the results to study and find linear or non-linear relationships
• Make a hypothesis relating input and output
• Test the hypothesis with data; if it fails, modify it
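As a rough illustration of the sampling, fitting and hypothesis-testing steps above, the sketch below draws a random sample, fits a Normal distribution to it and checks a simple hypothesis about the mean. The synthetic population, the sample size of 500, the hypothesised mean of 50 and the 1.96 threshold (about the 5% two-sided level for a large sample) are assumptions chosen for the example.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the subset of interest taken from a large data set.
population = [random.gauss(50.0, 5.0) for _ in range(10_000)]

# Randomly select a sample.
sample = random.sample(population, 500)

# Fit the scattered data to a known distribution (here: Normal).
fitted = NormalDist.from_samples(sample)

# Hypothesis: the underlying mean is 50 (an assumed value for illustration).
hypothesised_mean = 50.0
std_error = fitted.stdev / len(sample) ** 0.5
z_score = (fitted.mean - hypothesised_mean) / std_error

# If |z| exceeds the threshold the hypothesis fails and should be modified.
hypothesis_holds = abs(z_score) < 1.96

print(f"fitted mean={fitted.mean:.2f}, stdev={fitted.stdev:.2f}")
print(f"z={z_score:.2f}, hypothesis holds: {hypothesis_holds}")
```

A Poisson fit would follow the same pattern for count data, with the sample mean as the estimate of the rate parameter.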
![Page 9: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/9.jpg)
Statistical Tests
• t test, Chi-square tests, identity of samples
• Distribution: Normal, Binomial, Poisson
• Mean, Mode, Median, Variance, SD
• Correlation, Regression
• ANOVA/MANOVA: fit a model
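Several of the quantities listed above can be computed with the Python standard library alone. The following sketch uses small made-up data sets; the numbers, the variable names and the choice of Welch's form of the t statistic are assumptions for illustration, and the p-value lookup is left out.

```python
from statistics import mean, median, mode, stdev, variance

# Made-up measurements, for illustration only.
values = [2, 3, 3, 5, 7]
print(mean(values), median(values), mode(values), variance(values), stdev(values))

# Correlation and simple linear regression y = slope * x + intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx                # least-squares slope
intercept = my - slope * mx      # least-squares intercept
r = sxy / (sxx * syy) ** 0.5     # Pearson correlation coefficient
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")


def welch_t(a, b):
    """Two-sample t statistic that does not assume equal variances."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 3.9, 4.6, 4.2]
t_stat = welch_t(group_a, group_b)
print(f"t statistic={t_stat:.2f}")
```

In practice a statistics package would also report p-values and handle Chi-square tests and ANOVA/MANOVA; this sketch only shows what the basic quantities are.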
![Page 10: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/10.jpg)
Tools for Statistical Modeling
• Genetic Algorithm (GA): the inputs are the genes and the output is the chromosome, i.e. the combination of the best genes that produces a product
• Artificial Neural Network (ANN): a model is trained with, say, 60% of the data, tested with 20%, and used to predict the remaining 20%. Each time, the error between the prediction and the actual value is reduced by modifying the NN architecture, ideally until a global minimum is reached
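The 60% / 20% / 20% split and the idea of repeatedly reducing the prediction error can be sketched as follows. This is a deliberately simplified stand-in, not a full neural network: it trains a single linear neuron by gradient descent and adjusts only the weights, not the architecture. The synthetic data (y = 3x + 2 plus noise), the learning rate and the number of epochs are assumptions for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: y = 3x + 2 with a little noise.
data = []
for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + 2.0 + random.gauss(0.0, 0.05)))

# Split: 60% training, 20% testing (validation), 20% held out for prediction.
random.shuffle(data)
n = len(data)
train = data[: n * 60 // 100]
validate = data[n * 60 // 100 : n * 80 // 100]
holdout = data[n * 80 // 100 :]


def mse(w, b, rows):
    """Mean squared error between predictions w*x + b and actual values."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


# One linear neuron trained by gradient descent on the training set.
w, b = 0.0, 0.0
learning_rate = 0.1
for epoch in range(500):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}")
print(f"train error={mse(w, b, train):.5f}")
print(f"validation error={mse(w, b, validate):.5f}")
print(f"hold-out error={mse(w, b, holdout):.5f}")
```

For this single linear neuron the error surface has one minimum, so gradient descent reaches the global minimum. Multi-layer networks generally have many local minima, and reaching the global one is not guaranteed.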
![Page 11: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/11.jpg)
Data Science Programming
• R Programming
• Python Programming
• SAS Programming
![Page 12: Data science concept by Raj Krishna Paul](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cedba61a28abd4098b6467/html5/thumbnails/12.jpg)
Final Delivery
• Finally, a predictive model is delivered
• It correctly predicts an output of a commodity or product
• It helps the production and marketing units of a company take the right steps to carry the business forward with higher profits