© 2015 The MathWorks, Inc.
Becoming a Data-Centric Engineering Team
Emelie Andersson
A path for how your team can better work with and utilize data.
Data Science Maturity Levels
• Ad-hoc Individual Analysis
• Generally Useful Tools for Analysis
• Common Infrastructure; Tested and Documented
Diagram axes: "Overhead when asking a new question" and "Ease of scaling to more people"
Data Science Maturity Levels
• Goal is to be fast: reduce time to insight
Getting Started: Exploring a New Dataset
Missing Data: ismissing, rmmissing, fillmissing
Outliers: isoutlier, rmoutliers, filloutliers
Change Points: ischange
Noisy Data: smoothdata
and more…
https://www.mathworks.com/help/matlab/preprocessing-data.html
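As a rough illustration of how a few of these preprocessing functions fit together, here is a minimal sketch on a small synthetic signal. The data is made up for this example and is not from the presentation; the default outlier rule (more than three scaled median absolute deviations from the median) is used as-is.

```matlab
% Synthetic signal (made up for illustration): a linear ramp with
% one missing sample and one obvious spike.
x = (1:10)';
x(4) = NaN;      % missing value
x(8) = 100;      % outlier

missingIdx = ismissing(x);               % logical mask of missing entries
xFilled    = fillmissing(x, "linear");   % interpolate across the gap

outlierIdx = isoutlier(xFilled);                % default median-based detection
xClean     = filloutliers(xFilled, "linear");   % replace outliers by interpolation

xSmooth = smoothdata(xClean, "movmean", 3);     % 3-point moving average
```

In a real analysis the fill and detection methods would be chosen to suit the signal; linear interpolation is only a convenient default for a ramp like this one.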
Geographic plots: geoplot, geoscatter, geobubble, geodensityplot
https://www.mathworks.com/help/matlab/geographic-plots.html
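A minimal sketch of one of these geographic plotting functions follows. The coordinates and weights are invented for illustration (approximate city locations) and have nothing to do with the flight dataset discussed later.

```matlab
% Made-up sample points for illustration only (approximate city locations).
lat = [59.33 57.71 55.60];
lon = [18.07 11.97 13.00];
weight = [3 2 1];              % hypothetical per-site values

figure
s = geoscatter(lat, lon, 50 * weight, "filled");   % marker area scales with weight
geobasemap("darkwater")                            % basemap that ships with MATLAB
title("Example sites")
```

geoplot, geobubble, and geodensityplot take latitude and longitude inputs in a similar way, so the same data can be tried with each of them.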
Data Science Maturity Levels
• Explore and understand data
• Document analysis
• Tools will be re-used in next steps
Data Science Maturity Levels
• Apply to different datasets
• Functions/Scripts
• MATLAB Apps
• Trend: Work with BIG DATA
Overview of Flight Data
▪ 35 unique aircraft
▪ 180,000 unique flights
▪ 300 GB of data
▪ Source:
– NASA DASHlink: Sample Flight Data
– https://c3.nasa.gov/dashlink/projects/85/
Big Data Creates Opportunities
▪ Find rare events, then dive deeper
▪ Build and validate test scenarios that match real-world conditions
▪ Perform fleet-wide calculations
Big Data Requires New Tools
Built-In Datastores
General: datastore, spreadsheetDatastore, tabularTextDatastore, fileDatastore
Database: databaseDatastore
Image: imageDatastore, denoisingImageDatastore, randomPatchExtractionDatastore, pixelLabelDatastore, augmentedImageDatastore
Audio: audioDatastore
Predictive Maintenance: fileEnsembleDatastore, simulationEnsembleDatastore
Simulink: SimulationDatastore
Automotive: mdfDatastore
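To show the read-in-chunks pattern that these built-in datastores share, here is a minimal sketch using tabularTextDatastore. The two small CSV files, their names, and the `id` and `speed` columns are made up for this example; they stand in for a file collection too large to load at once.

```matlab
% Write two small CSV files to a temporary folder (hypothetical data).
folder = tempname;
mkdir(folder);
writetable(table((1:3)', [10; 20; 30], 'VariableNames', {'id', 'speed'}), ...
    fullfile(folder, 'part1.csv'));
writetable(table((4:5)', [40; 50], 'VariableNames', {'id', 'speed'}), ...
    fullfile(folder, 'part2.csv'));

ds = tabularTextDatastore(folder);   % one datastore over every file in the folder

% Process the data block by block, keeping only a running total in memory.
total = 0;
while hasdata(ds)
    chunk = read(ds);                % table holding the next block of rows
    total = total + sum(chunk.speed);
end
```

Calling reset(ds) rewinds the datastore so the same loop can be run again with a different calculation.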
Big Data Requires New Tools
▪ Customize a datastore to work with your dataset
▪ Gives you control over how data is loaded and formatted
▪ MATLAB subclass: “fill-in-the-blanks”
▪ Build a piece of infrastructure, then re-use it in your analyses
function [data,info] = read(ds)
...
end
function tf = hasdata(ds)
...
end
function reset(ds)
...
end
function p = progress(ds)
...
end
function data = readall(ds)
...
end
Custom Datastore
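One way the blanks above might be filled in is sketched below. This is not the flight-data datastore from the presentation: the class name `FlightFileDatastore`, its properties, and the one-CSV-file-per-read design are assumptions chosen to keep the example short. It relies on the default readall inherited from matlab.io.Datastore rather than overriding it.

```matlab
classdef FlightFileDatastore < matlab.io.Datastore
    % Hypothetical custom datastore: each read() returns one CSV file
    % as a table. Save as FlightFileDatastore.m on the MATLAB path.

    properties (Access = private)
        Files (:,1) string              % full paths of the files to read
        CurrentIndex (1,1) double = 0   % number of files already read
    end

    methods
        function ds = FlightFileDatastore(files)
            ds.Files = string(files(:));
            reset(ds);
        end

        function tf = hasdata(ds)
            % True while at least one file has not been read yet.
            tf = ds.CurrentIndex < numel(ds.Files);
        end

        function [data, info] = read(ds)
            % Load the next file and report where it came from.
            if ~hasdata(ds)
                error("FlightFileDatastore:noMoreData", ...
                    "No more data to read. Call reset to start over.");
            end
            ds.CurrentIndex = ds.CurrentIndex + 1;
            file = ds.Files(ds.CurrentIndex);
            data = readtable(file);
            info = struct("Filename", file, "FileIndex", ds.CurrentIndex);
        end

        function reset(ds)
            % Rewind to the first file.
            ds.CurrentIndex = 0;
        end
    end

    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of files read so far, between 0 and 1.
            if isempty(ds.Files)
                frac = 1;
            else
                frac = ds.CurrentIndex / numel(ds.Files);
            end
        end
    end
end
```

Once the class is saved, `ds = FlightFileDatastore({'a.csv', 'b.csv'})` (hypothetical file names) can be used in the same `while hasdata(ds)` loop as a built-in datastore.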
A Custom Datastore for Flight Data
Find Rare Events, then Dive Deeper
Perform Fleet-Wide Calculations
Data Science Maturity Levels
• Make it easy to navigate the data
• Re-use each time you analyze the dataset
Data Science Maturity Levels
• Collaborate: Work with others on a common code base
• Verify: Write well-tested software
• Share: Build tools for others
MATLAB Projects
Testing
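The slide content is not preserved in this transcript, so the following is only a sketch of what a small function-based test file could look like with MATLAB's unit testing framework. The file name `testPreprocessing` and the two test cases are made up for illustration; they check built-in preprocessing functions rather than any code from the presentation.

```matlab
function tests = testPreprocessing
% Function-based unit tests. Save as testPreprocessing.m and run with
%   results = runtests("testPreprocessing");
tests = functiontests(localfunctions);
end

function testLinearFillClosesGap(testCase)
% A single missing sample on a straight line should be restored.
actual = fillmissing([1 NaN 3], "linear");
verifyEqual(testCase, actual, [1 2 3], "AbsTol", 1e-12);
end

function testOutlierIsFlagged(testCase)
% One large spike among otherwise steady values should be the only outlier.
flags = isoutlier([1 2 3 4 5 6 7 100 9 10]);
verifyEqual(testCase, find(flags), 8);
end
```

Tests for a team's own analysis functions would follow the same shape: one local function per behavior, each taking `testCase` and calling a `verify...` qualification.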
Creating a Toolbox
Data Science Maturity Levels
• Scale out to a larger group of users
• Easier to maintain and share
What’s Next?
▪ Advanced Analytics and Machine Learning
▪ Build and Test Algorithms for Embedded Systems
▪ Deploy Apps and Analytics to Enterprise IT Systems
Takeaways
▪ MATLAB has many new tools to help you better work with and utilize your data
▪ Create tools for you / your team / your organization to explore and analyze data
▪ Increasing maturity with data science is a journey; we’re here to help