Streamflow - Programming Model for Data Streaming in Scientific Workflows
Chathura Herath

Outline
Background
Motivation
Approach
Architecture
Programming Model
Domain application

Background
Scientific workflows are a good programming model for scientific computing.
Scientific domains have high volumes of data.
Most of the data come from sensors, catalogs, and other experiments.
Most data sources are data streams or can be modeled as streams.

Motivation
Huge data sources require preprocessing, mining, and scaling down of data volumes.
Compute resources are limited relative to the scale of the data.
Currently, experts determine which data sets contain the interesting data.
Preserve the workflow programming model for the user. Users are familiar with DAG execution.
Define workflow patterns for use as new workflow semantics that can capture data streams.
Goal
◦ Real-time data mining, filtering, and preprocessing
◦ Data-driven reactive workflow systems
◦ Feedback systems

Data to Information
[Figure labels: Data Storage, Supercomputing, Information Rate, Data Rate]

Data to Information
[Figure labels: Data Storage, Supercomputing, Information Rate, Data Rate, Scientific workflow, Stream Mining]

Streamflow
[Figure labels: Data Storage, Supercomputing, Information Rate, Data Rate, Streamflow]

Why Workflow Streaming?
Most scientific workflows are static.
A considerable share of the data for scientific workflows is produced by scientific sensors.
Sensor data tend to behave as repeating data streams.
Is it possible to provide a programming abstraction to capture data search and filtration?

Possible approaches
Completely decoupled systems, where the workflows and the data mining are separate
◦ Data mining rules or queries would produce outputs that may get refined again and again.
◦ Some interesting event would launch the workflow.
◦ It may lose the insight and abstraction provided by the workflows.
◦ The data mining itself may have complex data and control dependencies.
Pure workflow approach
◦ Workflow languages are not designed for streaming.

Stream Integration Approach
Complex Event Processing (CEP) system
◦ Interacts with the streams
◦ Filters and bundles data
◦ Publishes input datasets to workflows
Workflow system
◦ Handles the scientific computations
◦ Gets invoked when a dataset of a specified nature is published to the CEP system
[Figure labels: Resources, Streamflow Semantics, StreamBase, Workflow, Streamflow Composer, Esper]
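
The division of labor on this slide can be illustrated with a minimal Python sketch. It is not the Streamflow implementation: `MiniCEP`, `forecast_workflow`, the `reflectivity` field, and the count-based bundling are all hypothetical stand-ins chosen for illustration. The toy engine filters events, bundles the survivors, and invokes every subscribed workflow once a bundle is complete.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]


class MiniCEP:
    """Toy stand-in for a CEP engine (hypothetical, not Esper or StreamBase)."""

    def __init__(self, predicate: Callable[[Event], bool], bundle_size: int) -> None:
        self.predicate = predicate
        self.bundle_size = bundle_size
        self.pending: List[Event] = []
        self.subscribers: List[Callable[[List[Event]], None]] = []

    def subscribe(self, workflow: Callable[[List[Event]], None]) -> None:
        """Register a workflow to be invoked with each published dataset."""
        self.subscribers.append(workflow)

    def on_event(self, event: Event) -> None:
        """Filter one event, bundle it, and publish when the bundle is full."""
        if not self.predicate(event):
            return  # uninteresting events never reach a workflow
        self.pending.append(event)
        if len(self.pending) == self.bundle_size:
            dataset, self.pending = self.pending, []
            for workflow in self.subscribers:
                workflow(dataset)


launched: List[float] = []


def forecast_workflow(dataset: List[Event]) -> None:
    """Placeholder for the scientific computation: record the peak value."""
    launched.append(max(e["reflectivity"] for e in dataset))


cep = MiniCEP(lambda e: e["reflectivity"] > 3.5, bundle_size=2)
cep.subscribe(forecast_workflow)
for value in [1.0, 4.0, 2.0, 5.0, 6.0]:
    cep.on_event({"reflectivity": value})
```

With this input the workflow runs once, on the bundle containing 4.0 and 5.0, while 6.0 stays pending until a second qualifying event arrives.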

STREAMing workFLOWS - Streamflows
Streamflows are an enhancement of workflows to handle data streams.
They allow complex experimental logic to be encapsulated using scientific workflows.
They allow large streams of data to be managed with stream mining.
They provide a programming model similar to workflow composition to handle streams.
[Figure labels: Workflow, Streamflow]

Stream Integration
Select * from DataminedRUCDATA(reflectivity > 3.5).win:time_batch(1h)
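
To make the intent of this query concrete, here is a rough Python simulation of a filtered time-batch window. It is a sketch under stated assumptions, not how the stream engine evaluates the query: the `(timestamp, reflectivity)` tuple layout and the function name `time_batch` are hypothetical, the window starts at the first event that passes the filter, empty windows are skipped, and the last partial batch is flushed when the input ends.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, reflectivity)


def time_batch(
    readings: Iterable[Reading], threshold: float, window_s: float
) -> Iterator[List[Reading]]:
    """Yield batches of readings above `threshold`, one batch per time window."""
    batch: List[Reading] = []
    window_end: Optional[float] = None
    for ts, reflectivity in readings:
        if reflectivity <= threshold:
            continue  # the filter runs before the window sees the event
        if window_end is None:
            window_end = ts + window_s  # window opens at the first kept event
        while ts >= window_end:
            if batch:
                yield batch
                batch = []
            window_end += window_s
        batch.append((ts, reflectivity))
    if batch:
        yield batch  # simplification: flush the partial batch at end of input


readings = [(0, 4.0), (600, 1.0), (1800, 5.0), (3600, 3.6), (4000, 2.0), (7300, 9.0)]
batches = list(time_batch(readings, threshold=3.5, window_s=3600))
```

Each yielded batch corresponds to the dataset that would be handed to a workflow for one hour of qualifying observations.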

Workflow Semantics
Conventional SOA components can be used as is.
Workflow components may change behavior based on the input data or stream.
Filter nodes change the "cardinality" of the output stream.
Aggregator nodes aggregate data over a window.
Generator nodes interface external streams to the Streamflow.
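
The three node types above can be sketched with Python generators. This is only an illustration of the semantics: the function names, the count-based window, and the choice of the mean as the aggregate are assumptions, not part of the Streamflow system.

```python
from typing import Callable, Iterable, Iterator, List


def generator_node(source: Iterable[float]) -> Iterator[float]:
    """Bring an external stream into the flow one event at a time."""
    for event in source:
        yield event


def filter_node(stream: Iterable[float], keep: Callable[[float], bool]) -> Iterator[float]:
    """Drop events that fail `keep`, so fewer events leave than enter."""
    for event in stream:
        if keep(event):
            yield event


def aggregator_node(stream: Iterable[float], window: int) -> Iterator[float]:
    """Emit the mean of every full window of `window` events (assumed aggregate)."""
    buffer: List[float] = []
    for event in stream:
        buffer.append(event)
        if len(buffer) == window:
            yield sum(buffer) / window
            buffer = []


external = [1.0, 4.0, 6.0, 2.0, 8.0, 10.0, 3.0]
pipeline = aggregator_node(
    filter_node(generator_node(external), lambda x: x > 3.5), window=2
)
result = list(pipeline)
```

Seven input events become four after the filter and two after the aggregator, which is the change in cardinality the slide refers to.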

Programming model
Join semantics
◦ Constant inputs need to be matched to streams.
Inputs are streamed into the workflow from the stream engine.
Outputs are published back by stream sinks and may be used for feedback.
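
One plausible reading of the join semantics is that each streamed event is paired with the same constant inputs before the workflow runs, and each result is handed to a sink. The sketch below assumes that reading; `run_streamflow`, `scale_workflow`, and the `scale` parameter are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Iterable, List


def run_streamflow(
    stream: Iterable[float],
    constants: Dict[str, float],
    workflow: Callable[[Dict[str, float]], float],
    sink: Callable[[float], None],
) -> None:
    """Join every streamed event with the constant inputs, then publish the output."""
    for event in stream:
        inputs = {**constants, "event": event}  # constants repeat for each event
        sink(workflow(inputs))


def scale_workflow(inputs: Dict[str, float]) -> float:
    """Placeholder computation that combines a streamed and a constant input."""
    return inputs["event"] * inputs["scale"]


published: List[float] = []
run_streamflow([1.0, 2.0, 3.0], {"scale": 10.0}, scale_workflow, published.append)
```

Here the sink is a plain list; in a feedback arrangement the sink would instead publish each output onto a stream that an upstream node consumes.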

Evaluation
Deployment Overhead
◦ Extra overhead is constant, Θ(1), as the workflow is flat.
◦ Extra overhead is comparable to normal workflow deployment, because new workflows may need to be deployed.
Runtime Latency
◦ Latency from an event arriving at the framework to its delivery to the workflow.

Domains
Meteorology
Astronomy
[Figure labels: On-Demand Grid Computing, Streaming Observations, Storms Forming, Forecast Model, Data Mining]

Related work
B. Biornstad. A workflow approach to stream processing. PhD thesis, Computer Science Department, ETH Zurich.
Y. Liu, N. Vijayakumar, and B. Plale. Stream processing in data-driven computational science. In Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, pages 160–167. IEEE Computer Society, Washington, DC, USA, 2006.
J. Buck, S. Ha, E. Lee, and D. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. International Journal of Computer Simulation, 4(2):155–182, 1994.
DataTurbine
Y. Cai et al. MAIDS: Mining Alarming Incidents from Data Streams. Automated Learning Group, NCSA, University of Illinois at Urbana-Champaign, USA.

Future work
Develop a formal model for the workflow semantics
Event order guarantees
How to handle missing streams
Provenance for data streams

Questions?