apachecon big data 2015 - stock prediction.key
TRANSCRIPT
1 Pivotal Confidential–Internal Use Only
William Markito (@william_markito)
Fred Melo (@fredmelo_br)
Implementing a highly scalable stock prediction system with Apache Geode (incubating), Spring XD and Spark MLlib
About us
Fred Melo
Technical Director for Data
@fredmelo_br
William Markito
Enterprise Architect for GemFire
@william_markito
A Simple Example
Data Sources
Look for patterns
Forecast
"Smart System"
Applicability
Smart System
Learns from HISTORICAL TRENDS
Live data becomes historical over time
Real-Time
Evaluates LIVE DATA
Historical
What do we want to build?
Trading Data
"According to historical trends, there's an 80% chance this stock's price will go down within the next few minutes"
"How were the technical indicator readings when the latest price drops happened?"
The Machine Learning Pipeline data flow
Live Data
Data Temperature: Hot, Cold
Spring XD
Apache Geode / GemFire
Apache HAWQ
Machine Learning model
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - "Hot" data ages, becoming part of the historical dataset
5 - Re-training triggered, ML model updated.
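The five numbered steps of the data flow can be sketched as a toy, single-process loop. This is only an illustration under stated assumptions: the class name, the capacity and retraining parameters, and the stand-in "model" (a mean of historical prices with a naive up/down rule) are all hypothetical and are not taken from the talk. In the real system these roles belong to Spring XD, Apache Geode and the trained ML model.

```python
from collections import deque


class PipelineSketch:
    """Toy in-memory stand-in for the five steps; not Geode, Spring XD or HAWQ code."""

    def __init__(self, hot_capacity, retrain_every):
        self.hot = deque()              # "hot" data: the most recent prices
        self.historical = []            # "cold" data: prices that have aged out
        self.hot_capacity = hot_capacity
        self.retrain_every = retrain_every
        self.aged_since_training = 0
        self.model_mean = None          # hypothetical "model": mean historical price
        self.subscribers = []           # callbacks standing in for deployed applications

    def ingest(self, price):
        # 1 - live data is ingested into the grid
        self.hot.append(price)

        # 2 - the trained model compares new data to historical patterns
        prediction = None
        if self.model_mean is not None:
            # Naive illustrative rule: above the historical mean means "down"
            prediction = "down" if price > self.model_mean else "up"

        # 3 - results are pushed immediately to deployed applications
        for notify in self.subscribers:
            notify(price, prediction)

        # 4 - hot data ages, becoming part of the historical dataset
        while len(self.hot) > self.hot_capacity:
            self.historical.append(self.hot.popleft())
            self.aged_since_training += 1

        # 5 - re-training is triggered and the model is updated
        if self.aged_since_training >= self.retrain_every:
            self.model_mean = sum(self.historical) / len(self.historical)
            self.aged_since_training = 0

        return prediction
```

Until enough data has aged into the historical set to trigger the first training, the sketch returns no prediction, which mirrors the point that live data only becomes useful for learning once it is historical.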
The Machine Learning Pipeline data flow
Simplified Model
Live Data
Data Temperature: Hot, Warm
Spring XD
Apache Geode / GemFire
Machine Learning model
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
5 - Re-training triggered, ML model updated.
[Architecture diagram labels: Real data, Simulator, SpringXD (Transform, Enrich, Filter, Split, Sink), Machine Learning, Indicators, Predict, Dashboard; numbered steps 1, 2, 3; regions /Stocks, /TechIndicators, /Predictions]
Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native
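The diagram includes an Indicators step and a /TechIndicators region, but the slides do not define the indicators themselves. As a minimal sketch, assuming they are a simple moving average and a relative strength index computed over a window of recent prices, the calculation could look like the following. The function names, the window sizes and the plain-average RSI variant are illustrative assumptions, not the talk's implementation.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices."""
    if window <= 0 or len(prices) < window:
        raise ValueError("need at least `window` prices")
    return sum(prices[-window:]) / window


def relative_strength_index(prices, window):
    """RSI (0 to 100) over the last `window` price changes.

    Uses plain sums of gains and losses; smoothed variants exist
    and may be what a production system would use.
    """
    if window <= 0 or len(prices) < window + 1:
        raise ValueError("need at least `window` + 1 prices")
    recent = prices[-(window + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no losses in the window: reported as 100 in this sketch
    relative_strength = gains / losses
    return 100.0 - 100.0 / (1.0 + relative_strength)
```

Both functions look only at the tail of the price list, so they can be called on each new tick with the prices seen so far.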
Too complex?? Eating it in small bites…
SpringXD GemFire
Apache Geode Concepts
• Cache
  • Configurable through XML, Java
• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial
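To make the Region and Callbacks concepts concrete, here is a toy single-process analogy: a key/value map with a writer invoked before a change and listeners invoked after it. This is not the Geode API (Geode's own callbacks are Java interfaces such as CacheListener and CacheWriter), and it has no distribution or redundancy. The names `RegionSketch` and `reject_negative_price` are hypothetical.

```python
class RegionSketch:
    """Toy stand-in for a region: a key/value map with callbacks."""

    def __init__(self, name):
        self.name = name
        self._entries = {}
        self.writer = None      # called before a change; may raise to veto it
        self.listeners = []     # called after a change has been applied

    def put(self, key, value):
        old_value = self._entries.get(key)
        if self.writer is not None:
            self.writer(key, old_value, value)      # runs before the change
        self._entries[key] = value
        for listener in self.listeners:
            listener(key, old_value, value)         # runs after the change

    def get(self, key):
        return self._entries.get(key)


def reject_negative_price(key, old_value, new_value):
    """Example writer: veto the change by raising before it is applied."""
    if new_value < 0:
        raise ValueError(f"negative price for {key}: {new_value}")
```

The split between writer and listeners mirrors the difference between validating a change before it happens and reacting to it afterwards, for example by pushing an update to a dashboard.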
Apache Geode HA and Fault Tolerance
SpringXD
Streams, Pipelines, Sources, Sinks, Filters, Taps
[Diagram labels: Transform, Enrich, Filter, Split, Predict, Sink; numbered steps 1, 2, 3]
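The stream vocabulary above (sources, filters, sinks, taps) can be illustrated with plain Python generators. This is only an analogy: Spring XD defines streams through its own DSL of piped modules, not through Python, and every function name and the event shape below are hypothetical.

```python
def source(ticks):
    """Source: emits raw events into the stream."""
    for tick in ticks:
        yield tick


def transform(events):
    """Transform: normalizes each event (upper-case symbol, numeric price)."""
    for event in events:
        yield {"symbol": event["symbol"].upper(), "price": float(event["price"])}


def tap(events, side_sink):
    """Tap: copies each event to a side sink without altering the main flow."""
    for event in events:
        side_sink.append(event)
        yield event


def keep_symbol(events, symbol):
    """Filter: passes through only events for the given symbol."""
    for event in events:
        if event["symbol"] == symbol:
            yield event


def sink(events, target):
    """Sink: consumes the stream and stores every event it receives."""
    for event in events:
        target.append(event)


def run_stream(ticks, symbol):
    """Wires source -> transform -> tap -> filter -> sink and runs it."""
    tapped, stored = [], []
    stream = keep_symbol(tap(transform(source(ticks)), tapped), symbol)
    sink(stream, stored)
    return tapped, stored
```

Because the tap sits before the filter, it sees every transformed event, while the sink only receives the events that pass the filter.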
Machine Learning Model (e.g. Linear Regression)
Features: price(x), moving avg (x), relative strength (x)
Label: moving avg (x+1)
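A linear regression over these features and this label can be sketched in plain Python. The talk uses Spark MLlib for this step; the code below is not MLlib, just a small ordinary least squares fit via the normal equations so the idea of features and label is concrete. The function names and the three-feature row layout (price, average, relative strength) are assumptions for illustration.

```python
def fit_linear_regression(rows, labels):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    rows:   list of feature tuples, e.g. (price, average, relative_strength)
    labels: list of target values, e.g. the next average
    Returns the weights as a list, intercept first.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept term
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, labels))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, features):
    """Applies the fitted weights to one feature tuple."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))
```

In this framing each training row holds the readings at time x and its label is the average at time x+1, so a fitted model maps current readings to a forecast of the next value.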
Demo Time
Error
Source code and detailed instructions available at:
https://github.com/Pivotal-Open-Source-Hub/StockInference-Spark
Follow us on Twitter!
William Markito (@william_markito)
Fred Melo (@fredmelo_br)