Posted on 06-Jul-2015
Angelo Corsaro, PhD, Chief Technology Officer
angelo.corsaro@prismtech.com
Vortex Tutorial Part II
Recap
Copyright PrismTech, 2014
Vortex enables seamless, ubiquitous, efficient and timely data sharing across mobile, embedded, desktop, cloud and web applications
Vortex is based on the OMG DDS standard
The Vortex Platform
[Diagram: the Vortex platform comprises Vortex Device and Vortex Cloud, along with Tools, Integration and MaaS.]
Building ChirpIt
To explore the various features provided by the Vortex platform we will be designing and implementing a micro-blogging platform called ChirpIt. Specifically, we want to support the following features:
ChirpIt users should be able to “chirp”, “re-chirp”, “like” and “dislike” chirps, as well as get detailed statistics
The ChirpIt platform should provide information on trending topics (identified by hashtags) as well as trending users
Third-party services should be able to flexibly access slices of the produced chirps to perform their own trend analysis
ChirpIt should scale to millions of users
ChirpIt should be based on a Lambda Architecture
ChirpIt Requirements
ChirpIt Architecture
[Diagram: ChirpIt apps and 3rd-party services exchange chirp, emotions and stats data over Cloud Messaging with a data centre hosting the analytics: a batch layer (master dataset), a serving layer (views) and a speed layer.]
Next Step
Trendy #hashtags
As an example of how VORTEX can be used to compute analytics we’ll see how to compute trending topics, as identified by their #hashtag
Calculating Trendy #hashtags
ChirpIt Architecture
[Diagram: the same architecture, with Chirps flowing from the ChirpIt apps through Cloud Messaging into the speed layer, which computes trend data.]
To fully exploit the features provided by VORTEX in our ChirpIt application there are a few more things we need to learn
Quality of Service
VORTEX provides a rich set of QoS-Policies to control local as well as end-to-end properties of data sharing
Some QoS-Policies are matched based on a Request vs. Offered (RxO) Model
QoS Policies
HISTORY, LIFESPAN, DURABILITY, DEADLINE, LATENCY BUDGET, TRANSPORT PRIO, TIME-BASED FILTER, RESOURCE LIMITS, USER DATA, TOPIC DATA, GROUP DATA, OWNERSHIP, OWN. STRENGTH, LIVELINESS, ENTITY FACTORY, DW LIFECYCLE, DR LIFECYCLE, PRESENTATION, RELIABILITY, PARTITION, DEST. ORDER
[Figure: each policy is marked as either an RxO QoS or a Local QoS.]
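The RxO matching rule can be sketched as a comparison of policy “strengths”. The following is a toy model, not the DDS API: `compatible` and the ordered level lists are illustrative assumptions.

```python
# Sketch of Request-vs-Offered (RxO) matching (illustrative; the
# function and level lists are hypothetical, not the DDS API).
# A writer's offered level must be at least as strong as the
# reader's requested level for each RxO policy.

# Policy "kinds" ordered from weakest to strongest.
RELIABILITY = ["BEST_EFFORT", "RELIABLE"]
DURABILITY = ["VOLATILE", "TRANSIENT_LOCAL", "TRANSIENT", "PERSISTENT"]

def compatible(offered: str, requested: str, levels: list) -> bool:
    """True if the offered level meets or exceeds the requested one."""
    return levels.index(offered) >= levels.index(requested)

# A RELIABLE writer satisfies a BEST_EFFORT reader...
print(compatible("RELIABLE", "BEST_EFFORT", RELIABILITY))
# ...but a VOLATILE writer cannot satisfy a TRANSIENT reader.
print(compatible("VOLATILE", "TRANSIENT", DURABILITY))
```

Only when every RxO policy is compatible does the middleware match the writer and the reader.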
Data Delivery QoS Policies provide control over:
who delivers data
where data is delivered, and
how data is delivered
Data Delivery
[Figure: the Data Delivery policies are Reliability, Presentation, Destination Order, Partition, Ownership and Ownership Strength.]
Data Availability QoS Policies provide control over data availability with respect to:
Temporal Decoupling (late joiners)
Temporal Validity
Data Availability
[Figure: the Data Availability policies are History, Durability and Lifespan.]
Several policies provide control over temporal properties, specifically:
Outbound Throughput
Inbound Throughput
Latency
Temporal Properties
[Figure: the relevant policies are TimeBasedFilter (inbound throughput), and Deadline, TransportPriority and LatencyBudget (outbound throughput and latency).]
For data to flow from a DataWriter (DW) to one or many DataReaders (DR), a few conditions have to hold:
The DR and DW domain participants have to be in the same domain
The partition expression of the DR’s Subscriber and the DW’s Publisher should match (in terms of regular expression match)
The QoS Policies offered by the DW should exceed or match those requested by the DR
QoS Model
[Diagram: a DomainParticipant on each side joins the same Domain Id; the DataWriter, through its Publisher, writes a Topic that the DataReader, through its Subscriber, reads. The Publisher produces in and the Subscriber consumes from matching PARTITIONs, and the DW’s offered RxO QoS Policies (DURABILITY, OWNERSHIP, DEADLINE, LATENCY BUDGET, LIVELINESS, RELIABILITY, DEST. ORDER) must satisfy the DR’s requested QoS.]
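The partition-matching condition can be illustrated with shell-style wildcard matching. This is a stand-alone sketch: `partitions_match` is a hypothetical helper, not a DDS call, and the exact wildcard semantics of a given implementation should be checked against its documentation.

```python
# Sketch of partition matching between a Publisher and a Subscriber
# (illustrative). Partition names may contain wildcards; here we match
# them fnmatch-style, and consider the two entities connected if any
# publisher partition matches any subscriber partition (or vice versa).
from fnmatch import fnmatchcase

def partitions_match(pub_parts, sub_parts):
    return any(
        fnmatchcase(p, s) or fnmatchcase(s, p)
        for p in pub_parts
        for s in sub_parts
    )

pub = ["EU:IT:Siracusa:chirp:archimede"]
print(partitions_match(pub, ["EU:*:chirp:*"]))   # EU-wide reader: matches
print(partitions_match(pub, ["NA:*:chirp:*"]))   # wrong continent: no match
```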
Useful QoS for ChirpIt
The DataWriter HISTORY QoS Policy controls the amount of data that can be made available to late-joining DataReaders under TRANSIENT_LOCAL Durability
The DataReader HISTORY QoS Policy controls how many samples are kept in the reader cache
- Keep Last. DDS will keep the most recent “depth” samples of each instance of data, identified by its key
- Keep All. DDS will keep all the samples of each instance of data, identified by its key, up to some configurable resource limits
History QoS Policy
[Figure: a stream of pressure samples 0, 1, 2, 3 over time, shown under KeepLast(3), KeepLast(1) and KeepAll reader caches.]
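The cache behaviour sketched in the figure can be modelled in a few lines. This is a toy model of a single-instance cache, not the DDS API; the function name and signature are illustrative.

```python
# Toy model of the HISTORY policy on a cache holding one data instance
# (illustrative, not the DDS API). KEEP_LAST retains the most recent
# `depth` samples; KEEP_ALL retains everything (up to resource limits).
from collections import deque

def history_cache(samples, kind="KEEP_LAST", depth=1):
    if kind == "KEEP_ALL":
        return list(samples)
    cache = deque(maxlen=depth)   # oldest samples fall out automatically
    for s in samples:
        cache.append(s)
    return list(cache)

pressure = [0, 1, 2, 3]                         # samples written over time
print(history_cache(pressure, "KEEP_LAST", 3))  # [1, 2, 3]
print(history_cache(pressure, "KEEP_LAST", 1))  # [3]
print(history_cache(pressure, "KEEP_ALL"))      # [0, 1, 2, 3]
```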
The HISTORY QoS can be leveraged to automatically provide Chirps to late joiners
In other words, depending on application settings, VORTEX can be leveraged to automatically provide an application with the last n chirps produced
Notice that HISTORY can be “DURABLE”, thus making it possible to completely decouple the availability of history in time
Exploiting the History Policy in ChirpIt
The DURABILITY QoS controls data availability w.r.t. late joiners; specifically, DDS provides the following variants:
Volatile. No need to keep data instances for late-joining data readers
Transient Local. Data instance availability for late-joining data readers is tied to the data writer’s availability
Transient. Data instance availability outlives the data writer
Persistent. Data instance availability outlives system restarts
Durability QoS Policy
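The four variants differ in what a late joiner can still receive. The following toy model (an illustrative sketch, not the DDS API; the function and its flags are assumptions) summarises the behaviour shown on the next slides:

```python
# Toy model of what a late joiner receives under each DURABILITY kind
# (illustrative). `writer_alive` models whether the original DataWriter
# still exists; `restarted` models a full system restart.
def late_joiner_samples(history, kind, writer_alive=True, restarted=False):
    if kind == "VOLATILE":
        return []                    # nothing is kept for late joiners
    if kind == "TRANSIENT_LOCAL":
        return history if writer_alive else []   # tied to the writer
    if kind == "TRANSIENT":
        return [] if restarted else history      # tied to the durability service
    if kind == "PERSISTENT":
        return history               # survives system restarts
    raise ValueError(kind)

h = ["chirp-1", "chirp-2"]
print(late_joiner_samples(h, "VOLATILE"))                             # []
print(late_joiner_samples(h, "TRANSIENT_LOCAL", writer_alive=False))  # []
print(late_joiner_samples(h, "TRANSIENT"))            # ['chirp-1', 'chirp-2']
```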
Copyright 2013, PrismTech – All Rights Reserved.
Volatile Durability
• No Time Decoupling
• Readers get only data produced after they joined the Global Data Space
[Figure: a DataWriter publishes samples on Topic A; DataReaders that were already joined receive them, while a late-joining DataReader misses the samples written before it joined.]
Transient-Local Durability
• Some Time Decoupling
• Data availability is tied to the availability of the data writer and the history settings
[Figure: a late-joining DataReader on Topic A still receives the samples retained in the DataWriter’s history cache.]
Transient Durability
• Time Decoupling
• Data availability is tied to the availability of the durability service
[Figure: samples on Topic A are retained by the durability service, so late-joining DataReaders receive them even after the DataWriter has gone.]
Besides the service-specific configuration (which we won’t discuss here), it is important to understand that the amount of data the durability service maintains for a given topic is configured using the DurabilityService Policy
The DurabilityService Policy, defined on topics, can be used to store:
- The last n samples of each topic instance
- All samples ever produced for a given Topic (across all its instances)
Resource constraints can also be specified to limit the maximum amount of data taken by a topic
NOTE: beware that when you dispose an instance, its data is removed from the Durability Service
Durability Service Configuration
Data can be retrieved from the Durability Service in two ways
Automatic Retrieval: Non-VOLATILE DataReaders automatically receive a set of historical data. How much data is received depends on the DR HISTORY setting and on the Durability Service (or DW, for TRANSIENT_LOCAL) history settings
Query-Based Retrieval: Any application can create a “special” data reader to query the Durability Service. Queries can predicate on content as well as on time
- Get all Chirps made by @wolverine in the last 2 days:
• dr.getHistoricalData("uid = '@wolverine'", now() - 2 days, now())
- Get all Chirps containing the #hashtag “#xmen” in the last day:
• dr.getHistoricalData("msg like '*#xmen*'", now() - 1 day, now())
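The semantics of those two queries can be simulated with a plain in-memory filter. This stand-in (`get_historical_data` here is a hypothetical helper, not the Durability Service API) just applies a content predicate and a time window:

```python
# Simulation of query-based historical retrieval (illustrative stand-in
# for the Durability Service query API): filter stored chirps by a
# content predicate and a [start_ts, end_ts] time window.
import time

def get_historical_data(store, predicate, start_ts, end_ts):
    return [c for c in store
            if start_ts <= c["ts"] <= end_ts and predicate(c)]

now = time.time()
DAY = 24 * 3600
chirps = [
    {"uid": "@wolverine", "msg": "claws out #xmen", "ts": now - DAY / 2},
    {"uid": "@magneto",   "msg": "fields up",       "ts": now - 3 * DAY},
]

# All chirps by @wolverine in the last 2 days:
print(get_historical_data(chirps, lambda c: c["uid"] == "@wolverine",
                          now - 2 * DAY, now))
# All chirps mentioning #xmen in the last day:
print(get_historical_data(chirps, lambda c: "#xmen" in c["msg"],
                          now - DAY, now))
```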
Getting Data From the Durability Service
VORTEX’s Durability can be leveraged to address several different use cases in ChirpIt
Batch Layer: VORTEX durability can be used to persist all the chirps ever received by our application. Scalability can easily be achieved through partitioning (more later)
Speed Layer: Views on the data set can be efficiently created using the Durability Service Query API
Historical Data: Any analytics application, as well as any end-user application, can access historical data through either Automatic or Query-based delivery
Exploiting VORTEX Durability in ChirpIt
ChirpIt Architecture
[Diagram: the same architecture, with the Durability Service attached to the batch layer to hold Historical Data.]
Besides the Durability Service, you may implement the batch layer via:
the VORTEX Record and Replay (RnR) Service
a 3rd-party big-data store
Depending on the specific system requirements, one solution may be more appropriate than another. Notice, however, that the back-ends for both VORTEX Durability and RnR are pluggable, thus a big-data store back-end could easily be plugged in
Batch Layer Observations
It is trivial to push Chirps into a big-data store such as HBase by using the VORTEX Gateway
Just define the following route:
Big-Data Store Integration
val chirpURI = "ddsi:ChirpAction:0/com.chirpIt.ChirpAction"
val hbaseURI = "hbase:chirpit?mappingStrategyName=body&operation=CamelHBasePut"

// Put incoming chirps into an HBase table
chirpURI unmarshal(cdrData) process { e2d(_, "chirp") } to(hbaseURI)
Computing Trending #hashtags
A good approach to dealing with failures is to ensure that applications are stateless
The application state is maintained externally (in our case on VORTEX Durability)
Be Stateless
ChirpIt’s #hashtag ranking function will be stateless
The latest rankings will be maintained by the VORTEX durability service. This allows the state to be restored after a failure, and makes aggregations easy to compute (more later)
[Diagram: the #hashtag ranking function consumes real-time Chirp data together with the last Ranking, and produces the latest Ranking.]
As such, the ranking function will consume the latest ranking along with live chirps, and produce the new ranking for the basic interval (say, the shortest interval that defines our granularity and from which aggregations will be created)
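A minimal sketch of such a stateless ranking step follows. Everything here is illustrative: the function name, the hashtag regex, the decay factor and the `top` cut-off are assumptions for the example, not part of ChirpIt or VORTEX.

```python
# Sketch of a stateless #hashtag ranking step (illustrative): it takes
# the last ranking plus the chirps of the current interval and returns
# the new ranking, keeping no state of its own. The decay factor is an
# assumed way to let old scores fade over intervals.
import re
from collections import Counter

def rank_hashtags(last_ranking, chirps, decay=0.5, top=10):
    # Start from the decayed previous scores...
    scores = Counter({tag: s * decay for tag, s in last_ranking.items()})
    # ...then add one point per hashtag occurrence in the new chirps.
    for msg in chirps:
        scores.update(re.findall(r"#\w+", msg))
    return dict(scores.most_common(top))

last = {"#xmen": 4.0}
chirps = ["go see #xmen", "#xmen rocks", "hello #dds"]
print(rank_hashtags(last, chirps))
```

Because the previous ranking arrives as an input (in ChirpIt, from the durability service) rather than living inside the process, any replacement instance can pick up where a failed one left off.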
Trendy #hashtags
[Diagram: the #hashtag ranking function sits in the speed layer of the ChirpIt architecture; it consumes real-time Chirps plus the last Ranking held by the Durability Service, and publishes the latest Ranking.]
Wait a moment… do we have a single instance of the ranking function?
How do we scale out?
How can we partition the load?
Scaling out the #hashtag ranking
It would be a good idea to add more structure to our partitions to include some geographical information
We take advantage of information concerning the continent, country, region and city of the registrant
Divide et Impera
[Diagram: each user writes ChirpAction samples on a per-user partition, e.g. @wolverine on chirp:wolverine and @magneto on chirp:magneto.]
By encoding origins in the partitions associated with users, we can easily and efficiently do regional aggregation
Thus all the chirps in the EU would be in EU*:chirp:*, while all the chirps in Siracusa would be in EU:IT:Siracusa:chirp:*
Divide et Impera
[Diagram: @joe writes ChirpAction on NA:CA:Pasadena:chirp:joe, @william on EU:UK:London:chirp:william, @archimede on EU:IT:Siracusa:chirp:archimede, and @antonio on SA:AR:Rosario:chirp:antonio.]
With the new partition organisation it is now trivial to scale out the ranking function
In addition, the application can easily support on-line reconfiguration, since we may want to consolidate or further distribute the load as the system runs
Scaling out the #hashtag ranking
[Diagram: per-region ranking functions consume EU:*:chirp / EU:*:stats, NA:*:chirp / NA:*:stats, SA:*:chirp / SA:*:stats, and so on; an aggregation function consumes the regional rankings on *:stats together with the last aggregated ranking and publishes the new aggregated ranking on global:stats.]
The processing pipeline can easily be reconfigured at runtime by simply changing the partition expressions on which each process operates
Every single stage in the processing pipeline is stateless; its state is maintained on the Durability Service
Hierarchical aggregation can easily be introduced too
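The aggregation stage itself can be sketched in a few lines. As before, this is an illustrative stand-in (the function name and score-summing rule are assumptions), not VORTEX code:

```python
# Sketch of hierarchical aggregation (illustrative): regional rankings
# published on <REGION>:stats partitions are merged into one global
# ranking by summing per-hashtag scores.
from collections import Counter

def aggregate(regional_rankings):
    total = Counter()
    for ranking in regional_rankings.values():
        total.update(ranking)          # sum scores hashtag by hashtag
    return dict(total.most_common())   # highest-scoring hashtags first

regional = {
    "EU:stats": {"#xmen": 3.0, "#dds": 1.0},
    "NA:stats": {"#xmen": 2.0},
}
print(aggregate(regional))   # {'#xmen': 5.0, '#dds': 1.0}
```

Like the ranking function, this stage is stateless: its inputs (the regional rankings and the last aggregated ranking) come from the middleware, so it can be restarted or relocated freely.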
Some Observations
Notice that the Ranking topic is keyless
The durability service will be configured with KeepAll
Ranking Topic
struct HashtagScore {
  string hashtag;
  float score;
};

typedef sequence<HashtagScore> HashtagScores;

struct HashtagRanking {
  long startTS;
  long endTS;
  HashtagScores htscores;
};
#pragma keylist HashtagRanking
In this presentation we have seen how the Vortex platform can be used to implement not only data sharing and distribution but also analytics
We have seen how VORTEX provides solutions for both the batch and the speed layer, thus making it quite easy to implement Lambda Architectures
Summary
Online Resources