taming big data streams · 2013. 10. 1. · big data streams . hideyuki kawashima . center for...
![Page 1: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/1.jpg)
Taming Big Data Streams
Hideyuki KAWASHIMA
Center for Computational Sciences, University of Tsukuba, Japan
![Page 2: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/2.jpg)
STORM
Norikra
Jubatus
Relational-stream
XML-stream
S4
Puma
System S
MillWheel
Complex event processing
Machine learning
Incremental computation
Continual query
Spring (DTW)
CPD (Change Point Detection)
Window-aggregate
Window-join
FPGA GPU
SASE
Esper
Handshake-join Incr. LOCI
Online LDA
Window
Tuple-stream
A Variety of Data Processing Techniques
Cayuga
Privacy Preservation CryptDB
Data mining
Kafka MLBase
![Page 3: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/3.jpg)
Privacy Preservation CryptDB GPGPU Intel MIC FPGA Tilera Encryption
Privacy Accelerator
ML&DM SQL NoSQL
Relational stream
Norikra
Online Data Mining &
Machine Learning
Esper Puma
Complex event
processing
Jubatus
Borealis System S
Window aggregate
SASE
Cayuga
Continual query & Window
S4
Window join
CPD
Online LDA
Tuple stream
Kafka
STORM
MLBase
Incr. LOCI
Spring (DTW)
IBM Facebook
Line
NTT & PFI UCB
![Page 4: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/4.jpg)
Which Analytics Style?
• SQL style: embedded operators (filter, join, aggregation). Poor operators, high performance.
• NoSQL style: user-defined operators (Python, Java, C++, …) for machine learning and data mining. Rich operators, low performance.
• In-DB style: embedded operators (filter, join, aggregation, machine learning, data mining). Rich operators, high performance. Examples: Oracle-R, MADLib.
![Page 5: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/5.jpg)
Relational stream
Norikra
Online Data Mining &
Machine Learning
Esper Puma
Complex event
processing
Jubatus
Borealis System S
Window aggregate
SASE
Cayuga
Continual query & Window
S4
Window join
CPD
Online LDA
Tuple stream
Privacy Preservation CryptDB GPGPU Intel MIC FPGA Tilera Encryption
Privacy Accelerator
ML&DM SQL NoSQL
Kafka
STORM
MLBase
Incr. LOCI
Spring (DTW)
Falcon
MADLib @UCB
Bismarck @Stanford
Oracle-R
![Page 6: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/6.jpg)
3 Research Topics
• Falcon
– In-DSMS analytics system
– Multiple query optimization for CPD
• Window Operators
– Join operator over data streams
• Crypt stream
– Privacy preserving stream data processing
![Page 7: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/7.jpg)
A Multiple Query Optimization Scheme for Change Point Detection on Falcon
Joint work with Masahiro OKE
Presented at BIRTE’13
![Page 8: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/8.jpg)
SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80
DSMS
Relation eth0
・Destination IP ・Source IP ・Destination Port ・Source Port ・Interface (e.g. eth0) ・Length ・Version (e.g. IPv4) ・Payload
Relational schema
20
Quick Review Data Stream Management System (DSMS)
Q1
How many packets arrive at port 80 per minute?
![Page 9: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/9.jpg)
• SQL is translated into an operator tree.
• On arrival of data, the tree is evaluated.
• Operators are based on those of relational databases:
– w (Window): cuts a finite relation out of a stream
– σ (Selection): filter
– α (Aggregation): such as AVG, MIN, MAX
Query
Result Users/Apps.
w σ α Input adapter
Output adapter
DSMS Data
SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80
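The operator tree for Q1 can be sketched in a few lines of Python. This is only an illustration of the w, σ, α evaluation order described above, not how any particular DSMS is implemented; the class name `SlidingWindowCount`, the tuple layout, and the choice to evict tuples that are exactly one window old are all assumptions.

```python
from collections import deque


class SlidingWindowCount:
    """COUNT(*) over a time-based sliding window with a selection predicate."""

    def __init__(self, window_sec, predicate):
        self.window_sec = window_sec
        self.predicate = predicate
        self.window = deque()  # (timestamp, tuple) pairs, oldest first

    def on_arrival(self, ts, tup):
        # w: add the new tuple and drop tuples older than the window (assumed boundary rule)
        self.window.append((ts, tup))
        while self.window[0][0] <= ts - self.window_sec:
            self.window.popleft()
        # sigma: keep tuples that satisfy the predicate; alpha: count them
        return sum(1 for _, t in self.window if self.predicate(t))


# Q1: packets for port 80 in the last minute (timestamps in seconds)
q1 = SlidingWindowCount(60, lambda t: t["port"] == 80)
arrivals = [(0, 80), (10, 22), (30, 80), (70, 80)]
results = [q1.on_arrival(ts, {"port": port}) for ts, port in arrivals]
```

The tree is re-evaluated on every arrival, which matches the "on arrival of data" bullet; a production engine would typically maintain the count incrementally instead of rescanning the window.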
![Page 10: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/10.jpg)
A Target Application: Malware Detection
• Real datasets
– “Anti Malware engineering WorkShop 2013 (MWS 2013)”
– Extracted by NEGI, proposed by Dr. Shinichi Isida.
• NICTER
– Keeps about 160,000 unused IP addresses (darknet).
– Packets sent to the darknet are considered attacks.
– Uses CPD (Change Point Detection) [1] to detect attacks such as DoS (denial of service).
[1] Daisuke Inoue, K. Yoshioka, M. Eto, Masaya Yamagata, Eisuke Nishino, Jun-ichi Takeuchi, Kazuya Ohkouchi, Koji Nakao: An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques. ICONIP (1) 2008: 579-586
[2] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE TKDE, pp. 482-492, 2006.
![Page 11: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/11.jpg)
Relational data processing
Attack Detection
Discussion
? • Aggregates are good
CPD (AR) / LOF / LDA / FIM
Yet Another DSMS: Falcon
![Page 12: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/12.jpg)
Example Query on Falcon (1/2)
• Number of accesses for each port? [1]
• Group-by aggregates
SELECT dst_port, COUNT(dst_port) FROM pkt[1 sec] GROUP BY dst_port
g-pkt
src_ip
dst_ip
src_port
dst_port
seq_no
packet_size
timestamp
protocol
ack
fin
syn
urg
push
reset
content
[Figure: packets with destination ports 22, 80, 15, 80, and 22 arrive from the NIC within 1 second; the query outputs 22: 2, 80: 2, 15: 1]
[1] Divesh Srivastava (AT&T Labs), et al., “Enabling Real Time Data Analysis”, keynote talk, VLDB 2010 (a similar query appears on p. 15 of the talk slides).
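As a rough illustration of the group-by query above, the following sketch counts packets per destination port for each one-second window. It assumes that `pkt[1 sec]` denotes non-overlapping (tumbling) one-second windows and that tuples arrive in timestamp order; neither is stated on the slide, and the function name `per_second_counts` is hypothetical.

```python
from collections import Counter
from itertools import groupby


def per_second_counts(stream):
    """Yield (second, Counter of dst_port) for each one-second window."""
    # groupby only merges adjacent items, so the stream must be time-ordered
    for sec, tuples in groupby(stream, key=lambda t: int(t["timestamp"])):
        yield sec, Counter(t["dst_port"] for t in tuples)


packets = [
    {"timestamp": 0.1, "dst_port": 22},
    {"timestamp": 0.3, "dst_port": 80},
    {"timestamp": 0.5, "dst_port": 15},
    {"timestamp": 0.7, "dst_port": 80},
    {"timestamp": 0.9, "dst_port": 22},
    {"timestamp": 1.2, "dst_port": 80},
]
windows = list(per_second_counts(packets))
```

The first window reproduces the counts shown for the example packets (two for port 22, two for port 80, one for port 15).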
![Page 13: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/13.jpg)
Example Query on Falcon (2/2)
• Access on each port? [2]
• Outlier score for each port per second
SELECT dst_port, cpd(dst_port) FROM pkt[1 sec] GROUP BY dst_port
g-cpd-pkt
src_ip
dst_ip
src_port
dst_port
seq_no
packet_size
timestamp
protocol
ack
fin
syn
urg
push
reset
content
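[Figure: for packets with destination ports 22, 80, 15, 80, and 22 arriving from the NIC within 1 second, the query outputs a CPD score per port: 22: 1.33, 80: 2.44, 15: 1.22]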
[2] Daisuke Inoue (NICT), et al., “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques”, ICONIP (1) 2008: 579-586
![Page 14: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/14.jpg)
Dividing CPD into 4 operators
[Figure: the CPD pipeline split into four operators; the outlier score output is omitted]
Input: time series data x_t
1st stage learning
Compute outlier score and moving average score (from the probability provided by 1st stage learning)
2nd stage learning
Compute outlier score and moving average score (from the probability provided by 2nd stage learning)
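The two-stage structure can be sketched as follows. This is a deliberately simplified stand-in: instead of the autoregressive model used by the cited CPD method, each learning stage here is a discounting Gaussian (running mean and variance with a forgetting rate), so the AR-order parameters have no counterpart. The class names and the parameters `r1`, `t1`, `r2`, `t2` (discount rate and smoothing window per stage) are assumptions made for illustration.

```python
import math
from collections import deque


class DiscountingGaussian:
    """Online Gaussian model whose past samples fade at rate r (learning stage stand-in)."""

    def __init__(self, r):
        self.r, self.mean, self.var = r, 0.0, 1.0

    def score_then_learn(self, x):
        # Outlier score: negative log-likelihood under the model learned so far
        score = 0.5 * math.log(2 * math.pi * self.var) + (x - self.mean) ** 2 / (2 * self.var)
        # Discounted parameter update
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = max((1 - self.r) * self.var + self.r * (x - self.mean) ** 2, 1e-6)
        return score


class MovingAverage:
    def __init__(self, t):
        self.buf = deque(maxlen=t)

    def push(self, x):
        self.buf.append(x)
        return sum(self.buf) / len(self.buf)


class TwoStageCPD:
    """Four chained operators: learn, smooth, learn again, smooth again."""

    def __init__(self, r1, t1, r2, t2):
        self.learn1, self.smooth1 = DiscountingGaussian(r1), MovingAverage(t1)
        self.learn2, self.smooth2 = DiscountingGaussian(r2), MovingAverage(t2)

    def update(self, x):
        outlier = self.learn1.score_then_learn(x)    # 1st stage learning + outlier score
        smoothed = self.smooth1.push(outlier)         # moving average score
        change = self.learn2.score_then_learn(smoothed)  # 2nd stage learning + score
        return self.smooth2.push(change)              # final change point score


series = [0.0] * 50 + [10.0] * 50  # level shift at index 50
cpd = TwoStageCPD(r1=0.02, t1=5, r2=0.02, t2=5)
scores = [cpd.update(x) for x in series]
peak = scores.index(max(scores))
```

On this synthetic series the score peaks within a few samples of the level shift. Because each operator only consumes the output of the previous one, the pipeline can be cut at any operator boundary.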
![Page 15: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/15.jpg)
Problem of CPD: Parameter setting
• CPD requires 6 parameters (α_R, α_K, α_T, β_R, β_K, β_T)
• Appropriate parameter setting is necessary, but it is difficult
[Figure: two plots, one using an appropriate parameter set and one using an inappropriate parameter set. Blue: number of accesses; red: CPD score]
![Page 16: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/16.jpg)
A simple way for parameter tuning: multiple CPDs with different parameter sets
[Figure: each input packet is fed to several CPD pipelines (1st stage learning, compute outlier score, 2nd stage learning, compute outlier score), one per parameter set; their results are aggregated, e.g. by majority voting]
Issue: How to accelerate multiple CPD executions?
Approach: Multiple query optimization
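A minimal sketch of the ensemble idea: several detectors with different settings see the same input, and an alarm is raised when a majority of them fire. The detector here is a hypothetical stand-in (`EwmaDetector`, a deviation from an exponentially weighted mean) rather than CPD itself, and the threshold is an arbitrary illustrative value.

```python
class EwmaDetector:
    """Stand-in for one CPD instance; `rate` plays the role of a parameter set."""

    def __init__(self, rate):
        self.rate = rate
        self.mean = None

    def update(self, x):
        if self.mean is None:  # first sample: nothing to compare against yet
            self.mean = x
            return 0.0
        score = abs(x - self.mean)
        self.mean = (1 - self.rate) * self.mean + self.rate * x
        return score


def ensemble_step(detectors, x, threshold):
    """Feed x to every detector and aggregate their alarms by majority voting."""
    flags = [d.update(x) > threshold for d in detectors]
    return sum(flags) * 2 > len(flags)


detectors = [EwmaDetector(rate) for rate in (0.01, 0.05, 0.9)]
series = [1.0] * 20 + [9.0, 9.0]
alarms = [ensemble_step(detectors, x, threshold=3.0) for x in series]
```

At the last sample the fast-adapting detector (rate 0.9) no longer fires, but the other two still do, so the vote is 2 of 3. Every detector processes the same input stream, which is why sharing work across the instances is attractive.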
![Page 17: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/17.jpg)
The 4 sharing patterns (only branch cases, not merge)
[Figure: operator graphs for Pattern 1 to Pattern 4, showing which of the "1st stage learning", "Compute outlier score", and "2nd stage learning" operators are shared before the pipelines branch]
Pattern 1: Sharing CPD-1 if α_R and α_K are the same.
Pattern 2: Sharing CPD-1, 2 if α_R, α_K and α_T are the same.
Pattern 3: Sharing CPD-1, 2, 3 if α_R, α_K, α_T, β_R and β_K are the same.
Pattern 4: Sharing CPD-1, 2, 3, 4 if α_R, α_K, α_T, β_R, β_K and β_T are the same.
NOTE: “1st stage learning” and “2nd stage learning” can be divided into sub-operators, and some of the sub-operators can also be shared. The sharing patterns are described in the paper.
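The pattern rules amount to finding the longest prefix of operators whose parameters agree between two parameter sets. A small sketch, assuming the parameter-to-operator mapping implied by the four patterns (the dictionary keys and the function name `sharable_prefix` are hypothetical):

```python
# Parameters each operator depends on, in pipeline order (read off the four patterns)
OPERATOR_PARAMS = [
    ("CPD-1", ("alpha_R", "alpha_K")),
    ("CPD-2", ("alpha_T",)),
    ("CPD-3", ("beta_R", "beta_K")),
    ("CPD-4", ("beta_T",)),
]


def sharable_prefix(p, q):
    """Return the leading operators that two parameter sets can share."""
    shared = []
    for op, names in OPERATOR_PARAMS:
        if any(p[n] != q[n] for n in names):
            break  # branch here: every later operator sees different input
        shared.append(op)
    return shared


def params(a_r, a_k, a_t, b_r, b_k, b_t):
    return {"alpha_R": a_r, "alpha_K": a_k, "alpha_T": a_t,
            "beta_R": b_r, "beta_K": b_k, "beta_T": b_t}


a = params(0.02, 2, 5, 0.02, 3, 5)
b = params(0.02, 2, 5, 0.02, 4, 5)  # differs from a only in beta_K
c = params(0.02, 4, 5, 0.02, 3, 5)  # differs from a in alpha_K
shared_ab = sharable_prefix(a, b)
shared_ac = sharable_prefix(a, c)
```

Here `a` and `b` fall under Pattern 2 (CPD-1 and CPD-2 shared), while `a` and `c` cannot share anything because they already differ at the first operator.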
![Page 18: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/18.jpg)
Experiment
Measuring execution time when sharing ONLY 1st stage learning
– Implemented CPD in C++ with the Eigen library (for matrix manipulation).
– Measured execution time using this CPD implementation.
![Page 19: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/19.jpg)
Exec. Time with Sharing 1st Stage Learning
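| ID | α_R | α_K | α_T | β_R | β_K | β_T | Execution time, naive (s) | Execution time, shared CPD-1 (s) | Performance gain, shared CPD-1 (times) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .02 | 2 | 5 | .02 | 3 | 5 | 2.92 | 1.77 | 1.65 |
| 2 | .02 | 4 | 5 | .02 | 3 | 5 | 3.65 | 1.77 | 2.06 |
| 3 | .02 | 2 | 5 | .02 | 4 | 5 | 3.29 | 2.17 | 1.52 |
| 4 | .005 | 2 | 5 | .02 | 4 | 5 | 2.91 | 1.76 | 1.65 |
| 5 | .02 | 2 | 5 | .005 | 3 | 5 | 2.89 | 1.77 | 1.64 |
| 6 | .02 | 2 | 7 | .02 | 7 | 5 | 3.00 | 1.87 | 1.60 |
| 7 | .02 | 1 | 5 | .02 | 1 | 5 | 1.96 | 1.11 | 1.78 |
| 8 | .02 | 10 | 10 | .02 | 10 | 10 | 11.2 | 5.84 | 1.92 |
| 9 | .02 | 1 | 10 | .02 | 10 | 10 | 6.66 | 5.80 | 1.15 |
| 10 | .02 | 10 | 10 | .02 | 1 | 10 | 6.68 | 1.34 | 5.00 |

Parallel execution with an accelerator is required.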
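The performance gain column appears to be the naive execution time divided by the shared execution time. The short check below recomputes it from the listed times; since the published times are rounded, the recomputed ratios differ from the listed gains by up to about 0.015.

```python
# (ID, naive seconds, shared CPD-1 seconds, listed gain) copied from the results above
rows = [
    (1, 2.92, 1.77, 1.65), (2, 3.65, 1.77, 2.06), (3, 3.29, 2.17, 1.52),
    (4, 2.91, 1.76, 1.65), (5, 2.89, 1.77, 1.64), (6, 3.00, 1.87, 1.60),
    (7, 1.96, 1.11, 1.78), (8, 11.2, 5.84, 1.92), (9, 6.66, 5.80, 1.15),
    (10, 6.68, 1.34, 5.00),
]

# Assumed definition: gain = naive time / shared time
gains = {row_id: naive / shared for row_id, naive, shared, _ in rows}
max_deviation = max(abs(gains[row_id] - listed) for row_id, _, _, listed in rows)
best_id = max(gains, key=gains.get)
```

The largest gain is for ID 10 and the smallest for ID 9; these two rows differ from each other only in α_K and β_K (10 and 1 versus 1 and 10).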
![Page 20: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/20.jpg)
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators accelerated by FPGA
– Join operator over data streams
• Crypt stream
– Privacy preserving stream data processing
![Page 21: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/21.jpg)
A Novel Architecture of Merging Network for Handshake Join
Joint work with Yasin OGE, Takefumi MIYOSHI, Tsutomu YOSHINAGA
Presented at ICNC’11, FPL’11, MCSoC’12, SSDBM’13.
![Page 22: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/22.jpg)
Falcon
- UDP-RX - Window join (64-cores)
Performance Monitor
FPGA instead of Data Center
Demo System at SSDBM’13
![Page 23: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/23.jpg)
Example|Simple Continuous Query
SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key
key value
key value
tuple
![Page 24: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/24.jpg)
Window Operators
SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key
key value
key value
window
w
w
![Page 25: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/25.jpg)
Join Operator
SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key
w
w
key value
key value
join
![Page 26: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/26.jpg)
Overall Query Plan
SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key
w
w
key value
key value
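The query plan (two row-based windows feeding an equi-join on `key`) can be sketched in software as follows. This is a plain sequential illustration of the semantics, not the FPGA design; the class name `RowWindowJoin` is hypothetical, and the choice to probe the opposite window before inserting the new tuple is an assumption.

```python
from collections import deque


class RowWindowJoin:
    """Equi-join on 'key' between the last `rows` tuples of S1 and of S2."""

    def __init__(self, rows=100):
        # deque(maxlen=rows) evicts the oldest tuple automatically: the w operators
        self.windows = {"S1": deque(maxlen=rows), "S2": deque(maxlen=rows)}

    def on_arrival(self, stream, tup):
        other = "S2" if stream == "S1" else "S1"
        # join operator: probe the opposite window for matching keys
        matches = [t for t in self.windows[other] if t["key"] == tup["key"]]
        self.windows[stream].append(tup)
        # Emit (S1 tuple, S2 tuple) pairs regardless of which side arrived
        if stream == "S1":
            return [(tup, m) for m in matches]
        return [(m, tup) for m in matches]


join = RowWindowJoin(rows=2)
r1 = join.on_arrival("S1", {"key": 1, "value": "a"})
r2 = join.on_arrival("S2", {"key": 1, "value": "b"})
join.on_arrival("S1", {"key": 2, "value": "c"})
join.on_arrival("S1", {"key": 3, "value": "d"})  # evicts key 1 from the S1 window
r3 = join.on_arrival("S2", {"key": 1, "value": "e"})
```

With a window of two rows, the second S2 tuple with key 1 finds no partner because the matching S1 tuple has already left its window.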
![Page 27: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/27.jpg)
HANDSHAKE JOIN
J. Teubner and R. Müller, SIGMOD’11
![Page 28: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/28.jpg)
Handshake Join
Basic idea: streams flow in opposite directions
Advantage: highly parallel evaluation
[Figure: input stream 1 and input stream 2 pass each other through the window]
![Page 29: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/29.jpg)
Handshake Join
window of stream 2
old tuple
new tuple
new tuple
old tuple
window of stream 1
![Page 30: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/30.jpg)
Handshake Join|Parallelization
Processor1 Processor2
divided into two sub-windows
![Page 31: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/31.jpg)
Handshake Join|Parallelization
Processor1 Processor2 Processor3
divided into three sub-windows
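The partitioning idea can be illustrated with a short sketch: a window is split into contiguous sub-windows, and each sub-window is scanned independently. This only models the division of work; it does not model the opposite-direction tuple flow that defines handshake join, and the function names are hypothetical.

```python
def split_window(window, n_cores):
    """Split a window into n_cores contiguous sub-windows of near-equal size."""
    size, extra = divmod(len(window), n_cores)
    parts, start = [], 0
    for i in range(n_cores):
        end = start + size + (1 if i < extra else 0)
        parts.append(window[start:end])
        start = end
    return parts


def probe(sub_window, tup):
    """Scan one sub-window for tuples whose key matches the probing tuple."""
    return [t for t in sub_window if t["key"] == tup["key"]]


def partitioned_probe(window, tup, n_cores):
    # Each sub-window scan is independent, so each could run on its own processor
    results = []
    for part in split_window(window, n_cores):
        results.extend(probe(part, tup))
    return results


window = [{"key": i % 3, "value": i} for i in range(10)]
incoming = {"key": 1, "value": "x"}
whole = probe(window, incoming)
split = partitioned_probe(window, incoming, n_cores=3)
sizes = [len(p) for p in split_window(window, 3)]
```

Scanning the three sub-windows yields the same matches as scanning the whole window, which is what makes the division across processors safe.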
![Page 32: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/32.jpg)
Naïve IMPLEMENTATION
![Page 33: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/33.jpg)
Baseline Implementation
Join Core Join Core Join Core Join Core
dedicated processor for join operation
![Page 34: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/34.jpg)
Baseline Implementation
Join Core Join Core Join Core Join Core
buffer
results are stored in these buffers
![Page 35: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/35.jpg)
Baseline Implementation
Merger
Merger Merger
Join Core Join Core Join Core Join Core
buffer
Merging Network
![Page 36: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/36.jpg)
Output Data Flow
Merger
Merger Merger
Join Core Join Core Join Core Join Core
3
2
1
![Page 37: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/37.jpg)
Merger
Merger Merger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
Join Core
3
2
1
![Page 38: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/38.jpg)
Merger
Merger Merger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
Join Core
![Page 39: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/39.jpg)
Merger
Merger Merger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
Join Core
![Page 40: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/40.jpg)
Merger
Merger Merger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
Join Core
With 64 cores, utilization is only 5%
![Page 41: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/41.jpg)
PROPOSED IMPLEMENTATION
Adaptive Merging Network
![Page 42: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/42.jpg)
Proposed Implementation
Join Core Join Core Join Core Join Core
![Page 43: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/43.jpg)
Proposed Implementation
Join Core Join Core Join Core Join Core
Here, no buffering is required! (NO buffer)
![Page 44: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/44.jpg)
Proposed Implementation
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
![Page 45: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/45.jpg)
Proposed Implementation
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
buffer
Now, results are stored in these buffers!
![Page 46: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/46.jpg)
Proposed Implementation
Merger
Merger Merger
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
NO buffer
![Page 47: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/47.jpg)
Output Data Flow
Merger
Merger Merger
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
1
![Page 48: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/48.jpg)
Merger
Merger Merger
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
2
![Page 49: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/49.jpg)
Merger
Merger Merger
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
2 1 3 4
![Page 50: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/50.jpg)
Merger
Merger Merger
Ring Node
Ring Node
Ring Node
Ring Node
Join Core Join Core Join Core Join Core
2 1 3 4 up to 100% utilization
![Page 51: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/51.jpg)
Performance|Proposed (64 join cores)
[Figure: throughput (million tuples/sec, axis from 0 to 3.5) versus match rate (10% to 100%) for nested loop, baseline, and proposed]
![Page 52: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/52.jpg)
Falcon
- UDP-RX - Window join (64-cores)
Performance Monitor
Basic: 6.7 million tuples per second
Proposal: 14.6 million tuples per second
![Page 53: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/53.jpg)
Wire-Speed Implementation of Sliding-Window Aggregate Operator over Out-of-Order Data Streams
Int’l Symposium on Embedded Multicore/Many-core SoCs, Sept. 26 – 28, 2013, Tokyo
Yasin Oge∗, Masato Yoshimi∗, Takefumi Miyoshi†, Hideyuki Kawashima‡, Hidetsugu Irie∗, and Tsutomu Yoshinaga∗
∗Univ. of Electro-Communications, †e-trees.Japan, Inc., ‡Univ. of Tsukuba
![Page 54: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/54.jpg)
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
• CryptStream
– Privacy preserving stream data processing
![Page 55: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/55.jpg)
A Security aware Stream Data Processing Scheme on the Cloud
Joint work with Katsuhiro TOMIYAMA
Introduced at CloudDB’11
![Page 56: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/56.jpg)
Related work: CryptDB
• CryptDB [R. A. Popa, et al. SOSP’11]
– Realizes relational operators over encrypted data on a traditional RDBMS (our research goal is to achieve this on an SPE)
– Encrypts each value of the data stored in the DBMS
– Uses more than one type of encryption, each with different characteristics
⇒ Three kinds of ciphertext are generated for each plain value
Trusted area Untrusted area
val
80
val-DET val-OPE val-HOM
DET(80) OPE(80) HOM(80)
Encrypted value
Plain value Database
proxy
UDF DECRYPT_RND DECRYPT_DET …
Client
DBMS
![Page 57: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/57.jpg)
Ciphers of CryptDB/CryptStream
• DET (Deterministic)
– Able to check the equality of two encrypted values.
– x = y ⇔ DET_K(x) = DET_K(y)
• OPE (Order-preserving)
– Able to check the inequality (order) of two encrypted values.
– x < y ⇔ OPE_K(x) < OPE_K(y)
• HOM (Homomorphic)
– Able to execute addition over two encrypted values.
– HOM_K(x) · HOM_K(y) = HOM_K(x + y)
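The three properties can be demonstrated with toy stand-ins. None of this is the actual CryptDB or CryptStream code, and none of it is secure: `det` is a keyed hash (a one-way tag rather than a decryptable cipher), `ope` is just an increasing affine map, and `hom_encrypt` is a textbook Paillier scheme with tiny primes, chosen because multiplying Paillier ciphertexts adds the plaintexts. The key, constants, and function names are all assumptions.

```python
import hashlib
import hmac
from math import gcd

KEY = b"demo-key"  # hypothetical secret key


def det(x):
    """Deterministic tag: equal plaintexts always give equal outputs."""
    return hmac.new(KEY, str(x).encode(), hashlib.sha256).hexdigest()


def ope(x, a=7919, b=104729):
    """Toy order-preserving map: strictly increasing in x (not secure)."""
    return a * x + b


# Toy Paillier parameters (tiny primes, illustration only)
P, Q = 293, 433
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse; requires Python 3.8+


def hom_encrypt(m, r):
    """Encrypt m (0 <= m < N) with randomness r, where gcd(r, N) == 1."""
    assert gcd(r, N) == 1
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def hom_decrypt(c):
    return ((pow(c, PHI, N2) - 1) // N * MU) % N


same = det(80) == det(80)
ordered = ope(32) < ope(80)
# Multiplying ciphertexts corresponds to adding plaintexts: 32 + 80
total = hom_decrypt(hom_encrypt(32, 17) * hom_encrypt(80, 23) % N2)
```

An untrusted engine holding only the outputs can therefore evaluate equality predicates on DET values, range predicates on OPE values, and SUM-style aggregates on HOM values without seeing the plain values.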
![Page 58: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/58.jpg)
Scheme of CryptStream
Encryption module
Encryption module
Encryption module
Trusted area Trusted area Trusted area
Public cloud (Untrusted area)
Trusted area
Decryption module
Result Result
Stream processing engine
2012/10/29
Client
![Page 59: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/59.jpg)
• Side effect of encryption: increased tuple size
– Three kinds of ciphertext are generated for each value
• Already implemented on Falcon
• Performance improvement is left for future work
– Partially resolved by our work at CloudDB’11.
Plain tuple: id = 1, temp = 32
Encrypted tuple: id-DET = DET(1), id-OPE = OPE(1), id-HOM = HOM(1), temp-DET = DET(32), temp-OPE = OPE(32), temp-HOM = HOM(32)
Encrypted stream data processing scheme
![Page 60: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/60.jpg)
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
– Aggregate operator over data streams
• Crypt stream
– Privacy preserving stream data processing
These three research topics are in progress...
![Page 61: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/61.jpg)
Summary
![Page 62: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/62.jpg)
Relational stream
Norikra
Online Data Mining &
Machine Learning
Esper Puma
Complex event
processing
Jubatus
MillWheel System S
Window aggregate
SASE
Cayuga
Continual query & Window
S4
Window join
CPD
Online LDA
Tuple stream
Privacy Preservation CryptDB GPGPU Intel MIC FPGA Tilera Encryption
Privacy Accelerator
ML&DM SQL NoSQL
Kafka
STORM
MLBase
Incr. LOCI
Spring (DTW)
![Page 63: Taming Big Data Streams · 2013. 10. 1. · Big Data Streams . Hideyuki KAWASHIMA . Center for Computational Sciences . University of Tsukuba, Japan . STORM Norikra . Jubatus . Relational-stream](https://reader033.vdocuments.us/reader033/viewer/2022051822/5fec4ff13558df7c493beb0e/html5/thumbnails/63.jpg)
Frontiers for Accelerators
SQL
– Types of relational operators are limited.
– Red ocean, but big return!
– Netezza, Exadata
NoSQL, Privacy, DM&ML
– New technologies are created every day!
– Blue ocean, many chances!
– “An FPGA memcached appliance”, S. Chalamalasetti, et al., FPGA’13
– “Deep Learning with COTS HPC”, A. Coates, et al., ICML’13