![Page 1](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/1.jpg)
© 2021 CY Lin, Columbia University. E6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analysis
E6895 Advanced Big Data Analytics Lecture 5:
Massive Data Processing
Ching-Yung Lin, Ph.D.
Adjunct Professor, Dept. of Electrical Engineering and Computer Science
February 12th, 2021
![Page 2](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/2.jpg)
Massive Stream Analysis Challenges
![Page 3](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/3.jpg)
Example IP Packet Stream Instantiation
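(Figure: dataflow graph instantiated on the input IP packet stream, with operator nodes for ip, udp, tcp, http, ftp, ntp, rtp, and rtsp, and video and audio outputs.)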
By IBM Dense Information Gliding Team
![Page 4](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/4.jpg)
Semantic MM Filtering
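(Figure: inputs flow through a dataflow graph whose stages are labeled packet content analysis (ip, udp, tcp, http, ftp, ntp, rtp, rtsp, session video, session audio), interest routing (by keywords and id), advanced content analysis, and interest filtering, with the multimedia streams of interest as output. Data rates annotated in the figure: 200-500 MB/s, ~100 MB/s per PE, and 10 MB/s.)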
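The filtering chain above can be illustrated as a sequence of streaming stages. This is a minimal Python sketch under stated assumptions: the packet record format, the choice of rtp/rtsp as the multimedia protocols, and the keyword-based interest test are all hypothetical, and the two stages only loosely stand in for the figure's packet content analysis and interest filtering.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical packet record: {"proto": str, "keywords": list of str}.
Packet = Dict[str, object]

# Assumption: rtp/rtsp traffic carries the multimedia sessions of interest.
MM_PROTOCOLS = {"rtp", "rtsp"}


def packet_content_analysis(packets: Iterable[Packet]) -> Iterator[Packet]:
    """Stage 1: keep only packets that belong to multimedia sessions."""
    for p in packets:
        if p["proto"] in MM_PROTOCOLS:
            yield p


def interest_filtering(packets: Iterable[Packet], interests: Set[str]) -> Iterator[Packet]:
    """Stage 2: keep packets whose keywords overlap the user's interests."""
    for p in packets:
        if interests & set(p["keywords"]):
            yield p


def run_pipeline(packets: Iterable[Packet], interests: Set[str]) -> List[Packet]:
    # Generators are chained lazily: each packet passes through every stage
    # before the next one is read, mimicking a streaming dataflow graph.
    return list(interest_filtering(packet_content_analysis(packets), interests))


stream = [
    {"proto": "http", "keywords": ["news"]},
    {"proto": "rtp", "keywords": ["weather", "news"]},
    {"proto": "rtsp", "keywords": ["sports"]},
    {"proto": "ntp", "keywords": []},
]
print(run_pipeline(stream, {"weather"}))  # only the rtp packet survives both stages
```

Each stage discards data early, which is what lets the output rate be far lower than the input rate.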
![Page 5](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/5.jpg)
Resource-Accuracy Trade-Offs
Configure the parameters of processing elements (PEs) to maximize the relevant information extracted under a resource constraint, i.e., to reach a configuration with $Y''(X \mid q, R) > Y'(X \mid q, R)$.
This requires resource-efficient algorithms for classification, routing, and filtering of signal-oriented data (audio, video, and possibly sensor data).
(Figure: diagram relating X, R, Y(X | q), Y''(X | q, R), and X'.)
▪ Input data: $X$
▪ Queries: $q$
▪ Resource: $R$
▪ $Y(X \mid q)$: relevant information
▪ $Y'(X \mid q, R) \subset Y(X \mid q)$: achievable subset given $R$
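One way to make this trade-off concrete, under our own formalization rather than the slide's: each PE offers a few operating points, each with an integer resource cost and an estimated amount of relevant information, and we choose one point per PE to maximize total relevant information subject to a total budget $R$. This is a multiple-choice knapsack, solved here by dynamic programming; the function name and all numbers are hypothetical.

```python
from typing import List, Optional, Tuple

# Operating point: (resource cost in integer units, relevant-information value).
OperatingPoint = Tuple[int, float]


def choose_operating_points(pes: List[List[OperatingPoint]],
                            budget: int) -> Tuple[float, List[int]]:
    """Pick one operating point per PE, maximizing total value with total cost <= budget."""
    neg = float("-inf")
    best = [neg] * (budget + 1)  # best[c]: max value spending exactly c units so far
    best[0] = 0.0
    picks: List[List[Optional[int]]] = []  # picks[i][c]: point chosen for PE i to reach cost c
    for points in pes:
        new = [neg] * (budget + 1)
        pick: List[Optional[int]] = [None] * (budget + 1)
        for c, val in enumerate(best):
            if val == neg:
                continue
            for j, (cost, value) in enumerate(points):
                nc = c + cost
                if nc <= budget and val + value > new[nc]:
                    new[nc], pick[nc] = val + value, j
        best = new
        picks.append(pick)
    end = max(range(budget + 1), key=lambda c: best[c])
    if best[end] == neg:
        raise ValueError("no assignment fits within the budget")
    chosen: List[int] = []
    c = end
    for i in reversed(range(len(pes))):  # walk back through the DP tables
        j = picks[i][c]
        chosen.append(j)
        c -= pes[i][j][0]
    return best[end], chosen[::-1]


# Two PEs, each with a cheap and an expensive operating point (hypothetical numbers).
pes = [[(1, 0.5), (3, 0.9)], [(1, 0.4), (2, 0.7)]]
value, chosen = choose_operating_points(pes, budget=4)
```

With a budget of 4 units, the sketch spends most of the resource on the first PE (its expensive point) and runs the second PE at its cheap point; shrinking the budget shifts the choice toward cheaper configurations, which is the resource-accuracy trade-off in miniature.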
![Page 6](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/6.jpg)
Example: Distributed Video Signal Understanding (Lin et al.)
(Figure: block diagram. Distributed smart sensors (Sensor 1 … Sensor N, Smart Cam; sources include TV broadcast, VCR, DVD discs, video file databases, and webcams) contain encoding, feature extraction, and event extraction blocks, with outputs labeled MPEG-1/2 GOP, CDS features, and metadata. On the server side, concept detection processing elements PE1 … PE100 (hosted at 9.2.63.66-9.2.63.68) detect concepts such as Face, Outdoors, Indoors, Female, Male, Airplane, Chair, and Clock. The figure also shows control modules, resource constraints, user interests, and display and information aggregation modules. Data rates shown: 1.5 Mbps, 320 Kbps, 22.4 Kbps, 2.8 Kbps, and 600 bps.)
![Page 7](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/7.jpg)
Semantic Concept Filters
E.g.:
![Page 8](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/8.jpg)
Complexity Reduction Introduction
• Objective: Real-time classification of instances using Support Vector Machines (SVMs)
• Computationally efficient and reasonably accurate solutions
• Techniques capable of adjusting the trade-off between accuracy and speed based on the available computational resources
![Page 9](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/9.jpg)
SVM formulation
![Page 10](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/10.jpg)
Decision
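The formulation and decision slides survive only as images. For reference, the standard kernel SVM decision function (the textbook form, not transcribed from the slides) is $f(\mathbf{x}) = \sum_{i=1}^{N_{SV}} \alpha_i y_i K(\mathbf{s}_i, \mathbf{x}) + b$. A minimal Python sketch, assuming an RBF kernel (the kernel named for the MNIST experiments), shows why evaluation cost grows with both the number of support vectors and the feature dimension, which is what the complexity-reduction methods target.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(u: Vector, v: Vector, gamma: float) -> float:
    """K(u, v) = exp(-gamma * ||u - v||^2); O(d) work for d features."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x: Vector, support_vectors: Sequence[Vector],
                 coefs: Sequence[float], bias: float, gamma: float) -> float:
    """f(x) = sum_i coef_i * K(s_i, x) + b, with coef_i = alpha_i * y_i.

    Exactly one kernel evaluation per support vector, so classifying one
    instance costs O(N_SV * d).
    """
    return sum(c * rbf_kernel(s, x, gamma) for s, c in zip(support_vectors, coefs)) + bias


def classify(x: Vector, support_vectors: Sequence[Vector],
             coefs: Sequence[float], bias: float, gamma: float) -> int:
    """Predicted label in {+1, -1}."""
    return 1 if svm_decision(x, support_vectors, coefs, bias, gamma) >= 0 else -1
```

Reducing either factor, d (fewer features) or N_SV (fewer or merged support vectors), lowers the per-instance cost proportionally.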
![Page 11](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/11.jpg)
Problems
![Page 12](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/12.jpg)
Example
![Page 13](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/13.jpg)
Naïve Approach I – Feature Dimension Reduction
(Figure: accuracy, measured as average precision (0-0.8), vs. complexity, measured as feature dimension ratio (0-1).)
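| Slice | Color | Texture | Feature Dimension Ratio | AP |
|---|---|---|---|---|
| 3 | 3 | 3 | 1 | 0.7861 |
| 3 | 3 | 2 | 0.666666667 | 0.7861 |
| 3 | 2 | 3 | 0.666666667 | 0.7757 |
| 2 | 3 | 3 | 0.666666667 | 0.5822 |
| 3 | 2 | 2 | 0.444444444 | 0.7757 |
| 2 | 3 | 2 | 0.444444444 | 0.5822 |
| 2 | 2 | 3 | 0.444444444 | 0.5235 |
| 3 | 3 | 1 | 0.333333333 | 0.4685 |
| 3 | 1 | 3 | 0.333333333 | 0.6581 |
| 1 | 3 | 3 | 0.333333333 | 0.1684 |
| 2 | 2 | 2 | 0.296296296 | 0.5235 |
| 3 | 2 | 1 | 0.222222222 | 0.427 |
| 3 | 1 | 2 | 0.222222222 | 0.6581 |
| 2 | 3 | 1 | 0.222222222 | 0.1241 |
| 2 | 1 | 3 | 0.222222222 | 0.3457 |
| 1 | 3 | 2 | 0.222222222 | 0.1684 |
| 1 | 2 | 3 | 0.222222222 | 0.1065 |
| 2 | 2 | 1 | 0.148148148 | 0.0699 |
| 2 | 1 | 2 | 0.148148148 | 0.3457 |
| 1 | 2 | 2 | 0.148148148 | 0.1065 |
| 3 | 1 | 1 | 0.111111111 | 0.3219 |
| 1 | 3 | 1 | 0.111111111 | 0.0314 |
| 1 | 1 | 3 | 0.111111111 | 0.07 |
| 2 | 1 | 1 | 0.074074074 | 0.0318 |
| 1 | 2 | 1 | 0.074074074 | 0.0173 |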
▪ Experimental Results for Weather_News Detector
▪ Model Selection based on the Model Validation Set
▪ E.g., at a feature dimension ratio of 0.22, the best feature selection is 3 slices, 1 color, and 2 texture selections, and accuracy decreases by 17% compared with using all features.
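The table supports a simple selection rule: for a given complexity budget, pick the feature combination with the highest AP. A minimal sketch with the table's rows copied in; it selects on the AP column directly, which is a simplification, since the slide says model selection used a separate model validation set. The helper names are hypothetical. The ratio is computed as slice × color × texture / 27, which reproduces every value in the ratio column, and exact fractions avoid excluding rows like 0.222222222 through rounding.

```python
from fractions import Fraction
from typing import List, Tuple

# (slice, color, texture, AP) for the Weather_News detector, copied from the table.
ROWS: List[Tuple[int, int, int, float]] = [
    (3, 3, 3, 0.7861), (3, 3, 2, 0.7861), (3, 2, 3, 0.7757), (2, 3, 3, 0.5822),
    (3, 2, 2, 0.7757), (2, 3, 2, 0.5822), (2, 2, 3, 0.5235), (3, 3, 1, 0.4685),
    (3, 1, 3, 0.6581), (1, 3, 3, 0.1684), (2, 2, 2, 0.5235), (3, 2, 1, 0.4270),
    (3, 1, 2, 0.6581), (2, 3, 1, 0.1241), (2, 1, 3, 0.3457), (1, 3, 2, 0.1684),
    (1, 2, 3, 0.1065), (2, 2, 1, 0.0699), (2, 1, 2, 0.3457), (1, 2, 2, 0.1065),
    (3, 1, 1, 0.3219), (1, 3, 1, 0.0314), (1, 1, 3, 0.0700), (2, 1, 1, 0.0318),
    (1, 2, 1, 0.0173),
]


def dimension_ratio(s: int, c: int, t: int) -> Fraction:
    """Fraction of the full (3, 3, 3) feature dimension that a selection uses."""
    return Fraction(s * c * t, 27)


def best_config(max_ratio: Fraction) -> Tuple[int, int, int, float]:
    """Highest-AP row whose feature dimension ratio fits within max_ratio."""
    feasible = [r for r in ROWS if dimension_ratio(*r[:3]) <= max_ratio]
    if not feasible:
        raise ValueError("no configuration fits this budget")
    return max(feasible, key=lambda r: r[3])


print(best_config(Fraction(6, 27)))  # (3, 1, 2, 0.6581)
```

At a ratio of 6/27 (about 0.22) this picks 3 slices, 1 color, and 2 texture selections, the same combination named on the slide.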
![Page 14](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/14.jpg)
Naïve Approach II – Reduction on the Number of Support Vectors
(Figure: accuracy, measured as average precision (0-0.9), vs. complexity, measured as the relative number of support vectors (0-1).)
▪ Proposed novel reduction methods:
  – Ranked Weighting
  – P/N Cost Reduction
  – Random Selection
  – Support Vector Clustering and Centralization
▪ Experimental results on Weather_News detectors show that complexity can be reduced to 50% at the cost of a 14% decrease in accuracy
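As an illustration of support vector reduction, here is a minimal sketch of one plausible reading of ranking by weight: keep the support vectors with the largest $|\alpha_i y_i|$ and drop the rest. This is our assumption, not necessarily the Ranked Weighting method from the slides; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def prune_by_weight(support_vectors: Sequence[Sequence[float]], coefs: Sequence[float],
                    keep_ratio: float) -> Tuple[List[Sequence[float]], List[float]]:
    """Keep roughly keep_ratio * N support vectors with the largest |alpha_i * y_i|.

    Kept coefficients are left unchanged, so the decision function simply
    drops the terms judged least influential.
    """
    n = len(coefs)
    k = max(1, min(n, round(keep_ratio * n)))
    top = sorted(range(n), key=lambda i: abs(coefs[i]), reverse=True)[:k]
    top.sort()  # restore the original order of the survivors
    return [support_vectors[i] for i in top], [coefs[i] for i in top]
```

Because evaluation cost is proportional to the number of support vectors, `keep_ratio=0.5` corresponds to the 50% complexity operating point; how much accuracy that costs has to be measured on validation data for each detector.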
![Page 15](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/15.jpg)
Weighted Clustering Approach
![Page 16](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/16.jpg)
Cluster center weight (contd.)
![Page 17](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/17.jpg)
Using the weights
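The weighted clustering slides give their formulas only as images. A minimal sketch of one simple variant, under our own assumptions: cluster the support vectors of each coefficient sign separately with k-means, replace each cluster by the |coefficient|-weighted average of its members, and give the center the sum of the members' coefficients. The cluster-weight formula on the slides may differ, and all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_labels(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; seeded with the first k points for reproducibility."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_support_vectors(svs: List[Vector], coefs: List[float],
                            k_per_class: int) -> Tuple[List[Vector], List[float]]:
    """Replace each cluster of same-sign support vectors by one weighted center.

    Center: average of members weighted by |coef|. Weight: sum of member coefs,
    which is exact when all members coincide and approximate otherwise.
    """
    new_svs: List[Vector] = []
    new_coefs: List[float] = []
    for sign in (1.0, -1.0):  # cluster positive and negative coefficients separately
        idx = [i for i, c in enumerate(coefs) if c * sign > 0]
        if not idx:
            continue
        k = min(k_per_class, len(idx))
        labels = kmeans_labels([svs[i] for i in idx], k)
        for j in range(k):
            members = [i for i, lab in zip(idx, labels) if lab == j]
            if not members:
                continue
            total = sum(abs(coefs[i]) for i in members)
            center = [sum(abs(coefs[i]) * svs[i][d] for i in members) / total
                      for d in range(len(svs[0]))]
            new_svs.append(center)
            new_coefs.append(sum(coefs[i] for i in members))
    return new_svs, new_coefs
```

The speedup is roughly the original number of support vectors divided by the number of centers; the approximation error grows with how spread out each cluster is.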
![Page 18](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/18.jpg)
Experiments
• Datasets
  • TREC video datasets (2003 and 2005)
    • 576 features per instance
    • Over 20000 test instances overall
  • MNIST handwritten digit dataset (RBF kernel)
    • 576 features
    • 60000 training instances, 10000 test instances
• Performance metrics
  • Speedup achieved over evaluation with all support vectors
  • Average precision achieved
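A minimal sketch of the two performance metrics. It assumes the common non-interpolated definition of average precision used in TREC-style evaluation, and it assumes evaluation cost proportional to the number of support vectors; the slides do not spell out either definition, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[int],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated AP: mean of precision@k over ranks k that hold a relevant item.

    ranked_relevance: 1/0 flags for results sorted by decreasing classifier score.
    total_relevant: relevant items in the whole collection (defaults to those listed).
    """
    hits, precision_sum = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at this relevant rank
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def speedup(n_sv_original: int, n_sv_reduced: int) -> float:
    """Speedup when cost is dominated by one kernel evaluation per support vector."""
    return n_sv_original / n_sv_reduced
```

For example, a ranking with relevance flags `[1, 0, 1, 0]` has precision 1 at rank 1 and 2/3 at rank 3, so its AP is the mean of those two values.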
![Page 19](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/19.jpg)
Results (MNIST 0-4)
(Figure: average precision (0.95-1.0) vs. speedup ratio (0-300), one curve per digit classifier 0, 1, 2, 3, 4.)
![Page 20](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/20.jpg)
Results (MNIST 5-9)
(Figure: average precision (0.7-1.0) vs. speedup ratio (1-1000, log scale), one curve per digit classifier 5, 6, 7, 8, 9.)
![Page 21](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/21.jpg)
Results (TREC 2003)
(Figure, first panel: average precision (0-0.9) per concept (Human, Outdoors, Sport-Event, Crowd, People-Event), comparing AP_fast with AP_original. Second panel: speedup (1-10000, log scale) per concept (Human, Studio-Setting, Crowd).)
![Page 22](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/22.jpg)
Summary of Complexity Reduction
![Page 23](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/23.jpg)
Acceleration of Neural Networks
![Page 24](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/24.jpg)
Neuron Importance Score Propagation (NISP, Yu et al., 2018)
![Page 25](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/25.jpg)
Methods for Running CNNs on Mobile Devices
Compression (pruning) of CNN
Speeding up CNN +
Sending CNN jobs to cloud
![Page 26](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/26.jpg)
Thinking Differently
• All existing methods can be viewed as approximations of an overly-redundant CNN, but do we really need such a CNN as the starting point?
CNN Slimming!
![Page 27](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/27.jpg)
Slim CNN
• Slim CNN leads to:
• less storage space
• less memory usage
• less computation
• less power consumption
![Page 28](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/28.jpg)
Feature Selection on CNN
• CNNs can be viewed as a set of "overly-redundant" feature extractors
![Page 29](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/29.jpg)
A method for Pruning Redundant Neurons and Kernels of Deep Convolutional Neural Networks
Apply thermal
(Figure: pipeline from a pre-trained CNN: extract CNN responses → measure the importance of feature extractors → prune model → fine-tuning.)
![Page 30](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/30.jpg)
A method for Pruning Redundant Neurons and Kernels of Deep Convolutional Neural Networks (NISP)
• Intractable
• Inconsistent
(Figure: pipeline from a pre-trained CNN: extract responses of a high-level layer → measure the importance of feature extractors → back-propagate the importance & prune model → fine-tuning. Responses are obtained by forward propagation from the input layers through the FC layers; importance scores are back-propagated for pruning. Labels in the figure: tractable, consistent.)
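The core NISP step is propagating importance from the final response layer back to earlier neurons. A minimal sketch for fully connected layers only, assuming the propagation rule $s_k = |W^{(k+1)}|^\top s_{k+1}$ (absolute weights; biases and nonlinearities ignored). The final-layer scores are taken as given here, whereas NISP derives them by ranking the final responses; pruning each layer by the same keep ratio and the function names are our own simplifications.

```python
from typing import List

Matrix = List[List[float]]  # weights[i][j]: connection from input neuron j to output neuron i


def propagate_importance(weights: List[Matrix], final_scores: List[float]) -> List[List[float]]:
    """Back-propagate importance through fully connected layers: s_in = |W|^T s_out.

    weights[l] maps layer l to layer l + 1. Returns one score list per layer,
    ordered from the input layer to the final response layer.
    """
    scores = [list(final_scores)]
    for w in reversed(weights):
        s_out = scores[0]
        s_in = [sum(abs(w[i][j]) * s_out[i] for i in range(len(w))) for j in range(len(w[0]))]
        scores.insert(0, s_in)
    return scores


def neurons_to_keep(layer_scores: List[float], keep_ratio: float) -> List[int]:
    """Indices of the top keep_ratio fraction of neurons by importance (at least one)."""
    n = len(layer_scores)
    k = max(1, min(n, round(keep_ratio * n)))
    return sorted(sorted(range(n), key=lambda j: layer_scores[j], reverse=True)[:k])
```

Because every layer's scores come from one backward pass starting at the same final layer, the pruning decisions in different layers are driven by a single shared importance signal rather than by independent per-layer criteria.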
![Page 31](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/31.jpg)
Fine-tuning the Pruned Model
• Our method outperforms the baselines in three aspects:
• Very small accuracy loss at the beginning, which indicates that the most important neurons are retained
• Converges much faster than the baselines
• For LeNet on MNIST, our method decreases top-1 accuracy by only 0.02% at a pruning ratio of 50%, compared with the unpruned network.
![Page 32](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/32.jpg)
Fine-tuning the Pruned Model
• The pruned model retains the important feature extractors but suffers some loss of accuracy because redundant features are removed
• Feature selection gives the pruned model a good starting point on the learning curve
• Fine-tune the pruned model with a lower learning rate to recover performance
![Page 33](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/33.jpg)
Stream Analysis using Spark
![Page 34](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/34.jpg)
© 2018 CY Lin, Columbia University. E6893 Big Data Analytics – Lecture 5: Big Data Analytics Algorithms
Spark ML Classification and Regression
![Page 35](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/35.jpg)
Spark Streaming
![Page 36](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/36.jpg)
Spark Streaming
![Page 37](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/37.jpg)
Spark Streaming
https://www.edureka.co/blog/spark-streaming/
![Page 38](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/38.jpg)
Spark Streaming
![Page 39](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/39.jpg)
Spark Streaming Example
![Page 40](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/40.jpg)
Spark Streaming Example
![Page 41](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/41.jpg)
Discretized Streams
![Page 42](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/42.jpg)
Discretized Streams
![Page 43](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/43.jpg)
Discretized Streams
https://www.edureka.co/blog/spark-streaming/
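A discretized stream splits a continuous stream into micro-batches, one per batch interval. The following plain-Python simulation of that idea is not Spark code: each record is assigned to the interval in which it arrived. In PySpark the interval is the `batchDuration` argument of `StreamingContext`, and each batch becomes one RDD of the DStream. The `(time, record)` event format and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, record)


def discretize(events: Iterable[Event], batch_interval: float) -> Dict[int, List[str]]:
    """Group a stream into micro-batches: batch b holds records arriving in [b*T, (b+1)*T)."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for t, record in events:
        batches[int(t // batch_interval)].append(record)
    return dict(batches)
```

Intervals with no data are simply absent from this dictionary, whereas Spark still produces an (empty) RDD for every interval.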
![Page 44](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/44.jpg)
DStream Transforms
https://www.edureka.co/blog/spark-streaming/
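Windowed transformations combine several consecutive micro-batches. Below is a plain-Python simulation (again not the Spark API) of a word count that roughly corresponds to chaining `flatMap`, `map`, and `reduceByKeyAndWindow` on a DStream. Window length and slide are counted in batches, since Spark requires both durations to be multiples of the batch interval. The input format and function name are assumptions.

```python
from collections import Counter
from typing import List


def windowed_word_counts(batches: List[List[str]], window_len: int, slide: int) -> List[Counter]:
    """Word counts over a sliding window of micro-batches.

    A result is emitted every `slide` batches and covers up to the last
    `window_len` batches (fewer at the very start of the stream).
    """
    results: List[Counter] = []
    for end in range(slide, len(batches) + 1, slide):
        window = batches[max(0, end - window_len):end]
        results.append(Counter(word
                               for batch in window    # every micro-batch in the window
                               for line in batch      # every line in that batch
                               for word in line.split()))  # flatMap: line -> words
    return results
```

A window longer than the slide makes consecutive results overlap, so a word that arrives once is counted in several windows.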
![Page 45](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/45.jpg)
Output DStreams
https://www.edureka.co/blog/spark-streaming/
![Page 46](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/46.jpg)
DStreams Caching
https://www.edureka.co/blog/spark-streaming/
![Page 47](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/47.jpg)
DStreams Example — Twitter Sentiment Analysis
https://www.edureka.co/blog/spark-streaming/
![Page 48](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/48.jpg)
DStreams Example — Twitter Sentiment Analysis
https://www.edureka.co/blog/spark-streaming/
![Page 49](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/49.jpg)
DStreams Example — Twitter Sentiment Analysis
https://www.edureka.co/blog/spark-streaming/
All tweets are categorized as Positive, Neutral, or Negative according to the sentiment of their content.
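The categorization step can be sketched with a tiny word lexicon. The word lists, the scoring rule, and the function names are hypothetical illustrations, not the method used in the edureka example. In a streaming job, `categorize` would be applied to each tweet of every micro-batch, for instance with a DStream `map`, and the per-category counts aggregated per batch.

```python
from typing import Dict, Iterable

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def categorize(tweet: str) -> str:
    """Label a tweet by counting positive minus negative lexicon words."""
    words = [w.strip(".,!?#@").lower() for w in tweet.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def sentiment_counts(batch: Iterable[str]) -> Dict[str, int]:
    """Per-category counts for one micro-batch of tweets."""
    counts = {"Positive": 0, "Neutral": 0, "Negative": 0}
    for tweet in batch:
        counts[categorize(tweet)] += 1
    return counts
```

A real deployment would replace the lexicon with a trained classifier; the streaming structure (label each record, then aggregate per batch or window) stays the same.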
![Page 50](https://reader035.vdocuments.us/reader035/viewer/2022071513/6134a054dfd10f4dd73bd9b9/html5/thumbnails/50.jpg)
Questions?