![Page 1: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/1.jpg)
Large-Scale Data Collection Using Redis
C. Aaron Cois, Ph.D. -- Tim Palko
CMU Software Engineering Institute
© 2011 Carnegie Mellon University
![Page 2: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/2.jpg)
Us
C. Aaron Cois, Ph.D.
Software Architect, Team Lead
CMU Software Engineering Institute
Digital Intelligence and Investigations Directorate
Tim Palko
Senior Software Engineer
CMU Software Engineering Institute
Digital Intelligence and Investigations Directorate
@aaroncois
![Page 3: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/3.jpg)
Overview
• Problem Statement
• Sensor Hardware & System Requirements
• System Overview
  – Data Collection
  – Data Modeling
  – Data Access
  – Event Monitoring and Notification
• Conclusions and Future Work
![Page 4: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/4.jpg)
The Goal
Critical infrastructure/facility protection
via
Environmental Monitoring
![Page 5: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/5.jpg)
Why?
Stuxnet
• Two major components:
  1) Send centrifuges spinning wildly out of control
  2) Record ‘normal operations’ and play them back to operators during the attack [1]
• Environmental monitoring provides secondary indicators, such as abnormal heat/motion/sound
[1] http://www.nytimes.com/2011/01/16/world/middleeast/16stuxnet.html?_r=2&
![Page 6: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/6.jpg)
The Broader Vision
Quick, flexible out-of-band monitoring
• Set up monitoring in minutes
• Versatile sensors, easily repurposed
• Data communication is secure (P2P VPN) and requires no existing systems other than outbound networking
![Page 7: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/7.jpg)
The Platform
A CMU research project called Sensor Andrew
• Features:
  – Open-source sensor platform
  – Scalable and generalist system supporting a wide variety of applications
  – Extensible architecture
    • Can integrate diverse sensor types
![Page 8: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/8.jpg)
Sensor Andrew
![Page 9: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/9.jpg)
Sensor Andrew Overview
[Diagram: Nodes, two Gateways, Server, End Users]
![Page 10: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/10.jpg)
What is a Node?
A node collects data and sends it to a collector, or gateway
Environment Node Sensors
• Light
• Audio
• Humidity
• Pressure
• Motion
• Temperature
• Acceleration
Power Node Sensors
• Current
• Voltage
• True Power
• Energy
Radiation Node Sensors
• Alpha particle count per minute
Particulate Node Sensors
• Small Part. Count
• Large Part. Count
![Page 11: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/11.jpg)
What is a Gateway?
• A gateway receives UDP data from all nodes registered to it
• An internal service:
  – Receives data continuously
  – Opens a server on a specified port
  – Continually transmits UDP data over this port
Gateway
![Page 12: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/12.jpg)
Requirements
We need to:
1. Collect data from nodes once per second
2. Scale to 100 gateways, each with 64 nodes
3. Detect events in real-time
4. Notify users about events in real-time
5. Retain all data collected for years, at least
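The first two requirements fix the ingest rate, and a short calculation makes that concrete. The sketch below uses only the figures from the list above, plus one labeled assumption: the average serialized size of a reading (300 bytes) is an illustrative guess, not a number from the deck.

```python
# Figures taken from the requirements list.
GATEWAYS = 100
NODES_PER_GATEWAY = 64
READINGS_PER_NODE_PER_SECOND = 1

# Assumption for illustration only: average serialized size of one reading.
ASSUMED_BYTES_PER_READING = 300


def readings_per_second():
    """Total readings arriving per second across all gateways."""
    return GATEWAYS * NODES_PER_GATEWAY * READINGS_PER_NODE_PER_SECOND


def readings_per_hour():
    return readings_per_second() * 3600


def bytes_per_hour(bytes_per_reading=ASSUMED_BYTES_PER_READING):
    return readings_per_hour() * bytes_per_reading


print(readings_per_second())     # 6400
print(readings_per_hour())       # 23040000
print(bytes_per_hour() / 1e9)    # 6.912 (GB per hour under the assumed record size)
```

If the 8 GB / hour figure quoted later in the deck refers to the whole system (the deck does not say), it would correspond to roughly 347 bytes per reading at this rate.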
![Page 13: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/13.jpg)
What Is Big Data?
“When your data sets become so large that you have to start
innovating around how to collect, store, organize, analyze and share it.”
![Page 15: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/15.jpg)
Problems
• Size
• Transmission
• Storage
• Rate
![Page 20: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/20.jpg)
Problems
• Size
• Transmission
• Storage
• Rate
• Retrieval
![Page 21: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/21.jpg)
Collecting Data
Problem:
Store and retrieve immense amounts of data at a high rate.
Constraints:
Data cannot remain on the nodes or gateways due to security concerns.
Limited infrastructure.
[Diagram labels: Gateway, ?, 8 GB / hour, Complex Analytics]
![Page 22: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/22.jpg)
We Tried PostgreSQL…
• Advantages:
  – Reliable, tested and scalable
  – Relational => complex queries => analytics
• Problems:
  – Performance problems reading while writing at a high rate; real-time event detection suffers
  – ‘COPY FROM’ doesn’t permit horizontal scaling
![Page 23: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/23.jpg)
Q: How can we decrease I/O load?
A: Read and write collected data directly from memory
![Page 25: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/25.jpg)
Enter Redis
Commonly used as a web application cache or pub/sub server
Redis is an in-memory NoSQL database
![Page 26: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/26.jpg)
Redis
• Created in 2009
• Fully in-memory key-value store
  – Fast I/O: R/W operations are equally fast
  – Advanced data structures
• Publish/Subscribe functionality
  – In addition to data store functions
  – Separate from stored key-value data
![Page 27: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/27.jpg)
Persistence
• Snapshotting
  – Data is asynchronously transferred from memory to disk
• AOF (Append Only File)
  – Each modifying operation is written to a file
  – Can recreate data store by replaying operations
  – Without interrupting service, will rebuild AOF as the shortest sequence of commands needed to rebuild the current dataset in memory
![Page 28: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/28.jpg)
Replication
• Redis supports master-slave replication
• Master-slave replication can be chained
• Be careful:
  – Slaves are writeable!
  – Potential for data inconsistency
• Fully compatible with Pub/Sub features
![Page 29: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/29.jpg)
Redis Features: Advanced Data Structures
• List: [A, B, C, D]
• Set: {A, B, C, D}
• Sorted Set: {C:1, D:2, A:3, B:4}, stored as {value:score}
• Hash: {field1:"A", field2:"B"…}, stored as {key:value}
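The four structures map loosely onto familiar Python containers. The sketch below is only an analogy in plain Python (no Redis server involved); `members_by_score` is a hypothetical helper showing how a sorted set orders its members by score.

```python
# Plain-Python stand-ins for the four structures on the slide.
as_list = ["A", "B", "C", "D"]                    # List: ordered, duplicates allowed
as_set = {"A", "B", "C", "D"}                     # Set: unordered, unique members
as_sorted_set = {"A": 3, "C": 1, "D": 2, "B": 4}  # Sorted Set: member -> score
as_hash = {"field1": "A", "field2": "B"}          # Hash: field -> value


def members_by_score(sorted_set):
    """Return members in ascending score order, as a sorted set would."""
    return [member for member, _ in sorted(sorted_set.items(), key=lambda kv: kv[1])]


print(members_by_score(as_sorted_set))  # ['C', 'D', 'A', 'B']
```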
![Page 30: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/30.jpg)
Our Data Model
![Page 31: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/31.jpg)
Constraints
Our data store must:
– Hold time-series data
– Be flexible in querying (by time, node, sensor)
– Allow efficient querying of many records
– Accept data out of order
![Page 32: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/32.jpg)
Tradeoffs: Efficiency vs. Flexibility
• One record per timestamp: Motion, Audio, Light, Pressure, Humidity, Acceleration and Temperature together in a single record
VS
• One record per sensor data type: separate records for Motion, Light, Audio, Pressure, Temperature, Humidity and Acceleration
![Page 33: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/33.jpg)
Our Solution: Sorted Set
Datapoint
Key: sensor:env:101
Score: 1357542004000
Value: {"bat": 192, "temp": 523, "digital_temp": 216, "mac_address": "20f", "humidity": 22, "motion": 203, "pressure": 99007, "node_type": "env", "timestamp": 1357542004000, "audio_p2p": 460, "light": 820, "acc_z": 464, "acc_y": 351, "acc_x": 311}
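As a minimal sketch of how one reading could be turned into the key, score and value shown above: the key pattern `sensor:<node_type>:<id>` is read off the slide, while `make_entry` and its `node_id` parameter are hypothetical, since the deck does not say how the last key segment is derived.

```python
import json


def make_entry(node_type, node_id, reading):
    """Build the (key, score, member) triple for one sensor reading."""
    key = "sensor:%s:%s" % (node_type, node_id)
    score = reading["timestamp"]                   # epoch milliseconds
    member = json.dumps(reading, sort_keys=True)   # the JSON string stored as the value
    return key, score, member


reading = {"node_type": "env", "timestamp": 1357542004000, "temp": 523, "humidity": 22}
key, score, member = make_entry("env", "101", reading)
print(key)    # sensor:env:101
print(score)  # 1357542004000

# With redis-py 3.x the write itself would be: r.zadd(key, {member: score})
```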
![Page 37: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/37.jpg)
Sorted Set
1357542004000: {"temp":523,..}
1357542005000: {"temp":523,..}

1357542007000: {"temp":530,..}
1357542008000: {"temp":531,..}
1357542009000: {"temp":540,..}
1357542010000: {"temp":545,..}
…
![Page 38: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/38.jpg)
Sorted Set
1357542004000: {"temp":523,..}
1357542005000: {"temp":523,..}
1357542006000: {"temp":527,..} <- fits nicely
1357542007000: {"temp":530,..}
1357542008000: {"temp":531,..}
1357542009000: {"temp":540,..}
1357542010000: {"temp":545,..}
…
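The "fits nicely" behavior can be imitated in plain Python. This is a model of the ordering only, not of Redis internals: a list of (score, member) pairs kept sorted with `bisect`, where `reading`, `add` and `range_by_score` are hypothetical helper names.

```python
import bisect


def reading(ts, temp):
    """Build a (score, member) pair; the member carries its own timestamp."""
    return (ts, '{"temp": %d, "timestamp": %d}' % (temp, ts))


# Entries kept in score order, standing in for one sorted set.
entries = [
    reading(1357542004000, 523),
    reading(1357542005000, 523),
    reading(1357542007000, 530),
    reading(1357542008000, 531),
]


def add(entries, entry):
    """Insert a late-arriving reading at its score-ordered position."""
    bisect.insort(entries, entry)


def range_by_score(entries, low, high):
    """Members with low <= score <= high, in the spirit of ZRANGEBYSCORE."""
    return [member for score, member in entries if low <= score <= high]


add(entries, reading(1357542006000, 527))   # arrives out of order
print([score for score, _ in entries][2])   # 1357542006000
print(len(range_by_score(entries, 1357542005000, 1357542007000)))  # 3
```

The linear scan in `range_by_score` is for readability; it does not reflect how Redis looks up a score range.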
![Page 39: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/39.jpg)
Know your data structure!
A set is still a set…
Datapoint
Score: 1357542004000
Value: {"bat": 192, "temp": 523, "digital_temp": 216, "mac_address": "20f", "humidity": 22, "motion": 203, "pressure": 99007, "node_type": "env", "timestamp": 1357542004000, "audio_p2p": 460, "light": 820, "acc_z": 464, "acc_y": 351, "acc_x": 311}
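The point of "a set is still a set" is that members of a sorted set are unique: adding a member that already exists only updates its score. Two readings with identical values would therefore collapse into one entry unless something, such as the timestamp inside the value, makes each member distinct. The sketch below models that rule in plain Python; `TinySortedSet` is a stand-in written for this example, not a Redis or redis-py class.

```python
import json


class TinySortedSet:
    """Minimal stand-in for one sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # Re-adding an existing member only updates its score.
        self._scores[member] = score

    def zcard(self):
        return len(self._scores)


# Without a timestamp in the value, two identical readings share one member.
without_ts = TinySortedSet()
without_ts.zadd(json.dumps({"temp": 523}), 1357542004000)
without_ts.zadd(json.dumps({"temp": 523}), 1357542005000)
print(without_ts.zcard())  # 1

# With the timestamp embedded, each reading is a distinct member.
with_ts = TinySortedSet()
for ts in (1357542004000, 1357542005000):
    with_ts.zadd(json.dumps({"temp": 523, "timestamp": ts}), ts)
print(with_ts.zcard())  # 2
```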
![Page 40: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/40.jpg)
Requirement Satisfied
[Diagram: Gateway, Redis]
![Page 41: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/41.jpg)
There is a disturbance in the Force..
![Page 42: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/42.jpg)
Collecting Data
[Diagram: Gateway, Redis]
![Page 43: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/43.jpg)
“In Memory” Means Many Things
• The data store capacity is aggressively capped
  – Redis can only store as much data as the server has RAM
![Page 44: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/44.jpg)
Collecting Big Data
[Diagram: Gateway, Redis]
![Page 45: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/45.jpg)
We could throw away data…
• If we only cared about current values
• However, our data
  – Must be stored for 1+ years for compliance
  – Must be queryable for historical/trend analysis
![Page 46: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/46.jpg)
We Still Need Long-term Data Storage
Solution? Migrate data to an archive with expansive storage capacity
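A minimal sketch of such a migration step, under stated assumptions: `archive_older_than` and the `store_rows` callback are hypothetical names, and the deck does not describe the real archiver. The function relies only on `zrangebyscore` and `zremrangebyscore`, which exist in redis-py with these argument orders; `InMemoryClient` is a small stand-in so the example runs without a server.

```python
class InMemoryClient:
    """Stand-in for a Redis client so the sketch runs offline; not part of redis-py."""

    def __init__(self):
        self.data = {}  # key -> {member: score}

    def zadd(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

    def zrangebyscore(self, key, low, high, withscores=False):
        low, high = float(low), float(high)
        hits = sorted((s, m) for m, s in self.data.get(key, {}).items() if low <= s <= high)
        return [(m, s) for s, m in hits] if withscores else [m for _, m in hits]

    def zremrangebyscore(self, key, low, high):
        doomed = self.zrangebyscore(key, low, high)
        for member in doomed:
            del self.data[key][member]
        return len(doomed)


def archive_older_than(client, key, cutoff_ms, store_rows):
    """Copy entries with score < cutoff_ms to the archive, then drop them from the live set."""
    upper = cutoff_ms - 1  # scores are integer milliseconds, so this bound is exclusive of the cutoff
    old = client.zrangebyscore(key, "-inf", upper, withscores=True)
    if not old:
        return 0
    store_rows([(key, int(score), member) for member, score in old])
    # Delete only after the archive write has returned.
    client.zremrangebyscore(key, "-inf", upper)
    return len(old)


client = InMemoryClient()
client.zadd("sensor:env:101", {
    '{"timestamp": 1000}': 1000,
    '{"timestamp": 2000}': 2000,
    '{"timestamp": 3000}': 3000,
})
archived = []  # stands in for rows written to the long-term store
moved = archive_older_than(client, "sensor:env:101", 3000, archived.extend)
print(moved)  # 2
print(client.zrangebyscore("sensor:env:101", "-inf", "+inf"))  # ['{"timestamp": 3000}']
```

One caveat of this shape: a reading older than the cutoff that arrives between the read and the delete would be removed without being archived, so a real archiver would need to guard that window.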
![Page 47: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/47.jpg)
Winning
[Diagram: Gateway, Redis, Archiver, PostgreSQL]
![Page 48: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/48.jpg)
Winning?
[Diagram: Gateway, Redis, Archiver, PostgreSQL; Some Poor Client (??)]
![Page 49: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/49.jpg)
Yes, Winning
[Diagram: Gateway, Redis, Archiver, PostgreSQL, API; Some Happy Client]
![Page 50: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/50.jpg)
Best of both worlds
[Diagram: Gateway, Redis, Archiver, PostgreSQL, API]
Redis allows quick access to real-time data, for monitoring and event detection
PostgreSQL allows complex queries and scalable storage for deep and historical analysis
![Page 51: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/51.jpg)
We Have the Data, Now What?
Incoming data must be monitored and analyzed, to detect significant events
What is “significant”?
What about new data types?
![Page 54: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/54.jpg)
[Diagram: Gateway, Redis, Archiver, PostgreSQL, API, Django App, App DB]
New guy: provide a way to read the data and create rules
motion > x && pressure < y && audio > z
![Page 55: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/55.jpg)
[Diagram: Gateway, Redis, Archiver, PostgreSQL, API, Django App, App DB, Event Monitors]
New guy: read the rules and data, trigger alarms
motion > x, pressure < y, audio > z: all true?
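The "all true?" check can be sketched as a rule made of conditions that must all hold for one reading. The rule format, the `rule_fires` helper and the threshold values standing in for x, y and z are all assumptions for illustration; the deck does not show how rules are stored.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

# A rule is a list of (field, op, threshold) conditions that must all hold.
# The thresholds are made-up values standing in for x, y and z.
rule = [("motion", ">", 200), ("pressure", "<", 99500), ("audio_p2p", ">", 400)]


def rule_fires(rule, reading):
    """Return True only if every condition is true for this reading."""
    return all(
        field in reading and OPS[op](reading[field], threshold)
        for field, op, threshold in rule
    )


reading = {"motion": 203, "pressure": 99007, "audio_p2p": 460, "light": 820}
print(rule_fires(rule, reading))  # True

quiet = dict(reading, motion=10)
print(rule_fires(rule, quiet))    # False
```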
![Page 56: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/56.jpg)
[Diagram: Gateway, Redis, Archiver, PostgreSQL, API, Django App, App DB, Event Monitors]
Event monitor services can be scaled independently
![Page 57: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/57.jpg)
Getting The Message Out
Considerations
• The event monitor already has a job; avoid re-tasking it as a notification engine
• Notifications are most efficient as a “push”, with no need to poll
• The notification system should be generalized, e.g. SMTP, SMS
![Page 61: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/61.jpg)
If only…
![Page 62: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/62.jpg)
[Diagram: Gateway, Redis Data, Redis Pub/Sub, Archiver, PostgreSQL, API, Django App, App DB, Event Monitors, Notification Workers, SMTP]
Pub/Sub with synchronized workers is an optimal solution to real-time event notifications.
No need to add another system: Redis offers pub/sub services as well!
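A sketch of the two ends of that pipeline, kept free of network calls so it runs anywhere. The channel name, payload fields and the `senders` mapping are hypothetical. The message dictionary mirrors the shape that redis-py yields from `pubsub.listen()` (keys `type`, `pattern`, `channel`, `data`), and the publishing side would call `r.publish(CHANNEL, payload)`.

```python
import json

CHANNEL = "events"  # hypothetical channel name


def encode_event(rule_name, node_key, timestamp_ms):
    """Payload an event monitor could pass to r.publish(CHANNEL, payload)."""
    return json.dumps({"rule": rule_name, "node": node_key, "timestamp": timestamp_ms})


def handle_message(message, senders):
    """Dispatch one pub/sub message to every configured transport.

    senders maps a transport name to a callable; both are placeholders
    for real SMTP or SMS code.
    """
    if message.get("type") != "message":
        return 0  # skip subscribe/unsubscribe confirmations
    data = message["data"]
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    event = json.loads(data)
    for send in senders.values():
        send(event)
    return len(senders)


sent = []
senders = {
    "smtp": lambda event: sent.append(("smtp", event["rule"])),
    "sms": lambda event: sent.append(("sms", event["rule"])),
}
payload = encode_event("door-open", "sensor:env:101", 1357542004000)
message = {"type": "message", "pattern": None, "channel": b"events", "data": payload.encode("utf-8")}
handle_message(message, senders)
print(sent)  # [('smtp', 'door-open'), ('sms', 'door-open')]
```

Redis delivers each published message to every subscriber of the channel, which is presumably why the slide stresses synchronized workers: several workers on one channel need some coordination to avoid sending the same notification twice.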
![Page 63: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/63.jpg)
Conclusions
• Redis is a powerful tool for collecting large amounts of data in real-time
• In addition to maintaining a rapid pace of data insertion, we were able to concurrently query, monitor, and detect events on our Redis data collection system
• Bonus: Redis also enabled a robust, scalable real-time notification system using pub/sub
![Page 64: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/64.jpg)
Things to watch
• Data persistence
  – If Redis needs to restart, it takes 10-20 seconds per gigabyte to re-load all data into memory [1]
  – Redis is unresponsive during startup
[1] http://oldblog.antirez.com/post/redis-persistence-demystified.html
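To put the reload figure in perspective, the range can be applied to a dataset size. The 32 GB used here is an arbitrary example, not a size reported in the deck.

```python
def reload_seconds(dataset_gb, low_s_per_gb=10, high_s_per_gb=20):
    """Estimated (best, worst) restart time using the 10-20 s/GB range quoted above."""
    return dataset_gb * low_s_per_gb, dataset_gb * high_s_per_gb


low, high = reload_seconds(32)  # 32 GB is an arbitrary example size
print(low, high)  # 320 640
```

At that size the store would be unavailable for roughly five to eleven minutes after a restart.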
![Page 65: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/65.jpg)
Future Work
• Improve scalability through:
  – Data encoding
  – Data compression
  – Parallel batch inserts for all nodes on a gateway
• Deep historical data analytics
![Page 66: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/66.jpg)
Acknowledgements
• Project engineers Chris Taschner and Jeff Hamed @ CMU SEI
• Prof. Anthony Rowe & CMU ECE WiSE Lab http://wise.ece.cmu.edu/
• Our organizations
  CMU https://www.cmu.edu
  CERT http://www.cert.org
  SEI http://www.sei.cmu.edu
  Cylab https://www.cylab.cmu.edu
![Page 67: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/67.jpg)
Thank You
Questions?
![Page 69: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/69.jpg)
Slides of Live Redis Demo
![Page 70: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/70.jpg)
A Closer Look at Redis Data
redis> keys *
1) "sensor:environment:f80"
2) "sensor:environment:f81"
3) "sensor:environment:f82"
4) "sensor:environment:f83"
5) "sensor:environment:f84"
6) "sensor:power:f85"
7) "sensor:power:f86"
8) "sensor:radiation:f87"
9) "sensor:particulate:f88"
![Page 71: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/71.jpg)
A Closer Look at Redis Data
redis> keys sensor:power:*
1) "sensor:power:f85"
2) "sensor:power:f86"
![Page 72: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/72.jpg)
A Closer Look at Redis Data
redis> zcount sensor:power:f85 -inf +inf
(integer) 3565958
(45.38s)
![Page 73: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/73.jpg)
A Closer Look at Redis Data
redis> zcount sensor:power:f85 1359728113000 +inf
(integer) 47
![Page 74: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/74.jpg)
A Closer Look at Redis Data
redis> zrange sensor:power:f85 -1000 -1
1) "{\"long_energy1\": 73692453, \"total_secs\": 6784, \"energy\": [49, 175, 62, 0, 0, 0], \"c2_center\": 485, \"socket_state\": 1, \"node_type\": \"power\", \"c_p2p_low2\": 437, \"socket_state1\": 0, \"mac_address\": \"103\", \"c_p2p_low\": 494, \"rms_current\": 6, \"true_power\": 1158, \"timestamp\": 1359728143000, \"v_p2p_low\": 170, \"c_p2p_high\": 511, \"rms_current1\": 113, \"freq\": 60, \"long_energy\": 4108081, \"v_center\": 530, \"c_p2p_high2\": 719, \"energy1\": [37, 117, 100, 4, 0, 0], \"v_p2p_high\": 883, \"c_center\": 509, \"rms_voltage\": 255, \"true_power1\": 23235}"
2)…
![Page 75: Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062404/551a9cfe55034643688b625c/html5/thumbnails/75.jpg)
Redis Python API
import redis

# Connect through a shared connection pool
pool = redis.ConnectionPool(host="127.0.0.1", port=6379, db=0)
r = redis.Redis(connection_pool=pool)

# Last 50 entries by index (rank)
byindex = r.zrange("sensor:env:f85", -50, -1)
# ['{"acc_z":663,"bat":0,"gpio_state":1,"temp":663,"light":…

# Entries whose score (timestamp in ms) falls within the given range
byscore = r.zrangebyscore("sensor:env:f85", 1361423071000, 1361423072000)
# ['{"acc_z":734,"bat":0,"gpio_state":1,"temp":734,"light":…

# Number of entries in the sorted set
size = r.zcount("sensor:env:f85", "-inf", "+inf")
# 237327L