Large-Scale Data Collection Using Redis C. Aaron Cois, Ph.D. -- Tim Palko CMU Software Engineering Institute © 2011 Carnegie Mellon University

Upload: graham-hoskinson

Post on 31-Mar-2015


Page 1:

Large-Scale Data Collection Using Redis

C. Aaron Cois, Ph.D. -- Tim Palko
CMU Software Engineering Institute

© 2011 Carnegie Mellon University

Page 2:

Us

C. Aaron Cois, Ph.D.
Software Architect, Team Lead
CMU Software Engineering Institute
Digital Intelligence and Investigations Directorate
@aaroncois

Tim Palko
Senior Software Engineer
CMU Software Engineering Institute
Digital Intelligence and Investigations Directorate

Page 3:

Overview

• Problem Statement
• Sensor Hardware & System Requirements
• System Overview
  – Data Collection
  – Data Modeling
  – Data Access
  – Event Monitoring and Notification
• Conclusions and Future Work

Page 4:

The Goal

Critical infrastructure/facility protection via Environmental Monitoring

Page 5:

Why?

Stuxnet
• Two major components:
  1) Send centrifuges spinning wildly out of control
  2) Record ‘normal operations’ and play them back to operators during the attack [1]
• Environmental monitoring provides secondary indicators, such as abnormal heat/motion/sound

[1] http://www.nytimes.com/2011/01/16/world/middleeast/16stuxnet.html?_r=2&

Page 6:

The Broader Vision

Quick, flexible out-of-band monitoring

• Set up monitoring in minutes
• Versatile sensors, easily repurposed
• Data communication is secure (P2P VPN) and requires no existing systems other than outbound networking

Page 7:

The Platform

A CMU research project called Sensor Andrew

• Features:
  – Open-source sensor platform
  – Scalable and generalist system supporting a wide variety of applications
  – Extensible architecture
    • Can integrate diverse sensor types

Page 8:

Sensor Andrew

Page 9:

Sensor Andrew Overview

[Diagram labels: Nodes, Gateway, Gateway, Server, End Users]

Page 10:

What is a Node?

A node collects data and sends it to a collector, or gateway

Environment Node Sensors
• Light
• Audio
• Humidity
• Pressure
• Motion
• Temperature
• Acceleration

Power Node Sensors
• Current
• Voltage
• True Power
• Energy

Radiation Node Sensors
• Alpha particle count per minute

Particulate Node Sensors
• Small Part. Count
• Large Part. Count

Page 11:

What is a Gateway?

• A gateway receives UDP data from all nodes registered to it
• An internal service:
  – Receives data continuously
  – Opens a server on a specified port
  – Continually transmits UDP data over this port

[Image label: Gateway]
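A minimal sketch of how a collector might read a gateway's UDP stream. The wire format is not given in the slides, so one JSON document per datagram is an assumption here, as are the helper name `receive_datapoints` and the loopback setup, where the collector binds a port and a stand-in gateway sends to it.

```python
import json
import socket


def receive_datapoints(sock, limit):
    """Read up to `limit` UDP datagrams from `sock` and decode each as JSON.

    Assumes one JSON-encoded data point per datagram (hypothetical format).
    """
    datapoints = []
    for _ in range(limit):
        # A 65535-byte buffer is large enough for any single UDP datagram.
        payload, _sender = sock.recvfrom(65535)
        datapoints.append(json.loads(payload.decode("utf-8")))
    return datapoints


# Loopback demonstration: one socket plays the gateway, one the collector.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
collector.settimeout(2.0)

gateway = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sample = {"node_type": "env", "timestamp": 1357542004000, "temp": 523}
gateway.sendto(json.dumps(sample).encode("utf-8"), collector.getsockname())

received = receive_datapoints(collector, limit=1)
gateway.close()
collector.close()
```

UDP gives no delivery or ordering guarantees, which is one reason the data store described later has to accept data out of order.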

Page 12:

Requirements

We need to:

1. Collect data from nodes once per second
2. Scale to 100 gateways each with 64 nodes
3. Detect events in real-time
4. Notify users about events in real-time
5. Retain all data collected for years, at least
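The first two requirements fix the write rate the data store has to sustain. The short calculation below is only a worked example of that arithmetic; the variable names are illustrative.

```python
GATEWAYS = 100          # requirement 2
NODES_PER_GATEWAY = 64  # requirement 2
SAMPLES_PER_SECOND = 1  # requirement 1, per node

total_nodes = GATEWAYS * NODES_PER_GATEWAY
writes_per_second = total_nodes * SAMPLES_PER_SECOND
writes_per_day = writes_per_second * 24 * 60 * 60

print(total_nodes)        # 6400
print(writes_per_second)  # 6400
print(writes_per_day)     # 552960000
```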

Page 14:

What Is Big Data?

“When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze and share it.”

Page 15:

Problems

• Size
• Transmission
• Storage
• Rate

Page 20:

Problems

• Size
• Transmission
• Storage
• Rate
• Retrieval

Page 21:

Collecting Data

Problem:
Store and retrieve immense amounts of data at a high rate.

Constraints:
Data cannot remain on the nodes or gateways due to security concerns.
Limited infrastructure.

[Diagram labels: Gateway, ?, 8 GB / hour, Complex Analytics]
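To put the 8 GB / hour figure in perspective, here is a rough worked calculation. It assumes decimal gigabytes and assumes the figure corresponds to the full-scale requirement stated earlier (100 gateways, 64 nodes each, one sample per second); the slide states neither.

```python
GB = 10**9  # decimal gigabytes; an assumption, the slide just says "GB"
bytes_per_hour = 8 * GB

# Assumed full scale: 100 gateways x 64 nodes, one sample per second.
datapoints_per_hour = 100 * 64 * 3600

bytes_per_datapoint = bytes_per_hour / datapoints_per_hour
terabytes_per_year = bytes_per_hour * 24 * 365 / 10**12

print(round(bytes_per_datapoint))    # 347
print(round(terabytes_per_year, 1))  # 70.1
```

Under those assumptions the system accumulates roughly 70 TB per year, which is why both storage capacity and write rate matter.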

Page 22:

We Tried PostgreSQL…

• Advantages:
  – Reliable, tested and scalable
  – Relational => complex queries => analytics
• Problems:
  – Performance problems reading while writing at a high rate; real-time event detection suffers
  – ‘COPY FROM’ doesn’t permit horizontal scaling

Page 24:

Q: How can we decrease I/O load?

A: Read and write collected data directly from memory

Page 25:

Enter Redis

Redis is an in-memory NoSQL database

Commonly used as a web application cache or pub/sub server

Page 26:

Redis

• Created in 2009
• Fully in-memory key-value store
  – Fast I/O: R/W operations are equally fast
  – Advanced data structures
• Publish/Subscribe functionality
  – In addition to data store functions
  – Separate from stored key-value data

Page 27:

Persistence

• Snapshotting
  – Data is asynchronously transferred from memory to disk
• AOF (Append Only File)
  – Each modifying operation is written to a file
  – Can recreate data store by replaying operations
  – Without interrupting service, Redis can rebuild the AOF as the shortest sequence of commands needed to rebuild the current dataset in memory
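The append-only-file idea can be illustrated with a toy model. This is not Redis's file format or implementation, just a sketch of the two properties listed above: replaying the log recreates the store, and rewriting it yields a shorter log that recreates the same store. The `SET`/`DEL` tuples and function names are made up for the example.

```python
def replay(log):
    """Rebuild a key-value store by replaying logged write operations."""
    store = {}
    for op, key, *args in log:
        if op == "SET":
            store[key] = args[0]
        elif op == "DEL":
            store.pop(key, None)
    return store


def rewrite(log):
    """Compact the log: one SET per surviving key reproduces the same store."""
    return [("SET", key, value) for key, value in replay(log).items()]


# Every modifying operation is appended to the log as it happens.
aof = [
    ("SET", "temp", 523),
    ("SET", "temp", 530),
    ("SET", "light", 820),
    ("DEL", "light"),
    ("SET", "temp", 545),
]

compacted = rewrite(aof)
print(replay(aof))  # {'temp': 545}
print(compacted)    # [('SET', 'temp', 545)]
```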

Page 28:

Replication

• Redis supports master-slave replication
• Master-slave replication can be chained
• Be careful:
  – Slaves are writeable!
  – Potential for data inconsistency
• Fully compatible with Pub/Sub features

Page 29:

Redis Features: Advanced Data Structures

List: [A, B, C, D]
Set: {A, B, C, D}
Sorted Set: {C:1, D:2, A:3, B:4}, stored as {value:score}
Hash: {field1:“A”, field2:“B”…}, stored as {key:value}
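For readers more at home in Python, the four structures map loosely onto built-in types. The analogy is illustrative only; Redis implements these structures server-side with its own commands.

```python
# Rough Python analogues of the four structures (illustrative only).
as_list = ["A", "B", "C", "D"]                    # List: ordered, duplicates allowed
as_set = {"A", "B", "C", "D"}                     # Set: unordered, unique members
as_sorted_set = {"C": 1, "D": 2, "A": 3, "B": 4}  # Sorted Set: unique member -> score
as_hash = {"field1": "A", "field2": "B"}          # Hash: field -> value

# A sorted set is always read back in score order.
by_score = sorted(as_sorted_set, key=as_sorted_set.get)
print(by_score)  # ['C', 'D', 'A', 'B']
```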

Page 30:

Our Data Model

Page 31:

Constraints

Our data store must:

– Hold time-series data

– Be flexible in querying (by time, node, sensor)

– Allow efficient querying of many records

– Accept data out of order

Page 32:

Tradeoffs: Efficiency vs. Flexibility

One record per timestamp:
[Motion, Audio, Light, Pressure, Humidity, Acceleration, Temperature]

VS

One record per sensor data type:
[Motion] [Light] [Audio] [Pressure] [Temperature] [Humidity] [Acceleration]

Page 33:

Our Solution: Sorted Set

Datapoint: sensor:env:101
Score: 1357542004000
Value: {"bat": 192, "temp": 523, "digital_temp": 216, "mac_address": "20f", "humidity": 22, "motion": 203, "pressure": 99007, "node_type": "env", "timestamp": 1357542004000, "audio_p2p": 460, "light": 820, "acc_z": 464, "acc_y": 351, "acc_x": 311}
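A sketch of how one reading could be turned into the key, score and value shown on this slide. The helper names are hypothetical, and the slides do not say where the `101` suffix comes from, so it is passed in as a plain `node_id` parameter. The `store` function assumes a redis-py 3.x or later client, whose `zadd` takes a `{member: score}` mapping.

```python
import json


def sorted_set_entry(datapoint, node_id):
    """Return (key, score, member) for one sensor reading.

    Key layout follows the slide: sensor:<node_type>:<node_id>.
    """
    key = "sensor:{}:{}".format(datapoint["node_type"], node_id)
    score = datapoint["timestamp"]  # epoch milliseconds
    member = json.dumps(datapoint, sort_keys=True)
    return key, score, member


def store(client, datapoint, node_id):
    """Add the reading to its sorted set.

    `client` is assumed to be a redis-py 3.x+ `redis.Redis` instance,
    whose zadd takes a {member: score} mapping.
    """
    key, score, member = sorted_set_entry(datapoint, node_id)
    return client.zadd(key, {member: score})


reading = {"node_type": "env", "timestamp": 1357542004000, "temp": 523, "humidity": 22}
key, score, member = sorted_set_entry(reading, node_id="101")
print(key)    # sensor:env:101
print(score)  # 1357542004000
```

Using the timestamp as the score is what makes time-range queries possible later on.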

Page 37:

Sorted Set

1357542004000: {"temp":523,..}
1357542005000: {"temp":523,..}

1357542007000: {"temp":530,..}
1357542008000: {"temp":531,..}
1357542009000: {"temp":540,..}
1357542001000: {"temp":545,..}
…

Page 38:

Sorted Set

1357542004000: {"temp":523,..}
1357542005000: {"temp":523,..}
1357542006000: {"temp":527,..} <- fits nicely
1357542007000: {"temp":530,..}
1357542008000: {"temp":531,..}
1357542009000: {"temp":540,..}
1357542001000: {"temp":545,..}
…
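The "fits nicely" behaviour, where a late reading lands in its correct position by score, can be mimicked with a sorted Python list. This is only an analogy using the standard `bisect` module, not how Redis stores sorted sets internally.

```python
import bisect

# (score, member) pairs kept in score order, like a sorted set's view.
series = [
    (1357542004000, '{"temp":523}'),
    (1357542005000, '{"temp":523}'),
    (1357542007000, '{"temp":530}'),
    (1357542008000, '{"temp":531}'),
]

# A reading for ...006000 arrives after ...007000 and ...008000 were stored.
late = (1357542006000, '{"temp":527}')
bisect.insort(series, late)

print([score for score, _ in series])
```

The late reading ends up third, between the 5000 and 7000 entries, without any re-sorting by the caller.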

Page 39:

Know your data structure!
A set is still a set…

Datapoint
Score: 1357542004000
Value: {"bat": 192, "temp": 523, "digital_temp": 216, "mac_address": "20f", "humidity": 22, "motion": 203, "pressure": 99007, "node_type": "env", "timestamp": 1357542004000, "audio_p2p": 460, "light": 820, "acc_z": 464, "acc_y": 351, "acc_x": 311}
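One reading of this slide, offered as an interpretation rather than something the slides spell out: members of a sorted set are unique, so two readings with identical values would collapse into a single entry unless something in the value (here the `timestamp` field) distinguishes them. The toy `zadd` below is a plain-Python stand-in that mimics this behaviour of the Redis ZADD command.

```python
import json

sorted_set = {}  # member -> score, mimicking sorted-set uniqueness


def zadd(member, score):
    """Insert or re-score a member; returns 1 if the member is new, else 0."""
    is_new = member not in sorted_set
    sorted_set[member] = score
    return int(is_new)


# Without a timestamp in the value, two identical readings are one member:
zadd(json.dumps({"temp": 523}), 1357542004000)
zadd(json.dumps({"temp": 523}), 1357542005000)  # only moves the score
without_ts = len(sorted_set)

sorted_set.clear()

# With the timestamp embedded, each reading is a distinct member:
zadd(json.dumps({"temp": 523, "timestamp": 1357542004000}), 1357542004000)
zadd(json.dumps({"temp": 523, "timestamp": 1357542005000}), 1357542005000)
with_ts = len(sorted_set)

print(without_ts, with_ts)  # 1 2
```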

Page 40:

Requirement Satisfied

[Diagram labels: Redis, Gateway]

Page 41:

There is a disturbance in the Force..

Page 42:

Collecting Data

[Diagram labels: Redis, Gateway]

Page 43:

“In Memory” Means Many Things

• The data store capacity is aggressively capped
  – Redis can only store as much data as the server has RAM

Page 44:

Collecting Big Data

[Diagram labels: Redis, Gateway]

Page 45:

We could throw away data…

• If we only cared about current values
• However, our data
  – Must be stored for 1+ years for compliance
  – Must remain queryable for historical/trend analysis

Page 46:

We Still Need Long-term Data Storage

Solution? Migrate data to an archive with expansive storage capacity

Page 47:

Winning

[Diagram labels: Redis, Gateway, PostgreSQL, Archiver]
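A sketch of what an archiver pass could look like: copy readings older than a cutoff out of a sorted set, then delete them from Redis. The slides do not describe the archiver's implementation, so everything here is an assumption. `archive_older_than` expects a client exposing redis-py's `zrangebyscore` and `zremrangebyscore`; `write_rows` is a hypothetical callable standing in for the PostgreSQL insert; `FakeSortedSets` is an in-memory stand-in so the example runs without a server.

```python
def archive_older_than(client, key, cutoff_ms, write_rows):
    """Copy readings with score < cutoff_ms to the archive, then delete them.

    Scores are assumed to be integer epoch milliseconds, so the inclusive
    upper bound cutoff_ms - 1 selects everything strictly older.
    """
    upper = cutoff_ms - 1
    rows = client.zrangebyscore(key, "-inf", upper, withscores=True)
    if rows:
        write_rows(rows)  # archive first, so a failure here loses nothing
        client.zremrangebyscore(key, "-inf", upper)
    return len(rows)


class FakeSortedSets:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.data = {}  # key -> {member: score}

    def zadd(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

    def zrangebyscore(self, key, lo, hi, withscores=False):
        lo, hi = float(lo), float(hi)
        hits = sorted(
            (score, member)
            for member, score in self.data.get(key, {}).items()
            if lo <= score <= hi
        )
        if withscores:
            return [(member, float(score)) for score, member in hits]
        return [member for _score, member in hits]

    def zremrangebyscore(self, key, lo, hi):
        lo, hi = float(lo), float(hi)
        doomed = [m for m, s in self.data.get(key, {}).items() if lo <= s <= hi]
        for member in doomed:
            del self.data[key][member]
        return len(doomed)


client = FakeSortedSets()
client.zadd("sensor:env:101", {"r1": 1000, "r2": 2000, "r3": 3000})
archive = []
moved = archive_older_than(client, "sensor:env:101", 2500, archive.extend)
print(moved, archive)  # 2 [('r1', 1000.0), ('r2', 2000.0)]
```

Because data may arrive out of order, a late reading older than the cutoff could land between the two calls and be deleted unarchived; a production archiver would need to guard against that.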

Page 48:

Winning?

[Diagram labels: Redis, Gateway, PostgreSQL, Archiver, ??, ?, Some Poor Client]

Page 49:

Yes, Winning

[Diagram labels: Redis, Gateway, PostgreSQL, Archiver, API, Some Happy Client]

Page 50:

Best of both worlds

[Diagram labels: Gateway, Redis, PostgreSQL, Archiver, API]

Redis allows quick access to real-time data, for monitoring and event detection

PostgreSQL allows complex queries and scalable storage for deep and historical analysis

Page 53:

We Have the Data, Now What?

Incoming data must be monitored and analyzed, to detect significant events

What is “significant”?

What about new data types?

Page 54:

[Diagram labels: Gateway, Django App, App DB, API, Redis, PostgreSQL, Archiver]

New guy: provide a way to read the data and create rules

motion > x && pressure < y && audio > z

Page 55:

[Diagram labels: Gateway, Event Monitor, Event Monitor, Django App, App DB, Redis, PostgreSQL, Archiver, API]

New guy: read the rules and data, trigger alarms

motion > x
pressure < y
audio > z
All true?
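The "all true?" check can be sketched as a small rule evaluator. The rule representation (a list of field, operator, threshold triples) and the threshold values are invented for illustration; the slides leave x, y and z unspecified and do not describe how rules are stored.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}


def rule_fires(conditions, datapoint):
    """True only if every (field, op, threshold) condition holds."""
    return all(
        field in datapoint and OPS[op](datapoint[field], threshold)
        for field, op, threshold in conditions
    )


# Thresholds below are made up; the slide leaves x, y, z unspecified.
rule = [("motion", ">", 200), ("pressure", "<", 100000), ("audio_p2p", ">", 400)]

reading = {"motion": 203, "pressure": 99007, "audio_p2p": 460}
quiet = {"motion": 203, "pressure": 99007, "audio_p2p": 120}

print(rule_fires(rule, reading))  # True
print(rule_fires(rule, quiet))    # False
```

A reading that lacks one of the rule's fields is treated as not matching, which is one possible policy among several.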

Page 56:

[Diagram labels: Gateway, Event Monitor, Event Monitor, Django App, App DB, Redis, PostgreSQL, Archiver, API]

Event monitor services can be scaled independently

Page 60:

Getting The Message Out

Considerations

• Event monitor already has a job; avoid re-tasking it as a notification engine
• Notifications are most efficient as a “push”, with no need to poll
• Notification system should be generalized, e.g. SMTP, SMS

Page 61:

If only…

Page 62:

[Diagram labels: Gateway, Event Monitor, Event Monitor, Django App, App DB, Archiver, API, Redis Data, Redis Pub/Sub, Worker, Worker, Notification Worker, SMTP, PostgreSQL]

Pub/Sub with synchronized workers is an optimal solution to real-time event notifications.

No need to add another system: Redis offers pub/sub services as well!
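The publish/subscribe pattern on this slide can be sketched with a tiny in-process broker. `InProcessBroker` is a stand-in written for this example, not a Redis client; with redis-py, the corresponding calls would be `publish` on the client and `subscribe`/`listen` on the object returned by `pubsub()`. The channel name `events` and the message fields are hypothetical.

```python
import json
from collections import defaultdict


class InProcessBroker:
    """Tiny stand-in for a pub/sub server; not a Redis client."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Push the message to every subscriber; return how many received it."""
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])


sent_mail = []


def smtp_worker(message):
    """Hypothetical notification worker; a real one would send an email."""
    event = json.loads(message)
    sent_mail.append("ALERT node {node}: {rule}".format(**event))


broker = InProcessBroker()
broker.subscribe("events", smtp_worker)

# The event monitor only publishes; it never talks to SMTP itself.
receivers = broker.publish("events", json.dumps({"node": "f85", "rule": "motion > x"}))
print(receivers, sent_mail)  # 1 ['ALERT node f85: motion > x']
```

Keeping the monitor and the workers decoupled this way means new notification channels (SMS, for instance) can be added as extra subscribers without touching the monitor.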

Page 63:

Conclusions

• Redis is a powerful tool for collecting large amounts of data in real-time

• In addition to maintaining a rapid pace of data insertion, we were able to concurrently query, monitor, and detect events on our Redis data collection system

• Bonus: Redis also enabled a robust, scalable real-time notification system using pub/sub

Page 64:

Things to watch

• Data persistence
  – If Redis needs to restart, it takes 10-20 seconds per gigabyte to re-load all data into memory [1]
  – Redis is unresponsive during startup

[1] http://oldblog.antirez.com/post/redis-persistence-demystified.html
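A quick way to estimate restart downtime from the 10-20 seconds per gigabyte figure. The function name and the 64 GB example size are illustrative, not taken from the talk.

```python
def reload_seconds(dataset_gb, seconds_per_gb=(10, 20)):
    """Estimated (best, worst) reload time using the 10-20 s/GB figure above."""
    low, high = seconds_per_gb
    return dataset_gb * low, dataset_gb * high


# 64 GB is an arbitrary example size, not a figure from the talk.
best, worst = reload_seconds(64)
print(best / 60, worst / 60)  # estimated downtime in minutes
```

For a dataset of that size the estimate is on the order of ten to twenty minutes of unresponsiveness, which is worth planning for.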

Page 65:

Future Work

• Improve scalability through:
  – Data encoding
  – Data compression
  – Parallel batch inserts for all nodes on a gateway
• Deep historical data analytics

Page 66:

Acknowledgements

• Project engineers Chris Taschner and Jeff Hamed @ CMU SEI
• Prof. Anthony Rowe & CMU ECE WiSE Lab
  http://wise.ece.cmu.edu/
• Our organizations
  CMU https://www.cmu.edu
  CERT http://www.cert.org
  SEI http://www.sei.cmu.edu
  Cylab https://www.cylab.cmu.edu

Page 68:

Thank You

Questions?

Page 69:

Slides of Live Redis Demo

Page 70:

A Closer Look at Redis Data

redis> keys *

1) "sensor:environment:f80"
2) "sensor:environment:f81"
3) "sensor:environment:f82"
4) "sensor:environment:f83"
5) "sensor:environment:f84"
6) "sensor:power:f85"
7) "sensor:power:f86"
8) "sensor:radiation:f87"
9) "sensor:particulate:f88"

Page 71:

A Closer Look at Redis Data

redis> keys sensor:power:*

1) "sensor:power:f85"
2) "sensor:power:f86"

Page 72:

A Closer Look at Redis Data

redis> zcount sensor:power:f85 -inf +inf

(integer) 3565958
(45.38s)

Page 73:

A Closer Look at Redis Data

redis> zcount sensor:power:f85 1359728113000 +inf

(integer) 47

Page 74:

A Closer Look at Redis Data

redis> zrange sensor:power:f85 -1000 -1

1) "{\"long_energy1\": 73692453, \"total_secs\": 6784, \"energy\": [49, 175, 62, 0, 0, 0], \"c2_center\": 485, \"socket_state\": 1, \"node_type\": \"power\", \"c_p2p_low2\": 437, \"socket_state1\": 0, \"mac_address\": \"103\", \"c_p2p_low\": 494, \"rms_current\": 6, \"true_power\": 1158, \"timestamp\": 1359728143000, \"v_p2p_low\": 170, \"c_p2p_high\": 511, \"rms_current1\": 113, \"freq\": 60, \"long_energy\": 4108081, \"v_center\": 530, \"c_p2p_high2\": 719, \"energy1\": [37, 117, 100, 4, 0, 0], \"v_p2p_high\": 883, \"c_center\": 509, \"rms_voltage\": 255, \"true_power1\": 23235}"

2) …

Page 75:

Redis Python API

import redis

# Connect through a shared connection pool (the host must be a string)
pool = redis.ConnectionPool(host="127.0.0.1", port=6379, db=0)
r = redis.Redis(connection_pool=pool)

# Last 50 entries by rank (index)
byindex = r.zrange("sensor:env:f85", -50, -1)
# ['{"acc_z":663,"bat":0,"gpio_state":1,"temp":663,"light":…

# Entries whose score (timestamp in ms) falls within the given range
byscore = r.zrangebyscore("sensor:env:f85", 1361423071000, 1361423072000)
# ['{"acc_z":734,"bat":0,"gpio_state":1,"temp":734,"light":…

# Number of entries in the sorted set
size = r.zcount("sensor:env:f85", "-inf", "+inf")
# 237327L