Cyberspace Law Committee Meeting, August 3, 2012
Big Data
Lois Mermelstein
The Law Office of Lois D. Mermelstein
Ted Claypoole
Womble [email protected]
What Is Big Data?
✤ Data that exceeds the processing capacity of conventional database systems.
✤ Too much data
✤ It moves too fast
✤ It’s too diverse
How’d we get here?
✤ Storage, processing speed, and bandwidth are improving exponentially
✤ Networking is expanding exponentially
✤ And you can buy all the pieces - data, infrastructure, processing
source: http://radar.oreilly.com/2011/08/building-data-startups.html
Crunching Big Data - Volume
✤ Turn 12 terabytes of tweets/day into improved product sentiment analysis
✤ Convert 350 billion annual meter readings to better predict power consumption
✤ Generate Facebook recommendations based on your friends’ interests
Crunching Big Data - Velocity
✤ Time-sensitive analysis and decision-making - to catch important events as they happen
✤ Needed when there’s too much input data to keep (so some gets tossed) or when decisions must be made immediately
✤ Examples:
✤ Scrutinize 5 million trade events/day to identify potential fraud
✤ Analyze 500 million daily call detail records in real-time to predict customer churn faster
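One common way to "toss some" of a stream that is too large to keep is reservoir sampling, which retains a fixed-size uniform random sample without knowing the stream's length in advance. The sketch below is illustrative only: the function name `reservoir_sample`, the sample size, and the stand-in event stream are assumptions, not anything described on the slides.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Hypothetical stand-in for a stream of trade events or call records.
events = range(100_000)
kept = reservoir_sample(events, k=1000, rng=random.Random(42))
print(len(kept))  # 1000
```

Memory use stays fixed at k items no matter how long the stream runs, which is the point when input arrives faster than it can be stored.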
Crunching Big Data - Variety
✤ Not just names/addresses in a customer database
✤ Want to analyze text, sensor data, audio, video, location data, click streams, log files, and anything else that’s available
✤ Principle: when you can, keep everything - there might be something useful in what you throw away
Unexpected Consequences
✤ Anonymous AOL searcher isn’t (NYT, 8/9/2006)
✤ Anonymous Netflix users aren’t, when compared with IMDb database (Wired, 12/13/2007)
✤ For many, browsing history is unique and repeatable (8/1/2012)
✤ Target knows when you’re pregnant (NYT, 2/19/2012)
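The Netflix/IMDb item is an example of a linkage attack: records with names removed are matched against a public dataset that shares some of the same details. The sketch below is a deliberately simplified, exact-match illustration and not the method used in the reported study. All names, titles, dates, and the `link` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical "anonymized" ratings: (pseudonym, title, date).
anonymized = [
    ("user_17", "Film A", "2005-03-01"),
    ("user_17", "Film B", "2005-03-04"),
    ("user_17", "Film C", "2005-04-10"),
    ("user_52", "Film A", "2005-03-02"),
    ("user_52", "Film D", "2005-05-20"),
]

# Hypothetical public reviews posted under real names: (name, title, date).
public = [
    ("Alice Example", "Film A", "2005-03-01"),
    ("Alice Example", "Film B", "2005-03-04"),
    ("Alice Example", "Film C", "2005-04-10"),
    ("Bob Example", "Film D", "2005-05-20"),
]

def link(anonymized, public, min_matches=2):
    """Map each pseudonym to public names sharing at least min_matches (title, date) pairs."""
    by_event = defaultdict(set)
    for name, title, date in public:
        by_event[(title, date)].add(name)

    # Count how many events each pseudonym shares with each public name.
    overlap = defaultdict(lambda: defaultdict(int))
    for pseudonym, title, date in anonymized:
        for name in by_event.get((title, date), ()):
            overlap[pseudonym][name] += 1

    return {
        pseudonym: sorted(n for n, c in names.items() if c >= min_matches)
        for pseudonym, names in overlap.items()
        if any(c >= min_matches for c in names.values())
    }

matches = link(anonymized, public)
print(matches)  # {'user_17': ['Alice Example']}
```

Even in this toy data, a handful of matching events is enough to attach a name to a pseudonym, while a single coincidental match (user_52) is not.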
Lessons to (Re)learn
✤ Correlation isn't causation
✤ But correlation may be all you need
✤ You can't hide in the crowd
Personally Identifiable Information
PII as a mathematical function
How many points of data do you need?
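One way to make the "how many points" question concrete is to count bits: singling out one person in a population of N takes about log2(N) bits of information, and each extra data point shrinks the group of people who share the same combination of values. The sketch below is a rough illustration under stated assumptions: the 7 billion population figure is an approximation, and the records, field names, and helper functions are hypothetical.

```python
import math
from collections import Counter

def bits_to_single_out(population):
    """Bits of information needed to pick one individual out of a population."""
    return math.log2(population)

def fraction_unique(records, fields):
    """Fraction of records whose combination of the given fields is shared by no one else."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)

# Assumed world population of roughly 7 billion.
print(round(bits_to_single_out(7_000_000_000), 1))  # 32.7

# Hypothetical records with three seemingly harmless fields.
records = [
    {"zip": "78701", "birth_year": 1970, "gender": "F"},
    {"zip": "78701", "birth_year": 1970, "gender": "M"},
    {"zip": "78701", "birth_year": 1982, "gender": "F"},
    {"zip": "28202", "birth_year": 1970, "gender": "F"},
    {"zip": "28202", "birth_year": 1970, "gender": "F"},
]

print(fraction_unique(records, ["zip"]))                          # 0.0
print(fraction_unique(records, ["zip", "birth_year"]))            # 0.2
print(fraction_unique(records, ["zip", "birth_year", "gender"]))  # 0.6
```

No single field identifies anyone here, but the share of unique records climbs as fields are combined, which is why identifiability behaves like a function of the whole set of data points and not of any one of them.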
Pineda v. Williams-Sonoma Stores, Inc. (Cal., Feb. 10, 2011)
HIPAA De-Identified Data
Re-Identifying De-Identified Data
Escaping Regulatory Requirements
Privacy
Fair Credit Reporting
Redlining
Employment Discrimination
Single Transaction Owned By:
Retailer
Wholesale vendor
Manufacturer
Shipping Company
Customer’s Bank
Customer’s ISP
Retailer’s Bank
Merchant Card Processor
Phone company/Hardware/Software
Government Using Big Data
Law Enforcement
Copyright Issues
Who owns the data?
Who owns the derivative works?
Combined data?