Doing Data Science Chapter 9 Emerging IT DPS 2016 – Fall 2014 Dr. Frank and Dr. Tappert By: Javid Maghsoudi


Doing Data Science Chapter 9

Emerging IT

DPS 2016 – Fall 2014

Dr. Frank and Dr. Tappert

By: Javid Maghsoudi

Introduction: (2 Topics)

Data Visualization and

Fraud Detection

Contributors:

- Professor Mark Hansen, Columbia University: a UCLA graduate who heads Media Innovation at Columbia, teaches Statistics and Journalism at Columbia, and works with NY Times R&D

- Ian Wong: an Inference Scientist at Square (now at Prismatic). Came from San Francisco, does data science on the topic of risk, and has degrees in Statistics and Electrical Engineering

What is Data Visualization:

"It is a representation of data that helps you see what you otherwise would have been blind to if you looked only at the naked source. It enables you to see trends, patterns and outliers that tell you about yourself and what surrounds you." - Nathan Yau, Data Points (Wiley)

The book shown by Steve Lindo

Mark Hansen: Data Visualization, History: Gabriel Tarde

- A sociologist who believed that the social sciences had the capacity to produce vastly more data than the physical sciences

- The physical sciences observe from a distance: they typically model, or incorporate models, to talk about an aggregate in some way, as when a biologist talks about the function of the aggregate of our cells

- Tarde thinks this is a deficiency caused by a lack of information

- We should instead talk about every cell

- If we replace cells with people, we can collect a huge amount of information about individuals, especially when they offer it themselves through sites like Facebook

Are we then missing the forest for the trees when we do this?

If we focus on the micro level, we might miss the larger picture: the cultural significance of social interaction.

But Bruno Latour (a French sociologist) weighs in:

“But the ‘whole’ is now nothing more than a provisional visualization which can be modified and reversed at will, by moving back to the individual components, and by looking for yet other tools to regroup the same elements into alternative assemblies”

“Change the instruments & you will change the entire social theory that goes with them”

- Tarde even anticipated something like Facebook in 1903, in the form of the “daily press”

- Tarde and Latour want us to consider both how the structure of society changes as we observe it and the ways of thinking about the relationship of the individual to the aggregate

Mark’s Thought Experiment:

As we get more data, relationships are affected: between ourselves and our community, between the community and the country, and between the country and the world

What is data Science anyway?

- Data science does the same thing that scientists do in other fields

- Close vs. distant reading of text, as noted by Franco Moretti

- Distant reading is getting a sense of what someone is “talking” about without reading line by line

- We do not go to someone’s backyard to play; we just watch them play, and we formalize and inform their processes with our bells and whistles

- We then learn new games and expand our own fundamental concepts of data and how we approach analyzing them

Sample of data Visualization projects

- Power plant’s steam cloud

- Building collecting dust

- Tree planting

- Mark’s data visualization projects:

  - Moveable Types (video)

  - Project Cascade

  - Cronkite Plaza

  - eBay transactions and books (show the video)

  - Public Theater

- So, what is data? It is “all” data

- The “Processing” programming language:

Let Artists Do Programming

Processing Language

Moveable Types: https://www.youtube.com/watch?v=aU62VVtN_Ec

eBay Transactions & PayPal: Items to Purchase on eBay

https://vimeo.com/50146828

Ian Wong: Data Science and Risk

• Data mining != writing R scripts

• Data visualization != producing a nice plot

• ML (machine learning) and data visualization augment human intelligence

About Square (similar to PayPal):

- To make commerce easy

- To make it easy to make and accept payments

- Download an app and you are ready to go

Risk Challenge:

- Create a robust and efficient risk management system

- How do you prevent abuse of the service?

- Use machine learning and data visualization

Detect suspicious activities using machine learning

- Define what is suspicious

- Be mindful of false positives (which erode customer trust) and false negatives (where Square loses money)
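The trade-off between false positives and false negatives can be made concrete with a small sketch. Everything below is an assumption made for illustration: the scores, the labels, and the two cost constants are invented, and nothing here reflects how Square actually weighs these errors. The sketch only shows that moving the flagging threshold trades one kind of error for the other.

```python
# Hypothetical fraud scores in [0, 1] paired with true labels (1 = fraud).
# All numbers are made up for illustration.
scored = [
    (0.95, 1), (0.80, 0), (0.70, 1), (0.40, 0),
    (0.35, 1), (0.20, 0), (0.10, 0), (0.05, 0),
]

COST_FALSE_POSITIVE = 1.0  # assumed cost of flagging a legitimate seller
COST_FALSE_NEGATIVE = 5.0  # assumed cost of letting a fraud through


def count_errors(scored, threshold):
    """Return (false_positives, false_negatives) when flagging score >= threshold."""
    fp = sum(1 for score, label in scored if score >= threshold and label == 0)
    fn = sum(1 for score, label in scored if score < threshold and label == 1)
    return fp, fn


def total_cost(scored, threshold):
    """Weight each kind of error by its assumed cost."""
    fp, fn = count_errors(scored, threshold)
    return fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE


for threshold in (0.3, 0.5, 0.9):
    fp, fn = count_errors(scored, threshold)
    print(threshold, fp, fn, total_cost(scored, threshold))
```

A low threshold flags more legitimate sellers, while a high threshold lets more fraud through. Which setting is "best" depends entirely on the costs assumed for each error.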

Define Data Scheme

An example of a supervised learning recipe (model):

- Get data

- Derive the features

- Train the model

- Estimate performance

- Publish the model
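The five steps of the recipe can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the pipeline described in the talk: the data is synthetic, the feature names (`log_amount`, `is_new`) are hypothetical, the model is a plain logistic regression trained by stochastic gradient descent, and "publishing" is reduced to serializing the weights as JSON.

```python
import json
import math
import random


def get_data(n=400, seed=0):
    """Step 1: get data. Synthetic (amount, is_new_account, is_fraud) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_fraud = 1 if rng.random() < 0.3 else 0
        typical_amount = 300.0 if is_fraud else 60.0
        amount = rng.lognormvariate(math.log(typical_amount), 0.5)
        is_new = 1 if rng.random() < (0.7 if is_fraud else 0.2) else 0
        rows.append((amount, is_new, is_fraud))
    return rows


def derive_features(amount, is_new):
    """Step 2: derive features: bias term, scaled log amount, new-account flag."""
    return [1.0, math.log(amount / 100.0), float(is_new)]


def predict_proba(weights, x):
    z = sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=100, lr=0.1):
    """Step 3: train a logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            error = predict_proba(weights, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


def accuracy(weights, examples, threshold=0.5):
    """Step 4: estimate performance on examples the model has not seen."""
    correct = sum(
        1 for x, y in examples
        if (predict_proba(weights, x) >= threshold) == (y == 1)
    )
    return correct / len(examples)


def publish(weights):
    """Step 5: publish the model, here simply as a JSON string."""
    return json.dumps(
        {"features": ["bias", "log_amount", "is_new"], "weights": weights}
    )


rows = get_data()
examples = [(derive_features(amount, is_new), label) for amount, is_new, label in rows]
train_set, test_set = examples[:300], examples[300:]
weights = train(train_set)
held_out_accuracy = accuracy(weights, test_set)
published = publish(weights)
print(round(held_out_accuracy, 3))
```

Keeping each step as its own small function makes it cheap to swap one piece, such as a new feature or a different model, and rerun the whole recipe.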

The trouble with performance estimation:

- Defining the error metrics

- Defining the label (e.g., “engaged user” at Facebook)

- What is a label anyway?

- Changes in the features and learning
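Why the choice of error metric matters can be seen with a small, invented example. The sketch below assumes 100 hypothetical transactions of which 5 are fraudulent, and compares a "model" that never flags anything with one that catches most fraud at the price of some false alarms. The labels and predictions are made up for illustration.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical data: 5 fraudulent transactions followed by 95 legitimate ones.
y_true = [1] * 5 + [0] * 95

# A "model" that never flags anything.
never_flag = [0] * 100

# A model that catches 4 of the 5 frauds and wrongly flags 6 legitimate sellers.
flagger = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

print(metrics(y_true, never_flag))
print(metrics(y_true, flagger))
```

On this data the model that never flags anything has the higher accuracy even though it catches no fraud at all, which is one reason accuracy alone is a poor metric when the positive class is rare.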

Model building tips:

- Models are not black boxes

- Develop the ability to perform rapid model iterations

- Models and packages are not magic potions

To Code:

- Get a pair

- Productionizing machine learning models

Data Visualization at Square:

- Enable efficient transaction reviews

- Reveal patterns for individual customers and across customer segments

- Measure business health

- Provide ambient analytics