Social Media Academy 2016 keynote presentation, John Wihbey

Post on 14-Apr-2017

TRANSCRIPT

Defining a Digital Storytelling Discipline: Learning, Skills, and Knowledge

John Wihbey Northeastern University

@wihbey

Case study: Northeastern

University undergrads working on Boston Police Department data, “as is” - in a general digital skills course

Murder data from the 1960s

City of Boston - Homicide data obtained through public records request

Murder data from the 1990s

City of Boston - Homicide data obtained through public records request

Murder data from the 2000s

City of Boston - Homicide data obtained through public records request

(http://the-accidental-housewife.blogspot.com/)

29 problems. 1 assignment.

List of problems/errors in structure and format of homicide data

1. Inconsistencies in case column, e.g. “01/06” vs “ ’09/06 ”
2. No indication of meaning of red text
3. No key for case column IDs
4. Different text formats/styles for entire rows and cells
5. Inconsistent descriptions of intersection addresses, e.g. “Washington @ Cedar St” vs “Willowood & Woodrow Ave” vs “Shawmut Ave. / Dwight”
6. No key for weapon codes
7. Race and gender are collapsed into single column
8. No codes for race/gender (race: “W”, “B”, gender: “M”, “F”, “H”)
9. Some R/G codes are “W/H/M”, “B/N/H” making it impossible to systematically split columns into 2 using the only delimiting character (/)
10. Some R/G codes have NO delimiter (e.g., 2009 sheet), so cannot split at all
11. Data for 2007 and later have two additional columns not in 2006: “defendant” and “DOA” (no indication of what DOA means)
12. Some rows have merged cells
13. Some merged cells have multiple values
14. Missing data/empty cells – what do these mean?
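As one illustration of why problems 5 and 7-10 resist a purely mechanical fix, here is a minimal Python sketch. It is not something shown in the talk: the function names are hypothetical, and it assumes that a well-formed R/G value has exactly two slash-separated parts. Because the sheets come with no key for the codes, anything else is flagged for manual review rather than guessed.

```python
import re


def split_intersection(address):
    """Split an intersection written with '@', '&' or '/' into street names."""
    return re.split(r"\s*[@&/]\s*", address.strip())


def split_race_gender(code):
    """Return (race, gender, needs_review) for a combined R/G cell.

    Assumption: only a two-part value such as 'W/M' can be split safely.
    Three-part values ('W/H/M') and undelimited values ('BM') are left
    unsplit and marked for a human to check.
    """
    parts = [p.strip().upper() for p in code.split("/") if p.strip()]
    if len(parts) == 2:
        return parts[0], parts[1], False
    return None, None, True


if __name__ == "__main__":
    for addr in ["Washington @ Cedar St", "Willowood & Woodrow Ave",
                 "Shawmut Ave. / Dwight"]:
        print(split_intersection(addr))
    for rg in ["W/M", "W/H/M", "BM"]:
        print(rg, split_race_gender(rg))
```

Flagging rather than guessing keeps the cleaning step honest: the ambiguous rows stay visible, which matters when the source agency has to be asked what the codes mean.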

List of problems/errors in structure and format of homicide data (cont.)

15. Location data is incomplete – no zip code information and Boston assumed as city (except in cases where “Dor” is appended at end of address)
16. Only the first couple of sheets have column header information; for the remaining sheets, column headers have to be assumed to follow those with labeled headers
17. Mysterious unexplained extra characters in date columns (e.g., (w) and xxx)
18. Inconsistent syntax for times: 12:00am, 7:10pm, 02:16am, 2:56hrs, 15oo hrs (letter “o” instead of number 0), 1:49 AM, 7:24 PM, 21:25, 21:32 PM, 12:39P.M., 2:22p.m., 1:49:pm, 12;24AM, :14 am
19. Inconsistent syntax for dates: “07/24/2006” vs “2006/7/24” vs “6/31/64” vs “08/31/06”
20. Inconsistent syntax for age: “1’7”
21. Sheets for 2012/13/14 have new columns not in previous sheets
22. Motive/Relation columns look identical but are not both present in all sheets; impossible to know which labels are which in those sheets without column headers
23. Simple spelling errors: “Tauma”
24. Inconsistent coding: Unk, UNK, unk
25. Unexplained “DV” column that only appears in 2013
26. No explanation for meaning of row breaks – are these separating data rows into groups of some sort? Are these data that once existed but were removed?
27. Multiple columns with same (non-unique) headers – “R/G”, “Age”, “DOB” for both VICTIM and DEFENDANTS
28. Inconsistent district labels and squadron personnel names
29. For merged cells holding multiple values/names, have to assume the respective values in adjacent cells are provided in the same order
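The time formats in problem 18 give a sense of how much cleaning logic even one column can demand. The following Python sketch is an illustration only, not the approach used in the course. The function name is hypothetical, and it rests on several assumptions: “hrs” means a 24-hour clock, a letter “o” next to digits is a mistyped zero, an AM/PM suffix on an hour above 12 is ignored, and a value with no hour (such as “:14 am”) is returned as None instead of being guessed.

```python
import re

# Hour (1-2 digits), optional colon, two-digit minutes, optional stray
# colon, then an optional am/pm/hrs suffix.
_TIME = re.compile(r"(\d{1,2}):?(\d{2}):?\s*(am|pm|hrs)?")


def normalize_time(raw):
    """Return the time as 'HH:MM' (24-hour), or None if it cannot be read."""
    s = raw.strip().lower().replace(";", ":").replace(".", "")
    # Assumption: an 'o' that follows a digit, colon, or another 'o' is a zero.
    s = re.sub(r"(?<=[\d:o])o", "0", s)
    m = _TIME.fullmatch(s)
    if m is None:
        return None
    hour, minute, suffix = int(m.group(1)), int(m.group(2)), m.group(3)
    # Convert 12-hour values; leave hours such as 21 untouched even if
    # the cell also says PM.
    if suffix in ("am", "pm") and 1 <= hour <= 12:
        hour = hour % 12 + (12 if suffix == "pm" else 0)
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


if __name__ == "__main__":
    samples = ["12:00am", "7:10pm", "2:56hrs", "15oo hrs", "21:32 PM",
               "12:39P.M.", "1:49:pm", "12;24AM", ":14 am"]
    for value in samples:
        print(repr(value), "->", normalize_time(value))
```

Each rule in the function encodes a judgment call about what the original clerk meant, which is exactly the kind of decision that should be documented alongside any analysis built on the cleaned column.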

Existential experience of: #datafail & #GIGO risk

Good student requests for clarification

Many noble student attempts at cleaning, analysis, exploration, viz:

Data viz using Plot.ly

Data viz using Carto

Google Maps

Fun with line graphs - an attempt to look at time-of-day patterns

Experiments in viz for exploratory purposes

D'où Venons Nous / Que Sommes Nous / Où Allons Nous - Paul Gauguin, 1897

2000

2005

2016

Wikimedia

1774

(Nielsen 2006 via www.nngroup.com)

G. Chaucer, The Canterbury Tales (courtesy: library.arizona.edu)

1400s

https://projects.propublica.org/docdollars/

Fatal Force https://www.washingtonpost.com/graphics/national/police-shootings/

http://www.nytimes.com/interactive/2012/01/15/business/one-percent-map.html

http://www.npr.org/news/graphics/2011/10/toxic-air/#4.00/39.00/-84.00

https://offshoreleaks.icij.org/#_ga=1.76851094.2020983486.1475355003

http://projects.latimes.com/value-added/

Labnol.org

Pew Research Center

Polarized Crowd: Two large dense groups with little interconnection

Pew Research Center

Tight Crowd: Highly interconnected group with few isolated participants

Pew Research Center

Brand clusters: Products, services, celebrities discussed by disparate persons

Pew Research Center

Community clusters: Popular topics attracting multiple smaller groups

Pew Research Center

Broadcast networks: Media-centric, with audience proliferating information

Pew Research Center

Support network: Customer complaints, with hub-and-spoke dynamics

Six degrees - Wikimedia

Facebook friends network

Nebraska local politicians; network graph - Matt Waite

http://www.poynter.org/2013/how-to-visually-explore-local-politics-with-network-graphs/218543/

Chicago community - homicide network (Andrew Papachristos et al.)

NYTimes.com

Gary King, et al, IQSS, Harvard

Global supply chain, Sourcemap.com

Opte Project

D'où Venons Nous / Que Sommes Nous / Où Allons Nous - Paul Gauguin, 1897

John Wihbey Northeastern University

@wihbey

j.wihbey@northeastern.edu
