Literacy in the Age of Big Data


Uploaded by: centre-for-advanced-management-education

Posted on 21-Feb-2017


TRANSCRIPT

#IMDAYS // @michael_smit

Literacy in the Age of Big Data

Mike Smit

School of Information Management, Faculty of Management


What is Big Data?

• Volume / Variety / Velocity, or
• Anything more than I can handle, or
• Data too large to be contained by a single computer, or
• Data beyond human scale, or
• Data measured in TB or bigger, or
• Anything I have a better chance of selling you by claiming it is Big Data.


Twitter Example

{"created_at":"Sat Nov 16 12:18:36 +0000 2013","id":401685732185899000,"id_str":"401685732185899008","text":"Spotted this in the Hicks building. At Dal, even the graffiti is academically rigorous. http://t.co/n8jpJSGorN","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2182346850,"id_str":"2182346850","name":"Richard Florizone","screen_name":"DalPres","location":"Nova Scotia","url":"http://dal.ca","description":"11th President - Dalhousie University. Online as often as possible.","protected":false,"followers_count":21,"friends_count":15,"listed_count":1,"created_at":"Fri Nov 08 14:49:09 +0000 2013","favourites_count":1,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":5,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://a0.twimg.com/profile_background_images/378800000117347877/3f9b5575de267ee12db6c1b4eb6e6332.jpeg","profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/378800000117347877/3f9b5575de267ee12db6c1b4eb6e6332.jpeg","profile_background_tile":false,"profile_image_url":"http://pbs.twimg.com/profile_images/378800000743858713/b7417c514d6e85dd67895f1802b784ae_normal.jpeg","profile_image_url_https":"https://pbs.twimg.com/profile_images/378800000743858713/b7417c514d6e85dd67895f1802b784ae_normal.jpeg","profile_banner_url":"https://pbs.twimg.com/profile_banners/2182346850/1384540163","profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[],"media":[{"id":401685731921629200,"id_str":"401685731921629184","indices":[88,110],"media_url":"http://pbs.twimg.com/media/BZMTC4KIEAA62wo.jpg","media_url_https":"https://pbs.twimg.com/media/BZMTC4KIEAA62wo.jpg","url":"http://t.co/n8jpJSGorN","display_url":"pic.twitter.com/n8jpJSGorN","expanded_url":"http://twitter.com/DalPres/status/401685732185899008/photo/1","type":"photo","sizes":{"medium":{"w":600,"h":450,"resize":"fit"},"large":{"w":1024,"h":768,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"small":{"w":340,"h":255,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"en"}
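The record above wraps one short, human-readable sentence in dozens of machine-oriented fields. A minimal sketch of pulling out the few fields a reader actually cares about, using Python's standard `json` module on an abridged copy of the tweet (the field names are the ones shown above; the `summarize` helper is hypothetical):

```python
import json

# Abridged copy of the tweet above: only a handful of its many fields are kept.
raw = '''{"created_at": "Sat Nov 16 12:18:36 +0000 2013",
 "id_str": "401685732185899008",
 "text": "Spotted this in the Hicks building. At Dal, even the graffiti is academically rigorous. http://t.co/n8jpJSGorN",
 "user": {"screen_name": "DalPres", "followers_count": 21},
 "entities": {"hashtags": [], "media": [{"type": "photo",
   "media_url_https": "https://pbs.twimg.com/media/BZMTC4KIEAA62wo.jpg"}]}}'''

def summarize(tweet_json):
    """Reduce a raw tweet record to when, who, what, and any attached photos."""
    tweet = json.loads(tweet_json)
    return {
        "when": tweet["created_at"],
        "who": tweet["user"]["screen_name"],
        "what": tweet["text"],
        # .get() guards against tweets that carry no "media" entry at all
        "photos": [m["media_url_https"]
                   for m in tweet["entities"].get("media", [])
                   if m["type"] == "photo"],
    }

summary = summarize(raw)
print(summary["who"], summary["photos"])
```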


WHY is Big Data?

Reason #1: Web 2.0 turned everyone into content creators

Reason #2: Internet of Things turns everything into data creators

Image: GE press release


Reason #3: Data accumulation is less visible


Reason #4: Declining Price of Storage


1. How much would it cost to buy enough hard drives to store all the music in the iTunes store?
2. Same question, but pretend it is 10 years ago.
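Both questions reduce to the same arithmetic with different inputs: the number of whole drives needed is the catalog size divided by one drive's capacity, rounded up, and the cost is that count times the drive price. A sketch under those assumptions (no backups, RAID, or enclosures); every number in it is a placeholder to be replaced with researched figures, not an actual iTunes catalog size or historical drive price:

```python
import math

def drive_cost(catalog_tb, drive_tb, drive_price):
    """Cost of enough whole drives to hold catalog_tb terabytes (ignores redundancy)."""
    drives_needed = math.ceil(catalog_tb / drive_tb)
    return drives_needed * drive_price

# PLACEHOLDER inputs only: look up the catalog size and a typical drive's
# capacity (TB) and price (dollars) for today and for ten years ago.
catalog_tb = 100.0
today = drive_cost(catalog_tb, drive_tb=4.0, drive_price=120.0)
decade_ago = drive_cost(catalog_tb, drive_tb=0.25, drive_price=100.0)
print(f"today: ${today:,.0f}  ten years ago: ${decade_ago:,.0f}  "
      f"ratio: {decade_ago / today:.1f}x")
```

Keeping the catalog size fixed and changing only the drive inputs isolates the effect the slide is about: the falling price of storage.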


Reason #5: Cloud storage (and pay-as-you-go pricing)


[Chart: Pre-Cloud, Price vs. Time]

[Chart: Cloud Era, Price vs. Time]
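A minimal sketch of the difference between the two pricing models named in Reason #5, assuming the pre-cloud model means buying whole drives up front whenever stored data outgrows capacity, and the cloud model means paying each month only for the gigabytes stored. The function names, drive size, and prices are hypothetical:

```python
import math

def upfront_spend(usage_gb, drive_gb, drive_price):
    """Assumed pre-cloud model: buy whole drives as soon as data outgrows capacity.
    Returns cumulative spend after each month."""
    spend, drives, history = 0.0, 0, []
    for used in usage_gb:
        needed = math.ceil(used / drive_gb)
        if needed > drives:  # capacity exhausted: pay for the extra drives now
            spend += (needed - drives) * drive_price
            drives = needed
        history.append(spend)
    return history

def pay_as_you_go_spend(usage_gb, price_per_gb_month):
    """Assumed cloud model: each month, pay only for the gigabytes actually stored."""
    spend, history = 0.0, []
    for used in usage_gb:
        spend += used * price_per_gb_month
        history.append(spend)
    return history

# Hypothetical numbers, chosen only to show the two curve shapes.
usage = [100, 300, 900, 1200, 2500]  # GB stored in each month
upfront = upfront_spend(usage, drive_gb=1000, drive_price=100)
cloud = pay_as_you_go_spend(usage, price_per_gb_month=0.02)
print(upfront)  # [100.0, 100.0, 100.0, 200.0, 300.0]
print(cloud)
```

The up-front curve moves in large steps that are paid before the space is needed; the pay-as-you-go curve rises only as data is actually stored, which arguably lowers the barrier to keeping data "just in case."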



Reason #6: The lingering hope of finding valuable information


Why Do We Have Big Data?

… because we can.


What is Big Data?

A Problem, not a Solution.


So, uh, thanks for having me!

(Just Kidding)


Where Do We Start?

• Admit you have a problem
• Stick together
• Remain Calm! We fear what we don’t understand: data literacy education.
• Analytics (self-serve business intelligence)
• There is no substitute for human attention… but when that’s not feasible, what else you got?
  – Idea: Cognitive Computing for improved automation
  – Idea: Knowledge Graph for RM
• Records Management


Admit you have a problem


Stick Together

Ascend the Pyramid

(Analytics, self-service business intelligence, etc.)


Historic Flood Database: A Big Data Approach

• Automatically processing newspaper articles to produce open datasets describing geo-located floods in Nova Scotia.
• Visual interface
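The slide does not say how the articles were processed. One plausible minimal pipeline, sketched here under stated assumptions, keeps only articles that use flood vocabulary and then geo-locates them by matching place names against a gazetteer. The term list, the three-entry gazetteer (approximate town coordinates), and the sample articles are all hypothetical, not the project's data or method:

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Truro": (45.36, -63.28),
    "Windsor": (44.99, -64.13),
    "Halifax": (44.65, -63.57),
}
# Hypothetical flood vocabulary; a real system would need far more care.
FLOOD_TERMS = re.compile(r"\b(flood(?:s|ed|ing)?|washed out|overflowed)\b",
                         re.IGNORECASE)

def flood_records(articles):
    """Yield one geo-located record per known place named in a flood-related article."""
    for article in articles:
        text = article["text"]
        if not FLOOD_TERMS.search(text):
            continue  # no flood vocabulary: skip the article
        for place, (lat, lon) in GAZETTEER.items():
            if re.search(rf"\b{re.escape(place)}\b", text):
                yield {"date": article["date"], "place": place,
                       "lat": lat, "lon": lon, "source": article["id"]}

# Invented sample articles, for illustration only.
articles = [
    {"id": "a1", "date": "2000-01-01",
     "text": "Heavy rain flooded several streets in Truro."},
    {"id": "a2", "date": "2000-01-02",
     "text": "Halifax council debated the ferry schedule."},
]
records = list(flood_records(articles))
print(records)
```

Each record is a row of an open dataset; the second article names a known place but is dropped because it never mentions flooding.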


Remain Calm: Strength through Education

(Data Literacy)

Skills Gap

• Predicted for the US in 2018 by the McKinsey Global Institute
  – Deep analytics skills: 465k positions, 300k workforce
  – Data-savvy: 4m positions, 2.5m workforce
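The gap is simple subtraction, but it can be read two ways: the absolute number of unfilled positions, and the share of positions the predicted workforce could fill. A short sketch of both readings; the figures are the slide's, while the `shortfall` helper is illustrative:

```python
# Figures from the slide: McKinsey Global Institute prediction for the US in 2018.
PREDICTION = {
    "Deep analytics skills": (465_000, 300_000),  # (positions, workforce)
    "Data-savvy": (4_000_000, 2_500_000),
}

def shortfall(positions, workforce):
    """Return (unfilled positions, share of positions the workforce can fill)."""
    return positions - workforce, workforce / positions

for group, (positions, workforce) in PREDICTION.items():
    gap, filled = shortfall(positions, workforce)
    print(f"{group}: {gap:,} unfilled, {filled:.1%} of positions covered")
```

In both groups the predicted workforce covers less than two thirds of the positions, even though the data-savvy shortfall is roughly nine times larger in absolute terms.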

Data Literacy

• The ability to create, comprehend, and communicate data.
• The ability to collect, manage, evaluate, and apply data, in a critical manner.
• Spans disciplines, sectors, universities, …

Appendix 2 - Data Literacy Word Cloud

The following is a word cloud generated from the major definitions of data literacy in the reviewed literature.

[Image: data literacy word cloud]
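A word cloud sizes each term by how often it occurs in the source texts. The counting step behind one can be sketched with the standard library; the input here is the two definitions from the Data Literacy slide, used as a stand-in (not the literature corpus behind Appendix 2), and the stopword list and linear font scaling are assumptions:

```python
import re
from collections import Counter

# Stand-in corpus: the two definitions from the Data Literacy slide.
DEFINITIONS = [
    "The ability to create, comprehend, and communicate data.",
    "The ability to collect, manage, evaluate, and apply data, in a critical manner.",
]
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # hypothetical minimal list

def term_frequencies(texts):
    """Count lowercase words across all texts, skipping stopwords."""
    words = (w for text in texts for w in re.findall(r"[a-z]+", text.lower()))
    return Counter(w for w in words if w not in STOPWORDS)

def font_sizes(freqs, smallest=10, largest=48):
    """One simple scaling choice: font size grows linearly with a word's count."""
    top = max(freqs.values())
    return {w: smallest + (largest - smallest) * n / top for w, n in freqs.items()}

freqs = term_frequencies(DEFINITIONS)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
```

Even on this tiny input, "data" and "ability" dominate, which is the kind of pattern a word cloud makes visible at a glance.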


Data Literacy Education

Conceptual Framework
• Introduction to Data
  – Knowledge and understanding of data
  – Knowledge and understanding of the uses and applications of data

Data Collection
• Data Discovery and Collection
  – Performs data exploration
  – Identifies useful data
  – Collects data
• Evaluating and Ensuring Quality of Data and Sources
  – Critically assesses sources of data for trustworthiness
  – Critically evaluates quality of datasets for errors or problems

Data Management
• Data Organization
  – Knowledge of basic data organization methods and tools
  – Assesses data organization requirements
  – Organizes data
• Data Manipulation
  – Assesses methods to clean data
  – Identifies outliers and anomalies
  – Cleans data
• Data Conversion (from format to format)
  – Knowledge of different data types and conversion methods
  – Converts data from one format or file type to another
• Metadata Creation and Use
  – Creates metadata descriptors
  – Assigns appropriate metadata descriptors to original data sets
• Data Curation, Security, and Re-Use
  – Assesses data curation requirements (e.g. retention schedule, storage, accessibility, sharing requirements, etc.)
  – Assesses data security requirements (e.g. restricted access, protected drives, etc.)
  – Curates data
• Data Preservation
  – Assesses requirements for preservation
  – Assesses methods and tools for data preservation
  – Preserves data

Data Evaluation
• Data Tools
  – Knowledge of data analysis tools and techniques
  – Selects appropriate data analysis tool or technique
  – Applies data analysis tools and techniques
• Basic Data Analysis
  – Develops analysis plans
  – Applies analysis methods and tools
  – Conducts exploratory analysis
  – Evaluates results of analysis
  – Compares results of analysis with other findings
• Data Interpretation (Understanding Data)
  – Reads and understands charts, tables, and graphs
  – Identifies key take-away points, and integrates this with other important information
  – Identifies discrepancies within the data
• Identifying Problems Using Data
  – Uses data to identify problems in practical situations (e.g. workplace efficiency)
  – Uses data to identify higher level problems (e.g. policy, environment, scientific experimentation, marketing, economics, etc.)
• Data Visualization
  – Creates meaningful tables to organize and visually present data
  – Creates meaningful graphical representations of data
  – Evaluates effectiveness of graphical representations
  – Critically assesses graphical representations for accuracy and misrepresentation of data
• Presenting Data (Verbally)
  – Assesses the desired outcome(s) for presenting the data
  – Assesses audience needs and familiarity with subject(s)
  – Plans the appropriate meeting or presentation type
  – Utilizes meaningful tables and visualizations to communicate data
  – Presents arguments and/or outcomes clearly and coherently
• Data-Driven Decision Making (DDDM) (Making decisions based on data)
  – Prioritizes information garnered from data
  – Converts data into actionable information
  – Weighs the merit and impacts of possible solutions/decisions
  – Implements decisions/solutions

Data Application
• Critical Thinking
  – Aware of high level issues and challenges associated with data
  – Thinks critically when working with data
• Data Culture
  – Recognizes the importance of data
  – Supports an environment that fosters critical use of data for learning, research, and decision-making
• Data Ethics
  – Aware of legal and ethical issues associated with data
  – Applies and works with data in an ethical manner
• Data Citation
  – Knowledge of widely-accepted data citation methods
  – Creates correct citations for secondary data sets
• Data Sharing
  – Assesses methods and platforms for sharing data
  – Shares data legally and ethically
• Evaluating Decisions Based on Data
  – Collects follow-up data to assess effectiveness of decisions or solutions based upon data
  – Conducts analysis of follow-up data
  – Compares results of analysis with other findings
  – Evaluates decisions or solutions based on data
  – Retains original conclusions or decisions, or implements new decisions/solutions
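The Data Manipulation competency in this framework includes "Identifies outliers and anomalies." One widely used rule for that step is Tukey's fences, which flag values lying more than 1.5 interquartile ranges below the first quartile or above the third. A sketch using Python's standard `statistics` module (Python 3.8+); the rule, the helper name, and the sample readings are illustrative choices, not part of the framework:

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily readings with one suspicious entry.
readings = [12, 13, 12, 14, 13, 15, 12, 98]
print(tukey_outliers(readings))  # [98]
```

Flagging is not the same as deleting: a 98 could be a typing error or a genuine event, and deciding which is the judgment the framework's "Cleans data" step calls for.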


There is no substitute for human attention

But sometimes we have too much data and not enough humans!

Google’s Knowledge Graph
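The outline earlier in the talk floats a knowledge graph for records management (RM). The core idea is storing facts as subject-predicate-object links, so a question can follow several links at once. A minimal in-memory sketch; the record IDs, predicates, and facts are hypothetical, and this is not how Google's Knowledge Graph is built:

```python
from collections import defaultdict

# Hypothetical records-management facts as (subject, predicate, object) triples.
TRIPLES = [
    ("memo-0412", "about", "Flood Database"),
    ("memo-0412", "authored_by", "J. Doe"),
    ("report-0077", "about", "Flood Database"),
    ("report-0077", "retention", "7 years"),
    ("Flood Database", "located_in", "Nova Scotia"),
]

def build_index(triples):
    """Map (predicate, object) -> set of subjects, to answer 'what points at X?'."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return index

def records_about_place(index, place):
    """Two-hop query: records about any topic that is located in `place`."""
    topics = index.get(("located_in", place), set())
    return sorted(r for t in topics for r in index.get(("about", t), set()))

index = build_index(TRIPLES)
print(records_about_place(index, "Nova Scotia"))  # ['memo-0412', 'report-0077']
```

The value is in the second hop: neither record mentions Nova Scotia, yet both are found because the graph links them through the topic they describe.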


Cognitive Computing


Discussion

• [email protected]
• @michael_smit
• I’m here all day!


Image Credits (1)

• http://www.scientificamerican.com/media/inline/blog/Image/wisdom.jpg
• http://rudyloans.com/wp-content/uploads/2013/11/Arrow-Up-4.jpg
• http://www.mrwallpaper.com/cat-and-dog-cuddle-wallpaper/
• http://pottermore.wikia.com/wiki/Category:Gryffindor
• http://pottermore.wikia.com/wiki/File:Slytherin_mark.png
• http://daverobertsfilm.wordpress.com/2011/02/02/media-studies-key-debates/
• http://www.themobilityresource.com/wearable-technology-and-how-it-affects-people-with-disabilities/
• Original source unknown; available
• http://adpaascu.wordpress.com/tag/global-citizens/
• https://www.torproject.org/
• http://www.gnupg.org/
• http://www.identityfinder.com/


Image Credits (2)

• http://www.gartner.com/technology/research/hype-cycles/
• http://blog.udacity.com/2013/07/new-course-design-of-everyday-things.html
• Screenshot from http://pennystocks.la/internet-in-real-time/
• http://www.officeimaging.com/
• http://www.clipartbest.com/graduation-caps-clip-art
• Cost per GB from http://www.mkomo.com/cost-per-gigabyte-update
• Images on slides 47, 57, 58 are © Mike Smit, 2014.
• Slide 35: screenshot of personal laptop & cell phone
• Slide 37: Vancouver Archives, http://searcharchives.vancouver.ca/power-lines-and-supporting-structure-in-lane-west-of-main-street-at-pender-street
• Slide 43: Screenshot of Watson User Modeling. Made from my own copy of their demo application, but also available publicly at http://watson-um-demo.mybluemix.net/


Image Credits (3)

• All graphs were created for the purpose of this presentation
• Logos on slide 38 are from the respective websites
• Images on slide 39:
  – Bottom left: Thalmic Labs via TechCrunch http://techcrunch.com/2013/06/05/thalmic-labs-raises-14-5m-to-make-the-myo-armband-the-next-big-thing-in-gesture-control/
  – Top left: Apple.com
  – Top right: fitbit.com
  – Bottom right: https://www.google.ca/glass/start/
• Slide 41: http://www.geekwire.com/2013/ibm-takes-watson-cloud/