The Social Science Data Revolution
Gary King
Institute for Quantitative Social Science, Harvard University
(People, Power, & CyberPolitics Workshop, MIT, 12/8/11)
Gary King (Harvard, IQSS) The Data Revolution 1 / 12
The Changing Evidence Base of Social Science Research
The Last 50 Years:
Survey research
Aggregate government statistics
In-depth studies of individual places, people, or events
The Next 50 Years: Spectacular increases in new data sources, due to...
Much more of the above
Shrinking computers & the growing Internet: data everywhere
The replication movement: academic data sharing (e.g., Dataverse)
Analogue-to-digital transformation of government records
Advances in statistical methods, informatics, & software
The march of quantification: through academia, professions, government, & commerce (Super Crunchers, The Numerati, Moneyball)
The end of the quantitative-qualitative divide
Examples of what’s now possible
Opinions of activists: ≈1,000 interviews → millions of political opinions in social media posts (1B every 4 days)
Exercise: a survey (“How many times did you exercise last week?”) → 500K people carrying cell phones with accelerometers
Social contacts: a survey (“Please tell me your 5 best friends”) → a continuous record of phone calls, emails, text messages, Bluetooth, social media connections, electronic address books
Economic development in developing countries: dubious or nonexistent governmental statistics → satellite images of human-generated light at night, or networks of roads and other infrastructure
Many, many more...
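The opinion example's “1B every 4 days” is easier to picture as a rate. A quick back-of-envelope in Python, treating that figure as a steady rate (our assumption, not a claim from the slide) and using the ≈1,000 interviews on the same line as the survey benchmark:

```python
# Back-of-envelope scale check using the slide's figures.
# Assumption: posts arrive at a steady rate over the 4-day window.
POSTS_PER_PERIOD = 1_000_000_000   # "1B every 4 days"
PERIOD_DAYS = 4
SECONDS_PER_DAY = 24 * 60 * 60
SURVEY_INTERVIEWS = 1_000          # "≈1,000 interviews"

posts_per_day = POSTS_PER_PERIOD / PERIOD_DAYS
posts_per_second = posts_per_day / SECONDS_PER_DAY
# How many survey-sized samples' worth of posts arrive each day
surveys_per_day = posts_per_day / SURVEY_INTERVIEWS

print(f"{posts_per_day:,.0f} posts/day")             # 250,000,000 posts/day
print(f"{posts_per_second:,.0f} posts/second")       # 2,894 posts/second
print(f"{surveys_per_day:,.0f} surveys' worth/day")  # 250,000 surveys' worth/day
```

In other words, one day of posts holds roughly a quarter of a million survey-sized samples, which is why reading them by hand is not an option.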
One Example of Automated Text Analysis
How to Read Billions of Social Media Posts
Daniel Hopkins and Gary King. “A Method of Automated Nonparametric Content Analysis for Social Science.” AJPS 54 (2010): 229-247.
1. Downloaded & analyzed all English-language blog posts every day.
(We learned: The university is a research, not a production, environment!)
2. Commercialized in 2008: Crimson Hexagon (CH)
3. CH collects all social media posts and runs huge servers with our methods
4. Crimson Hexagon Academic Grant Program to be announced soon
(I.e., it will be easy to do what I’ll describe today)
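The slide only cites the method. As we understand the cited Hopkins and King paper, its core idea is to estimate the proportion of posts in each category directly, instead of classifying posts one at a time, using the identity P(S) = P(S|D) P(D), where S is a post's word-stem profile (which stems it contains) and D its category. The Python/NumPy sketch below illustrates that identity only. It is not the authors' software or Crimson Hexagon's system; the function names, the least-squares-then-clip step (a simplification of a constrained fit), the random-subset parameters, and the toy data are all our own assumptions.

```python
import numpy as np


def profile_ids(X):
    """Encode each document's 0/1 word-stem vector (one row of X) as an integer profile id."""
    weights = 2 ** np.arange(X.shape[1])
    return X.astype(int) @ weights


def estimate_proportions(X_lab, y_lab, X_unlab, n_categories):
    """Estimate P(D) for the unlabeled documents from one set of word stems.

    P(S|D) comes from the hand-coded documents, P(S) from the unlabeled ones.
    Assumes every category appears at least once in the hand-coded set.
    """
    n_profiles = 2 ** X_lab.shape[1]
    ids_lab = profile_ids(X_lab)
    # Column j: distribution of stem profiles among hand-coded docs in category j
    P_S_given_D = np.empty((n_profiles, n_categories))
    for j in range(n_categories):
        counts = np.bincount(ids_lab[y_lab == j], minlength=n_profiles)
        P_S_given_D[:, j] = counts / counts.sum()
    # Distribution of stem profiles in the (much larger) unlabeled population
    P_S = np.bincount(profile_ids(X_unlab), minlength=n_profiles) / len(X_unlab)
    # Solve P(S) = P(S|D) P(D) by least squares, then clip negatives and renormalize
    p_D, *_ = np.linalg.lstsq(P_S_given_D, P_S, rcond=None)
    p_D = np.clip(p_D, 0.0, None)
    total = p_D.sum()
    return p_D / total if total > 0 else np.full(n_categories, 1.0 / n_categories)


def estimate_over_stem_subsets(X_lab, y_lab, X_unlab, n_categories,
                               subset_size=2, n_subsets=20, seed=0):
    """Average estimates over random subsets of stems so 2**subset_size profiles stay manageable."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_subsets):
        cols = rng.choice(X_lab.shape[1], size=subset_size, replace=False)
        estimates.append(estimate_proportions(X_lab[:, cols], y_lab,
                                              X_unlab[:, cols], n_categories))
    return np.mean(estimates, axis=0)


# Toy, made-up data: 2 word stems, 2 categories.
X_lab = np.array([[1, 0]] * 3 + [[0, 0]] + [[0, 1]] * 3 + [[1, 1]])
y_lab = np.array([0] * 4 + [1] * 4)
# Unlabeled set constructed so that 60% of documents come from category 0
X_unlab = np.array([[0, 0]] * 15 + [[1, 0]] * 45 + [[0, 1]] * 30 + [[1, 1]] * 10)

est = estimate_proportions(X_lab, y_lab, X_unlab, n_categories=2)
print(est)  # approximately [0.6, 0.4]
```

Note that no individual unlabeled post is ever assigned a category; only the aggregate proportions are estimated, which is what most social science questions about billions of posts actually need.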
Gary King (Harvard, IQSS) The Data Revolution 5 / 12
Example: Reactions to John Kerry’s Botched Joke
“You know, education, if you make the most of it . . . you can do well. If you don’t, you get stuck in Iraq.”
[Figure: “Affect Towards John Kerry, 2006−2007”, five panels, one per affect category (−2, −1, 0, 1, 2); y-axis: Proportion (0.0 to 0.6); x-axis: Sept to Mar]
Data and Quantities of Interest
Input Data:
All social media posts (or other documents)
Categories (e.g., posts about US candidates: extremely negative, negative, neutral, positive, extremely positive, no opinion, not a blog)
Example documents from each category
Quantities of interest
Computer science: individual document classification (spam filters, Google searches)
Social science: category proportions (% of email that is spam; % of negative comments about Obama; % of Egyptian posts supporting the regime; support for different solutions to the Euro crisis)
Estimation
Classifications add up to correct proportions only if they are accurate
High classification accuracy does not imply unbiased category proportions
70% classification accuracy counts as high, yet it can be a disaster for category proportions
New methodology: unbiased category proportions (even when classification accuracy is low)
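A minimal numerical sketch of the estimation point above, assuming a binary category (negative vs. not negative) and made-up rates: 20% of posts are truly negative, and the classifier is 70% accurate within each class. Summing individual classifications ("classify and count") then reports 38% negative. The second function applies the standard misclassification correction, which inverts known error rates; it is a simpler relative of, not a reproduction of, the Hopkins and King estimator, and it assumes the error rates measured on hand-coded example documents carry over unchanged to the full population. The function names are illustrative.

```python
def classify_and_count(true_share: float, tpr: float, fpr: float) -> float:
    """Expected share of documents a classifier labels positive,
    given the true positive share and the classifier's error rates."""
    # Correct hits on true positives plus false alarms on true negatives.
    return tpr * true_share + fpr * (1.0 - true_share)


def corrected_share(observed_share: float, tpr: float, fpr: float) -> float:
    """Invert observed = tpr * p + fpr * (1 - p) to recover the true share p.

    tpr (true positive rate) and fpr (false positive rate) are assumed to be
    estimated on hand-coded example documents and to hold in the population.
    """
    if tpr == fpr:
        raise ValueError("classifier is uninformative (tpr == fpr)")
    estimate = (observed_share - fpr) / (tpr - fpr)
    # With estimated inputs the raw value can fall outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers, chosen only for illustration.
true_share = 0.20       # 20% of posts are truly negative
tpr, fpr = 0.70, 0.30   # 70% accurate on negative and on non-negative posts

observed = classify_and_count(true_share, tpr, fpr)
print(f"classify and count: {observed:.2f}")                          # 0.38
print(f"corrected:          {corrected_share(observed, tpr, fpr):.2f}")  # 0.20
```

With these symmetric error rates the classify-and-count bias is 0.3 − 0.6p: zero only when the true share p is exactly 50%, and growing as p moves away from it, which is why a "highly accurate" classifier can nearly double a small category.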
What Else Can We Do with This?
You choose:
Data: country, documents, language
Categories: based on sentiment, topics, people, events, etc.
(often pre-censorship)
You provide: example documents for each category
Results: highly accurate category proportions over time
Qualifications:
Opinion is not sampled randomly, but there are no pop quizzes about unknown subjects
Measures the ongoing conversation: the classical notion of “activated public opinion”
Potential academic applications: very widespread
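The workflow on this slide (choose categories, supply example documents for each, read off category proportions over time) can be sketched for more than two categories. This is a hypothetical, standard-library-only sketch, not Crimson Hexagon's software and not the Hopkins and King estimator: it assumes some classifier has already labeled each day's posts, estimates that classifier's misclassification matrix M from hand-coded examples, and corrects each day's shares by solving observed = M · true. All data and function names below are made up for illustration, and unlike a constrained estimator it simply clips negative estimates to zero and renormalizes.

```python
from collections import Counter


def misclassification_matrix(labeled, categories):
    """m[j][k] = share of hand-coded documents truly in category k
    that the classifier assigns to category j (columns sum to 1)."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    totals = [0] * n
    for true_cat, predicted_cat in labeled:
        counts[index[predicted_cat]][index[true_cat]] += 1
        totals[index[true_cat]] += 1
    # Every category needs example documents, or this divides by zero.
    return [[counts[j][k] / totals[k] for k in range(n)] for j in range(n)]


def solve(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("misclassification matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def proportions_over_time(predictions_by_day, labeled, categories):
    """Correct each day's classify-and-count shares for misclassification."""
    m = misclassification_matrix(labeled, categories)
    estimates = {}
    for day, predictions in sorted(predictions_by_day.items()):
        counts = Counter(predictions)
        observed = [counts[c] / len(predictions) for c in categories]
        raw = solve(m, observed)               # solves m @ p = observed for p
        clipped = [max(0.0, p) for p in raw]   # crude fix for negative estimates
        total = sum(clipped)
        estimates[day] = [p / total for p in clipped]
    return estimates


categories = ["neg", "neu", "pos"]
# Hypothetical hand-coded examples: true category -> counts of classifier output.
confusion = {
    "neg": {"neg": 7, "neu": 2, "pos": 1},
    "neu": {"neg": 2, "neu": 7, "pos": 1},
    "pos": {"neg": 1, "neu": 2, "pos": 7},
}
labeled = [(t, p) for t, row in confusion.items() for p, k in row.items() for _ in range(k)]
# Hypothetical classifier output for two days of posts.
predictions_by_day = {
    "day1": ["neg"] * 43 + ["neu"] * 35 + ["pos"] * 22,
    "day2": ["neg"] * 25 + ["neu"] * 35 + ["pos"] * 40,
}
estimates = proportions_over_time(predictions_by_day, labeled, categories)
print({day: [round(p, 3) for p in props] for day, props in estimates.items()})
# {'day1': [0.5, 0.3, 0.2], 'day2': [0.2, 0.3, 0.5]}
```

Summing classifications on day 1 would report 43/35/22 percent; the correction recovers 50/30/20 only because the made-up error rates are exact by construction. With real data, M is itself estimated from a finite set of examples, so the corrected shares carry sampling error of their own.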
Gary King (Harvard, IQSS) The Data Revolution 8 / 12
Some New Data Types
1 Unstructured text: emails (the equivalent of one Library of Congress every 10 minutes), speeches, government reports, blogs, social media updates, web pages, newspapers, scholarly literature
2 Commercial activity: credit cards, sales data, real estate transactions, and product RFIDs
3 Geographic location: cell phones, Fast Lane or E-ZPass transponders, garage cameras
4 Health information: digital medical records, hospital admissions, Google/MS Health, and accelerometers and other devices being included in cell phones
5 Biological sciences: effectively becoming social sciences as genomics, proteomics, metabolomics, and brain imaging produce huge numbers of person-level variables
6 Satellite imagery: increasing in scope, resolution, and availability
7 Electoral activity: ballot images, precinct-level results, individual-level registration, primary participation, and campaign contributions
Gary King (Harvard, IQSS) The Data Revolution 9 / 12
Some More New Data Examples
8 Social media: Facebook, Twitter, social bookmarking, blog comments, product reviews, virtual worlds, game behavior, crowdsourcing
9 Web surfing artifacts: clicks, searches, and advertising click-throughs (Google collects 1 petabyte of data on human behavior every 72 minutes!)
10 Multiplayer web games and virtual worlds: billions of highly controlled experiments on human behavior
11 Government bureaucracies: moving from paper to electronic databases, increasing availability
12 Governmental policies: requiring more data collection, e.g., the “No Child Left Behind Act”; allowing randomized policy experiments; Obama pushing data distribution
13 Scholarly data: the replication movement in academia, led in part by political science, is massively increasing data sharing
Gary King (Harvard, IQSS) The Data Revolution 10 / 12
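To put the Google figure from item 9 in more familiar units, here is the arithmetic, assuming a decimal petabyte ($10^{15}$ bytes) and a constant collection rate; the slide itself gives only the figure of 1 petabyte per 72 minutes.

```latex
\[
\frac{1\ \text{PB}}{72\ \text{min}}
  = \frac{10^{15}\ \text{bytes}}{4320\ \text{s}}
  \approx 2.31 \times 10^{11}\ \text{bytes/s} \approx 231\ \text{GB/s},
\qquad
\frac{24 \times 60\ \text{min/day}}{72\ \text{min/PB}} = 20\ \text{PB/day}.
\]
```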
Enormous Emerging Opportunities for Social Scientists
For the first time: technologies, policies, data, and methods are making it feasible to attack some of the most vexing problems that afflict human society
A massive change from studying problems to understanding and solving problems
And then there’s you & me:
In legislatures, courts, academic departments, . . . , change comes from replacement, not conversion
Will we wait to be replaced? Or will we put in the effort to convert and learn how to use the new information?
Gary King (Harvard, IQSS) The Data Revolution 11 / 12
For more information
http://GKing.Harvard.edu
Gary King (Harvard, IQSS) The Data Revolution 12 / 12