how and why study big cultural data

Post on 17-Nov-2014

DESCRIPTION

Lev Manovich. How and why study big cultural data. Presentation at Data Mining and Visualization for the Humanities symposium, NYU, March 19, 2012. softwarestudies.com

TRANSCRIPT

How and why study big cultural data

Lev Manovich
manovich@ucsd.edu
softwarestudies.com

New York Times (November 16, 2010): “The next big idea in language, history and the arts? Data.”

NEH/NSF Digging into Data competition (2009): “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data—far more than they could read in a lifetime—what does that mean for research?”

Why study big cultural data?

1 study societies through the social media traces - social computing (but do we study society or only social media itself?)

2 more inclusive understanding of history and present (using much larger samples)

3 detect large-scale cultural patterns

4 the best way to follow global professionally produced digital culture; understand newly developed cultural fields (“X” design)

5 map cultural variability and diversity

Data: 3,724 18th century volumes, using 10,000 most frequent words (excluding proper nouns). Ted Underwood. The Differentiation of Literary and nonliterary diction, 1700-1900.

Growth of a global culture space after 1990: Cumulative number of new art biennales, 1895-2008.

6

modern (19th-20th centuries) social and cultural theory: describe what is similar (classes, structures, types) / statistics (reduction)

computational humanities and social science should focus on describing what is different / variability / diversity

not “from data to knowledge” but from (incomplete) knowledge to actual cultural data

We are no longer interested in the conformity of an individual to an ideal type; we are now interested in the relation of an individual to the other individuals with which it interacts... Relations will be more important than categories; functions, which are variable, will be more important than purposes; transitions will be more important than boundaries; sequences will be more important than hierarchies.

Louis Menand on Darwin, 2001.

Visualization: Thinking without “large” categories

“The ontological status of assemblages, large and small, is always that of unique, singular individuals.” “Unlike taxonomic essentialism in which genus, species and individuals are separate ontological categories, the ontology of assemblages is flat since it contains nothing but differently scaled individual singularities.” Manuel DeLanda. A New Philosophy of Society.

Bruno Latour:

The “whole” is now nothing more than a provisional visualization which can be modified and reversed at will, by moving back to the individual components, and then looking for yet other tools to regroup the same elements into alternative assemblages.

How to study big cultural data?

how to explore massive visual collections (exploratory media analysis)?

which data analysis and visualization techniques are appropriate for non-technical users? How to democratize data analysis?

Our approach:

media visualization (visualizing media directly rather than only using abstract infovis language)

visualizing large non-visual data using abstraction

media visualization: showing visual data directly

Every cover of Time magazine, 1923-2009 (4535 images). X-axis = publication date. Y-axis = saturation mean.

our media visualization software on 287 megapixel display (image: 1 million manga pages)

our software on new display wall with thin bezels (data: 4535 Time magazine covers)

Our methods:

1. media visualization using existing metadata - show complete collection

2. media visualization using existing metadata - use samples to better reveal patterns

3. digital image processing + media visualization (use simple image features which have direct perceptual meaning - and gradually introduce humanists to image processing)
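Method 3 can be illustrated with a minimal Python sketch. It assumes each image has already been decoded into a list of (r, g, b) pixel tuples; the function name `image_features` and the particular set of features are illustrative assumptions, not the software shown in this talk. Only the standard library is used.

```python
import colorsys
from statistics import mean, median


def image_features(pixels):
    """Return simple perceptual features for one image.

    `pixels` is a list of (r, g, b) tuples with channels in 0..255
    (a hypothetical input format chosen for this sketch).
    """
    # Convert every pixel to hue, saturation, value, each in 0..1.
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hues = [h for h, _, _ in hsv]
    saturations = [s for _, s, _ in hsv]
    values = [v for _, _, v in hsv]
    return {
        # Plain median; it ignores the fact that hue wraps around at 360.
        "median_hue": median(hues) * 360,
        "mean_saturation": mean(saturations),
        "mean_brightness": mean(values),
    }


if __name__ == "__main__":
    # Three pure-colour pixels: red, green, blue.
    print(image_features([(255, 0, 0), (0, 255, 0), (0, 0, 255)]))
```

Each feature is a single number per image with a direct perceptual reading (how colourful, how bright, which dominant hue), so it can serve as an axis in a plot without further statistical training.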

1. media visualization / existing metadata: montage
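A montage of this kind can be sketched as a grid layout driven by one metadata field. The code below is a hedged illustration only: `montage_layout`, `montage_size`, the record fields `id` and `year`, and the cell dimensions are all assumptions made for the example, and drawing the thumbnails is left to whatever imaging library is at hand.

```python
import math


def montage_layout(items, sort_key, columns, cell_w, cell_h):
    """Order items by a metadata field and assign each one a grid cell.

    Returns a list of (item, x, y), where (x, y) is the top-left pixel
    corner of the cell, filled left to right and top to bottom.
    """
    ordered = sorted(items, key=sort_key)
    placed = []
    for index, item in enumerate(ordered):
        row, col = divmod(index, columns)
        placed.append((item, col * cell_w, row * cell_h))
    return placed


def montage_size(count, columns, cell_w, cell_h):
    """Pixel size of the canvas needed to hold `count` cells."""
    rows = math.ceil(count / columns)
    return columns * cell_w, rows * cell_h


if __name__ == "__main__":
    covers = [
        {"id": "b", "year": 1950},
        {"id": "a", "year": 1923},
        {"id": "c", "year": 2009},
    ]
    for item, x, y in montage_layout(covers, lambda c: c["year"], 2, 100, 150):
        print(item["id"], x, y)
    print(montage_size(len(covers), 2, 100, 150))
```

Because every item gets a cell, the complete collection stays visible; only the sort order changes when a different metadata field is chosen.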

2. media visualization / existing metadata / sample

Image plots of selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis = median hue. Megan O’Rourke, 2012.
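The placement step behind an image plot like this one can be sketched as a linear mapping from two numeric attributes to canvas coordinates. This is an assumption-laden illustration: `image_plot_positions`, the field names, and the sample values are invented for the example and are not taken from the project's data or code.

```python
def image_plot_positions(records, x_field, y_field, width, height):
    """Map two numeric attributes of each record to canvas coordinates.

    Returns a list of (record, x, y). The smallest attribute value lands
    on one edge of the canvas and the largest on the opposite edge.
    """
    xs = [r[x_field] for r in records]
    ys = [r[y_field] for r in records]

    def scale(value, lo, hi, size):
        if hi == lo:
            return size / 2  # all values equal: centre them
        return (value - lo) / (hi - lo) * size

    positions = []
    for r in records:
        x = scale(r[x_field], min(xs), max(xs), width)
        # Screen y grows downward, so flip to put high values at the top.
        y = height - scale(r[y_field], min(ys), max(ys), height)
        positions.append((r, x, y))
    return positions


if __name__ == "__main__":
    paintings = [  # made-up feature values for illustration
        {"name": "p1", "mean_saturation": 0.25, "median_hue": 30},
        {"name": "p2", "mean_saturation": 0.50, "median_hue": 120},
        {"name": "p3", "mean_saturation": 0.75, "median_hue": 210},
    ]
    for rec, x, y in image_plot_positions(
        paintings, "mean_saturation", "median_hue", 1000, 500
    ):
        print(rec["name"], x, y)
```

Drawing each image's thumbnail at its (x, y) position, instead of a dot, gives the image plot.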

3. digital image processing + media visualization

Advantages:

replacing discrete categories with continuous attributes

1. from timelines to curves

2. better represent analog cultural attributes

3. understand cultural landscapes (fuzzy / overlapping / hard clusters?)

4. visualize cultural variability

5. discover new groupings

1. from timelines to curves

2. better represent analog attributes

3. our maps of cultural landscapes reveal fuzzy/overlapping clusters - rather than discrete categories with hard boundaries

4. visualize cultural variability

5. discover new groupings

Studying large cultural data challenges our existing theoretical concepts and assumptions

example: what is “style”?

one million manga pages

single short manga series (>1000 pages)

776 Vincent van Gogh paintings

Selected current projects:

7000 year old stone arrowheads (with UCSD anthropologist and CS postdoc at University of Washington)

comparing Art Now & Graphic design Flickr groups (340,000 images) (with CS collaborator from Lawrence Berkeley National Laboratory)

One million images (+ metadata) from deviantArt (with an art historian / DH collaborator from Netherlands Academy of Arts and Sciences)

4.7 million newspaper pages from Library of Congress (UCSD undergraduate students)

virtual world / game analytics (NSF EAGER, with UCSD Experimental Games Lab)

SEASR tools and workflows for working with image and video data (with NCSA at University of Illinois, Urbana-Champaign)

Conclusion: Computational humanities vs. digital humanities

“The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven 'computational social science' has been much slower. Leading journals in economics, sociology, and political science show little evidence of this field. But computational social science is occurring in Internet companies such as Google and Yahoo, and in government agencies such as the U.S. National Security Agency.” “Computational Social Science.” Science, vol. 323, no. 6, February 2009.

Digital humanities: scholars mostly work with archives of digitized historical cultural materials created by libraries and universities with funding from NEH and other institutions.

Computational humanities: Analyzing massive amounts of cultural content and people's conversations, opinions, and cultural activities online - personal and professional web sites, general and specialized social media networks and sites. This data offers us unprecedented opportunities to understand cultural processes and their dynamics and to develop new concepts and models which can also be used to better understand the past.

Current players in computational humanities:

- Google, Facebook, YouTube, Bluefin Labs, The Echo Nest, and many other companies which analyze social media signals (blogs, Twitter, etc.) and the content of media on social networks.

- Computer scientists who are working with this data.

manovich@ucsd.edu

www.softwarestudies.com

Appendix:

visualizing video collections

use media visualization with a set of keyframes

automatic selection of key frames (for example, using free shot detection software)
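The slide does not name the shot detection software, so the following is only a stand-in sketch of one common idea: compare grayscale histograms of consecutive frames and start a new shot when they differ sharply. It assumes frames are already decoded into flat lists of 0..255 gray values; `histogram`, `select_keyframes`, the bin count, and the threshold are all assumptions for illustration.

```python
def histogram(frame, bins=8):
    """Normalized grayscale histogram of one frame.

    `frame` is a flat list of gray values in 0..255.
    """
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def select_keyframes(frames, threshold=0.5):
    """Return the index of the first frame of each detected shot.

    A new shot is assumed to start when the L1 distance between the
    histograms of two consecutive frames (range 0..2) exceeds `threshold`.
    """
    if not frames:
        return []
    keyframes = [0]
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            keyframes.append(index)
        previous = current
    return keyframes


if __name__ == "__main__":
    dark = [10] * 16     # toy 4x4 dark frame
    bright = [240] * 16  # toy 4x4 bright frame
    print(select_keyframes([dark, dark, bright, bright, dark]))
```

The selected frames can then be fed to the same media visualization techniques used for still images.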
