THE NEWSLETTER OF BROWN UNIVERSITY LIBRARY SPRING/SUMMER 2012

Dealing with Data


Introduction


Digital data can be found everywhere, rushing around us in a virtual flood emanating from all manner of sensors and computers, representing (and tracking…) every sector of the economy and government, every organization, and every user of digital technology. Likewise, scholarly practice across the disciplines is becoming increasingly data- and computation-intensive.

By capturing, analyzing, and extracting previously unknown bits of information, we are empowered to use data to identify needs, improve services, provide fresh insights into research and education, better understand ourselves and the world around us, and even predict the future. Data drives innovation, productivity, and growth as well as new modes of competition and evaluation. This so-called “industrial revolution of data” is about exploration and about using our data-based findings to make smarter decisions faster.

The idea of learning from data has been around for a long time, and today it is both people and computers who collect data, identify patterns, and draw meaning from their results. These results can be used to formulate or disprove hypotheses and raise previously unimagined questions. While data is plentiful, the ability to garner wisdom from it is still scarce. To realize the potential of data we must be able to effectively capture, aggregate, describe, and search data and then provide access to its diverse content in ways that enable users to easily visualize, analyze, link, and share their findings and ultimately stimulate the growth of new knowledge.

As part of its mission to support the changing needs of teaching, learning, and research, the Brown University Library is evolving from a place used primarily for warehousing and consuming information to a physical and virtual space for experimentation, production, and processing of new knowledge. The goals of the Library are to provide students and faculty with the resources, tools, and support to access and interact with various forms of recorded knowledge in creative ways, to produce and re-produce knowledge with evolving techniques and methods, and to enable research and scholarship in the context of our data-driven world. To support these goals, the Library has retrained existing staff and added new areas of expertise, such as data management, data analysis, digital humanities, information design, data description and mark-up, repository and preservation expertise, and data visualization.

The Library is also actively engaged with others at Brown and beyond in building a cyberinfrastructure that will both sustain and promote the collection, study, and use of data across the disciplines. Broadly speaking, cyberinfrastructure consists of computing systems, data storage systems, advanced instruments, and data repositories, along with visualization environments and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs not otherwise possible.

Dealing with Data provides a bird’s-eye view of how the Library is responding to this growing mandate and incorporates comments from three of Brown’s preeminent faculty as they face data challenges in their areas of teaching and research.

Harriette Hemmasi
Joukowsky Family University Librarian

Cover: From Data to the Real World and Reverse

by Bruce Boucek, Social Sciences Data Librarian

Table of Contents

Introduction (page 2)
Harriette Hemmasi, Joukowsky Family University Librarian

Brown’s Digital Repository (page 3)
Andy Ashton, Director of Digital Technologies

Metadata (page 4)
Catherine Busselen, Music Catalog/Metadata Librarian
Ann Caldwell, Head, Imaging and Metadata Services, Digital Technologies

Reusing Research: Information Design and Scholarly Work (page 5)
Jean Bauer, Digital Humanities Librarian

Digital Texts, Under the Hood (page 6)
Julia Flanders, Director, Women Writers Project

Planning for Big Data (page 7)
Amanda Rinehart, E-Science Librarian

Enabling Data Visualization (page 8)
Harriette Hemmasi, Joukowsky Family University Librarian

Mining for Poets: Data and Literary Arts (page 9)
John Cayley, Professor of Literary Arts

Preserving the Past for the Future (page 10)
Sue Alcock, Director of the Joukowsky Institute for Archaeology and the Ancient World; Joukowsky Family Professor in Archaeology; Professor of Classics

Data-Driven Education (page 11)
Jan S. Hesthaven, Professor of Applied Mathematics; Director of the Center for Computation and Visualization (CCV); Deputy Director of the Institute for Computational and Experimental Research in Mathematics (ICERM)


Data comes in many forms, from images of library collections or field work to data produced by scholars through interactions with computer software. Brown University Library is building a digital archive to store and preserve a variety of data produced by and used in research at Brown. This system, known as the Brown Digital Repository (BDR), is designed to provide the foundation for the Library’s work in curating, accessing, and preserving digital information.

What does the BDR do?

The BDR is an online searchable database, library catalog, and cloud storage system that supports multiple functions. It serves as a forum for scholars at Brown to upload, tag, and share their work. It allows librarians to catalog and load entire collections of digitized library content. It enables computer programmers to write their own programs to store and retrieve data. It lets web page creators publish data to their web pages, knowing that their data is stored in a safe and permanent repository.

Data in the BDR follows current standards for managing digital data, set by the Library of Congress and other national organizations. These standards help ensure that we will be able to handle the ever-growing body of data produced by the research enterprise.

The BDR offers secure and flexible options for controlling access to stored data. Items may, for example, be open to the public for download, or open only to members of the Brown community, to certain departments, or to specific individuals. Researchers from other institutions may also be given access to BDR materials if their research partners are based at Brown. Items that are open to the public are easily found through Google and other internet search engines.
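The access options above amount to a small set of rules. As a minimal sketch, assuming four hypothetical access levels (`PUBLIC`, `BROWN`, `DEPARTMENTS`, `INDIVIDUALS`) and invented `User` and `Item` classes, such a check might look like the Python below. It illustrates the idea only and is not the BDR's actual data model or API.

```python
from dataclasses import dataclass, field

# Hypothetical access levels modeled on the options described above.
PUBLIC = "public"            # anyone, including anonymous visitors
BROWN = "brown"              # any member of the Brown community
DEPARTMENTS = "departments"  # members of listed departments only
INDIVIDUALS = "individuals"  # named individuals only (may include outside partners)


@dataclass
class User:
    username: str
    is_brown_member: bool = False
    departments: set = field(default_factory=set)


@dataclass
class Item:
    title: str
    level: str
    allowed_departments: set = field(default_factory=set)
    allowed_users: set = field(default_factory=set)


def can_view(item, user):
    """Return True if user (None for an anonymous visitor) may view item."""
    if item.level == PUBLIC:
        return True
    if user is None:
        return False
    if item.level == BROWN:
        return user.is_brown_member
    if item.level == DEPARTMENTS:
        return bool(item.allowed_departments & user.departments)
    if item.level == INDIVIDUALS:
        return user.username in item.allowed_users
    return False  # unknown level: deny by default


# A researcher at another institution, granted access by name.
partner = User("visiting_researcher")
field_notes = Item("Field notes", INDIVIDUALS, allowed_users={"visiting_researcher"})
print(can_view(field_notes, partner))          # True
print(can_view(field_notes, User("someone")))  # False
```

The sketch denies access whenever it sees an unrecognized level, which is a deliberately cautious choice. A real repository would also need to authenticate users, and that is outside the scope of this example.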

What is in the BDR?

The BDR is a stable and flexible online repository with a growing capacity to store and provide access to all kinds of projects at Brown. Examples:

Dissertations: The BDR stores and publishes the dissertations submitted to the Graduate School. As of January 2012, the BDR contains approximately 650 dissertations going back to 2008. Previous years’ dissertations are in the process of being collected from other sources and added to the BDR.

Faculty research materials: Faculty from a number of departments have developed digital projects with the support of the Library. Many of these projects involve creating and publishing digital materials that act as primary sources for an online publication, search engine, or research tool. The BDR is the home for these materials, and will ensure that these valuable intellectual objects remain findable and accessible, even if particular websites, publications, or technologies go offline or become obsolete.

Administrative materials: The BDR provides a central repository for some types of administrative materials that have enduring value for the University. Just as the physical archives organize and maintain the paper-based record of the University’s work, the BDR provides for long-term storage and access to selected digital records.

Digital library collections: Many collections from the John Hay Library are digitized and published online in a searchable database of images, audio, video, and text. The Library’s digital collections contain more than 65,000 items (as of January 2012), and more than 10,000 items are added every

Continued on page 10

Brown’s Digital Repository
Andy Ashton, Director of Digital Technologies

Above right: Complex objects.
Below: High-resolution manuscripts in the repository.


Metadata
Catherine Busselen, Music Catalog/Metadata Librarian
Ann Caldwell, Head, Imaging and Metadata Services, Digital Technologies

Following a relaxing vacation, you gather all of the photographs that you’ve taken on your new digital camera and decide to put them online for your extended family to see. You examine the services that offer this capability, choose one, upload your photos, and realize that you need to fill in information about the photos: who is in each photo, where and when it was taken, and even information that will let you group one photo with other similar ones. All of this information is metadata. From digital photography to online shopping to renewing your driver’s license online, we live in a world that relies on metadata.

So, what is this thing called metadata? Metadata aids online searching (who created this?), records ownership (which library owns this?), describes a digital object (what does it look like?), and manages its structure (page 3 follows page 2, which follows page 1). Metadata can be broken down into three main categories: descriptive, administrative, and structural. It can be gathered automatically or created manually.

Descriptive metadata is just that: descriptive. It facilitates discovery and identification of the data, e.g., creator, title, date of publication, subject. Structural metadata provides information on how the data has been put together, e.g., data format, media type, and the hardware and software needed to render the data. Finally, administrative metadata facilitates the management of the data: when and where it was created, file type, rights management data, and preservation or curation management data.

Let’s return to the vacation example to illustrate the three types of metadata. You begin by completing the descriptive metadata. You created the photographs, so you may want to enter your own name. You will certainly want to enter the places depicted in the photographs, and perhaps the dates, since this was a lengthy vacation. You may want to enter keywords, brief descriptors that will aid in retrieval of the images, such as “beach” or “rainy day.” When you notify your family that your photos are online, you may also include terms that group the images into larger categories and subsets.

There are at least two flavors of administrative metadata. In the case of your photos, there may be a section of “rights” metadata. If there are certain photographs that you don’t want anyone but your family to see, you can code them “limited access.” Others could be coded for everyone to see.

During image capture, digital cameras record a wealth of administrative and structural metadata in the photo header. This may include the make and model of the camera (structural), shutter speed (structural), the date and time the photo was taken (administrative), GPS geolocation information (administrative), and so on. Continuing with your vacation scenario, if you were to open your images in certain types of software (Adobe Photoshop, for example), you could capture the extensive metadata from the photo header, add it to your own descriptive metadata, and then upload everything to the image service. This information would let you and your family narrow searches even further. If, in the future, you were to buy a different brand of camera, you could even narrow your search by camera.
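The following Python sketch makes the scenario concrete. It stores each photo's metadata in the three groups the article describes, merges values taken from the camera header with the descriptive metadata you type in, and then narrows a search by keyword and by camera. The field names, file names, and camera models are invented for illustration. The sketch also assumes the header values have already been extracted; real cameras typically store them as EXIF data, which is not parsed here.

```python
# Hypothetical field names and values; header values are assumed to be
# already extracted (real cameras usually store them as EXIF data).

def build_record(filename, header, descriptive, rights="everyone"):
    """Merge camera-header metadata with descriptive metadata typed in by
    the photographer, grouped the way the article classifies it."""
    return {
        "file": filename,
        "descriptive": dict(descriptive),        # creator, place, keywords
        "structural": {
            "camera": header.get("camera"),      # make and model
            "shutter_speed": header.get("shutter_speed"),
        },
        "administrative": {
            "taken": header.get("taken"),        # date and time
            "gps": header.get("gps"),
            "rights": rights,                    # e.g. "limited access"
        },
    }


def search(records, keyword=None, camera=None):
    """Return file names matching an optional keyword and camera model."""
    hits = []
    for record in records:
        if keyword and keyword not in record["descriptive"].get("keywords", []):
            continue
        if camera and record["structural"]["camera"] != camera:
            continue
        hits.append(record["file"])
    return hits


photos = [
    build_record("IMG_001.jpg",
                 {"camera": "ExampleCam X100", "shutter_speed": "1/250",
                  "taken": "2012-07-04T10:15"},
                 {"creator": "You", "place": "Seaside", "keywords": ["beach"]}),
    build_record("IMG_002.jpg",
                 {"camera": "OtherCam Z5", "shutter_speed": "1/60",
                  "taken": "2012-07-06T16:40"},
                 {"creator": "You", "place": "Seaside", "keywords": ["rainy day"]},
                 rights="limited access"),
    build_record("IMG_003.jpg",
                 {"camera": "ExampleCam X100", "shutter_speed": "1/125",
                  "taken": "2012-07-07T09:05"},
                 {"creator": "You", "place": "Harbor", "keywords": ["beach", "rainy day"]}),
]

print(search(photos, keyword="beach"))                                # ['IMG_001.jpg', 'IMG_003.jpg']
print(search(photos, keyword="rainy day", camera="ExampleCam X100"))  # ['IMG_003.jpg']
```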

So why is metadata important? It makes online searching fast, helps establish authenticity, and enables data sharing and reuse. Looking back at the digital photos, we can imagine a further use of the metadata we captured and recorded. Worried that someone might try to take credit for your photograph? The metadata attached to your image records that it was created on a specific date, at a specific time and place, with a specific make and model of camera, and you can also assign your own name as the creator, along with a copyright date.

Just as metadata can be found all around us in our personal lives, metadata is found in nearly all aspects of the library. Here are some examples:

First, there is the online catalog. The metadata in our catalog records relies on a combination of cataloging rules, a metadata schema called MARC, and a variety of controlled vocabularies. This allows us to search Josiah by particular fields (title, author, subject), or by any combination of these fields, which yields more precise results than a general keyword search. Additionally, because our records and those of libraries around the world follow shared standards, a single search can span many libraries’ holdings in services such as WorldCat and Borrow Direct.
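A toy example shows why fielded searching can be more precise. The Python sketch below uses three invented records with simplified field names loosely inspired by MARC (`title`, `author`, `subject`). Actual MARC records use numbered fields and subfields, and this is not how Josiah stores or queries them.

```python
# Invented catalog records; real MARC records use numbered fields and subfields.
RECORDS = [
    {"id": 1, "title": "Gardens of Providence", "author": "Smith, Jane",
     "subject": ["Gardening", "Rhode Island"]},
    {"id": 2, "title": "The Smith and His Forge", "author": "Lee, Dana",
     "subject": ["Blacksmithing"]},
    {"id": 3, "title": "Colonial Trades", "author": "Park, Min",
     "subject": ["Metalworkers", "Rhode Island"]},
]
FIELDS = ("title", "author", "subject")


def _text(value):
    """Lowercase a field value, joining lists such as multiple subjects."""
    if isinstance(value, list):
        value = " ".join(value)
    return value.lower()


def keyword_search(records, term):
    """Match term anywhere in any field, like a general keyword search."""
    term = term.lower()
    return [r["id"] for r in records
            if any(term in _text(r[f]) for f in FIELDS)]


def fielded_search(records, **criteria):
    """Match each term only within its own field, e.g. author="smith"."""
    return [r["id"] for r in records
            if all(term.lower() in _text(r[f]) for f, term in criteria.items())]


print(keyword_search(RECORDS, "smith"))                                # [1, 2]
print(fielded_search(RECORDS, author="smith"))                         # [1]
print(fielded_search(RECORDS, author="smith", subject="rhode island")) # [1]
```

A keyword search for “smith” also returns the book whose title and subject mention smiths. The author-field search returns only the book by an author named Smith.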

A student from Hong Kong creates metadata by taking a photograph of exhibited objects in the John Hay Library.


As more and more scholars work with born-digital or digitized materials, new possibilities arise, such as the chance to combine information developed by two different researchers to answer questions that neither researcher could have tackled alone. The Brown University Library partners with scholars to curate (and sometimes even publish) their primary research data during the research process. This intervention helps scholars in their own work and offers glimpses of how that work might live on in new forms.

For example, just last semester I helped a student design a database for her senior thesis. She is researching the use of Latin inscriptions in English churches and needed a way to organize her material so that she could keep track of the churches, the text of the inscriptions, and the people mentioned in them. Using the database we designed, the student can now analyze her inscriptions in new ways, including tracing families through the generations as they participated (and were buried) in their local church. If she goes on to graduate school, her data will be free of duplicates and errors and easily enriched with further research.
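One plausible shape for such a database is a set of linked tables for churches, people, inscriptions, and the people each inscription mentions. The sketch below uses Python's built-in `sqlite3` module. The table names, the sample church, and the Hale family are hypothetical, and the father-to-son link is a deliberate simplification; this is not the student's actual design. Each church and person is stored once and linked by ID, which keeps the data free of duplicates. A recursive query can then follow a family through the generations.

```python
import sqlite3

SCHEMA = """
CREATE TABLE church (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    parish TEXT NOT NULL,
    UNIQUE (name, parish)                      -- each church entered only once
);
CREATE TABLE person (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    father_id INTEGER REFERENCES person(id)    -- links one generation to the next
);
CREATE TABLE inscription (
    id         INTEGER PRIMARY KEY,
    church_id  INTEGER NOT NULL REFERENCES church(id),
    latin_text TEXT NOT NULL,
    year       INTEGER
);
-- One inscription can name several people, and one person can appear
-- in several inscriptions.
CREATE TABLE mention (
    inscription_id INTEGER NOT NULL REFERENCES inscription(id),
    person_id      INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (inscription_id, person_id)
);
"""

FAMILY_LINE = """
WITH RECURSIVE line(id, name, generation) AS (
    SELECT id, name, 1 FROM person WHERE name = ?
    UNION ALL
    SELECT p.id, p.name, line.generation + 1
    FROM person AS p JOIN line ON p.father_id = line.id
)
SELECT line.generation, line.name, i.year, c.name
FROM line
JOIN mention AS m ON m.person_id = line.id
JOIN inscription AS i ON i.id = m.inscription_id
JOIN church AS c ON c.id = i.church_id
ORDER BY line.generation, i.year
"""


def build_demo_db():
    """Create an in-memory database with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO church VALUES (1, 'St. Mary', 'Exampleton')")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
        (1, "Thomas Hale", None),
        (2, "John Hale", 1),
        (3, "William Hale", 2),
    ])
    conn.executemany("INSERT INTO inscription VALUES (?, ?, ?, ?)", [
        (1, 1, "Hic iacet Thomas Hale", 1610),
        (2, 1, "Hic iacet Johannes Hale", 1652),
        (3, 1, "Hic iacet Gulielmus Hale", 1690),
    ])
    conn.executemany("INSERT INTO mention VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
    return conn


def trace_family(conn, ancestor):
    """Follow father-to-son links from ancestor and list where each is commemorated."""
    return conn.execute(FAMILY_LINE, (ancestor,)).fetchall()


conn = build_demo_db()
for row in trace_family(conn, "Thomas Hale"):
    print(row)  # (generation, name, year, church)
```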

On a larger scale, the Modernist Journals Project (http://dl.lib.brown.edu/mjp/), based at Brown and the University of Tulsa, digitizes literary magazines from the early 20th century and makes the texts available on the web. The MJP started in 1995, and in fall 2011, with assistance from the Brown University Library, it launched a virtual MJP Lab (http://dev.stg.brown.edu/projects/mjplab/). The creators state up front that “The Lab is dedicated to experimentation — playing with the MJP data, and drawing new patterns and knowledge out of its journal files.” The MJP Lab hosts a growing number of visualizations designed to answer research questions about the corpus: Which authors published in which journals? Are there any Modernist journals that had more female authors than male? More importantly, all the data files used to create these visualizations are available for download and heavily documented, allowing other researchers to ask their own questions.
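Once the data are structured, questions like these become counting exercises. The Python sketch below assumes a hypothetical CSV layout with one row per contribution (`journal`, `author`, `gender`). The column names and rows are invented and do not reflect the actual format or contents of the MJP Lab data files.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented rows in a hypothetical one-row-per-contribution layout.
SAMPLE_CSV = """journal,author,gender
Journal A,Author 1,female
Journal A,Author 2,male
Journal A,Author 3,female
Journal B,Author 2,male
Journal B,Author 4,male
Journal B,Author 1,female
"""


def load(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def journals_by_author(rows):
    """Which authors published in which journals?"""
    result = defaultdict(set)
    for row in rows:
        result[row["author"]].add(row["journal"])
    return {author: sorted(journals) for author, journals in result.items()}


def journals_with_more_women(rows):
    """Journals whose distinct female contributors outnumber male ones."""
    contributors = defaultdict(set)  # journal -> {(author, gender), ...}
    for row in rows:
        contributors[row["journal"]].add((row["author"], row["gender"]))
    hits = []
    for journal, people in sorted(contributors.items()):
        counts = Counter(gender for _, gender in people)
        if counts["female"] > counts["male"]:
            hits.append(journal)
    return hits


rows = load(SAMPLE_CSV)
print(journals_by_author(rows)["Author 1"])  # ['Journal A', 'Journal B']
print(journals_with_more_women(rows))        # ['Journal A']
```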

Both of these projects are examples of good information design: the practice of structuring data so that it can be used efficiently and effectively, both now and in the future. One can even imagine the two projects telling a new story if both found a home in the Brown Digital Repository. Imagine you are reading a poem in one of the Modernist journals and it references an old church in the English countryside. You see a link to a senior thesis. You click through the reference into the student’s database (which was not lost when her laptop died shortly after graduation, because a copy was entrusted to the Library) and realize the poet’s great-grandmother was buried in that very church.

It could happen. Data no longer have to exist in isolated corners of individual work spaces. Sharing data opens up new possibilities for connection and discovery. But without careful attention to how we design and preserve our data structures, the chance of making that kind of connection becomes vanishingly small.

Reusing Research: Information Design and Scholarly Work
Jean Bauer, Digital Humanities Librarian

Databases are a common way of organizing and accessing research data.


Digital Texts, Under the Hood
Julia Flanders, Director, Women Writers Project

What do we readers know about the digital collections we use in our work as students, teachers, and researchers? When we hold a book in our hands, we can see how it is made, and we can learn even more by taking it apart to understand the stitching, folding, gluing, and other production details that affect how it behaves as a reading device. Digital texts are equally complex. Just as a book’s construction affects its cultural and textual meaning, the hidden construction of a digital text can have a huge impact on how we read and use it.

Take a simple example: a letter from Brown’s online collection of historical documents. Viewed as an image, this document conveys a tremendous amount of information through material details such as handwriting, paper, and folding. When we create a digital transcription of the letter, we gain access to another layer of information. The text is now searchable, and it is also more readable and approachable for those unfamiliar with 18th-century handwriting.

Enriching that transcription is yet another layer of information, about the structure and meaning of the letter. For instance, early in the letter, the author deleted the word “dispatch” and substituted “Haste,” perhaps for greater clarity or emphasis. We can represent this revision in the digital transcription using special codes, or tags:

<subst><del>dispatch</del><add>Haste</add></subst>

This data makes it possible to display the revision itself, showing both the deleted “dispatch” and its replacement. If we wanted, we could instead suppress “dispatch” altogether (showing the final revised version of the document), or suppress “Haste” to focus solely on the author’s first thoughts. More generally, this kind of data provides more flexible ways to control the layout of the electronic publication: the fonts, the formatting, and the information that is shown or hidden. But there are even more interesting ways to use the encoding. What if we wanted to study the revision of this document, together with others in the collection, to learn what kinds of words and phrases were most often deleted? Where do the authors have second thoughts? What words do they substitute? In a collection of poetry, like the Walt Whitman Archive, this kind of data can reveal the poet’s mind at work, trying out words and moving them around. In a collection of political writings, like the John Adams Papers at the Massachusetts Historical Society (published online by the University of Virginia Press), we might be able to see the evolution of policies and political strategy, with far-reaching consequences for our understanding of the making of history.

This kind of markup, which follows international standards set by the World Wide Web Consortium and the Text Encoding Initiative (http://www.tei-c.org), underlies many of the most important scholarly digital resources now used in teaching and research. Brown University has been a strong early participant in the development of these standards, which also help ensure that digital data can be long-lasting as well as provocative and illuminating.

Where can you go to explore further? The Brown University Library supports numerous digital humanities projects that demonstrate the potential of this kind of rich data. They include the historical archive of documents from the Committee on Slavery and Justice (http://dl.lib.brown.edu/slaveryandjustice/), Shadows at Dawn (http://brown.edu/Research/Aravaipa/), the Modernist Journals Project (http://dev.stg.brown.edu/projects/mjplab/), Inscriptions of Israel-Palestine (http://www.stg.brown.edu/projects/Inscriptions/), Digital Humanities Quarterly (http://www.digitalhumanities.org/dhq/), and the Women Writers Project (http://www.wwp.brown.edu). Through the Women Writers Project, the Library’s Center for Digital Scholarship also offers workshops on text encoding and other digital topics, which have attracted participants from Brown and from other colleges and universities across North America and Europe. To find out more, please visit us at http://library.brown.edu/cds/, or check the workshop schedule at http://www.wwp.brown.edu/outreach/seminars/.

Right: Screen shot of a page image of a Benson letter, from the repository of historical documents accompanying the report of the Brown University Steering Committee on Slavery and Justice.
Below: Sample XML encoding of Benson manuscript excerpt, using the TEI Guidelines.
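The display options described in this article can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. The example works on a simplified, TEI-like fragment. The words around the substitution are invented. The TEI namespace (`http://www.tei-c.org/ns/1.0`) that real TEI files declare is omitted for brevity. The `render` function produces either the revised reading or the author's first reading. The `most_deleted` function tallies deleted words across a set of documents, which is the kind of question the article raises about authors' second thoughts.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified TEI-like fragment; the words around <subst> are invented,
# and the TEI namespace is omitted for brevity.
FRAGMENT = ("<p>I write in <subst><del>dispatch</del>"
            "<add>Haste</add></subst> to tell you</p>")


def render(elem, keep):
    """Return elem's text, keeping <add> (keep="add": the revised reading)
    or <del> (keep="del": the author's first thoughts) and dropping the other."""
    drop = "add" if keep == "del" else "del"
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != drop:
            parts.append(render(child, keep))
        parts.append(child.tail or "")  # text after a child belongs to the parent
    return "".join(parts)


def substitutions(root):
    """List (deleted, added) pairs for every <subst> element under root."""
    pairs = []
    for subst in root.iter("subst"):
        deleted, added = subst.find("del"), subst.find("add")
        pairs.append((
            "".join(deleted.itertext()) if deleted is not None else "",
            "".join(added.itertext()) if added is not None else "",
        ))
    return pairs


def most_deleted(roots, n=5):
    """Tally deleted words across several parsed documents."""
    counts = Counter()
    for root in roots:
        counts.update(deleted.lower() for deleted, _ in substitutions(root))
    return counts.most_common(n)


letter = ET.fromstring(FRAGMENT)
print(render(letter, "add"))   # I write in Haste to tell you
print(render(letter, "del"))   # I write in dispatch to tell you
print(most_deleted([letter]))  # [('dispatch', 1)]
```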


Traditionally, scientists recorded data in physical notebooks according to accepted standards. For instance, when recording wildlife behavior, an ecologist would write down the date and time, weather conditions, location, and any other relevant organisms nearby. Now, however, many scientific observations can be recorded automatically by instruments. We can capture animal behavior with a motion-sensitive video recorder, which reduces the bias introduced by a human observer. The video recorder can run continuously, day and night, and accurately record sounds, colors, and movement. It probably cannot measure temperature, but a nearby weather station may have recorded it, along with wind speed, humidity, and rainfall. The difficulty comes in managing the data. These instruments may not record information at exactly the same times, and they may produce files in different formats that are difficult to synchronize.
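One common way to reconcile instruments that sample at different times is to attach to each observation the nearest reading from the other instrument, within a tolerance. The Python sketch below makes several assumptions. The video events are invented, and the weather readings are taken every ten minutes. The tolerance is 15 minutes, and both instruments are assumed to use clocks that already agree on time zone. It shows one possible approach, not a prescribed method.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_value(readings, t, max_gap):
    """readings: list of (timestamp, value) sorted by timestamp.
    Return the value recorded closest to time t, or None if the closest
    reading is more than max_gap away."""
    times = [ts for ts, _ in readings]
    i = bisect_left(times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(readings)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(times[j] - t))
    if abs(times[best] - t) > max_gap:
        return None
    return readings[best][1]


def align(events, readings, max_gap=timedelta(minutes=15)):
    """Attach the nearest weather reading to each video event."""
    return [(t, behavior, nearest_value(readings, t, max_gap))
            for t, behavior in events]


# Invented sample data: temperatures (deg C) every 10 minutes from 06:00.
day = datetime(2012, 5, 1)
weather = [(day + timedelta(hours=6, minutes=m), temp)
           for m, temp in [(0, 12.0), (10, 12.5), (20, 13.1)]]
video = [(day + timedelta(hours=6, minutes=4), "feeding"),
         (day + timedelta(hours=6, minutes=16), "grooming"),
         (day + timedelta(hours=7, minutes=30), "feeding")]

for t, behavior, temp in align(video, weather):
    print(t.strftime("%H:%M"), behavior, temp)
# 06:04 feeding 12.0
# 06:16 grooming 13.1
# 07:30 feeding None
```

Events with no reading close enough get `None` rather than a misleading value. A data management plan would record choices like this tolerance so that later users know how the files were combined.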

Instead of a few short pages of observations, we now have large and complex files that can be mined for different reasons and by different users. The original researcher may have been looking at mating behaviors, but another scientist may be interested in the effect of local air particulates on animal coloration. Still another may be interested in feeding behaviors and another in the vegetation at that particular place and time. How do we ensure that this information is organized for present and future use? The best method of ensuring proper organization is to create a data management plan.

Scientists who study neurological conditions, such as Alzheimer’s disease or autism, can use magnetic resonance imaging (MRI) to visualize the brain. Since these conditions progress over a lifetime, it will be important to compare future brain images with present ones. Images need to be documented appropriately, kept securely, and saved in the optimal format for future use. There will also be genetic information from tissue samples, surveys regarding the patient’s lifestyle, cognitive tests, physiological measurements, and mental health questionnaires, amounting to vast amounts of data. All of these data can be categorized by measurement technique, date, and location, or what are more formally termed protocol, chronological, and geographic metadata. By consistently applying this metadata, we begin to organize the data. A complete metadata schema includes a great deal of detail, from the make and model of the MRI scanner used to the gender and birth date of the patient. Careful recording and preservation of all this data will allow future researchers to search for specific MRI images, perhaps from different studies, and to re-analyze them for new discoveries.
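The Python sketch below illustrates how consistently applied metadata supports this kind of search. It filters invented scan records by exact field values and lines up one subject's scans by date across two studies. The field names, studies, sites, and scanner models are hypothetical. Real MRI files commonly carry such details in DICOM headers. A subject identifier shared across studies is also an assumption here, and real projects must arrange one deliberately.

```python
from datetime import date

# Invented scan records carrying protocol, chronological, and geographic
# metadata, plus scanner details, from two hypothetical studies.
SCANS = [
    {"study": "Study A", "subject": "S01", "protocol": "T1-weighted",
     "date": date(2012, 3, 1), "site": "Site 1", "scanner": "Model X 3T"},
    {"study": "Study A", "subject": "S01", "protocol": "T1-weighted",
     "date": date(2014, 3, 3), "site": "Site 1", "scanner": "Model X 3T"},
    {"study": "Study B", "subject": "S01", "protocol": "T1-weighted",
     "date": date(2013, 2, 12), "site": "Site 2", "scanner": "Model Y 3T"},
    {"study": "Study B", "subject": "S02", "protocol": "diffusion",
     "date": date(2013, 2, 12), "site": "Site 2", "scanner": "Model Y 3T"},
]


def find(scans, **criteria):
    """Return scans whose metadata match every given field exactly."""
    return [s for s in scans if all(s.get(k) == v for k, v in criteria.items())]


def timeline(scans, subject, protocol):
    """One subject's scans under one protocol, oldest first, across studies."""
    return sorted(find(scans, subject=subject, protocol=protocol),
                  key=lambda s: s["date"])


for scan in timeline(SCANS, "S01", "T1-weighted"):
    print(scan["date"], scan["study"], scan["site"], scan["scanner"])
```

The same query works regardless of which study produced a scan, but only because every record uses the same field names and values. That consistency is exactly what a metadata schema is meant to guarantee.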

Although the types of scientific data and the needs of researchers vary a great deal, all can benefit from a good data management plan. Many scientific metadata schemas are already available, and as librarians collaborate with scientists to organize and describe their data, more are being developed. Working together, scientists and librarians can address the challenges of managing massive data.

Planning for Big Data
Amanda Rinehart, E-Science Librarian

Above left: Image of asparagus roots with and without vesicular arbuscular mycorrhizae (VAM). VAM is a symbiotic fungus that at low levels increases a plant’s nutrient absorption; at high levels, it can act as a parasite. This is an example of one of the forms that scientific data can take.
Above right: This brain image is baseline data in a longitudinal study of neurodevelopmental disorders. The person is known to carry genetic markers that may later result in dementia or tremors. By taking pictures of the brain over many years, researchers will be able to see the development of the disorder and visualize whether experimental therapies slow or stop its progress.


As computer systems generate and store massive amounts of data, and networks and virtual libraries provide unprecedented access to data, we need better ways to see and interact with data in order to understand it and act on its potential. Visualization improves our ability to explore and explain data more quickly and deeply. By going beyond the traditional hypothesis-and-test method of inquiry, which relies on asking the right question at the right time, data visualization brings new questions and answers to the surface.

In the fall of 2012, the Rockefeller Library will premiere the Digital Scholarship Lab, featuring a large-scale visualization video wall composed of twelve 55-inch high-resolution LED displays. This 7-by-16-foot display will have a combined resolution of over 24 megapixels, well beyond what can be seen on a standard desktop monitor or projector. Offering high-quality viewing and analytical space not publicly available elsewhere on campus, the Lab will provide opportunities for experimentation by novice users and sophisticated professionals alike.
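The quoted resolution can be checked with simple arithmetic, under assumptions the newsletter does not state. The calculation assumes that each panel is a standard full-HD display (1920 × 1080 pixels, 16:9 aspect ratio). It also assumes that the twelve panels form a grid four wide and three high, and it ignores bezels.

```latex
\begin{aligned}
\text{pixels} &= 12 \times (1920 \times 1080) = 12 \times 2{,}073{,}600 = 24{,}883{,}200 \approx 24.9\ \text{MP} \\
\text{panel width} &= 55\,\text{in} \times \tfrac{16}{\sqrt{16^2 + 9^2}} \approx 47.9\,\text{in},
\qquad \text{panel height} = 55\,\text{in} \times \tfrac{9}{\sqrt{16^2 + 9^2}} \approx 27.0\,\text{in} \\
\text{wall} &\approx (4 \times 47.9\,\text{in}) \times (3 \times 27.0\,\text{in}) \approx 16.0\,\text{ft} \times 6.7\,\text{ft}
\end{aligned}
```

Under those assumptions, both figures are consistent with the description: just under 25 megapixels, and a wall of roughly 7 by 16 feet.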

To enhance collaboration, the system will include a controller platform that accepts centralized input as well as control and input from multiple media devices both inside and outside the room. Images and video can be spread across all twelve screens or directed to specific areas of the wall. This side-by-side style of presentation will provide novel ways of examining and comparing diverse data sets and may heighten understanding of data properties and their possibilities. Outfitted with a surround-sound audio system and specialized lighting, the Lab will also offer several individual touch-screen monitors that can be used independently or linked to the video wall for collaborative display and interaction. With flexible furniture and an adaptable layout designed to accommodate 16 to 24 people, the space will fit a variety of uses, including classroom, workshop/seminar room, group collaboration, individual projects, digital art gallery, and video conferencing.

The Digital Scholarship Lab will provide necessary tools for scholars across disciplines to engage with research data using advanced visualization software. Students and faculty will be able to examine and compare high-resolution digital content and experience audiovisual media in a unique setting, bridging the gap between desktop and total immersion. Complementing Brown’s burgeoning supercomputing capabilities, the Lab’s technology will distinguish itself from other visualization systems on campus through its accessibility, reliability, and ease of use, as well as its ability to display almost any kind of data and interact with a wide range of software applications.

As visual literacy becomes a core skill, students need specialized resources and expert assistance in order to develop as critical thinkers capable of analyzing visually as well as textually. The Library’s Digital Scholarship Lab will combine a large-scale visualization wall, surrounding touch-enabled technologies, and a dedicated Data Visualization Coordinator on staff. It will offer a transformative visual and kinesthetic learning and research environment where students and faculty can work with multi-dimensional and multimodal data. Users will be able to visualize, analyze, and interact with data sets and images ranging from the tiniest brush strokes to the largest galaxies, at scales that would otherwise be impossible for the human eye to grasp.

Funded through private donations and a research grant, the Digital Scholarship Lab will enable faculty and staff to explore and define scholarly forms beyond their current capabilities. It will allow them to visualize and interact with data, and with each other, in ways that are rare today but essential for 21st-century scholarship and discovery.

Enabling Data Visualization
Harriette Hemmasi, Joukowsky Family University Librarian

Renderings for the Digital Scholarship Lab, scheduled to open in the fall of 2012 in the John D. Rockefeller, Jr. Library.


I suppose that for many readers of these pages, a brief article that brings "data" together with "Literary Arts" will still come as something of a shock, and not necessarily a pleasant one. Yet nearly everyone who writes now writes with a computer of some kind and with access to the internet. This access provides many services of reference, and these services are now, fundamentally, data-driven. They are certainly not books of reference, even though they may have emerged from formerly print-based resources, as have the online Oxford English Dictionary or the online Encyclopedia Britannica. More and more of the services of reference that we use day-to-day are, essentially, databases. They are composed of discrete, variously related "records." We consult or "search" these records using "programs," "algorithms," or "engines" (take your pick), and the results are set out for us in "windows" that are generated by other, associated programs. These tabulating programs may be plain and simple, or they may provide complex, powerful, implicated visualizations of our results, often with linked material or advertisement. (Advertisement! Why?) Such programs are now ubiquitous. As I write this, there is a spellchecker that is "running in the background" and constantly comparing everything I type with records in its (whose?) database. It isn't yet (I don't think), but it could be, doing all this over the internet. My relationship with spelling has changed. My relationship to writing has changed. When I spell, I work, in the first instance, with a proactive database, rather than with my dictionary. The dictionary lies, dustier than ever, on a nearby shelf, rarely opened. My memory is still at play, but it plays differently. There is something "helping me" to spell, "helping me" to write, and it is doing so during every passing moment of my writing.

I am stating the all-too-recently obvious, but I do so simply to show how important it must be to examine in detail, for the sake of both scholarship and art, the relationships between some part of what we call data and what we think of as an aesthetic practice of writing. Data and database are already here, almost despite us, during our most private and intimate moments of writing.

At Brown, we call aesthetic or "creative" writing "Literary Arts" and, as this department's faculty member explicitly charged with writing in digital media, I am grateful for a name that allows me to relate my work to profound changes in the Arts generally, where "data" and "data visualization" are now significant fields for the research that many artists undertake. There is, I believe, a place for similar research and practice within Literary Arts, for what may well be a specialist but no less exciting way to write and to experiment with what writing may become.

One example: I "write" (I mean I code) a program to generate more or less random adjective-noun pairs, intended to serve as very simple lines of what? Poetry, let's call it. I work out a way to give an arbitrary number of lines (9) some ghostly grammatical or narrative shape. At the same time, I write another program to test, using Google, the relative frequency of the adjective-noun phrases that form the lines of my poem. I choose only lines that have never been indexed in Google Books. I've searched their database. The lines I select are not there (yet). Now, I (my programs and I, actually) make, literally, thousands of these "poems" using the same procedure. After all this data gathering, curation, and analysis, I claim, "They're 'not bad,' these poems."

So now the questions. Is this poetry? Is this good poetry? Or interesting poetry? Is this one poem or some huge flock of very similar poems? What do we think about the "huge number" involved? Is this some sort of numerical or statistical sublime? A computational literary sublime? Is this a new aspect of literary aesthetics, or is it just a misdirection: the expense of ingenuity in a waste of digital-utopian naivety?

These are all, I would argue, good questions, however answered. They are questions for both the Arts and the Humanities, for artist-researchers and for scholars of literature who may take an interest in the way that data will be used to produce, perhaps to "co-author," the future objects of literary criticism. Marjorie Perloff, amongst others, is already well aware, positively, of certain implications (Unoriginal Genius, University of Chicago Press, 2010). At last year's Digital Humanities Conference, I presented related work as part of a panel on "Poetry and/as Data." Although it seems to me likely that "Digital" will vanish into all the fields and domains that it currently qualifies, such as "Literature," "Art," "Humanities," and so on, this will not be due to our abandoning the universe of data and databases. It will be because elaborate databases, along with other digital forms, have encompassed, if not overwhelmed, all the formerly predominant media and institutions of cultural practice (of scholarship and art), and most particularly the book and the library.

Mining for Poets: Data and Literary Arts
John Cayley, Professor of Literary Arts

Right: Monoclonal Microphone: 30 over 1,021 by John Cayley, 2011, visual poem, custom software.
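The generate-and-filter procedure described in "Mining for Poets" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Professor Cayley's software: the word lists are invented, and the Google and Google Books frequency check is replaced by a hypothetical set, INDEXED_PHRASES, standing in for phrases already found in a corpus. A line survives only if it is absent from that set, and nine distinct survivors make one "poem." The grammatical or narrative shaping the essay mentions is not modeled.

```python
import random

# Hypothetical vocabularies; the original program's word lists are not published here.
ADJECTIVES = ["monoclonal", "ghostly", "dusty", "silent", "linked", "proactive", "dead", "granular"]
NOUNS = ["microphone", "dictionary", "window", "record", "galaxy", "shelf", "engine", "spelling"]

# Stand-in for a corpus index such as Google Books (an assumption for illustration):
# phrases in this set count as "already indexed" and are rejected.
INDEXED_PHRASES = {"dusty shelf", "silent window", "dead record"}


def random_pair(rng: random.Random) -> str:
    """Return one random adjective-noun phrase."""
    return f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"


def is_unindexed(phrase: str, index: set[str]) -> bool:
    """True if the phrase does not appear in the stand-in corpus index."""
    return phrase not in index


def make_poem(rng: random.Random, n_lines: int = 9,
              index: set[str] = INDEXED_PHRASES) -> list[str]:
    """Build a poem of n_lines distinct phrases that are absent from the index."""
    # Guard against asking for more distinct, unindexed lines than can exist.
    if n_lines > len(ADJECTIVES) * len(NOUNS) - len(index):
        raise ValueError("not enough possible unindexed phrases")
    lines: list[str] = []
    while len(lines) < n_lines:
        phrase = random_pair(rng)
        if is_unindexed(phrase, index) and phrase not in lines:
            lines.append(phrase)
    return lines


if __name__ == "__main__":
    rng = random.Random(2011)
    # The same procedure, repeated: thousands of "poems" are just a loop.
    poems = [make_poem(rng) for _ in range(1000)]
    print("\n".join(poems[0]))
```

Seeding random.Random makes any single run reproducible; a real frequency check would query an external index instead of a fixed set.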


Old dead things is what I do. These days I work at Petra in Jordan, and I'll use Petra to help make my case. My pitch? That new data, big data, can save the past for the future, which just may make for a better, more sustainable world.

Archaeological data sets today are multiplying and deepening. This is good for the past, but also a power for the present. For archaeologists today are thinking, creatively and longitudinally, about the relationship between ancient sites and the people who live in symbiosis with them.

Now don’t go imagining me having at it with a pick axe and a shovel. Think instead of a growing array of high-tech, high-resolution data snatch devices. Archaeologists can capture information on everything you see at Petra, everything — and at a multitude of scales, from satellite imagery to nanoindentation.

Why is it important to maximize our data? One simple answer: archaeology is a finite, non-renewable resource. Like oil, like water. Like the dodo. World heritage resources are disappearing at an unimaginable, irreversible rate, a rate beyond my capacity either to describe or to illustrate.

This irrevocable loss of old dead things may not bother everyone. But it should. Because people like the past; people need the past. And the past makes a difference to people.

And this is where new forms of data collection transform what archaeologists can do: our shooting, scanning, zapping, point clouding, rectifying, digitizing, and so on. Most basically, they let us record what we can, while we can, at unprecedented levels of resolution. We will be able to recreate — virtually — parts of Petra that will, inevitably, be lost.

New data collection also enables us to manage and triage against inevitable future damage and destruction. If you are talking about sustainable development, then there are better and there are worse places to stick a hotel, or a bathroom, in this archaeological landscape. Good data allows good planning, always with an eye to the future. New data sets can make for a smarter, cleaner, and more collaborative process. New data, big data, enables us to save the past for the future, and for a better, more sustainable world.

Preserving the Past for the Future
Sue Alcock, Director of the Joukowsky Institute for Archaeology and the Ancient World; Joukowsky Family Professor in Archaeology; Professor of Classics

Continued from page 3

year. Examples of collections in the BDR include the Minassian Collection of Qur'anic Manuscripts (http://library.brown.edu/quran), the Anne S. K. Brown Military Collection (http://library.brown.edu/cds/askb/), the Harris Broadsides Collection (http://library.brown.edu/cds/harris/), and many others.

Student work: Increasingly, the BDR is a resource for storing and accessing student work created as part of research projects and coursework. In collaboration with others on campus, the Library is developing tools for storing media and writing submitted by students during their time at Brown.

Who uses the BDR?

The BDR is used by faculty, students, administrators, archivists, librarians, and members of the community; everyone who does research at the Brown University Library has most likely drawn information from this digital archive. BDR materials appear in our catalog as digital collections, on Google, on websites about particular topics, as part of research projects in disciplines ranging from Italian Studies to Computer Science, and in many other sources.

Digital information permeates the work of the University in all areas, and it is increasingly created and stored outside of the University in cloud services such as YouTube, Amazon, and Google. The BDR complements these and other campus services, such as CCV's OSCAR storage service, by ensuring that the University has a stable and reliable place for the information that represents the work done at Brown, so that future generations of scholars will be able to find and build on the work of their predecessors.

Rock cut ‘Palace Tomb’ at Petra.


We are very good at collecting and generating data. More data will be created in the next two years than in the past 40,000 years. We all appreciate, consume, and contribute to the colossal amounts of data collected by Google and by social networking sites. However, the complexities of a data-driven society and the convergence of global challenges require new strategies and tools in business, government, and academia. A new education model reaching across disciplines and career fields is at the heart of this data-driven transformation.

Recent technological advances in high-resolution telescopes, biomedical imaging, genetic sequencing, and satellites for monitoring and surveillance are among the most immediate sources of data of unimaginable size and complexity. Collecting and using these data streams has given rise to new disciplines such as eco-informatics, bioinformatics, and astroinformatics.

Social networks, search engines, and novel models also contribute to the growing data challenges in an increasingly interconnected and hyperconnected society. This global connectivity is emerging as a democratizing force by allowing access to, and sharing of, otherwise scarce information: virtual observatories, virtual libraries, virtual art collections and, eventually, virtual tourism.

As we join others in creating processes to address these daunting challenges and to benefit from the remarkable opportunities, one major question remains: how do we educate our students to deal with and integrate data in meaningful and relevant ways?

Most undergraduate students enter higher education with limited exposure to the digital concepts and processes required in a networked world. Traditional undergraduate students are often labeled digital natives, primarily for their use of gaming, mobile devices, and social networks. However, few are accustomed to using specific technologies for learning. In college, undergraduate courses offer no consistent training across the curriculum to prepare students for a data-centric society. The ECAR Study of Undergraduate Students and Information Technology (2009) indicated that fewer than half of students reported effective use of IT on the part of their instructors, yet those who are able to learn effectively with technologies are at an advantage in the classroom and in the workplace.

Digital literacy is an emerging field of undisputed importance and impact. The Knight Commission on the Information Needs of Communities in a Democracy (2010) defines digital and media literacy as “life skills that are necessary for full participation in our media-saturated, information-rich society.”

In response to these concerns, I propose that Brown explore a new approach that engages students broadly in digital literacy and presents them with opportunities to explore and discuss questions posed by our increasingly digital and data-driven world. This approach will infuse computing and networked learning into the fabric of the educational enterprise across all disciplines: the humanities, social sciences, and physical and life sciences. It will allow students to become leaders in innovation, creativity, and problem solving through the intelligent and informed use of computing, data, and digital tools in the broadest possible sense.


Data-Driven Education
Jan S. Hesthaven, Professor of Applied Mathematics; Director of the Center for Computation and Visualization (CCV); Deputy Director of the Institute for Computational and Experimental Research in Mathematics (ICERM)

Continued from page 4

The Brown Digital Repository (BDR) relies on various forms of metadata. The BDR provides faculty with a place to store and retrieve their electronic documents and data, and it uses metadata to make files easy to identify and retrieve. By applying the appropriate metadata options, users can allow public access to their files or restrict access to materials that require it. In addition, metadata helps to track the life cycle of the data stored in the repository.

So the next time you purchase something from Amazon, scan an item in a grocery store, or renew your driver's license online, remember that you're able to do it because of metadata!
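The three jobs the article assigns to repository metadata (identifying and retrieving files, allowing public or restricted access, and tracking a file's life cycle) can be illustrated with a toy record type in Python. This is a hypothetical sketch: RepositoryRecord, find, and the "public"/"restricted" access values are invented for illustration and are not the BDR's actual schema or interface.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryRecord:
    """Hypothetical metadata for one stored file (not the BDR's real schema)."""
    identifier: str
    title: str
    keywords: set[str]
    access: str = "public"  # assumed values: "public" or "restricted"
    history: list[str] = field(default_factory=list)  # life-cycle events

    def log(self, event: str) -> None:
        """Record a life-cycle event such as "ingested" or "migrated"."""
        self.history.append(event)


def find(records: list[RepositoryRecord], keyword: str,
         authorized: bool = False) -> list[str]:
    """Return identifiers of records tagged with keyword that the caller may see.

    Public records are always visible; restricted ones only when authorized.
    """
    return [
        r.identifier
        for r in records
        if keyword in r.keywords and (r.access == "public" or authorized)
    ]


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    records = [
        RepositoryRecord("rec-1", "Sample broadside", {"broadsides"}),
        RepositoryRecord("rec-2", "Sample draft", {"broadsides"}, access="restricted"),
    ]
    records[0].log("ingested")
    print(find(records, "broadsides"))                   # ['rec-1']
    print(find(records, "broadsides", authorized=True))  # ['rec-1', 'rec-2']
    print(records[0].history)                            # ['ingested']
```

Keeping access and history on the record itself means one search function can enforce visibility rules while the same metadata documents what has happened to each file.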

Richard White, Margaret Byrne Professor of American History at Stanford University, explains “Per Capita Income in the United States, 1880-1910” during his lecture “The Spatial Turn in History” as part of the 2011-2012 Digital Arts and Humanities Lecture Series at Brown University. The data for “Per Capita…” comes from a study by Alexander Klein, “Personal Income of U.S. States: Estimates for the Period 1880-1910,” Warwick Economic Research Papers, no. 916, Department of Economics, University of Warwick.


Active since 1938, the Friends of the Library bring together a community of individuals dedicated to the support and development of the Brown University Library.

Yes, I would like to join or renew with Friends of the Library.

All members receive the Library newsletter, Dealing with Data, and invitations to lectures, exhibits, and special events.

$25 Student Membership

$45 Brown Faculty/Staff

$60 General Membership

$100-499 Sponsor

$500-999 Patron

$1,000-4,999 Benefactor

$5,000 Nicholas Brown Society (also includes membership in the University's Nicholas Brown Society)

Enclosed is my Friends of the Library membership fee of $

Name

Address

City

State ZIP

Telephone

E-mail

This membership is a gift from:

Name

Address

City

State ZIP

Please make your check payable to Brown University, write "Friends of the Library membership" in the memo line, and mail to:

Brown University Friends of the Library, Box 1877, Providence, Rhode Island 02912

To pay with your credit card, visit our secure website at: https://gifts.development.brown.edu/Library/ChooseGifts.aspx

Join Friends, or renew your membership today!

SPONSORED BY BROWN UNIVERSITY LIBRARY AND FRIENDS OF THE LIBRARY

FUNDING PROVIDED BY THE RICHARD AND EDNA SALOMON PUBLICATIONS FUND

Dealing with Data

Friends of the Library Box A, Brown University Providence, RI 02912

Telephone: 401-863-2163

Editors: Amy Atticks, Jane Cabral, Daniel O'Mahony

Designer: Douglas Devaux
