Leslie Johnston Keynote, Best Practices Exchange 2011
TRANSCRIPT
![Page 1: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/1.jpg)
From Records to Data: It’s Not Just About Collections Any More
Leslie Johnston, Library of Congress
Best Practices Exchange 2011
![Page 2: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/2.jpg)
What Are the Biggest Insights That We Have Learned in Fifteen Years of Building Digital Collections?
![Page 3: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/3.jpg)
Researchers do not use digital collections the same way that they use analog collections
![Page 4: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/4.jpg)
We Can Never Guess Every Way that Our Collections Will Be Used
![Page 5: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/5.jpg)
Stewardship organizations have, until recently, spoken of “collections” or “content” or “records” or even “files,” but not data.
![Page 6: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/6.jpg)
We Have Data in our Libraries, Archives and Museums?
Yes.
Data is not just generated by satellites, identified during experiments, or collected during surveys.
![Page 7: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/7.jpg)
Datasets are not just scientific and business tables and spreadsheets: our collections are now considered data.
They are the building blocks for interpretation and discovery that transform and combine them into entities that we may not recognize.
![Page 8: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/8.jpg)
More and more researchers want to use collections as a whole, mining and organizing the information in novel ways.
Researchers use algorithms to mine the rich information and tools to create pictures that translate that information into knowledge.
Researchers may want to interact with a collection of artifacts, or they may want to work with a data corpus.
![Page 9: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/9.jpg)
Consider the Digging Into Data Challenge
The repositories available for research include not only scientific information—astronomy, geology, physics, biology, social science surveys—but images, film, sound, newspapers, maps, art, archaeology, architecture and government records.
http://www.diggingintodata.org/
![Page 10: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/10.jpg)
What Constitutes “Big Data?”
The definition of Big Data is very fluid. It is a moving target (whatever cannot be easily manipulated with common tools) and it is specific to the organization: what can be managed and stewarded by any one institution in its infrastructure. One researcher or organization’s concept of a large data set is small to another.
Not too long ago, an organization would be surprised to need 10 TB of storage for a large digital collection. Now a collection can increase by 10 TB in a single week.
![Page 11: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/11.jpg)
We still have collections. But what we also have is Big Data, which requires us to rethink the infrastructure that is needed to support Big Data services. Our community used to expect researchers to come to us, ask us questions about our collections, and use our digital collections in our environment.
Now our collections are, more often than not, self-serve.
![Page 12: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/12.jpg)
Case Study: Web Archives
• Web Archives, such as the one at the Library of Congress, may comprise billions of files.
• When we began archiving election web sites, we imagined users browsing through the web pages, studying the graphics or use of phrases or links. When our first researchers came to the Library, they did want to know about all those topics, but they used scripts to query for them and sort them into categories. They were not very interested in reading web pages.
http://www.loc.gov/webarchiving/
![Page 13: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/13.jpg)
Case Study: Historic Newspapers
• The Chronicling America collection has over 4 million page images from historic newspapers with OCR from organizations in 25 states.
• The site gets approximately 4 million views per day.
• Some researchers want to search for stories in historic newspapers.
• Some researchers want to mine newspaper OCR for trends across time periods and geographic areas.
• Requests have come in to analyze all 4 million page images.
http://chroniclingamerica.loc.gov/
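As a rough illustration of what mining newspaper OCR for trends across time periods can look like, the sketch below counts how often a term appears per publication year. The input shape, a list of (ISO date, OCR text) pairs, and the sample pages are assumptions made up for this example; they are not the Chronicling America data format or API.

```python
import re
from collections import Counter


def term_counts_by_year(pages, term):
    """Count whole-word, case-insensitive occurrences of `term` per year.

    `pages` is an iterable of (iso_date, ocr_text) pairs. This shape is a
    hypothetical stand-in for whatever a real OCR export provides.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for iso_date, ocr_text in pages:
        year = int(iso_date[:4])  # assumes dates start with a 4-digit year
        counts[year] += len(pattern.findall(ocr_text))
    return dict(sorted(counts.items()))


# Invented sample pages, for illustration only.
sample_pages = [
    ("1898-02-16", "The Maine was destroyed in Havana harbor. Remember the Maine!"),
    ("1898-04-25", "War declared; the Maine inquiry continues."),
    ("1905-06-01", "Local harvest reports and market prices."),
]

print(term_counts_by_year(sample_pages, "maine"))  # {1898: 3, 1905: 0}
```

The same loop could group by state or newspaper title instead of year to look at geographic spread, assuming that metadata is available alongside the OCR text.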
![Page 14: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/14.jpg)
Case Study: Twitter
• The Twitter archive has tens of billions of tweets in it.
• Research requests have included users looking for their own Twitter history, the study of the geographic spread of news, the study of the spread of epidemics, and the study of the transmission of new uses of language.
[Figure labels: status, privacy, commercial, personal, events, social media, visualization, social science]
![Page 15: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/15.jpg)
Can each of our organizations support real-time querying of billions of full-text items? Can we provide tools for collection analysis and visualization? Can we support the frequent downloading by researchers of collections that may be over 200 TB each?
These are among the questions that all of our institutions are grappling with as we build large digital collections and discover new ways in which they can be used.
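To give a sense of scale for the downloading question, here is a back-of-the-envelope sketch of how long a 200 TB transfer would take. The link speeds and the assumption of a fully used, uninterrupted link are illustrative only; they do not describe any particular institution's network.

```python
def transfer_days(size_tb, link_gbps, efficiency=1.0):
    """Estimate days needed to move `size_tb` terabytes over a network link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/second).
    `efficiency` is the assumed fraction of the link actually usable.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400  # seconds per day


# Hypothetical link speeds, chosen only to show the range.
for gbps in (0.1, 1, 10):
    print(f"{gbps:>4} Gbps: {transfer_days(200, gbps):.1f} days")
```

Under these assumptions, a single 200 TB download occupies a dedicated 1 Gbps link for about 18.5 days, which is why frequent downloads at that size are an infrastructure question rather than a routine service.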
![Page 16: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/16.jpg)
So what are our institutions doing about preservation and access to our Big Collections and Big Data?
![Page 17: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/17.jpg)
Collaboration
The National Digital Stewardship Alliance is an initiative of the National Digital Information Infrastructure and Preservation Program at the Library of Congress, with almost 100 member organizations that share a sense of dedication to digital preservation, and want to work collaboratively across the community.
The NDSA operates through five working groups: Content; Standards and Practices; Infrastructure; Innovation; and Outreach.
www.digitalpreservation.gov/ndsa
![Page 18: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/18.jpg)
Tool Development
All stewardship organizations can and should participate in the development and use of open access tools for use across the community.
NDIIPP is revising its Tools and Services Directory to include a broader range of projects, some of which are always looking for other organizations to contribute!
http://www.digitalpreservation.gov/partners/resources/tools
![Page 19: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/19.jpg)
As an Example…
Seeing and Sharing Digital Cultural Heritage Collections Differently with ViewShare/Recollection
![Page 20: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/20.jpg)
biggish ideas
› heterogeneous data
› one big distributed collection
› open distributed infrastructure
› mindset: records -> data
![Page 21: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/21.jpg)
Beyond thinking like records
![Page 22: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/22.jpg)
to thinking like data
![Page 23: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/23.jpg)
the ViewShare idea
digital cultural heritage collections include temporal, locative, and categorical data that could be tapped to better dynamically interact with and understand those collections.
![Page 24: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/24.jpg)
the challenges
› we all have different kinds of metadata
› that data is in different kinds of systems
› much of that data is messy
› much of that data is not in the format we might wish it was
![Page 25: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/25.jpg)
what ViewShare does
![Page 26: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/26.jpg)
take this
![Page 27: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/27.jpg)
or this
![Page 28: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/28.jpg)
and make…
![Page 29: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/29.jpg)
![Page 30: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/30.jpg)
![Page 31: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/31.jpg)
![Page 32: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/32.jpg)
![Page 33: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/33.jpg)
ingest collection descriptions from spreadsheets, MODS records, or ATOM and RSS
![Page 34: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/34.jpg)
Augment: derive ISO dates, latitude and longitude coordinates, and break apart data
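The kind of augmentation described here can be sketched in a few lines. This is a minimal illustration, not ViewShare's actual implementation: the field names (`date`, `subject`, `place`), the list of date formats, and the tiny `GAZETTEER` lookup with approximate coordinates are all hypothetical.

```python
from datetime import datetime

# Date layouts we assume might appear in messy metadata (illustrative list).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y", "%Y")

# Hypothetical stand-in for a geocoding service; coordinates are approximate.
GAZETTEER = {"Washington, D.C.": (38.9072, -77.0369)}


def to_iso_date(raw):
    """Return an ISO 8601 date (or bare year) for `raw`, or None if unparsed."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # A year-only value stays a year rather than gaining a fake month/day.
        return str(parsed.year) if fmt == "%Y" else parsed.date().isoformat()
    return None


def split_values(raw, delimiter=";"):
    """Break a delimited multi-value field into a clean list."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def augment(record):
    """Return a copy of `record` with derived date, subject, and location fields."""
    out = dict(record)
    out["date_iso"] = to_iso_date(record.get("date", ""))
    out["subjects"] = split_values(record.get("subject", ""))
    out["lat_long"] = GAZETTEER.get(record.get("place", ""))
    return out


print(augment({"date": "March 4, 1865", "subject": "Inaugurations; Presidents ;",
               "place": "Washington, D.C."}))
```

Values that cannot be parsed come back as `None` rather than a guess, so messy records can be flagged for review instead of silently misplaced on a timeline or map.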
![Page 35: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/35.jpg)
design views: graphical interface for assembling views
![Page 36: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/36.jpg)
publish views on the site or embed views with one line of JavaScript into any HTML document.
![Page 37: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/37.jpg)
![Page 38: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/38.jpg)
visually review data
![Page 39: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/39.jpg)
share data and views
share not only the end results, but also the raw data for others to create their own views.
data use and re-use
![Page 40: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/40.jpg)
recent work
› support for public/private views and data
› beta support for OAI and ContentDM data loading
› full open source release on SourceForge: http://sourceforge.net/projects/loc-recollect/
![Page 41: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/41.jpg)
what’s next?
› viewshare.org public launch on November 1, 2011
› big data sets: in a while
› remix across data sets: long view
![Page 42: Leslie Johnston Keynote, Best Practices Exchange 2011](https://reader035.vdocuments.us/reader035/viewer/2022081403/556355c3d8b42a6f7b8b5670/html5/thumbnails/42.jpg)
contact us
› Let us know through the web site if you are interested in participating in the NDSA
› Let us know if there is a tool or service that is missing from our directory
› visit http://recollection.zepheira.com/ to get a sneak peek at ViewShare
› email [email protected] if you are interested in a ViewShare account