#politicalarchive #webarc15 POLITICAL TWEETS LIBRARIANSHIP AND THE POLITICAL ARCHIVE

Upload: brian-lau

Post on 13-Apr-2017


TWEETWALLY

www.tweetwally.com

THE POLITICAL TWEETS LIBRARIAN


WHY TWITTER?


OPTION: ARCHIVE-IT


GUIDELINES FOR POLITICAL TWEETS LIBRARIANSHIP


COPYRIGHT ISSUES

An emerging issue not yet tested by Canadian law or written into Canadian legislation

Small, Kasianovitz, Blanford, & Celaya, 2012

ARGUMENTS AGAINST

• The small size of Tweets

• The content within them

• Similarities between Tweets

Reinberg, 2009; Small et al., 2012

ARGUMENTS FOR

• Some Tweets meet criteria for originality

• Collections of Tweets as a whole meets minimum

Reinberg, 2009

TWITTER TERMS OF SERVICE

“By submitting, posting or displaying Content on or through the Services, you grant [Twitter] a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed)”

Twitter, Inc., 2015a

CANADIAN COPYRIGHT ACT - FAIR DEALING

An individual is able to use copyright material for

• research

• private study

• education

• satire

• parody

• criticism

• review

• news reporting

Copyright Act, 1985

OUR PROPOSED RECOMMENDATIONS

• Institutional policy that clearly outlines who, when, and what level of access is granted to researchers

• One legitimate possibility is keeping the Tweets in a “dark archive” accessible only to specific faculty, students, and other researchers after they have signed a user agreement

• The user agreement would emphasize that the use of archive information is for research, study, and other educational purposes and restricted to onsite access

• Tweets are maintained in a dark archive for a minimum of seventy-five years

Antracoli et al., 2014; Copyright Act, 1985


PRIVACY AND ETHICAL GUIDELINES

• To assist with daily tasks and long term issues associated with the harvesting, curation and storage of Tweets

• The American Anthropological Association (AAA) Statement of Ethics to provide an ethical framework for the Political Tweets Librarian

• Designed as a tool to provoke thought rather than a precise set of instructions


1. Protect Twitter Authors From Any (Direct or Indirect) Potential Harm

• Weigh the library’s Tweet curation needs against any potential harm to Tweet authors

• Political Tweet Librarians empowered to stop the curation process at any point

• Special consideration given to protection of “vulnerable populations”

American Anthropological Association, 2012a, 2012f; SalahEldeen & Nelson, 2012; Small et al., 2012; S. Li & N. Worby, personal communication, January 29, 2015

2. Ensure to the Best of Your Knowledge the Author has Granted Consent to Capture and Curate Twitter Data

Tweet capture and curation is a “grey” area in terms of consent

Consent can be viewed in two ways:

1) The Tweet author potentially grants “informed consent” at the time of Twitter account activation

2) The social context (private network vs. public access) of the Tweet must be considered

Librarians must consider the “quality of consent” implied by the Tweet author

Allen, 2013; American Anthropological Association, 2012c; Twitter, Inc., 2015a; Small et al., 2012

3. Consider the Implications of Keeping Your Archives “Dark” and Unavailable to Public Scrutiny

• Concern surrounding lack of external “dark” archive scrutiny

• Suggest Political Tweet Librarians open dark archives to internal and external auditors

American Anthropological Association, 2012b; B. Glushko, personal communication, October 27, 2015

4. Record Storage and Preservation Should Be Considered From the Beginning Stages of the Curation Process

• Establish an institution-wide standard of ethics concerning the harvest, capture, curation, research, use and disposal of Twitter data

• Steps must be taken to prevent unauthorized copying and to secure captured and curated Tweets

• Political laws and customs of a server’s host nation need to be considered

American Anthropological Association, 2012f; American Library Association, 2008; S. Li & N. Worby, personal communication, January 29, 2015

CHALLENGES ASSOCIATED WITH HARVESTING AND CURATION OF POLITICAL TWEETS

A) Easy loss of social media data

Solution: Be vigilant

Antracoli et al., 2014

B) Difficulty finding event-related hashtags without accidentally capturing a Tweet author’s personal Twitter page

Solution: Search for thematic hashtags

S. Li & N. Worby, personal communication, January 29, 2015

C) Political Tweets Librarians may not be able to access every Tweet for a political event

Solution: Work alongside information specialists directly involved with the political event for greater Tweet harvesting

Antracoli et al., 2014; S. Li & N. Worby, personal communication, January 29, 2015

D) There is more than one side to any political event

Solution: Obtain as balanced a collection as possible of Tweets from all sides of the political debate

Antracoli et al., 2014; S. Li & N. Worby, personal communication, January 29, 2015

E) Very time-consuming effort

Solution: Collaborate with other institutions

S. Li & N. Worby, personal communication, January 29, 2015

F) Not everybody from the event will be participating in the Twitter discussion

Solution: Ensure that users understand that political Tweets likely do not represent all communities involved in the event

B. Glushko, personal communication, October 27, 2015; Russell, 2011

WHAT IS IT?

PARC IS:

• derived from Political ARChive

• a library devoted to the capture, curation, archive, storage and access of political Tweets

• staffed by Political Tweet Librarians, digital archivists, copyright experts, ethics scholars and technology experts

• a stepping stone to assist libraries in entering the digital age

PARC : STRUCTURE

PARC’s physical structure is designed to represent the nature of political Tweets:

• Space for political protest

• Public gathering spaces

• Space for academic discussion, discovery, analysis and research

• Live social media streaming of political events as they occur

PARC aims to be eco-friendly

PARC aims to be connected to Tweet archives around the world

VIDEO WALL TO DISPLAY LIVE TWEETS AND LIVE STREAMING OF POLITICAL PROTESTS

SOLAR PANELS TO GENERATE ELECTRICITY FOR SELF USE

SERVER TOWER

CAPTURES HEAT GENERATED BY SERVERS FOR SELF USE

(EXTERIOR TOP) PUBLIC SPACE

(INTERIOR) ARCHIVAL / LIBRARY / COMMUNITY SPACES

PARC : ACTIVITIES

• Workshops and annual conferences for industry professionals

• Lead collaborative harvesting and curation efforts with other institutions

• Explore options on how to deal with continually expanding Tweet catalogues

• Develop open-source Tweet harvesting and curation platforms

• Create a library-specific Tweet metadata schema

PARC : POSSIBLE MANDATE EXPANSIONS

• Increase the scope of Tweet themes captured and archived, and extend to other types of social media

• Examine how social media platforms can better allow for distinct group representations

Brock, 2012; Dwoskin, 2014; Lindtner, Anderson, & Dourish, 2012; North Carolina State University, 2014; Scola, 2015; Williams, Terras, & Warwick, 2012; Zimmer, 2015; B. Glushko, personal communication, October 27, 2015

GUIDELINES FOR THE POLITICAL TWEETS LIBRARIAN

ARCHIVE

COPYRIGHT + PRIVACY + ETHICS

CURATE

PARC is derived from Political ARChive. It is an exploration into how political Tweets can be collected, stored and disseminated for research, study and educational purposes.

Social media-based political knowledge is in danger of disappearing, and one of the most vulnerable forms of digital data is the political Tweet.

Because political Tweets are an important facet of our culture, it is imperative that librarians harvest and curate Tweets related to local, national, and international political events as they occur, for the purposes of future research, study, and education.

Librarians can preserve the fragile political knowledge inherent in this social media data. The following guidelines are designed to assist them in this effort.

GUIDELINES: POLITICAL TWEETS LIBRARIAN

Purpose: Provide instructions for using Archive-It to harvest and curate Tweets related to international political events, for the purposes of future research, study, and education. As a corollary, highlight issues of copyright, privacy and ethics, and harvesting and curation encountered when conducting these tasks.

Key Strategy: Harvested Tweets should represent a balanced perspective, reflecting the views of as many of a political event’s participants and observers as possible, both internal and external to the actual event. Due to these Tweets’ political sensitivity, they must be harvested and curated immediately and consistently.

ARCHIVE

Using Archive-It to harvest Tweets

• Ensure that the Twitter feed is publicly available (default setting for most Twitter accounts)

• Select Twitter feed as seed URL, adding an ending ‘/’ to the URL, to ensure capturing only the specific feed, not all of Twitter

• For example, http://twitter.com/internetarchive/ (with an ending ‘/’)

• Seed URLs cannot have a ‘www’; by default, Twitter URLs do not have a www and www.twitter.com is blocked by ‘robots.txt’

• Reminder: to enable the ‘ignore Robots.txt’ feature, use the Archive-It ‘Submit a Question’ link to have the Archive-It support team turn on this feature for the account

• By default, dynamically scrolling content should be in scope and captured

• Languages: For any given Tweet, the page is captured in all languages the Twitter interface supports, including Chinese. For example, for an original Tweet URL of the following format:

http://twitter.com/[user name]/[tweet ID]/

The following are also captured:

http://twitter.com/[user name]/[tweet ID]?lang=zh-tw (Traditional Chinese)

http://twitter.com/[user name]/[tweet ID]?lang=zh-cn (Simplified Chinese)

• To capture Twitter content in only one language, use a ‘regular expression’ to block other language variations

On the Archive-It account:

• From the ‘Host Constraints’ tab of the ‘Modify Crawl Scope’ page, add the host ‘Twitter.com’

• Then click ‘Add Rule’; from the dropdown menu select ‘Matches the following regular expression’

• In the box add the following: ^.*lang=(?!en).*$ (to capture English only)

• Adjust the regular expression to capture in other languages by changing the language abbreviation in the parentheses; for example, to capture only Traditional Chinese content, use ^.*lang=(?!zh-tw).*$

• Run a test crawl after adding this regular expression
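As a rough illustration of how the language-blocking rule behaves, the Python sketch below applies the same kind of pattern to a few sample URLs. It assumes the intended pattern is ^.*lang=(?!en).*$ (a negative lookahead) and that it is used as a block rule. The helper names, the account name and the Tweet ID are hypothetical, and Archive-It's crawler may evaluate rules differently, so this does not replace a test crawl.

```python
import re


def block_pattern(keep_lang):
    """Build a regex matching URLs whose lang= value does NOT start with keep_lang.

    Assumption: the rule is used as a *block* rule, so any URL that
    matches this pattern is excluded from the crawl.
    """
    return re.compile(r"^.*lang=(?!" + re.escape(keep_lang) + r").*$")


def is_blocked(url, keep_lang="en"):
    """Return True if the URL would be blocked by the language rule."""
    return block_pattern(keep_lang).match(url) is not None


# Hypothetical sample URLs (account name and Tweet ID are made up).
samples = [
    "http://twitter.com/exampleuser/12345/",
    "http://twitter.com/exampleuser/12345?lang=en",
    "http://twitter.com/exampleuser/12345?lang=zh-tw",
    "http://twitter.com/exampleuser/12345?lang=zh-cn",
]

for url in samples:
    print(url, "->", "blocked" if is_blocked(url) else "kept")
```

URLs without a lang parameter are kept, since the pattern only matches when lang= is present. The lookahead checks only the start of the value, so a variant such as lang=en-gb would also be kept under the English rule.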

Privacy and Ethics

One of the biggest questions is how to address ethical concerns surrounding the harvesting and curation of Tweets. The issue is addressed through the core principles of the American Anthropological Association’s Statement of Ethics (2012), resulting in the following guidelines:

• Protect Twitter Authors From Any (Direct or Indirect) Potential Harm

• Ensure to the Best of Your Knowledge the Author has Granted Consent to Capture and Curate Twitter Data

• Consider the Implications of Keeping Your Archives “Dark” and Unavailable to Public Scrutiny

• Data Records Storage and Preservation Should Be Considered from the Beginning Stages of the Curation Process

(Additional Source: S. Li & N. Worby, personal communication, January 29, 2015)

CURATE

Challenges associated with the harvesting and curation of political Tweets

Challenges

• There is an easy loss of social media data

• It is frequently difficult to find event-related hashtags without accidentally capturing a Tweet author’s personal Twitter page

• Political Tweet librarians may not be able to access every Tweet for a political event

• There is more than one side to any political event and therefore, harvesting and curation must be as unbiased as possible

• This is a very time consuming effort, especially if the number of political Tweets librarians is limited

Solutions

• Be vigilant when harvesting and curating data from a political event to avoid losing data

• Search for thematic hashtags to try and access Tweets from every standpoint

• Work alongside information specialists directly involved with the political event for greater Tweet harvesting: journalists, news media broadcasters, and political group leaders or members

• To reduce bias, obtain as balanced a collection as possible of Tweets from all sides of the political debate

• Collaborate with other institutions devoted to political Tweets harvesting and curation to save time

(Additional Source: S. Li & N. Worby, personal communication, January 29, 2015)

Pamphlet Authors

Kali Braden, Master of Information Student

Alex Herd, Master of Information Student

Brian Lau, Master of Information Student

Magdalene Schifferer, Master of Information Student

My Anh Truong, Master of Information Student

Other notes when using Archive-It:

• Robots.txt might block styling and image information; in these cases, ignore robots.txt for twimg.com (Twitter’s photograph hosting domain)

• Crawling links in Tweets: Many Tweets contain a URL that links to a page outside of Twitter (often using a URL shortener service such as bit.ly or tinyurl.com); by default, these links from Twitter posts will not be captured, but this can be done easily, as all links in Tweets redirect through the Twitter URL shortener: http://t.co/

• To include the pages linked on Tweets in a crawl, add an ‘Expand Scope’ rule for any URLs that contain http://t.co/ to allow the crawler to go to the link (or in the case of URL shorteners, the URL that the shortener redirects to), and archive that page (i.e., the actual page and content that was Tweeted about)

• All embedded files (images, CSS files, javascript files, etc.) will also be archived, enabling a complete capture of the page Tweeted about

• More than this one page will not be crawled (unless there are additional scoping rules in place), so an entire site that is linked to in a Tweet will not be crawled

• Again, conduct a test crawl with this new Expand Scope rule in place, to ensure it has been entered properly

• t.co URLs are blocked by robots.txt; therefore, ignore robots.txt for t.co

• This ‘Expand Scope’ rule will apply to all seeds and crawls in the collection

• Archiving Twitter searches: Search pages should archive similarly to Twitter feeds; however, dynamically scrolling content on search pages may not be captured

• As with Twitter feeds, include an ending ‘/’ on any search seed URLs to avoid putting all of Twitter in scope

• For example: twitter.com/search?q=Michigan%20right-to-work&src=typd/
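The seed URL conventions in these notes (no ‘www’, an ending ‘/’, and an ‘Expand Scope’ rule for URLs containing http://t.co/) can be sketched in Python as a pre-flight check before entering seeds in Archive-It. This is only an illustration under stated assumptions: the function names are hypothetical, a missing scheme is assumed to be http as in the examples, and the substring test only approximates how a "contains" scoping rule might behave.

```python
def normalize_seed(url):
    """Apply the seed URL conventions described in these guidelines.

    - add a scheme if missing (assumption: http, as in the examples)
    - drop a leading 'www.'
    - make sure the URL ends with '/'
    """
    url = url.strip()
    scheme, sep, rest = url.partition("://")
    if not sep:  # no scheme given, e.g. 'twitter.com/search?...'
        scheme, rest = "http", url
    if rest.lower().startswith("www."):
        rest = rest[4:]
    if not rest.endswith("/"):
        rest += "/"
    return scheme + "://" + rest


def in_expand_scope(url, marker="http://t.co/"):
    """Approximate an 'Expand Scope' rule for URLs that contain the marker."""
    return marker in url


# Hypothetical inputs for illustration.
print(normalize_seed("http://www.twitter.com/internetarchive"))
print(normalize_seed("twitter.com/search?q=Michigan%20right-to-work&src=typd"))
print(in_expand_scope("http://t.co/abc123"))
```

Running a list of candidate seeds through a check like this before a test crawl can catch a stray ‘www’ or a missing ending ‘/’ early, which would otherwise put far more of Twitter in scope than intended.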

COPYRIGHT, PRIVACY and ETHICS

Copyright

• Copyright and social media is an emerging issue and as of yet has not been tested by law or written into legislation

• Scholars argue either for or against a Tweet’s eligibility for copyright

• Twitter’s Terms of Service (2015) state that, by posting Tweets, authors consent to their Tweets’ collection and archiving

• Use of political Tweets for education, private study or research falls under the Fair Dealing exception in the Copyright Act of Canada

• Librarians should develop an official document stating who, when, where, and what access is granted as part of institutional social media curation policy

• A dark archive is recommended to store Tweets for a minimum of 75 years, to protect the institution from the copyright issues that accompany regular public access


REFERENCES

American Anthropological Association (AAA). (2012a). Do No Harm. Retrieved from http://ethics.aaanet.org/ethics-statement-1-do-no-harm/

American Anthropological Association (AAA). (2012b). Make Your Results Accessible. Retrieved from http://ethics.aaanet.org/ethics-statment-5-make-your-results-accessible/

American Anthropological Association (AAA). (2012c). Obtain Informed Consent and Necessary Permissions. Retrieved from http://ethics.aaanet.org/ethics-statement-3-obtain-informed-consent-and-necessary-permissions/

American Anthropological Association (AAA). (2012d). Principles of Professional Responsibility. Retrieved from http://ethics.aaanet.org/ethics-statement-0-preamble/

American Anthropological Association (AAA). (2012e). Protect and Preserve Your Records. Retrieved from http://ethics.aaanet.org/ethics-statement-6-protect-and-preserve-your-records/

American Anthropological Association (AAA). (2012f). Weigh Competing Ethical Obligations Due Collaborators and Affected Parties. Retrieved from http://ethics.aaanet.org/ethics-statement-4-weigh-competing-ethical-obligations-due-collaborators-and-affected-parties/

AAA Commission on the Engagement of Anthropology with the US Security and Intelligence Communities (CEAUSSIC). (2009). Final Report on the Army’s Human Terrain System Proof of Concept Program. Retrieved from http://www.aaanet.org/cmtes/commissions/CEAUSSIC/upload/CEAUSSIC_HTS_Final_Report.pdf

Antracoli, A., Duckworth, S., Silva, J., & Yarmey, K. (2014). Capture all the URLs: First steps in web archiving. Pennsylvania Libraries: Research & Practice, 2(2), 155-170. doi:10.5195/palrap.2014.67

Bragg, M., & Rollason-Cass, S. (2014). Archiving social networking sites w/Archive-It. Retrieved from https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=3113092

Copyright Act, Revised Statutes of Canada, 1985, c. C-42. Retrieved February 2, 2015 from http://laws.justice.gc.ca/en/C-42/index.html

Small, H., Kasianovitz, K., Blanford, R., & Celaya, I. (2012). What your Tweets tell us about you: Identity, ownership and privacy of Twitter data. International Journal of Digital Curation, 174-197. Retrieved from http://www.ijdc.net/index.php/ijdc/article/view/214

Twitter, Inc. (2015a). Terms of Service. Retrieved from https://Twitter.com/tos/

Twitter, Inc. (2015b). GET help/languages. Retrieved from https://dev.twitter.com/rest/reference/get/help/languages


END
