Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism Detection Software

Posted on 15-Apr-2017

Page 1: Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism Detection Software

BOUTIQUE BIG DATA

Understanding 19th-Century Reprint Culture With Plagiarism Detection Software

M. H. Beals (ORCID: 0000-0002-2907-3313)

Loughborough University

@MHBEALS

Page 2

THE HISTORICAL PROBLEM

• Culture of Reprinting in 18th and 19th Centuries

• Inconsistent Attribution

• Inconsistent Survival of Network Components

• Limited Historiographical Resources

Image Courtesy of Mike Licht (CC BY) at https://www.flickr.com/photos/notionscapital/2313507405

Page 3

SEARCH AND TRANSCRIBE

Left Image Courtesy of Dan Tantrum (CC BY NC ND) at https://www.flickr.com/photos/tantrum_dan/2344581860

Page 4

Meta Data

Page Text

Page 5
Page 6

COPYFIND REPRINT DETECTION

• Freeware Programme Developed by Lou Bloomfield

http://plagiarism.bloomfieldmedia.com/z-wordpress/software/copyfind/

• Highly Customisable Search, and Open Source

• Measures Left, Right and Overall Matches

• Displays Left-Right Comparisons of Text

• Extremely Effective at Discovering OCR-Transcribed Matches

Image Courtesy of Lou Bloomfield at http://rabi.phys.virginia.edu/lab3e/
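The idea of measuring left, right and overall matches can be illustrated with a minimal sketch. This is not Copyfind's own code or algorithm; it is a simplified shared-phrase counter written for illustration. The function names, the six-word phrase length, and the definition of "overall" as the sum of the two sides are all assumptions.

```python
import re


def words(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def shared_phrase_counts(left_text, right_text, phrase_len=6):
    """Count the words in each text that fall inside phrases shared with the other.

    Returns (left, right, overall). 'overall' is left + right here, which is
    an assumption for illustration and not necessarily Copyfind's definition.
    """
    left, right = words(left_text), words(right_text)

    def shingles(tokens):
        # Every run of phrase_len consecutive words in the token list.
        return {tuple(tokens[i:i + phrase_len])
                for i in range(len(tokens) - phrase_len + 1)}

    common = shingles(left) & shingles(right)

    def covered(tokens):
        # Mark each word that belongs to at least one shared phrase.
        marked = [False] * len(tokens)
        for i in range(len(tokens) - phrase_len + 1):
            if tuple(tokens[i:i + phrase_len]) in common:
                for j in range(i, i + phrase_len):
                    marked[j] = True
        return sum(marked)

    left_count, right_count = covered(left), covered(right)
    return left_count, right_count, left_count + right_count
```

Because only exact six-word runs are compared, a single OCR error breaks the phrases that contain it but leaves the surrounding matches intact, which is one plausible reason a phrase-based approach copes with noisy transcriptions.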

Page 7
Page 8

Page 9
Page 10

ESTABLISHING LIKELY CANDIDATES

• Single Year (1810-1819) Contained over 200,000 Possible Matches

• Removed Internal (Same Title) Reprints

• Restricted Match Size (90 Right, 90 Left or 160 Overall)

• Restricted Date Separation (200 Days)
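The three filters above can be sketched as a single predicate. This is an illustrative reconstruction, not the author's code: the `Match` record and its field names are hypothetical, and the sketch assumes the match sizes are minimum thresholds and the 200 days a maximum gap, since the slide does not state units or direction.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Match:
    """One reported pair of texts (hypothetical record layout)."""
    left_title: str
    right_title: str
    left_date: date
    right_date: date
    left: int      # match size on the left text
    right: int     # match size on the right text
    overall: int   # overall match size


def is_candidate(m, min_side=90, min_overall=160, max_days=200):
    """Apply the three restrictions to one match."""
    # Remove internal reprints: both texts come from the same title.
    if m.left_title == m.right_title:
        return False
    # Restrict date separation (assumed to be a maximum gap in days).
    if abs((m.right_date - m.left_date).days) > max_days:
        return False
    # Restrict match size (assumed to be minimum thresholds).
    return (m.left >= min_side
            or m.right >= min_side
            or m.overall >= min_overall)
```

A list of reported pairs could then be reduced with `[m for m in matches if is_candidate(m)]`.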

Page 11

Figure panels labelled 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817

Page 12

DIRECTIONALITY

• Reprint Maps are Non-Linear, Similar to Phylogenetic Trees

• Paths of Specific Branches Dictated by Date, Content, Errors

• Similar Method to Meme-Tracking (Adamic et al., 2014)

• Attributions Are Often Red Herrings
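One way to picture how branches might be inferred is the following minimal sketch. It is an assumption-laden illustration, not the method used in the project: each text is linked to the earlier-dated text it most resembles, and printed attributions are ignored. The function name `likely_sources` and the `score` callback are hypothetical; `score` could combine content overlap with shared errors.

```python
from datetime import date


def likely_sources(texts, score):
    """Guess a parent for each text.

    texts: dict mapping a text id to its publication date.
    score: function (earlier_id, later_id) -> similarity number.
    Returns a dict mapping each id to its most similar earlier text,
    or None when no earlier text exists (a root of the tree).
    """
    parents = {}
    for tid, tdate in texts.items():
        # Only strictly earlier texts can be sources.
        earlier = [other for other, odate in texts.items() if odate < tdate]
        parents[tid] = max(earlier, key=lambda other: score(other, tid),
                           default=None)
    return parents


# Usage with made-up similarity scores for three hypothetical texts.
texts = {"A": date(1815, 1, 1), "B": date(1815, 1, 10), "C": date(1815, 2, 1)}
scores = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.8}
tree = likely_sources(texts, lambda a, b: scores.get((a, b), 0.0))
```

Here `tree` links C to B rather than to the earliest text A, because C resembles B more closely; this is the sense in which a reprint map branches rather than forming a single line.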

Page 13

1818-1819

Page 14

“WITHIN THIS COLLECTION”

Page 15

WWW.SCISSORSANDPASTE.NET

Page 16

WWW.GITHUB.COM/MHBEALS/SCISSORSANDPASTE

Page 17

THANK YOU

M. H. Beals (ORCID: 0000-0002-2907-3313)

Loughborough University

@MHBEALS