Transcript
Page 1: Managing the Digitization of Large Press Archives
Page 2: Managing the Digitization of Large Press Archives

The New Library of Alexandria Overview

Bibliotheca Alexandrina (BA)  

Page 3: Managing the Digitization of Large Press Archives

- Center of excellence in the production and dissemination of knowledge
- Place of dialogue, learning and understanding between cultures and peoples

Page 4: Managing the Digitization of Large Press Archives

- The World’s Window on Egypt
- Egypt’s Window on the World
- Instrument for Rising to the Challenges of the Digital Age
- Center for Dialogue Between Peoples and Civilizations

Page 5: Managing the Digitization of Large Press Archives

Not just a Library of Books but rather a vast cultural and scientific complex

Page 6: Managing the Digitization of Large Press Archives

A library that can accommodate millions of books  

Page 7: Managing the Digitization of Large Press Archives

http://archive.bibalex.org

Page 8: Managing the Digitization of Large Press Archives

Page 9: Managing the Digitization of Large Press Archives
Page 10: Managing the Digitization of Large Press Archives
Page 11: Managing the Digitization of Large Press Archives
Page 12: Managing the Digitization of Large Press Archives
Page 13: Managing the Digitization of Large Press Archives
Page 14: Managing the Digitization of Large Press Archives

Page 15: Managing the Digitization of Large Press Archives

http://descegy.bibalex.org

Page 16: Managing the Digitization of Large Press Archives

http://lartarab.bibalex.org

Page 17: Managing the Digitization of Large Press Archives

More than 230,000 Arabic books are freely available online for Arabic readers worldwide

Page 18: Managing the Digitization of Large Press Archives

http://suezcanal.bibalex.org

Page 19: Managing the Digitization of Large Press Archives

Page 20: Managing the Digitization of Large Press Archives

http://naguib.bibalex.org/

Page 21: Managing the Digitization of Large Press Archives

http://nasser.bibalex.org

Page 22: Managing the Digitization of Large Press Archives

http://sadat.bibalex.org

Page 23: Managing the Digitization of Large Press Archives
Page 24: Managing the Digitization of Large Press Archives

- Project Overview
- Collection Overview
- Data Representation
- System Workflow
  - DAF (Digital Assets Factory)
  - Cataloguing
  - Website
    - Solr search engine
    - Article Viewer

Page 25: Managing the Digitization of Large Press Archives

Page 26: Managing the Digitization of Large Press Archives

- The Centre for Economic, Judicial, and Social Study and Documentation (CEDEJ) collaborated with Bibliotheca Alexandrina (BA) to digitize its massive archive of press articles
- The project consists of multiple modules to:
  - Index the press archive collection
  - Control the data entry workflow
  - Digitize and process data
  - Catalogue and review articles
  - Publish the archive on the web

Page 27: Managing the Digitization of Large Press Archives

Page 28: Managing the Digitization of Large Press Archives

- Package of press archive
  - 800,000+ press clips, varying between:
    - Press
    - Reports
  - 500+ publishers
  - 60,000+ writers and reporters
  - 200 different subjects
    - Economics, politics, social life, etc.
  - Archive languages:
    - Arabic, English and French
  - Date range from 1966 to 2009

Page 29: Managing the Digitization of Large Press Archives

- Finished so far
  - 115,000 press clips, varying between:
    - Press
    - Reports
  - 200 publishers
  - 14,000 writers and reporters
  - 100 different subjects
    - Economics, politics, social life, etc.
  - Archive languages:
    - Arabic, English and French
  - Date range from 1966 to 2009

Page 30: Managing the Digitization of Large Press Archives

Page 31: Managing the Digitization of Large Press Archives

- A list of packaged press archives is submitted to Bibliotheca Alexandrina to be scanned and catalogued
- The source of the data is a collection of boxes
- Each box is organized in the following hierarchy:
  - Folder
  - File
  - Sub-File
  - Document
- A document represents a single press page
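
The box hierarchy above could be modelled roughly as follows. This is only a sketch: it assumes each level nests inside the previous one (Box > Folder > File > Sub-File > Document), and all class names, field names, and sample values are hypothetical rather than taken from the BA system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Document:
    """One scanned press page (the leaf of the hierarchy)."""
    image_path: str  # hypothetical: location of the scanned page image


@dataclass
class SubFile:
    name: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class File:
    name: str
    sub_files: List[SubFile] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class Box:
    label: str
    folders: List[Folder] = field(default_factory=list)

    def iter_documents(self) -> Iterator[Document]:
        """Walk Folder > File > Sub-File and yield every page in the box."""
        for folder in self.folders:
            for file in folder.files:
                for sub_file in file.sub_files:
                    yield from sub_file.documents


# Hypothetical sample: one box holding a single sub-file with two pages.
sub_file = SubFile("sub-1", [Document("p1.tif"), Document("p2.tif")])
box = Box("box-001", [Folder("folder-A", [File("file-1", [sub_file])])])
page_count = sum(1 for _ in box.iter_documents())
```

Walking the tree from the box down keeps each scanned page traceable to its physical location, which is what an indexing module would need.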

Page 32: Managing the Digitization of Large Press Archives

Page 33: Managing the Digitization of Large Press Archives

Page 34: Managing the Digitization of Large Press Archives

Page 35: Managing the Digitization of Large Press Archives

Page 36: Managing the Digitization of Large Press Archives

Page 37: Managing the Digitization of Large Press Archives

Page 38: Managing the Digitization of Large Press Archives

Page 39: Managing the Digitization of Large Press Archives

Article Creation

Page 40: Managing the Digitization of Large Press Archives

Article Metadata

Page 41: Managing the Digitization of Large Press Archives

Lookups Management

Page 42: Managing the Digitization of Large Press Archives

Reports

Page 43: Managing the Digitization of Large Press Archives

Page 44: Managing the Digitization of Large Press Archives

Page 45: Managing the Digitization of Large Press Archives

Page 46: Managing the Digitization of Large Press Archives

- Based on the Apache Lucene project, v4.1
- The SolrNet API is used to connect to the Solr server
- Features
  - Simple/advanced search
  - Results highlighting
  - Field autocomplete
  - Text search (Article Viewer)
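
As a rough illustration of search with results highlighting, the sketch below builds a request URL for Solr's standard select handler. The slides say the project connects through SolrNet; Python is used here only for illustration. The base URL, the core name "articles", and the field name "content" are assumptions, while q, rows, wt and the hl.* options are standard Solr request parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url: str, text: str, rows: int = 10) -> str:
    """Build a Solr select URL that searches a phrase and asks for highlights."""
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    phrase = text.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'content:"{phrase}"',   # "content" is a hypothetical field name
        "hl": "true",                 # turn highlighting on
        "hl.fl": "content",           # field(s) to highlight
        "hl.simple.pre": "<em>",      # marker placed before each match
        "hl.simple.post": "</em>",    # marker placed after each match
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical server address and core name.
example_url = build_search_url("http://localhost:8983/solr/articles", "Suez Canal")
query = parse_qs(urlsplit(example_url).query)
```

The highlighted snippets come back in a separate section of the Solr response, which the website could then render next to each result.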

Page 47: Managing the Digitization of Large Press Archives

Page 48: Managing the Digitization of Large Press Archives

Page 49: Managing the Digitization of Large Press Archives

Page 50: Managing the Digitization of Large Press Archives

Page 51: Managing the Digitization of Large Press Archives

Page 52: Managing the Digitization of Large Press Archives

Page 53: Managing the Digitization of Large Press Archives

Page 54: Managing the Digitization of Large Press Archives

- The Article Viewer is used for previewing articles
  - It is one of multiple viewers developed at BA
- Architecture
  - Server side: RESTful services
  - Client side: JavaScript using JSONP
- Features
  - Image preview
  - Metadata preview
  - Text selection
  - Searching/highlighting
  - Zooming options: fit width/height
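
To make the JSONP part of the architecture concrete: a JSONP endpoint returns JSON wrapped in a call to a function named by the client, so a script tag loaded from another origin can deliver the data. The sketch below shows only that wrapping step. The function name, callback name, and payload fields are hypothetical, not the BA services' actual interface.

```python
import json
import re

# Accept only plain JavaScript identifiers (optionally dotted) as callback
# names, so the client-supplied value cannot inject arbitrary script.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def jsonp_response(callback: str, payload: dict) -> str:
    """Wrap a JSON payload in a call to the client's callback function."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"unsafe JSONP callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


# Hypothetical metadata payload for a three-page article.
body = jsonp_response("handleMetadata", {"pageCount": 3, "width": 2480, "height": 3508})
```

On the client, the viewer would define handleMetadata and then add a script element whose URL names that callback.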

Page 55: Managing the Digitization of Large Press Archives

- Viewer web services
  - Metadata web service:
    - Retrieve article catalogue metadata
    - Return technical information (width, height, page count, ...)
  - Content web service:
    - Retrieve the image of each page in the article, scaled responsively to a custom width and height
    - Return the selected text based on the area the user highlighted
  - Search web service:
    - Search the content of the articles using the Solr engine APIs
    - Highlight the matching phrases in the article image
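
One way a content service might combine scaling with area-based text selection is sketched below. It assumes the text of each page is stored as word bounding boxes in original image coordinates (for example from OCR). The slides do not say how the text is located, so WordBox, fit_scale, select_text, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """A word and its bounding box in original page pixels (assumed layout)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def fit_scale(page_w: float, page_h: float, view_w: float, view_h: float,
              mode: str = "width") -> float:
    """Scale factor for the 'fit width' or 'fit height' zoom option."""
    if mode == "width":
        return view_w / page_w
    if mode == "height":
        return view_h / page_h
    raise ValueError(f"unknown fit mode: {mode!r}")


def select_text(words: List[WordBox], scale: float,
                left: float, top: float, right: float, bottom: float) -> str:
    """Return the words whose boxes overlap a selection made on the scaled image."""
    # Map the selection from viewer pixels back to original page coordinates.
    l, t, r, b = (v / scale for v in (left, top, right, bottom))
    hits = [w for w in words
            if w.left < r and w.right > l and w.top < b and w.bottom > t]
    # Words are assumed to be stored in reading order already.
    return " ".join(w.text for w in hits)


words = [
    WordBox("Suez", 100, 100, 300, 150),
    WordBox("Canal", 320, 100, 520, 150),
    WordBox("reopens", 100, 400, 400, 450),
]
scale = fit_scale(2000, 3000, 1000, 800, mode="width")
selected = select_text(words, scale, 40, 40, 270, 80)
```

The same box coordinates, multiplied by the scale, could also position highlight overlays for search matches on the article image.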

Page 56: Managing the Digitization of Large Press Archives

Page 57: Managing the Digitization of Large Press Archives

Page 58: Managing the Digitization of Large Press Archives
