ePADD (Email: Process, Appraise, Discover, Deliver)
CurateCamp 2015
Peter Chan, Digital Archivist
Apr. 23, 2015

Email Archives in Our Collections

• Robert Creeley - ~50,000
• Richard Fikes - ~100,000
• Terry Winograd - ~650,000
• Benoit Mandelbrot
• Harrison Studio
• Stanford Humanities Lab

Common Ways to Archive Emails

Paper
• Print the emails
• File the printed emails in the respective content folders

Electronic
• Archive emails using functions provided in email clients

Process  

Appraise  

Deliver  

Preserve  

Discover  

Normalization

• Convert email from closed, proprietary file formats to standard, portable formats

• Emailchemy, MailStore
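Emailchemy and MailStore handle the proprietary formats themselves. As a minimal sketch of the last step only, the Python below assumes the mail has already been exported to mbox and writes each message out as a standalone RFC 5322 `.eml` file with the standard-library `mailbox` module. The function name `mbox_to_eml` and the zero-padded file naming are illustrative choices, not part of either tool.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path: str, out_dir: str) -> list[Path]:
    """Write each message in an mbox file to its own .eml file.

    Assumes the mail was already exported to mbox; proprietary
    formats need a dedicated converter first.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for i, msg in enumerate(box):
            path = out / f"{i:06d}.eml"
            # as_bytes() serializes headers and body; the mbox "From " separator line is omitted
            path.write_bytes(msg.as_bytes())
            written.append(path)
    finally:
        box.close()
    return written
```

The `.eml` files can then be opened by most email clients or ingested by other archival tools.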

 

Appraisal
• Owner:
  – Filter messages to/from certain correspondents
  – Review messages containing certain words (divorce, daughter, etc.)
• Curator:
  – Ensure certain information exists
  – Get an overall view of who, where, and what is mentioned in the messages

• Email clients • ePADD

• Email clients • ePADD
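The owner's two appraisal passes can be sketched as follows, assuming messages have already been parsed into `email.message.EmailMessage` objects with `email.policy.default`. The helper names (`appraise`, `addresses`, `body_text`) are hypothetical, and the whole-word, case-insensitive matching is one possible choice; this is not how ePADD implements it.

```python
import re
from email.message import EmailMessage
from email.utils import getaddresses


def addresses(msg: EmailMessage) -> set[str]:
    """Lower-cased addresses found in the From, To and Cc headers."""
    fields = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def body_text(msg: EmailMessage) -> str:
    """Plain-text body, or an empty string if there is none."""
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def appraise(messages, correspondents, review_terms):
    """Return (messages to/from the correspondents, messages with review terms)."""
    wanted = {c.lower() for c in correspondents}
    # Whole-word, case-insensitive match on any review term
    pattern = (
        re.compile(r"\b(?:" + "|".join(map(re.escape, review_terms)) + r")\b", re.IGNORECASE)
        if review_terms
        else None
    )
    by_person, by_term = [], []
    for msg in messages:
        if addresses(msg) & wanted:
            by_person.append(msg)
        text = str(msg.get("Subject", "")) + "\n" + body_text(msg)
        if pattern and pattern.search(text):
            by_term.append(msg)
    return by_person, by_term
```

Because matching is whole-word, "daughter" does not flag "daughters"; a looser pattern trades precision for recall.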

Processing
• Place restrictions on messages containing:
  – personally identifiable information (SSN, credit card numbers, etc.)
  – private information (student grades, salary, grievances, medical information, etc.)
  – information stipulated by donors

• ePADD
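Flagging messages for restriction usually starts with pattern matching. The sketch below is an assumption, not ePADD's actual rules: it looks for US Social Security numbers written as `123-45-6789` and for runs of 13 to 19 digits that pass the Luhn checksum used by payment card numbers. Grades, salaries, and medical details have no fixed format and still need human review.

```python
import re

# SSNs in the dashed 123-45-6789 form only (an assumed format); excludes
# area numbers 000, 666 and 9xx, group 00 and serial 0000
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13-19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, as used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def pii_flags(text: str) -> set[str]:
    """Return the kinds of PII that appear to be present in text."""
    flags = set()
    if SSN_RE.search(text):
        flags.add("ssn")
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):
            flags.add("credit_card")
            break
    return flags
```

A message would be held for restriction whenever `pii_flags` returns a non-empty set.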

Can do more than paper-based archives!

Processing: Organizing

• Group together messages containing certain words (project name, event name)
• Gather together all messages belonging to the same person, even when that person uses multiple email addresses
• Group all image attachments in one place
• List all person, location, and organization entities

• ePADD
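One simple way to gather the addresses of a single person is to treat every address that appears with the same display name as one identity and merge them with a union-find structure. The sketch assumes that heuristic: it will wrongly merge two people who share a name and will miss addresses that never carry a name. ePADD's own address-book logic may differ, and `group_identities` is a hypothetical helper.

```python
from email.utils import parseaddr


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_identities(header_values):
    """Cluster addresses that ever appear with the same display name.

    header_values: strings like 'Terry Winograd <tw@example.edu>'.
    Returns a list of sets of lower-cased addresses.
    """
    uf = UnionFind()
    first_addr_for_name = {}
    for value in header_values:
        name, addr = parseaddr(value)
        if not addr:
            continue
        addr = addr.lower()
        uf.find(addr)  # register the address even if it has no name
        key = " ".join(name.lower().split())  # normalize case and spacing
        if key:
            if key in first_addr_for_name:
                uf.union(first_addr_for_name[key], addr)
            else:
                first_addr_for_name[key] = addr
    groups = {}
    for addr in uf.parent:
        groups.setdefault(uf.find(addr), set()).add(addr)
    return list(groups.values())
```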

20 Email Addresses for 1 Person


Processing
• Facilitate reconciliation with authority files
  – OCLC FAST
  – Freebase
  – GeoNames
• User-defined regular expressions
• Local kill list

• ePADD
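A reconciliation pass can be sketched as a lookup of normalized entity names in an authority table, after discarding kill-list entries and names that match user-defined patterns. Here a plain dictionary stands in for OCLC FAST, Freebase, or GeoNames, the `local-…` identifiers in the test are made up, and using the regular expressions to exclude names is an assumed role for them.

```python
import re


def reconcile(entities, authority, kill_list=(), user_patterns=()):
    """Match extracted entity names against a local authority table.

    entities: names found in the messages (e.g. by NER).
    authority: dict of heading -> record id (hypothetical stand-in for
        FAST / Freebase / GeoNames lookups).
    kill_list: names to discard as known false positives.
    user_patterns: regexes; names that fully match one are also discarded.
    Returns (matched, unmatched): matched maps name -> record id.
    """
    def norm(s):
        return " ".join(s.casefold().split())

    killed = {norm(k) for k in kill_list}
    patterns = [re.compile(p, re.IGNORECASE) for p in user_patterns]
    table = {norm(k): v for k, v in authority.items()}
    matched, unmatched = {}, []
    for name in entities:
        key = norm(name)
        if key in killed or any(p.fullmatch(name) for p in patterns):
            continue
        if key in table:
            matched[name] = table[key]
        else:
            unmatched.append(name)
    return matched, unmatched
```

Unmatched names are candidates for manual reconciliation or for new local authority records.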

List of Reconciled Authority Records

Processing: Extract interesting items

• List all books and movies mentioned in all messages
• Give a breakdown of organizations by type (universities, companies, museums, etc.)
• List events
• List all topics discussed in messages
• Create local authority records

• Future ePADD
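A breakdown of organizations by type can be sketched with a `Counter` once each name has a type. The keyword rules below are hypothetical and deliberately crude; a real tool would more likely take the type from an authority record.

```python
from collections import Counter

# Hypothetical keyword rules, checked in order; the first match wins
TYPE_RULES = [
    ("University", ("university", "college", "institute of technology")),
    ("Museum", ("museum", "gallery")),
    ("Company", ("inc", "corp", "corporation", "ltd", "llc", "company")),
]


def org_type(name: str) -> str:
    """Guess an organization's type from keywords in its name."""
    # Turn periods and commas into spaces so "Inc." and "Corp," match as words
    cleaned = name.casefold().replace(".", " ").replace(",", " ")
    padded = " " + " ".join(cleaned.split()) + " "
    for label, keywords in TYPE_RULES:
        if any(f" {kw} " in padded for kw in keywords):
            return label
    return "Other"


def breakdown(org_names) -> Counter:
    """Count organizations by guessed type."""
    return Counter(org_type(n) for n in org_names)
```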

Discovery

• Existence of email archives
• Information about the email archives (as in traditional finding aids)
• Information about the email archives (all person, location, organization entities and correspondents)
• Institutional catalog system, Wiki, Finding Aid Repository (OAC, etc.), search engines

• Finding Aids • ePADD

Delivery
• Email messages
• Full-text search
• Request a copy
• See attachment files (documents, spreadsheets)
• See image attachments
• Bulk search
• Annotate messages
• Organize messages

• Email clients • ePADD • Quick View Plus
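Full-text and bulk search can both be served from an inverted index. The minimal sketch below assumes messages are plain strings keyed by an id, matches lower-cased word tokens with AND semantics, and interprets bulk search as running a list of queries (for example, a list of names) and reporting hit counts. None of these function names come from ePADD.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the ids of messages containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query: str) -> set[str]:
    """Ids of messages containing every word of the query (AND search)."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy, so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def bulk_search(index, queries):
    """Run many queries at once and return the hit count for each."""
    return {q: len(search(index, q)) for q in queries}
```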

Named Entity Recognition

• Stanford Named Entity Recognizer (NER)
  – Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
  – GNU General Public License (v2 or later)

• OpenNLP
  – Apache License

• Custom NER
  – Uses address book, Wikipedia, Freebase
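The custom NER idea of drawing on an address book or similar name lists can be sketched as a gazetteer lookup: a greedy longest match of known names over whitespace tokens. This is a dictionary method, not the statistical (CRF) model behind the Stanford NER, and the sample names and types in the test are illustrative.

```python
PUNCT = ".,;:!?\"'()"


def build_gazetteer(entries):
    """Map lower-cased token tuples to entity types.

    entries: iterable of (name, entity_type) pairs, e.g. drawn from an address book.
    """
    return {tuple(name.lower().split()): etype for name, etype in entries}


def tag(text, gazetteer):
    """Return (surface form, type) pairs using greedy longest-match lookup."""
    words = [t.strip(PUNCT) for t in text.split()]
    lowered = [w.lower() for w in words]
    max_len = max((len(k) for k in gazetteer), default=0)
    found, i = [], 0
    while i < len(words):
        # Try the longest window first so "Terry Winograd" beats "Winograd"
        for n in range(min(max_len, len(words) - i), 0, -1):
            etype = gazetteer.get(tuple(lowered[i:i + n]))
            if etype:
                found.append((" ".join(words[i:i + n]), etype))
                i += n
                break
        else:
            i += 1  # no known name starts at this word
    return found
```

A gazetteer finds only names it already knows, which is why it complements rather than replaces a trained recognizer.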