![Page 1: 2 Dr Birgit Plietzsch Arts Computing Advisor bp10@st-andrews.ac.uk Swithun Crowe Developer for Arts and Humanities Computing projects cs2@st-andrews.ac.uk](https://reader035.vdocuments.us/reader035/viewer/2022062417/551b3459550346dd1a8b4ffb/html5/thumbnails/1.jpg)
Using Alfresco to create an Open Archival Information System
Dr Birgit Plietzsch, Arts Computing Advisor
&
Swithun Crowe, Developer for Arts and Humanities Computing projects
IT Services, University of St Andrews
Structure
1. Introduction to the University of St Andrews Digital Archiving Project (DAP)
2. The DAP Open Archival Information System
3. Developing the OAIS Ingest function in Alfresco
Digital Preservation
Digital Preservation is …
• the active management of digital information over time to ensure its accessibility
• long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required for.
  • Long-term is defined as "long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely".
  • Retrieval means obtaining needed digital files from the long-term, error-free digital storage, without possibility of corrupting the continued error-free storage of the digital files.
  • Interpretation means that the retrieved digital files, files that, for example, are of texts, charts, images or sounds, are decoded and transformed into usable representations. This is often interpreted as "rendering", i.e. making it available for a human to access. However, in many cases it will mean able to be processed by computational means.
(Source: Wikipedia)
Institutional context
• Legal requirements (e.g. Freedom of Information Act)
• Protection of institutional intellectual property
• Funding body requirements
  • until 2008: Arts and Humanities Data Service (national depository for arts and humanities research data)
  • no such body exists now for the Arts and Humanities
  • in other subjects, national support is patchy
• Moral obligations
  • protection of cultural and corporate memory
Records of the Parliaments of Scotland project
www.rps.ac.uk
• proceedings of the Scottish Parliament from the first surviving act of 1235 to the union of 1707
• 10 years of research
• no print publication
• c. 16.5m words
• issues:
  • inconsistent editorial practices
  • obsolescence of software originally used
  • long-term sustainability of research data
Digital Archiving Project (DAP)
• Pilot project
• Scope:
  • data contained in electronic resources produced within the Faculty of Arts, University of St Andrews
• Aims:
  • ensure long-term sustainability of RPS data
  • investigate the requirements of digital archiving and obtain experience
  • meet funding body requirement
  • flexible implementation (to allow for additional future uses)
The DAP archive
[Figure: Concepts and Properties of Archives and Hosting in the Strategy and their Relationships. © Charles Beagrie Ltd 2009, Creative Commons Attribution-Share Alike 3.0. Key: solid colour represents core properties and fading colour represents weaker properties of archives and hosting services.]
The DAP Open Archival Information System
• An Open Archival Information System (or OAIS) is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
• reference model: ISO 14721:2003
Open Archival Information System: workflows
Seven functions
• Ingest
• Archival Storage
• Data Management
• Administration
• Preservation Planning
• Access
• Management
SIP: Submission Information Package
AIP: Archival Information Package
DIP: Dissemination Information Package
Open Archival Information System: data package
Implementation
• Content Information:
  • XML
  • TIFF
  • DOC
  • etc.
• Preservation Description Information:
  • PREMIS
• Descriptive Information:
  • MODS
• Packaging Information:
  • METS
Preservation strategy
• What needs to be preserved?
  • data
  • layout
  • functionality
  • user experience
• What are the significant properties?
  • generic low-level properties (e.g. basic data unit, byte-level encoding, data type, and logical schema)
  • data type specific properties (example: text)
    • underlying abstract forms (font, spacing, layout)
    • sub-properties (e.g. font type, style, family, size, colour)
• How do we preserve?
  • bit stream preservation
  • emulation
  • migration
• Adopted approach:
  • data is preserved
  • combination of bit stream preservation and file format migration upon ingest
Data models
• description needs of different types of material
  • electronic resources
  • digital images
  • video
  • research papers
  • University records
  • etc.
• introduce flexibility
  • future wider uses of the archive
Electronic resources data model
• expressed in MODS
• 3 layers
• use for pilot
• more models can be developed
[Diagram: Resource Discovery Metadata. Layer labels: Project, Resource type, Digital object. Box labels: Research data, Documentation, Code.]
Approaches investigated
Monolithic approach
• DSpace
  • issues with Archival Storage and Data Management functions
• EPrints
  • issues with Administration and Access functions
• RODA
  • technical issues
No support for Preservation Planning
Breakdown into OAIS requirements
• Repository framework: Fedora Commons
  • issues with suitable front end for Ingest, Access, Preservation Planning, or Administration functions
  • highly customisable
• Metadata
  • MODS
  • METS
  • PREMIS
Implementation of DAP
Software used
• Alfresco
  • www.alfresco.com
• Fedora Commons
  • fedora-commons.org
• Planets Suite
  • www.openplanetsfoundation.org
[Diagram labels: Ingest; Administration; Preservation Planning; Access; Archival storage & Data Management; Management; Share, Explorer, Records Management; Plato, Testbed]
The DAP Open Archival Information System
Unresolved issues
• Version control of AIPs
  • Alfresco / Fedora interaction?
• Access front end
  • Fedora Commons front ends do not normally support OAIS functions
• Can extra properties be added to folders and files in Records Management site?
We welcome ideas that might help us resolve the above three issues.
Developing the OAIS Ingest in Alfresco
Introduction
• FITS and PREMIS
  • Technical metadata
• RPS and MODS
  • Resource discovery metadata
• Antivirus scanning
• METS
  • Wrapping files and metadata
FITS and PREMIS
Introduction
• FITS (File Information Tool Set)
  • http://code.google.com/p/fits/
  • Consolidates file format metadata from 3rd party tools
    • Jhove, DROID, NLNZ ME, Exiftool and others
  • Output as XML
• PREMIS (PREservation Metadata: Implementation Strategies)
  • http://www.loc.gov/standards/premis/
  • Data dictionary of semantic units, maps to XML
• Transform FITS XML to PREMIS using XSLT
FITS and PREMIS
The action
• Text property defined in custom aspect for storing FITS XML in node metadata
• Create temporary file containing content of node
• Run FITS on temporary file
• Put output into custom property
• Later on, transform this to PREMIS XML
• Can be run as space rule
• Compile to AMP using Alfresco SDK
FITS and PREMIS
fits-action-context.xml
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
    <!-- I18N messages for the action -->
    <bean id="fits-action-messages" class="org.alfresco.i18n.ResourceBundleBootstrapComponent">
        <property name="resourceBundles">
            <list><value>alfresco.module.FitsAction.fits-action-messages</value></list>
        </property>
    </bean>
    <!-- content model defining the FITS aspect -->
    <bean id="fits-model-bootstrap" parent="dictionaryModelBootstrap" depends-on="dictionaryBootstrap">
        <property name="models">
            <list><value>alfresco/module/FitsAction/context/fitsModel.xml</value></list>
        </property>
    </bean>
    <!-- the action executer itself -->
    <bean id="fits-action" class="uk.ac.st_andrews.repo.action.executer.FitsActionExecuter" parent="action-executer">
        <property name="serviceRegistry"><ref bean="ServiceRegistry"/></property>
    </bean>
</beans>
FITS and PREMIS
FitsActionExecuter (method signatures only)
package uk.ac.st_andrews.repo.action.executer;
public class FitsActionExecuter extends ActionExecuterAbstractBase
{
    public void setServiceRegistry(ServiceRegistry serviceRegistry);
    protected void addParameterDefinitions(List<ParameterDefinition> paramList);
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef);
}
FITS and PREMIS
FitsActionExecuter.executeImpl (fragment)
// make sure node exists
if (!nodeService.exists(actionedUponNodeRef))
{
    throw new Exception("no node");
}

// make sure that node has fits aspect
QName fitsAspect = QName.createQName(fitsURI, "fitsAspect");
if (!nodeService.hasAspect(actionedUponNodeRef, fitsAspect))
{
    this.nodeService.addAspect(actionedUponNodeRef, fitsAspect, null);
}

// create new FITS instance
Fits fits = new Fits();
Fits.allowRounding = true;
FitsOutput result = null;
FITS and PREMIS
FitsActionExecuter.executeImpl (fragment cont.)
// put input into temp file
ContentReader reader =
    contentService.getReader(actionedUponNodeRef, ContentModel.PROP_CONTENT);
String fileName =
    (String) nodeService.getProperty(actionedUponNodeRef, ContentModel.PROP_NAME);
File inputFile =
    TempFileProvider.createTempFile("FitsActionExecuter_", "." + fileName);
reader.getContent(inputFile);

// transform into technical metadata
result = fits.examine(inputFile);
Document doc = result.getFitsXml();

// put result of transformation into output
XMLOutputter serializer = new XMLOutputter(Format.getPrettyFormat());
String output = serializer.outputString(doc);

// get property to write to
QName fitsProp = QName.createQName(fitsURI, "fitsOutput");
nodeService.setProperty(actionedUponNodeRef, fitsProp, output);
FITS and PREMIS
Fragment of FITS XML showing conflicting file formats
<identification status="CONFLICT">
    <identity format="Microsoft Word" mimetype="application/msword">
        <tool toolname="Exiftool" toolversion="8.25" />
        <tool toolname="file utility" toolversion="5.04" />
        <tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" />
        <tool toolname="ffident" toolversion="0.2" />
    </identity>
    <identity format="OLE2 Compound Document Format" mimetype="application/octet-stream">
        <tool toolname="Droid" toolversion="3.0" />
        <externalIdentifier toolname="Droid" toolversion="3.0" type="puid">fmt/111</externalIdentifier>
    </identity>
</identification>
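A conflict like the one in this fragment can be detected before any further processing. The following is a minimal sketch using only the JDK's DOM parser; the class and method names are hypothetical and not part of the FITS action shown in these slides. It assumes the element and attribute names seen in the fragment and parses without namespace awareness, so it matches on plain tag names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: inspects the <identification> section of FITS output.
final class FitsConflictCheck {

    /** True if the first <identification> element carries status="CONFLICT". */
    static boolean hasConflict(String fitsXml) {
        NodeList ids = parse(fitsXml).getElementsByTagName("identification");
        return ids.getLength() > 0
                && "CONFLICT".equals(((Element) ids.item(0)).getAttribute("status"));
    }

    /** Format names of all <identity> elements, in document order. */
    static List<String> formats(String fitsXml) {
        NodeList identities = parse(fitsXml).getElementsByTagName("identity");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < identities.getLength(); i++) {
            names.add(((Element) identities.item(i)).getAttribute("format"));
        }
        return names;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<identification status='CONFLICT'>"
                + "<identity format='Microsoft Word' mimetype='application/msword'/>"
                + "<identity format='OLE2 Compound Document Format' mimetype='application/octet-stream'/>"
                + "</identification>";
        System.out.println(hasConflict(xml) + " " + formats(xml));
    }
}
```

A check of this kind could be used to flag files for manual review when the identification tools disagree.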
FITS and PREMIS
Corresponding fragment of PREMIS XML
<premis:format>
    <premis:formatDesignation>
        <premis:formatName>Microsoft Word</premis:formatName>
    </premis:formatDesignation>
</premis:format>
<premis:format>
    <premis:formatDesignation>
        <premis:formatName>OLE2 Compound Document Format</premis:formatName>
    </premis:formatDesignation>
    <premis:formatRegistry>
        <premis:formatRegistryName>Droid (3.0)</premis:formatRegistryName>
        <premis:formatRegistryKey>fmt/111</premis:formatRegistryKey>
        <premis:formatRegistryRole>puid</premis:formatRegistryRole>
    </premis:formatRegistry>
</premis:format>
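The FITS-to-PREMIS step can be sketched with the XSLT processor built into the JDK. The stylesheet below is a hypothetical, much reduced stand-in for the project's real one: it only maps the identification fragment shown earlier, assumes the input root is an un-namespaced identification element, and wraps the result in a premis:objectCharacteristics element using the PREMIS 2 namespace. The class name and constants are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: maps FITS <identity> elements to PREMIS <format> elements.
final class FitsToPremis {

    // Illustrative stylesheet, not the project's actual XSLT.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:premis='info:lc/xmlns/premis-v2'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/identification'>"
      + "<premis:objectCharacteristics>"
      + "<xsl:apply-templates select='identity'/>"
      + "</premis:objectCharacteristics>"
      + "</xsl:template>"
      + "<xsl:template match='identity'>"
      + "<premis:format>"
      + "<premis:formatDesignation>"
      + "<premis:formatName><xsl:value-of select='@format'/></premis:formatName>"
      + "</premis:formatDesignation>"
      + "<xsl:for-each select='externalIdentifier'>"
      + "<premis:formatRegistry>"
      + "<premis:formatRegistryName>"
      + "<xsl:value-of select=\"concat(@toolname, ' (', @toolversion, ')')\"/>"
      + "</premis:formatRegistryName>"
      + "<premis:formatRegistryKey><xsl:value-of select='.'/></premis:formatRegistryKey>"
      + "<premis:formatRegistryRole><xsl:value-of select='@type'/></premis:formatRegistryRole>"
      + "</premis:formatRegistry>"
      + "</xsl:for-each>"
      + "</premis:format>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // The FITS fragment from the slides, with the <tool> elements omitted.
    static final String SAMPLE =
        "<identification status='CONFLICT'>"
      + "<identity format='Microsoft Word' mimetype='application/msword'/>"
      + "<identity format='OLE2 Compound Document Format' mimetype='application/octet-stream'>"
      + "<externalIdentifier toolname='Droid' toolversion='3.0' type='puid'>fmt/111</externalIdentifier>"
      + "</identity>"
      + "</identification>";

    /** Applies the stylesheet to a FITS fragment and returns the PREMIS XML as a string. */
    static String transform(String fitsXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(fitsXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("FITS to PREMIS transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE));
    }
}
```

Real FITS output declares a default namespace, so a production stylesheet would need namespace-qualified match patterns.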
RPS and MODS
Introduction
• Records of the Parliaments of Scotland marked up in thousands of XML documents
  • http://www.rps.ac.uk
• Using Text Encoding Initiative (TEI)
  • http://www.tei-c.org/index.xml
• TEI headers contain resource discovery metadata
• Extract metadata from documents and populate custom metadata fields
• Can be run as space rule
• Compile as AMP using Alfresco SDK
RPS and MODS
TEI example
<TEI.2 id="_william_and_mary_t1689_3_6_d6_trans" n="william_and_mary_trans">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>A committee appointed for controverted elections</title>
            </titleStmt>
            <editionStmt>
                <edition n="session">william_and_mary_t1689_3_1_d2_trans</edition>
            </editionStmt>
            <publicationStmt>
                <date>16890314</date>
            </publicationStmt>
        </fileDesc>
    </teiHeader>
    <text>...</text>
</TEI.2>
Annotations on the example, in document order:
• Unique ID for document
• Document belongs to translated version of records from reign of William and Mary
• Main heading in document
• Pointer to session that document belongs to
• Date of document, in YYYYMMDD format
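The YYYYMMDD date format used in the TEI header can be turned into a typed date with the JDK's built-in basic ISO formatter. This is a small sketch with a hypothetical class name; the slides do not show how the project's own handleDate method treats the value.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for the compact dates found in RPS TEI headers.
final class RpsDate {

    /** Parses a YYYYMMDD string; throws DateTimeParseException on malformed input. */
    static LocalDate parse(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(parse("16890314")); // prints 1689-03-14
    }
}
```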
RPS and MODS
RPSMetadataExtracter (method signatures only)
package uk.ac.st_andrews.repo.content.metadata;
public class RPSMetadataExtracter extends org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
{
    public RPSMetadataExtracter();
    protected Map<String, Serializable> extractRaw(ContentReader reader);
}
RPS and MODS
RPSMetadataExtracter.extractRaw
// set up parser
SAXParser sp = spf.newSAXParser();
InputStream cis = reader.getContentInputStream();
InputSource is = new InputSource(cis);
RPSSaxParser teip = new RPSSaxParser();

// do parsing
teip.setProperties(map);
sp.parse(is, teip);
map = teip.getProperties();

// loop over properties found
Set s = map.entrySet();
Iterator it = s.iterator();
while (it.hasNext())
{
    Map.Entry m = (Map.Entry) it.next();
    putRawValue((String) m.getKey(), (String) m.getValue(), rawProperties);
}
RPS and MODS
RPSSaxParser (method signatures only)
package uk.ac.st_andrews.repo.content.metadata;
public class RPSSaxParser extends org.xml.sax.helpers.DefaultHandler
{
    public void setProperties(Map<String, Serializable> prop);
    public Map<String, Serializable> getProperties();
    public void startElement(String uri, String localName, String qName, Attributes attributes);
    public void endElement(String uri, String localName, String qName);
    public void characters(char[] ch, int start, int length);
    private void handleID(String id);
    private void handleDate(String d);
}
RPS and MODS
RPSSaxParser (fragments)
// property names
private static final String KEY_ID = "rpsID";
private static final String KEY_REIGN = "rpsReign";
private static final String KEY_VERSION = "rpsVersion";
private static final String KEY_HEADING = "rpsHeading";
private static final String KEY_SESSION = "rpsSession";
private static final String KEY_DATE = "rpsDate";
private static final String KEY_TITLE = "cmTitle";

// some properties get set in RPSSaxParser.characters
if (true == inTitle)
{
    rawProperties.put(KEY_TITLE, new String(ch, start, length));
}
else if (true == inSession)
{
    rawProperties.put(KEY_SESSION, new String(ch, start, length));
}
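As a self-contained illustration of the SAX approach, the sketch below extracts the ID, title, session and date from a TEI header using only JDK classes. It is not the project's RPSSaxParser: the class name is hypothetical, only the property keys are taken from the slides, and it deliberately buffers text in a StringBuilder because a SAX parser is allowed to deliver one element's text across several characters() calls.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a TEI header handler; keys follow the slides.
final class TeiHeaderHandler extends DefaultHandler {

    static final String SAMPLE =
        "<TEI.2 id='_william_and_mary_t1689_3_6_d6_trans' n='william_and_mary_trans'>"
      + "<teiHeader><fileDesc>"
      + "<titleStmt><title>A committee appointed for controverted elections</title></titleStmt>"
      + "<editionStmt><edition n='session'>william_and_mary_t1689_3_1_d2_trans</edition></editionStmt>"
      + "<publicationStmt><date>16890314</date></publicationStmt>"
      + "</fileDesc></teiHeader><text>...</text></TEI.2>";

    private final Map<String, String> properties = new HashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inHeader = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for this element
        if ("TEI.2".equals(qName) && attrs.getValue("id") != null) {
            properties.put("rpsID", attrs.getValue("id"));
        } else if ("teiHeader".equals(qName)) {
            inHeader = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if ("teiHeader".equals(qName)) {
            inHeader = false;
        } else if (inHeader) { // ignore same-named elements in the body text
            switch (qName) {
                case "title":   properties.put("cmTitle", value);    break;
                case "edition": properties.put("rpsSession", value); break;
                case "date":    properties.put("rpsDate", value);    break;
                default:        break;
            }
        }
        text.setLength(0);
    }

    /** Parses a TEI document and returns the header properties found. */
    static Map<String, String> extract(String teiXml) {
        try {
            TeiHeaderHandler handler = new TeiHeaderHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(teiXml)), handler);
            return handler.properties;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse TEI document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE));
    }
}
```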
RPS and MODS
# Namespaces
namespace.prefix.rps=http://www.rps.ac.uk/ns/1.0
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
# Mapping of property names to Qualified names used in model
rpsID=rps:id
rpsReign=rps:reign
rpsSession=rps:session
rpsDate=rps:date
rpsVersion=rps:version
rpsHeading=rps:heading
cmTitle=cm:title
RPSMetadataExtracter.properties
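To make the two-step mapping in this file concrete, here is a sketch that loads such a properties file with java.util.Properties and expands a raw property name to a fully qualified name. The class is hypothetical and only mimics the idea; the "{uri}localName" output form is an assumption modelled on how Alfresco commonly prints qualified names, and Alfresco's own extracter performs this resolution internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical resolver for a mapping file in the format shown above.
final class MappingResolver {

    static final String SAMPLE =
        "# Namespaces\n"
      + "namespace.prefix.rps=http://www.rps.ac.uk/ns/1.0\n"
      + "namespace.prefix.cm=http://www.alfresco.org/model/content/1.0\n"
      + "# Mapping of property names to Qualified names used in model\n"
      + "rpsID=rps:id\n"
      + "cmTitle=cm:title\n";

    private static final String PREFIX_KEY = "namespace.prefix.";

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Expands e.g. rpsID to rps:id and then to {http://www.rps.ac.uk/ns/1.0}id. */
    static String qualifiedName(Properties mapping, String rawKey) {
        String prefixed = mapping.getProperty(rawKey);
        if (prefixed == null) {
            throw new IllegalArgumentException("no mapping for " + rawKey);
        }
        int colon = prefixed.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no prefix in " + prefixed);
        }
        String uri = mapping.getProperty(PREFIX_KEY + prefixed.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("undeclared prefix in " + prefixed);
        }
        return "{" + uri + "}" + prefixed.substring(colon + 1);
    }

    public static void main(String[] args) {
        Properties mapping = load(SAMPLE);
        System.out.println(qualifiedName(mapping, "rpsID"));
        System.out.println(qualifiedName(mapping, "cmTitle"));
    }
}
```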
RPS and MODS
rpsModel.xml (fragment showing aspect)
<aspect name="rps:metadata">
    <title>RPS Metadata</title>
    <properties>
        <property name="rps:id"><type>d:text</type></property>
        <property name="rps:reign"><type>d:text</type></property>
        <property name="rps:session"><type>d:text</type></property>
        <property name="rps:date"><type>d:text</type></property>
        <property name="rps:heading"><type>d:text</type></property>
        <property name="rps:version"><type>d:text</type></property>
    </properties>
</aspect>
RPS and MODS
# I18N strings
rpsID=RPS ID
rpsReign=RPS Reign
rpsSession=RPS Session
rpsDate=RPS Date
rpsVersion=RPS Version
rpsHeading=RPS Heading
webclient.properties
RPS and MODS
Using MODS
• Metadata Object Description Schema
  • http://www.loc.gov/standards/mods/
• MODS is a resource discovery metadata standard
• Working on defining MODS data models
  • For Project, Resource Type and Digital Object levels
• Will move RPS metadata into MODS fields
Antivirus Action
Introduction
• Creates an action for scanning files for viruses
• Uses ClamAV
  • http://www.clamav.net/lang/en/
• Can be configured for other tools
• Emails creator of file if virus found
• Deletes file from repository if virus found
• Can be run as space rule
• Compile as AMP using Alfresco SDK
Antivirus Action
antivirus-action.xml (fragment)
<bean id="antivirus-action" class="uk.ac.st_andrews.repo.action.executer.AntivirusActionExecuter" parent="action-executer">
    <!-- services needed by bean -->
    <property name="contentService"><ref bean="contentService" /></property>
    <property name="nodeService"><ref bean="nodeService" /></property>
    <property name="templateService"><ref bean="templateService" /></property>
    <property name="actionService"><ref bean="actionService" /></property>
    <property name="personService"><ref bean="personService" /></property>
    <!-- person that email will come from, defined in alfresco-global.properties -->
    <property name="fromEmail">
        <value>${antivirus.mailer}</value>
    </property>
    <!-- path to Freemarker template, defined in alfresco-global.properties -->
    <property name="emailTemplate">
        <value>${antivirus.template}</value>
    </property>
Antivirus Action
antivirus-action.xml (fragment, cont.)
    <property name="command">
        <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <!-- command to run, ${antivirus.exe} set in alfresco-global.properties, ${source} in Java class -->
                    <entry key=".*" value="${antivirus.exe} ${source}"/>
                </map>
            </property>
            <property name="errorCodes">
                <value>1</value><!-- exit code 1 indicates that virus was found -->
            </property>
        </bean>
    </property>
</bean>
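The exit-code convention in this configuration can be illustrated outside Alfresco with a plain ProcessBuilder. The sketch below is hypothetical and is not how RuntimeExec is implemented; it assumes a scanner that follows the clamscan convention of exit code 0 for a clean file and 1 for a detected virus, and treats any other code as a scan error.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical runner that interprets scanner exit codes.
final class ScanRunner {

    enum Verdict { CLEAN, INFECTED, ERROR }

    /** Maps an exit code to a verdict: 0 clean, 1 virus found, anything else an error. */
    static Verdict classify(int exitCode) {
        switch (exitCode) {
            case 0:  return Verdict.CLEAN;
            case 1:  return Verdict.INFECTED;
            default: return Verdict.ERROR;
        }
    }

    /** Runs the scanner command with the file path appended as its last argument. */
    static Verdict scan(List<String> scannerCommand, String sourcePath) {
        List<String> cmd = new ArrayList<>(scannerCommand);
        cmd.add(sourcePath); // separate argument, so spaces in file names need no quoting
        try {
            Process p = new ProcessBuilder(cmd)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            return classify(p.waitFor());
        } catch (IOException e) {
            return Verdict.ERROR; // scanner missing or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Verdict.ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(1)); // prints INFECTED
    }
}
```

Keeping the error case separate from the infected case matters here, because the action deletes the node when a virus is reported; a failed scan should not trigger deletion.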
Antivirus Action
AntivirusActionExecuter (method signatures only)
package uk.ac.st_andrews.repo.action.executer;
public class AntivirusActionExecuter extends ActionExecuterAbstractBase
{
    public void setContentService(ContentService contentService);
    public void setNodeService(NodeService nodeService);
    public void setTemplateService(TemplateService templateService);
    public void setActionService(ActionService actionService);
    public void setPersonService(PersonService personService);
    public void setFromEmail(String fromEmail);
    public void setCommand(RuntimeExec command);
    public void setEmailTemplate(String emailTemplate);
    public void init();
    protected void addParameterDefinitions(List<ParameterDefinition> paramList);
    protected void executeImpl(final Action ruleAction, final NodeRef actionedUponNodeRef);
}
Antivirus Action
AntivirusActionExecuter.executeImpl (fragment)
// put content into temp file
ContentReader reader =
    contentService.getReader(actionedUponNodeRef, ContentModel.PROP_CONTENT);
String fileName =
    (String) nodeService.getProperty(actionedUponNodeRef, ContentModel.PROP_NAME);
File sourceFile =
    TempFileProvider.createTempFile("anti_virus_check_", "_" + fileName);
reader.getContent(sourceFile);

// set source property for command
Map<String, String> properties = new HashMap<String, String>(1);
properties.put(VAR_SOURCE, sourceFile.getAbsolutePath());

// execute the antivirus command
ExecutionResult result = null;
try
{
    result = command.execute(properties);
}
catch (Throwable e)
{
    throw new AlfrescoRuntimeException("Antivirus check error: \n" + command, e);
}
Antivirus Action
AntivirusActionExecuter.executeImpl (fragment, cont.)
// try to get document creator's details
String creatorName = (String) nodeService.getProperty(actionedUponNodeRef,
    ContentModel.PROP_CREATOR);
if (null == creatorName || 0 == creatorName.length())
{
    throw new Exception("couldn't get creator's name");
}

NodeRef creator = personService.getPerson(creatorName);
if (null == creator)
{
    throw new Exception("couldn't get creator");
}

String creatorEmail = (String) nodeService.getProperty(creator,
    ContentModel.PROP_EMAIL);
if (null == creatorEmail || 0 == creatorEmail.length())
{
    throw new Exception("couldn't get creator's email address");
}
Antivirus Action
AntivirusActionExecuter.executeImpl (fragment, cont.)
// put together message
Map<String, Object> model = new HashMap<String, Object>(8, 1.0f);
model.put("filename", fileName);
model.put("message", result);

String emailMsg = templateService.processTemplate("freemarker", emailTemplate, model);

// send email message
Action emailAction = actionService.createAction("mail");
emailAction.setParameterValue(MailActionExecuter.PARAM_TO, creatorEmail);
emailAction.setParameterValue(MailActionExecuter.PARAM_FROM, fromEmail);
emailAction.setParameterValue(MailActionExecuter.PARAM_SUBJECT,
    "Virus found in " + fileName);
emailAction.setParameterValue(MailActionExecuter.PARAM_TEXT, emailMsg);
emailAction.setExecuteAsynchronously(true);
actionService.executeAction(emailAction, null);

// delete node
nodeService.addAspect(actionedUponNodeRef, ContentModel.ASPECT_TEMPORARY, null);
nodeService.deleteNode(actionedUponNodeRef);
METS and Fedora Commons
Introduction
• Metadata Encoding and Transmission Standard (METS)
  • http://www.loc.gov/standards/mets/
• METS is a wrapper for other metadata documents
• Plan to generate METS documents containing/referencing:
  • Ingested files
  • Renderings of these files (thumbnails, reference copies, archival formatted versions etc.)
  • Resource discovery metadata
  • Technical metadata
• Fedora Commons can ingest METS documents as SIPs
  • http://fedora-commons.org/
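Since METS generation is described as planned work, the following is only a sketch of what such a wrapper might look like, built with the JDK's DOM API. The class name, the IDs (DMD1, TECH1) and the sample file location are hypothetical, and the xmlData slots for MODS and PREMIS are left empty. It places a descriptive metadata section, a technical metadata section, a file inventory and a structural map in the order the METS schema expects.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of a METS wrapper for one ingested file.
final class MetsSketch {

    static final String METS = "http://www.loc.gov/METS/";
    static final String XLINK = "http://www.w3.org/1999/xlink";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    static String wrap(String fileId, String href, String mimeType) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element mets = doc.createElementNS(METS, "mets:mets");
            mets.setAttributeNS(XMLNS, "xmlns:mets", METS);
            mets.setAttributeNS(XMLNS, "xmlns:xlink", XLINK);
            doc.appendChild(mets);

            // resource discovery metadata: MODS would go inside xmlData
            Element dmd = child(doc, mets, "dmdSec");
            dmd.setAttribute("ID", "DMD1");
            Element dmdWrap = child(doc, dmd, "mdWrap");
            dmdWrap.setAttribute("MDTYPE", "MODS");
            child(doc, dmdWrap, "xmlData");

            // technical metadata: PREMIS would go inside xmlData
            Element tech = child(doc, child(doc, mets, "amdSec"), "techMD");
            tech.setAttribute("ID", "TECH1");
            Element techWrap = child(doc, tech, "mdWrap");
            techWrap.setAttribute("MDTYPE", "PREMIS");
            child(doc, techWrap, "xmlData");

            // file inventory, referencing the ingested file by location
            Element file = child(doc, child(doc, child(doc, mets, "fileSec"), "fileGrp"), "file");
            file.setAttribute("ID", fileId);
            file.setAttribute("MIMETYPE", mimeType);
            file.setAttribute("ADMID", "TECH1");
            Element flocat = child(doc, file, "FLocat");
            flocat.setAttribute("LOCTYPE", "URL");
            flocat.setAttributeNS(XLINK, "xlink:href", href);

            // structural map tying the file to its descriptive metadata
            Element div = child(doc, child(doc, mets, "structMap"), "div");
            div.setAttribute("DMDID", "DMD1");
            child(doc, div, "fptr").setAttribute("FILEID", fileId);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build METS document", e);
        }
    }

    private static Element child(Document doc, Element parent, String localName) {
        Element e = doc.createElementNS(METS, "mets:" + localName);
        parent.appendChild(e);
        return e;
    }

    public static void main(String[] args) {
        System.out.println(wrap("FILE1", "file:///archive/example.xml", "text/xml"));
    }
}
```

Renderings such as thumbnails or archival copies could be added as further file entries in their own fileGrp; how Fedora Commons expects such a document to be shaped for ingest is not covered by this sketch.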
Find out more
Project source code available on Alfresco Forge
• FITS in Alfresco
  • http://forge.alfresco.com/projects/fitsinalfresco/
• RPS Metadata Extracter
  • http://forge.alfresco.com/projects/rpsmetadata/
• Antivirus
  • http://forge.alfresco.com/projects/antivirus/
University of St Andrews Digital Archiving Project
• http://www.st-andrews.ac.uk/itsupport/academic/arts