Technologies for Appraising and Managing Electronic Records
DESCRIPTION
This presentation was delivered with live demonstrations of three software solutions.
TRANSCRIPT
![Page 1: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/1.jpg)
Technologies For Appraising and Managing Electronic Records
Presented by: Peter Bajcsy
- Research Scientist at NCSA
- Associate Director of I-CHASS, I3 Institute
- Adjunct Assistant Professor, CS & ECE, UIUC
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
![Page 2: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/2.jpg)
Acknowledgement
• This research was partially supported by a National Archives and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI-9619019 and NCSA Industrial Partners.
• The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Archives and Records Administration, or the U.S. government.
• Contributions by: Peter Bajcsy, Kenton McHenry, Rob Kooper, Michal Ondrejcek, Jason Kastner, William McFadden, and Sang-Chul Lee
Imaginations unbound
![Page 3: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/3.jpg)
Outline
• Introduction
• A discovery of relationships among digital file collections (file2learn)
• A comprehensive comparison of contemporary documents (doc2learn)
• Automated file format conversions and conversion quality assessment (Polyglot)
• Summary
![Page 4: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/4.jpg)
Introduction
![Page 5: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/5.jpg)
Supporting NARA’s Strategic Plan
• According to The Strategic Plan of The National Archives and Records Administration 2006–2016, “Preserving the Past to Protect the Future”:
  • “Strategic Goal: We will preserve and process records to ensure access by the public as soon as legally possible”
  • “D. We will improve the efficiency with which we manage our holdings from the time they are scheduled through accessioning, processing, storage, preservation, and public use.”
![Page 6: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/6.jpg)
To Be Preserved!
Digital representation of information & knowledge
Preservation
Information transfer?
AGENCY ARCHIVES
![Page 7: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/7.jpg)
Do We Know the Answers?
Questions During Appraisal of Electronic Records Series
• (1) Given M full DVDs with files, which files are related?
• (2) Given N versions of the ‘same’ file, which file version(s) should be preserved?
![Page 8: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/8.jpg)
Do We Know the Answers?
• (3) Given P file formats, which file format and which conversion software should be used so that files remain viewable in the long run?
  • How much information is lost during file format conversion?
• (4) What is the granularity of information that one should preserve about a decision process in order to reconstruct it?
![Page 9: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/9.jpg)
Goal: Design Technologies for Appraising and Managing Electronic Records
• Technologies should address the following problems:
  • (1) a discovery of relationships among digital file collections (file2learn)
  • (2) a comprehensive comparison of contemporary documents (doc2learn)
  • (3) automated file format conversions and conversion quality assessment (Polyglot)
![Page 10: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/10.jpg)
A Discovery of Relationships Among Digital File Collections
![Page 11: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/11.jpg)
Discovering Relationships Among Files
• How should one establish relationships among electronic records coming
  • From disparate sources or
  • From the same source at multiple time instances?
• Need to Understand the Complexity of the Problem
![Page 12: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/12.jpg)
Discovering Relationships Among Files: Components
• Metadata describing electronic records
  • How to extract metadata?
  • How to automate metadata extraction from multiple data types, e.g., 2D drawings and 3D CAD models?
• Storage of metadata
  • What ontology to use to represent the extracted metadata?
  • How to represent and store data and metadata?
• Exploratory and Search Capabilities
  • How to automate discovery of relationships?
  • How to support discovery of relationships between electronic records corresponding to the same physical objects but different multidimensional observations?
![Page 13: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/13.jpg)
Relationships Among Multiple Data Types
• Example Data: Torpedo Weapon Retriever 841
  • 784 existing 2D image drawings and N>22 3D CAD models
• How to establish relationships among the 3D CAD models and 2D image drawings during a product lifecycle?
Hypothetical Distribution of 3D CAD models for TWR 841
![Page 14: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/14.jpg)
Methodology
• File Identification
• Information Extraction from
  • File System
  • File Content
• Information Organization
  • Taxonomy (classification)
  • Ontology (relationships)
• Information Representation, Integration and Storage
  • XML
  • RDF
• Relationship Discovery
![Page 15: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/15.jpg)
File Identification and File System Analyses
• File Identification
  • What is the file format?
  • Is the file format well formed?
  • Approach: Used DROID built on top of the PRONOM File Registry with additional NCSA support of 3D file identification
• Metadata extraction about a file system
  • Where is the file located?
  • What is the file size, time stamp, etc.?
  • Approach: Use any file system information extraction software, such as Aperture (cross platform, open source, active development), Google Desktop, OS specific solutions (e.g., Apple Spotlight, Linux, MS Search)
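The file system questions above (location, size, time stamp) can also be answered with a few standard library calls. The following is a minimal sketch using Java NIO rather than any of the tools named on the slide; the class name `FileSystemMetadataSketch` and the map keys are illustrative assumptions, not part of file2learn.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

final class FileSystemMetadataSketch {

    /** Collects location, size and time stamps for one file (keys are illustrative). */
    static Map<String, String> describe(Path file) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("location", file.toAbsolutePath().toString());
            metadata.put("sizeInBytes", Long.toString(attrs.size()));
            metadata.put("lastModified", attrs.lastModifiedTime().toString());
            metadata.put("created", attrs.creationTime().toString());
            return metadata;
        } catch (IOException e) {
            // Re-thrown unchecked so callers can use the method inside streams or lambdas.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Describes the path given on the command line, or the current directory.
        Path file = Paths.get(args.length > 0 ? args[0] : ".");
        describe(file).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The creation time reported by `BasicFileAttributes` depends on the underlying file system, so it is best treated as a hint rather than as an authoritative record date.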
![Page 16: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/16.jpg)
Content Analyses: Automation?
OCR
File Descriptors: Part name, Author, Software, Date, …
Relationship Discovery
![Page 17: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/17.jpg)
Content Analyses: Optical Character Recognition (OCR) of 2D Drawings
Reference Block
Title Block
MMC Block (Marinette Marine Corporation)
![Page 18: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/18.jpg)
‘Standard’ Title Blocks: Organization and Ontology
• Examples of title blocks used on drawings prepared by Naval Construction Battalion and Naval Construction Regiment
TEMPLATES
![Page 19: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/19.jpg)
Title Block: Ontology and Metadata Representation
Ontology for sub-fields:
• A – Record of preparation (<tdrw:recordOfPreparation>),
• B – Drawing title (<tdrw:drawingTitle>),
• C – Preparing Activity (<tdrw:preparingActivity>),
• F – Code identification number (<tdrw:FSCMNumber>),
• G – Drawing size (<tdrw:drawingSize>),
• H – Drawing number (<tdrw:drawingNumber>),
• J – Scale (<tdrw:drawingScale>),
• K – Specification number (<tdrw:drawingNumber>),
• L – Sheet number (<tdrw:sheetNumber>).
Resource Description Framework (RDF):
• Metadata representation: subject – predicate – object
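To make the subject, predicate, object representation concrete, here is a minimal sketch that turns OCR'd title block sub-fields into RDF statements and prints them in N-Triples style. The namespace URI behind the `tdrw:` prefix, the class name, and the sample values are assumptions made for illustration; the slides do not give them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TitleBlockTriplesSketch {
    // Hypothetical namespace for the tdrw: prefix; the slides do not state its URI.
    static final String TDRW = "http://example.org/tdrw#";

    /** One RDF statement; the object is kept as a plain string literal. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        /** Renders the statement as: <subject> <predicate> "object" . */
        String toNTriples() {
            // Only backslashes and double quotes are escaped in this sketch.
            String literal = object.replace("\\", "\\\\").replace("\"", "\\\"");
            return "<" + subject + "> <" + predicate + "> \"" + literal + "\" .";
        }
    }

    /** Turns sub-fields (e.g. drawingTitle -> OCR text) into triples about one drawing. */
    static List<Triple> describe(String drawingUri, Map<String, String> subFields) {
        List<Triple> triples = new ArrayList<>();
        for (Map.Entry<String, String> field : subFields.entrySet()) {
            triples.add(new Triple(drawingUri, TDRW + field.getKey(), field.getValue()));
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("drawingNumber", "841-0001"); // made-up value
        fields.put("sheetNumber", "1");          // made-up value
        for (Triple t : describe("file:///drawings/sheet1.png", fields)) {
            System.out.println(t.toNTriples());
        }
    }
}
```

Using the drawing file as the subject means that descriptors extracted from different blocks of the same drawing accumulate on one node, which is what a later comparison of predicates between files relies on.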
![Page 20: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/20.jpg)
MMC and Reference Blocks: Organization
• MMC Blocks
  • The list varies in length
  • The notation is not standardized
Inconsistencies
![Page 21: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/21.jpg)
Summary of OCR Based Analyses
• Manually encoded block coordinates for 784 files in PNG (converted from originally LZW compressed TIFF files)
• Automated OCR and executed OCR on
  • 700 title blocks,
  • 150 reference blocks,
  • dozen of revision and list of material
  • about 200 additional areas with the drawing numbers (MMC DWG. NO.).
• Performance benchmarks:
  • Full OCR of TB, MMC and RF for about 50 image files (105 blocks) took about 6 hours on a quad core machine
![Page 22: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/22.jpg)
Content Based Extraction from STEP Files
• 3D CAD models in STEP file format are searched for any ASCII strings matching English dictionary and following STEP metadata specification.
Example Metadata for TWR841 ship deck
STEP METADATA SPECIFICATION:
FILE_DESCRIPTION( /* description */ (''), /* implementation_level */ '2;1');
FILE_NAME( /* name */ '', /* time_stamp */ '', /* author */ (''), /* organization */ (''), /* preprocessor_version */ ' ', /* originating_system */ '', /* authorization */ ' ');
EXPECTED STEP METADATA:
FILE_DESCRIPTION((''), /* implementation_level */ '2;1');
FILE_NAME('120 TORPEDO WEAPONS RETRIEVER, TRANSVERSE BULKHEADS BELOW, MAIN DECK', '04-10-86', ('LDOBSON'), ('NAVAL SEA SYSTEMS COMMAND'), ' ', 'IDA-STEP', ' ');
PARSED STEP METADATA:
FILE_DESCRIPTION((''),'2;1');
FILE_NAME('D:\\NARA\\Archieve_data_samples\\BHD_FR12\\U2110_BHD12_2007_05_09.stp','2007-05-10T13:45:37',('rakowpj'),(''),'Autodesk Inventor 11','Autodesk Inventor 11','');
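The seven positional fields of a `FILE_NAME` header entity (name, time stamp, author, organization, preprocessor version, originating system, authorization) can be pulled out with two regular expressions. This is only a sketch under simplifying assumptions: it is not the dictionary-based search described on the slide, it flattens the parenthesized author and organization lists, and it assumes no quoted string contains `);` or `/*`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StepHeaderSketch {
    // Block comments such as /* name */ that annotate the header fields.
    private static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // The FILE_NAME(...) entity, up to the first ")" that is followed by ";".
    private static final Pattern FILE_NAME =
            Pattern.compile("FILE_NAME\\s*\\((.*?)\\)\\s*;", Pattern.DOTALL);
    // A single-quoted STEP string; a doubled apostrophe inside it is an escaped apostrophe.
    private static final Pattern QUOTED = Pattern.compile("'((?:[^']|'')*)'");

    /** Returns the quoted strings of FILE_NAME in order, or an empty list if it is absent. */
    static List<String> fileNameFields(String header) {
        List<String> fields = new ArrayList<>();
        Matcher entity = FILE_NAME.matcher(COMMENT.matcher(header).replaceAll(""));
        if (!entity.find()) {
            return fields;
        }
        Matcher quoted = QUOTED.matcher(entity.group(1));
        while (quoted.find()) {
            fields.add(quoted.group(1).replace("''", "'"));
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "FILE_NAME('120 TORPEDO WEAPONS RETRIEVER, TRANSVERSE BULKHEADS BELOW, MAIN DECK', "
                + "'04-10-86', ('LDOBSON'), ('NAVAL SEA SYSTEMS COMMAND'), ' ', 'IDA-STEP', ' ');";
        List<String> fields = fileNameFields(header);
        System.out.println("author = " + fields.get(2));
        System.out.println("originating system = " + fields.get(5));
    }
}
```

Comparing the fields returned for the expected and the parsed header makes the mismatch in the example visible: the parsed name is a local file path and the organization is empty.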
![Page 23: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/23.jpg)
Exploratory Framework – User Interface Overview
Filter for Files
Files
Preview of Selected Data
Graph of Relationships Between Selected Files
![Page 24: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/24.jpg)
Exploratory Framework – User Interface Overview
Additional Import/Export and Preference Options
Table of Relationships Between Selected Files
![Page 25: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/25.jpg)
Exploratory Framework: Modes of Operations
• Detection of discrepancies/anomalies in file descriptors
  • OCR results
    • View 2D drawings and OCR results, and then edit OCR descriptors
  • 3D Model
    • View 3D model and content based extraction, and then edit descriptors
• Comparison of pairs of files
  • Pairs of 2D drawings
  • Pairs of 3D models
  • Pairs of (2D drawing, 3D model)
• Establish file relationships
  • Insert logical links to relate a pair of files
![Page 26: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/26.jpg)
Detection of Anomalies in OCR Results
![Page 27: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/27.jpg)
Comparison of Files
Color encoding:
• Predicates and values match
• Predicates match
• Predicate occurs only in one file
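The three color classes can be read as a three-way classification of each predicate across the two files being compared. The sketch below assumes that each file's descriptors are available as a predicate-to-value map, which is a simplification of RDF (a predicate may carry several values); the class, enum and sample values are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

final class DescriptorComparisonSketch {

    /** Mirrors the three color classes used when two files are compared. */
    enum Match { PREDICATE_AND_VALUE, PREDICATE_ONLY, SINGLE_FILE }

    /** Classifies every predicate that occurs in at least one of the two descriptor maps. */
    static Map<String, Match> compare(Map<String, String> first, Map<String, String> second) {
        Map<String, Match> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : first.entrySet()) {
            String predicate = entry.getKey();
            if (!second.containsKey(predicate)) {
                result.put(predicate, Match.SINGLE_FILE);
            } else if (Objects.equals(entry.getValue(), second.get(predicate))) {
                result.put(predicate, Match.PREDICATE_AND_VALUE);
            } else {
                result.put(predicate, Match.PREDICATE_ONLY);
            }
        }
        // Predicates that were not classified above occur only in the second file.
        for (String predicate : second.keySet()) {
            result.putIfAbsent(predicate, Match.SINGLE_FILE);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up descriptor values for two drawings.
        Map<String, String> drawingA = new TreeMap<>();
        drawingA.put("drawingNumber", "841-0001");
        drawingA.put("drawingSize", "D");
        Map<String, String> drawingB = new TreeMap<>();
        drawingB.put("drawingNumber", "841-0001");
        drawingB.put("sheetNumber", "2");
        compare(drawingA, drawingB).forEach((p, m) -> System.out.println(p + ": " + m));
    }
}
```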
![Page 28: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/28.jpg)
Establish File Relationships
![Page 29: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/29.jpg)
Establish File Relationships: Logical Link
![Page 30: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/30.jpg)
A Comprehensive Comparison of Contemporary Documents
![Page 31: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/31.jpg)
Support of Appraisals by Enabling Comparisons
• How to compare containers with heterogeneous information (text, images, vector graphics, animation, 3D, etc.)?
  • Methodology
  • Metrics
  • Weighting factors for fusion
• How to quantify similarities between the same type of information?
  • Encodings and Representations
  • Metrics
  • Local versus global differences
![Page 32: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/32.jpg)
Example: Adobe Portable Document Format (PDF)
• Why PDF? - PDF is just an example of a container
  • Office environment (Adobe PDF, PS, MS Word, HTML, …)
  • Satellite measurements (HDF, netCDF, …)
3D
Adobe Library 6.0
Movie
Adobe Library 7.0
![Page 33: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/33.jpg)
Comparisons
![Page 34: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/34.jpg)
Example: Compare Veterans Affairs Fact Sheets in PDF and MS Word file formats
• Test data: 108 files from RG 015 - Records of the Department of Veterans Affairs/Fact Sheets/www1.va.gov/opa/fact/docs.
  • These files are Veterans Affairs Fact Sheets and are stored in both PDF and MS Word file formats (54 MS Word and 54 PDF files).
• Which files have identical content?
• Demo: 6 files
  • amwars-2.pdf, amwars.pdf
  • claimpro-2.pdf, claimpro.pdf
  • comprates-2.pdf, comprates.pdf
![Page 35: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/35.jpg)
Methodology
Pair-wise comparison of the same digital objects
Comparison of multiple and heterogeneous digital objects
Relationship to Permanent Records
![Page 36: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/36.jpg)
Exploration of Text Components
LOADED FILES
Occurrence of numbers
Occurrence of words
“Ignore” words
![Page 37: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/37.jpg)
Exploration of Image Components
LOADED FILES
Occurrence of colors
List of images
Preview
“Ignore” colors
![Page 38: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/38.jpg)
Exploration of Vector Graphics Components
LOADED FILES
Preview
Occurrence of v/h lines
![Page 39: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/39.jpg)
Comprehensive Pair-Wise Comparison of Documents
Grouping and Visualization Control
Similarity Values
Visualization Control
Document ID
![Page 40: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/40.jpg)
Visual Comparison for 6 Test Files
Result:
amwars-2.pdf = amwars.pdf
claimpro-2.pdf = claimpro.pdf
comprates-2.pdf = comprates.pdf
![Page 41: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/41.jpg)
Computational Requirements for Executing the Methodology
Yellow indicates computations
Relationship to Permanent Records
Appraisal & Sampling
![Page 42: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/42.jpg)
Work in progress: Group and Validate Documents
Figure axes: Attributes of documents; Order of documents
![Page 43: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/43.jpg)
Automated File Format Conversions and Conversion Quality Assessment
![Page 44: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/44.jpg)
Conversions of Electronic Records
• Conversions of electronic records are needed because
  • Visual exploration depends on various software packages
  • Many formats are retired (deprecated) over time
• How to measure the degree of information preservation when files are converted from format A to format B?
  • During conversions, information could be lost, added or modified
  • What is the importance of each byte, object, etc.?
• How to design a test bed for analyzing the quality of conversion and visualization software?
![Page 45: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/45.jpg)
Illustration of 3D File Format Reality
*.k3d
*.ma, *.mb, *.mp
*.pdf (*.prc, *.u3d)
*.w3d
*.dwg
*.max, *.3ds
*.blend
*.iam
*.lwo
*.c4d
![Page 46: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/46.jpg)
Our Survey about 3D Content
• Q: How Many 3D File Formats Exist?
  • A: We have found more than 140 3D file formats. Many are proprietary file formats. Many are extremely complex (1,200 and more pages of specifications).
• Q: How Many Software Packages Support 3D File Format Import, Export and Display?
  • A: We have documented about 16 software packages. There are many more. Most of them are proprietary/closed source code. Many contain incomplete support of file specifications.
![Page 47: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/47.jpg)
Examples of 3D Formats and Stored Content
Columns: Format; Geometry (Faceted, Parametric, CSG, B-Rep); Appearance (Color, Material, Texture, Bump); Scene (Lights, Views, Trans., Groups); Animation
3ds √ √ √ √ √ √ √ √ √
igs √ √ √ √ √ √ √
lwo √ √ √ √ √ √
obj √ √ √ √ √ √ √
ply √ √ √ √ √
stp √ √ √ √ √ √
wrl √ √ √ √ √ √ √ √ √ √ √
u3d √ √ √ √ √ √ √ √ √
x3d √ √ √ √ √ √ √ √ √ √ √
• Some content may be more important than others
• The relative importance is situation dependent
![Page 48: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/48.jpg)
Example: Conversion of X3D to STEP to X3D
Conversion chain: X3D → WRL → STEP → WRL → X3D
Software: X3dToVrml97, A3D Reviewer, A3D Reviewer, Vrml97ToX3d
Nothing!
![Page 49: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/49.jpg)
Towards a Universal Converter
• Use what is available in 3rd party software to perform conversions
• Document what formats can be opened/imported by each application
• Document what formats can be saved/exported by each application
• Automate the use of each application and combine their abilities to perform conversions over a larger set of formats
![Page 50: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/50.jpg)
Input/Output Graphs
Adobe 3D Reviewer
![Page 51: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/51.jpg)
Automation of 3D File Format Mapping
Find the shortest path
Convert
Preview
![Page 52: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/52.jpg)
Automation of 3D File Format Conversion
• The I/O-Graph stores the information needed to convert between the formats represented in the graph.
• In order to perform the conversion we must execute the conversion path found.
  • Many high end graphics programs are found on the Windows platform
  • Those on other platforms, such as Linux, tend to have Windows ports
  • Some are command line driven (usually small converter applications).
  • Many have only GUI interfaces
  • AutoHotkey: a scripting language for the Windows GUI.
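The idea of an I/O-Graph with a shortest conversion path can be sketched as a directed graph whose nodes are file formats and whose edges are (software, input format, output format) entries, searched breadth-first. This is an illustrative sketch, not Polyglot's implementation: "shortest" is taken here to mean the fewest conversion steps, and the class name, method names and the edges in `main` are assumptions (the software names are borrowed from the X3D to STEP example).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IoGraphSketch {

    /** One directed edge: an application that reads `from` and writes `to`. */
    static final class Conversion {
        final String software;
        final String from;
        final String to;

        Conversion(String software, String from, String to) {
            this.software = software;
            this.from = from;
            this.to = to;
        }
    }

    private final Map<String, List<Conversion>> edges = new HashMap<>();

    void add(String software, String from, String to) {
        edges.computeIfAbsent(from, key -> new ArrayList<>()).add(new Conversion(software, from, to));
    }

    /** Breadth-first search for the fewest conversion steps; empty list if no path exists. */
    List<Conversion> shortestPath(String source, String target) {
        Map<String, Conversion> reachedBy = new HashMap<>(); // format -> edge that first reached it
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            String format = queue.remove();
            if (format.equals(target)) {
                break;
            }
            for (Conversion c : edges.getOrDefault(format, Collections.emptyList())) {
                if (!c.to.equals(source) && !reachedBy.containsKey(c.to)) {
                    reachedBy.put(c.to, c);
                    queue.add(c.to);
                }
            }
        }
        // Walk back from the target to the source along the recorded edges.
        List<Conversion> path = new ArrayList<>();
        for (String f = target; reachedBy.containsKey(f); f = reachedBy.get(f).from) {
            path.add(reachedBy.get(f));
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        IoGraphSketch graph = new IoGraphSketch();
        graph.add("X3dToVrml97", "x3d", "wrl");
        graph.add("A3D Reviewer", "wrl", "stp");
        graph.add("A3D Reviewer", "stp", "wrl");
        graph.add("Vrml97ToX3d", "wrl", "x3d");
        for (Conversion c : graph.shortestPath("x3d", "stp")) {
            System.out.println(c.from + " -> " + c.to + " using " + c.software);
        }
    }
}
```

Executing the returned path is the hard part in practice: each edge stands for launching one application, through a command line call or a GUI script, and handing its output file to the next step.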
![Page 53: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/53.jpg)
Methodology
EXTENSIBILITY
Cloud Computing
AUTOMATION
COMPUTATIONAL SCALABILITY
Services to Archivists
![Page 54: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/54.jpg)
NCSA Polyglot – Conversion Services
• Web interface: user can drag and drop files into upload area for conversion
• Java interface:
PolyglotRequest pgr;
pgr = new PolyglotRequest("http://???", "obj");
pgr.convertFile("file.wrl", "./");
• Scalability Test

| Number of PCs | One PC | Two PCs |
| --- | --- | --- |
| Processing Time | 33 minutes 6 seconds | 16 minutes 40 seconds |
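As a rough reading of the scalability test, the two timings can be converted into a speedup figure. This is a back-of-the-envelope calculation that assumes both runs processed the same workload; the symbols $T_1$, $T_2$, $S$ and $E$ are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
T_1 &= 33 \cdot 60 + 6 = 1986\ \text{s}, &
T_2 &= 16 \cdot 60 + 40 = 1000\ \text{s}, \\
S &= \frac{T_1}{T_2} = 1.986, &
E &= \frac{S}{2} \approx 0.99 .
\end{align*}
```

Under that assumption, adding the second PC roughly halves the processing time, which is close to ideal scaling for two machines.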
![Page 55: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/55.jpg)
NCSA Polyglot – Data Loss Measurement Services
We would like to assign a value to each conversion edge …
![Page 56: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/56.jpg)
Geometry Based Content Retention
• Several metrics
• Data driven assignment
• Example results

| Metric | Single Optimal Conversion: Software | From | To | Information Retention | ‘Best’ File Format: Format | Information Retention |
| --- | --- | --- | --- | --- | --- | --- |
| Light Fields | Adobe 3D Reviewer | .pdf | .stp | 61.67 | .stp | 40.73 |
| Spin Images | Adobe 3D Reviewer | .obj | .pdf | 59.07 | .stl | 34.89 |
![Page 57: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/57.jpg)
Summary
• Technologies for appraisal of electronic records should assist archivists
  • They are designed to support decisions and data explorations by automating appraisal tasks
• The software for doc2learn and Polyglot is available for downloading at http://isda.ncsa.uiuc.edu/download/
• File2learn software – the work is still in progress
• Feedback is very welcome
• Questions: Peter Bajcsy – [email protected]
![Page 58: Technologies For Appraising and Managing Electronic Records](https://reader034.vdocuments.us/reader034/viewer/2022042614/55512584b4c905325d8b4578/html5/thumbnails/58.jpg)
Demo exercise
• Step 1: Check the path exists between wrl and pdf
• Step 2: drag and drop heart.wrl; select target to be pdf, click upload
• Step 3: download to desktop and open in Adobe PDF Viewer