InterLib(-Related) Activities at SDSC/DICE

Posted on 14-Jan-2016



San Diego Supercomputer Center

National Partnership for Advanced Computational Infrastructure

InterLib(-Related) Activities at SDSC/DICE

Bertram Ludaescher

ludaesch@sdsc.edu

• IBM HPSS (Storage/Archival, e.g. ADL)

• SDSC SRB/(E)MCAT (Data Handling/Information Discovery)

• AMICO Image Collection (CDL Testbed)

• eXcelon as XML Data Server

• MIX: Mediation of Information using XML (with DB-Lab UCSD)


HPSS, SRB, MCAT

• HPSS: Storage/Archival of large datasets
   • (UCB, UCSB, Stanford)

• SRB/(E)MCAT: Data Handling/Information Discovery
   • transparent access to remote storage
   • replication
   • containers for large numbers of small items
   • caching
   • authorization
   • proxy operation support (filtering, data subsetting)
   • usage of security infrastructure (GSI)
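The "containers" bullet can be illustrated with a small sketch. This is not the SRB implementation. It only shows the general idea of packing many small items into one larger stored object plus an offset index, so that an archive handles one file instead of thousands. The function names and the sample items are hypothetical.

```python
import io


def pack_container(items):
    """Concatenate small byte objects into one blob and record (offset, length) per item."""
    blob = io.BytesIO()
    index = {}
    for name, data in items.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def read_item(blob, index, name):
    """Fetch one item from the container without unpacking the others."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Hypothetical small metadata items, standing in for thumbnail metadata files.
thumbs = {
    "a_thum.met": b"width=120",
    "b_thum.met": b"width=96;height=64",
}
container, index = pack_container(thumbs)
item = read_item(container, index, "b_thum.met")
```

A real system would also have to persist the index (for example in the catalog) and handle updates. The sketch leaves both out.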


SRB Interface

[Figure: SRB architecture diagram. Recoverable labels: Application (×2), SRB Master, SRB Agent, SRB Server (×3), MCAT, MCAT Core, Dublin Core, Eco Core]


Managing Metadata: EMCAT

• Extensible Meta Data Catalog - EMCAT
• Exploits dependencies & relationships (m:n, tc, <=>, …)
• T-Language - Markup, Filter & Presentation
• Meta Data Repository (Object-, System-, Collection-level)
• Based on Kernel Meta Meta Data
• Extensible
• Uniform Access and Federation interface
• Metadata exchange Interface Protocol

• MAPS - Meta data Attribute Presentation Structure
   • query, update and result structures
   • Close to Z39.50


SRB/MCAT Future

• Performance Improvements and Consolidation
• Delayed Action Manager - mirror, cronjobs
• Support for Methods
• Handling Very Large Data sets - partitions
• More Drivers - Sybase, NTFS, LDAP
• Extensible MCAT
• Language Support - Perl, Fortran

http://www.npaci.edu/DICE/SRB


The AMICO Digital Library Project

http://www.amico.org
http://www.npaci.edu/DICE/AMICO

Art Museum Image Consortium
Richard Marciano et al.

55,146 objects            750 MB
53,763 thumbnail images   319 MB
57,609 full TIFF images   180 GB
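As a quick back-of-the-envelope check on these totals, the average size per image can be computed directly. The sketch below assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 KB). The slide does not say which convention was used, so the results are approximate.

```python
# Collection totals as listed on the slide.
full_count, full_total_gb = 57_609, 180
thumb_count, thumb_total_mb = 53_763, 319

# Assumption: decimal units (1 GB = 1000 MB, 1 MB = 1000 KB).
avg_full_mb = full_total_gb * 1000 / full_count
avg_thumb_kb = thumb_total_mb * 1000 / thumb_count

print(f"average full TIFF: {avg_full_mb:.2f} MB")
print(f"average thumbnail: {avg_thumb_kb:.2f} KB")
```

Under that assumption the averages come out at roughly 3 MB per full TIFF and about 6 KB per thumbnail.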


AMICO Consortium of 26 (now 31) museums

AGO_  Art Gallery of Ontario
AIC_  Art Institute of Chicago
AKAG  Albright-Knox Art Gallery, Buffalo, NY
ASIA  Asia Society
BMFA  Boston Museum of Fine Arts
CCP_  Center for Creative Photography, U. Arizona
CMA_  The Cleveland Museum of Art
DMCC  Davis Museum and Cultural Center, Wellesley College, MA
FASF  Fine Arts Museums of San Francisco
GEH_  George Eastman House, Rochester, NY
JPGM  J. Paul Getty Museum, Los Angeles, CA
LACM  Los Angeles County Museum of Art
LOC_  Library of Congress
MACM  Musée d'art contemporain de Montréal
MBAM  Musée des beaux-arts de Montréal
MCAS  Museum of Contemporary Art, San Diego
MIA_  The Minneapolis Institute of Arts
MMA_  The Metropolitan Museum of Art
NGC_  National Gallery of Canada, Ottawa/Ontario
NMAA  National Museum of American Art, Smithsonian Institution
PMA_  Philadelphia Museum of Art
SFMO  San Francisco Museum of Modern Art
SJMA  San Jose Museum of Art
TFC_  The Frick Collection, NY
WAC_  Walker Art Center, Minneapolis, MN
WMAA  Whitney Museum of American Art, NY


Raw Metadata Structure

- catdata: 8 files
    16,604  year1.d990429
    14,430  year1.d990512
    22,938  year1.d990520
    54,303  year1.d990627
        15  year1.d990708
    54,298  year1.d990731
        93  year1.d990806
       657  year1.d990813

- tiffmetadata: 23 files
     2963  AGO_.tiffmetadata.txt
     1016  AIC_.tiffmetadata.txt
      894  AKAG.tiffmetadata.txt
      187  ASIA.tiffmetadata.txt
     7591  BMFA.tiffmetadata.txt
      401  CCP_.tiffmetadata.txt
     1455  CMA_.tiffmetadata.txt
       56  DCMC.tiffmetadata.txt
      470  DMCC.tiffmetadata.txt
    10141  FASF.tiffmetadata.txt
     2137  GEH_.tiffmetadata.txt
     1459  JPGM.tiffmetadata.txt
     1013  LACM.tiffmetadata.txt
    20654  LOC_.tiffmetadata.txt
       86  MACM.tiffmetadata.txt
       50  MBAM.tiffmetadata.txt
       31  MCAS.tiffmetadata.txt
     1440  MIA_.tiffmetadata.txt
      550  MMA_.tiffmetadata.txt
     1507  NGC_.tiffmetadata.txt
     1416  NMAA.tiffmetadata.txt
      154  PMA_.tiffmetadata.txt
      158  SFMO.tiffmetadata.txt
       86  SJMA.tiffmetadata.txt
       68  Such.tiffmetadata.txt
      396  WAC_.tiffmetadata.txt
    37069  replacements.txt
    57499  replacements2.txt

- thumbmeta: 52,689 files
    AGO_.1016.25_thum.met*
    AGO_.1016.32_thum.met*
    AGO_.1016.39_thum.met*
    …...
    WAC_.994C_thum.met
    WAC_.996C_thum.met
    WAC_.998C_thum.met
    WAC_.99C_thum.met*
    WMAA.1557_56_thum.met
    WMAA.31_426_thum.met


AMICO Metadata Conversion Steps

[Figure: flow diagram of the conversion pipeline. Recoverable labels:]

• Tape Read
• "Raw" Metadata files: catdata (8 files), tiffmetadata (23 files), thumbmeta (52,689 files)
• Merge
• Consolidated Metadata files: 1 catdata, 1 tiffmetadata, 1 thumbmeta
• Convert to XML
• 3 XML files: 1 catdata, 1 tiffmetadata, 1 thumbmeta
• Split-by-museums: 1 XML file per museum
• Split-by-file size: multiple XML files per museum
• Split-by-machines: 1 XML file per museum, multiple museum XML files per machine
• eXcelon Dump & Load Utility
• eXcelon Data Server (×3)
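The split steps can be sketched in a few lines of Python. This is a minimal illustration, not the actual conversion scripts. The record fields (`museum`, `id`, `title`), the element names, the titles, and the 150-byte limit are all hypothetical. Only the museum codes and item ids are borrowed from the file names listed in the raw metadata.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def to_xml(records, root_tag="records"):
    """Serialize a list of flat dict records into one XML string."""
    root = ET.Element(root_tag)
    for rec in records:
        node = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def split_by_museum(records):
    """Group records by their museum code (hypothetical 'museum' field)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["museum"]].append(rec)
    return dict(groups)


def split_by_size(records, max_bytes):
    """Greedily chunk records so each serialized file stays within max_bytes.

    A single record larger than max_bytes still gets a file of its own.
    """
    chunks, current = [], []
    for rec in records:
        if current and len(to_xml(current + [rec]).encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = []
        current.append(rec)
    if current:
        chunks.append(current)
    return chunks


# Hypothetical consolidated records (titles are placeholders).
records = [
    {"museum": "AGO_", "id": "1016.25", "title": "Untitled A"},
    {"museum": "AGO_", "id": "1016.32", "title": "Untitled B"},
    {"museum": "WAC_", "id": "994C", "title": "Untitled C"},
]

# One or more XML files per museum, numbered when a museum is split by size.
files = {
    f"{museum}.{n}.xml": to_xml(chunk)
    for museum, recs in split_by_museum(records).items()
    for n, chunk in enumerate(split_by_size(recs, max_bytes=150), start=1)
}
```

The size check re-serializes the current chunk for every record, which is fine for a sketch but would be too slow for tens of thousands of records. A production version would track the running size instead.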


Alternative System Architectures

[Figure: alternative architecture diagrams. Recoverable labels:]

• AMICO metadata server
   * eXcelon
   * Oracle 8i
   * DB2
• SRB
• HPSS (×2)
• DB2
• Fileserver
• 180 GB RAID (×2)
• Data Server (×2)
• 100 Mbit Ethernet


Current catalog metadata count (per museum)


Average tiff size in MB (per museum)


eXcelon Metadata Layout

[Figure: layout diagram. Recoverable labels: XMLStore; Machine1, Machine2; Museum directories (Museum1, Museum2, Museum-n); File1.xml, File2.xml; Binder doc.xml; XQL Query]


MIX: Mediation of Information Using XML ... for the AMICO CDL Prototype

XMAS: XML Matching and Structuring query language

[Figure: mediator architecture diagram. Recoverable labels: BBQ Interface (slide carousel interface); XMAS query; XML doc; MIXm engine (×2); View based on AMICO DTD; Wrapper; MARC Database; AMICO XML Database (×2); SRB/MCAT; HPSS; Request for image (X.509); tif file]
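The mediator idea behind this slide can be sketched as follows. This is neither the MIX engine nor XMAS syntax. It is a plain-Python illustration of the pattern: wrappers translate each source into a common record shape, and a query runs against the combined view. The element names, the sample records, and the function names are all invented for illustration. Only the MARC field tags 100 (main entry, personal name) and 245 (title statement) follow the real MARC convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical source 1: an XML document (element names are invented, not the AMICO DTD).
AMICO_XML = """<objects>
  <object><creator>Photographer A</creator><title>Sample title 1</title></object>
  <object><creator>Unknown</creator><title>Sample title 2</title></object>
</objects>"""

# Hypothetical source 2: MARC-like records keyed by field tag.
MARC_RECORDS = [{"100": "Photographer A", "245": "Sample title 3"}]


def wrap_xml(xml_text):
    """Wrapper: expose the XML source in the common record shape."""
    for obj in ET.fromstring(xml_text).iter("object"):
        yield {
            "creator": obj.findtext("creator"),
            "title": obj.findtext("title"),
            "source": "amico",
        }


def wrap_marc(records):
    """Wrapper: expose the MARC-like source in the same record shape."""
    for rec in records:
        yield {"creator": rec.get("100"), "title": rec.get("245"), "source": "marc"}


def mediated_view():
    """The integrated view: a union over all wrapped sources."""
    yield from wrap_xml(AMICO_XML)
    yield from wrap_marc(MARC_RECORDS)


def query(creator):
    """Answer a query against the view rather than against any single source."""
    return [rec for rec in mediated_view() if rec["creator"] == creator]


hits = query("Photographer A")
```

For simplicity the sketch materializes the whole view and filters it. A real mediator would normally push as much of the query as possible down to the individual sources.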


SDSC/DICE Discussion Topics

• ADL: caching of HPSS data
• ADEPT access to ADL for CDL testbed: SRB?
• “Union Catalog”:
   • AMICO DTD <=XMAS=> MARC
• SDLIP access to SRB/MCAT and MIX
• Use of GINF (Stanford)
• ...
