Getting to Disk-based Lossless Digital Video Preservation An Introduction Paul Theerman, Walter Cybulski, Glenn Pearson National Library of Medicine NIH/HHS

Post on 30-Dec-2015


Page 1:

Getting to Disk-based Lossless Digital Video Preservation –

An Introduction

Paul Theerman, Walter Cybulski, Glenn Pearson

National Library of Medicine

NIH/HHS

Page 2:

Historical Audiovisuals at the National Library of Medicine

Paul Theerman, Ph.D.

Head, Images and Archives

History of Medicine Division, NLM

Page 3:

Historical Audiovisuals at NLM

• Origins as the National Medical Audiovisual Collection

• A clearinghouse for these materials

• Variously held here and at CDC, Atlanta

• Only relatively recently transferred to the History of Medicine Division

Page 4:

Historical Audiovisuals at NLM

• Current definition of the collection

• All audiovisuals before 1970

• Films and videos of historical interest dating after 1970—that is, of interest for historical value, not informational value

Page 5:

Historical Audiovisuals at NLM

• The collection ranges from the first decade of the 20th century through the 1990s

• Content:
  – Early films on “how to go to the doctor” and other public service and public information films
  – Films on the U.S. Public Health Service

Page 6:

Historical Audiovisuals at NLM

• Content:
  – Dental films due to an ADA donation
  – Training films for surgical procedures
  – Military: battlefield surgical films
  – Large recent donations from NIMH and FDA
  – “Home movies”
  – Research footage
  – Films promoting the use of films in medicine

Page 7:

Historical Audiovisuals at NLM

• Size: the largest such collection in the U.S.

• Total number of titles: ~9650

• Number cataloged: 4300

• Number inventoried: 3550

• Number to be inventoried: ~1800

• Number preserved: 2250+

Page 8:

Historical Audiovisuals at NLM

• The ability to collect depends on the ability to preserve and to catalog and, in the short run, on the ability to stabilize materials so that they can be preserved and cataloged in the future

• Controlled environments:
  – On-site cool vault for new accessions, masters
  – Off-site cool and cold vaults for new accessions, originals

Page 9:

Historical Audiovisuals at NLM

• The decision to preserve and to catalog is not made lightly, because of the investment of resources

• Based on condition and content assessments

Page 10:

Historical Audiovisuals at NLM

• Condition assessment
  – Age of medium
  – Obsolescence of format
  – Possible or actual deterioration of medium
    • Nitrate
    • Acetate
  – Generation

Page 11:

Historical Audiovisuals at NLM

• Content assessment
  – Ownership and restrictions
  – Uniqueness
  – Age, especially pre-1950
  – Then a sliding scale, based on collection development guidelines

Page 12:

Historical Audiovisuals at NLM

• When both condition and content indicate, then:

– Preservation copying, to three copies (in some cases two)

– Cataloging, either to full or core records

Page 13:

Historical Audiovisuals at NLM

• Currently we are on the cusp of moving to digital formats, but our originals are chiefly analog, and our duplication and viewing copies are as well
  – Betacam SP for duplication copies
  – VHS for use copies

• This also matches patron needs for Interlibrary Loan and production

Page 14:

Historical Audiovisuals at NLM

• The Preservation and Collection Management Section enters the picture:
  – Determining formats
  – Technical specifications
  – Managing vendor copying
  – Managing on-site and off-site cool and cold vaults
  – Managing shelving for use copies

Page 15:

Historical Audiovisuals at NLM

• New Ventures with Center for Information Technology (CIT) at NIH
  – Videocasting service of “history in the making”
  – Possible collaboration with NLM
  – Interlocking systems for preservation and cataloging
  – New venture for NLM in a large cache of digital materials

Page 16:

Historical Audiovisuals at NLM

• New Library Research at NLM

• NLM’s Lister Hill Center is looking at means of digital preservation

• The origin of this conference; we are excited about what it will bring

Page 17:

Analog Motion Picture and Tape Preservation at NLM –

Duplication & Offsite Storage

Walter Cybulski, Preservation Librarian

Preservation & Collection Management Section,

Public Services Division, NLM

Page 18:

Examples of Film and Tape Media in the NLM Collections

8mm

16mm

35mm

2” Quadruplex

1” Type C

¾” U-Matic

½” Beta

Page 19:

Deterioration

Page 20:

Nitrate added spice to the idea of deterioration – unfortunately, nothing but hot pepper

(There are no nitrate film materials at NLM)

Page 21:

“250 TEASPOONFULS OF VINEGAR FOR A 1,000 FOOT CAN OF 35mm FILM”

Page 22:

Page 23:

Main Objectives of Preservation

• Identify content that merits preserving

• Mitigate against known risks

• Extend useful life of content

Page 24:

Mitigate against risk.

Storage condition                     Temperature   Relative Humidity   Years for Acidity to Double
Room temperature                      70 °F         50% RH              5
NLM Cool Vault (magnetic tape)        55 °F         30% RH              50
NLM Cold Vault (acetate movie film)   35 °F         25% RH              200

Page 25:

SECURE, CLIMATE-CONTROLLED STORAGE AT IRON MOUNTAIN

Page 26:

Extend useful life: copy onto new media

Page 27:

For libraries and archives, obtaining new copies may not be possible, and copying content on deteriorated media to the same media (e.g. 35mm to 35mm film transfer) can be prohibitively expensive

Page 28:

At this point, the most widely used AV preservation media are Betacam SP and Digital Betacam

But the clock is ticking even as we copy content onto these formats…

Page 29:

Rapidly changing technology takes its toll

Page 30:

with each technological advance, the storage picture changes …

Page 31:

WE ARE TRANSITIONING FROM FILMS AND TAPES TO DATA, BUT THE QUESTION REMAINS:

HOW TO EXTEND THE USEFUL LIFE OF THE CONTENT


Page 32:

Getting to Disk-based Lossless Digital Video Preservation –

Which Way Forward?

Glenn Pearson, Ph.D.

Senior Software Developer

Communications Engineering Branch,

Lister Hill National Center for Biomedical Communications, NLM

Page 33:

Generational Loss Once Digital

• Migration as preservation strategy
  – To cope with obsolescence of digital formats, gear

• If using lossy image compression algorithms
  – No degradation when making exact copy: Master → Master
  – Degradation when migrating (or editing): Master → uncompress → recompress → Master
  – Examples: M-JPEGs, DVs, MPEG-1, -2, most -4

• Mathematically-lossless algorithms
  – Avoid this problem
  – Don’t compress as well (2x – 4x) as “virtually lossless” (5x – 9x) or obviously lossy (web streaming)
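The contrast between lossy and lossless migration can be illustrated with a small sketch. It is a toy model and not a real video codec: the "lossy codec" here is plain quantization with a hypothetical step size that changes at each migration, and zlib stands in for any mathematically lossless compressor. All names and step values are assumptions made for illustration.

```python
import zlib


def lossy_encode(samples, step):
    """Toy lossy 'codec': keep only the nearest multiple of `step` per sample."""
    return [round(s / step) for s in samples]


def lossy_decode(codes, step):
    """Rebuild 8-bit samples from the quantized codes (clamped to 255)."""
    return [min(255, c * step) for c in codes]


def max_error(a, b):
    """Largest absolute difference between two equal-length sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


master = list(range(256))  # synthetic ramp of 8-bit sample values

# Migration chain: each hop uncompresses, then recompresses with a different
# (hypothetical) quantizer step, standing in for a change of lossy format.
generation = master
errors = []
for step in (7, 5, 9, 6):
    generation = lossy_decode(lossy_encode(generation, step), step)
    errors.append(max_error(master, generation))

# Lossless chain: four compress/decompress hops with zlib.
data = bytes(master)
for _ in range(4):
    data = zlib.decompress(zlib.compress(data))
lossless_ok = data == bytes(master)

print("max error vs. original after each lossy migration:", errors)
print("lossless chain is bit-exact:", lossless_ok)
```

In this model the lossy chain never returns to the original sample values, while the lossless chain stays bit-exact no matter how many hops are made.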

Page 34:

Lossless Video Storage

• Uncompressed video
  – Can be stored with general binary file compressors (RLL, LZW [zip]), typically 1.6:1 - 2:1 compression

• Lossless video codecs
  – Standardized, open (but may be patents)
    • HuffYUV – original, uses Huffman “entropy” encoding
    • Apple QuickTime “None” codec [documented, not standard]
    • JPEG 2000 Lossless (within, say, Motion JPEG 2000)
    • MPEG4/AVC Lossless
  – Proprietary
    • Matrox DigiSuite: Lossless = entropy-only portion of M-JPEG
    • New - MatrixView’s “Adaptive Binary Optimization”, from patented “Repetition Coded Compression” (boolean grids + Huffman)
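As a rough illustration of what Huffman "entropy" encoding buys, the sketch below predicts each sample from its left neighbor, builds a Huffman code over the prediction residuals, and counts the coded bits. It is a toy under stated assumptions: the scan line is synthetic, the predictor and function names are hypothetical, the bit count ignores the cost of storing the code table, and none of it reproduces the bitstream of HuffYUV or any other codec listed above.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in d1.items()}
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def left_prediction_residuals(row):
    """Replace each 8-bit sample by its difference from the previous one (mod 256)."""
    prev, out = 0, []
    for s in row:
        out.append((s - prev) % 256)
        prev = s
    return out


def undo_left_prediction(residuals):
    """Invert left_prediction_residuals exactly."""
    prev, out = 0, []
    for r in residuals:
        prev = (prev + r) % 256
        out.append(prev)
    return out


def coded_bits(symbols):
    """Total bits needed to Huffman-code `symbols` (code table not counted)."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# Synthetic scan line: a slow gradient with a small ripple.
row = [(i // 4 + i % 3) % 256 for i in range(1024)]
residuals = left_prediction_residuals(row)
raw_bits = 8 * len(row)
huff_bits = coded_bits(residuals)

print("raw bits:", raw_bits, "coded bits:", huff_bits)
print("round trip exact:", undo_left_prediction(residuals) == row)
```

Real footage is far less regular than this synthetic line, so the ratio printed here says nothing about the ratios achievable on actual video; the point is only that the round trip is exact while the coded size shrinks.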

Page 35:

Economics of Digital Storage

[Chart: $ per gigabyte (log scale, 0.01 to 10,000) versus year (1998 to 2020) for DRAM/Flash, HDD storage systems, 2.5" hard disk drives, 3.5" hard disk drives, and tape media]

Sources: E. Grochowski & R. Halem, IBM Sys J, 42(2), 2003 (Disk, Flash); R. Harada, Comp Tech Rev, June 2004 (Tape)

Tape media data is for computer tape, but digital video tape uses the same technology, which drives media price

Page 36:

The Twilight of Tape

[Chart: $ per gigabyte (log scale, 0.01 to 10,000) versus year (1998 to 2020) for DRAM/Flash, HDD storage systems, 2.5" hard disk drives, 3.5" hard disk drives, and tape media]

Hierarchical storage yesterday: Hard Disks, Tapes

Hierarchical storage tomorrow: Flash Disks, Hard Disks*

*Powered on-demand

Page 37:

Economics of Subsampling and Lossless Compression

• Gold Standard for digital video: 4:4:4 uncompressed

• Not so affordable today for archives

Relative cost     4:4:4    4:2:2
Uncompressed      1        2/3
Lossless          ~1/3     ~1/4

• 4:2:2 lossless
  – will be affordable 2 years before 4:4:4 uncompressed
  – will stay ¼ the cost

• When is 4:2:2 good enough for preservation?

In YUV colorspace:

Y is luma (B&W intensity)

U, V are the blue and red color differences, respectively

4:4:4 = full sample/pixel

4:2:2 = sample for Y at full pixel resolution, for U, V at half horizontal resolution
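The 2/3 figure for uncompressed 4:2:2 follows directly from counting samples. The sketch below does that count, reading J:a:b notation in the usual way (J luma samples per row over two rows, with a and b chroma samples in the first and second row for each chroma channel). The function name is hypothetical, and the "implied" lossless ratios are simply back-calculated from the approximate ~1/3 and ~1/4 figures on the slide.

```python
from fractions import Fraction


def relative_size(j, a, b):
    """Sample count of J:a:b subsampling relative to 4:4:4 over a J x 2 region."""
    luma = 2 * j                 # Y is always kept at full resolution
    chroma = 2 * (a + b)         # two chroma channels, U and V
    full = 2 * j + 2 * (j + j)   # 4:4:4 reference: every channel at full rate
    return Fraction(luma + chroma, full)


size_444 = relative_size(4, 4, 4)
size_422 = relative_size(4, 2, 2)

# Lossless compression ratios implied by the slide's approximate figures.
implied_ratio_444 = size_444 / Fraction(1, 3)
implied_ratio_422 = size_422 / Fraction(1, 4)

print("4:4:4 uncompressed:", size_444)
print("4:2:2 uncompressed:", size_422)
print("implied lossless ratio, 4:4:4:", implied_ratio_444)
print("implied lossless ratio, 4:2:2:", implied_ratio_422)
```

Both implied ratios fall inside the 2x to 4x range quoted earlier for mathematically lossless algorithms.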

Page 38:

Film Master → Digital Master

• Traditional good advice: Film → Film

• Can Film → Digital be
  – as good as/better than Film → Film
  – as affordable?

• Quality of source
  – 8mm, 16mm, 35mm, 65/70mm
  – B/W vs color
  – camera original, intermediate print, distribution print

• Versus quality of target

• HD video has 1920x1080 (“e-Cinema”)
  – Variety matching film best: progressive-scan 24 fps (1080p24)
  – But video has only 8-10 bits linear/component, less than film’s range
  – Good enough for archiving some 16mm B&W distribution prints?
  – HD 16:9 aspect matches some sources, not others

Page 39:

Film Master → Digital Master - Hollywood Style

• Better than HD but $$

• 12 bit linear/component (36 bits/pixel)

• Or 10-bit log/component

• No subsampling

• 2K @ 24 fps = most practical res. & rate
  – 2K = 2048 x 1080
  – That’s outer bounds for various aspect ratios

Page 40:

3 Steps, 3 Types of File Formats

• Sources (Production)

• Digital Intermediate

• Package for Theatrical Release

Page 41:

Sources

• Computer Graphics

• New cinema digital cameras
  – Viper, Dalsa Origin 4K, Arri D-20, Kinetta

• Film Scanners
  – Kodak Genesis, Northlight, Arriscan, Imagina

• “Datacines” (data telecines)
  – Thomson Spirit, Cintel DSX, Millennium

• Raw, Unwrapped Frame-per-file Formats
  – Flexible resolution, aspect ratio
  – But sound, most metadata in separate files
    • Awkward: per-shot info
  – Examples
    • Kodak Cineon scanner .CIN (10-bit log rgb)
    • SMPTE std DPX (derived from Cineon)
    • Others: TIFF, SGI, EXR, JP2
    • “Digital Negative” from 1-CCD camera with Bayer-pattern color filters atop pixels

[Image caption: Magazine has 12 40 GB iPod Drives]

Page 42:

Digital Intermediate Process

• Creates Digital Masters
  – May include “Digital Source Master” from which multiple masters come: DVD master, TV master, DCDM

• Typical Steps
  – Color grading, compositing, editing, finishing
  – Projects moved along in vendor formats or AAF
  – End products archived in vendor formats or MXF

• Such unencrypted masters closely held by studios, but archivists could make their own

Page 43:

Theatrical Distribution

• DCI Distribution Master (DCDM)
  – MXF wrapper + JPEG2000 frames
  – But lossy due to real-time bandwidth constraints (250 Mb/s peak)
  – Something Similar for Archivists?
    • a lossless variety of this
    • or MJ2 instead of MXF

Page 44:

Roadblocks in Getting to a Disk-Based Lossless Archive Master

• Rapid digital-technology change

• High current costs
  – Top quality needs massive storage, high-speed pipelines
  – An uncompressed color movie (2K @ 24 fps, 12-bit)
    • Would consume ~2 Gigabits per second bandwidth if realtime
    • Needs 0.8 TB storage per hour of length
  – Plus $$$ for color grading/restoration services & software

• Analog tape → SD digital is more affordable now

• A proliferation of standards
  – File Formats
    • Essence representation/codecs/color spaces
    • Wrappers
  – Metadata & Rights Management

• Can we help find a way forward?
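The ~2 Gigabits per second and 0.8 TB per hour figures for an uncompressed 2K movie can be checked with a few lines of arithmetic. The sketch assumes the full 2048 x 1080 frame is stored at 12 bits for each of three components (36 bits/pixel) with no subsampling, and it ignores audio, metadata, and any padding a real file format would add. The function name is hypothetical.

```python
def uncompressed_bits_per_second(width, height, bits_per_component, components, fps):
    """Raw picture data rate for uncompressed video, in bits per second."""
    return width * height * bits_per_component * components * fps


rate = uncompressed_bits_per_second(2048, 1080, 12, 3, 24)
bytes_per_hour = rate * 3600 // 8

gigabits_per_second = rate / 1e9
terabytes_per_hour = bytes_per_hour / 1e12      # decimal terabytes
tebibytes_per_hour = bytes_per_hour / 2 ** 40   # binary tebibytes

print(f"{gigabits_per_second:.2f} Gb/s")
print(f"{terabytes_per_hour:.2f} TB per hour ({tebibytes_per_hour:.2f} TiB)")
```

Under these assumptions the rate comes to about 1.9 Gb/s, and an hour of footage to roughly 0.86 decimal terabytes (about 0.78 TiB), consistent with the rounded figures on the slide.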