scheduled scientific data releases using .backup volumes
DESCRIPTION
How the Mars Space Flight Facility uses (and abuses) the .backup snapshot feature of OpenAFS. This presentation was given by Chris Kurtz and myself at the 2008 AFS and Kerberos Best Practices Workshop in Newark, NJ.
TRANSCRIPT
May 23rd 2008
Chris Kurtz
Zach Schimke
Mars Space Flight Facility
Arizona State University
Scheduled Scientific
Data Releases
Using .backup Volumes
2
Outline
Introduction: The Mars Space Flight Facility
Spacecraft Data and You
Image Processing
The Problem: Released and Unreleased Data
The Solution: AFS and .backups
Overview of MSFF use of AFS
Feature Requests
Questions
3
Introduction
NASA/Jet Propulsion Lab funded research institution
Scientists, Mission Planners, Developers, SysAdmins
Four instruments on Mars:
TES (Thermal Emission Spectrometer)
Mars Global Surveyor (1996-2006)
THEMIS (THermal EMission Imaging System)
Mars Odyssey (2001 to current)
Mini-TES
MER Rovers Spirit and Opportunity (2004 to current)
Over 80 TB of collected mission data (including AFS)
4
Spacecraft Data and You
Instrument captures data on Mars
Spacecraft combines data from all instruments, adds
spacecraft telemetry, and sends to Earth via radio to
be received by the DSN (Deep Space Network)
JPL correlates, decodes, and packages data for each
instrument
MSFF pulls the raw data for its instrument from JPL
MSFF processes the data through multiple steps
5
Spacecraft Data and You: THEMIS Data Types
THEMIS Data Types:
Infrared (IR): 100m per pixel, daytime and nighttime images
Visible Light (VIS): 18m per pixel
(Image panels: IR NIGHT, IR DAY, VISIBLE)
6
Image Processing
Raw (EDR) → Calibrated (RDR) → Projected (GEO)
SFDU: Standard Formatted Data Unit
EDR: Experiment Data Record
RDR: Reduced Data Record
GEO: Geometrically Registered Record
7
Image Processing
Due to the volume of data, two 100-CPU Linux
clusters are used for processing and the resulting
products are stored on a high-end NFS server from
Network Appliance
These data products are made available to Science
Team members immediately via authenticated
services
JPL contract requires data to be released to the public
6 months after being received (to give Operations time
to validate, calibrate, process, perform scientific
analysis, etc) – this is the crux of the problem
8
Image Processing
Snow and Ice in Udzha Crater
(VIS – False Color)
Image Credit: NASA/JPL/ASU
9
Image Processing
Hematite in Meridiani Planum
(IR – False Color)
Image Credit: NASA/JPL/ASU
10
The Problem: Released and Unreleased Data
There is a 6-month grace period between data
collection and public release
Previous methodology was to copy over 25 TB of data
via rsync from internal NFS to stand-alone web
server(s). This had issues:
It took forever just to build the file list
The rsync itself took days
Releases took longer and longer (we regularly
re-process old data with updated calibration, so we
have to re-release)
Webservers needed fast, expensive, redundant disk
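The old release step amounted to something like the following (paths and hostname are placeholders, not the site's real layout; the RSYNC override turns it into a dry run for illustration):

```shell
# Roughly the old methodology: mirror released products from internal NFS
# to a stand-alone web server. RSYNC="echo rsync" (the default here) makes
# this a dry run that just prints the command it would execute.
RSYNC=${RSYNC:-echo rsync}
cmd=$($RSYNC -a --delete /nfs/themis/released/ webserver:/export/www/data/)
echo "$cmd"
```

With tens of terabytes and millions of files, just the `--delete` comparison pass over the tree is what made "building the file list" take so long.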
11
The Solution: AFS and .backup
Data is moved from expensive NFS to cheap AFS
AFS excels at storing large amounts of Read Only
data redundantly and at reasonable costs
AFS snapshot backups allow us to keep public data
public and private data private
12
The Solution: NFS vs AFS
NFS (Network Appliance)
High Speed (Trunked GigE)
High throughput (100,000 ops/sec)
Redundant (Mod. RAID4, clustered servers)
EXPENSIVE!!! ($5000 per TB)
AFS (CentOS Linux Servers)
Fast RO
Slower RW
Redundant (RAID5)
Cheap! (Less than $1000 per TB)
VS
13
The Solution: .backup volumes
AFS .backup volumes are point-in-time copies that are
independent of the original volume (a "reverse delta"):
the original volume can be altered without affecting
the .backup, which is exactly what we need!
New methodology:
All volumes of released data have a .backup volume
created using standard tools (vos backup)
Website references backup volume names
This new process takes an hour or two (depending on
how many new .backup volumes are created)
Process moved from SysAdmins to Operations
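The new release step can be sketched as a small script around the standard `vos backup` tool. The list file name, helper function, and the sample volume names are ours for illustration; with the default dry-run setting it only prints the commands, and setting `VOS=vos` on a server with admin tokens would actually create the clones:

```shell
# Sketch of the new release step: create/refresh a .backup clone for each
# volume listed (one name per line) in a release list. VOS="echo vos"
# (the default here) makes this a dry run that prints the commands.
VOS=${VOS:-echo vos}

make_backups() {
  while IFS= read -r vol; do
    $VOS backup "$vol"   # creates or refreshes the "$vol.backup" clone
  done < "$1"
}

# Hypothetical release list with example volume names:
printf 'themis.RDR.V284XXRDR\nthemis.EDR.I100XXEDR\n' > released-volumes.txt
make_backups released-volumes.txt
```

Because `vos backup` only clones one volume's metadata at a time, the whole pass is minutes-to-hours instead of the days the rsync took.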
14
MSFF and OpenAFS
Once processed, data is stored in AFS in 100-orbit
“chunks” (afs volumes) according to various data
types, such as “themis.RDR.V284XXRDR” (THEMIS
instrument container volume, RDR container volume,
Visible Camera orbits 28400-28499 RDRs)
Co-Investigators at other Universities access the data
via authenticated AFS, FTP, and website as it is
proprietary...for a while
Public access via web, ftp, and AFS
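The chunk-naming convention above is mechanical enough to script. Here is a hypothetical helper (the function name and argument order are ours, not a site tool) that derives the 100-orbit chunk volume name from camera, product type, and orbit number:

```shell
# Derive a 100-orbit chunk volume name per the convention described above:
# themis.<TYPE>.<CAMERA><chunk>XX<TYPE>, where chunk = orbit / 100.
# Usage: volname CAMERA TYPE ORBIT
volname() {
  camera=$1; ptype=$2; orbit=$3
  chunk=$((orbit / 100))   # orbits 28400-28499 -> 284
  echo "themis.${ptype}.${camera}${chunk}XX${ptype}"
}

volname V RDR 28437   # prints themis.RDR.V284XXRDR
```

Keeping the mapping deterministic means the website and release scripts can compute volume names directly from orbit numbers instead of maintaining a lookup table.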
15
MSFF OpenAFS Specifics
Cell: mars.asu.edu
AFS DB servers are Xen virtual machines
Servers:
8 AFS File Servers
CentOS 5.1 (formerly Fedora Core 4)
15,000 volumes / 35 TB of AFS storage (RAID 5)
4000 read/write volumes (8000 .readonly)
3500 .backup
Nagios monitoring of BOS, Disk Space, rxdebug
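An rxdebug liveness check along these lines (our sketch, not the site's actual plugin; the hostname is a placeholder) just probes the fileserver's Rx port and maps the result to Nagios exit codes:

```shell
# Minimal Nagios-style fileserver check: probe the AFS fileserver port
# (7000/udp) with rxdebug. Exit 0 = OK, 2 = CRITICAL, per Nagios convention.
check_fs() {
  if rxdebug "$1" 7000 -version >/dev/null 2>&1; then
    echo "OK: fileserver $1 responding"
    return 0
  else
    echo "CRITICAL: fileserver $1 not responding"
    return 2
  fi
}

# Usage (placeholder hostname): check_fs afs1.mars.asu.edu
```

The same pattern extends to `bos status <server>` for the BOS check and `vos partinfo` for disk space.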
16
Feature Requests
Additional snapshot capability besides .backup
At least one .snapshot, but more would be nice.
File Server implied ACLs for this .snapshot
Volume Autorelease
Built-in Mechanism to automatically release volumes.
Better VOS granularity
Allow users to release specific volumes or volume sets
rather than it being all or nothing.
(Open)LDAP support for PT Server
Better cron support (mostly solved by k5start)
17
Questions
Gusev Crater (VIS – False Color)
Image Credit: NASA/JPL/ASU, Mars Express HSRC Camera, ESA/DLR/FU Berlin (G. Neukum)
18
Final Remarks
Utopia Plains (IR/VIS – False Color)
Image Credit: NASA/JPL/ASU