TRANSCRIPT
This document is solely for the use of Sony Pictures Entertainment (SPE) personnel. No part of it may be circulated, quoted, or reproduced for distribution outside the SPE organization without prior written approval from IBM.
© Copyright IBM Corporation 2009
Sony Pictures Entertainment
Digital Backbone File Management and Infrastructure
October 15, 2009
IBM Media & Entertainment Solutions
Sony Pictures Digital Backbone | October 15, 2009 | Confidential © Copyright IBM Corporation 2009
Agenda
- Review of assumptions and figures from our last meeting
- Introduction and discussion of proposed approach to file management
- Review and discussion of the impact new requirements will have on server and storage infrastructure
- LTO Library Capacity Requirements
[Diagram: data flows between Imageworks (CW), the DBB, other sources/destinations, and the LTO library]

Data INTO the DBB (disk writes):
- Data FROM CW = 330MB/s
- Data FROM 'other sources' = 100MB/s
- Data FROM LTO library to DBB = 86MB/s (20% of the 430MB/s write rate, as specified)
- Total WRITE = 516MB/s

Data OUT OF the DBB (disk reads):
- DBB TO LTO = 430MB/s
- Data TO CW = undefined; assuming 100MB/s
- Data TO 'other sources' = 100MB/s
- Data TO other destinations = 200MB/s
- Total READ = 830MB/s

It was mentioned that there may be a desire to have two copies of everything on tape and/or to send a second copy through compression for storage to LTO off site. Assuming we need to keep up with 'constant ingest', this adds an additional 430MB/s for DR (DBB TO LTO2 = 430MB/s).

Total system bandwidth = 1,776MB/s

Volumes:
- CW to DBB = 210TB/week (35TB/day specified over a 6-day week)
- Other sources to DBB = 70TB/week
- Total expected volume to DBB (and therefore to LTO) = 280TB/week (see note 1)

Notes:
1. 330MB/s sustained over 24 hours yields 28.5TB/day, NOT the 35TB specified.
2. 100MB/s sustained over 24 hours yields 8.6TB/day, NOT the 12TB specified.

Other base assumptions:
- 1 week = 6 days
- 1 day = 24 hours
- Data retention on the DBB will vary but is assumed to be 18 months
- CW scanners generate 25TB/day (150TB/week)
- CW Lib Masters and Digital Cam shots add another 50TB/week
- CW file sizes can be between 10 and 110MB
- "Other" sources to the DBB will provide 50M files/year at 20MB/file, or 70TB/week
- The original workflow called for all content to go to the BB and then be "MOVED" to tape; selected sequences would be called via a list from Sony and "COPIED" back from tape to disk
- There may be alternative workflows which are more efficient
- Sufficient network infrastructure exists within the DBB
- Other assumptions related to file management and workflows are included in this deck
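The totals above follow from simple arithmetic; a quick check in Python, using only the rates stated on the slide:

```python
# Sanity-check of the slide's bandwidth totals; every rate (MB/s)
# comes straight from the diagram.
writes = {"from_cw": 330, "from_other_sources": 100, "from_lto": 86}
reads = {"to_lto": 430, "to_cw": 100, "to_other_sources": 100,
         "to_other_destinations": 200}
dr_stream = 430                      # second DBB-to-LTO copy for DR

total_write = sum(writes.values())   # 516 MB/s
total_read = sum(reads.values())     # 830 MB/s
total_system = total_write + total_read + dr_stream   # 1,776 MB/s

def tb_per_day(mb_per_s):
    """Sustained MB/s over 24 hours, in TB (1 TB = 1,000,000 MB)."""
    return mb_per_s * 24 * 3600 / 1_000_000

# The slide's notes: neither specified daily volume is reachable
# at the stated rates.
cw_daily = round(tb_per_day(330), 1)      # 28.5 TB, not 35 TB
other_daily = round(tb_per_day(100), 1)   # 8.6 TB, not 12 TB
```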
Sony Backbone Archive: Assumptions and Details
Steve Kauffman
Rainer Richter
ECM Lab Services – San Jose
10/14/09
Sony Backbone Archive Overview
- High speed tape access:
  - A group of files is written as a single object on tape.
  - A smaller number of larger tape objects is more efficient.
  - A reduction in the number of objects in the tape storage system results in a smaller tape object database.
  - Larger objects are written to tape at a faster rate.
- The map between tape objects and disk files is stored with the files, both on disk and on tape, and can also be saved independently.
- Individual files can be retrieved without retrieving the entire group.
- Greater flexibility in which protocols and locations may be enabled for LTO read/write.
- Improved organization of files on LTO.
- Ease of recovery from system problems.
- Integrated with the standard TSM and GPFS products.
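The grouped-object idea above can be sketched in a few lines. This is an illustrative toy, not the product's implementation: the function names, the JSON mapfile layout, and the flat-file stand-in for tape are all our own.

```python
import json
import os

def aggregate_directory(dirpath, archive_path):
    """Concatenate every file in `dirpath` into one archive object and
    write a mapfile of (name, offset, length) entries next to it.
    Toy sketch of the grouped-tape-object idea."""
    entries, offset = [], 0
    with open(archive_path, "wb") as out:
        for name in sorted(os.listdir(dirpath)):
            data = open(os.path.join(dirpath, name), "rb").read()
            out.write(data)
            entries.append({"name": name, "offset": offset,
                            "length": len(data)})
            offset += len(data)
    with open(archive_path + ".map", "w") as mapfile:
        json.dump(entries, mapfile)
    return entries

def retrieve_one(archive_path, entries, name):
    """Recover a single file by seeking into the aggregate -- the
    analogue of retrieving one file without reading the whole group."""
    entry = next(e for e in entries if e["name"] == name)
    with open(archive_path, "rb") as archive:
        archive.seek(entry["offset"])
        return archive.read(entry["length"])
```

Because the mapfile records offsets, one object holds many files while any single file remains individually addressable, which is the property the tape-object database benefits from.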
Virtual File Repository
[Diagram: Sony scripts and applications, the VFR API, VFR movers, GPFS (backbone disk), and TSM (database and tape)]

- Sony scripts interact with the files through the VFR API, requesting files for applications.
- VFR movers transfer files between GPFS and TSM.
- VFR stores a sequence of files as one TSM object.
- VFR retrieves single files, partial sequences of files, or an entire sequence of files from TSM.
Phase Overview
Phase | Ingest To | Tape Mgmt | Recall From | Comment
1 | GPFS | None | GPFS | Applications use GPFS only during TSM config and VFR development
2 | GPFS | TSM Agg | GPFS | Tapes are write-only pending development of recall code
3 | GPFS | TSM Agg | GPFS / TSM Agg | Data is moved to tape; the GPFS copy is deleted
4 | TSM Agg | TSM Agg | GPFS / TSM Agg | Ingest directly to tape, which reduces GPFS bandwidth
5 | TBD | Durabytes | GPFS / TSM / Durabytes | Tapes written in Durabytes format
Key Considerations
Phase 3 needs to be complete before GPFS capacity is exceeded
Workflow responsibility for WIP tracking, error notification and retries
Phase 1- Disk Only
- Write data to the GPFS array
- Test feeds and speeds
- Sony applications are unchanged; they point to GPFS
- Develop VFR components

Pre-Reqs:
1. GPFS installed and operational
2. Sony apps can access GPFS
Phase 2 – Tape Archive
- GPFS data archived to tape in aggregated format; source files REMAIN
  - Tapes will be valid for future phases; no migration will be needed
  - Immediate aggregation benefit
- Sony applications may still access GPFS for recalled data
- Custom applications select files to archive by directory

Pre-Reqs:
- TSM Archive Aggregation code completed
  - Mapper
    - Selects directories to be archived
    - Creates the Mapfile for each directory, saved in that directory
  - Archiver
    - Locates directories with Mapfiles and data files, invokes PRouter to archive them
  - PRouter: TSM API archive application
    - Aggregates a directory of files (including the Mapfile) and stores it in TSM
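A minimal sketch of the Mapper/Archiver selection logic in the pre-reqs above, assuming a JSON mapfile and a mapfile name of our own invention; the real handoff to the TSM API happens in PRouter, which is not modeled here.

```python
import json
import os

MAPFILE = "directory.map"   # hypothetical mapfile name

def run_mapper(root):
    """'Mapper' sketch: for every directory holding data files but no
    mapfile yet, save a mapfile listing those files in that directory.
    The JSON layout and mapfile name are our own, not the product's."""
    mapped = []
    for dirpath, _dirnames, filenames in os.walk(root):
        data_files = sorted(f for f in filenames if f != MAPFILE)
        if data_files and MAPFILE not in filenames:
            with open(os.path.join(dirpath, MAPFILE), "w") as m:
                json.dump({"directory": dirpath, "files": data_files}, m)
            mapped.append(dirpath)
    return mapped

def find_archivable(root):
    """'Archiver' selection step: directories that hold both a mapfile
    and data files are ready to be handed to PRouter (not modeled)."""
    return [dirpath
            for dirpath, _dirnames, filenames in os.walk(root)
            if MAPFILE in filenames and len(filenames) > 1]
```

Note that a second Mapper pass finds nothing new, since the presence of the mapfile marks a directory as already mapped.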
Phase 3 – Tape Recall

- GPFS ingest data archived to tape; source files DELETED
- Sony apps link to the VFR API
  - Data retrieved from GPFS or TSM
  - Delivered directly to destination

Pre-Reqs:
- VFR API complete
  - Recall processor
    - Interface to the Sony app to get the request file
    - Converts the Sony request file to a Recall Mapfile, invokes PRouter
  - PRouter TSM recall capability
    - Reads the Recall Mapfile, recalls files from TSM via partial object restore, writes to destination via FTP, CIFS, NFS, etc.
- Sony apps modified to access VFR
  - List-based requests to optimize sequential tape reads
  - Optional: recall to GPFS to preposition data for subsequent fast recall
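The list-based recall optimization can be illustrated with a small planner. The lookup-table shape and all names here are assumptions for illustration; the point is simply that grouping requests by tape object and sorting by offset turns a scattered request list into sequential reads with one mount per object.

```python
from collections import defaultdict

def plan_recall(requested_files, file_to_object):
    """Group a request list by the tape object holding each file, and
    order each group by offset so every object is read sequentially in
    a single mount. `file_to_object` maps filename -> (object_id,
    offset) and would be derived from the mapfiles; the shape is an
    assumption for this sketch."""
    groups = defaultdict(list)
    for name in requested_files:
        object_id, offset = file_to_object[name]
        groups[object_id].append((offset, name))
    return {object_id: [name for _offset, name in sorted(members)]
            for object_id, members in groups.items()}
```

For example, two files on the same tape object are returned as one group in on-tape order, however the client ordered its request list.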
Phase 4 – Direct Archive
Ingested data archived from source to tape– No intermediate GPFS copy
– Reduces GPFS and network bandwidth
Pre-Reqs
Modify Ingest routines to access source files– Mapper, Archiver, PRouter
Trigger routines based on source availability – Files must be in place before triggering
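One common way to satisfy the "files must be in place before triggering" requirement is a stability check on the source folder. This sketch is not from the deck; the poll count and interval are illustrative defaults.

```python
import os
import time

def wait_until_stable(path, polls=3, interval=1.0):
    """Trigger guard: return the directory's file set only after its
    names and sizes are unchanged for `polls` consecutive checks, i.e.
    the source files have finished arriving."""
    def snapshot():
        return {name: os.path.getsize(os.path.join(path, name))
                for name in os.listdir(path)}
    previous, stable = snapshot(), 0
    while stable < polls:
        time.sleep(interval)
        current = snapshot()
        stable = stable + 1 if current == previous else 0
        previous = current
    return previous   # settled {filename: size} map
```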
Phase 5 - Durabytes
Data archived to LTO tape in Durabytes format– Open, portable, self-describing format
– Potential tape write speed increase
– No migration of existing TSM data– VFR concurrently supports TSM and Durabytes repositories
Pre-Reqs
Durabytes Tape Manager– TSM or new
PRouter converted to support Durabytes
State Transition Diagram - Phase 3
States: Initial → Ingested → Mapped → Archived (Recall/PrePosition deliver to a Hot Folder)
GPFS files per directory at each state: 0 → 1440 → 1441 (data files plus Mapfile) → 1 (Mapfile only)
Transitions: Ingest (Sony), Mapper, Archiver/PRouter, Recall, PrePosition
GPFS Contents - Initial
/Project1
00:00:00 00:01:00 00:02:00 00:03:00
GPFS Contents - Ingested
/Project1
  00:00:00/  00:00:00:01.dpx, 00:00:00:02.dpx, …, 00:00:59:23.dpx
  00:01:00/  00:01:00:01.dpx, 00:01:00:02.dpx, …, 00:01:59:23.dpx
  00:02:00/  00:02:00:01.dpx, 00:02:00:02.dpx, …, 00:02:59:23.dpx
  00:03:00/  00:03:00:01.dpx, 00:03:00:02.dpx, …, 00:03:59:23.dpx
GPFS Contents - Mapped
/Project1
  00:00:00/  Project1:00:00.map, 00:00:00:01.dpx, …, 00:00:59:23.dpx
  00:01:00/  Project1:00:01.map, 00:01:00:01.dpx, …, 00:01:59:23.dpx
  00:02:00/  Project1:00:02.map, 00:02:00:01.dpx, …, 00:02:59:23.dpx
  00:03:00/  Project1:00:03.map, 00:03:00:01.dpx, …, 00:03:59:23.dpx
GPFS Contents - Archived
/Project1
  00:00:00/  Project1:00:00.map
  00:01:00/  Project1:00:01.map
  00:02:00/  Project1:00:02.map
  00:03:00/  Project1:00:03.map
Hot Folder Contents - Recall
/Project1_ClientA_Req1234
  00:01:12:18.dpx, 00:01:12:19.dpx, …, 00:01:13:04.dpx
State Transition Diagram - Phase 4
States: Initial → Mapped → Archived (Recall/PrePosition deliver to a Hot Folder)
Files at each state: 1440 source files → 1441 source files (plus Mapfile) → 1 GPFS file (Mapfile only)
Transitions: Mapper, Archiver/PRouter, Recall, PrePosition
Backbone Assumptions

1. The backbone will have disk storage and tape storage.
2. There will be a single file system view which unifies file searching and access regardless of the actual file location.
3. Files are copied to the backbone, or copied from the backbone. They are NEVER updated directly while on the backbone.
4. If a file is changed while off the backbone, it will be returned to the backbone under a different name.
   a. There will only ever be a single version of each file on the backbone, though it can be in multiple locations (disk, tape).
   b. Each file will have a unique combination of path and name; names can repeat but must be in different subdirectories. There is no support for overwrite.
5. All files arriving on the backbone disk will be copied to tape, resulting in 2 copies of the file: 1 on disk, 1 on tape.
   a. The copy to tape will be asynchronous as capacity allows.
   b. The copy on tape may be suitable for DR.
   c. Optional policies will create copies on 2 different tapes for redundancy.
Backbone Assumptions (continued)

6. Files on disk may be deleted per policy, leaving only a single copy on tape.
7. Files on tape will be deleted via manual processes on a project basis.
8. Requests for access to a file on tape will cause the file to be copied from tape to the client location.
   a. The file is NOT returned to the disk area of the backbone, since the presumption is that the file will be changed and returned under a different name.
   b. File delivery will be highly asynchronous due to request queues, tape, and tape drive access limitations.
9. Directory naming conventions will help optimize performance.
   a. Directory names will isolate projects from each other to:
      i. facilitate tape reclamation
      ii. map to TSM constructs, e.g. tape pools
   b. Directory arrangements will group together data likely to be accessed together (e.g. frames within the same minute), minimizing the tape mounts required for file recall.
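Assumption 9b can be illustrated directly from the deck's own frame-naming examples (HH:MM:SS:FF.dpx): a per-minute grouping puts frames likely to be recalled together into the same directory, and hence the same aggregated tape object. The helper below is pure illustration of the convention.

```python
from collections import defaultdict

def group_frames_by_minute(frame_names):
    """Place frames named HH:MM:SS:FF.dpx under a per-minute key
    (HH:MM:00), matching the /Project1 directory layout shown earlier
    in the deck."""
    groups = defaultdict(list)
    for name in frame_names:
        hh, mm, _ss, _ff = name.rsplit(".", 1)[0].split(":")
        groups[f"{hh}:{mm}:00"].append(name)
    return dict(groups)
```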
Summary
- Use GPFS as the index for all files
  - Directly, if the file is on disk
  - Indirectly via a Mapfile, if the file is on tape
  - ~1500X reduction in GPFS and TSM objects
- The Mapfile is an aggregation of stubs
- Custom code enables aggregation
  - Aggregates files to TSM
  - Partial object restore for efficient recall
Disk Performance Options for Sony Pictures Digital Backbone
October 15, 2009
Objectives
Understand aggregate storage performance requirements
Design a system to meet those requirements
Reuse existing assets
Ran disk modeling tool
Assumptions:
- 2 MB block size
- Varied throughput until one of the system components began to be stressed

Measures:
- Internal FC utilization
- External host adapter utilization (this is where most bottlenecks occurred)
- Hard drive utilization
- Processor utilization
- PCI bus utilization
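The modeling method described here, raising the offered throughput in steps until some component saturates, can be sketched generically. The utilization model below is a toy stand-in for the real modeling tool; its capacities are invented for illustration, with the host adapters saturating first, as the study observed.

```python
def find_knee(utilization_at, limit=0.9, step=50, max_mb_s=5000):
    """Raise offered throughput in fixed steps until some component's
    utilization crosses `limit`; return that throughput and component.
    `utilization_at(mb_s)` stands in for the real disk modeling tool."""
    mb_s = step
    while mb_s <= max_mb_s:
        util = utilization_at(mb_s)
        hot = max(util, key=util.get)        # most-stressed component
        if util[hot] >= limit:
            return mb_s, hot
        mb_s += step
    return max_mb_s, None

def toy_model(mb_s):
    # Invented linear capacities (MB/s at 100% utilization):
    # host adapters 1,600, hard drives 3,000, processors 4,000.
    return {"host_adapter": mb_s / 1600,
            "hard_drive": mb_s / 3000,
            "processor": mb_s / 4000}

knee, component = find_knee(toy_model)
```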
Results of modeling
[Bar chart: "Disk Throughput Comparisons", MB/sec on the vertical axis (0 to 3,500), one bar per configuration, with a line marking the estimated Sony DBB throughput]

Configurations compared:
- DS4800, 4 x 4 Gb HAs, 192 disks
- DS5100, 8 x 4 Gb HAs, 384 disks
- DS5100, 8 x 8 Gb HAs, 384 disks
- DS5300, 16 x 8 Gb HAs, 384 disks
Observations
Loop optimizations:
- The DS4800 performs best with disk drawers in even multiples of 4
- The DS5000 performs best with disk drawers in even multiples of 8

Performance optimizations:
- Increasing the quantity and speed of host adapters had the biggest impact, as would be expected for a large-block-size workload
- SATA disk drives provide sufficient bandwidth if present in sufficient quantity

Other workloads:
- GPFS metadata and the TSM/VFS database should be on separate, high-performance Fibre Channel subsystems
Options for DBB
1. 2 x DS5300, each with:
   - 384 x 1 TB SATA drives
   - 16 x 8 Gb/sec host adapters
   - 16 GB cache
   Provides significant headroom, more than double the currently required performance, with performance capacity to allow remote copy capability if desired.

2. 2 x DS5100, each with:
   - 384 x 1 TB SATA drives
   - 16 x 8 Gb/sec host adapters
   - 8 GB cache
   Will handle the current requirements, but will already be at about 60% of performance capacity at the outset.
Reuse options
1. One new DS5300, and upgrade the current DS5100 to a DS5300
   - Both DS5300s to have 16 x 8 Gb/sec host adapters and 16 GB cache
   - Reuse SATA drives, adding 4 drawers per subsystem

2. One new DS5100, and upgrade the current DS5100 to have 8 x 8 Gb/sec host adapters and additional drive attachment (up to 448 drives)
   - Reuse SATA drives, adding 4 drawers per subsystem

For GPFS metadata and TSM/VFS, reuse the DS4800s and add 1 drawer of FC drives to each.
Other considerations
SAN Switch Fabric
Additional GPFS servers
TSM/VFS servers
Backup and restore requirements
DR requirements
Projected LTO Library Capacity Requirements
Projected Library Capacity Requirements
Capacity projection:

Months | Weeks/Month | Total Weeks | TB/Week (70TB + 35TB) | Total TB Stored
18 | 4.5 | 81 | 105 | 8,505

Frame layout (existing 10 feet of frames plus 10 feet of additional frames):

Frame | L53 | D53 | D53 | S54 | S54 | S54 | S54 | D53
Slots | 287 | 408 | 408 | 1,320 | 1,320 | 1,320 | 1,320 | 440

Existing slot count: 2,423 (the first four frames: L53 + D53 + D53 + S54). Required slot count: 6,701; the expanded library provides 6,823 slots.

Assumptions:
- 18 months of stored data at 105 TB of data per week.
- 8.5 PB of storage in month 18.
- The stored data is not compressible.
- The media will fill to 70% of the native capacity.
- LTO4 (560 GB consumed capacity) would require 10,600 slots.
- The current library has 2,423 slots.
- The plan includes moving to LTO5 (1.2 TB consumed capacity) upon its introduction.
- An undetermined amount of LTO4 media will remain in the library.

IBM recommendation:
Add three S54 frames and one D53 frame (bringing the library to roughly 6,800 slots against the 6,701 required) upon the introduction of LTO5, extending the library from 10 feet of existing frames to 20 feet.
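A quick check of the capacity arithmetic, using only the figures stated on the slide:

```python
# 18 months of retention at 4.5 weeks/month and 70TB + 35TB per week,
# per the slide's projection table.
months = 18
weeks_per_month = 4.5
tb_per_week = 70 + 35                       # 105 TB/week

total_weeks = months * weeks_per_month      # 81 weeks
total_tb = total_weeks * tb_per_week        # 8,505 TB, i.e. ~8.5 PB in month 18
```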