2017-04 campaign storage 30
TRANSCRIPT
Contents
• Memory class storage & Campaign storage
• Object Storage
• Campaign Storage
• Search and Policy Management
• Data Movers & Servers
• Road Ahead
4/1/17 CampaignStorageLLC 2
Campaign Storage
• Campaign Storage was invented at Los Alamos National Laboratory
• 2014-
• Peter Braam & Nathan Thompson founded Campaign Storage, LLC in March 2016
• deliver products in this space
• Software Defined Storage – we will partner with integrators
• Other companies are addressing parts of Campaign Storage also
3/16/17 CampaignStorageLLC 3
CPU or GPU packages
CPU cores
High Bandwidth Memory
NVRAM, e.g. XPOINT, PCM, STT-RAM
RAM
FLASH, DISK
Node BW (GB/sec): 1 TB/s, 100 GB/s, 20 GB/s, 5 GB/s, 350 MB/s
Cluster BW (TB/sec): 1 PB/s, 100 TB/s, 5 TB/s, 100 GB/s, 10's of GB/s
Software: Language level; Language level, NVM libs; HDF5 & DAOS; HDF5, DAOS; Parallel FS & Campaign Storage; Archive & Campaign
Key features: transparent computation; transparent computation, ultra-fast storage apps; namespace, scientific formats, FS style container; bulk data movement (many files, subtrees of MD)
BW Cost $/(GB/s): $10 (CPU included!), $10, $300, $2K, $30K
Capacity Cost $/GB: $, $8, $0.3, $0.05, $0.01
Burst Buffers – DDN IME, Cray DataWarp
TAPE
Role of containers
Fundamentally unlikely: different tiers perform data movement at similar granularity
Containers are a must-have
Tiers and NVRAM Considerations
Tiering
RAM tiers are for computation → migrate pointers, pages
Flash storage is 5x faster with large IO
Disk similarly is very IO size sensitive:
→ Retrieve & store containers (distributed?)
→ Show internal structure on faster side
→ Stream and serialize data to slower side
Internal program data formats not re-usable → computing format to namespace
Persistence
Distinguishing NVM feature is that data stays if power is off.
NVRAM will be the fastest storage device → for most demanding storage applications
NVRAM: what other benefits to computing?
Current libraries – transactions, persistent heaps (not so novel – see Camelot & RVM from 1980's)
Tiers & Transparency
RAM
- Demote infrequently used pointers
- Promote frequently used pointers
If pointers are not first class objects
- Promote upon access
- Demote finding less used ones
Low level languages – HW or OS support
Storage
Same principle – transparency requires accessing data through a handle
One handle system with location database allows other objects to move
Expect distributed tiered KV store
• Key value lookup
• Callbacks for invalidation
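The handle idea on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions: the tier names, the fixed-size `location_db` table, and the functions `handle_create`, `handle_access`, and `demote_least_used` are all hypothetical and do not come from the slides. The sketch keeps one location record per handle, promotes an object by one tier on each access, and demotes the least used object on request. A distributed KV store and invalidation callbacks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tiers, fastest first. */
enum tier { TIER_RAM, TIER_NVRAM, TIER_FLASH, TIER_DISK, TIER_COUNT };

#define MAX_HANDLES 16

/* One row of the location database: where an object lives and how often
   it has been accessed. Callers only ever hold the integer handle. */
struct location {
    enum tier tier;
    unsigned long hits;
    int in_use;
};

static struct location location_db[MAX_HANDLES];

/* Register an object on a tier; returns its handle, or -1 if the table is full. */
int handle_create(enum tier initial) {
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!location_db[h].in_use) {
            location_db[h] = (struct location){ .tier = initial, .hits = 0, .in_use = 1 };
            return h;
        }
    }
    return -1;
}

/* Access through the handle: count the access, promote the object by one
   tier ("promote upon access"), and return the tier it now lives on. */
enum tier handle_access(int h) {
    assert(h >= 0 && h < MAX_HANDLES && location_db[h].in_use);
    struct location *loc = &location_db[h];
    loc->hits++;
    if (loc->tier > TIER_RAM)
        loc->tier = (enum tier)(loc->tier - 1);
    return loc->tier;
}

/* Find the least used object that is not yet on the slowest tier and move
   it one tier down. Returns the demoted handle, or -1 if none qualifies. */
int demote_least_used(void) {
    int victim = -1;
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!location_db[h].in_use || location_db[h].tier == TIER_DISK)
            continue;
        if (victim < 0 || location_db[h].hits < location_db[victim].hits)
            victim = h;
    }
    if (victim >= 0)
        location_db[victim].tier = (enum tier)(location_db[victim].tier + 1);
    return victim;
}
```

Because callers never see a raw location, the table can change an object's tier without invalidating anything the caller holds, which is the transparency argument made above.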
Enterprise Data Services
- Groups of file servers
- Availability
- Performance
Data Center Compute Nodes
Data Center
Campaign Storage
Service Layer
- SMB, NFS, other
Data Movers
- Parallel ingest & restore
Storage layer
- ZFS & object
- Integrated search
- Analytics Support
- Data management
- Massive, low $
Enterprise Data Services
- Groups of file servers
- Availability
- Performance
Data Center Compute Nodes
Containers for global namespace
Identity and namespace management with e.g. AD or LDAP
Cloud object stores – pros & cons
pro
- massive scalability
- very good storage management
- widely agreed S3 REST API
- runs on cheapest hardware
con
- data lacks organization
- API's don't allow distributed concurrent access or random writes
- performance can be disappointing
- difficult to re-use as a component of other storage systems
Too much choice?
• Caringo Swarm (formerly CAStor)
• Cleversafe dsNet
• Cloudian
• DataDirect Networks Web Object Scaler (WOS)
• EMC Atmos
• EMC Centera
• EMC Elastic Cloud Storage (ECS)
• HP StoreAll
• HGST Himalaya
• HGST Active Archive
• Hitachi Data Systems HCP
• NetApp StorageGrid Webscale
• Quantum Lattus
• Scality Ring
• SwiftStack Swift
To mention a few... (others: S3, CEPH, SNIA T10, Seagate A200, DDN WOZ...)
What is needed / offers:
- Normal read/write IO per object
- Non overlapping IO from multiple clients
- 3 tier hierarchical redundancy (box, rack, data center)
- Transaction protocol to snapshot consistent state
Campaign Storage - a new tier
Old World: Parallel File System, Archive
New World: Parallel File System, Archive, Burst Buffer, Campaign Storage (new), Cloud
Diagram annotations: decreasing emphasis; High BW, high $$$; Decreasing capacities
Campaign Storage
It is…
A file system
Focus: staging and archiving
Built from
• Industry standard object stores
• Existing metadata stores
Lowest cost HW
High capacity, ultra scalable
Not highest BW or lowest latency
• 10-100x higher than archives
• 10x lower than PFS
It is not…
General purpose file system
• Wait… these don't exist actually
Using object stores has problems
• Limited set of data movers supported
HPC Cluster A
Simulation Cluster 20 PF
Burst Buffer 5 PB & 5 TB/s
HPCD & Viz Cluster HDFS
HPC Cluster B
HPCD Cluster 20 PF
Burst Buffer 5 PB & 5 TB/s
Lustre FS 1 TB/s
Campaign Storage
Campaign Storage Mover Nodes
Campaign Storage Metadata Repository
Campaign Storage Mover Nodes
Parallel Staging & Archiving
Campaign Storage Object Repository
Search & Data Management
Campaign Storage
File System Interface
Optional other tools:
• Policy managers (e.g. Robinhood)
• Workflow managers (e.g. iRODS)
customer infrastructure
Histograms for subtree search
Every directory has a histogram DB recording properties of its subtree:
• i.e. how many files and bytes in the subtree have a given property?
• Limited granularity, limited relational algebra
• Store perhaps ~100,000 properties in multiple histograms
Examples:
• Quota in subtree?
• What file servers contain files?
• Geospatial information in file?
• (file type, size, access time) tuples
• Allows limited relational algebra
• User database for subtree – eliminates reliance on external identity management
Not a new idea. Can be added to ZFS & Lustre
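A minimal sketch of the per-directory histogram idea, assuming a single property (file size class) and an in-memory directory tree. The names `struct dir`, `dir_add_file`, `subtree_bytes`, and the four size buckets are hypothetical; the slides do not specify a data layout. Each directory keeps file and byte counts for its whole subtree, so a question such as "how many bytes are under this project?" reads one record instead of walking the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4  /* hypothetical size classes: <4 KiB, <1 MiB, <1 GiB, larger */

/* Per-directory histogram covering the directory's entire subtree. */
struct histogram {
    uint64_t files[NBUCKETS];
    uint64_t bytes[NBUCKETS];
};

struct dir {
    struct dir *parent;        /* NULL for the root */
    struct histogram subtree;
};

/* Map a file size to its histogram bucket. */
static int size_bucket(uint64_t size) {
    if (size < (1ULL << 12)) return 0;
    if (size < (1ULL << 20)) return 1;
    if (size < (1ULL << 30)) return 2;
    return 3;
}

/* Record a new file in directory d: update d and every ancestor, so each
   directory's histogram always describes its full subtree. */
void dir_add_file(struct dir *d, uint64_t size) {
    int b = size_bucket(size);
    for (; d != NULL; d = d->parent) {
        d->subtree.files[b]++;
        d->subtree.bytes[b] += size;
    }
}

/* Answer a quota-style query from the histogram alone, without a tree walk. */
uint64_t subtree_bytes(const struct dir *d) {
    uint64_t total = 0;
    for (int b = 0; b < NBUCKETS; b++)
        total += d->subtree.bytes[b];
    return total;
}
```

The cost is paid at update time (one increment per ancestor), which fits a store where searches over subtrees are more frequent than metadata changes.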
Data Movers
Data Movement
Today
• LANL “parallel rsync” – pftool
• Lustre HSM mover
• Packing small files & striping big files
Candidates
• DMAPI HSM mover
• GridFTP
• Full POSIX interface
Metadata Movement
Today
• Traditional metadata API
• Multiple namespaces
Coming
• Bulk integration of containers
• Accompanying metadata
pftool internals
Load Balancer / Scheduler
Queues: dirs queue, stat queue, cp/S/V queue
Workers: readdir, stat, copy/sync/validate
DONE QUEUE
REPORTER
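The queue structure in this diagram can be sketched as a single-process loop. This is not pftool's code: the names `struct node` and `run_pipeline` and the in-memory tree that stands in for a file system are assumptions made for illustration. A real mover runs the stages in parallel across processes. The sketch only shows how work flows from the dirs queue to the stat queue to the copy queue, with a byte counter playing the reporter.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 32

/* Fixed-size FIFO of node indices. */
struct queue { int items[QCAP]; int head, tail; };

static void q_push(struct queue *q, int v) { assert(q->tail < QCAP); q->items[q->tail++] = v; }
static int  q_empty(const struct queue *q) { return q->head == q->tail; }
static int  q_pop(struct queue *q) { assert(!q_empty(q)); return q->items[q->head++]; }

/* Hypothetical in-memory tree standing in for a real file system. */
struct node { int parent; int is_dir; long size; int copied; };

/* Scheduler loop: drain the dirs, stat and copy queues until all are empty.
   Returns the number of bytes "copied" from the subtree under root. */
long run_pipeline(struct node *tree, int n, int root) {
    struct queue dirs = {0}, stats = {0}, copies = {0};
    long bytes_done = 0;

    assert(n <= QCAP);  /* each node enters each queue at most once */
    q_push(&dirs, root);
    while (!q_empty(&dirs) || !q_empty(&stats) || !q_empty(&copies)) {
        if (!q_empty(&dirs)) {
            /* readdir stage: list a directory, queue every entry for stat */
            int d = q_pop(&dirs);
            for (int i = 0; i < n; i++)
                if (i != d && tree[i].parent == d)
                    q_push(&stats, i);
        } else if (!q_empty(&stats)) {
            /* stat stage: directories go back to readdir, files go to copy */
            int e = q_pop(&stats);
            q_push(tree[e].is_dir ? &dirs : &copies, e);
        } else {
            /* copy stage: mark the file done and report its size */
            int f = q_pop(&copies);
            tree[f].copied = 1;
            bytes_done += tree[f].size;
        }
    }
    return bytes_done;
}
```

Keeping the three kinds of work in separate queues is what lets a scheduler balance load: directory listing, stat calls, and bulk copies have very different costs.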
Features of DS3 archival data mover
• Object store moves batches of files
• New concept: file level I/O vectorization
• Includes server driven ordering
• Packing small files into one object
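Packing small files into one object can be sketched as follows. The greedy append policy, the `object_max` limit, and the names `struct packed_entry` and `pack_small_files` are assumptions for illustration only; the slides do not describe how the mover chooses object boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Where one small file ends up: which object, and the byte range inside it. */
struct packed_entry {
    int    object_id;
    size_t offset;
    size_t length;
};

/* Greedy packing: append files to the current object and start a new object
   when the next file would push it past object_max. A file larger than
   object_max gets an object to itself. Fills out[0..n-1] and returns the
   number of objects used. */
int pack_small_files(const size_t *sizes, int n, size_t object_max,
                     struct packed_entry *out) {
    int object_id = 0;
    size_t used = 0;

    assert(n >= 0 && object_max > 0);
    for (int i = 0; i < n; i++) {
        if (used > 0 && used + sizes[i] > object_max) {
            object_id++;   /* close the current object */
            used = 0;
        }
        out[i] = (struct packed_entry){ object_id, used, sizes[i] };
        used += sizes[i];
    }
    return n > 0 ? object_id + 1 : 0;
}
```

The resulting table of (object, offset, length) entries is the metadata a file system layer would need to find a small file again inside its packed object.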
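struct copy_range {
    int    source_fd;      /* file to read from */
    int    dest_fd;        /* file to write to */
    off_t  source_offset;  /* start offset in the source */
    off_t  dest_offset;    /* start offset in the destination */
    size_t length;         /* number of bytes to copy */
};

/* Copy `count` ranges, each with its own source and destination file, in one call. */
int copy_file_range_fv(struct copy_range *r, unsigned int count, int flags);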
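To make the vectorized call concrete, here is a user-space stand-in that walks an array of ranges and copies each with POSIX `pread` and `pwrite`. It is a sketch under assumptions, not the proposed interface's implementation: the function name `copy_ranges_emulated` is hypothetical, the `flags` argument is omitted, and ranges are processed strictly in array order, whereas the slide mentions server driven ordering. Only the `struct copy_range` fields come from the slide.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Field layout as on the slide. */
struct copy_range {
    int    source_fd;
    int    dest_fd;
    off_t  source_offset;
    off_t  dest_offset;
    size_t length;
};

/* Hypothetical emulation: copy each range in array order.
   Returns count on success, or -1 on the first I/O error. A range that
   runs past the end of its source file is copied only up to end of file. */
int copy_ranges_emulated(const struct copy_range *r, unsigned int count) {
    char buf[4096];

    for (unsigned int i = 0; i < count; i++) {
        size_t left = r[i].length;
        off_t src = r[i].source_offset;
        off_t dst = r[i].dest_offset;

        while (left > 0) {
            size_t want = left < sizeof buf ? left : sizeof buf;
            ssize_t got = pread(r[i].source_fd, buf, want, src);
            if (got < 0)
                return -1;
            if (got == 0)
                break;  /* end of source file */

            /* pwrite may write fewer bytes than asked; loop until all are out */
            ssize_t off = 0;
            while (off < got) {
                ssize_t put = pwrite(r[i].dest_fd, buf + off,
                                     (size_t)(got - off), dst + off);
                if (put <= 0)
                    return -1;
                off += put;
            }
            src += got;
            dst += got;
            left -= (size_t)got;
        }
    }
    return (int)count;
}
```

Passing the whole batch in one call is what gives the storage side room to reorder or merge the transfers; this emulation gives up that freedom and only demonstrates the calling convention.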
Services
Campaign Storage always exports the MarFS file system
Enterprise services as further exported protocols:
- SMB, NFS, HTTP
- Data movement can be out of band
Integration of namespaces, user databases, other plugins
Workflows - HPC
Staging & De-staging
• Schedule migration with pftool
HSM
• Copy metadata first
• Use subtree search index
• Execute policies
• Specialized data movers
• For transparent retrieval & attributes
Single project extraction
• Use ZFS namespace and object bucket per project
Hot vs cold Campaign Locations
• Select destination object stores
• Migration on campaign storage
Multi site
• Leverage object bucket replication
• Leverage ZFS pool replication
Cloud
• Migrate pool and buckets to S3
• Use Snowball?
Workflows – Data Center
Staging & archive
• Schedule migration with pftool
Service offload to Campaign
• Data available without staging
Single project extraction
• Use ZFS namespace and object bucket per project
Hot vs cold locations
• Select destination object stores
• Migration within campaign storage
• Automatic movement when services need the data
Multi site
• Leverage object bucket replication
• Leverage ZFS pool replication
Cloud
• Migrate pool and buckets to S3
• Use Snowball?
Road Forward
Unique opportunity to innovate data management
LANL and Campaign Storage created an “Industry Steering Group”
Seek agreement on
• Data layout handling
• Attributes used in connection with long term storage
• Interfaces for workflows
Conclusions
Hardware diversification → Software Specialization
Expect a rich high speed exa-scale I/O platform to use containers
Similar containers will organize enterprise tiers of storage
Campaign Storage: bulk data store, archive & data movement