The BIG DATA Cloud| 8th dCache Workshop, DESY| Patrick Fuhrmann | 15 May 2014 | 1
Patrick Fuhrmann
On behalf of the project team
The Big-Data Cloud
Content
• About DESY
• Project Goals
• Suggested Solution and components
• Quick introduction of
  - dCache
  - ownCloud
• The proposed hybrid System
• Status and issues
dCache and Cloud
• This is how it started: Status Oct 2013
  - Auto-registration: www.dcache.org/cloud
  - You need a certificate to register
  - Set your private user/password to log in
  - Works with available WebDAV clients
  - You get your private space
  - There is no way of sharing
• Next step: public sharing
• Further: slowly implementing a "Cloud System"
  - With proper sync'n share
Why did we suddenly change our plans?
Why suddenly "Cloud"?
• Due to the well-known political affairs, DESY banned all non-local mail and storage providers.
  - For mail we had a replacement right away
  - No replacement for DropBox
• A replacement had to be available asap.
• So we had to find a "Cloud" system for DESY within months.
Project Goal
• Currently maintained storage systems are focused on "Scientific Big Data".
  - Access with POSIX semantics
  - Sharing via ACLs.
• Customers, especially new/young communities (Photon Science), are requesting "Cloud" storage semantics.
• Project Objective:
  - Installation of a modern Cloud Storage System for scientists within 6 months.
  - Integrated into the existing AAI and storage infrastructure.
  - If possible: reducing the number of existing systems.
We had to find out what "Cloud" means for our scientific customers.
• Big Data management
• Support of the scientific data lifecycle
• Web 2.0 feeling
The "Big Data" management?
• Unlimited storage space, pay per use
  - Quotas are a "no go" and pointless
• Indestructible data store, never losing data
• "Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. … For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years."
• Different Qualities of Service (payments)
  - Access Latency (How long do I have to wait)
  - Retention Policy (How safe is my data, durability)
• Extremely high availability of storage service
  - No regular maintenance breaks below "once a year, 4 days"
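The arithmetic behind the quoted Amazon S3 figure can be checked in a few lines. This is a minimal sketch, assuming that object losses are independent and that "99.999999999% durability" means each object survives a given year with that probability; the function name `years_per_lost_object` is made up for illustration.

```python
from fractions import Fraction

# Assumption: durability is the per-object probability of surviving one year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999 %


def years_per_lost_object(num_objects: int, durability: Fraction = DURABILITY) -> Fraction:
    """Average number of years between single-object losses."""
    annual_loss_probability = 1 - durability  # 1e-11 per object
    expected_losses_per_year = num_objects * annual_loss_probability
    return 1 / expected_losses_per_year


# 10,000 objects, as in the quote above.
print(years_per_lost_object(10_000))
```

Exact fractions are used instead of floats so that the eleven nines are not blurred by rounding.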
Scientific Data Lifecycle
High Speed Data Ingest
Fast Analysis NFS 4.1/pNFS
Wide Area Transfers (Globus Online, FTS) by GridFTP
Visualization & Sharing by WebDAV
The "Web 2.0" experience?
• Easy sharing with
  - Registered users and groups
  - The public (publishing)
• Synchronizing (bidirectional) with all relevant OSes
• Access from mobile devices, preferably with upload/download integrated into the OS.
• Web browser access and configuration
The DESY Cloud: What does that mean for DESY?
Big Data Part
Web 2.0
? Here we need some help
Web 2.0 Cloud interface
• For the Web 2.0 interface we needed some experts.
• Not much time for evaluation.
• Going for the most popular solution
  - Reduces the likelihood of the "product disappearing"
  - Possibly building a user community (like today)
• TU-Berlin, FZ-Jülich, TU-Dresden ****
• CERN, United Nations
  - CERN is evaluating a similar approach and we are in contact anyway (WLCG)
What exactly do we need from ownCloud
• The sync clients for all OSes
• Upload/download clients for mobile devices
• Sharing of data with individuals and groups (including public links)
• Web browser based file access and configuration
• That's it for now.
Now, what's a dCache?
dCache cheat sheet
• dCache.org is an international collaboration, composed of developers and support people from DESY, Fermilab, NDGF and the HTW Berlin.
• dCache is operated at about 70 sites around the world.
• Total space about 120 petabytes.
  - We store 50 % of the entire WLCG storage.
• Biggest dCache holds about 50 petabytes.
• Largest dCache spans 4 countries.
dCache spec for Dummies
dCache: unlimited hierarchical storage space
Media: SSDs, spinning disks, tape, Blu-ray …
Automatic and manual media transitions
Virtual file-system layer
Protocols: NFS/pNFS, gridFTP, http/WebDAV, xRootd/dCap
Starting with possibly the biggest: US-CMS Tier I
Information provided by Catalin Dumitrescu and Dmitry Litvintsev
40 PBytes on tape
14 PBytes on disk
770 write pools
420 read pools
26 stage pools
260 doors
Total physical hosts: 6 head, 280 pool/door
To certainly the most widespread
One dCache, 4 countries
Slide stolen from Mattias Wadenstein, NDGF
To very likely the smallest
One machine, one process
Pool
NFS 4.1 Door
WebDAV Door
PoolManager
gPlazma
1 TB
700 MHz ARM, 512 MB memory, 2 × USB 2, 100 Mbit/s Ethernet
dCache cheat sheet (cont.)
• Protocol support
  - NFS 4.1 / pNFS (scalable NFS)
  - WebDAV
  - GridFTP (Grid transfers)
  - xRootd
  - dCap
• User/Authz support
  - Kerberos
  - User / password
  - LDAP
  - X509 (certificates and proxies)
What do we need from dCache
• Scales out massively
• Managed space (uptime)
  - Migration between media and decommissioning of hardware without downtime.
• Multi-protocol access (scientific use)
  - NFS, CDMI (Cloud), WebDAV, gridFTP (GlobusOnline)
• Service classes with automatic and manual transitions (Access Latency, Retention Policy)
• Hot spot detection
• Tape
• Spinning Disk
• SSDs
What does the integration look like?
dCache - ownCloud Integration
SSDs
Spinning Disks
Tape, Blu-ray …
Unlimited hierarchical Storage Space
NFS 4.1 GridFTP, WebDAV
WEB 2.0 Sync & share
dCache
dCache - ownCloud "Scientific Data Lifecycle"
Unlimited hierarchical Storage Space
NFS 4.1 / pNFS HPC, HTC
GridFTP
Globus Online
dCache ownCloud What does it look like for the user
My dCache XXL Home
My ownCloud Home Sync Share
Web 2.0
NFS 4.1/pNFS GridFTP WebDAV
SRM (some private Grid Protocols)
dCap xRootD
dCache ownCloud Scalability (NFS4.1/pNFS does it)
NFS Client NFS Client NFS Client
pNFS Door
pNFS Door
pNFS Door
dCache ownCloud integration
• Simply running ownCloud on dCache was the easy bit and works nicely.
• dCache provides an NFSv4.1/pNFS interface, which lets it look like a regular file system.
• This is exactly what ownCloud needs.
• The fact that dCache doesn't allow files to be modified doesn't really bother ownCloud.
But how about ownership?
• Ownership
  - Files owned by "patrick" in ownCloud are owned by apache/owncloud in dCache
  - That prevents us from using the same data with NFS4.1, gridFTP or CDMI from dCache
  - Tigran solved that issue.
• dCache ACLs versus ownCloud sharing
  - Files shared in ownCloud should have similar ACLs in dCache.
  - Data shared in ownCloud is not automatically shared in dCache
Ownership/mapping issue
NFS WebDAV, GridFTP, CDMI
Web 2.0 Sync Share
Kerberos DESY LDAP
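One way to picture the sharing side of this issue: every ownCloud share would have to be translated into an equivalent ACL entry on the dCache side. The sketch below is purely illustrative. The `Share` and `AclEntry` types, the permission names, the group name and the `shares_to_acl` helper are hypothetical and are not part of the ownCloud or dCache APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    """Hypothetical record of an ownCloud share."""
    path: str
    owner: str
    shared_with: str  # user or group name
    is_group: bool
    can_write: bool


@dataclass(frozen=True)
class AclEntry:
    """Hypothetical, simplified ACL entry on the storage side."""
    path: str
    principal: str  # "user:<name>" or "group:<name>"
    permissions: frozenset


def shares_to_acl(shares):
    """Translate shares into ACL entries; the owner always gets full access."""
    entries = []
    seen_owners = set()
    for share in shares:
        # Emit the owner entry only once per (path, owner) pair.
        if (share.path, share.owner) not in seen_owners:
            seen_owners.add((share.path, share.owner))
            entries.append(AclEntry(share.path, f"user:{share.owner}",
                                    frozenset({"read", "write"})))
        kind = "group" if share.is_group else "user"
        perms = {"read", "write"} if share.can_write else {"read"}
        entries.append(AclEntry(share.path, f"{kind}:{share.shared_with}",
                                frozenset(perms)))
    return entries


acl = shares_to_acl([
    Share("/patrick/results.dat", "patrick", "paul", False, False),
    Share("/patrick/results.dat", "patrick", "photon-science", True, True),
])
for entry in acl:
    print(entry.path, entry.principal, sorted(entry.permissions))
```

The point of the sketch is only that the file must carry the real owner ("patrick", not apache/owncloud) before any such translation makes sense for NFS, GridFTP or CDMI access.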
More issues
Besides the permission one
Name Space Issue
Diagram: "We have" versus "We need" layouts of the user directories (Patrick, Paul, Tanja, ...)
What we need
WebDAV redirection to our nodes
WebDAV/http redirect
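The redirect idea can be sketched as follows: the client talks to one well-known WebDAV/HTTP endpoint, which answers with a redirect to the node that actually serves the data. This is a toy model under stated assumptions. The host names, the hashing rule in `choose_node` and the choice of status 307 are illustrative and do not describe how dCache selects its nodes.

```python
import hashlib
from http import HTTPStatus

# Hypothetical data nodes; a real deployment would discover these dynamically.
NODES = ["node01.example.org", "node02.example.org", "node03.example.org"]


def choose_node(path: str, nodes=NODES) -> str:
    """Pick a node deterministically from the file path (illustrative rule)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


def redirect_response(path: str):
    """Return (status, headers) that a front-end endpoint could send back."""
    location = f"https://{choose_node(path)}{path}"
    return HTTPStatus.TEMPORARY_REDIRECT, {"Location": location}


status, headers = redirect_response("/patrick/results.dat")
print(int(status), headers["Location"])
```

With a redirect, the bulk data does not have to flow through the front-end host, which is what makes the approach attractive for large files.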
What actually would be good
• Instead of requiring a mounted filesystem (POSIX) for ownCloud primary space, a network API/protocol would be better.
• Best would be a standard (e.g. Cloud Data Management Interface, CDMI).
• CDMI is provided by big vendors
• It also allows handling of metadata, users and ownership.
What's done
• We already installed two systems.
  - One connected to the DESY LDAP for DESY employees
  - One with the dCache.org private cloud
    • For HTW students (different user contract)
    • Self-registration with any valid certificate
• Most features are already available
• Ordering more hardware
  - About 200 terabytes on top of the 100 terabytes which are already deployed in two systems.
What's still missing?
• The platform adapter needs to be written
• Resource access to ownCloud defined by group membership in DESY LDAP
• Customizing the ownCloud name space to support our schema.
• HTW student (Leonie) is evaluating an ownCloud sync client working against dCache directly (under supervision of Tigran)
Testing and verification
• Defining a set of reproducible tests, which we can run on about 20 machines
  - Verify scalability
  - Guarantee for future dCache or ownCloud updates
    • Functional
    • Performance
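A reproducible functional test with a crude timing figure could look like the sketch below. It is a minimal illustration, not the project's test suite: the `round_trip_check` helper is hypothetical, and a temporary directory stands in for the mounted storage path.

```python
import hashlib
import os
import tempfile
import time


def round_trip_check(directory: str, size_bytes: int = 1024 * 1024, seed: int = 42):
    """Write a file, read it back, and compare checksums (functional test).

    Returns (ok, seconds); the elapsed time gives a crude performance figure.
    """
    # Deterministic payload so that every run and every machine uses the same data.
    block = hashlib.sha256(str(seed).encode()).digest()
    payload = (block * (size_bytes // len(block) + 1))[:size_bytes]
    expected = hashlib.sha256(payload).hexdigest()

    path = os.path.join(directory, f"roundtrip-{seed}.bin")
    start = time.perf_counter()
    with open(path, "wb") as handle:
        handle.write(payload)
    with open(path, "rb") as handle:
        actual = hashlib.sha256(handle.read()).hexdigest()
    elapsed = time.perf_counter() - start
    os.remove(path)
    return actual == expected, elapsed


# Assumption: a temporary directory stands in for the mounted storage path.
with tempfile.TemporaryDirectory() as mount_point:
    ok, seconds = round_trip_check(mount_point)
    print("round trip ok:", ok)
```

Running the same check with the same seed on each test machine, before and after an update, gives comparable functional results; the timing is only indicative.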
Further timeline
• We expect to have a pre-production system ready in about 6 - 8 weeks.
• DESY IT colleagues and HTW students will be guinea pigs
The End
further reading www.dCache.org