![Page 1: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/1.jpg)
Federated Data Stores: Volume, Velocity & Variety
Future of Big Data Management Workshop
Imperial College London, June 27-28, 2013
Andrew Hanushevsky, SLAC
http://xrootd.org
![Page 2: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/2.jpg)
Big Data Access & The 3 V’s
Volume: increasing amount of data. No single site can host all of the data.
Velocity: increasing number of analysis jobs. No single site can host all of the jobs.
Variety: increasing number of sites. Introduces many different storage systems.
![Page 3: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/3.jpg)
Data & Access & The World
Data: many places. Complete subsets, sometimes not.
Compute: many places. Data co-located, sometimes not.
Data is distributed and often replicated, largely driven by computational needs.
![Page 4: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/4.jpg)
Multiple Sites – Unified View
Reality check… Multiple sites, different administrative domains.
How to logically combine all the storage? Provide storage access across multiple sites.
Requires a minimal set of rules: an intersecting security model and a promise of minimal service.
![Page 5: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/5.jpg)
Data Storage Federations
“A collection of disparate space resources managed by co-operating but independent administrative domains transparently accessible via a common name space.”
Unifies storage access, independent of data and compute location.
![Page 6: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/6.jpg)
A Solution Using XRootD
A system for scalable cluster data access.
Not a file system, and not just for file systems (to handle variety).
Used in HEP and astrophysics.
Figure: an xrootd and cmsd pair.
![Page 7: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/7.jpg)
XRootD Synergistic Approach
Minimize latency
Minimize hardware requirements
Minimize human cost
Maximize scaling
Maximize utility
Figure labels: Velocity, Volume, Variety.
![Page 8: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/8.jpg)
Variety Via Plug-In Architecture
Protocol Driver: any n protocols
Protocol: cms, http, xroot, …
Authentication: krb5, sss, x.509, …
Authorization: entity names
Logical File System: dpm, sfs, sql, …
Storage System: HDFS, gpfs, Lustre, UFS, …
Clustering: cmsd
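The layering on this slide can be illustrated with a small sketch. XRootD itself is written in C++ and loads its plug-ins as shared libraries, so the Python below is only an analogy: the `StorageSystem` interface, both toy backends, and the `Server` class are hypothetical names invented for illustration, not part of any XRootD API.

```python
from typing import Dict, Protocol


class StorageSystem(Protocol):
    """Hypothetical plug-in interface: any object with read() qualifies."""

    def read(self, path: str) -> bytes: ...


class InMemoryStorage:
    """Toy backend that keeps file contents in a dict keyed by path."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def read(self, path: str) -> bytes:
        return self._blobs[path]


class PrefixedStorage:
    """Toy backend that stores files under a site-specific prefix."""

    def __init__(self, prefix: str, blobs: Dict[str, bytes]) -> None:
        self._prefix = prefix
        self._blobs = blobs

    def read(self, path: str) -> bytes:
        return self._blobs[self._prefix + path]


class Server:
    """Serves reads through whichever storage plug-in it was given."""

    def __init__(self, storage: StorageSystem) -> None:
        self._storage = storage

    def fetch(self, path: str) -> bytes:
        return self._storage.read(path)


# Two servers with different backends answer the same logical path.
flat = Server(InMemoryStorage({"/data/f1": b"abc"}))
prefixed = Server(PrefixedStorage("/site42", {"/site42/data/f1": b"abc"}))
print(flat.fetch("/data/f1") == prefixed.fetch("/data/f1"))
```

The server code never names a concrete backend, which is the property the slide attributes to the plug-in design: new storage systems are added without changing the layers above them.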
![Page 9: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/9.jpg)
Volume Via B64 Scaling
Figure: a tree of nodes, each an xrootd and cmsd pair: Manager (Root Node), Supervisors (Interior Nodes), Data Servers (Leaf Nodes). Other labels: Private Cluster, GCE Ephemeral Storage, SLAC.
$64^1 = 64$
$64^2 = 4096$
$64^3 = 262144$
$64^4 = 16777216$
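The node counts on this slide are powers of the fan-out. As a minimal sketch of that arithmetic, assuming every manager or supervisor is filled to its limit of 64 subordinates (the function name `max_data_servers` is made up for illustration):

```python
FANOUT = 64  # subordinates per manager or supervisor (the "64" in B64)


def max_data_servers(depth: int, fanout: int = FANOUT) -> int:
    """Upper bound on leaf data servers in a full tree of the given depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return fanout ** depth


for depth in range(1, 5):
    print(f"depth {depth}: {max_data_servers(depth)} data servers")
```

Each added level of supervisors multiplies the reachable data servers by 64, so four levels already cover more than 16 million servers.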
![Page 10: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/10.jpg)
WYSIWYG Scalable Access
Figure: Client open(), redirect, open(), redirect, open(), passing through tree levels $64^1 = 64$ and $64^2 = 4096$. Each node is an xrootd and cmsd pair.
Request routing is very different from traditional data management models.
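The open() and redirect sequence in the figure can be mimicked with a toy model. This is a sketch under stated assumptions, not the XRootD protocol: the `Node` class, its `holds` lookup (a recursive search standing in for whatever location mechanism the real cluster uses), and `client_open` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    name: str
    files: Set[str] = field(default_factory=set)          # held by data servers
    children: List["Node"] = field(default_factory=list)  # subordinates, if any

    def holds(self, path: str) -> bool:
        """True if this node or any node beneath it has the file."""
        return path in self.files or any(c.holds(path) for c in self.children)

    def open(self, path: str) -> Tuple[str, "Node"]:
        """Answer an open(): serve the file, or redirect to a subordinate."""
        if path in self.files:
            return ("ok", self)
        for child in self.children:
            if child.holds(path):
                return ("redirect", child)
        raise FileNotFoundError(path)


def client_open(root: Node, path: str) -> List[str]:
    """Follow redirects from the root; return the names of nodes contacted."""
    contacted = []
    node = root
    while True:
        contacted.append(node.name)
        status, target = node.open(path)
        if status == "ok":
            return contacted
        node = target  # re-issue open() at the node we were redirected to


server_a = Node("server-a", files={"/atlas/data/f1.root"})
server_b = Node("server-b", files={"/atlas/data/f2.root"})
supervisor = Node("supervisor", children=[server_a, server_b])
manager = Node("manager", children=[supervisor])
print(client_open(manager, "/atlas/data/f2.root"))
```

The client only ever talks to one node at a time and ends up reading directly from the data server, so the manager and supervisors route requests without carrying the data themselves.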
![Page 11: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/11.jpg)
Real World Example (HEP)
Federated ATLAS XRootD (FAX)
Independent sites federated by region
Figure: global at the top; below it regional1 (endpoint1, endpoint2) and regional2 (endpoint3). Labels a, b, c, with c = max(a, b).
(Graphic courtesy of Rob Gardner)
![Page 12: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/12.jpg)
ATLAS FAX Infrastructure (From Rob Gardner)
Provides a global namespace
Unifies dCache, DPM, Lustre/GPFS, and Xrootd storage backends
Xrootd is an efficient protocol for WAN access
Main fall-back use case is in production at many sites
Regional redirection network provides lookup scalability
A powerful capability which must be introduced to production carefully
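The fall-back use case can be sketched as "try the local copy first, then the federation". The slide does not spell out the mechanism, so this reading is an assumption, and every name below (`Opener`, `open_with_fallback`, the two fake sources) is hypothetical rather than part of FAX or XRootD.

```python
from typing import Callable, Sequence

# Hypothetical: an opener takes a logical file name and returns the contents.
Opener = Callable[[str], bytes]


def open_with_fallback(lfn: str, openers: Sequence[Opener]) -> bytes:
    """Try each source in order and return the first successful read."""
    errors = []
    for opener in openers:
        try:
            return opener(lfn)
        except OSError as exc:  # FileNotFoundError is a subclass of OSError
            errors.append(exc)
    raise FileNotFoundError(f"{lfn}: all {len(errors)} sources failed")


def local_storage(lfn: str) -> bytes:
    """Fake local site storage that is missing the requested file."""
    raise FileNotFoundError(lfn)


def federation(lfn: str) -> bytes:
    """Fake federation that can always locate the file somewhere."""
    return b"payload for " + lfn.encode()


print(open_with_fallback("/atlas/data/f1.root", [local_storage, federation]))
```

Because the job uses the same logical name for both attempts, the global namespace is what makes this kind of fall-back possible without per-site path handling.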
![Page 13: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/13.jpg)
HEP Deployment
LHC ALICE: data catalog driven federation
LHC ATLAS: regional topology
LHC CMS: uniform topology
LSST (Large Synoptic Survey Telescope): clusters MySQL servers for parallel queries
![Page 14: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/14.jpg)
Conclusion
Federated storage is key for big data: distributed management + uniform access, preserves administrative autonomy, inherently scalable.
The whole is greater than the sum of its parts.
XRootD provides flexible federation: it addresses volume, velocity, and variety, the three main big data challenges.
![Page 15: Federated Data Stores Volume, Velocity Variety Future of Big Data Management Workshop Imperial College London June 27-28, 2013 Andrew Hanushevsky, SLAC](https://reader033.vdocuments.us/reader033/viewer/2022052710/5a4d1b827f8b9ab0599bb9b6/html5/thumbnails/15.jpg)
Acknowledgements
Current Software Contributors
ATLAS: Doug Benjamin, Patrick McGuigan
CERN: Lukasz Janyst, Andreas Peters, Justin Salmon
Fermi: Tony Johnson
JINR: Danila Oleynik, Artem Petrosyan
Root: Gerri Ganis, Bertrand Bellenet, Fons Rademakers
SLAC: Andrew Hanushevsky, Wilko Kroeger, Daniel Wang, Wei Yang
UCSD: Matevz Tadel
UNL: Brian Bockelman
WLCG: Fabrizio Furano, David Smith
US Department of Energy Contract DE-AC02-76SF00515 with Stanford University