Axel Naumann
Outline
Motivation
Basic ingredients of I/O
X-Ray of a TTree
Analysis Environments
Optimizing a TTree
TSelector & PROOF
2009-11-30 Axel Naumann • ROOT I/O @ DESY Computing Seminar 2
Motivation
First data @ LHC!
Reports of mis-designed TTrees
Reports of mis-designed data transfer
Bored cores
Misleading recommendations, rumors, misunderstanding
Let’s explain how I/O and TTrees work!
Reflection, C++ Objects
Storing C++ Objects
Need to know:
  Type
  Members
  Location in memory
  How to create an object when reading
Provided by dictionary (rootcint / genreflex)
TNamed n("name","title");
file->WriteTObject(&n);
I/O's CPU Time
Serialization and zipping take time
C++ Objects versus Disk
Disk stores a series of bytes
C++ objects are structured:
  Data members
  Base classes
  Pointers
ROOT I/O converts between the two
  Streaming or serialization
Zipping: CPU vs. Real Time
Example for reading:
  zipped file
    9.4 MB/s disk I/O
    CPU unzips
    corresponds to 34 MB/s data
  unzipped file
    25 MB/s disk I/O
Zipping can increase bandwidth!
Especially for concurrent disk access!
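The arithmetic behind the zipped-file example can be sketched as follows. This is only an illustration, assuming the CPU unzips fast enough to keep up with the disk; the function names are made up for this sketch and are not part of ROOT.

```cpp
// Compression ratio implied by a measured disk rate and the rate of
// uncompressed data that the analysis actually receives.
double implied_ratio(double disk_rate_mb_s, double data_rate_mb_s) {
    return data_rate_mb_s / disk_rate_mb_s;
}

// Effective data rate when every byte read from disk expands to
// `compression_ratio` bytes of data (assumes unzipping is not the bottleneck).
double effective_rate_mb_s(double disk_rate_mb_s, double compression_ratio) {
    return disk_rate_mb_s * compression_ratio;
}
```

With the numbers from the slide, `implied_ratio(9.4, 34.0)` is about 3.6, so the zipped file delivers more data per second (34 MB/s) than the unzipped file (25 MB/s) while asking less of the disk.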
There is more than branches…
Views Of A TTree: C++ Access
Branch / leaf structure
Splitting: generate branches recursively according to C++ class layout; create sub-branches for
  Data members, e.g. MyClass fMember
  Members of base classes, e.g. A: public B
  Containers, e.g. vector<C> fC: split elements! (C's members, not vector<C>'s members)
Split level: where to stop splitting and put entire object into one branch instead
Splitting Across Collections
Split vector<C>
  class C {int fM, fN;};
  class D0{vector<C> fC;};
  D.fC.fM, D.fC.fN
Or even vector<C*>
  class D1{vector<C*> fC;};
  D.fC.fM, D.fC.fN
Or even polymorphic, with split level >100
  class C1: public C {int fC1;};
  class C2: public C {float fC2;};
  class D2{vector<C*> fC;};
  D.fC.C.fM, D.fC.C.fN
  D.fC.C1.fC1
  D.fC.C2.fC2
Obvious Data Considerations
Don't store empty or useless data!
  Can use //! to not store members
Combine branches
  Better store the jet algo name with the jets than one jet branch per algo
  Consider vector<T*> with split level > 100
Branch granularity is read selectivity
  Always reading x,y,z,E? Don't split them!
  Split xyzE saves a bit of disk space, though
Data Layout Considerations
Allocating objects takes time
  TClonesArray faster than vector<T>
  vector<T> faster than vector<T*>
Building objects takes time
  Flat inheritance hierarchy
  Reduce object containment: class A has member of class B, which has member of class C,…
  STL platform dependent; need extra layer
Data References
References are easy to get wrong
  NO map (dead slow) or uuid etc (slow + big)
  Better use indices
Optimal: TRef / TRefArray
  Good reason to inherit from TObject!
  Extremely fast object dereferencing, embedded in ROOT I/O
  Support for merged TTrees
  Support for autoloading of branches
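The "better use indices" advice can be illustrated with plain C++. This is a minimal sketch, not the TRef mechanism: `Track`, `Jet` and `sum_track_pt` are hypothetical names, and the jet is assumed to refer to tracks stored in a separate collection of the same event.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical event content: tracks live in one collection,
// jets refer to them by position instead of by map key or uuid.
struct Track { double pt; };
struct Jet { std::vector<std::size_t> track_indices; };

// Resolve the references by plain indexing; out-of-range indices are skipped.
double sum_track_pt(const Jet& jet, const std::vector<Track>& tracks) {
    double sum = 0.0;
    for (std::size_t i : jet.track_indices) {
        if (i < tracks.size()) sum += tracks[i].pt;
    }
    return sum;
}
```

An index is small to store and resolves in constant time, which is the contrast the slide draws with map lookups and uuid-style identifiers.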
Non-Split Collections
Non-split storing of C:
1. object-wise
  Faster object retrieval
2. member-wise
  Faster member retrieval, better compression
class C {int fM, fN;};
class D0{vector<C> fC;};
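The difference between the two orderings can be sketched in plain C++. This only illustrates the order of values in the output buffer, assuming the two-int class `C` from the slide; it is not ROOT's streamer, and the function names are made up.

```cpp
#include <vector>

struct C { int fM; int fN; };

// Object-wise: all members of one element, then the next element.
// Layout: fM0 fN0 fM1 fN1 ...
std::vector<int> stream_object_wise(const std::vector<C>& v) {
    std::vector<int> out;
    out.reserve(2 * v.size());
    for (const C& c : v) {
        out.push_back(c.fM);
        out.push_back(c.fN);
    }
    return out;
}

// Member-wise: one member for all elements, then the next member.
// Layout: fM0 fM1 ... fN0 fN1 ...
std::vector<int> stream_member_wise(const std::vector<C>& v) {
    std::vector<int> out;
    out.reserve(2 * v.size());
    for (const C& c : v) out.push_back(c.fM);
    for (const C& c : v) out.push_back(c.fN);
    return out;
}
```

Member-wise ordering puts similar values next to each other, which is a plausible reason for the better compression mentioned above.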
TTree's Data Layout
TTree::Fill() adds to baskets
Baskets: most important internal concept of TTrees!
class C {int fM; long fN;};
[Diagram: the fM and fN values of the objects are collected in the baskets of branches C.fM and C.fN; the TTree header holds an offset for each basket.]
TTree's Data Layout: Baskets
Baskets concatenate collection elements (e.g. std::vector<C>) across TTree entries
When basket full: zip, write to TFile, store file offset in TTree header
Important parameters:
  basket size
  sizeof(element)
  sizeof(element)*collection entries per tree entry
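The fill-and-flush cycle can be sketched with a toy branch. This is a simplified stand-in and not ROOT's implementation: `ToyBranch` is a made-up type, the capacity is counted in values rather than bytes, zipping is left out, and a vector of baskets stands in for the TFile.

```cpp
#include <cstddef>
#include <vector>

// Toy model of one branch: values from successive tree entries are
// concatenated into the current basket, which is written out when full.
struct ToyBranch {
    std::size_t basket_capacity;          // values per basket ("basket size")
    std::vector<int> current;             // basket currently being filled
    std::vector<std::vector<int>> file;   // stands in for baskets in the TFile
    std::vector<std::size_t> offsets;     // "header": position of each written basket

    explicit ToyBranch(std::size_t capacity) : basket_capacity(capacity) {}

    // Append the collection elements of one tree entry, then flush the
    // basket if it has reached its capacity.
    void fill(const std::vector<int>& elements) {
        current.insert(current.end(), elements.begin(), elements.end());
        if (current.size() >= basket_capacity) flush();
    }

    // Write the current basket to the "file" and record where it went.
    void flush() {
        if (current.empty()) return;
        offsets.push_back(file.size());
        file.push_back(current);
        current.clear();
    }
};
```

With a capacity of 4 values and entries holding 2, 1 and 2 elements, the first basket is written only after the third fill, so one basket ends up shared by three tree entries.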
1 versus 1M Branches
Each branch has management overhead (baskets,…): 1 branch ideal!
Each branch can be accessed independently, without reading anything else: 1M branches ideal!
Reasonable number of branches: tens to few hundreds
Spin, little disk, spin!
Reading TTrees
Task: read (subset of) all branches for a TTree entry number
  Get file offsets for requested branches
  Read necessary baskets from file
  Unzip baskets and fill objects
Plus: schema evolution, endianness,…
Reading TTrees
Reading baskets: what can happen?
Only part of basket is needed
Need to skip baskets of other branches
Huge basket size, tiny contained values: basket might be written at end of file
Read Access Pattern
Traditional TTree has many small reads
I/O Performance Analysis
Monitor TTree reads with TTreePerfStats
New in v5.25/04!
TFile *f = TFile::Open("xyz.root");
TTree *T = (TTree*)f->Get("MyTree");
TTreePerfStats *ps = new TTreePerfStats("ioperf",T);
Long64_t n = T->GetEntries();
for (Long64_t i = 0; i < n; ++i) {
  T->GetEntry(i);
  DoSomething();
}
ps->SaveAs("perfstat.root");
Study TTreePerfStats
Visualizes read access:
  x: tree entry
  y: file offset
  y: real time
TFile f("perfstat.root");
ioperf->Draw();
ioperf->Print();
Reading Baskets
Problem: many seeks
  Reduces throughput from O(100) MB/s to O(1) MB/s (real time)
  Disk cannot support >1 user
Latency for each request
  typical network, typical file: 120ms round trip, 1M reads
  one day waiting time!
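The latency estimate is plain multiplication and can be checked with a few lines. This sketch assumes the requests are issued strictly one after the other with nothing overlapping; the function names are illustrative.

```cpp
// Total time spent waiting when each read pays one full network round trip
// and reads are issued sequentially.
double total_wait_seconds(double n_reads, double round_trip_ms) {
    return n_reads * round_trip_ms / 1000.0;
}

double seconds_to_hours(double seconds) {
    return seconds / 3600.0;
}
```

For 1M reads at 120 ms each this gives 120000 s, roughly 33 hours, which is the order of magnitude quoted on the slide.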
Legi, Vidi, Vici!
Fewer Requests, Part 1
Less sensitive to latency
  better ask once for 1M baskets than 1M times for 1 basket
NOT A SOLUTION: only one branch
  no granularity
  no parallelization
  charging network with irrelevant data
Fewer Requests, Part 2
Less sensitive to latency
  better ask once for 1M baskets than 1M times for 1 basket
NOT A SOLUTION: only one branch
Better: sending a collection of requests
  Storage (kernel / disk / disk server) can sort requests
TTreeCache
Sends a collection of read requests before analysis needs the baskets
Must predict baskets:
  learns from previous entries
  takes TEntryList into account
Enabled per TTree
Improved in v5.25/04!
f = new TFile ("xyz.root");
T = (TTree*)f->Get("Events");
T->SetCacheSize(30000000);
T->AddBranchToCache("*");
TTreeCache Usage
Without: analysis after transfer + latency:
  [Diagram: CPU and I/O timelines]
With TTreeCache, transfer and analysis of prior TTree entry in parallel:
  [Diagram: CPU and I/O timelines]
TTreeCache vs. Seeks
TTreeCache sends collected read request (readv)
Merges only adjacent baskets, reducing number of requests by almost nothing
Disks hate seeking, love sequential reading
Much cheaper to read 2MB than to read 1k at the beginning and 1k at the end
Read Padding
Merges all read requests within a given distance by also requesting bytes in between
Typical window: 2MB
Dramatically reduces load on storage device, even local disk
Dramatically increases throughput
Must-use for concurrent storage access and / or network
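The merging step of read padding can be sketched as follows. This is a minimal illustration, not ROOT's code: `ReadRequest` and `pad_reads` are made-up names, and the "given distance" is assumed here to mean that two requests are merged when the gap between them is at most `window` bytes.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One read request: `length` bytes starting at file position `offset`.
struct ReadRequest {
    std::uint64_t offset;
    std::uint64_t length;
};

// Sort the requests by offset and merge neighbours whose gap is at most
// `window` bytes. A merged request also covers the bytes in between.
std::vector<ReadRequest> pad_reads(std::vector<ReadRequest> reqs, std::uint64_t window) {
    std::sort(reqs.begin(), reqs.end(),
              [](const ReadRequest& a, const ReadRequest& b) { return a.offset < b.offset; });
    std::vector<ReadRequest> merged;
    for (const ReadRequest& r : reqs) {
        if (!merged.empty()) {
            ReadRequest& last = merged.back();
            const std::uint64_t last_end = last.offset + last.length;
            if (r.offset <= last_end + window) {
                // Close enough: extend the previous request over the gap.
                const std::uint64_t end = std::max(last_end, r.offset + r.length);
                last.length = end - last.offset;
                continue;
            }
        }
        merged.push_back(r);
    }
    return merged;
}
```

Three 1000-byte requests at offsets 0, 5000 and 10000000 with a 2000000-byte window become two requests: one of 6000 bytes at offset 0 and the untouched one at 10000000. The price is 4000 extra bytes read in exchange for one fewer seek.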
Half Way
Much more ordered reads
Still lots of jumps because baskets spread across file
Problem: Basket Size
Ideally, reading TTree entry is one seek
  All TTree entries' baskets consecutive
In reality, most baskets not full after filling a TTree entry
  Baskets shared by several TTree entries
  Need to seek to read all baskets
OptimizeBaskets, AutoFlush
Solution, enabled by default:
  Tweak basket size!
  Flush baskets at regular intervals!
New in v5.25/04!
TTree Optimizations: Results
Studying ATLAS and CMS AOD files
Results for ATLAS: factor 6 improvement!
  That can be 1 hour instead of 6!
Concurrent data access now possible

| | TTreeCache off | 30MB TTreeCache |
|---|---|---|
| Original File | 658s real time, 166s CPU time | 183s real time, 126s CPU time |
| Optimized File | 117s real time, 102s CPU time | 109s real time, 99s CPU time |
We know how to process your data!
TSelector
Everybody uses TTrees
Obvious to create a common analysis framework
Derive from TSelector
Separates analysis into steps
  Init() – "this is your tree!"
  SlaveBegin() – "create your histogram!"
  Process() – "analyze the event!"
  Terminate() – "we're done, fit!"
Parallel Analysis
Analyze several TTree entries in parallel, e.g. in a batch
Typical steps:
  1. send code
  2. send split data
  3. analyze
  4. merge results
Use the same TSelector also for parallel analysis!
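The split, analyze and merge steps can be sketched in plain C++. This is a sequential toy under stated assumptions, not PROOF: the "tree" is a vector of ints, the "analysis" fills a histogram with one bin per integer value, and all function names are made up.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Step 2: split [0, n_entries) into n_workers contiguous [first, last) ranges
// of near-equal size.
std::vector<std::pair<std::size_t, std::size_t>> split_entries(std::size_t n_entries,
                                                               std::size_t n_workers) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (n_workers == 0) return ranges;
    const std::size_t base = n_entries / n_workers;
    const std::size_t extra = n_entries % n_workers;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < n_workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}

// Step 3: each worker fills a partial histogram from its own entry range.
std::vector<long> analyze_range(const std::vector<int>& data, std::size_t first,
                                std::size_t last, std::size_t n_bins) {
    std::vector<long> hist(n_bins, 0);
    for (std::size_t i = first; i < last && i < data.size(); ++i) {
        const int v = data[i];
        if (v >= 0 && static_cast<std::size_t>(v) < n_bins) ++hist[static_cast<std::size_t>(v)];
    }
    return hist;
}

// Step 4: merging histograms is bin-wise addition.
void merge_histograms(std::vector<long>& total, const std::vector<long>& part) {
    for (std::size_t b = 0; b < total.size() && b < part.size(); ++b) total[b] += part[b];
}
```

Because merging is a simple bin-wise sum, the merged result equals what a single pass over all entries would produce, which is why the same analysis code can serve both the sequential and the parallel case.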
PROOF
Axel Naumann • ROOT @ NTNU Tech Screening 40
[Diagram: the Client sends commands and scripts to the MASTER of the PROOF farm, which has Workers and Storage, and receives the list of output objects (histograms, …).]
2009-11-09
Creating a PROOF Session
In ROOT type:
TProof *p = TProof::Open("uberpc")
Connects ROOT to the master machine on the PROOF cluster (here: "uberpc")
TSelectors will be run in PROOF
PROOF Lite
[Diagram: the Client on a multi-core desktop/laptop sends commands and scripts and receives the list of output objects (histograms, …).]
Creating a PROOF Lite Session
In ROOT type:
TProof *p = TProof::Open("lite")
TSelectors will be run on all cores in parallel
Converts your multi-core computer into a PROOF cluster!
PROOF Analysis
Example of local TChain analysis
// Create a chain of trees
root[0] TChain *c = new TChain("myTree");
root[1] c->Add("http://www.any.where/file1.root");
root[2] c->Add("http://www.any.where/file2.root");
// MySelector is a TSelector
root[3] c->Process("MySelector.C+");
PROOF Analysis
Same example with PROOF
// Create a chain of trees
root[0] TChain *c = new TChain("myTree");
root[1] c->Add("http://www.any.where/file1.root");
root[2] c->Add("http://www.any.where/file2.root");
// Start PROOF and tell the chain to use it
root[3] TProof::Open("lite");
root[4] c->SetProof();
// Process goes via PROOF
root[5] c->Process("MySelector.C+");
PROOF Is Interactive
See results while they accumulate
  Calculation wrong?
  Forgot to fill histogram?
  Restart now instead of in 8 hours
PROOF Is Quick
Optimized for quick results, not batch system occupancy
Send TTree entries to workers while running, based on their past performance
  Reduces "tail"
Allowed ALICE to see first collisions after two minutes instead of hours!
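Handing out work based on past performance can be sketched as below. This is a toy rule and not PROOF's packetizer: the function name, the target duration and the minimum packet size are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Size of the next packet of entries for a worker, chosen so that the
// worker needs about `target_seconds` given its measured processing rate.
// Slow workers get small packets, so no one is left holding a large chunk
// at the end of the job.
std::size_t next_packet_size(double measured_entries_per_s, double target_seconds,
                             std::size_t remaining, std::size_t min_size) {
    const double wanted = measured_entries_per_s * target_seconds;
    const std::size_t size = wanted > static_cast<double>(min_size)
                                 ? static_cast<std::size_t>(wanted)
                                 : min_size;
    return std::min(size, remaining);
}
```

A worker measured at 1000 entries/s with a 2 s target gets 2000 entries; one at 10 entries/s falls back to the minimum packet size. Work is requested again after each packet, so fast workers keep taking entries until none remain.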
PROOF Availability
PROOF Lite is "just there" >= 5.24
Set up a local PROOF cluster, e.g. allow a batch cluster to also be used by PROOF!
People who use it and the grid or traditional job-based batches love it
But you already have it: PROOF@NAF!
We're still not done!
Upcoming Challenges
Decrease CPU time of I/O
  Parallel unzipping (CPU time / core)
  Building objects in a smarter way
Shorten merge time!
  Merge in parallel to analysis
  Easy for histograms etc, tricky for TTrees…
[Diagram labels: I/O, ZIP, CPU analysis]
What To Take To Your Office
Many optimizations enabled by default, except for these:
f = new TFile ("xyz.root");
T = (TTree*)f->Get("Events");
T->SetBranchStatus("*", 0);
T->SetBranchStatus("MyBranch*", 1);
T->SetCacheSize(30000000);
T->AddBranchToCache("MyBranch*");
Summary
I/O costs real time, CPU time
Performance monitoring and optimizations part of ROOT
Default optimizers show huge benefit for network transfer and even local files!
Build a good tree, see how it behaves
Analyze with PROOF for quick results!