
Post on 18-Jan-2016


A Thread-Parallel Geant4 with Shared Geometry

Gene Cooperman and Xin Dong

College of Computer and Information Science

Northeastern University

360 Huntington Avenue

Boston, MA 02115

USA

{gene, xindong}@ccs.neu.edu

Joint work with the Geant4 team

John Apostolakis

…Supported by the Openlab program

Sverre Jarp

Outline

• Concept

• Methodology

• Implementation

Memory layout for multiple threads

• TLS: thread local storage

• At compile time, for any static data declared using __thread, the compiler will reserve space in the TLS of each new thread that is created.

TLS syntax and effect

• static type variable -> static __thread type variable

• (global) type variable -> (global) __thread type variable

• extern type variable -> extern __thread type variable

Each thread initializes and holds its own data
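The effect of the __thread qualifier can be illustrated with a small sketch. The names eventCount, worker and run_tls_demo are hypothetical and do not come from Geant4; the sketch assumes a GCC- or Clang-compatible compiler, since __thread is a compiler extension.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>

// Hypothetical counter. With __thread, every thread gets its own copy,
// zero-initialized when the thread starts.
static __thread int eventCount = 0;

// Each worker increments only its own copy and reports the final value.
static void *worker(void *result_ptr)
{
    for (int i = 0; i < 1000; ++i) {
        ++eventCount;   // no lock needed: this copy belongs to the calling thread
    }
    *static_cast<int *>(result_ptr) = eventCount;
    return NULL;
}

// Runs two workers and returns the sum of their private counters.
int run_tls_demo()
{
    pthread_t t1, t2;
    int r1 = 0, r2 = 0;
    pthread_create(&t1, NULL, worker, &r1);
    pthread_create(&t2, NULL, worker, &r2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    // The calling thread's copy was never touched by the workers.
    assert(eventCount == 0);
    std::printf("worker 1: %d, worker 2: %d, caller: %d\n", r1, r2, eventCount);
    return r1 + r2;
}
```

Calling run_tls_demo() from main and linking with -pthread is enough to try it. Without __thread the two workers would race on a single counter.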

First implementation: data replicated for each thread

• Image size is huge because of multiple copies of data

Outline

• Concept

• Methodology

• Implementation

Multi-threaded Geant4: current implementation

• Data that is not changed by ProcessOneEvent should be shared.

Three questions for the shared data model

1. Which data can be safely shared?
– Data initialized dynamically.
– Geant4 source code does not explicitly declare shared data.

2. How do we share the data?
– Each instance may contain read-only data members (sharable) and read-write data members.
– For read-write data members (unshared), C++ does not allow __thread if the data member is non-static.

3. What is the correct way to initialize the worker thread?
– Shared data is allocated and initialized by main (master) thread.
– Workers make thread-private copies of read-write data members.
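The restriction named in question 2 can be seen in a short sketch. The class Volume and its members are hypothetical stand-ins, not Geant4 code, and a GCC- or Clang-compatible compiler is assumed. A static __thread member compiles, but it gives one value per thread for all instances, which is why a per-instance, per-thread value needs extra machinery.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for a geometry class; not actual Geant4 code.
class Volume
{
public:
    explicit Volume(double halfLength) : fHalfLength(halfLength) {}

    double fHalfLength;           // read-only after construction: safe to share
    // __thread int fCopyNo;      // rejected: a non-static member cannot be __thread
    static __thread int fCopyNo;  // allowed, but ONE slot per thread for ALL instances
};

__thread int Volume::fCopyNo = 0;

// A fresh thread sees its own zero-initialized copy, whatever the caller wrote.
static void *worker(void *out)
{
    *static_cast<int *>(out) = Volume::fCopyNo;
    return NULL;
}

// Returns the value of fCopyNo observed by a newly created worker thread.
int run_member_demo()
{
    Volume a(1.0), b(2.0);
    a.fCopyNo = 3;            // writes the calling thread's single static slot
    assert(b.fCopyNo == 3);   // b sees the same slot: this is not per-instance data

    int seenByWorker = -1;
    pthread_t t;
    pthread_create(&t, NULL, worker, &seenByWorker);
    pthread_join(t, NULL);
    return seenByWorker;      // the worker's private copy, still 0
}
```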

1. Which data can be safely shared?

• Expanding ProcessOneEvent by hand down to each variable access is not feasible.
– Complicated inheritance relationship
– Virtual methods

• Use valgrind to check memory accesses dynamically at runtime.
– valgrind --tool=helgrind a.out for checking data races
– If two threads pass through and change the same variable without adequate locking, this tool issues an error message.
– In the case of fullCMS, it is not practical to check how much data is changed by ProcessOneEvent.
– Use unit tests for each module of Geant4, especially for geometry and navigation.

2. How do we share the data?

An example – class G4PVReplica

[Figure: physical volumes numbered 0, 1, 2, 3, 4, 5 . . .; a G4PVReplica instance with copyno: 0, 1, 2, 3, 4, 5…; thread worker 1 (labelled 1) and thread worker 2 (labelled 4).]

Multi-threaded Geant4: first implem.

No shared instances; each instance has a unique copyno

Multi-threaded Geant4: current implem.

Shared G4PVReplica instances; each thread sees private copyno

3. What is the correct way to initialize the worker thread?

• For the main (master) thread, initialize data in the standard way.

• The worker thread begins initialization only after the main thread has finished its own initialization.

• For the worker thread:
– The run manager skips some initialization routines. For example, it skips the construct method of the detector construction class.
– The worker thread initializes thread-private data only (for example, copyno in the case of G4PVReplica).
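The ordering described above can be sketched as follows. This is a minimal illustration, not the Geant4 run manager: sharedHalfLengths, privateCopyNo, worker and run_init_demo are hypothetical names, and the shared geometry is reduced to a vector of numbers. The point is that workers are created only after the master has finished building shared data, and each worker then sets up its private data and nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical shared, read-only data built once by the master thread.
static std::vector<double> sharedHalfLengths;

// Hypothetical thread-private state: one slot per shared volume.
static __thread int *privateCopyNo = NULL;

static void *worker(void *sum_ptr)
{
    // Worker initialization: no geometry construction, private data only.
    const std::size_t n = sharedHalfLengths.size();
    privateCopyNo = new int[n];

    // Event-loop stand-in: read shared data, write private data.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        privateCopyNo[i] = static_cast<int>(i);
        sum += sharedHalfLengths[i];
    }
    *static_cast<double *>(sum_ptr) = sum;

    delete[] privateCopyNo;
    privateCopyNo = NULL;
    return NULL;
}

// Returns the sum of both workers' results.
double run_init_demo()
{
    // Master initialization happens first, in the standard way.
    sharedHalfLengths.clear();
    sharedHalfLengths.push_back(1.0);
    sharedHalfLengths.push_back(2.0);
    sharedHalfLengths.push_back(3.0);

    // Workers start only now, so they never observe partially built data.
    pthread_t t1, t2;
    double s1 = 0.0, s2 = 0.0;
    pthread_create(&t1, NULL, worker, &s1);
    pthread_create(&t2, NULL, worker, &s2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return s1 + s2;
}
```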

Outline

• Concept

• Methodology

• Implementation

TestG4Navigation1.cc with multiple threads

G4VPhysicalVolume *myTopNode;
int sleepTime = 10;

void *my_worker_thread1(void *waitTime_ptr)
{
  // wait until the first thread finishes
  sleep(*(int *)waitTime_ptr);
  testG4Navigator1(myTopNode);
  testG4Navigator2(myTopNode);
  // keep the thread alive for sleepTime seconds, so valgrind can analyze it
  sleep(sleepTime);
  return NULL;
}

TestG4Navigation1.cc (continued)

int main()
{
  myTopNode = BuildGeometry();  // Build the geometry
  G4GeometryManager::GetInstance()->CloseGeometry(false);

  // declarations of tid1, tid2, waitTime1 and waitTime2 are not shown on the slide
  pthread_create(&tid1, NULL, my_worker_thread1, &waitTime1);
  pthread_create(&tid2, NULL, my_worker_thread1, &waitTime2);
  pthread_join(tid1, NULL);
  pthread_join(tid2, NULL);
  return 0;
}

Start and analyze output

• Start
– valgrind --tool=helgrind --log-file=testG4Navigator1output testG4Navigator1

• Analyze output example 1
– ==538== Possible data race during write of size 4 at 0x56360A0
– ==538== at 0x42B944: G4PVReplica::SetCopyNo(int) (G4PVReplica.cc:180)
– ==538== by 0x4191E7: G4ParameterisedNavigation::LevelLocate(G4NavigationHistory&, G4VPhysicalVolume const*, int, CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const*, bool, CLHEP::Hep3Vector&) (G4ParameterisedNavigation.cc:636)
– ==538== Old state: owned exclusively by thread #2
– ==538== New state: shared-modified by threads #2, #3
– ==538== Reason: this thread, #3, holds no locks at all

Start and analyze output (continued)

• Analyze output example 2
– ==538== Possible data race during write of size 8 at 0x5635F68
– ==538== at 0x415218: G4LogicalVolume::SetSolid(G4VSolid*) (G4LogicalVolume.icc:217)
– ==538== by 0x419201: G4ParameterisedNavigation::LevelLocate(G4NavigationHistory&, G4VPhysicalVolume const*, int, CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const*, bool, CLHEP::Hep3Vector&) (G4ParameterisedNavigation.cc:641)
– ==538== Old state: shared-readonly by threads #2, #3
– ==538== New state: shared-modified by threads #2, #3
– ==538== Reason: this thread, #3, holds no consistent locks
– ==538== Location 0x5635F68 has never been protected by any lock

Start and analyze output (continued)

• Analyze output example 3
– ==538== Possible data race during write of size 8 at 0x5634E18
– ==538== at 0x40B1FD: G4Box::SetXHalfLength(double) (G4Box.cc:118)
– ==538== by 0x407E6D: G4LinScale::ComputeDimensions(G4Box&, int, G4VPhysicalVolume const*) const (testG4Navigator1.cc:67)
– ==538== Old state: owned exclusively by thread #2
– ==538== New state: shared-modified by threads #2, #3
– ==538== Reason: this thread, #3, holds no locks at all

Shared instances in geometry

Only these three geometry classes are currently shared

• Physical volumes
– G4VPhysicalVolume
  • Thread private data members: G4RotationMatrix *frot; G4ThreeVector ftrans;
– G4PVReplica
  • Thread private data members: G4int fcopyNo;

• Logical volumes
– Thread private data members: G4Material* fMaterial; G4VSolid* fSolid; G4MaterialCutsCouple* fCutsCouple; G4VSensitiveDetector* fSensitiveDetector; G4Region* fRegion;

• Solids
– We may need more copies for each solid used by G4Parameterised.

Share logical volumes: step 1

ADD A NEW CLASS

ADDED:

class G4LogicalVolumePrivateData
{
  public:
    G4Material* fMaterial;
    G4VSolid* fSolid;
    G4MaterialCutsCouple* fCutsCouple;
    G4VSensitiveDetector* fSensitiveDetector;
    G4Region* fRegion;
};

class G4LogicalVolume{…}

In class G4LogicalVolume, delete all thread private data members.

Share logical volumes: step 2

CREATE NEW CLASS

class G4LogicalVolumeObjectCounter
{
  public:
    PrivateObjectManager* shadowOffset;  // shadow pointer for offset
    static __thread PrivateObjectManager* offset;
    int AddNew() {...}
    void WorkerCopy() {...}
    void FreeWorker() {...}
};

Share logical volumes: step 3

ADD TWO DATA MEMBERS TO G4LogicalVolume

static G4LogicalVolumeObjectCounter G4LogicalVolume::objectCounter;
int G4LogicalVolume::objectOrder;

MODIFY ALL CONSTRUCTORS OF G4LogicalVolume

G4LogicalVolume::G4LogicalVolume(…)
{
  objectOrder = objectCounter.AddNew();  // allocatePrivateData
  …  // initialize in similar way to constructor
  …
}

Share logical volumes: step 4

Redefine the read-write data members to make them thread-private

#define fMaterial (objectCounter.offset[objectOrder]->fMaterial)

We create a new static, thread local array: objectCounter.offset.

objectOrder is the unique instance ID described in the concept slides.

Worker logical volumes: step 5

Worker starts after master has initialized all data.

1. When a worker starts, it copies the offset content from the main thread using method WorkerCopy() of G4LogicalVolumeObjectCounter.

2. For each logical volume, call the worker constructor to allocate memory space for thread-private data and initialize it.

3. In some cases, thread-private data is constant and can be shared by all threads. Then, one just skips the above step.
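Steps 1 to 5 can be put together in a compact sketch. This is a simplified illustration under stated assumptions, not the Geant4 implementation. The names PrivateData, ObjectCounter, LogicalVolume, fMaterialId and kMaxObjects are hypothetical, the private data is reduced to one integer, the per-thread array has a fixed capacity, and the worker copies the master's values instead of calling a worker constructor for each volume.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Step 1 analogue: the thread-private members move into their own type.
struct PrivateData { int fMaterialId; };

// Step 2 analogue: hands out instance IDs and owns one array per thread.
class ObjectCounter
{
public:
    enum { kMaxObjects = 16 };            // fixed capacity keeps the sketch short
    PrivateData *shadowOffset;            // master's array, readable by workers
    static __thread PrivateData *offset;  // the calling thread's array
    int count;

    ObjectCounter() : shadowOffset(NULL), count(0) {}

    int AddNew()                          // master only: reserve a slot, return its ID
    {
        if (offset == NULL) {             // master's array lives until program exit
            offset = new PrivateData[kMaxObjects]();
            shadowOffset = offset;
        }
        assert(count < kMaxObjects);
        return count++;
    }
    void WorkerCopy()                     // worker only: private copy of master's values
    {
        offset = new PrivateData[kMaxObjects]();
        for (int i = 0; i < count; ++i) offset[i] = shadowOffset[i];
    }
    void FreeWorker() { delete[] offset; offset = NULL; }
};
__thread PrivateData *ObjectCounter::offset = NULL;

// Step 4 analogue: the old member name now resolves to the calling thread's slot.
// A macro is not re-expanded inside its own expansion, so reusing the name is safe.
#define fMaterialId (objectCounter.offset[objectOrder].fMaterialId)

class LogicalVolume
{
public:
    static ObjectCounter objectCounter;   // step 3: shared counter
    int objectOrder;                      // step 3: unique instance ID

    explicit LogicalVolume(int materialId)
    {
        objectOrder = objectCounter.AddNew();
        fMaterialId = materialId;         // written into the master's slot
    }
    int GetMaterialId() const { return fMaterialId; }
    void SetMaterialId(int id) { fMaterialId = id; }
};
ObjectCounter LogicalVolume::objectCounter;

struct WorkerArgs { LogicalVolume *volume; int newId; int seenBefore; int seenAfter; };

// Step 5 analogue: copy the master's values, then work on the private copy.
static void *worker(void *p)
{
    WorkerArgs *args = static_cast<WorkerArgs *>(p);
    LogicalVolume::objectCounter.WorkerCopy();
    args->seenBefore = args->volume->GetMaterialId();
    args->volume->SetMaterialId(args->newId);   // private write, no lock needed
    args->seenAfter = args->volume->GetMaterialId();
    LogicalVolume::objectCounter.FreeWorker();
    return NULL;
}

// Returns the master's view of the shared volume after both workers finished.
int run_shared_volume_demo()
{
    LogicalVolume volume(7);              // master builds the shared instance
    WorkerArgs a1 = { &volume, 100, 0, 0 };
    WorkerArgs a2 = { &volume, 200, 0, 0 };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &a1);
    pthread_create(&t2, NULL, worker, &a2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    assert(a1.seenBefore == 7 && a2.seenBefore == 7);
    assert(a1.seenAfter == 100 && a2.seenAfter == 200);
    return volume.GetMaterialId();        // master's slot is untouched
}
```

Both workers operate on the same LogicalVolume object, yet each write lands in that worker's own array, so the shared instance itself is never modified after construction.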

Share logical volumes final results

Share physical volumes

The solid for a G4Parameterised instance

Physics tables: the other large consumer of memory in Geant4

Questions?

Thank you!
