Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin Chen LBA Reading Group Presentation


Page 1: Shimin Chen LBA Reading Group Presentation

Colorama: Architectural Support for Data-Centric Synchronization

Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007

Shimin Chen
LBA Reading Group Presentation

Page 2: Shimin Chen LBA Reading Group Presentation

Motivation

Synchronization is a challenging step in parallel programming
  Transactional Memory is helpful but still complicated
    Programmers have to reason non-locally
    Code-centric approach
Data-Centric Synchronization (DCS) is desirable
  Associate synchronization constraints with data structures
    Which data items should be in the same critical section
  System automatically inserts sync operations into code
  Programmers reason locally

Page 3: Shimin Chen LBA Reading Group Presentation

What’s New?

Existing DCS proposals are SW-only (S-DCS)
  Cannot handle C/C++ pointer aliasing
  Unrealistic
New proposal: hardware DCS (H-DCS)
  Colorama
  HW primitives to start and exit critical sections
  Independent of the underlying sync mechanisms

Page 4: Shimin Chen LBA Reading Group Presentation

Outline

Introduction
Data-Centric Synchronization (DCS)
Architectures of Colorama
Programming with Colorama
Evaluation
Conclusion

Page 5: Shimin Chen LBA Reading Group Presentation

Data-Centric Synchronization (DCS)

Data consistency domain
  Two threads cannot access the same domain at the same time
  For example: X and Y are in the same domain
    If a thread is accessing X, no other thread can access X or Y
System needs to automatically infer entry and exit points of critical sections:
  Entry: access to data in a domain
  Exit: define a simple, clear exit policy and let programmers write code to conform to this policy
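The domain rule above can be imitated in software. The following is a minimal sketch, not the mechanism from the paper: it assumes each domain is guarded by one reentrant lock, and the names `Domain`, `domain_of`, and `probe_from_other_thread` are hypothetical. It shows that while one thread is inside the domain through X, another thread cannot enter through Y.

```python
import threading


class Domain:
    """A data consistency domain: at most one thread may be inside at a time."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.RLock()

    def try_enter(self):
        # Non-blocking acquire so the demonstration below can never hang.
        return self._lock.acquire(blocking=False)

    def leave(self):
        self._lock.release()


# Hypothetical mapping from data item to its domain: X and Y share one domain.
shared = Domain("D1")
domain_of = {"X": shared, "Y": shared}


def probe_from_other_thread(item):
    """Return True if a different thread could enter the item's domain right now."""
    outcome = []

    def probe():
        domain = domain_of[item]
        entered = domain.try_enter()
        if entered:
            domain.leave()
        outcome.append(entered)

    worker = threading.Thread(target=probe)
    worker.start()
    worker.join()
    return outcome[0]


domain_of["X"].try_enter()                      # main thread starts accessing X
blocked_on_y = not probe_from_other_thread("Y") # another thread is kept out of Y
domain_of["X"].leave()                          # critical section exit
free_after_exit = probe_from_other_thread("Y")  # Y is reachable again
```

In a real DCS system the entry and exit calls would be inferred automatically rather than written by hand as they are here.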

Page 6: Shimin Chen LBA Reading Group Presentation

Software DCS (S-DCS)

Vaziri et al.'s Atomic Sets
  Compiler and language extensions to Java
  Data consistency domain: atomic set, a subset of the fields of a Java class
  Entry point: compiler analysis
  Exit policy: insert exit point
    In the same method as the entry point, and
    Right before method return

Page 7: Shimin Chen LBA Reading Group Presentation

Colorama: Hardware DCS

Data consistency domain: color
  Data item belongs to a domain: colored
Entry point: detected by HW
Exit policy: driven by compiler
Examples:

Page 8: Shimin Chen LBA Reading Group Presentation

Examples Cont’d

Page 9: Shimin Chen LBA Reading Group Presentation

Outline

Introduction
Data-Centric Synchronization (DCS)
Architectures of Colorama
Programming with Colorama
Evaluation
Conclusion

Page 10: Shimin Chen LBA Reading Group Presentation

Structures Overview

Every colored data item has an entry in the Palette (details next)
Per-thread structures: all 3 have the same number of entries
  Owned color array: current critical sections
  CAB (color acquire bitmap) and CRB (color release bitmap): used for exit policy

Page 11: Shimin Chen LBA Reading Group Presentation

Palette

Palette is based on the Mondrian Memory Protection system (Witchel et al., ASPLOS’02): the white part of the figure
Extended with a color ID (the gray part)
[Figure: Palette organization, with SW-managed and HW parts labeled]

Page 12: Shimin Chen LBA Reading Group Presentation

Entry Point

HW monitors each load and store
  Check cached Palette for the mem op
  Check owned colors array
  Trigger a user-level SW handler if accessing a colored region not owned
Handler for entry point:
  Add color ID into owned colors array
  Start critical section (e.g. begin transaction)
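The entry-point check can be sketched as a small software model. This is an illustration under stated assumptions, not the hardware design: the `Palette` here is a flat sorted list of non-overlapping address ranges (the real Palette builds on Mondrian Memory Protection), and `ThreadContext`, `access`, and `entry_handler` are hypothetical names.

```python
from bisect import bisect_right


class Palette:
    """Hypothetical stand-in: maps non-overlapping [start, end) ranges to color IDs."""

    def __init__(self):
        self._starts = []
        self._regions = []  # (start, end, color), kept sorted by start

    def color_region(self, start, end, color):
        i = bisect_right(self._starts, start)
        self._starts.insert(i, start)
        self._regions.insert(i, (start, end, color))

    def lookup(self, addr):
        i = bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, color = self._regions[i]
            if start <= addr < end:
                return color
        return None  # address is not colored


class ThreadContext:
    """Per-thread state: the owned colors array plus a log of handler actions."""

    def __init__(self, palette):
        self.palette = palette
        self.owned_colors = []
        self.log = []

    def entry_handler(self, color):
        # Models the user-level handler: record ownership, start the critical section.
        self.owned_colors.append(color)
        self.log.append(f"begin critical section for color {color}")

    def access(self, addr):
        # Models the check done on every load and store.
        color = self.palette.lookup(addr)
        if color is not None and color not in self.owned_colors:
            self.entry_handler(color)
        return color


palette = Palette()
palette.color_region(0x1000, 0x1100, color=1)
palette.color_region(0x2000, 0x2040, color=2)

ctx = ThreadContext(palette)
ctx.access(0x1008)  # first touch of color 1: handler runs
ctx.access(0x1010)  # color 1 already owned: no handler
ctx.access(0x3000)  # uncolored address: no handler
ctx.access(0x2000)  # first touch of color 2: handler runs
```

Only the first access to each color reaches the handler, which is why the slides can argue that handler overhead stays low when critical section entries are infrequent.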

Page 13: Shimin Chen LBA Reading Group Presentation

Exit Policy

Exit a critical section when the thread returns from the subroutine where the critical section was entered

Page 14: Shimin Chen LBA Reading Group Presentation

Implementing Exit Policy

Color acquire bitmap register (CAB) and color release bitmap register (CRB)
CAB automatically set by HW at entry points
Compiler generates the following code:
  Subroutine prologue:
    Push CAB
    CAB ← 0
  Subroutine epilogue:
    CRB ← CAB
    Pop CAB
Upon write to CRB: HW triggers user-level handler
  Handler: remove color ID from owned color array, exit critical section
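The prologue and epilogue sequence can be traced with a small simulation. This is a sketch under assumptions: the class `ColoramaThread` and its methods are hypothetical, and the color ID is used directly as the bit index in CAB, which simplifies however the real bitmaps index the owned color array.

```python
class ColoramaThread:
    """Software model of one thread's CAB, CRB write, and owned colors."""

    def __init__(self):
        self.cab = 0          # color acquire bitmap for the current subroutine
        self.cab_stack = []   # models CAB values saved on the call stack
        self.owned = set()    # models the owned color array
        self.events = []

    def prologue(self):
        self.cab_stack.append(self.cab)  # Push CAB
        self.cab = 0                     # CAB <- 0

    def epilogue(self):
        self.write_crb(self.cab)         # CRB <- CAB
        self.cab = self.cab_stack.pop()  # Pop CAB

    def touch(self, color):
        # Entry point: only colors not yet owned start a critical section.
        if color not in self.owned:
            self.owned.add(color)
            self.cab |= 1 << color       # HW sets the CAB bit
            self.events.append(("enter", color))

    def write_crb(self, bitmap):
        # Models the handler triggered by a write to CRB.
        color = 0
        while bitmap:
            if bitmap & 1:
                self.owned.discard(color)
                self.events.append(("exit", color))
            bitmap >>= 1
            color += 1


t = ColoramaThread()
t.prologue()                 # outer() is called
t.touch(1)                   # outer() enters color 1
t.prologue()                 # inner() is called
t.touch(1)                   # already owned, nothing happens
t.touch(2)                   # inner() enters color 2
t.epilogue()                 # inner() returns: only color 2 is released
owned_after_inner = set(t.owned)
t.epilogue()                 # outer() returns: color 1 is released
```

Because the callee starts with an empty CAB, it releases only the colors it acquired itself, so color 1 stays held until `outer()` returns.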

Page 15: Shimin Chen LBA Reading Group Presentation

Handling Pointers as Subroutine Arguments

Perform multiple operations on a structure together
Propose “colorcheck” instruction

Page 16: Shimin Chen LBA Reading Group Presentation

Using Locks as Sync Mechanisms

Colorama can also use locks
Two potential problems:
  Longer critical sections, thus possibly more contention
  May deadlock
See evaluations

Page 17: Shimin Chen LBA Reading Group Presentation

Outline

Introduction
Data-Centric Synchronization (DCS)
Architectures of Colorama
Programming with Colorama
Evaluation
Conclusion

Page 18: Shimin Chen LBA Reading Group Presentation

Correctness

Critical sections of the same color are serialized
Correctly colored programs are data-race free
Possible programming errors:
  Failing to color shared data structures
  Using different colors for data that should be protected together

Page 19: Shimin Chen LBA Reading Group Presentation

Compatibility Issues

Legacy libraries that do not use Colorama
  OK if they explicitly protect lib data using locks, etc.
  Colorama protects application data outside of lib
Cases that require extensions to Colorama
  Worker thread executes an infinite loop that processes incoming requests
  Needs to release lock, wait, acquire lock in the same loop
  Colorama extensions: getcolorid etc.

Page 20: Shimin Chen LBA Reading Group Presentation

Complete API

Page 21: Shimin Chen LBA Reading Group Presentation

Outline

Introduction
Data-Centric Synchronization (DCS)
Architectures of Colorama
Programming with Colorama
Evaluation
Conclusion

Page 22: Shimin Chen LBA Reading Group Presentation

Setup

Evaluation is based on analyzing applications using a Pin-based tool

Page 23: Shimin Chen LBA Reading Group Presentation

Is the Exit Policy Suitable?

Matched: lock acquire & release in the same subroutine
Almost all dynamic and 95% of static critical sections are matched
Answer: Yes
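The "matched" criterion can be checked mechanically on an execution trace. The sketch below is not the authors' Pin-based tool; it assumes a hypothetical trace format of `call`, `ret`, `acq`, and `rel` events and counts dynamic critical sections whose release happens in the same subroutine invocation as the acquire.

```python
def classify(trace):
    """Count (matched, unmatched) dynamic critical sections in a trace."""
    stack = []        # invocation IDs of the currently active subroutines
    next_id = 0
    acquired_in = {}  # lock -> invocation ID where it was acquired
    matched = unmatched = 0
    for event in trace:
        kind = event[0]
        if kind == "call":
            stack.append(next_id)
            next_id += 1
        elif kind == "ret":
            stack.pop()
        elif kind == "acq":
            acquired_in[event[1]] = stack[-1]
        elif kind == "rel":
            if acquired_in.pop(event[1]) == stack[-1]:
                matched += 1
            else:
                unmatched += 1
    return matched, unmatched


# Hypothetical trace: update() locks and unlocks L1 itself (matched), while
# lock_helper() acquires L2 and returns, leaving the release to main().
trace = [
    ("call", "main"),
    ("call", "update"), ("acq", "L1"), ("rel", "L1"), ("ret",),
    ("call", "lock_helper"), ("acq", "L2"), ("ret",),
    ("rel", "L2"),
    ("ret",),
]
result = classify(trace)
```

Unmatched sections such as the `lock_helper` case are the ones that would need restructuring to conform to the return-from-subroutine exit policy.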

Page 24: Shimin Chen LBA Reading Group Presentation

Critical Section Size Increase

Page 25: Shimin Chen LBA Reading Group Presentation

How often are multiple independent critical sections in the same subroutine?

Potential deadlocks
  1% dynamic and 4% static
  Detailed analysis shows that the resulting lock order is always the same, thus no deadlocks
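The lock-order argument can be illustrated with a standard lock-order graph check. This is a generic sketch, not the analysis performed in the paper: it assumes each observation is the list of locks a thread held in acquisition order, and `lock_order_edges` and `has_cycle` are hypothetical helper names. A cycle in the graph means two threads could acquire the same locks in opposite orders.

```python
def lock_order_edges(acquisition_orders):
    """Build edges (a, b) meaning lock b was acquired while a was already held."""
    edges = set()
    for order in acquisition_orders:
        for i, held in enumerate(order):
            for later in order[i + 1:]:
                edges.add((held, later))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the lock-order graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2
    state = dict.fromkeys(graph, WHITE)

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:
                return True          # back edge: inconsistent lock order
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


consistent = [["A", "B"], ["A", "B", "C"], ["B", "C"]]
conflicting = [["A", "B"], ["B", "A"]]
consistent_is_safe = not has_cycle(lock_order_edges(consistent))
conflicting_is_risky = has_cycle(lock_order_edges(conflicting))
```

When every observed order is consistent, as the slides report for the studied applications, the graph stays acyclic and holding several locks until subroutine return cannot deadlock on lock order alone.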

Page 26: Shimin Chen LBA Reading Group Presentation

Structure Sizes

# of Palette rows: # of allocated regions + # of static data objects

# of colors: # of lock addresses

# of Owned Colors Array entries: max # of active locks held by a thread

Page 27: Shimin Chen LBA Reading Group Presentation

Colorama Instruction Overheads

Per-routine:
  Prologue & epilogue: 6 insns/routine
  1 colorcheck insn per pointer argument
  Estimate: 7 insns/routine
  On avg, 1.6 routines per 100 dynamic insns, so ~11% more insns
Entry and exit handlers: low frequency of critical section entry and exit, so low overhead
Coloring overheads ~ memory allocation calls
  # of insns between allocations: firefox, gaim, gftp: 2-4K
  Memory allocators can keep pools of colored memory (??)
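The ~11% figure follows from the per-routine numbers on this slide. The short calculation below reproduces it; reading the 7 insns/routine estimate as 6 plus one colorcheck (that is, about one pointer argument per routine on average) is an assumption made here, and the variable names are illustrative.

```python
PROLOGUE_EPILOGUE_INSNS = 6    # from the slide
COLORCHECK_INSNS = 1           # assumption: about one pointer argument per routine
ROUTINES_PER_100_INSNS = 1.6   # from the slide

per_routine = PROLOGUE_EPILOGUE_INSNS + COLORCHECK_INSNS
# Extra instructions added per 100 original dynamic instructions,
# which is the same number as the overhead in percent.
overhead_pct = per_routine * ROUTINES_PER_100_INSNS
print(f"{per_routine} insns/routine -> about {overhead_pct:.1f}% more instructions")
```

This counts only the per-routine instructions; handler and coloring overheads are argued separately on the slide.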

Page 28: Shimin Chen LBA Reading Group Presentation

Memory Overhead

MMP: Mondrian Memory Protection
Palette adds 1-2.5% more space over app footprint

Page 29: Shimin Chen LBA Reading Group Presentation

Conclusions

Colorama: Hardware Data-Centric Synchronization
  HW support for entry and exit points
Evaluation suggests:
  Exit policy is suitable
  Low impact on critical section lengths
  Modest additional overhead over MMP

This paper does not even do simulation!

Page 30: Shimin Chen LBA Reading Group Presentation

Related Work

monitors