Efficient Locking Techniques for Databases on Modern Hardware


Upload: yardan

Post on 24-Feb-2016


DESCRIPTION

Efficient Locking Techniques for Databases on Modern Hardware. Hideaki Kimura #*, Goetz Graefe +, Harumi Kuno +. # Brown University, * Microsoft Jim Gray Systems Lab, + Hewlett-Packard Laboratories. At ADMS'12. Slides/papers available on request.

TRANSCRIPT


Efficient Locking Techniques for Databases on Modern Hardware
Hideaki Kimura #*, Goetz Graefe +, Harumi Kuno +
# Brown University, * Microsoft Jim Gray Systems Lab, + Hewlett-Packard Laboratories
at ADMS'12
Slides/papers available on request. Email us: [email protected], [email protected], [email protected]

Traditional DBMS on Modern Hardware
- Optimized for Magnetic Disk Bottleneck

Fig. Instructions and Cycles for New Order [S. Harizopoulos et al. SIGMOD'08]
[Figure labels: Disk I/O Costs, Other Costs, Useful Work, Query Execution Overhead]
- Then, what's this?

Context of This Paper

- Shore-MT/Aether [Johnson et al'10]
- Foster B-trees: achieved up to 6x overall speed-up
- This Paper
- Consolidation Array, Flush-Pipeline: work in progress

Our Prior Work: Foster B-trees [TODS'12]
- Foster Relationship
- Fence Keys
- Simple Prefix Compression
- Poor-man's Normalized Keys
- Efficient yet Exhaustive Verification
- Implemented by modifying Shore-MT and compared with it:
  - Low Latch Contention: 2-3x speed-up
  - High Latch Contention: 6x speed-up
- On Sun Niagara. Tested without locks, only latches.

Talk Overview
- Key Range Locks w/ Higher Concurrency
  - Combines fence-keys and Graefe lock modes
- Lightweight Intent Lock
  - Extremely Scalable and Fast
- Scalable Deadlock Detection
  - Dreadlocks Algorithm applied to Databases
- Serializable Early-Lock-Release
  - Serializable all-kinds ELR that allows read-only transactions to bypass logging

1. Key Range Lock
[Figure: keys 10, 20, 30 and the gaps between them; SELECT Key=10, UPDATE Key=30, SELECT Key=20~25, SELECT Key=15; lock labels X and S]
- Mohan et al.: Locks neighboring key.
- Lomet et al.: Adds a few new lock modes (e.g., RangeX-S).
- Still lacks a few lock modes, resulting in lower concurrency.

Our Key Range Locking
- Graefe Lock Modes. All 3*3=9 modes
- Create a ghost record (pseudo-deleted record) before insertion as a separate Xct.
- Use Fence Keys to lock on page boundary
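The "3*3=9 modes" bullet can be illustrated with a small sketch. It assumes that each combined mode pairs a key mode with a gap mode (written key first, gap second, as in "NS"), and that two combined modes are compatible exactly when their key parts and their gap parts are each compatible. The names `ELEMENTARY`, `MODES`, and `compatible` are hypothetical and are not taken from the paper or from Shore-MT.

```python
from itertools import product

# Elementary modes for each component: N (no lock), S (shared), X (exclusive).
ELEMENTARY = ("N", "S", "X")


def elementary_compatible(a: str, b: str) -> bool:
    """Standard S/X compatibility; N is compatible with everything."""
    if a == "N" or b == "N":
        return True
    return a == "S" and b == "S"


# All 3 * 3 = 9 combined modes, written <key mode><gap mode>, e.g. "NS".
MODES = [key + gap for key, gap in product(ELEMENTARY, repeat=2)]


def compatible(held: str, requested: str) -> bool:
    """Assumed rule: compatible iff the key parts and the gap parts both are."""
    return (elementary_compatible(held[0], requested[0])
            and elementary_compatible(held[1], requested[1]))


# A reader that locks only the gap after key 10 ("NS", e.g. SELECT Key=15)
# does not block a writer that locks only key 10 itself ("XN").
print(compatible("NS", "XN"))
# A reader of key 10 itself ("SN") does conflict with that writer.
print(compatible("SN", "XN"))
```

Under this assumption the full 9x9 compatibility matrix never has to be written by hand, because it follows from the 3x3 elementary matrix.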

[Figure: Fence Keys]

2. Intent Lock

- Coarse level locking (e.g., table, database)
- Intent Lock (IS/IX) and Absolute Lock (X/S/SIX)
- Saves overhead for large scan/write transactions (just one absolute lock)
[Gray et al]

Intent Lock: Physical Contention
[Figure: Logical vs. Physical. Logical: DB-1, VOL-1, IND-1, Key-A, Key-B. Physical: lock queues for DB-1, VOL-1, IND-1, Key-A, Key-B, with IS, IX, S, and X requests queued.]

Lightweight Intent Lock
[Figure: Logical vs. Physical. Logical: DB-1, VOL-1, IND-1, Key-A, Key-B with IS, IX, S, and X requests. Physical: lock queues for key locks only (Key-A, Key-B: S, X) and counters for coarse locks.]

Counters for coarse locks:
        IS  IX  S  X
DB1     1   1   0  0
VOL1    1   1   0  0
IND1    1   1   0  0

- No Lock Queue, No Mutex

Intent Lock: Summary
- Extremely Lightweight for Scalability
  - Just a set of counters, no queue
  - Only spinlock. Mutex only when absolute lock is requested.
  - Timeout to avoid deadlock
- Separate from main lock table

                          Main Lock Table   Intent Lock Table
Physical Contention       Low               High
Required Functionality    High              Low

3. Deadlock Handling
- Deadlock Prevention (e.g., wound-wait/wait-die) can cause many false positives
- Deadlock Detection (Cycle Detection)
  - Infrequent check: delay
  - Frequent/Immediate check: not scalable on many cores
- Timeout: false positives, delays, hard to configure.
- Traditional approaches all have some drawback

Solution: Dreadlocks [Koskinen et al '08]
- Immediate deadlock detection
- Local Spin: Scalable and Low-overhead
- Almost* no false positives
  (*) due to Bloom filter
- More details in paper
- Issues specific to databases:
  - Lock modes, queues and upgrades
  - Avoid pure spinning to save CPU cycles
  - Deadlock resolution for flush pipeline

4. Early Lock Release
[Figure labels: Resources A, B, C; Locks T2:X, T1:S, T1:S, T3:S, T3:X (S: Read, X: Write); Transactions T1, T2, T3, T4, T5, ..., T1000; Commit Protocol: Lock, Commit Request, Flush Wait (10ms-), Unlock]
- Group-Commit, Flush-Pipeline [DeWitt et al'84] [Johnson et al'10]
- More and More Locks, Waits, Deadlocks

Prior Work: Aether [Johnson et al VLDB'10]
- First implementation of ELR in DBMS
- Significant speed-up (10x) on many-core
- Simply releases locks on commit-request
- "[must hold] until both their own and their predecessors' log records have reached the disk. Serial log implementations preserve this property naturally,"
- Problem: A read-only transaction bypasses logging
[Figure: serial log with LSNs 10, 11, 12: T1: Write, T1: Commit (ELR), T2: Commit (dependent)]

Notes: Databases have a serially written log in which each log record has a sequential number, the LSN. In this log, T1, the predecessor, writes its commit log record first and does ELR; T2 commits after that. The log flusher flushes log records to disk in this order, so this doesn't violate serializability. Is it true?

Anomaly of Prior ELR Technique
Initially D=10. Lock-queue "D": T2:X, T1:S.

Event             Latest LSN   Durable LSN
T2: D=20          1            0
(T1: Read D)      2            0
T2: Commit-Req    3            0
T1: Read D        4            0
T1: Commit        5            1
..                ..           2
T2: Commit        ..           3

- Crash! before T2's commit log record becomes durable: T2 is rolled back (Rollback T2), yet T1 has already committed after reading D=20 ("D is 20!").

Naive Solutions
- Flush wait for Read-Only Transaction
  - Orders of magnitude higher latency.
  - Short read-only query: microseconds
  - Disk Flush: milliseconds
- Do not release X-locks in ELR (S-ELR)
  - Concurrency as low as No-ELR
  - After all, all lock-waits involve X-locks

Safe SX-ELR: X-Release Tag
Initially D=10 and E=5. Lock-queue "D": T2:X, T1:S, tag 3 after T2's release. Lock-queue "E": T3:S, tag 0.

Event                                Latest LSN   Durable LSN
T2: D=20                             1            0
(T1: Read D)                         2            0
T2: Commit-Req                       3            0
T1: Read D (max-tag=3)               4            0
T1: Commit-Req                       5            1
T3: Read E (max-tag=0) & Commit      6            2
T1, T2: Commit                       7            3

- T3 reads "E is 5" with max-tag=0 and commits right away.
- T1 reads D=20 with max-tag=3 and waits until the durable LSN reaches 3.
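The X-release tag timeline on the slide above can be replayed with a minimal sketch. It assumes that an early-released X lock leaves the releasing transaction's commit LSN on the lock queue as a tag, that every later lock acquisition folds that tag into the acquirer's max-tag, and that a read-only transaction may exit once its max-tag is no greater than the durable LSN. The class and method names (`LockQueue`, `Xct`, `release_x_early`, `can_commit`) are hypothetical and not the paper's code.

```python
class LockQueue:
    """Per-resource lock queue carrying an X-release tag (an LSN)."""

    def __init__(self):
        self.tag = 0  # 0 means no early-released X lock has tagged this queue


class Xct:
    def __init__(self, name):
        self.name = name
        self.max_tag = 0  # highest tag seen on any lock this transaction took

    def acquire(self, queue):
        # Remember the newest commit LSN this transaction now depends on.
        self.max_tag = max(self.max_tag, queue.tag)

    def release_x_early(self, queue, commit_lsn):
        # At commit-request: release the X lock but leave the commit LSN behind.
        queue.tag = max(queue.tag, commit_lsn)

    def can_commit(self, durable_lsn):
        # Just an LSN comparison: everything this transaction read is durable.
        return self.max_tag <= durable_lsn


# Replay of the slide: T2 updates D and requests commit at LSN 3.
d, e = LockQueue(), LockQueue()
t1, t2, t3 = Xct("T1"), Xct("T2"), Xct("T3")
t2.release_x_early(d, commit_lsn=3)
t1.acquire(d)  # T1: Read D, max-tag becomes 3
t3.acquire(e)  # T3: Read E, max-tag stays 0
print(t3.can_commit(durable_lsn=2), t1.can_commit(durable_lsn=2))
print(t1.can_commit(durable_lsn=3))
```

The sketch covers only the read-only check. A transaction that wrote data would additionally wait for its own commit log record, which is outside what is modeled here.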

Safe SX-ELR: Summary
- Serializable yet Highly Concurrent
  - Safely release all kinds of locks
  - Most read-only transactions quickly exit
  - Only the necessary threads wait
- Low Overhead
  - Just LSN comparison
- Applicable to Coarse Locks
  - Self-tag and Descendant-tag
  - SIX/IX: Update Descendant-tag. X: Update Self-tag
  - IS/IX: Check Self-tag. S/X/SIX: Check Both

Experiments
- TPC-B: 250MB of data, fits in bufferpool
- Hardware
  - Sun-Niagara: 64 Hardware contexts
  - HP Z600: 6 Cores. SSD drive
- Software
  - Foster B-trees (Modified) in Shore-MT (Original) with/without each technique
  - Fully ACID, Serializable mode.

Key Range Locks
[Figure: Z600, 6 threads, AVG & 95% on 20 runs]
Notes: We ran a series of range queries and updates. Some of the range queries exactly hit existing keys, so the key range lock mode doesn't matter. Other range queries don't hit existing keys, so we take range locks.

Lightweight Intent Lock
[Figure: Sun Niagara, 60 threads, AVG & 95% on 20 runs]
Notes: Measured the overhead of taking and releasing intent locks.

Dreadlocks vs Traditional

[Figure: Sun Niagara, AVG on 20 runs]
Notes: Next, we compared our implementation of Dreadlocks with traditional methods, namely wound-wait and wait-die. In this experiment there is no true deadlock, so all of the deadlocks raised by the traditional methods are false positives, and the number of such false deadlocks quickly goes up with the number of threads. In contrast, Dreadlocks raises almost no false positives and thus achieves up to 3 to 4 times better throughput.
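The digest propagation behind Dreadlocks ("does it contain me?", then "add it to myself") can be sketched as a single-threaded simulation. This is only an illustration under simplifying assumptions: a Python set stands in for the Bloom filter, each thread waits for at most one other thread, and `Thread`, `spin_once`, and `first_detector` are hypothetical names rather than the paper's implementation.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.digest = {name}   # a real implementation would use a Bloom filter
        self.waits_for = None  # the thread holding the lock we want, if any

    def spin_once(self):
        """One iteration of the local spin; True means a deadlock was seen."""
        holder = self.waits_for
        if holder is None:
            return False
        if self.name in holder.digest:             # 1. does it contain me?
            return True
        self.digest = {self.name} | holder.digest  # 2. add it to myself
        return False


def first_detector(threads, max_rounds=10):
    """Let every thread spin in turn; return who detects a deadlock first."""
    for _ in range(max_rounds):
        for t in threads:
            if t.spin_once():
                return t.name
    return None


# Wait-for relationships as on the Dreadlocks slide:
# A waits for B (no cycle); C -> D -> E -> C form a cycle.
a, b, c, d, e = (Thread(n) for n in "ABCDE")
a.waits_for = b
c.waits_for, d.waits_for, e.waits_for = d, e, c
print(first_detector([a, b, c, d, e]))
```

Because each thread only reads the digest of the one thread it waits for, the check stays local, which is what the slides refer to as "Local Spin". Resetting a digest once a thread stops waiting is not modeled here.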

Early Lock Release (ELR)
[Figure: HDD Log and SSD Log; Z600, 6 threads, AVG & 95% on 20 runs]
- SX-ELR performs 5x faster.
- S-only ELR isn't useful
- All improvements combined, ~50x faster.

Related Work
- ARIES/KVL, IM [Mohan et al]
- Key range locking [Lomet'93]
- Shore-MT at EPFL/CMU/UW-Madison
  - Speculative Lock Inheritance [Johnson et al'09]
  - Aether [Johnson et al'10]
- Dreadlocks [Koskinen and Herlihy'08]
- H-Store at Brown/MIT

Wrap up
- Locking as bottleneck on Modern H/W
- Revisited all aspects of database locking
  - Graefe Lock Modes
  - Lightweight Intent Lock
  - Dreadlocks
  - Early Lock Release
- All together, significant speed-up (~50x)
- Future Work: Buffer-pool

Reserved: Locking Details

Transactional Processing
- High Concurrency
- Very Short Latency
- Fully ACID-compliant
- Relatively Small Data

[Figure: # Digital Transactions]

Modern Hardware

[Figure: CPU Clock Speed]

Many-Cores and Contentions

[Figure: Logical Contention vs. Physical Contention; a critical section around a shared resource (01011100) protected by a mutex or spinlock]
- Doesn't help, even worsens!

Background: Fence keys
[Figure: B-tree pages labeled with fence keys A~~B, B~~C, C~~E, A~~Z, A~~M, ~C; separator keys A, C, E and A, M, V]
- Define key ranges in each page.

Key-Range Lock Mode [Lomet '93]
[Figure: keys 10, 20, 30; lock modes RangeX-S, X, S, RangeI-N, I]
- Adds a few new lock modes
- Consists of 2 parts; Range and Key
- RangeS-S, S (RangeN-S)
- But, still lacks a few lock modes*
(*) Instant X lock

Example: Missing lock modes
[Figure: keys 10, 20, 30; SELECT Key=15, UPDATE Key=20; lock modes RangeS-N?, RangeS-S, X, RangeA-B]

Graefe Lock Modes

[Table of lock modes not recoverable; labels: New lock modes (*), S, X]
(**) Ours locks the key prior to the range while SQL Server uses next-key locking.
[Figure: RangeS-N (next-key locking) vs. NS (prior-key locking)]

LIL: Lock-Request Protocol

LIL: Lock-Release Protocol

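The diagrams for the two LIL protocol slides did not survive in this transcript, so the following is only a sketch built from the summary slide ("just a set of counters, no queue", "only spinlock", "timeout to avoid deadlock"). It assumes the standard multi-granularity compatibility rules with SIX left out, uses a `threading.Lock` as a stand-in for a spinlock, and introduces hypothetical names (`IntentLockCounters`, `try_acquire`, `CONFLICTS`). It is not the authors' request or release protocol.

```python
import threading
import time

# Modes that block each request (standard multi-granularity rules, SIX omitted).
CONFLICTS = {
    "IS": ("X",),
    "IX": ("S", "X"),
    "S": ("IX", "X"),
    "X": ("IS", "IX", "S", "X"),
}


class IntentLockCounters:
    """Coarse lock for one resource (e.g. DB-1): counters only, no lock queue."""

    def __init__(self):
        self.count = {"IS": 0, "IX": 0, "S": 0, "X": 0}
        self._guard = threading.Lock()  # stands in for a spinlock

    def try_acquire(self, mode):
        with self._guard:
            if any(self.count[m] for m in CONFLICTS[mode]):
                return False
            self.count[mode] += 1
            return True

    def acquire(self, mode, timeout=0.1):
        # Retry until granted; give up after the timeout instead of queueing.
        deadline = time.monotonic() + timeout
        while not self.try_acquire(mode):
            if time.monotonic() >= deadline:
                return False  # caller treats this as a possible deadlock
            time.sleep(0.001)
        return True

    def release(self, mode):
        with self._guard:
            self.count[mode] -= 1


db = IntentLockCounters()
db.acquire("IS")
db.acquire("IX")
print(db.count)  # same shape as the DB1 row of the counter table
```

Lock upgrades by a transaction that already holds an intent lock on the same resource are not handled in this sketch.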

Dreadlocks [Koskinen et al '08]
[Figure: threads A, B, C, D, E; A waits for B (live lock); C, D, E wait for each other (dead lock)]

Thread    A      B      C      E      D
Digest*   {A}    {B}    {C}    {E}    {D}

(*) actually a Bloom filter (bit-vector).
1. does it contain me?
2. add it to myself
- Digests while waiting: {A,B}, {C,D}, {D,E}, {E,C}, then {E,C,D}
- D finds itself in {E,C,D}: deadlock!!
Notes: Invented by Koskinen and Herlihy. A deadlock detection algorithm geared to many cores.

Naive Solution: Check Page-LSN?
- Read-only transaction can exit only after Commit Log of dependents becomes durable.
[Figure: pages with page LSNs, D=10 and E=5; log buffer: 1: T2, D, 10 -> 20; 2: T2, Z, 20 -> 10; 3: T2, Commit; T1 immediately exits if durable-LSN >= 1?]

Deadlock Victim & Flush Pipeline

Victim & Flush Pipeline (Cont'd)

Dreadlock + Backoff on Sleep
[Figure: TPC-B, Lazy commit, SSD, Xct-chain max 100k]

Related Work: H-Store/VoltDB
Differences
- Disk-based DB vs. Pure Main-Memory DB
- Shared-everything vs. shared-nothing in each node

(Note: both are shared-nothing across nodes)
[Figure labels: RAM, Distributed Xct]
- Foster B-Trees/Shore-MT: Keep 'em, but improve 'em.
- VoltDB: Get rid of latches.
- Pros/Cons: Accessible RAM per CPU; Simplicity and Best-case Performance
- Both are interesting directions.

Reserved: Foster B-tree Slides

Latch Contention in B-trees
1. Root-leaf EX Latch
2. Next/Prev Pointers

Foster B-trees Architecture
[Figure: B-tree pages labeled with fence keys A~~B, B~~C, C~~E, A~~Z, A~~M, ~C; separator keys A, C, E and A, M, V]
1. Fence-keys
2. Foster Relationship
cf. B-link tree [Lehman et al'81]

More on Fence Keys
- Efficient Prefix Compression
- Powerful B-tree Verification
  - Efficient yet Exhaustive Verification
- Simpler and More Scalable B-tree
  - No tree-latch
  - B-tree code size halved
- Key Range Locking
[Figure: page with fence keys Low: "AAF" and High: "AAP"; key "AAI31" is stored as "I31"; the slot array holds poor man's normalized keys "I3" and "J1"; the tuple holds "I31", xxx]
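The prefix-compression example in the figure (fence keys "AAF" and "AAP", key "AAI31" stored as "I31", slot entry "I3") can be reproduced with a short sketch. It assumes that the compressed prefix is the common prefix of the two fence keys and that the poor man's normalized key is simply the first two characters of the stored key; the width of 2 is inferred from the slide's example, and the function names are hypothetical.

```python
import os


def common_prefix(low_fence: str, high_fence: str) -> str:
    """Every key in the page lies between the fence keys, so it shares this prefix."""
    return os.path.commonprefix([low_fence, high_fence])


def compress(key: str, low_fence: str, high_fence: str) -> str:
    """Strip the page-wide prefix before storing the key in the tuple."""
    prefix = common_prefix(low_fence, high_fence)
    if not key.startswith(prefix):
        raise ValueError("key is outside the page's fence-key range")
    return key[len(prefix):]


def poor_mans_key(stored_key: str, width: int = 2) -> str:
    """Fixed-width leading part kept in the slot array (width is an assumption)."""
    return stored_key[:width]


stored = compress("AAI31", low_fence="AAF", high_fence="AAP")
print(stored, poor_mans_key(stored))
```

Keeping the short fixed-width part in the slot array lets many comparisons finish without touching the tuple itself, which is presumably why the slide shows "I3" and "J1" next to the slot array.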

B-tree lookup speed-up
[Figure: No Locks. SELECT-only workload.]

Insert-Intensive Case
- 6-7x Speed-up
- Latch Contention Bottleneck
- Log-Buffer Contention Bottleneck
  - Will port "Consolidation Array" [Johnson et al]

Chain length: Mixed 1 Thread


Eager-Opportunistic

B-tree Verification