lecture 4: transactions (recovery)

123
1 Lecture 4: Transactions (Recovery) Wednesday, October 19, 2011 Dan Suciu -- CSEP544 Fall 2011

Upload: kumiko

Post on 17-Feb-2016

28 views

Category:

Documents


1 download

DESCRIPTION

Lecture 4: Transactions (Recovery). Wednesday, October 19, 2011. Homework 3. The key concepts here: Connect to db and call SQL from java Dependent joins Integrate two databases Transactions Amount of work: 20 SQL queries+180 lines Java ≈ 12 hours (?). Outline. Transaction basics - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 1

Lecture 4:Transactions (Recovery)

Wednesday, October 19, 2011

Page 2: Lecture  4: Transactions (Recovery)

Homework 3The key concepts here:• Connect to db and call SQL from java• Dependent joins• Integrate two databases• TransactionsAmount of work:• 20 SQL queries+180 lines Java ≈ 12 hours (?)

2

Page 3: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 3

Outline

• Transaction basics• Recovery• Concurrency control (next lecture)

Page 4: Lecture  4: Transactions (Recovery)

Reading Material for Lectures 4/5

Textbook (Ramakrishnan): Ch. 16, 17, 18

Second textbook (Garcia-Molina)• Ch. 17.2, 17.3, 17.4• Ch. 18.1, 18.2, 18.3, 18.8, 18.9

Optional: M. Franklin, Concurrency Control and Recovery

Page 5: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 5

TransactionDefinition: a transaction is a unit of program execution that accesses/updates various data items• It consists of all operations between:

BEGIN TRANSACTIONEND TRANSACTION

• The collection of steps must appear to the user as a single, indivisible unit

Page 6: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011

Transaction

6

BEGIN TRANSACTION

[SQL statements]

COMMIT or ROLLBACK (=ABORT)

May be omitted:first SQL query

starts txn

In ad-hoc SQL: each statement = one transaction

Page 7: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011

Turing Awards to Database Researchers

• Charles Bachman 1973 for CODASYL

• Edgar Codd 1981 for relational databases

• Jim Gray 1998 for transactions

7

Page 8: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011

Implementing TransactionsWhat can go wrong ?

• System crash– Division by 0, power failure

• Interferences with other users– “Anomalies” – 3 have famous names

8

Page 9: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 9

System Crash

What can go wrong?

Client 1:BEGIN TRANSACTIONUPDATE AccountsSET balance= balance - 500WHERE name= ‘Fred’

UPDATE AccountsSET balance = balance + 500WHERE name= ‘Joe’COMMIT

Crash !

Page 10: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 10

1st Famous Anomaly:Lost Update

Client 1:BEGIN TRANSACTION

UPDATE CustomerSET rentals= rentals + 1WHERE cname= ‘Fred’

COMMIT

Client 2:BEGIN TRANSACTION

UPDATE CustomerSET rentals= rentals + 1WHERE cname= ‘Fred’

COMMIT

What can go wrong ?

Page 11: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 11

2nd Famous Anomaly:Inconsistent Read

Client 1: move from gizmogadgetBEGIN TRANSACTIONUPDATE Products SET quantity = quantity + 5WHERE product = ‘gizmo’

UPDATE Products SET quantity = quantity - 5WHERE product = ‘gadget’COMMIT

Client 2: inventory….BEGIN TRANSACTIONSELECT sum(quantity) FROM ProductCOMMIT

What can go wrong ?

Page 12: Lecture  4: Transactions (Recovery)

3rd Famous Anomaly:Dirty Reads

What can go wrong ?

-- Client 1: transfer $100 acc1 acc2BEGIN TRANSACTIONX = Account1.balance; Account2.balance += 100

If (X>=100 and other stuff OK…) { Account1.balance -=100; COMMIT }else {print(“Denied !”); ROLLBACK}

-- Client 2: transfer $100 acc2 acc3BEGIN TRANSACTIONX = Account2.balance; Account3.balance += 100

If (X>=100 and other stuff OK…) { Account2.balance -=100; COMMIT }else {print(“Denied !”); ROLLBACK}

Page 13: Lecture  4: Transactions (Recovery)

13

The Three Famous anomalies• Lost update

– Two tasks T and T’ both modify the same data– T and T’ both commit– Final state shows effects of only T, but not of T’

• Dirty read– T reads data written by T’ while T’ has not committed– What can go wrong: T’ write more data (which T has

already read), or T’ aborts

• Inconsistent read– One task T sees some but not all changes made by T’

Page 14: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 14

ACID Properties• Atomic

– State shows either all the effects of txn, or none of them

• Consistent– Txn moves from a state where integrity holds, to

another where integrity holds• Isolated

– Effect of txns is the same as txns running one after another (ie looks like batch mode)

• Durable– Once a txn has committed, its effects remain in the

database

Page 15: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 15

ACID: Atomicity

• Two possible outcomes for a transaction– It commits: all the changes are made– It aborts: no changes are made

• That is, transaction’s activities are all or nothing

Page 16: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 16

ACID: Isolation

• A transaction executes concurrently with other transaction

• Isolation: the effect is as if each transaction executes in isolation of the others

Page 17: Lecture  4: Transactions (Recovery)

17

ACID: Consistency• The database satisfies integrity constraints

– Account numbers are unique– Stock amount can’t be negative– Sum of debits and of credits is 0

• Consistency = if the database satisfied the constraints at the beginning of the transaction, and if the application is written correctly, then the constraints must hold at the end of the transactions

• Introduced as a requirement in the 70s, but today we understand it is a consequence of atomicity and isolation

Page 18: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 18

ACID: Durability

• The effect of a transaction must continue to exists after the transaction, or the whole program has terminated

• Means: write data to disk

• Sometimes also means recovery

Page 19: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 19

Reasons for Rollback

• Explicit in the application– E.g. use it freely in HW 3

• System-initiated abort– System crash– Housekeeping, e.g. due to timeouts

Page 20: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 20

Outline

• Recovery from failures (the A in ACID)– Today

• Concurrency Control (the C in ACID)– Next lecture

Page 21: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 21

Log-based Recovery

Simple recovery algorithms (Garcia-Molina Ch. 17.2, 17.3, 17.4)

• Undo logging• Redo logging• Redo/undo loggingState of the art (Ramakrishnan Ch. 18)• Aries

Page 22: Lecture  4: Transactions (Recovery)

22

Buffer Management in a DBMS

DB

disk page

free frame

BUFFER POOLREADWRITE

INPUTOUTUPT

Application(Database server)

Large gap between disk I/O and memory Buffer pool

Page 23: Lecture  4: Transactions (Recovery)

Page Replacement Policies

• LRU = expensive– Next slide

• Clock algorithm = cheaper alternative– Read in the book

Both work well in OS, but not always in DB

Dan Suciu -- CSEP544 Fall 2011 23

Page 24: Lecture  4: Transactions (Recovery)

Least Recently Used (LRU)

Dan Suciu -- CSEP544 Fall 2011 24

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

??

Most recent Least recent

Page 25: Lecture  4: Transactions (Recovery)

Least Recently Used (LRU)

Dan Suciu -- CSEP544 Fall 2011 25

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

P6, P5, P2, P8, P4, P1, P9, P3, P7

Most recent Least recent

Page 26: Lecture  4: Transactions (Recovery)

Least Recently Used (LRU)

Dan Suciu -- CSEP544 Fall 2011 26

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

P6, P5, P2, P8, P4, P1, P9, P3, P7

Read(P10)

??

Most recent Least recent

Page 27: Lecture  4: Transactions (Recovery)

Least Recently Used (LRU)

Dan Suciu -- CSEP544 Fall 2011 27

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

P6, P5, P2, P8, P4, P1, P9, P3, P7

Input(P10)

P10, P6, P5, P2, P8, P4, P1, P9, P3

Read(P10)

Most recent Least recent

Page 28: Lecture  4: Transactions (Recovery)

28

RecoveryType of Crash Prevention

Wrong data entry Constraints andData cleaning

Disk crashes Redundancy: e.g. RAID, archive

Fire, theft, bankruptcy… Remote backups

System failures:e.g. power

DATABASERECOVERY

Page 29: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 29

Transactions

• Assumption: the database is composed of elements– Usually 1 element = 1 block– Can be smaller (=1 record) or larger (=1

relation)• Assumption: each transaction consists

of a sequence of reads/writes of elements

Page 30: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 30

Primitive Operations of Transactions

• READ(X,t)– copy element X to transaction local variable t

• WRITE(X,t)– copy transaction local variable t to element X

• INPUT(X)– read element X to memory buffer

• OUTPUT(X)– write element X to disk

Page 31: Lecture  4: Transactions (Recovery)

31

Example

Atomicity:Both A and Bare multiplied by 2,or none is.

Dan Suciu -- CSEP544 Fall 2011

BEGIN TRANSACTIONREAD(A,t); t := t*2;WRITE(A,t); READ(B,t); t := t*2;WRITE(B,t)COMMIT;

Page 32: Lecture  4: Transactions (Recovery)

32

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Buffer pool DiskTransaction

READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t)

Page 33: Lecture  4: Transactions (Recovery)

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Page 34: Lecture  4: Transactions (Recovery)

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Yes it’s bad: A=16, B=8….

Page 35: Lecture  4: Transactions (Recovery)

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Page 36: Lecture  4: Transactions (Recovery)

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Yes it’s bad: A=B=16, but not committed

Page 37: Lecture  4: Transactions (Recovery)

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Page 38: Lecture  4: Transactions (Recovery)

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

No: that’s OK

Page 39: Lecture  4: Transactions (Recovery)

39

Atomic Transactions

• FORCE or NO-FORCE– Should all updates of a transaction be forced to

disk before the transaction commits?• STEAL or NO-STEAL

– Can an update made by an uncommitted transaction overwrite the most recent committed value of a data item on disk?

Dan Suciu -- CSEP544 Fall 2011

Page 40: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 40

Force/No-steal

• FORCE: Pages of committed transactions must be forced to disk before commit

• NO-STEAL: Pages of uncommitted transactions cannot be written to disk

Easy to implement (how?) and ensures atomicity

Page 41: Lecture  4: Transactions (Recovery)

41

No-Force/Steal

• NO-FORCE: Pages of committed transactions need not be written to disk

• STEAL: Pages of uncommitted transactions may be written to disk

Dan Suciu -- CSEP544 Fall 2011

In either case, Atomicity is violated; need WAL

Page 42: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2010

42

Write-Ahead LogThe Log: append-only file containing log records• Enables NO-FORCE and STEAL• Records every single action of every TXN• Force log entry to disk• After a system crash, use log to recoverThree types: UNDO, REDO, UNDO-REDO

Page 43: Lecture  4: Transactions (Recovery)

43

UNDO Log

Dan Suciu -- CSEP544 Fall 2011

FORCE and STEAL

Page 44: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 44

Undo LoggingLog records• <START T>

– transaction T has begun• <COMMIT T>

– T has committed• <ABORT T>

– T has aborted• <T,X,v>

– T has updated element X, and its old value was v

Page 45: Lecture  4: Transactions (Recovery)

45

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

Page 46: Lecture  4: Transactions (Recovery)

46

Crash !

WHAT DO WE DO ?

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

Page 47: Lecture  4: Transactions (Recovery)

47

Crash !

WHAT DO WE DO ?

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

We UNDO by setting B=8 and A=8

Page 48: Lecture  4: Transactions (Recovery)

48Crash !

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

What do we do now ?

Page 49: Lecture  4: Transactions (Recovery)

49Crash !

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

What do we do now ? Nothing: log contains COMMIT

Page 50: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 50

After Crash

• In the first example:– We UNDO both changes: A=8, B=8– The transaction is atomic, since none of its actions

has been executed

• In the second example– We don’t undo anything– The transaction is atomic, since both it’s actions

have been executed

Page 51: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 51

Recovery with Undo Log

After system’s crash, run recovery manager

• Idea 1. Decide for each transaction T whether it is completed or not– <START T>….<COMMIT T>…. = yes– <START T>….<ABORT T>……. = yes– <START T>……………………… = no

• Idea 2. Undo all modifications by incomplete transactions

Page 52: Lecture  4: Transactions (Recovery)

52

Recovery with Undo Log……<T6,X6,v6>……<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>

Question1: Which updatesare undone ?

Question 2:How far backdo we need toread in the log ?

Question 3:What happens if thereis a second crash,during recovery ?Crash

Page 53: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 53

When Recovery Fails

• Redo the recovery!• Reason: all undo commands are

idempotent– If we perform them a second time, no

harm is done–E.g. if there is a system crash during

recovery, simply restart recovery from scratch

Page 54: Lecture  4: Transactions (Recovery)

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

When mustwe force pagesto disk ?

?

?

Page 55: Lecture  4: Transactions (Recovery)

55

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

RULES: log entry before OUTPUT before COMMIT

FORCE

Page 56: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 56

Undo-Logging Rules

U1: If T modifies X, then <T,X,v> must be written to disk before OUTPUT(X)

U2: If T commits, then OUTPUT(X) must be written to disk before <COMMIT T>

• Hence: OUTPUTs are done early, before the transaction commits

FORCE

Page 57: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 57

Recovery with Undo Log

When do we stop reading the log ?• We cannot stop until we reach the

beginning of the log file• This is impractical

Instead: use checkpointing

Page 58: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 58

Checkpointing

Checkpoint the database periodically• Stop accepting new transactions• Wait until all current transactions

complete• Flush log to disk• Write a <CKPT> log record, flush• Resume transactions

Page 59: Lecture  4: Transactions (Recovery)

59

Undo Recovery with Checkpointing

……<T9,X9,v9>……(all completed)<CKPT><START T2><START T3<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>

During recovery,Can stop at first<CKPT>

transactions T2,T3,T4,T5

other transactions

Page 60: Lecture  4: Transactions (Recovery)

60

Nonquiescent Checkpointing

• Problem with checkpointing: database freezes during checkpoint

• Would like to checkpoint while database is operational

• Idea: nonquiescent checkpointing

Quiescent = being quiet, still, or at rest; inactiveNon-quiescent = allowing transactions to be active

Page 61: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 61

Nonquiescent Checkpointing

• Write a <START CKPT(T1,…,Tk)>where T1,…,Tk are all active transactions

• Continue normal operation• When all of T1,…,Tk have completed,

write <END CKPT>

Page 62: Lecture  4: Transactions (Recovery)

62

Undo Recovery with Nonquiescent Checkpointing

………………<START CKPT T4, T5, T6>…………<END CKPT>………

During recovery,Can stop at first<CKPT>

T4, T5, T6, plus later transactions

earlier transactions plus T4, T5, T6

later transactionsQ: do we need <END CKPT> ?

Page 63: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011

Implementing ROLLBACK

• A transaction ends in COMMIT or ROLLBACK• Use the undo-log to implement ROLLBCACK

• LSN = Log Seqence Number• Log entries for the same transaction are

linked, using the LSN’s• Read log in reverse, using LSN pointers

63

Page 64: Lecture  4: Transactions (Recovery)

64

REDO Log

Dan Suciu -- CSEP544 Fall 2011

NO-FORCE and NO-STEAL

Page 65: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 65

Redo Logging

Log records• <START T> = transaction T has begun• <COMMIT T> = T has committed• <ABORT T>= T has aborted• <T,X,v>= T has updated element X, and

its new value is v

Page 66: Lecture  4: Transactions (Recovery)

66

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

Is this bad ?

Page 67: Lecture  4: Transactions (Recovery)

67

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

Is this bad ? Yes, it’s bad: A=16, B=8

Page 68: Lecture  4: Transactions (Recovery)

68

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

Crash !

Is this bad ?

Page 69: Lecture  4: Transactions (Recovery)

69

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

Crash !

Is this bad ? Yes, it’s bad: T committed but A=B=8

Page 70: Lecture  4: Transactions (Recovery)

70

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

Page 71: Lecture  4: Transactions (Recovery)

71

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

How do we recover ?

Page 72: Lecture  4: Transactions (Recovery)

72

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

How do we recover ? We REDO by setting A=16 and B=16

Page 73: Lecture  4: Transactions (Recovery)

73

Recovery with Redo Log

After system’s crash, run recovery manager

• Step 1. Decide for each transaction T whether we need to redo or not– <START T>….<COMMIT T>…. = yes– <START T>….<ABORT T>……. = no– <START T>……………………… = no

• Step 2. Read log from the beginning, redo all updates of committed transactions

Page 74: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 74

Recovery with Redo Log<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>

Show actionsduring recovery

Page 75: Lecture  4: Transactions (Recovery)

75

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

When mustwe force pagesto disk ?

?

?

Page 76: Lecture  4: Transactions (Recovery)

76

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

RULE: OUTPUT after COMMIT

NO-STEAL

Page 77: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 77

Redo-Logging Rules

R1: If T modifies X, then both <T,X,v> and <COMMIT T> must be written to disk before OUTPUT(X)

• Hence: OUTPUTs are done late

NO-STEAL

Page 78: Lecture  4: Transactions (Recovery)

78

Checkpointing

• To ensure recovery we must read from the beginning of the log: this is too inefficient

• Checkpointing: periodically write information to the log to allow us to process only the tail of the log

Dan Suciu -- CSEP544 Fall 2011

Page 79: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 79

Nonquiescent Checkpointing

• Write a <START CKPT(T1,…,Tk)>where T1,…,Tk are all active transactions

• Flush to disk all blocks of committed transactions (dirty blocks), while continuing normal operation

• When all blocks have been flushed, write <END CKPT>

Note: this differs significantly from ARIES (later this lecture)

Page 80: Lecture  4: Transactions (Recovery)

80

Redo Recovery with Nonquiescent Checkpointing

…<START T1>…<COMMIT T1>…<START T4>…<START CKPT T4, T5, T6>…………<END CKPT>………<START CKPT T9, T10>…

Step 1: look forThe last<END CKPT>

Step 2: redofrom theearlieststart ofT4, T5, T6ignoringtransactionscommittedearlier

All OUTPUTs of T1 are guaranteedto be on disk

Cannotuse

Page 81: Lecture  4: Transactions (Recovery)

81

Comparison Undo/Redo

• Undo logging: OUTPUT must be done early: – Inefficient

• Redo logging: OUTPUT must be done late: – Inflexible

Dan Suciu -- CSEP544 Fall 2011

Page 82: Lecture  4: Transactions (Recovery)

82

UNDO/REDO Log

Dan Suciu -- CSEP544 Fall 2011

NO-FORCE and STEAL

Page 83: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 83

Undo/Redo Logging

Log records, only one change• <T,X,u,v>= T has updated element X, its

old value was u, and its new value is v

Page 84: Lecture  4: Transactions (Recovery)

Action T Mem A Mem B Disk A Disk B Log

<START T>

REAT(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8,16>

OUTPUT(A) 16 16 16 16 8

<COMMIT T>

OUTPUT(B) 16 16 16 16 16

Can OUTPUT whenever we want: before/after COMMIT

Page 85: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 85

Recovery with Undo/Redo Log

After system’s crash, run recovery manager • Redo all committed transaction, top-down• Undo all uncommitted transactions, bottom-

up

Page 86: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 86

Recovery with Undo/Redo Log<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>

Page 87: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 87

Undo/Redo-Logging Rule

UR1: If T modifies X, then <T,X,u,v> must be written to disk before OUTPUT(X)

Note: we are free to OUTPUT early or late relative to <COMMIT T>

NO-FORCE and STEAL

Page 88: Lecture  4: Transactions (Recovery)

88

ARIES

Dan Suciu -- CSEP544 Fall 2011

Page 89: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2010

89

Aries

• ARIES pieces together several techniques into a comprehensive algorithm

• Developed at IBM Almaden, by Mohan• IBM botched the patent, so everyone

uses it now• Several variations, e.g. for distributed

transactions

Page 90: Lecture  4: Transactions (Recovery)

90

ARIES Recovery Manager

• A redo/undo log• Physiological logging

– Physical logging for REDO– Logical logging for UNDO

• Efficient checkpointing

Dan Suciu -- CSEP544 Fall 2011

Why ?

Page 91: Lecture  4: Transactions (Recovery)

91

ARIES Recovery Manager

Log entries:• <START T> -- when T begins• Update: <T,X,u,v>

– T updates X, old value=u, new value=v– In practice: undo only and redo only entries

• <COMMIT T> or <ABORT T>• CLR’s – we’ll talk about them later.

Dan Suciu -- CSEP544 Fall 2011

Page 92: Lecture  4: Transactions (Recovery)

92

ARIES Recovery Manager

Rule:• If T modifies X, then <T,X,u,v> must be

written to disk before OUTPUT(X)

We are free to OUTPUT early or late

Dan Suciu -- CSEP544 Fall 2011

Page 93: Lecture  4: Transactions (Recovery)

93

LSN = Log Sequence Number• LSN = identifier of a log entry

– Log entries belonging to the same txn are linked

• Each page contains a pageLSN:– LSN of log record for latest update to that page– Will serve to determine if an update needs to be

redone

Dan Suciu -- CSEP544 Fall 2011

Page 94: Lecture  4: Transactions (Recovery)

94

ARIES Data Structures• Active Transactions Table

– Lists all running txns (active txns)– For each txn: lastLSN = most recent update by txn

• Dirty Page Table– Lists all dirty pages– For each dirty page: recoveryLSN (recLSN)= first LSN

that caused page to become dirty• Write Ahead Log

– LSN, prevLSN = previous LSN for same txn

Dan Suciu -- CSEP544 Fall 2011

Page 95: Lecture  4: Transactions (Recovery)

ARIES Data Structures

pageID recLSNP5 102

P6 103

P7 101

LSN prevLSN transID pageID Log entry101 - T100 P7

102 - T200 P5

103 102 T200 P6

104 101 T100 P5

Dirty pages Log (WAL)

transID lastLSNT100 104

T200 103

Active transactionsP8 P2 . . .

. . .

P5PageLSN=104

P6PageLSN=103

P7PageLSN=101

Buffer Pool

WT100(P7)WT200(P5)WT200(P6)WT100(P5)

Page 96: Lecture  4: Transactions (Recovery)

96

ARIES Normal Operation

T writes page P• What do we do ?

Dan Suciu -- CSEP544 Fall 2011

Page 97: Lecture  4: Transactions (Recovery)

97

ARIES Normal Operation

T writes page P• What do we do ?

• Write <T,P,u,v> in the Log• pageLSN=LSN• lastLSN=LSN• recLSN=if isNull then LSN

Dan Suciu -- CSEP544 Fall 2011

Page 98: Lecture  4: Transactions (Recovery)

98

ARIES Normal Operation

Buffer manager wants to OUTPUT(P)• What do we do ?

Buffer manager wants INPUT(P)• What do we do ?

Dan Suciu -- CSEP544 Fall 2011

Page 99: Lecture  4: Transactions (Recovery)

99

ARIES Normal Operation

Buffer manager wants to OUTPUT(P)• Flush log up to pageLSN• Remove P from Dirty Pages tableBuffer manager wants INPUT(P)• Create entry in Dirty Pages table

recLSN = NULL

Dan Suciu -- CSEP544 Fall 2011

Page 100: Lecture  4: Transactions (Recovery)

100

ARIES Normal Operation

Transaction T starts• What do we do ?

Transaction T commits/aborts• What do we do ?

Dan Suciu -- CSEP544 Fall 2011

Page 101: Lecture  4: Transactions (Recovery)

101

ARIES Normal Operation

Transaction T starts• Write <START T> in the log• New entry T in Active TXN;

lastLSN = nullTransaction T commits/aborts• Write <COMMIT T> in the log• Flush log up to this entry

Dan Suciu -- CSEP544 Fall 2011

Page 102: Lecture  4: Transactions (Recovery)

102

Checkpoints

Write into the log

• Entire active transactions table• Entire dirty pages table

Dan Suciu -- CSEP544 Fall 2011

Recovery always starts by analyzing latest checkpoint

Background process periodically flushes dirty pages to disk

Page 103: Lecture  4: Transactions (Recovery)

103

ARIES Recovery1. Analysis pass

– Figure out what was going on at time of crash– List of dirty pages and active transactions

2. Redo pass (repeating history principle)– Redo all operations, even for transactions that will not commit– Get back to state at the moment of the crash

3. Undo pass– Remove effects of all uncommitted transactions– Log changes during undo in case of another crash during undo

Dan Suciu -- CSEP544 Fall 2011

Page 104: Lecture  4: Transactions (Recovery)

104

ARIES Method Illustration

[Figure 3 from Franklin97]Dan Suciu -- CSEP544 Fall 2011

First undo and first redo log entry might bein reverse order

Page 105: Lecture  4: Transactions (Recovery)

105

1. Analysis Phase• Goal

– Determine point in log where to start REDO– Determine set of dirty pages when crashed

• Conservative estimate of dirty pages– Identify active transactions when crashed

• Approach– Rebuild active transactions table and dirty pages table– Reprocess the log from the checkpoint

• Only update the two data structures– Compute: firstLSN = smallest of all recoveryLSN

Dan Suciu -- CSEP544 Fall 2011

Page 106: Lecture  4: Transactions (Recovery)

1. Analysis Phase(crash)Checkpoint

Dirtypages

Activetxn

Log

pageID recLSN pageID

transID lastLSN transID

firstLSN= ??? Where do we startthe REDO phase ?

Page 107: Lecture  4: Transactions (Recovery)

1. Analysis Phase(crash)Checkpoint

Dirtypages

Activetxn

Log

pageID recLSN pageID

transID lastLSN transID

firstLSN=min(recLSN)

Page 108: Lecture  4: Transactions (Recovery)

1. Analysis Phase(crash)Checkpoint

Dirtypages

Activetxn

Log

pageID recLSN pageID

transID lastLSN transID

pageID recLSN pageID

transID lastLSN transID

Replayhistory

firstLSN

Page 109: Lecture  4: Transactions (Recovery)

109

2. Redo Phase

Main principle: replay history• Process Log forward, starting from

firstLSN• Read every log record, sequentially• Redo actions are not recorded in the log• Needs the Dirty Page Table

Dan Suciu -- CSEP544 Fall 2011

Page 110: Lecture  4: Transactions (Recovery)

110

2. Redo Phase: Details

For each Log entry record LSN: <T,P,u,v>• Re-do the action P=u and WRITE(P)• But which actions can we skip, for

efficiency ?

Dan Suciu -- CSEP544 Fall 2011

Page 111: Lecture  4: Transactions (Recovery)

111

2. Redo Phase: Details

For each Log entry record LSN: <T,P,u,v>• If P is not in Dirty Page then no update• If recLSN > LSN, then no update• INPUT(P) (read page from disk):

If pageLSN > LSN, then no update• Otherwise perform update

Dan Suciu -- CSEP544 Fall 2011

Page 112: Lecture  4: Transactions (Recovery)

112

2. Redo Phase: Details

What happens if system crashes during REDO ?

Dan Suciu -- CSEP544 Fall 2011

Page 113: Lecture  4: Transactions (Recovery)

113

2. Redo Phase: Details

What happens if system crashes during REDO ?

We REDO again ! Each REDO operation is idempotent: doing it twice is the as as doing it once.

Dan Suciu -- CSEP544 Fall 2011

Page 114: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 114

3. Undo Phase

• Cannot “unplay” history, in the same way as we “replay” history

• WHY NOT ?

Page 115: Lecture  4: Transactions (Recovery)

Dan Suciu -- CSEP544 Fall 2011 115

3. Undo Phase

• Cannot “unplay” history, in the same way as we “replay” history

• WHY NOT ?

• Need to support ROLLBACK: selective undo, for one transaction

• Hence, logical undo v.s. physical redo

Page 116: Lecture  4: Transactions (Recovery)

116

3. Undo Phase

Main principle: “logical” undo• Start from end of Log, move backwards• Read only affected log entries• Undo actions are written in the Log as special

entries: CLR (Compensating Log Records)• CLRs are redone, but never undone

Dan Suciu -- CSEP544 Fall 2011

Page 117: Lecture  4: Transactions (Recovery)

117

3. Undo Phase: Details• “Loser transactions” = uncommitted

transactions in Active Transactions Table

• ToUndo = set of lastLSN of loser transactions

Dan Suciu -- CSEP544 Fall 2011

Page 118: Lecture  4: Transactions (Recovery)

118

3. Undo Phase: Details

While ToUndo not empty:• Choose most recent (largest) LSN in ToUndo• If LSN = regular record <T,P,u,v>:

– Undo v– Write a CLR where CLR.undoNextLSN = LSN.prevLSN

• If LSN = CLR record:– Don’t undo !

• if CLR.undoNextLSN not null, insert in ToUndootherwise, write <END TRANSACTION> in log

Dan Suciu -- CSEP544 Fall 2011

Page 119: Lecture  4: Transactions (Recovery)

119

3. Undo Phase: Details

[Figure 4 from Franklin97]

Dan Suciu -- CSEP544 Fall 2011

Page 120: Lecture  4: Transactions (Recovery)

120

3. Undo Phase: Details

What happens if system crashes during UNDO ?

Dan Suciu -- CSEP544 Fall 2011

Page 121: Lecture  4: Transactions (Recovery)

121

3. Undo Phase: Details

What happens if system crashes during UNDO ?

We do not UNDO again ! Instead, each CLR is a REDO record: we simply redo the undo

Dan Suciu -- CSEP544 Fall 2011

Page 122: Lecture  4: Transactions (Recovery)

122

Physical v.s. Logical Loging

Why are redo records physical ?

Why are undo records logical ?

Dan Suciu -- CSEP544 Fall 2011

Page 123: Lecture  4: Transactions (Recovery)

123

Physical v.s. Logical Loging

Why are redo records physical ?• Simplicity: replaying history is easy, and

idempotent

Why are undo records logical ?• Required for transaction rollback: this not

“undoing history”, but selective undo

Dan Suciu -- CSEP544 Fall 2011