non-volatile memory logging - pgcon€¦ · • combination of existing technologies -- dram, ssd,...

44
Non-volatile Memory (NVM) Logging 2016.05.20 Takashi HORIKAWA 1

Upload: others

Post on 29-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Non-volatile Memory (NVM) Logging

2016.05.20Takashi HORIKAWA

1

Page 2: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Who am IName

Takashi HORIKAWA, Ph. D.

Research interestsPerformance evaluation of computer & communication systems, including

performance engineering of IT systemswith slightly shifting the focus of the research to CPU scalability

PapersLatch-free data structures for DBMS: design, implementation, and evaluation, SIGMOD ‘13An Unexpected Scalability Bottleneck in a DBMS: A Hidden Pitfall in Implementing Mutual Exclusion, PDCS ‘11An approach for scalability-bottleneck solution: identification and elimination of scalability bottlenecks in a DBMS, ICPE '11A method for analysis and solution of scalability bottleneck in DBMS, SoICT '10

2

Page 3: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Contents

• Introduction• Problems to be solved• Implementation• Evaluation• Technical trends• Conclusion

3

Page 4: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Write Ahead LoggingSync. vs Async. CommitDifference in PerformanceFundamental idea for NVM LoggingByte or Block Addressable NVMByte addressable NVMs

Introduction

4

Page 5: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Write Ahead Logging

Worker process

Shared buffer WAL buffer

Asynchronous Write Synchronous Write

Data files WAL files

Storage

Worker processWorker processWorker process

Memory

TableIndex

XLog

Transactionprocessing

Widely used method to make transaction durable

5

XLog recordTable, Index

Page 6: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Write Ahead Logging

Worker process

Shared buffer WAL buffer

Asynchronous Write Synchronous Write

Data files WAL files

Storage

Worker processWorker processWorker process

Memory

TableIndex

XLog

Write latency is included in transaction response

Overhead

Transactionprocessing

Widely used method to make transaction durable

6

XLog recordTable, Index

Page 7: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Sync. vs Async. Commit

7

Transaction processing WAL write (I/O)

t

Transaction becomes durable

Page 8: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Sync. vs Async. Commit

8

Transaction processing WAL write (I/O)

t

Notifying the client of the transaction commit

Sync. commit

Transaction becomes durable

In sync. committhe client is notified of the transaction commit after the transaction becomes durable.

Durability : OKResponse : Slow

Page 9: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Sync. vs Async. Commit

9

Transaction processing WAL write (I/O)

t

Notifying the client of the transaction commit

Async. commit

Transaction becomes durable

In async. committhe client is notified of the transaction commit before the transaction becomes durable.

Durability : NGResponse : Fast

Page 10: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Difference in Performance

10

0 40 80 120 1600

10000

20000

30000

40000

50000

ClientsTh

roug

hput

Async

Sync

0 40 80 120 1600

10000

20000

30000

40000

50000

Clients

Thro

ughp

ut (T

PS)

Async

Sync

(PGBENCH)

Disk-drive cache off Disk-drive cache on

Page 11: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Fundamental idea for NVM Logging

Worker process

Shared buffer WAL buffer

Asynchronous Write Synchronous Write

Data files WAL files

Storage

Worker processWorker processWorker process

Memory

TableIndex

XLog

Transactionprocessing

Non-volatile

11

XLog recordTable, Index

Volatile

Page 12: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Fundamental idea for NVM Logging

Worker process

Shared buffer WAL buffer

Asynchronous Write Asynchronous Write

Data files WAL files

Storage

Worker processWorker processWorker process

Memory

TableIndex

XLog

Transactionprocessing

Volatile

12

XLog recordTable, Index

XLog record is stored as non-volatile memory

NVM

Non-volatile

Page 13: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Byte or Block Addressable NVM

13

CPU

Memory

Store instruction

Storage

HD, SSD

I/O operation

Accessed in units of byte

Accessed in units of block

Byte addressable NVM

Block addressable NVM(Commonly used already)

(Expected to be used from now on)

Key device in NVM Logging

Page 14: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Byte addressable NVMs• Combination of existing technologies -- DRAM, SSD, battery (NVDIMM)

– AgigA Tech (Micron Technology)– Viking Techonogy– SK Hynix

• Use of a new memory cell (Storage Class Memory)– Phase Change Memory (PCM)– Magnetic Random Access Memory (MRAM)– Ferroelectric RAM (FRAM)– The memristor

14

Capacity

Small

Large

Access time

~100nS

~μS

Ready-to-use

Near future (?)

Availability

Page 15: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Fundamental idea for NVM Logging (Again)It is not simple than it looksNecessary condition for RecoveryProblems

Partial writeUnreachable XLog RecordCPU cache effect

Problems to be solved

15

Page 16: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Fundamental idea for NVM Logging

Worker process

Shared buffer WAL buffer

Asynchronous Write Asynchronous Write

Data files WAL files

Storage

Worker processWorker processWorker process

Memory

TableIndex

XLog

Transactionprocessing

Volatile

16

XLog recordTable, Index

XLog record is stored as non-volatile memory

NVM

Non-volatile

(Again)

Page 17: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

It is not simple than it looks

• Naive implementation of NVM Logging– Allocating WAL buffer in NVM area– Using asynchronous commit mode

• Problems are:– Partial write– Unreachable XLog Record– CPU cache effect

17

Not sufficient

Page 18: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Necessary condition for Recovery

• If the recovery process reads all XLR of transaction m correctly

transaction m is possible to be recovered

• Else transaction m will be lost

18

LSN

XLRm,N-2 XLRm,N-1 XLRm,NXLog

A worker process finishes the commit of the transaction after all XLRs of the transaction are stored in the non-volatile memory.

Page 19: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Partial write

• The recovery process will read an incomplete XLRif the system crashes in the middle of writing a XLR

19

XLR 1

WAL buffer

LSN

(CRC in the XLR may be effective but it is not perfect)

Page 20: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Unreachable XLog Record

• The recovery process cannot find a XLR of a committed transaction– A worker process finishes writing of XLR 3 before

another worker process begin to write XLR 2

20

XLR 1

WAL bufferLSN

XLR 2 XLR 3

Length1

XLog reader finds the head position of a XLR by adding the head position of the previous XLR and its length

Page 21: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

CPU cache effect

• The recovery process will read inconsistent XLR

21

CPU

Memory

Cache

Inconsistent

If the CPU internal cache uses write-back policy, a XLR written by the CPU does not reach to the memory immediately

Page 22: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Prototype architecturePreventing partial writePreventing an unreachable XLRUse of Write-combined modeGUC parameter for NVM LoggingAccessing NVM at recoveryWrap around of WAL buffer

Implementation

22

Page 23: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Prototype architecture

23

File system

PostgreSQL

open mmap

WAL buffer

Pseudo NVM(pram)

Shared buffer

userkernel

ext4 ext3 xfsext2pm

Modified

Newly added

Kernel module

Similar to RAM disk but CPU cache mode differs

(9.6devel at the middle of March)

Page 24: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Preventing partial write

24

XLog Record

Tail

Start

Reserve

Write data

t

LSNWrite len

New tail

1. Move the tail pointer to reserve buffer area

2. Write the XLR data other than length field

3. Write length field of the XLR

( At this point length field of XLR in XLog buffer is 0)

If XLog reader find a XLR whose length field is not zero,all XLR data is written in the XLog buffer.

Page 25: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Preventing an unreachable XLR

25

XLR 1Start

Reserve 1

tLSN

Write 1

Reserve 2

Write 2

Reserve 3

Write 3

Reserve 4

Write 4

XLR 2

XLR 3

WP waits until commit becomes able to be finished

WP finishes the commit as all previous XLRs are written

Write

Write Write

Write

Page 26: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Wait control

• A wait mechanism is already implemented in PostgreSQLstatic XLogRecPtr

WaitXLogInsertionsToFinish(XLogRecPtr upto)

If any XLR with a smaller LSN than the upto parameter is not finished to copy in the WAL buffer, the worker process sleeps until all of those XLRs are copied in the WAL buffer.

NVM Logging implements the wait mechanism by using this function

26

Page 27: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Use of Write-combined mode

27

File system

PostgreSQL

open mmap

WAL buffer

Pseudo NVM(pram)

Shared buffer

userkernel

ext4 ext3 xfsext2pm

Kernel module

Write-combined mode, which is a variation of write-through, is set for the memory pages for pseudo NVM.

Page 28: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

GUC parameter for NVM Logging

• NVM Logging is enabled through one GUC parameter (described in postgresql.conf)

– PRAM_FILE_NAME = “NVM File name”

• When PRAM_FILE_NAME is set– XLOGShmemInit() invokes open() and mmap() to

“NVM File name” and uses the memory area for WAL buffer

– CopyXLogRecordToWAL() copies XLR in WAL buffer according to the procedure that prevents partial write

28

Page 29: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Accessing NVM at recovery

29

XLogReader

XLog buffer in NVM

n n+1 n+2 n+3n-1n-2

minLSN maxLSN

Segment number

WAL files

XLogReader accesses NVM XLog buffer to obtain XLog records whose LSN is between minLSN and maxLSN

LSN

Page 30: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Wrap around of WAL buffer

30

Logical view

LSN

NVM size

Saved in XLog files

InitializedUpto

Written in NVM only

EvaluatedUpto Current write point

Page 31: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Wrap around of WAL buffer

31

Logical view

LSN

NVM size

Saved in XLog files

InitializedUpto

Written in NVM only

EvaluatedUpto

Physical viewNVM size

InitializedUptoCurrent write point

EvaluatedUpto

LSN

WAL segment

Current write point

(n-1) th round

n th round

Page 32: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Experimental SetupPerformance

PGBENCHDBT-2

DurabilityDurability testResult

Write amplification Reduction

Evaluation

32

Page 33: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Experimental setup• DB server

– CPU: E5-2650 v2 x 2 (16 cores)– Memory: 64GB– Storage

• RAID0: 200GB SSD x 2 for data• RAID0: 1TB ATA HD x 4 for WAL

• Client– CPU: E7420 x 4 (16 cores)– Memory: 8GB

• Network– GB ether x 1

33

Page 34: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

PGBENCH Performance

34

0 40 80 120 1600

10000

20000

30000

40000

50000

ClientsTh

roug

hput

(TPS

)

Async

NVM Logging

Sync

0 40 80 120 1600

10000

20000

30000

40000

50000

Clients

Thro

ughp

ut (T

PS)

Async

NVM

Log

ging

Sync

Disk-drive cache off Disk-drive cache on

Page 35: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

DBT-2 Performance

35

0 10 20 30 40 500

100000

200000

300000

400000

Clients

Thro

ughp

ut (N

OTP

M)

Async

NVM Logging

Sync

0 10 20 30 40 500

100000

200000

300000

400000

ClientsTh

roug

hput

(NO

TPM

)

Async

NVM Loggingg

Sync

Disk-drive cache off Disk-drive cache on

Page 36: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Write amplification Reduction

36

Page of the storage

WriteWriteWriteWriteWrite

Write

The tail WAL block is written in at every commit

The tail WAL block is written in once

Sync.commit

NVM Logging

Page 37: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Durability test

37

table2table1

DBMS

key value(=Kp)

while (1) {BeginSelect value From table1 Where key = Kp(v = value + 1)Update table1 set value = v Where key = Kp

(… same for table 2)End(record v as committed transaction)

}

Fault

Transaction

Clientp

After the recovery, durability is examined by checking whether the value of table1 and table2 is equal andvalue is equal to or greater than v that each client recorded as the result of the last transaction.

Page 38: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Results

• The results were just what we expected

– Durability is ensured• Sync. commit, NVM Logging

– Durability is not ensured• Async. commit

38

Page 39: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

NVDIMM for DB serversProgramming support for NVM

Technical trends

39

Page 40: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

NVDIMM in DB Servers

• NVDIMM-N Standardization– JEDEC Hybrid Memory Task Group– SNIA NVDIMM SIG

• Server product: HP ProLiant XL230a Server– Up to 2 Intel® Xeon® E5-2600 v3 Series, 6/8/10/12/14/16

Cores (16) – DDR4, (512GB max), support for NVDIMM– …

40

http://community.hpe.com/t5/Servers-The-Right-Compute/Address-your-Compute-needs-with-HP-ProLiant-Gen9/ba-p/6794213#.VybkulK3GA9

DB server with NVDIMM is just around the corner!!

Page 41: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Programming support for NVM

• pmem.io– The Linux NVM Library builds on the Direct Access

(DAX) changes under development in Linux.– This project focuses specifically on how persistent

memory is exposed to server-class applications which will explicitly manage the placement of data among the three tiers (volatile memory, persistent memory, and storage).

41

http://pmem.io/

Page 42: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Conclusion

• NVM is becoming commodity– NVDIMM is already shipped as a product– Servers began to equipped with NVDIMM

• Benefits of NVM Logging– Performance improvement– Durability ensurance– Write amplitude reduction

42

Similar to sync. commit

Almost the same as async. commit

Good for SSD lifetime

Page 43: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

Future work

• Bring to a state acceptable for the mainline – Cope with standard for NVM access

• libpmem is a promising candidate

– Check the operation in using real NVM

43

Page 44: Non-volatile Memory Logging - PGCon€¦ · • Combination of existing technologies -- DRAM, SSD, battery (NVDIMM) – AgigA Tech (Micron Technology) – Viking Techonogy –

44

That’s it

Thank you for listening