what the hekaton? in-memory oltp overview · pdf file1 what the hekaton? in-memory oltp...

12
1 What the Hekaton? In-memory OLTP Overview Kalen Delaney www.SQLServerInternals.com join-conference.com 2016-05-10 2 join-conference.com © Kalen Delaney, 2016 #2 Kalen Delaney Background: MS in Computer Science from UC Berkeley Working exclusively with SQL Server for 28 years SQL Server In-Memory OLTP Internals (RedGate 2014) Primary Author: SQL Server 2012 Internals (MS Press/O’Reilly, 2013) Author: SQL Server Concurrency (RedGate 2010) Primary Author: SQL Server 2008 Internals (MS Press, 2009) Primary Author: Inside SQL Server 2005: Query Tuning and Optimization (MS Press, 2007) Author: Inside SQL Server 2005: The Storage Engine (MS Press, 2006) SQL Server Magazine columnist and contributing editor Website: www.SQLServerInternals.com Twitter: @sqlqueen Blog: www.SQLBlog.com

Upload: vandang

Post on 06-Mar-2018

220 views

Category:

Documents


4 download

TRANSCRIPT

1

What the Hekaton?In-memory OLTP Overview

Kalen Delaneywww.SQLServerInternals.com

join-conference.com

2016-05-10 2join-conference.com© Kalen Delaney, 2016 #2

Kalen DelaneyBackground:

– MS in Computer Science from UC Berkeley– Working exclusively with SQL Server for 28 years

– SQL Server In-Memory OLTP Internals (RedGate 2014)– Primary Author: SQL Server 2012 Internals (MS Press/O’Reilly, 2013)– Author: SQL Server Concurrency (RedGate 2010)– Primary Author: SQL Server 2008 Internals (MS Press, 2009)– Primary Author: Inside SQL Server 2005: Query Tuning and Optimization (MS Press, 2007)– Author: Inside SQL Server 2005: The Storage Engine (MS Press, 2006)– SQL Server Magazine columnist and contributing editor

Website: www.SQLServerInternals.comTwitter: @sqlqueenBlog: www.SQLBlog.com

2

2016-05-10 3join-conference.com© Kalen Delaney, 2016 #3

Memory optimized table and index structures

Stream-based storage

Native compilation of business logic in stored procedures

Fully integrated into SQL Server

Familiar management tools: SSMS/SMO/DMVs

Latch- and lock-free data structures

Multi-versioning built into table structures

In-Memory OLTP Overview

SQL Server 2014 adds in-memory technology to boost performance of OLTP workloads

2016-05-10 4join-conference.com© Kalen Delaney, 2016 #4

Agenda

• In-memory Data Structures• Managing Memory and Garbage Collection• Persistence and Storage Management• Logging and Recovery

3

2016-05-10 5join-conference.com© Kalen Delaney, 2016 #5

In-Memory Data Structures• Rows

– New row format Structure of the row is optimized for in-memory residency and access

– One copy of rowIndexes point to rows, they do not duplicate them

• Indexes– Hash index for equality search – Memory-optimized B-tree for range and equality search – Do not exist on disk – recreated during recovery– Every table must have at least one index!

2016-05-10 6join-conference.com© Kalen Delaney, 2016 #6

In-memory Table: Row FormatRow header Payload (table columns)

Begin Ts End Ts StmtId IdxLinkCount

8 bytes 8 bytes 4 bytes 2 bytes

8 bytes * (IdxLinkCount – 1)

Key Points:Begin/End timestamp determines row’s validityThere is no data or index page - Just rowsRow size limited to 8060 bytes

Allows data to be moved to disk-based tablesNot every SQL table schema is supported

4

2016-05-10 7join-conference.com© Kalen Delaney, 2016 #7

Indexes On Memory-Optimized Tables • Indexes are what combines rows into a table• Hash Indexes

– Defined as NONCLUSTERED HASH– Predefined number of buckets (fixed memory size)– Good for point lookups

• “Range” Indexes– Defined as NONCLUSTERED – Stored as Bw-Tree– Good for range searches and ordered scans

• Indexes ONLY exist in memory

2016-05-10 8join-conference.com© Kalen Delaney, 2016 #8

90, ∞ Susan Bogota

Hash Indexes

50, ∞ Jane Prague

Timestamps NameChain ptrs City

Hash index on City

Hash index on Name

100, ∞ John Prague

90, 150

5

2016-05-10 9join-conference.com© Kalen Delaney, 2016 #9

“Range” Index Core Characteristics• Resizeable – grows and shrinks with utilization• Unidirectional• Good performance for point lookup, excellent performance for

range and table scan• Lock free• Bw-tree based• Leaf pages used to store index keys and pointers to chains of

rows• Value chains have same characteristics as bucket chains for

hash index

2016-05-10 10join-conference.com© Kalen Delaney, 2016 #10

Bw-Tree

10 20 28

5 8 10 11 15 18 21 24 27

PAGE

Page Mapping Table

0

1

2

3

14

15

PAGE

1 2 4 6 7 8 25 26 27

200, ∞ 1 50, 300 2

Root

Non-leaf pages

leaf pages

Data rows

PageID-0

PageID-3 Page-ID-2 PageID -14

• Page size- up to 8K• Logical pointers

– Indirect physical pointers through Page Mapping table

– Page Mapping table grows (doubles) as table grows• Sibling pages linked one direction

– Require two indexes for ASC/DESC• No in-place updates on index pages

– Handled thru delta pages or building new pages

Key Key

Physical

100,200 1

6

2016-05-10 11join-conference.com© Kalen Delaney, 2016 #11

Point Lookups and Range Scans

• Point lookups similar to B-Trees• Range scans

– Search for starting point– Follow keys, duplicate chains and right page

pointers until end key is reached– Uni-directional because pages linked in only

one direction

2016-05-10 12join-conference.com© Kalen Delaney, 2016 #12

Limitations on Tables in SQL 2014

• Optimized for in-memory– Rows are at most 8060 bytes – no off-row data– No Large Object (LOB) types like varchar(max)

• Scoping limitations– No FOREIGN KEY and no CHECK constraints– IDENTITY only (1,1)– No schema changes (ALTER TABLE) – need to

drop/recreate table– No add/remove index – need to drop/recreate table

7

2016-05-10 13join-conference.com© Kalen Delaney, 2016 #13

Memory Management• Table data resides in memory at all times.

– No paging– Must configure SQL box with sufficient memory to store memory-optimized

tables; Max supported 512GB

– Failure to allocate memory will fail transactional workload at runtime– Integrated with SQL Server memory manager and reacts to memory pressure

where possible

• Integration with Resource Governor – “Bind” a database to a resource pool– Ensures memory consumption from recovery is accounted for– Hard limit (80% of phys. memory) to ensure system remains stable under low-

memory situations

2016-05-10 14join-conference.com© Kalen Delaney, 2016 #14

Garbage Collection• Stale Row Versions

Updates, deletes, and aborted insert operations create row versions that (eventually) are no longer visible to any transaction.– Slows down scans of index structures– Creates unused memory that needs to be reclaimed (i.e. Garbage Collected)

• Garbage Collection (GC)– Analogous to version store cleanup task for disk-based tables to support Read

Committed Snapshot (RCSI)– System maintains ‘oldest active transaction’ information

• Design Goals: – Non-blocking, Cooperative, Efficient, Responsive, Scalable– Active transactions work cooperatively and pick up parts of GC work– A dedicated system thread to do GC

8

2016-05-10 15join-conference.com© Kalen Delaney, 2016 #15

Durability• Memory-optimized tables can be durable or non-durable

– Default is ‘durable’– Non-durable tables are useful for transient data

• Durable tables are persisted in a single memory-optimized filegroup– Storage used for memory-optimized has a different access pattern

than for disk tables

• Filegroup can have multiple containers (volumes)– Additional containers aid in parallel recovery; recovery happens at

the speed of I/O

2016-05-10 16join-conference.com© Kalen Delaney, 2016 #16

On-disk Storage• Filestream is the underlying storage mechanism

– Checksums and single-bit correcting ECC on files• Data files

– ~128MB in size (unless machine has <16GB memory), write 256KB chunks at a time

– Stores only the inserted rows (i.e. table content)– Chronologically organized streams of row versions

• Delta files– File size is not constant, write 4KB chunks at a time– Stores IDs of deleted rows

9

2016-05-10 17join-conference.com© Kalen Delaney, 2016 #17

Storage: Data and Delta Files

Data File

Delta File

0 100

TS (ins) RowId TableIdTS (ins) RowId TableIdTS (ins) RowId TableId

TS (ins) RowId TS (del)TS (ins) RowId TS (del)TS (ins) RowId TS (del)

Chec

kpoi

nt F

ile P

air

Row pay loadRow pay loadRow pay load

2016-05-10 18join-conference.com© Kalen Delaney, 2016 #18

Populating Data/Delta files

Offline Checkpoint Thread

Memory-optimized Table Filegroup

Rang

e 100

-199

Rang

e 200

-299

Rang

e 300

-399

Rang

e 400

-499

Rang

e 500

-

New InsertsDelete 450 TSDelete 250 TSDelete 150 TS

Del Tran2(TS 450)

Del Tran3(TS 250)

Del Tran1(TS150)

Insert into Hekaton T1Log in disk TableDel Tran1

(row TS150)Del Tran2

(row TS 450)Del Tran3

(row TS 250) Insert into T1SQL Transaction log

10

2016-05-10 19join-conference.com© Kalen Delaney, 2016 #19

Logging for Memory-Optimized Tables

• Uses SQL transaction log to store content– Each HK log record contains a log record header followed by

opaque memory optimized-specific log content

• All logging for memory-optimized tables is logical– No log records for physical structure modifications– No index-specific / index-maintenance log records– No UNDO information is logged

2016-05-10 20join-conference.com© Kalen Delaney, 2016 #20

Backup for Memory-Optimized Tables

– Memory-Optimized file group is backed up as part SQL database backup

11

2016-05-10 21join-conference.com© Kalen Delaney, 2016 #21

Recovery for Memory-Optimized Tables

• Analysis Phase• Finds the last completed checkpoint in transaction log

• Data Load• Load from set of data/delta files from the last completed checkpoint

– Parallel Load by reading data/delta files using 1 thread / file

• Redo phase to apply tail of the log– Apply the transaction log from last checkpoint– Concurrent with REDO on disk-based tables

• No UNDO phase for memory-optimized tables– Only committed transactions are logged

2016-05-10 22join-conference.com© Kalen Delaney, 2016 #22

Recovery: Parallel load

Delta map

Recovery Data Loader

DeltaFile1

Memory Optimized Tables

Recovery Data Loader

Recovery Data Loader

Delta mapDelta map

DataFile1

DeltaFile2

DataFile2

DeltaFile3

DataFile3

filter filter filter

Memory Optimized Container - 1 Memory Optimized Container - 2

12

2016-05-10 23join-conference.com© Kalen Delaney, 2016 #23

Summary

• In-memory Data Structures• Managing Memory and Garbage Collection• Persistence and Storage Management• Logging and Recovery

2016-05-10 24join-conference.com© Kalen Delaney, 2016 #24