overview on hardware optimizations for database engines
TRANSCRIPT
![Page 1: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/1.jpg)
Overview on Hardware Optimizations for Database EnginesAnnett Ungethüm, Dirk Habich, Tomas Karnagel, Sebastian Haas, Eric Mier, Gerhard Fettweis, Wolfgang Lehner
BTW 2017, Stuttgart, Germany, 2017-03-09
![Page 2: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/2.jpg)
2
Interaction DB-Engine and Hardware
Applications/Database Engines
Modern Hardware
Well-Known Challenge:Exploit hardware technology by specific data management techniques (indexing, data storage, query & transaction processing)
1970 1980 1990 2000 2010 2020
10
100
1000
10000
1e+05
1e+06
1e+07 memory (KByte)
1970 1980 1990 2000 2010 2020
0
2
4
6
8
10 #cores
Main Memory CPU
![Page 3: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/3.jpg)
3
Era of Dark Silicon
MOORE‘S LAW
§ Number of transistors in a dense integrated circuit doubles approximately every two years.
DARK SILICON
§ We can no longer power the transistors that Moore is giving us
1970 1980 1990 2000 2010 2020
110
1001000
100001e+051e+061e+07 #transistors (x1000)
process (nm)
http://engineering.nyu.edu/garg/node/31
![Page 4: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/4.jpg)
4
HW/SW Co-Design for DB-Engines
Applications/Database Engines
Modern Hardware
Challenge:HW/SW Co-Design for Database EnginesSpecialization of Hardware to overcome Dark Silicon
![Page 5: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/5.jpg)
5
Outline
HARDWARE FOUNDATION
EXTENSIONS FOR PROCESSING ELEMENTS
INTELLIGENT DMA CONTROLLER
![Page 6: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/6.jpg)
6
Hardware Foundation
TOMAHAWK PLATFORM
![Page 7: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/7.jpg)
7
Hardware Foundation – Zoom In
![Page 8: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/8.jpg)
8
Hardware Foundation – Zoom In (2)
CORE MANAGER (CM)
§ Extended Xtensa-LX5 from Tensilica (now Cadence)
§ 32KB for code§ 64KB for data
PROCESSING ELEMENTS (PE)
§ Xtensa-LX5 from Tensilica (now Cadence)§ 32KB for code
§ 2x32KB for data on PE
APPLICATION CORE (APP)
§ 570T core from Tensilica (now Cadence)
Control-Plane
Control-Plane
![Page 9: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/9.jpg)
9
Outline
Control-Plane
Control-Plane
PART I:
EXTENSIONS OF PROCESSING ELEMENTS
![Page 10: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/10.jpg)
10
Development Flow
DEVELOPMENT OF INSTRUCTION SET EXTENSIONS WITH
TENSILICA TOOLS
§ Tensilica Instruction Extension (TIE) language
§ C/TIE compiler§ Cycle accurate simulator/debugger
§ Processor generator
SYNTHESIS OF RTL CODE
§ Synopsys Design Compiler, PrimeTime PX
§ TSMC CMOS LP 65nm libraries
int res= (v0 + v1 + v2) >> shift8;
// shift8 -> internal stateint res=add3_shift(v0, v1, v2);
![Page 11: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/11.jpg)
11
Investigated Database Primitives
Bitmap Compression and Processing (AND,
OR, XOR)
Hashing Sorted Set Operations
WA
H
PLW
AH
CO
MP
AX
Has
h+
Lo
oku
p
Has
h +
Inse
rt
Has
h K
eys
Has
h S
amp
ling
Cit
yHas
h3
2
Me
rge
So
rt
Inte
rse
ctio
n
Un
ion
Dif
fere
nce
Sort
-Me
rge
Jo
in
Sort
-Me
rge
A
gg
reg
atio
n (
SUM
)
Primivites
2014
![Page 12: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/12.jpg)
12
Basic RISC Instruction Set
Application-Specific Instruction Set
Instruction Set
Application-Specific States
Application-Specific Registers
Basic Registers
Register Files
Instructionfetch
Load-Store Unit 0
Load-Store Unit 1
Data Prefetcher
Inte
rcon
ne
ct
Local InstructionMemory
Local Data Memory 0
Local Data Memory 1
Extended Tensilica LX5 Processor
64 bit
128 bit
128 bit
General Approach for all Extensions
![Page 13: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/13.jpg)
13
Bitmap Primitives
BITMAPS ARE A SPECIAL KIND OF INDEX BITMAPS COMPRESSION
§ bit length equals number of tuples
WORD-ALIGNED HYBRID (WAH) CODE
§ Stateless compression§ Run-length-encoding (RLE)
- run of 0‘s and 1‘s
§ WAH bitmaps contain RLE- compressed fills and
- uncompressed literals
Compressing Bitmap Indexes for Faster Search Operations
Kesheng Wu, Ekow J. Otoo and Arie ShoshaniLawrence Berkeley National Laboratory
Berkeley, CA 94720, USAEmail: {kwu, ejotoo, ashoshani}@lbl.gov
Abstract
In this paper, we study the effects of compression onbitmap indexes. The main operations on the bitmaps dur-ing query processing are bitwise logical operations such asAND,OR,NOT, etc.Using the general purpose compres-sion schemes, such as gzip, the logical operations on thecompressed bitmaps are much slower than on the uncom-pressed bitmaps. Specialized compression schemes, likethe byte-aligned bitmap code (BBC), are usually fasterin performing logical operations than the general purposeschemes, but in many cases they are still orders of magni-tude slower than the uncompressed scheme. To make thecompressed bitmap indexes operate more efficiently, wedesigned a CPU-friendly scheme which we refer to as theword-aligned hybrid code (WAH). Tests on both syntheticand real application data show that the new scheme sig-nificantly outperforms well-known compression schemesat a modest increase in storage space. Compared to BBC,a scheme well-known for its operational efficiency, WAHperforms logical operations about 12 times faster and usesonly 60% more space. Compared to the uncompressedscheme, in most test cases WAH is faster while still usingless space. We further verified with additional tests thatthe improvement in logical operation speed translates tosimilar improvement in query processing speed.
1. Introduction
This research was originally motivated by the needto manage the volume of data produce by a high-energy experiment called STAR1 [25, 26]. In this ex-periment, information about each potentially interest-ing collision event is recorded and multi-terabyte (1012)of data is generated each year. One important way ofaccessing the data is to have the data management
1 Information about the project is also available athttp://www.star.bnl.gov/STAR.
bitmap indexOID X =0 =1 =2 =3
1 0 1 0 0 02 1 0 1 0 03 3 0 0 0 14 2 0 0 1 05 3 0 0 0 16 3 0 0 0 17 1 0 1 0 08 3 0 0 0 1
b1 b2 b3 b4
Figure 1. A sample bitmap index.
system retrieve the events satisfying some conditionsuch as “Energy > 15 GeV and 7 <= NumParticles
< 13” [5, 25]. The physicists have identified about 500attributes that are useful for this selection process anda typical condition may involve a handful of attributes.This type of queries are known as the partial rangequeries. Since the attributes are usually read not mod-ified, the characteristics of the dataset are very sim-ilar to those of commercial data warehouses. In datawarehouse applications, one of the best known index-ing strategies for processing the partial range queries isthe bitmap index [6, 8, 21, 30]. For this reason, we haveselected to use the bitmap index for the data manage-ment software [25].
Generally, a bitmap index consists of a set ofbitmaps and queries can be answered using bit-wise logical operations on the bitmaps. Figure 1 showsa set of such bitmaps for the attribute X of a tiny ta-ble (T) consisting of only eight tuples (rows). The at-tribute X can have one of four values, 0, 1, 2 and3. There are four bitmaps each corresponding toone of the four choices. For convenience, we have la-beled the four bit sequences b1, . . . , b4. To process thequery “select * from T where X < 2,” one per-forms the bitwise logical operation b1 OR b2. Since
LBNL-49627
select * from T where X < 2
Table T
Bit-wiseOR
![Page 14: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/14.jpg)
14
Bit-Wise OR on Compressed Bitmaps
40000380 00000000 00000000 001FFFFFb1
40000380 8000002 001FFFFF
Literal 0 fill Literal
7FFFFFFF 7FFFFFFF 7C0001E0 3FE00000b2
WAHb1
C0000002 7C0001E0 3FE00000
1 fill Literal Literal
WAHb2
Bit-wise OR
32 bit wordsIn hex
OR OR OR OR
Logical operations (AND, OR, XOR) on two compressed bitmaps
1) Load WAH word(s)2) Calculate output (Fill-Fill,
Literal-Fill, Literal-Literal)3) Combine output
10<runlength>
11<runlength>
...
...
7FFFFFFF
00000000
![Page 15: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/15.jpg)
15
C-Code
WHILE(XIDX!=XSIZE && YIDX!=YSIZE) {
//new X or Y? Calculate new fill count …
if(XisFill==1 && YisFill==1) { //2 fills
if(XfillWords<YfillWords)
min=XfillWords;
else
min=YfillWords;
writeFill(comprResultBI,&Zidx,X[Xidx]|Y[Yidx],min);
XfillWords-=min;
YfillWords-=min;
}
else if((XisFill==1 && YisFill==0) || (XisFill==0 && YisFill==1)) {
if(XisFill==1){
XfillWords--;
if((X[Xidx]&0xC0000000)==0xC0000000) writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
else { comprResultBI[Zidx]=Y[Yidx]; Zidx++; }
}
if(YisFill==1){
YfillWords--;
if((Y[Yidx]&0xC0000000)==0xC0000000)
writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
else {comprResultBI[Zidx]=X[Xidx]; Zidx++; }
}
}
else {
result=X[Xidx]|Y[Yidx];
if((result&0x7FFFFFFF)==0x7FFFFFFF) writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
else if((result&0x7FFFFFFF)==0) writeFill(comprResultBI, &Zidx, 0x80000000, 1);
else { comprResultBI[Zidx]=X[Xidx]|Y[Yidx]; Zidx++; }
}
}
Fill-Fill
Literal-Fill
Literal-Literal
![Page 16: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/16.jpg)
16
Processing with PE Extension
Application specific states Preprocessing
Operation
Postprocessing
Application specific states
Initial Load Load Prepare Store Store
Memory 0
Memory 1
Memory 0
Memory 1
0000000F
00000003
40000380
80000002
001FFFFF
C0000002
7C0001E0
3FE00000
MEMORY
0
MEMORY
1
10000000..11000001..00101010..0111011.. 11000000..00101010..11000001..00110111..
10000000..11000001..00101010..01110111..
Is word fill or Literal? -> fill -> overwrite input words
11111111..11111111..11111111..11111111..
00000000..00000000..00000000..
11000000..00101010..11000001..0011011..00000000.. v 11111111... => 111111..
Write to output stream-> append or overwrite previous wordwith increased fill counter
00000000.0000000..00000..110011010..
Buffer result
11001110..
00000000..
00000000..
00000000..
MEMORY
0/1
Proceedtonext word(4x)
Align to 128-bit lines
Perform operation OR
ldXstream()
ldYstream()
4 x WAHinst()
![Page 17: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/17.jpg)
17
Bit-Wise OR on Compressed Bitmaps
40000380 00000000 00000000 001FFFFFb1
40000380 8000002 001FFFFF
Literal 0 fill Literal
7FFFFFFF 7FFFFFFF 7C0001E0 3FE00000b2
WAHb1
C0000002 7C0001E0 3FE00000
1 fill Literal Literal
WAHb2
Bit-wise OR
32 bit wordsIn hex
OR OR OR OR
Code with Extension
do{ ldXstream();ldYstream();WAHinst(); WAHinst(); WAHinst();
} while(WAHinst());
![Page 18: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/18.jpg)
18
Many More Extensions
Bitmap Compression andProcessing (AND,
OR, XOR)
Hashing Sorted Set OperationsW
AH
PLW
AH
CO
MP
AX
Has
h+
Lo
oku
p
Has
h +
Inse
rt
Has
h K
eys
Has
h S
amp
ling
Cit
yHas
h3
2
Me
rge
Sort
Inte
rse
ctio
n
Un
ion
Dif
fere
nce
Sort
-Me
rge
Join
Sort
-Me
rge
Ag
gre
gat
ion
(SU
M)
BitiX X X X
HASHI X X X X X
Titan3D X X X X X X X
Tomahawk DBA
X X X X X X
Processor
Extension
![Page 19: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/19.jpg)
19
Evaluation
REFERENCE PROCESSORS
§ Tomahawk DBA Processor --> Set of different DB-Extensions for WAH-Compression, Hashing, andSortes-Set Operations
Processor DescriptionTechnology
[nm]Atotal [mm²] fMAX [GHz]
PMAX [W] @ fMAX
Tomahawkwithout DBA
Basic Xtensa LX5 without instruction set extensions, 1 LSU, 32-bit memory interface
28 15.92 0.555 0.7
Tomahawk with DBA
Set of different DB-Extensions for WAH-Compression, Hashing and Sorted-Set Operations
28 18 0.5 0.753
Intel i7-6500ULow-power Intel 2-core processor based on Skylake architecture, 4MB L3 cache
14 99* 3.1 25
Comparison
![Page 20: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/20.jpg)
20
Evaluation - Bitmaps
![Page 21: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/21.jpg)
21
Outline
PART 2:
INTELLIGENT DMA CONTROLLER
![Page 22: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/22.jpg)
22
Problem Statement
22
NoC
T2RISCCore
T2RISCCore
T2RISCCore
APP CM
Memory
MicronDDR2SDRAM
LocalMemoryLocalMemoryLocalMemory
MemoryControllerSynopsysDWC
DDR2
APPTensilica570T
CMLX4-ISA_E
LocalMemoryCache
tAN
tNMc
0 0xCCA
1 0x00B
2 0x0FA
3 0x1FD
4 0xDE1
5 0x0ED
6 0x00E
7 0xD0A
tNAtMcN
tMcM
tMMc
tAPP
Problem:Many round-trips for key lookups
Approach:“Teach B-trees to the memory controller“
![Page 23: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/23.jpg)
23
Intelligent Main Memory Controller (iDMA)
23
NoC
Core Core Core
PointerChaser
APP CM
Memory
MicronDDR2SDRAM
LocalMemoryLocalMemoryLocalMemory
MemoryController
APP CM
MemoryControllerSynopsis
LocalMemoryCache
MemoryControllerSynopsysDWC
DDR2
0 0xCC6
1 0x000
2 0x0F0
3 0x1FD
4 0xDE1
5 0x0ED
6 0x00E
7 0xD0A
tCNtNC tMcM
tMMc
tNP
tPN
tPMc
tMcP
Vision (and first simulations)• Intelligent memory controller• Is aware of the semantics of
memory layout• Implements core operations (e.g. lookup)
Implementation (no yet in silicon)• 0,183mm² PE with 200Mhz
![Page 24: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/24.jpg)
24
First iDMA Design
![Page 25: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/25.jpg)
25
Evaluation using Simulator
![Page 26: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/26.jpg)
26
Summary
HARDWARE FOUNDATION
EXTENSIONS FOR PROCESSING ELEMENTS
INTELLIGENT DMA CONTROLLER
![Page 27: Overview on Hardware Optimizations for Database Engines](https://reader030.vdocuments.us/reader030/viewer/2022040419/62495287f9a53d55430dbc68/html5/thumbnails/27.jpg)
Overview on Hardware Optimizations for Database EnginesAnnett Ungethüm, Dirk Habich, Tomas Karnagel, Sebastian Haas, Eric Mier, Gerhard Fettweis, Wolfgang Lehner
BTW 2017, Stuttgart, Germany, 2017-03-09