fast gist index build

Upload: alexander-korotkov

Post on 04-Jun-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/14/2019 Fast GiST Index Build

    1/28

    Fast GiST index build

    Alexander KorotkovPostgreSQL Conference Europe !""# A$sterda$

  • 8/14/2019 Fast GiST Index Build

    2/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""

    %rdinal GiST index build

    We have to insert index tuples one by one

    I1 I2

    I3 I4 I5 I6

    L1 L2 L9 L3L14 L5 L6 L16L7 L8 L10

    L9, L10, L11, L12,

    L13, L14, L15, L16

    L11 L12L15 L13L4

  • 8/14/2019 Fast GiST Index Build

    3/28

    &'en so$et'ing go (rong (it' ordinalGiST index build algorit'$)

  • 8/14/2019 Fast GiST Index Build

    4/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""*

    +ndex too large to fit into cac'e

    Significant fraction of index tuple inserts causerandom IO

    I1 I2

    I3 I4 I5 I6

    L1 L2 L9 L3 L5 L6 L7 L8 L10

    L9, L10,L11, L12,

    L13, L14, L15, L16

    L11L4

    Cac'e

    L14 L16L12L15 L13

    ,ando$ read

  • 8/14/2019 Fast GiST Index Build

    5/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""-

    .ig' concurrenc/

    If even index can fit to entire cache, otherbackends also use cache. So, even not toolarge index can be out of cache.

    I1 I2

    I3 I4 I5 I6

    L1 L2 L3 L5 L6 L7 L8L4

    I1 I2

    L1 L2 L3 L4 L5 L6

    L4 L1 L2 L6 L3L5

    GiST index

    %t'er useful stuff

    Cac'e

  • 8/14/2019 Fast GiST Index Build

    6/28

    &'at 'elps to orginal GiST index buildalgorit'$)

  • 8/14/2019 Fast GiST Index Build

    7/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""0

    .ig' overlap

    gistc'ooseselects the first path with zeropenalty if any!.

    I1

    I2I3

    I4

    I5

    I6

    L9L1

    L2

    L3

    L4

    L5

    L6

    L7

    L8

    L16

    L15

    L14

    L13

    L12

    L11

    L10

  • 8/14/2019 Fast GiST Index Build

    8/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""1

    %rdered datasets

    Ordered dataset cause inserts to be in the recentlyaccessed part of index. "hat#s very good forcaching.

    I1 I2

    I3 I4 I5 I6

    L1 L2 L9 L3L10 L5 L6 L16L7 L8 L15

    L9, L10, L11, L12,

    L13, L14, L15, L16

    L11 L13L12 L14L4

  • 8/14/2019 Fast GiST Index Build

    9/28

    T'e buffering G+ST index build tec'ni2ue

  • 8/14/2019 Fast GiST Index Build

    10/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""!

    General idea

    $emember index tuples which are going tospecific part of tree, and then process pack ofthem later.

    I1 I2

    I3 I4 I5 I6

    L1 L2 L9 L3L14 L5 L6 L16L7 L8 L10

    L9, L11, L14, L15

    L11 L12L15 L13L4

    L10, L12, L13, L16

  • 8/14/2019 Fast GiST Index Build

    11/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""""

    3uffering algorit'$

    %uffering algorithm is a recursive application ofthat idea.

    &age %uffer

    &age &age

    &age %uffer &age %uffer

    &age &age&age &age

    &age %uffer &age %uffer &age %uffer &age %uffer

    &age &age &age &age &age &age &age &age

    '........

    '...........

    '..................

    '.'.

    &age

    &age &age &age &age &age&age&age&age'...................

    level

    step

    levelstep

    levelstep

  • 8/14/2019 Fast GiST Index Build

    12/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""

    3uffer e$pt/ing

    (owest level overflowed buffer can be emptied toleaf pages.

    I1 I2

    I3 I4 I5 I6

    L1 L2 L9 L3L14 L5 L6 L16L7 L8 L10

    L9, L10, L11, L12, L13, L14, L15, L16

    L11 L12L15 L13L4

  • 8/14/2019 Fast GiST Index Build

    13/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""4

    3uffer e$pt/ing

    )igher level overflowed buffers can be emptiedto lower level buffers.

    I1 I2

    I3 I4 I5 I6

    I7 I8

    L9

    I9

    L14

    I11 I12

    L16

    I13 I14

    L10

    L9, L10, L11, L12, L13, L14, L15, L16

    L11 L12L15 L13

    I10

  • 8/14/2019 Fast GiST Index Build

    14/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""*

    Page splitting

    When page is splitting, attached buffer is splittingtoo.

    I1 I2

    I1 I2 L9, L14, L11, L15

    L9, L10, L11, L12, L13, L14, L15, L16I3 I4

    I4, I5

    I3 I4 I5 L12, L13, L10, L16

  • 8/14/2019 Fast GiST Index Build

    15/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""-

    Final buffers e$pt/ing

    When all the tuples are inserted, final buffersemptying stage is starting. *ll even non+overflowed buffers are emptying in up+to+down

    manner.

  • 8/14/2019 Fast GiST Index Build

    16/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""5

    3uffer si6e and level step selection

    Subtree of level step height should fits tocache. "herefore, operations inside subtree areIO efficient.

    %uffer size should be comparable with size ofsubtree. "hus, IO would be comparable withsize of inserted data.

  • 8/14/2019 Fast GiST Index Build

    17/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""0

    7arlena data

    (evel step and buffer size parameters aredepending on index tuple size.

    (evel step is determined for worst case.

    Subtree should fits to cache, even if all varlenatuples are of minimal size.

    %uffer size is in runtime tuning based on

    average size of inserted index tuples.

  • 8/14/2019 Fast GiST Index Build

    18/28

    Anal/sis

  • 8/14/2019 Fast GiST Index Build

    19/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""8

    &'en does buffering 'elp)

    Somebody may be dissapointed, but bufferinghelps only if bottleneck of index build is IO itcould be rather dramatic help!.

    iS" is also -& expensive in comparison with,for example, %tree, because many penalty andconsistent calls it have to do that calls for eachindex tuple in page which is in use!. %ufferingdoesn#t do any help with that.

  • 8/14/2019 Fast GiST Index Build

    20/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""!

    9ode splitting algorit'$ tradeoff

    With buffering build we can now be sure thateven large index tree with low overlaps doesn#tre/uire enormous time to construct.

    With buffering build, it#s a good time todescrease overlaps by new node splittingalgorithm.

  • 8/14/2019 Fast GiST Index Build

    21/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !"""

    :ouble sorting node splitting

    Is based on more comprehensive considerationof splits along axis.

    )as complexity On0logn!!, n 1 tuples count.

    On large datasets it shows much better index/uality, because of less overlap page accessescan be less in times or even dozens of times2!

  • 8/14/2019 Fast GiST Index Build

    22/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""

    +% vs CP; tradeoff

    %uffering techni/ue is optimizing IO, but itcosts additional -& load.

    When index fits to cache, buffering is 3ust

    waste of -&.

  • 8/14/2019 Fast GiST Index Build

    23/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""4

    +% vs CP; tradeoff

    4buffering4 parameter of iS" index

    on 1 try to use buffering anyway if haveenough of $*5!

    off 1 don#t use buffering anyway

    auto default! 1 try to switch to buffering whenindex size exceeds effective6cache6size

  • 8/14/2019 Fast GiST Index Build

    24/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""*

    Testing

    Dataset Split method Build method Actual time Search time

    Uniform New linear regular 17 h 39 m 1

    buffering 3 h 23 m 0.90

    Double sorting regular 9 d 10 h 0.089

    buffering 4 h 11 m 0.089

    USN!2"ordered#

    New linear regular $% m 1

    buffering 1 h 27 m 0.4%

    Double sorting regular 4$ m 0.37

    buffering 1 h 29 m 0.3%

    USN!2"shuffled#

    New linear regular 10 h 12 m 1

    buffering 3 h 1% m 0.89

    Double sorting regular 8 d 20 h 0.072

    buffering 4 h 20 m 0.0%7

    "est setup with 7b of $*5 builds index on 8995 of row.

    l i

  • 8/14/2019 Fast GiST Index Build

    25/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""-

    Anal/sis

    %uffering techini/ue:

    *ccelerate build of non+overlapping trees onshuffed data in dozens of times2

    5ight accelerate build of even high+overlappingtrees depending of OS cache strategy!

    *dds some slowdown to index build on well+

    ordered datasets.

    F t k

  • 8/14/2019 Fast GiST Index Build

    26/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""5

    Future (ork

    Improve automatic switching to buffering buildmode detect concurrent load and ordereddatasets!

    ;ecrease -& usage

  • 8/14/2019 Fast GiST Index Build

    27/28

    Fast GiST index build# Alexander Korotkov# PostgreSQL Conference Europe !""0

    Ackno(ledge$ent

    Oleg and "eodor for giving direction andadvices

    )eikki (innakages for mentoring and his work

    on this patch So- for funding of this pro3ect

  • 8/14/2019 Fast GiST Index Build

    28/28

    T'ank /ou for attention