cloud computing lecture column store – alternative organization for big relational data
TRANSCRIPT
![Page 1: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/1.jpg)
Cloud Computing Lecture
Column Store – alternative organization for big relational data
![Page 2: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/2.jpg)
C-store C-store is Read-optimized, for OLAP
type apps Traditional DBMS, write-optimized
(optimized for online transactions) Based on records(rows)
![Page 3: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/3.jpg)
C-Store What are the cost-sensitive major
factors in query processing? Size of database Index or not Join
Current hardware configuration and what a DBMS can do… Cheap storage – allow distributed redundant
data store Fast CPUs – compression/decompression Limited disk bandwidth – reduce I/O
![Page 4: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/4.jpg)
C-store Supporting OLAP (online analytic
processing) operations Optimized read operations Balanced write performance Address the conflict between writes and reads
Fast write – append records Fast read – indexed, compressed
Think if data organized in columns, what are the
unique challenges (different from the row-organization)?
![Page 5: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/5.jpg)
C-store’s features Column based store saves space
Compression is possible Index size is smaller
Multiple projections Allow multiple indices Parallel processing on the same attributes Materialized join results
Separation of writeable store and read-optimized store Both write/read are optimized Transactions are not blocked by write locks
![Page 6: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/6.jpg)
Data model Same as relational data model
Tables, rows, columns Primary keys and foreign keys Projections
From single table Multiple joined tables
Example
EMP1 (name, age)EMP2 (dept, age, DEPT.floor)EMP3 (name, salary)DEPT1(dname, floor)
EMP(name, age, dept, salary)DEPT(dname, floor)
Normal relational model Possible C-store model
![Page 7: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/7.jpg)
Physical projection organization Sort key
each projection has one Rows are ordered by sort key Partitioned by key range
Linking columns in the same projection Storage key – (segment id, key, i.e.,offset in
segment)
Linking projections To reconstruct a table Join index
![Page 8: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/8.jpg)
Conceptual organization
column
Segment:by sort keyrange
Sort key column
Seg id offset
Join index
Projection 1
Projection 2
![Page 9: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/9.jpg)
Architectural consideration between writes and reads
Read often needs indices to speedup
Write often index unfriendly: needs to update indices frequently
Use “read store” and “write store”
![Page 10: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/10.jpg)
Read store: Column encoding Use compression schemes and indices
Self-order (key), few distinct values (value, position, # items) Indexed by clustered B-tree
Foreign-order (non-key), few distinct values (value, bitmap index) B-tree index: position values
Self-order, many distinct values Delta from the previous value B-tree index
Foreign-order, many distinct values Unencoded
![Page 11: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/11.jpg)
Write Store Same structure, but explicitly use
(segment, key) to identify records Easier to maintain the mapping Only concerns the inserted records
Tuple mover Copies batch of records to RS
Delete record Mark it on RS Purged by tuple mover
![Page 12: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/12.jpg)
Tuple mover Moves records in WS to RS Happens between read-only
transactions Use merge-out process
![Page 13: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/13.jpg)
How to solve read/write conflict Situation: one transaction updates the
record X, while another transaction reads X.
Use snapshot isolation
![Page 14: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/14.jpg)
Benefits in query processing Selection – has more indices to use Projection – some “projections”
already defined Join – some projections are
materialized joins Aggregations – works on required
columns only
![Page 15: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/15.jpg)
Evaluation Use TPC-H – decision support queries Storage
![Page 16: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/16.jpg)
Query performance
![Page 17: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/17.jpg)
Query performance Row store uses materialized views
![Page 18: Cloud Computing Lecture Column Store – alternative organization for big relational data](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e205503460f94b0b842/html5/thumbnails/18.jpg)
Summary: the performance gain Column representation – avoids reads
of unused attributes Storing overlapping projections –
multiple orderings of a column, more choices for query optimization
Compression of data – more orderings of a column in the same amount of space
Query operators operate on compressed representation