select operation- disk access and indexing

59
Select Operation- disk access and Indexing *Some info on slides from Dr. S. Son, U. Va

Upload: tess98

Post on 13-Jan-2015

532 views

Category:

Documents


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Select Operation- disk access and Indexing

Select Operation- disk access

and Indexing

*Some info on slides from Dr. S. Son, U. Va

Page 2: Select Operation- disk access and Indexing

Disk access

• DBs traditionally stored on disk

• Cheaper to store on disk than in memory

• Costs for:– Seek time, latency, data transfer time

•  Disk access is page (block) oriented

• 2 - 4 KB page size

Page 3: Select Operation- disk access and Indexing

Access time

• Access time is the time to randomly access a page

• System initially determines if page in memory buffer (page tables, etc.)

• Large disparity between disk access and memory access

Page 4: Select Operation- disk access and Indexing

Select operation using table scan

• If read the entire table for a select – table scan

• Improvements to table scan of disk:– Parallel access– Sequential prefetch

Page 5: Select Operation- disk access and Indexing

Parallel access

• Linear search - all data rows read in from disk – I/O parallelism can be used (Raid)

• multiple I/O read requests satisfied at the same time

• stripe the data across different disks       

– Problems with parallelism?• must balance disk arm load to gain maximum

parallelism

• requires the same total number of random I/O's, but using devices for a shorter time

Page 6: Select Operation- disk access and Indexing

Sequential prefetch I/O

• Retrieve one disk page after another (on same track) - typically 32

• Seek time no longer a problem

• Must know in advance to read 32 successive pages

• Speed up of I/O by a factor of ≈10 (500 I/O's per second vs. 70)

Page 7: Select Operation- disk access and Indexing

Access time

• Seek time – average 8-10 ms, as low as 4 ms server

• Latency time – 2-4 ms, as low as 1 or less

• Data transfer time – .4-2 ms

Page 8: Select Operation- disk access and Indexing

Access time

RIO            Seq. Prefetch .010             .010                    Seek - disk arm to cylinder .002             .002                    Latency - platter to sector .0015           .048                 Data transfer - Page .0135           .060                   1 page vs. 32 pages

.43* seconds vs  .060 seconds 32 pages for both

* .00135X32=.43

Page 9: Select Operation- disk access and Indexing

Access time for fast I/O

RIO            Seq. Prefetch .004             .004                    Seek - disk arm to cylinder .001             .001                    Latency - platter to sector .0005           .016                Data transfer - Page .0055           .021                   1 page vs. 32 pages

.176* seconds  .021 seconds 32 pages for both

* .0055X32=.176

Page 10: Select Operation- disk access and Indexing

Organizing disk space

• How to store data so minimize access time if read the entire table?

Page 11: Select Operation- disk access and Indexing

Disk allocation

• Disk Resource Allocation for Databases (DBA has control)

• Goal – contiguous sectors on disk - want data as close together as possible  to minimize seek time

• No standard SQL approach, but general way to deal with allocation

• Some OS allow specification of size of file and disk device

Page 12: Select Operation- disk access and Indexing

Types of Files• Heap files (unordered – sequential)• Sorted files (ordered – sort key)• Hash files (hash key, hash function)

– Internal, external, file expansion– B+-trees

• Raid technology (parallelizing)• Storage area networks – ERP (enterprise resource

planning) and DW (data warehouses)– Storage devices configured as nodes in network – can

attach/detach

Page 13: Select Operation- disk access and Indexing

Tablespace

Tablespace is:• Allocation medium for tables and indexes for

ORACLE, DB2, etc.• Can put >1 table in a table space if accessed

together • Tablespace corresponds to 1 or more OS files

and can span disk devices• Usually relations cannot span disk devices

Page 14: Select Operation- disk access and Indexing

DB storage structures

DB CAP DatabaseTable- tspace 1

system

space

OS files fname1 fname2 fname3 Tables Cust agents prods orders orindx

Segments data data data data index

Extents

Page 15: Select Operation- disk access and Indexing

Tablespace

• ORACLE DB's contain several tablespaces, including one called system -     data description +  indexes + user-defined tables

• default tablespace given to each user • if multiple tablespaces - better control over load

balancing • can take some disk space off-line

Page 16: Select Operation- disk access and Indexing

Extent• Relation composed of 1 or more extents

• Extent - contiguous storage on disk • when data segment or index segment first

created, given an initial extent from tablespace 10KB (5 pages)

• if need more space given next contiguous extent

Page 17: Select Operation- disk access and Indexing

Extent

• Can increase the size by a positive % (cannot decrease) – initial n - size of initial extent – next n - size of next – max extents - maximum number of extents – min extents - number of extents initially

allocated – pct increase n - % by which next extent

grows over previous one

Page 18: Select Operation- disk access and Indexing

Oracle create tablespace

• http://www.adp-gmbh.ch/ora/sql/create_tablespace.html

Page 19: Select Operation- disk access and Indexing

Create table

• Create table statement - can specify tablespace, no. of extents– When initial extent full, new extent allocated

– pctfree - determine how much space in a page can be used for inserts of new rows

• if pctfree =10%, inserts stop when page is 90% full» Uses another page

– pctused – determines when new inserts start again • if fall below certain percentage of total, default pctused = 40%

                 pctfree + pctused < 100

For more info: http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7002.htm

Page 20: Select Operation- disk access and Indexing

Rows

• Row layout on each disk page

1 2 3… N Row N Row N-1 … Row 1Header info Row directory free space data rows

• Header - • Row directory – row number and page byte offset

– Row number is row number in page – book calls it slot#• Page byte offset – with varchar, row size not constant

• To identify a particular row use RID (RowID) – page #, slot # [file#]

slot# is number in row directory (logical #)

Page 21: Select Operation- disk access and Indexing

Differences in DBMSs re: rows

• ROWID can be retrieved in ORACLE but not DB2 (violates relational model rule)

• ORACLE • rows can be split between pages (row record

fragmentation) • Can have rows from multiple tables on same page,

more info

• DB2, no splitting, entire row moved to new page, need forwarding pointer

Page 22: Select Operation- disk access and Indexing

Select operation using Indexes

• Alternative to table scan

Page 23: Select Operation- disk access and Indexing

Binary Search

• “Find all students with gpa > 3.0”– If data is in sorted file, do binary search to find

first such student, then scan to find others.– Cost of binary search can be quite high.

Page 24: Select Operation- disk access and Indexing

Binary Search

• Binary search on disk – optimal for comparisons - not optimal for

disk-based look-up – must keep data in order – may be reading values from same page at

different times

Page 25: Select Operation- disk access and Indexing

Indexing

• Instead: Create an `index’ file• Keyed access retrieval method • index is a sorted file - sorted by index key • index entries:

index key pointer  (RID)

   • pointer is RID   • index resides on disk, partially memory resident when

accessed

Page 26: Select Operation- disk access and Indexing

Index File

Page 1 Page 2 Page NPage 3 Data File

k2 kNk1 Index File

Page 27: Select Operation- disk access and Indexing

Tree-based index

• B-tree – balanced tree

• Nodes point to data (RIDs) and also point to other nodes in tree

Page 28: Select Operation- disk access and Indexing

B+-tree

• Most commonly used index structure type in DBs today • Based on B-tree• Good for equality and range searches• B+ tree: dynamic, adjusts gracefully under inserts and

deletes.• Used to minimize disk I/O • available in DB2, ORACLE also has hash cluster, Ingres

has heap structure, B-tree, isam (chain together new nodes)

Page 29: Select Operation- disk access and Indexing

Structure of B+ Trees

• leaf level pointers to data (RIDs)

• the remaining are directory (index) nodes that point to other index nodes Fig.

Index Entries

Data Entries("Sequence set")

(Direct search)

Page 30: Select Operation- disk access and Indexing

Characteristics of B+ Tree

• Insert/delete at log F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages)

• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.

• Supports equality and range-searches efficiently

Page 31: Select Operation- disk access and Indexing

Cost of I/O for B+-tree

• Assume number of entries in each index node fits on one page - one node is one page

• If tree with depth of 3, 3 I/Os to get pointer to data B+-tree structured to get most out of every disk page read

• Read in index node, can make multiple probes to same page if remains in memory

– likely since frequent access to upper -level nodes of actively used B+-trees

Page 32: Select Operation- disk access and Indexing

B+ Trees in Practice

• Typical order: 100. • Typical fill-factor: 2/3 full (66.6%)

– average fanout = 133

• Typical capacities:– Height 4: 1334 = 312,900,700 records– Height 3: 1333 = 2,352,637 records

• Can often hold top levels in buffer pool:– Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytes

Page 33: Select Operation- disk access and Indexing

B+-tree

• B+ tree has a directory structure that allows retrieval of a range of values efficiently – search for leftmost index entry Si such that

X <= Si

• Index entries always placed in sequence by value - can use sequential prefetch on index

• Index entries shorter than data rows and require proportionately less I/O

Page 34: Select Operation- disk access and Indexing

B+-tree

• Balancing of B+-trees - insert, delete • Nodes usually not full • utilities to reorganize to lower disk I/O • Most systems allow nodes to become

depopulated- no automatic algorithm to balance

• Average node below root level 71% full in active growing B+-trees

• Insert/delete

Page 35: Select Operation- disk access and Indexing

Inserting into B+ Tree

• Find correct leaf L. • Put data entry onto L.

– If L has enough space, done!– Else, must split L (into L and a new node L2)

• Redistribute entries evenly, copy up middle key.• Insert index entry pointing to L2 into parent of L.

• This can happen recursively– To split index node, redistribute entries evenly, but

push up middle key. (Contrast with leaf splits.)• Splits “grow” tree; root split increases height.

– Tree growth: gets wider or one level taller at top.– Algorithm

Page 36: Select Operation- disk access and Indexing

Deleting from B+ tree

• Start at root, find leaf L where entry belongs.• Remove the entry.

– If L is at least half-full, done! – If L has only d-1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• If re-distribution fails, merge L and sibling.

• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

• Merge could propagate to root, decreasing height.• Algorithm

Page 37: Select Operation- disk access and Indexing

Duplicate key values

• Duplicate key values in index • leaf nodes have sibling pointers • but a delete of a row that has a heavily

duplicated key entails a long search through the leaf-level of the B+-tree

• Index compression - with multiple duplicates

| header info | PrX keyval RID RID ... RID | PrX keyval RID…RID|

where PrX is count of RID values

Page 38: Select Operation- disk access and Indexing

Create Index

   Options:         multiple columns

        tablespace         storage - initial extents, etc.         percent free default = 10

% of each page left unfilled free page (1 free page for every n

index pages)     Can control % of B+-tree node pages left

unfilled when index created, refers to initial creation

Page 39: Select Operation- disk access and Indexing

39

   Why use an index? 

• If use a select (or join) on the same attribute frequently

• want a way to improve performance - use indexes– For example:

Select from Employee

where ssn = 333445555

• Instead of reading the entire file until ssn is found, it would be nice if we had a pointer to that employee

Page 40: Select Operation- disk access and Indexing

40

Types of indexes (textbook)

• Primary index - key field is a candidate key (must be unique) – data file ordered by key field

• Clustering index - key field is not unique, data file is ordered – all records with same values on same pages

• Secondary index - non-clustering index – data file not ordered– First record in the data page (or block) is called the

anchor record• Non-dense index - pointer in index entry points to anchor• Dense index - pointer to every record in the file

Page 41: Select Operation- disk access and Indexing

41

Non-clustered indexes

• Non-clustered index (secondary index)– key field is a non ordering field - it is not used

to physically order the data file– the index itself is still ordered– How many non-clustering indexes can a table

have?

Page 42: Select Operation- disk access and Indexing

Clustered Indexes

• Placing rows on disk in order by some common index key value        (remember the index itself is always sorted)

– Clustered index - (primary and clustering)– key field is an ordering field - all the data with the

same values for the key field physically placed on the same pages on the disk.

– If primary key, data ordered on a page by key field– Usually assume disk pages themselves also clustered

on the disk – How many clustering indexes can a table have?

Page 43: Select Operation- disk access and Indexing

Clustering

• Efficiency advantage        read in a page, get all of the rows with

the same value • clustering is useful for range queries

        e.g.  between keyval1 and keyval2

Page 44: Select Operation- disk access and Indexing

Example

• http://www.dba-oracle.com/oracle_tip_hash_index_cluster_table.htm

Page 45: Select Operation- disk access and Indexing

Clustering

• Can only cluster table by 1 clustering index at a time • In SQL server

– creates clustered index on PK automatically if no other clustered index on table and PK nonclustered index not specified

• In DB2 – – if the table is empty, rows sorted as placed on disk – subsequent insertions not clustered, must use REORG

• In Oracle-– Cluster index – now available for PK in 10g– Define a cluster to create cluster index for 2 tables

Page 46: Select Operation- disk access and Indexing

Indexes vs. table scan

• To illustrate the difference between table scan, secondary index (non clustered) and clustered index Assume 10 M customers, 200 cities2KB/page, row = 100 bytes, 20 rows/page             Select *

            From Customers             Where city = Birmingham

1/200 * 10M if assume selectivity = 1/200 50,000 customers in a city

Page 47: Select Operation- disk access and Indexing

Rules of Thumb for I/O

• Random I/O – 160 pages/second, .00625

• Sequential prefetch I/O – 1600 pages/second, .000625

Will discuss later:

• List prefetch I/O – 400 pages/second, .0025

Page 48: Select Operation- disk access and Indexing

Table Scan

Table Scan - read entire table

If used an random I/O 10,000,000/20 = 500,000 pages 

500,000*R = 3125

Instead, it makes more sense to use:sequential prefetch read 32 pages at a time

500,000*S = 312

Page 49: Select Operation- disk access and Indexing

Clustering Index

Clustering Index –

All entries for B'ham clustered on same pages 50,000/20 = 2500 pages (with 20 rows per page) Assume: 3 upper nodes of the tree   Assume: 1000 index entries per leaf node, read

50000/1000 index pages

 

(3 + 50 + 2500) * ? = 2,553 * ?

If assume ?=R, then 2,553*R=16

Makes more sense to assume (3+50+2500) * S = 1.6

Page 50: Select Operation- disk access and Indexing

Secondary Index

Secondary Index– In the worst case 1 entry for B'ham per page 50,000 pages (10M/200)3 upper nodes of the tree   Assume 1000 index entries per leaf node, read

50000/1000 index pages

(3 + 50 + 50,000)*? = 50, 053 * ?

If assume ?=R then 50,053*R=312.8Better to assume (3+50)*S + 50,000*R=312.53

Page 51: Select Operation- disk access and Indexing

List Prefetch

Create list of data pages to access

Pages not necessarily in contiguous sequential order

system orders pages to minimize disk I/O

E.g. elevator algorithm for disk request scheduling

50, 053 * L = 125.1

Best to assume (3+50)*S+50,000*L=125.03

Page 52: Select Operation- disk access and Indexing

% Free

• Redo the previous calculations assuming relations created with 50% free option specified.

Page 53: Select Operation- disk access and Indexing

Creating Indexes

When determining what indexes to create consider: workload - mix of queries and frequencies of requests             20% of requests are updates, etc.

            can create lots of indexes but:                 cost to create                 insertions                 initial load time high if a large table                 index entries can become longer and

longer as multiple columns included

Page 54: Select Operation- disk access and Indexing

Multiple Indexes

• More than one index on a relation             – e.g. class - one index, gender - one index

Page 55: Select Operation- disk access and Indexing

Composite Index

• One index based on more than one attribute  Create Index index_name on Table (col1, col2,... coln)

•    Composite index entry - values for each attribute             class, gender             entry in index is:  C1, C2, RID

• What would B+ tree look like?

Page 56: Select Operation- disk access and Indexing

Threads• Thread results from a fork of a computer program, usually contained

inside a process– Multiple threads inside same process, share resources, address

space and memory– Processes do not share these resources– Thread have their own stack, copy of registers, PC and local

thread storage• Some languages support multiple threads, but do not execute at the

same time– Kernel threads can run concurrently

Page 57: Select Operation- disk access and Indexing

Parallel computing• Form of computation in which many calculations carried out

simultaneously– Divide large problem into smaller ones– data, instruction level and task parallelism– SISD, SIMD, MISD, MIMD

• Dominant paradigm in the form of multicore processors• Parallel computer – shared or distributed memory• Parallel program difficult to write due to

– Software bugs, race conditions– Communication and synchronization

• Multiple processing elements working concurrently– Single computer with multiple processors, networked computers,

special hardware, etc.

Page 58: Select Operation- disk access and Indexing

• Multithreading– Model to allow multiple threads within single process– Can execute in parallel on multiprocessor system

• Process, kernel thread, user thread, fiber (cooperatively scheduled, can run in any thread in the same process)

• Subtasks in a parallel program are called threads– Lightweight version of threads – fibers– Bigger versions – processes

Page 59: Select Operation- disk access and Indexing

• Parallel computing – model of computation– Can utilize processes, multithreading to

implement