data storage formats - stanford universityplacing data for efficient access locality:which items are...
TRANSCRIPT
![Page 2: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/2.jpg)
Outline
Overview
Record encoding
Collection storage
Indexes
CS 245 2
![Page 3: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/3.jpg)
Outline
Overview
Record encoding
Collection storage
Indexes
CS 245 3
![Page 4: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/4.jpg)
Overview
Recall from last time: I/O slow compared to compute, random I/O ≪ sequential
Key concerns in storage:» Access time: minimize # of random accesses,
bytes transferred, etc• Main way: place co-accessed data together!
» Size: storage costs $» Ease of updates
CS 245 4
![Page 5: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/5.jpg)
General SetupRecord collection
Index
Secondaryindex
…
CS 245 5
![Page 6: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/6.jpg)
Outline
Overview
Record encoding
Collection storage
Indexes
CS 245 6
![Page 7: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/7.jpg)
What Are the Data Items We Want to Store?a salary
a name
a date
a picture
CS 245 7
![Page 8: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/8.jpg)
What Are the Data Items We Want to Store?a salary
a name
a date
a picture
What we have available: bytes
8bits
CS 245 8
![Page 9: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/9.jpg)
To Represent:
Integer (short): 2 bytes
e.g., 35 is 00000000 00100011
Real, floating pointn bits for mantissa, m for exponent….
CS 245 9
![Page 10: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/10.jpg)
Characters
® Various coding schemes available
Example: ASCIIA: 1000001a: 11000015: 0110101LF: 0001010
To Represent:
CS 245 10
![Page 11: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/11.jpg)
Booleane.g., TRUE
FALSE1111 11110000 0000
Application specifice.g., RED ® 1 GREEN ® 3
BLUE ® 2 YELLOW ® 4 …
To Represent:
Can we use less than 1 byte/code?Yes, but only if desperate...CS 245 11
![Page 12: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/12.jpg)
Datese.g.: - Integer, # days since Jan 1, 1900
- 8 characters, YYYYMMDD- 7 characters, YYYYDDD
Timee.g. - Integer, seconds since midnight
- characters, HHMMSSFF
To Represent:
CS 245 12
![Page 13: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/13.jpg)
String of characters» Null terminated
e.g.,
» Length givene.g.,
- Fixed length
c ta
c ta3
To Represent:
CS 245 13
![Page 14: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/14.jpg)
Bag of bits Length Bits
To Represent:
CS 245 14
![Page 15: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/15.jpg)
To Represent:
CS 245 15
![Page 16: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/16.jpg)
To Represent: Nothing
NULL concept in SQL (not same as 0 or “”)
Physical representation options:» Special “sentinel” value in fixed0length field» Boolean “is null” flag» Just skip the field in a sparse record format
Pretty common in practice!
CS 245 16
![Page 17: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/17.jpg)
Key Point
• Fixed length items
• Variable length items- usually length given at beginning
CS 245 17
![Page 18: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/18.jpg)
Also
Type of an item: tells us how to interpret the bytes, plus size if fixed
CS 245 18
![Page 19: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/19.jpg)
Data Items
Records
Blocks
Files
Bigger Collections
CS 245 19
![Page 20: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/20.jpg)
Record: Set of Related Data Items (“Fields”)
E.g.: Employee record:
name field,
salary field,
date-of-hire field, ...
CS 245 20
![Page 21: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/21.jpg)
Main choices:» Fixed vs variable format» Fixed vs variable length
Types of Records
CS 245 21
![Page 22: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/22.jpg)
Fixed Format
A schema (not record) contains following info:
- # of fields
- type of each field
- order in record
- meaning of each field
CS 245 22
![Page 23: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/23.jpg)
Example: Fixed Format & Length
Employee record
(1) E#, 2 byte integer
(2) E.name, 10 char. Schema
(3) Dept, 2 byte code
55 s m i t h 02
83 j o n e s 01Records
CS 245 23
![Page 24: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/24.jpg)
Variable Format
Record itself contains format
“Self Describing”
CS 245 24
![Page 25: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/25.jpg)
4I52 4S DROF46
Field name codes could also be strings, i.e. TAGS
# Fi
elds
Cod
e id
entif
ying
field
as
E#In
tege
r typ
e
Cod
e fo
r Ena
me
Strin
g ty
peLe
ngth
of s
tr.
Example: Variable Format & Length
CS 245 25
![Page 26: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/26.jpg)
Variable Format Useful For
“Sparse” records
Repeating fields
Evolving formats
But may waste space...
CS 245 26
![Page 27: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/27.jpg)
Example: Variable Format Record with Repeated Fields
Employee ® one or more ® children
3 E_name: Fred Child: Sally Child: Tom
CS 245 27
![Page 28: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/28.jpg)
Note: Repeated Fields Does Not Imply Variable Format/Length
Could have fixed space for a max # of items and their sizes
John Sailing Chess (null)
CS 245 28
![Page 29: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/29.jpg)
Example: Include a record type in record
record type record length
Type is a pointer to one of several schemas
5 27 . . . .
Many Variants Between Fixed and Variable Format
CS 245 29
![Page 30: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/30.jpg)
May contain:- record type- record length- timestamp- concurrency stuff ...
Record Header: Data at Start that Describes a Record
CS 245 30
![Page 31: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/31.jpg)
Exercise: How to store XML data?
<table><description> people on the fourth floor <\description><people>
<person><name> Alan <\name><age> 42 <\age><email> [email protected] <\email>
<\person><person>
<name> Sally <\name><age> 30 <\age><email> [email protected] <\email>
<\person><\people><\table>
Source: Data on the Web, Abiteboul et alCS 245 31
![Page 32: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/32.jpg)
Compression» Within record: e.g. encoding selection» Collection of records: use common patterns
Encryption» Usually operates on large blocks
Other Interesting Issues
CS 245 32
![Page 33: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/33.jpg)
Outline
Overview
Record encoding
Collection storage
Indexes
CS 245 33
![Page 34: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/34.jpg)
Collection Storage Questions
How do we place data items and records for efficient access?» Locality and searchability
How do we physically encode records in blocks and files?
CS 245 34
![Page 35: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/35.jpg)
Placing Data for Efficient AccessLocality: which items are accessed together» When you read one field of a record, you’re
likely to read other fields of the same record» When you read one field of record 1, you’re
likely to read the same field of record 2
Searchability: quickly find relevant records» E.g. sorting the file lets you do binary search
CS 245 35
![Page 36: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/36.jpg)
Locality Example: Row Stores vs Column Stores
Row Store Column Store
AlexBob
CarolDavidEve
Frances
203042212656
GiaHaroldIvan
192841
CACANYMACANYMAAKCA
name age state
Fields stored contiguouslyin one file
AlexBob
CarolDavidEve
FrancesGia
HaroldIvan
name203042212656192841
ageCACANYMACANYMAAKCA
state
Each column in a different file
CS 245 36
![Page 37: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/37.jpg)
Locality Example: Row Stores vs Column Stores
Row Store Column Store
AlexBob
CarolDavidEve
Frances
203042212656
GiaHaroldIvan
192841
CACANYMACANYMAAKCA
name age state
Fields stored contiguouslyin one file
AlexBob
CarolDavidEve
FrancesGia
HaroldIvan
name203042212656192841
ageCACANYMACANYMAAKCA
state
Each column in a different file
Accessing all fields of one record: 1 random I/O for row, 3 for columnCS 245 37
![Page 38: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/38.jpg)
Locality Example: Row Stores vs Column Stores
Row Store Column Store
AlexBob
CarolDavidEve
Frances
203042212656
GiaHaroldIvan
192841
CACANYMACANYMAAKCA
name age state
Fields stored contiguouslyin one file
AlexBob
CarolDavidEve
FrancesGia
HaroldIvan
name203042212656192841
ageCACANYMACANYMAAKCA
state
Each column in a different file
Accessing one field of all records: 3x less I/O for column storeCS 245 38
![Page 39: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/39.jpg)
Can We Have Hybrids Between Row & Column?
Yes! For example, colocated column groups:
AlexBob
CarolDavidEve
FrancesGia
HaroldIvan
name203042212656192841
ageCACANYMACANYMAAKCA
state
File 1 File 2: age & state
Helpful if age & state are frequently co-accessedCS 245 39
![Page 40: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/40.jpg)
Improving Searchability: Ordering
Ordering the data by a field will give:» Closer I/Os if queries tend to read data with
nearby values of the field (e.g. time ranges)» Option to accelerate search via an ordered
index (e.g. B-tree), binary search, etc
What’s the downside of having an ordering?
CS 245 40
![Page 41: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/41.jpg)
Improving Searchability: PartitionsJust place data into buckets based on a field (but not necessarily fine-grained order)
E.g. Hive table storage over filesystem or S3:
/my_table/date=20190101/file1.parquet/my_table/date=20190101/file2.parquet/my)table/date=20190102/file1.parquet/my_table/date=20190101/file2.parquet/my_table/date=20190103/file1.parquet
...
Easy to add, remove & list files in any directoryCS 245 41
![Page 42: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/42.jpg)
Can We Have Searchability on Multiple Fields at Once?Yes! Many possible ways:
1) Multiple partition or sort keys (e.g. partition data by date, then group by customer ID)
2) Interleaved orderings such as Z-ordering
CS 245 42
![Page 43: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/43.jpg)
Z-Ordering
Image source: Wikipedia
dimension 1
dimension 2
CS 245 43
![Page 44: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/44.jpg)
How Do We Encode Records into Blocks & Files?
CS 245 44
![Page 45: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/45.jpg)
How Do We Encode Records into Blocks & Files?
blocks
a file
records
CS 245 45
![Page 46: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/46.jpg)
Questions in Storing Records
(1) separating records
(2) spanned vs. unspanned
(3) indirection
CS 245 46
![Page 47: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/47.jpg)
Block
(a) no need to separate - fixed size recs.(b) special marker(c) give record lengths (or offsets)
- within each record- in block header
R2R1 R3
(1) Separating Records
CS 245 47
![Page 48: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/48.jpg)
Unspanned: records must be within one block
block 1 block 2
Spanned:
block 1 block 2
...
R1 R2
R1
R3 R4 R5
R2 R3(a)
R3(b) R6R5R4 R7
(a)
(2) Spanned vs Unspanned
CS 245 48
![Page 49: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/49.jpg)
With Spanned Records
need indication need indicationof partial record of continuation“pointer” to rest (+ from where?)
R1 R2 R3(a)
R3(b) R6R5R4 R7
(a)
CS 245 49
![Page 50: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/50.jpg)
Spanned vs Unspanned
Unspanned is much simpler, but may waste storage space…
Spanned essential if record size > block size
CS 245 50
![Page 51: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/51.jpg)
How does one refer to specific records?(e.g. in metadata or in other records)
Rx
(4) Indirection
CS 245 51
![Page 52: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/52.jpg)
How does one refer to records?
Rx
Many options:Physical Indirect
(4) Indirection
CS 245 52
![Page 53: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/53.jpg)
Device IDE.g., Record Cylinder #
Address = Track #or ID Block #
Offset in block
Block ID
Purely Physical
CS 245 53
![Page 54: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/54.jpg)
E.g., Record ID is arbitrary bit string
maprec ID
r addressa
Physicaladdr.Rec ID
Fully Indirect
CS 245 54
![Page 55: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/55.jpg)
Flexibility Costto move records of indirection(for deletions, insertions)
Tradeoff
CS 245 55
![Page 56: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/56.jpg)
Physical Indirect
Many optionsin between …
CS 245 56
![Page 57: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/57.jpg)
Header
A block: Free space
R3R4R1 R2
Example: Indirection in Block
CS 245 57
![Page 58: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/58.jpg)
May contain:- File ID (or table or database ID)- This block ID- Record directory- Pointer to free space- Type of block (e.g. contains recs type 4)- Pointer to other blocks “like it”- Timestamp ...
Block Header: Data at Start that Describes Block
CS 245 58
![Page 59: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/59.jpg)
Other Concern: Deletion!
CS 245 59
![Page 60: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/60.jpg)
Options
(a) Immediately reclaim space
(b) Mark deleted
CS 245 60
![Page 61: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/61.jpg)
Options
(a) Immediately reclaim space
(b) Mark deleted– May need chain of deleted records
(for space re-use)– Need a way to mark:
• special characters• delete field• entries in maps
CS 245 61
![Page 62: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/62.jpg)
How expensive is to move valid record to free space for immediate reclaim?
How much space is wasted?» e.g., deleted records, delete fields, free
space chains,...
As Usual, Many Tradeoffs
CS 245 62
![Page 63: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/63.jpg)
Concern with Deletions
Dangling pointers
CS 245 63
R1 ?
![Page 64: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/64.jpg)
CS 245 64
Solution 1: Do Not Worry
![Page 65: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/65.jpg)
Solution 2: Tombstones
Special mark in old location or mappings
CS 245 65
![Page 66: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/66.jpg)
Solution 2: Tombstones
Special mark in old location or mappings
CS 245 66
Physical IDs:
A block
This space This space cannever re-used be re-used
![Page 67: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/67.jpg)
Logical IDs:
ID LOC
7788
map
Never reuseID 7788 nor space in map...
CS 245 67
Solution 2: Tombstones
Special mark in old location or mappings
![Page 68: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/68.jpg)
Insertion
Easy case: records not ordered
® Insert new record at end of file or in a deleted slot
® If records are variable size, not as easy...
CS 245 68
![Page 69: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/69.jpg)
Insertion
Hard case: records are ordered
® If free space close by, not too bad...
® Otherwise, use an overflow area?
CS 245 69
![Page 70: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/70.jpg)
How much free space to leave in each block, track, cylinder?
How often do I reorganize file + overflow?
CS 245 70
Interesting Problems
Freespace
![Page 71: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/71.jpg)
Summary
There are 10,000,000 ways to organize my data on disk…
Which is right for me?
CS 245 71
![Page 72: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/72.jpg)
Flexibility Space Utilization
Complexity Performance
Issues
CS 245 72
![Page 73: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read](https://reader034.vdocuments.us/reader034/viewer/2022052009/601e79382f965b0a0f2a0523/html5/thumbnails/73.jpg)
To Evaluate a Strategy, Compute:
Space used for expected data
Expected time to- fetch record given key- fetch record with next key- insert record- append record- delete record- update record- read all file- reorganize file
CS 245 73