CS 245: Principles of Data-Intensive Systems
Instructor: Matei Zaharia
cs245.stanford.edu
Course originally by Hector Garcia-Molina (1954-2019)
My Background
PhD in 2013
Open source distributed data processing framework
Data & ML platform startup
Research in systems for ML
Outline
Why study data-intensive systems?
Course logistics
Key issues and themes
A bit of history
Why Study Data-Intensive Systems?
Most important computer applications must manage, update and query datasets
» Bank, store, fleet controller, search app, …
Data quality, quantity & timeliness becoming even more important with AI
» Machine learning = algorithms that generalize from data
What Are Data-Intensive Systems?
Relational databases: most popular type of data-intensive system (MySQL, Oracle, etc.)
Many systems facing similar concerns: message queues, key-value stores, streaming systems, ML frameworks, your custom app?
Goal: learn the main issues and principles that span all data-intensive systems
Typical System Challenges
Reliability in the face of hardware crashes, bugs, bad user input, etc
Concurrency: access by multiple users
Performance: throughput, latency, etc
Access interface from many, changing apps
Security and data privacy
Practical Benefits of Studying These Systems
Learn how to select & tune data systems
Learn how to build them
Learn how to build apps that have to tackle some of these same challenges
» E.g. cross-geographic-region billing app, custom search engine, etc.
Scientific Interest
Interesting algorithmic and design ideas
In many ways, data systems are the highest-level successful programming abstractions
Programming: The Dream
[Figure: a high-level spec, written in formal notation, is turned directly into a working application]
Programming: The Reality
Programming with Databases
High-level spec: relational algebra
Actually manages:
• Durability
• Concurrency
• Query optimization
• Security
• …
Outline
Why study data-intensive systems?
Course logistics
Key issues and themes
A bit of history
Teaching Assistants
Pratiksha Thaker, Ben Hannel, Peter Kraft, Wantong Jiang
Office hours will be posted on website
Course Format
Lectures in class
Optional textbook
Assigned paper readings (Q&A in class)
3 programming assignments
Midterm and final
Optional Textbook
Database Systems: The Complete Book
Chapters 13-20
By the original Stanford InfoLab group (Hector Garcia-Molina, Jeff Ullman, Jennifer Widom)
Paper Readings
A few classic or recent research papers
Read the paper before the class: we want to discuss it together!
We’ll post discussion questions on the class website 2-3 weeks before lecture
How Should You Read a Paper?
Read: “How to Read a Paper”
TL;DR: don’t just go through it end to end; focus on key ideas/sections
Our First Paper
We’ll be reading part of “A History and Evaluation of System R” for next class!
Find instructions and questions on website
Programming Assignments
Three assignments implemented in Java or Scala, and submitted online
1. Storage and access methods
2. Query optimization
3. Transactions and recovery
Done individually; A1 posted next week
Midterm and Final
Written tests based on material covered in lectures, assignments and readings
Final will cover the entire course but focus on the second half
Grading
45% Assignments (15% each)
25% Midterm
30% Final
Keeping in Touch
Sign up for Piazza on the course website to receive announcements!
cs245.stanford.edu
Outline
Why study data-intensive systems?
Course logistics
Key issues and themes
A bit of history
Recall: Examples of Data-Intensive Systems
Relational databases: most popular type of data-intensive system (MySQL, Oracle, etc.)
Many systems facing similar concerns: message queues, key-value stores, streaming systems, ML frameworks, your custom app?
Basic Components
[Diagram components: clients / users (issuing queries), administrator, data mgmt. system, logical dataset (e.g. table, graph), physical storage (data structures)]
Examples

| System | Logical Data Model | Physical Storage | API | Other Features |
|---|---|---|---|---|
| Relational databases | Relations (i.e. tables) | B-trees, column stores, indexes, … | SQL, ODBC | Durability, transactions, query planning, migrations, … |
| TensorFlow | Tensors | NCHW, NHWC, sparse arrays, … | Python DAG construction | Query planning, distribution, specialized HW |
| Apache Kafka | Streams of opaque records | Partitions, compaction | Publish, subscribe | Durability, rescaling |
| Apache Spark RDDs | Collections of Java objects | Read external systems, cache | Functional API, SQL | Distribution, query planning, transactions* |
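The TensorFlow row lists NCHW and NHWC as physical storage choices for the same logical tensor. As a rough illustration of what that means, the sketch below computes where one logical element lands in a flat buffer under each layout. It is plain Python and does not use TensorFlow; the function names `offset_nchw` and `offset_nhwc` and the example shape are assumptions made up for this illustration.

```python
def offset_nchw(n, c, h, w, shape):
    """Flat offset of element (n, c, h, w) when memory is ordered N, C, H, W."""
    N, C, H, W = shape
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, shape):
    """Flat offset of the same logical element when memory is ordered N, H, W, C."""
    N, C, H, W = shape
    return ((n * H + h) * W + w) * C + c


# Logical dimensions (hypothetical): batch N=2, channels C=3, height H=4, width W=5.
shape = (2, 3, 4, 5)

# The logical index is identical in both calls; only the physical position differs.
first_of_channel_1 = (0, 1, 0, 0)
print(offset_nchw(*first_of_channel_1, shape))
print(offset_nhwc(*first_of_channel_1, shape))
```

Code written against the logical index does not need to change when the layout does, which is the same separation the table draws between the logical data model and physical storage.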
Some Typical Concerns
Access interface from many, changing apps
Performance: throughput, latency, etc
Reliability in the face of hardware crashes, bugs, bad user input, etc
Concurrency: access by multiple users
Security and data privacy
Example
Message queue system
Producers → queue → Consumers
What should happen if two consumers read() at the same time?
What should happen if a consumer reads a message but then immediately crashes?
Can a producer put in 2 messages atomically?
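One possible set of answers to these three questions can be sketched in a few lines. The class below is a hypothetical in-memory queue, not the design of any real system: a lock makes concurrent read() calls hand each message to only one consumer, unacknowledged messages can be requeued after a crash (at-least-once delivery), and a batch put happens under a single lock acquisition so readers see all of the batch or none of it. All names (`TinyQueue`, `put_all`, `ack`, `redeliver_unacked`) are assumptions for illustration.

```python
import threading
from collections import deque


class TinyQueue:
    """In-memory message queue sketch; every name here is hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = deque()    # messages waiting for a consumer
        self._in_flight = {}     # delivery id -> message read but not yet acked
        self._next_id = 0

    def put_all(self, messages):
        """Add a batch atomically: readers see all of the messages or none."""
        with self._lock:
            self._ready.extend(messages)

    def read(self):
        """Hand one message to exactly one caller; returns (id, msg) or None."""
        with self._lock:
            if not self._ready:
                return None
            msg = self._ready.popleft()
            delivery_id = self._next_id
            self._next_id += 1
            self._in_flight[delivery_id] = msg
            return delivery_id, msg

    def ack(self, delivery_id):
        """The consumer finished processing, so the message can be forgotten."""
        with self._lock:
            self._in_flight.pop(delivery_id, None)

    def redeliver_unacked(self):
        """Requeue every unacked message, oldest first.

        Simplification: there is no per-consumer tracking, so this requeues
        all in-flight messages, not only those of one crashed consumer.
        """
        with self._lock:
            for delivery_id in sorted(self._in_flight, reverse=True):
                self._ready.appendleft(self._in_flight.pop(delivery_id))
```

Other answers are equally defensible (for example, dropping a message once it has been read), which is why the slides pose these as design questions.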
Two Big Ideas
Declarative interfaces
» Apps specify what they want, not how to do it
» Example: “store a table with 2 integer columns”, but not how to encode it on disk
» Example: “count records where column1 = 5”
Transactions
» Encapsulate multiple app actions into one atomic request (fails or succeeds as a whole)
» Concurrency models for multiple users
» Clear interactions with failure recovery
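The two declarative examples above can be tried directly with Python's built-in `sqlite3` module. This is a minimal sketch; the table name `t`, the second column name `column2`, and the sample rows are assumptions, and SQLite is only one possible engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# "Store a table with 2 integer columns": nothing here says how rows are encoded.
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(5, 1), (7, 2), (5, 3)])

# "Count records where column1 = 5": the engine decides how to find the rows.
(count,) = conn.execute("SELECT COUNT(*) FROM t WHERE column1 = 5").fetchone()
print(count)
```

Neither statement mentions files, pages, or scan order, so the engine is free to change those without breaking the application.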
Declarative Interface Examples
SQL
» Abstract “table” data model, many physical implementations
» Specify queries in a restricted language that the database can optimize
TensorFlow
» Operator graph gets mapped & optimized to different hardware devices
Functional programming (e.g. MapReduce)
» Says what to run but not how to do scheduling
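The functional programming point can be illustrated with a small word count in the map/reduce style. This is only a sketch in standard-library Python, not the MapReduce API: the job is described as a map function and a reduce function, and the caller can swap the scheduling strategy (sequential `map` or a thread pool) without touching that description. The names `count_words`, `merge`, `word_count` and `run_map` are assumptions for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(line):
    """Map step: count the words in one input line."""
    return Counter(line.split())


def merge(a, b):
    """Reduce step: combine two partial counts."""
    return a + b


def word_count(lines, run_map=map):
    """Says what to compute; run_map decides how the map calls are scheduled."""
    return reduce(merge, run_map(count_words, lines), Counter())


lines = ["a b a", "b c", "a"]

# Same job description, two different schedules.
sequential = word_count(lines)
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = word_count(lines, run_map=pool.map)

print(sequential == parallel)
```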
Transaction Examples
SQL databases
» Commands to start, abort or end transactions based on multiple SQL statements
Apache Spark, MapReduce
» Make the multi-part output of a job appear atomically when all partitions are done
Stream processing systems
» Count each input record exactly once despite crashes, network failures, etc.
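For the SQL case, the start, abort and end commands can be sketched with Python's built-in `sqlite3` module. The `account` table, the `transfer` function and the balances are assumptions chosen for illustration. The credit is applied first on purpose, so that a failed debit shows the rollback undoing work that had already succeeded.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so the explicit BEGIN / COMMIT / ROLLBACK statements below are the only ones.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")


def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic unit."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # overdraws account 1, so nothing is applied
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```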
Outline
Why study data-intensive systems?
Course logistics
Key issues and themes
A bit of history
Early Data Management
At first, each application did its own data management directly against storage
[Cartoon: Ye Olde Bank says, “I’d like a computerized account system.” The reply: “I have just the thing”, a device that stores 5 MB and offers write_block() and read_block()]
Problems with App Storage Management
How should we lay out and navigate data?
How do we keep the application reliable?
What if we want to share data across apps?
Every app is solving the same problems!
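To make these problems concrete, here is a rough sketch of an application doing its own data management over a block interface. Everything in it is an assumption for illustration: the 64-byte block size, the record layout, the in-memory `disk` standing in for a device, and the helper names. Only `write_block()` and `read_block()` come from the slides.

```python
import struct

BLOCK_SIZE = 64                     # hypothetical block size in bytes
RECORD = struct.Struct("<I16sq")    # account id, 16-byte name, balance in cents
disk = bytearray(BLOCK_SIZE * 4)    # stand-in for a raw storage device


def write_block(block_no, data):
    """Overwrite one whole block."""
    assert len(data) == BLOCK_SIZE
    disk[block_no * BLOCK_SIZE:(block_no + 1) * BLOCK_SIZE] = data


def read_block(block_no):
    """Return one whole block."""
    return bytes(disk[block_no * BLOCK_SIZE:(block_no + 1) * BLOCK_SIZE])


def save_account(block_no, account_id, name, balance):
    """The app itself picks the byte layout and which block holds which record."""
    payload = RECORD.pack(account_id, name.encode(), balance)
    write_block(block_no, payload.ljust(BLOCK_SIZE, b"\0"))


def load_account(block_no):
    """Decode a record; only code that knows the layout can do this."""
    account_id, name, balance = RECORD.unpack_from(read_block(block_no))
    return account_id, name.rstrip(b"\0").decode(), balance


save_account(2, 7, "alice", 1050)
print(load_account(2))
```

Layout, lookup, crash safety and sharing with a second application would all have to be built on top of this by hand, and again by every other application.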
Navigational Databases (1964)
CODASYL, IDS
Data is graph of records
Procedural API based on navigating links:
    get department with name='Sales'
    get first employee in set department-employees
    until end-of-set do {
        get next employee in set department-employees
        process employee
    }
“Data independence”: app code not tied to storage details
Charles W. Bachman, “The Programmer as Navigator”
Edgar F. (Ted) Codd
Proposed the relational DB model, with declarative queries & storage (1970)
Relation = table with unique key identifying each row
Data independence++: apps don’t even specify how to execute query
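The difference between the two models can be sketched side by side. The first half below imitates navigational access with linked Python dictionaries, where the application walks the links itself. The second half asks the same question declaratively through Python's built-in `sqlite3` module. The record structure, table name and employee names are assumptions for illustration, not CODASYL or System R code.

```python
import sqlite3

# Navigational style: the app follows record-to-record links itself.
sales = {"name": "Sales", "first_employee": None}
for name in ["Cy", "Bo", "Al"]:  # build a linked set, newest record first
    sales["first_employee"] = {"name": name, "next": sales["first_employee"]}

navigated = []
employee = sales["first_employee"]
while employee is not None:  # "until end-of-set"
    navigated.append(employee["name"])
    employee = employee["next"]

# Relational style: state the result wanted; the engine picks the access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Al", "Sales"), ("Bo", "Sales"), ("Cy", "Sales"), ("Di", "Support")],
)
declared = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE department = 'Sales' ORDER BY name"
    )
]

print(navigated, declared)
```

If the links in the first half were reorganized, the traversal loop would need to change; the query in the second half would not.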
Key Ideas in Relational DBMS
Logical data model: tables with references across them (foreign keys)
Physical storage: raw files, B-trees, hash indexes, etc.
Clients / users, administrator
Queries: relational algebra (e.g. SQL)
Data mgmt. system: query planning, access methods, transactions, etc.
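A small experiment shows how these pieces fit together: tables linked by a foreign key, a query whose text never changes, and a physical design change (adding an index) that alters only the plan chosen by the system. This is a sketch with Python's built-in `sqlite3` module. The schema, the index name `idx_employee_dept` and the helper `plan` are assumptions, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(id)  -- foreign key
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES (1, 'Al', 1), (2, 'Bo', 2), (3, 'Cy', 1);
""")

QUERY = "SELECT name FROM employee WHERE dept_id = 1 ORDER BY name"


def plan(conn, sql):
    """Return SQLite's description of how it would execute sql."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


rows_before, plan_before = conn.execute(QUERY).fetchall(), plan(conn, QUERY)

# Change only the physical design; the query text stays the same.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_id)")

rows_after, plan_after = conn.execute(QUERY).fetchall(), plan(conn, QUERY)

print(plan_before)
print(plan_after)
```

The rows returned are the same before and after; only the access method picked by the query planner differs.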
Early Relational DBMS
IBM System R (1974): research system
» Led to IBM SQL/DS in 1981
Ingres (1974): Mike Stonebraker at Berkeley
» Led to PostgreSQL
Oracle database (released 1979)
Next class, we’ll cover database architecture by looking at System R
Rest of the Course
We’ll explore both “big ideas” we saw, focusing on relational DBs but showing examples in other areas
• Declarative interfaces
    • Data independence and data storage formats
    • Query languages and optimization
• Transactions, concurrency & recovery
    • Concurrency models
    • Failure recovery
    • Distributed storage and consistency
Don’t forget to sign up for Piazza!