8/2/2019 CC Conclude
http://slidepdf.com/reader/full/cc-conclude 1/22
Transactions, Concluded, and
the Future of Data Management
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
December 4, 2003
Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke
Final Administrivia
Project demos today and tomorrow
Final exam handed out at the end of today’s class
Finals plus project reports due by 1PM, 12/18/2003
Project reports should be ballpark 10-15 pages
Remember, quality and clarity of presentation matter!
Also, email me a brief message detailing:
Your contributions to the project
Your group members’ contributions and your assessment of “group dynamics”
Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall
Last Time…
We were discussing isolation levels
How to keep transactions from interfering with one another
Or at least, how to minimize this
Recall that the strongest version of isolation was serializability
Theory of Serializability
A schedule of a set of transactions is a linear ordering of their actions
e.g. for the simultaneous deposits example:
R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
A serial schedule is one in which all the steps of each transaction occur consecutively
A serializable schedule is one which is equivalent to some serial schedule (i.e. given any initial state, the final state is the same as one produced by some serial schedule)
The example above is neither serial nor serializable
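To see why the interleaved deposits are not serializable, here is a minimal Python sketch. It assumes each transaction reads X.bal into a private copy and later writes back that copy plus its deposit; the function name `run`, the starting balance, and the deposit amounts are illustrative and not from the slides.

```python
def run(schedule, deposits, initial=100):
    """Execute a schedule of (op, txn) steps against a single balance X.bal."""
    bal = initial
    local = {}  # each transaction's private copy of X.bal
    for op, txn in schedule:
        if op == "R":
            local[txn] = bal                    # read the current balance
        elif op == "W":
            bal = local[txn] + deposits[txn]    # write back copy + deposit
    return bal


deposits = {1: 10, 2: 20}  # hypothetical deposit amounts

interleaved = [("R", 1), ("R", 2), ("W", 1), ("W", 2)]  # the schedule above
serial_12 = [("R", 1), ("W", 1), ("R", 2), ("W", 2)]    # T1 then T2
serial_21 = [("R", 2), ("W", 2), ("R", 1), ("W", 1)]    # T2 then T1

results = {
    "interleaved": run(interleaved, deposits),
    "T1;T2": run(serial_12, deposits),
    "T2;T1": run(serial_21, deposits),
}
```

Under these assumptions both serial orders end with the same balance, while the interleaved schedule loses T1's deposit, so its final state matches neither serial schedule.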
Questions of Concern
Given a schedule S, is it serializable?
How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced?
Conflicting Actions
Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj respectively
If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter
If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if at least one of the actions is a write
Ri(Q) Wj(Q) can produce a different result than Wj(Q) Ri(Q), because Ti reads a different value of Q
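The conflict rule can be written as a one-line predicate. This is a sketch only: the `Action` record and the `conflicts` function are names introduced here for illustration, and an action is assumed to be a read or write of a single data item.

```python
from typing import NamedTuple


class Action(NamedTuple):
    op: str    # "R" for read, "W" for write
    txn: int   # transaction number
    item: str  # data item accessed


def conflicts(a: Action, b: Action) -> bool:
    """Two actions conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.op, b.op)
```

For example, `conflicts(Action("R", 1, "Q"), Action("W", 2, "Q"))` is true, while two reads of Q, or a read and a write on different items, do not conflict.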
Testing for Serializability
Given a schedule S, we can construct a digraph G=(V,E) called a precedence graph
V : all transactions in S
E : Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S
Theorem:
A schedule S is conflict serializable if and only if its precedence graph contains no cycles
Note that testing for a cycle in a digraph can be done in time O(|V|^2)
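The test can be sketched in a few lines of Python. The representation of a schedule as `(op, txn, item)` tuples and the function names are assumptions made for this example. The cycle check uses a depth-first search, which runs in O(|V| + |E|) and so stays within the quadratic bound quoted above.

```python
def precedence_graph(schedule):
    """schedule: list of (op, txn, item). Returns {txn: set of successor txns}."""
    graph = {txn: set() for _, txn, _ in schedule}
    for i, (op1, t1, q1) in enumerate(schedule):
        for op2, t2, q2 in schedule[i + 1:]:
            # Edge t1 -> t2 when an earlier action of t1 conflicts with a later one of t2.
            if t1 != t2 and q1 == q2 and "W" in (op1, op2):
                graph[t1].add(t2)
    return graph


def has_cycle(graph):
    """Depth-first search with three colours; a grey successor means a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(graph, WHITE)

    def visit(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour[u] == WHITE and visit(u) for u in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# The simultaneous-deposits schedule: R1(X) R2(X) W1(X) W2(X)
deposits = [("R", 1, "X"), ("R", 2, "X"), ("W", 1, "X"), ("W", 2, "X")]
# A serial schedule of the same two transactions
serial = [("R", 1, "X"), ("W", 1, "X"), ("R", 2, "X"), ("W", 2, "X")]
```

For the deposits schedule the graph has edges in both directions between T1 and T2, so the check reports that it is not conflict serializable; the serial schedule yields only T1 → T2.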
An Example
T1 T2 T3
R(X,Y,Z)
R(X)W(X)
R(Y)
W(Y)
R(Y)
R(X)
W(Z)
Precedence graph on T1, T2, T3
Cyclic: not serializable.
Another Example
T1      T2      T3
        R(X)
        W(X)
                R(X)
                W(X)
R(Y)
W(Y)
        R(Y)
        W(Y)
Precedence graph: T1 → T2 → T3
Acyclic: serializable
Producing the Equivalent Serial Schedule
If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph
For the second example, the equivalent serial schedule is:
R1(Y)W1(Y) R2(X)W2(X) R2(Y)W2(Y) R3(X)W3(X)
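A small sketch of the topological-sort step, using Python's standard `graphlib` module (Python 3.9+). The edges T1 → T2 (conflict on Y) and T2 → T3 (conflict on X) are inferred from the serial schedule above and are an assumption about the original figure; the dictionary names are illustrative.

```python
from graphlib import TopologicalSorter

# Each transaction's own actions, in program order.
actions = {
    1: ["R1(Y)", "W1(Y)"],
    2: ["R2(X)", "W2(X)", "R2(Y)", "W2(Y)"],
    3: ["R3(X)", "W3(X)"],
}

# TopologicalSorter expects node -> set of predecessors.
# Assumed precedence edges: T1 -> T2 and T2 -> T3.
predecessors = {1: set(), 2: {1}, 3: {2}}

order = list(TopologicalSorter(predecessors).static_order())

# Concatenate whole transactions in topological order to get a serial schedule.
serial_schedule = [a for txn in order for a in actions[txn]]
```

With these edges the only valid order is T1, T2, T3, which reproduces the serial schedule listed above. When the graph admits several topological orders, each one gives an equivalent serial schedule.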
Locking and Serializability
We said that for a serializable schedule, a transaction must hold all locks until it terminates (a condition called strict locking)
It turns out that this is crucial to guarantee serializability
Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.
Well-Formed, Two-Phased Transactions
A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesn’t release the lock until the action is performed
Locks are also released by the end of the transaction
A transaction is two-phased if it never acquires a lock after unlocking one
i.e., there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released
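The two-phase condition is easy to check mechanically. The sketch below assumes a transaction's locking behaviour is given as a list of `("lock", item)` and `("unlock", item)` steps; that encoding and the name `is_two_phase` are introduced here for illustration.

```python
def is_two_phase(ops):
    """ops: list of ("lock" | "unlock", item) steps for one transaction.
    Returns False if any lock is acquired after some lock has been released."""
    shrinking = False
    for kind, _item in ops:
        if kind == "unlock":
            shrinking = True       # the shrinking phase has begun
        elif kind == "lock" and shrinking:
            return False           # acquiring during the shrinking phase
    return True


two_phase = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]
not_two_phase = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]
```

The second sequence is the "acquire and immediately release" pattern: each lock is held only around its own action, so a new lock is requested after an earlier one has been given up.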
Two-Phased Locking Theorem
If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted concurrently is serializable
i.e., there is a very simple scheduler!
However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable
i.e., one bad apple spoils the bunch.
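The "very simple scheduler" only has to refuse conflicting lock requests. Here is a minimal sketch of such a lock table, assuming shared ("S") and exclusive ("X") modes and a caller that retries or waits when a request is refused; the class and method names are hypothetical, and queueing and deadlock handling are left out.

```python
class LockTable:
    def __init__(self):
        self.holders = {}  # item -> {txn: "S" or "X"}

    def try_lock(self, txn, item, mode):
        """Grant the lock unless another transaction holds a conflicting one."""
        others = {t: m for t, m in self.holders.get(item, {}).items() if t != txn}
        if mode == "S":
            ok = all(m == "S" for m in others.values())  # S is compatible with S only
        else:
            ok = not others                              # X is compatible with nothing
        if ok:
            held = self.holders.setdefault(item, {})
            if held.get(txn) != "X":                     # never downgrade an X lock
                held[txn] = mode
        return ok

    def release_all(self, txn):
        """Release every lock held by txn, e.g. at commit or abort."""
        for held in self.holders.values():
            held.pop(txn, None)
```

Releasing everything in one step at the end of the transaction, as `release_all` does, corresponds to the strict locking discipline mentioned earlier.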
Summary of Transactions
Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system
Theoretically, the “correct” execution of transactions is serializable (i.e. equivalent to some serial execution)
Practically, this may adversely affect throughput, which motivates isolation levels
With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate
What to Look for Down the Road
… well, no one really knows the answer to this…
… But here are some hints, ideas, and hot directions
Sensors and streaming data
Peer-to-peer meets databases
“The Semantic Web”
Collaborative data sharing
Sensors and Streaming Data
No databases at all…
… Instead we have
networks of simple sensors
Madden, starting at MIT
Gehrke, Cornell
Widom, Stanford
queries are in SQL
data is live and “streaming”
we compute aggregates over “windows”
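As a rough illustration of an aggregate over a window, the sketch below keeps the most recent readings from a stream and reports their running average. It is plain Python rather than the SQL dialect of any of the systems named above, and the function name and window size are assumptions for the example.

```python
from collections import deque


def windowed_average(readings, window):
    """Yield the average of the most recent `window` readings as each one arrives."""
    buf = deque(maxlen=window)  # old readings fall out automatically
    for value in readings:
        buf.append(value)
        yield sum(buf) / len(buf)


averages = list(windowed_average([10, 20, 30, 40], window=2))
```

Because the function is a generator, it never needs the whole stream in memory: each new reading produces one updated aggregate.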
What’s Interesting Here
We’re not talking about data on disk – we’re talking about queries over “current readings”
Sensors are generally “stupid” and may be battery-operated
A lot of challenges are networking-related: how to aggregate data before it gets sent, etc.
The next step (e.g., work initiated here @ Penn): including sensors that capture images – a very different problem!
This has many more compelling applications – security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.
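One way to picture "aggregating data before it gets sent": each sensor forwards a single partial aggregate for its subtree instead of relaying every raw reading. The sketch below assumes the sensors form a routing tree and computes an average from (sum, count) pairs; the tree, the readings, and the function names are made up for illustration and do not describe any particular system.

```python
def merge(a, b):
    """Combine two partial aggregates, each a (sum, count) pair."""
    return (a[0] + b[0], a[1] + b[1])


def aggregate(node, children, readings):
    """Return the (sum, count) for the subtree rooted at `node`.
    Each node sends one pair to its parent, however large its subtree is."""
    partial = (readings[node], 1)
    for child in children.get(node, []):
        partial = merge(partial, aggregate(child, children, readings))
    return partial


# Hypothetical routing tree: node 0 is the base station.
children = {0: [1, 2], 1: [3]}
readings = {0: 20, 1: 22, 2: 18, 3: 24}

total, count = aggregate(0, children, readings)
average = total / count
```

Sum and count are used rather than the average itself because partial averages cannot be merged correctly without knowing how many readings each one covers.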
Peer-to-Peer Computing
Fundamentally, our model of DBMSs tends to be centralized
Even for data integration: there’s a single mediator
This has many implications: central administration, central coordination, etc.
What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.?
A better architecture?
Solutions to many problems unsolved by distributed DBMSs?
Replication, object location, distributed optimization, resiliency to failure,…
New types of applications, e.g., in integration?
P2P Work
As a new architecture for storage and querying
PIER (Berkeley), P-Grid (EPFL), Medusa (MIT)
A better way of thinking about translating and exchanging data
Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento
The Semantic Web
In some ways, a very “pie-in-the-sky” vision
But some real and concrete problems might be partly solvable
Goal is really very similar to data integration, where somehow we have mappings between the schemas
Currently, most people in the SW community are from the knowledge representation community and use RDF
Focus: very rich ways of describing schemas – “ontologies” – that blend querying with class definitions
“Teachers are people who teach students”
“Tenure-track professors are teachers at universities who can get tenure”; etc.
Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas
Holes in the Semantic Web
What issues and concerns came up in the data integration assignment you had?
Do you think a richer schema language would help for these?
Do you think “better normalization” would help?
Fundamentally, we need:
Languages for not only describing relationships, but transformations between formats (e.g., XML schemas)
Automatic or partly automated ways of discovering mappings and correspondences
These are all database problems, and the solution likely must come from the DB community
This is part of what P2P systems like Piazza, Hyperion try to address
My Take on the Future
We’ve evolved from a world where data management is about controlling the data
Instead, data management is about translating and transforming data using declarative languages
It should ultimately become much like TCP or SOAP – a set of standard services for “getting stuff” from one point to another, or from one form to another
It’s the plumbing that connects different applications using different formats
Orchestra project at Penn: focuses on how to build a system for supporting collaborative science
People publish and map data in different schemas
What happens if people start updating it?
How do you propagate, manage, trace, reconcile changes?