March 27, 2019
If X and Y are equivalent and Y is better,
then replace all Xs with Ys
Today's focus: Updates
INSERT INTO Trees (location, species)
VALUES ('123 A Street', 'pin oak');
DELETE FROM Trees WHERE date > now();
UPDATE Trees SET species = 'pin oak'
WHERE species = 'pinoak';
# | ... | species |
---|---|---|
1 | ... | pin oak |
2 | ... | pinoak |
3 | ... | pinoak |
4 | ... | pin oak |
... |
UPDATE Trees SET species = 'pin oak'
WHERE species = 'pinoak';
Null Pointer Exception!
T-Rex Attack!
User-abort!
# | ... | species |
---|---|---|
1 | ... | pin oak |
2 | ... | pin oak |
3 | ... | pinoak |
4 | ... | pin oak |
... |
Bob.Balance -= 20
Alice.Balance += 20
Bob's balance shouldn't drop below $0
If Alice and Bob both update the data at the same time,
nothing should break.
Once a write is confirmed, it should survive a power outage, crash, or marmot attack.
Each use case may require different properties
What if I want correctness over more than one operation?
START TRANSACTION
UPDATE Ledger SET Balance = Balance - 20
WHERE name = 'Bob'
UPDATE Ledger SET Balance = Balance + 20
WHERE name = 'Alice'
COMMIT
A "batch" of operations that should execute together
Object
... a database "thing"
Each database is different,
so let's treat objects abstractly for now.
Observation: If two clients modify different objects, they can't possibly interfere with each other.
... but I'm getting ahead of myself
A sequence of reads from and writes to objects.
... followed by a COMMIT or ABORT
What does it mean for a transaction to be Isolated?
Alice and Bob submit transactions at the same time!
def T1:
A = A + 100
B = B - 100
def T2:
A = A * 1.06
B = B * 1.06
Time | T1 | T2 |
---|---|---|
↓ | A = A + 100 |
|
↓ | B = B - 100 |
|
↓ | A = A * 1.06 |
|
↓ | B = B * 1.06 |
A gets an extra $6
Time | T1 | T2 |
---|---|---|
↓ | A = A * 1.06 |
|
↓ | B = B * 1.06 |
|
↓ | A = A + 100 |
|
↓ | B = B - 100 |
B gets an extra $6
Time | T1 | T2 |
---|---|---|
↓ | A = A + 100 |
|
↓ | A = A * 1.06 |
|
↓ | B = B * 1.06 |
|
↓ | B = B - 100 |
A and B both get an extra $6
Idea: Don't allow updates in parallel!
Observation Blocking transactions is sloooow
Less Bad Idea: Create/Enforce the illusion that we're not allowing updates in parallel.
A sequence of Reads and Writes
Time | T1 | T2 | |
---|---|---|---|
↓ | R(A) |
||
↓ | W(A) A = A * 1.06 |
||
↓ | R(B) |
||
↓ | W(B) B = B + 1/06 |
||
↓ | R(A) |
||
↓ | W(A) A = A + 100 |
||
↓ | R(B) |
||
↓ | W(B) B = B - 100 |
Question: How do we ensure that transactions are only ever executed with serializable schedules?
Problem: We can't know if a schedule will be serializable until the transaction is done
Can we convince ourselves that these concurrency strategies are guaranteed to produce serializable schedules?
Observation 1: Reads can never interfere with reads
You can reverse the order of two reads without changing the serializability of a schedule.
Observation 2: Operations on different objects can't interfere with each other.
You can reverse the order of two operations on different objects without changing the serializability of a schedule.
Two schedules are conflict equivalent if there is a sequence of pairwise "flips" (of reads, or operations on different objects) that gets you from one schedule to the other.
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
↑ |
| | ↓ | W(A) |
↓ | W(A) |
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | W(A) |
|
| | R(B) |
|
↓ | W(A) |
Conflict equivalent to a serial schedule!
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
|
| | W(A) |
|
↓ | W(A) |
Can't rewrite!
A schedule is conflict serializable if it is conflict equivalent to a serial schedule.
How do we determine if a schedule is conflict-serializable?
Observation: We can't reorder Write/Write (or Read/Write) operations on the same object.
Two transactions accessing the same object create a partial order on the serial schedule
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
|
| | W(A) |
|
↓ | W(A) |
T2's write to B "happens before" T1's read
T2's write to A "happens before" T1's write
No cycles in the partial order!
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
|
| | W(A) |
|
↓ | W(A) |
T2's write to B "happens before" T1's read
T1's write to A "happens before" T2's write
Cycle! No equivalent serial schedule!
An acyclic "Happens Before" or Dependency Graph is conflict serializable.