CSE-4/562 Spring 2019 - Transactions

Transactions

CSE-4/562 Spring 2019

March 27, 2019

Textbook: Ch. 18.1-18.2, 19.1

The running theme

If X and Y are equivalent and Y is better,
then replace all Xs with Ys

Today's focus: Updates

SQL DDL


            INSERT INTO Trees (location, species) 
              VALUES ('123 A Street', 'pin oak');
            
            DELETE FROM Trees WHERE date > now();
            
            UPDATE Trees SET species = 'pin oak' 
              WHERE species = 'pinoak';
    

What is "Correct"?

My command gets executed fully, or not at all.
The final state of the data is "sane".
No concurrency glitches.
If things break, my data is still safe.

My command gets executed fully, or not at all.

#...species
1...pin oak
2...pinoak
3...pinoak
4...pin oak
...

            UPDATE Trees SET species = 'pin oak' 
              WHERE species = 'pinoak';
    

Null Pointer Exception!

T-Rex Attack!

User-abort!

#...species
1...pin oak
2...pin oak
3...pinoak
4...pin oak
...

My command gets executed fully, or not at all.

The final state of the data is "sane".


      Bob.Balance -= 20
      Alice.Balance += 20
    

Bob's balance shouldn't drop below $0

Constraints

Primary Key / Unique
Unique column value for every row.
Foreign Key
Referenced value must exist.
Domain
Restrictions on allowable values
(e.g., Balance >= 0 or NOT NULL)
Check
Any boolean-valued SQL expression

No concurrency glitches.

If Alice and Bob both update the data at the same time,
nothing should break.

If things break, my data is still safe.

Once a write is confirmed, it should survive a power outage, crash, or marmot attack.

Atomicity
My command gets executed fully, or not at all.
Consistency
The final state of the data is "sane".
Isolation
No concurrency glitches.
Durability
If things break, my data is still safe.

Each use case may require different properties

Relational DBMSes (e.g., Oracle, DB2, Postgres)
Full ACID (more or less)
Caches (e.g., Redis, Memcached)
No need for Durability
Key-Value Stores (e.g., Riak, Aerospike, LevelDB)
Only trivial Atomicity required

What if I want correctness over more than one operation?

Transactions


      START TRANSACTION
      UPDATE Ledger SET Balance = Balance - 20
        WHERE name = 'Bob' 
      UPDATE Ledger SET Balance = Balance + 20
        WHERE name = 'Alice' 
      COMMIT
    

Transactions

A "batch" of operations that should execute together

START/BEGIN TRANSACTION
Start batching
COMMIT
Conclude the transaction and apply changes
ABORT/ROLLBACK
Undo all changes applied as part of the transaction
Atomicity
A transaction is either applied fully (COMMITed) or not at all (ABORTed).
Consistency
Constraints are enforced on the final state of the transaction (and the transaction is ABORTed if they fail).
Isolation
Two transactions running in parallel don't interfere with each other
Durability
Once the database returns from a COMMIT successfully, the data is safe from marmots

A little abstraction...

Object

An object is a...

... a database "thing"

  • ... a table
  • ... a record
  • ... a page
  • ... a column

Each database is different,
so let's treat objects abstractly for now.

Observation: If two clients modify different objects, they can't possibly interfere with each other.

... but I'm getting ahead of myself

Transaction

A sequence of reads from and writes to objects.

... followed by a COMMIT or ABORT

What does it mean for a transaction to be Isolated?

What does it mean for a transaction to be Isolated?

Alice and Bob submit transactions at the same time!

Option 1
Alice's transaction executes to completion.
Bob's transaction executes to completion.
Option 2
Bob's transaction executes to completion.
Alice's transaction executes to completion.

Example


                      def T1:
                            A = A + 100
                            B = B - 100
    

                      def T2: 
                            A = A * 1.06
                            B = B * 1.06
    

Example

TimeT1T2
A = A + 100
B = B - 100
A = A * 1.06
B = B * 1.06

A gets an extra $6

Also acceptable

TimeT1T2
A = A * 1.06
B = B * 1.06
A = A + 100
B = B - 100

B gets an extra $6

Not Acceptable

TimeT1T2
A = A + 100
A = A * 1.06
B = B * 1.06
B = B - 100

A and B both get an extra $6

Idea: Don't allow updates in parallel!

Observation Blocking transactions is sloooow

Less Bad Idea: Create/Enforce the illusion that we're not allowing updates in parallel.

A Schedule

A sequence of Reads and Writes

TimeT1T2
R(A)
W(A) A = A * 1.06
R(B)
W(B) B = B + 1/06
R(A)
W(A) A = A + 100
R(B)
W(B) B = B - 100
Schedule
A sequence of read and writes from one or more transactions to objects
Serial Schedule
A schedule with no interleaving
... but with an arbitrary transaction order
Serializable Schedule
A schedule that produces the same output as a serial schedule

Question: How do we ensure that transactions are only ever executed with serializable schedules?

Problem: We can't know if a schedule will be serializable until the transaction is done

Concurrency Strategies

Locking (Pessimistic)
Block transaction execution before it risks non-serializable behavior
Snapshot Isolation (Optimistic)
Execute each transaction on a snapshot of the database and abort it if a serializable merge is impossible.
Timestamp Concurrency Control (Optimistic)
Annotate each object with timestamps and abort transactions that trigger non-serializable behavior.

Can we convince ourselves that these concurrency strategies are guaranteed to produce serializable schedules?

Observation 1: Reads can never interfere with reads

You can reverse the order of two reads without changing the serializability of a schedule.

 

Observation 2: Operations on different objects can't interfere with each other.

You can reverse the order of two operations on different objects without changing the serializability of a schedule.

Conflict Equivalence

Two schedules are conflict equivalent if there is a sequence of pairwise "flips" (of reads, or operations on different objects) that gets you from one schedule to the other.

Example 1

TimeT1T2
| W(B)
| R(B)
| W(A)
W(A)

 

Example 1

TimeT1T2
| W(B)
| W(A)
| R(B)
W(A)

Conflict equivalent to a serial schedule!

Example 2

TimeT1T2
| W(B)
| R(B)
| W(A)
W(A)

Can't rewrite!

Conflict Serializability

A schedule is conflict serializable if it is conflict equivalent to a serial schedule.

How do we determine if a schedule is conflict-serializable?

Observation: We can't reorder Write/Write (or Read/Write) operations on the same object.

Two transactions accessing the same object create a partial order on the serial schedule

Example 1

TimeT1T2
| W(B)
| R(B)
| W(A)
W(A)

T2's write to B "happens before" T1's read

T2's write to A "happens before" T1's write

No cycles in the partial order!

Example 2

TimeT1T2
| W(B)
| R(B)
| W(A)
W(A)

T2's write to B "happens before" T1's read

T1's write to A "happens before" T2's write

Cycle! No equivalent serial schedule!

An acyclic "Happens Before" or Dependency Graph is conflict serializable.

Forcing Acyclicity

Locking
Block the second transaction to access an object until the first is completely done.
Snapshot Isolation
Declare a serial order and abort transactions when they create reverse edges
Timestamp Concurrency Control*
Declare a serial order and abort transactions when they create reverse edges