CSE-4/562 Spring 2021 - Logging

Logging

CSE-4/562 Spring 2021

April 20, 2021

Garcia-Molina/Ullman/Widom: Ch. 17.1-17.5
Atomicity
A transaction is either applied fully (COMMITed) or not at all (ABORTed).
Consistency
Constraints are enforced on the final state of the transaction (and the transaction is ABORTed if they fail).
Isolation
Two transactions running in parallel don't interfere with each other
Durability
Once the database returns from a COMMIT successfully, the data is safe from marmots

2 Problem Scenarios

A long write...

What if the DB fails during a write.

IOs aren't atomic

Atomicity and Durability might be violated!

Buffer memory is limited...

What if we need to page out some pages modified by a live transaction?

If the transaction aborts, the page state needs to be reverted.

Atomicity might be violated

Simplifying Assumptions

Today: objects are pages
This lets us reason about operations at the IO level
IOs are individually atomic and durable.
Successful writes can be distinguished from fails.
HDD/SSDs are a "safe" storage layer.
No fires, t-rex attacks, or global pandemics*

When the system reboots/the power comes back on, we assume that successful writes are still there and failed writes are not.

* May not be an entirely safe assumption

Failure Analysis

All IOs fail (crash before)
No problem!
All IOs succeed (crash after)
No problem!
At least one IO succeeds, and at least one fails
Problem! (partial write)

Write-Ahead Logging (WAL)

  1. "Log" the transaction's actions
  2. Wait for the log to be safely on-disk
  3. Write the transaction's effects to disk

Log

  • W(A: 10)
  • W(C: 8)
  • W(E: 9)
  • COMMIT
PageValue
A 8 10
B 12
C 5 8
D 18
E 16 9

Failure Analysis

All IOs fail or all succeed
No problem!
At least one Log IO succeeds, but failure before COMMIT.
No problem! DB intact. No COMMIT = don't replay the log.
COMMIT succeeds, no DB writes succeed
No problem. The log lets us replay transaction's effects.
COMMIT succeeds, at least one DB write succeeds
Possible problem. Log replay needs to be idempotent.
COMMIT succeeds, all DB writes succeed
Slow! Still might need to replay the log.

More Failure Analysis

Second failure during replay
Not a problem if log operations are idempotent!
... but could still cause problems if recovery is slow.

Note: The transaction is durable as soon as we can guarantee the transaction's COMMIT is in the log.

We can consider the commit successful even if the data pages aren't on disk yet.

Log

  • W(A: 10)
  • W(C: 8)
  • W(E: 9)
  • COMMIT
PageBufferDisk
A
10
10 10
B 12
C
5
8 5
D 18
E
9
16 9

Multiple Transactions

Using only one log is more efficient for writes!

... but recovery is slower.

Recovery

  1. Scan through the log to find all transactions with a COMMIT entry.
  2. Replay log entries from COMMITed transactions only.

Recovery gets slower with every transaction.

Idea: Periodically mark down the index of the earliest log entry still needed

How do we decide that a log entry is no longer needed.

Buffer Manager

Page Status First Log Entry Data
24 DIRTY 47 01011010...
30 CLEAN n/a 11001101...
52 DIRTY 107 10100010...
57 DIRTY 87 01001101...
66 CLEAN n/a 01001011...

Keep the first log entry to modify any dirty page

Log entries lower than a page's 'first log entry' aren't useful.

Periodically write the lowest 'first log entry' into the log (CHECKPOINT).

WAL Recovery

  1. Scan backwards through the log to find the checkpoint.
  2. Scan forward starting with the checkpointed 'first log entry' to find all transactions with a COMMIT entry.
  3. Replay log entries starting with the checkpointed 'first log entry', ignoring non-COMMITed transactions.
Multi-Write Atomicity
Solved by WAL
Pro-actively Flushing Writes

Problem: There may not be enough memory to store pages modified by uncommitted transactions.

Keep a separate 'uncommitted' copy of the db.
Need to copy data back to main db after commit.
Use the WAL to recover uncommitted pages.
Sloooooooow.
Flush to the DB itself.
What if the transaction aborts?

Undo-Logging

Log

Timestamp Transaction Object Value Prev
10T1Page 51010...00101...
11T2Page 31000...0111...
12T1Page 10011...0001...
13T3Page 51100...1010

Idea: Record the page's previous value.

If a transaction aborts, go backwards through the log and undo all of the changes

TS Xact Object Value Prev
10T1Page 51010...0101...
11T2Page 31000...0111...
12T1Page 10011...0001...
13T3Page 51100...1010...
PageDisk
1 0011
2 1100
3 1000 0111
4 1001
5 1100 1010 0101

Observation: Going backwards through the log can be slow if you have a lot of concurrent transactions.

Log

Timestamp Transaction Object Value Prev Last TS
10T1Page 51010...0101...7
11T2Page 31000...0111...3
12T1Page 10011...0001...10
13T3Page 51100...1010...8

Idea: Each log entry records the timestamp of the next most recent log entry for the same transaction.

Log entries form a linked list for each transaction

Putting it together

Transaction Table

Transaction IDStateLast Log Entry
T24VALIDATING99
T38COMMITTING85
T42ABORTING87
T56ACTIVE100

Buffer Manager

PageStatusFirst UpdateData
24DIRTY47 01011010...
30CLEANn/a11001101...
52DIRTY10710100010...
57DIRTY87 01001101...
66CLEANn/a01001011...

Types of Log Entries

  • Update Page: As above
  • ABORT (Start aborting)
  • COMMIT (Transaction committed)
  • END (Done aborting/committing)
  • CHECKPOINT (Snapshot of volatile memory state)

Recovering after a crash

  1. Rebuild in-memory state (Analyze)
  2. Rebuild buffer manager (Redo)
  3. Abort uncommitted transactions (Undo)

ARIES Recovery

TimestampEntry
0CHECKPOINT
1T1: UPDATE P5
2T2: UPDATE P3
3T1: ABORT
4T2: END
5T3: WRITE P1
6T2: WRITE P5
CRASH

Analyze

TimestampEntry
0CHECKPOINT
1T1: UPDATE P5
2T2: UPDATE P3
3T1: ABORT
4T1: END
5T3: WRITE P1
6T2: WRITE P5
CRASH
XactStatusLastTS
T1 ACTIVE ABORTING 0 1
T2 ACTIVE 0 2 6
T3 ACTIVE 5

Redo

As WAL recovery, before

Undo

  1. Start ABORTing all 'ACTIVE' transactions
  2. Replay transaction log in reverse (correct behavior ensured by serializability)

Problem: Writing out a Checkpoint is SLOOOW (and blocking).

Idea: Snapshot in-memory state, then write in the background.

Record "START_CHECKPOINT" in log when the snapshot is taken.

Record "END_CHECKPOINT" in log when the snapshot is fully written.

Find the last "END_CHECKPOINT" and start recovery from the corresponding "START_CHECKPOINT" entry

Optimization: What if you crash during recovery?

Idea: Log "UNDO" operations as new writes

Compensation Log Records (CLRs)

TimestampEntry
0CHECKPOINT
1T1: UPDATE P5
2T2: UPDATE P3
3T1: ABORT
4T1: CLR Restore P5
5T2: END
6T3: WRITE P1
7T2: WRITE P5

Analyze

TimestampEntry
0CHECKPOINT
1T1: UPDATE P5
2T2: UPDATE P3
3T1: ABORT
4T1: CLR P5
5T1: END
6T3: WRITE P1
7T2: WRITE P5
CRASH
XactStatusLastTS
T1 ACTIVE ABORTING 0 1 0
T2 ACTIVE 0 2 6
T3 ACTIVE 5
Next Class Updates in Distributed Databases