CSE-4/562 Spring 2019 - Checkpoint 4

Checkpoint 4

CSE-4/562 Spring 2019

May 3, 2019

Textbook:

A few things first...

4/562 Databake Off @ 3:00

RSVP (limited space available) to participate

A note on optimization...

Lots of interesting strategies used in Checkpoint 3

  • Pre-parsing
  • Column Stores
  • Cost-based Opt
  • Hyper-optimize the slowest query

Checkpoint 4

Implement Updates

(lambda-architecture edition)

Due May 20

  • A stream of inserts, deletes, updates, and queries.
  • No restarts.
  • Answer queries as fast as possible.
  • Make sure query results account for DDL effects.
Stage 0
10 minutes of prep
Stage 1
Inserts only
Stage 2
Inserts + Deletes
Stage 3
Inserts + Deletes + Updates

No restarts.

Do I need to implement block-based storage?

No (although you can).

Ok... so what else can I do?

Classical Databases

Problem 1: More indexes = Slower writes (bad for OLTP)

Problem 2: Fewer indexes = Slower reads (bad for OLAP)

What if you have both OLAP and OLTP workloads?

Idea: Weekly / Nightly / Hourly dump
from OLTP System to OLAP system.

(Index the data while dumping)

Problem: Not seeing the freshest data!

Better Idea: OLTP DB + OLAP DB.

OLTP DB has few indexes, but only stores recent updates.

OLAP DB has many indexes, and stores everything except recent updates.

Periodically migrate updates into OLAP DB.

(Lambda Architecture)

Checkpoint 4

Suggested Approach: Lambda-Lite

Handling Inserts


              INSERT INTO FOO(A, B, C) VALUES (1, 2, 3);
    
Orig
Orig New

Example


      SELECT COUNT(*) FROM lineitem WHERE mktsegment = 'BUILDING';
    
𝛄COUNT(*)  𝛔mktsegment = 'BUILDING' lineitem
𝛄COUNT(*)  𝛔mktsegment = 'BUILDING' lineitem inserts

Handling Deletes


                  DELETE FROM FOO WHERE A > 5;
    
Orig
 - Orig New

... but that's not quite how SQL Delete works.


                      DELETE FROM FOO WHERE A > 5;
    
 𝛔A ≤ 5 FOO

                      DELETE FROM Orig WHERE Something;
    
 𝛔NOT Something Orig

Example


    INSERT INTO lineitem(...) VALUES (...);
    INSERT INTO lineitem(...) VALUES (...);
    DELETE FROM lineitem WHERE shipdate BETWEEN date(1997-10-01) 
                                            AND date(1997-10-30);
    SELECT COUNT(*) FROM lineitem WHERE mktsegment = 'BUILDING';
    
𝛄COUNT(*)  𝛔mktsegment = 'BUILDING' lineitem
𝛄COUNT(*)  𝛔mktsegment = 'BUILDING' lineitem inserts
𝛄COUNT(*)  𝛔mktsegment = 'BUILDING'  𝛔shipdate NOT BETWEEN ... lineitem inserts

Handling Updates


            UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
    

            UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
    
 𝛔C = 3 𝛑A ← 1; B ← 2; C ← C Foo  𝛔C ≠ 3 Foo

            UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
    
𝛑A ← CASE WHEN C = 3 THEN 1 ELSE A END; B ← CASE ...; C ← C Foo

      SELECT CASE WHEN C = 3 THEN 1 ELSE A END AS A,
             CASE WHEN C = 3 THEN 2 ELSE B END AS B,
             C AS C
      FROM Foo;
    

Final Advice

  • This isn't the only way to implement updates.
  • Optimizer performance is crucial!
  • Consider periodically pausing to collapse updates