February 23, 2021
Like normal projection, but can create new columns
$\pi_{M \leftarrow A+B*C,\; N \leftarrow 2}(R)$produces 1 row for every row of R, with 2 columns: M and N
... but first
NULL
ValuesNULL
value for these casesNULL makes things more complicated.
What happens if Trees.BORO is NULL?
Unknown | AND | Unknown | $\equiv$ | Unknown |
Unknown | AND | True | $\equiv$ | Unknown |
Unknown | AND | False | $\equiv$ | False |
Unknown | OR | Unknown | $\equiv$ | Unknown |
Unknown | OR | True | $\equiv$ | True |
Unknown | OR | False | $\equiv$ | Unknown |
NOT | Unknown | $\equiv$ | Unknown |
WHERE
clauses eliminate all non-True rows
SELECT * FROM R
WHERE A = 2 AND NOT (A = 2)
$$UNKNOWN \wedge \neg UNKNOWN$$
$$UNKNOWN \wedge UNKNOWN$$
$$UNKNOWN$$
What happens if some streets have no trees?
Streets
District | StreetName |
---|---|
A | Apple St. |
B | Border St. |
C | Coventry St. |
D | Durham St. |
Trees
TreeId | StreetName |
---|---|
1 | Apple St. |
2 | Border St. |
3 | Durham St. |
4 | Elicott St. |
Streets ⟗ Trees
District | StreetName | TreeId | StreetName |
---|---|---|---|
A | Apple St. | 1 | Apple St. |
B | Border St. | 2 | Border St. |
C | Coventry St. | NULL | NULL |
D | Durham St. | 3 | Durham St. |
NULL | NULL | 4 | Elicott St. |
Only LEFT outer join
Only RIGHT outer join
$$\tau_{A}(R)$$ The tuples of $R$ in ascending order according to 'A'
$$\textbf{L}_{n}(R)$$ The first $n$ tuples of R
Pick your favorite sort algorithm.
What happens if you don't have enough memory?
Key Idea: Merging 2 sorted lists requires $O(1)$ memory.
Repeat Pass 2 As Needed.
What's the bottleneck?
IO Cost: $O(N \cdot \lceil\log_2(N)\rceil)$
(with $N$ blocks)
On average, we'll get runs of size $2 \cdot |WS|$
SELECT COUNT(*) FROM R
SELECT SUM(A) FROM R
SELECT A, SUM(B) FROM R GROUP BY A
SELECT DISTINCT A FROM R
SELECT A FROM R GROUP BY A
TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
---|---|---|---|
180683 | 'red maple' | 'Queens' | 3 |
315986 | 'pin oak' | 'Queens' | 21 |
204026 | 'honeylocust' | 'Brooklyn' | 3 |
204337 | 'honeylocust' | 'Brooklyn' | 10 |
189565 | 'American linden' | 'Brooklyn' | 21 |
... and 683783 more |
SELECT COUNT(*) FROM TREES
TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
---|---|---|---|
COUNT = 0 | |||
180683 | 'red maple' | 'Queens' | 3 |
COUNT = 1 | |||
315986 | 'pin oak' | 'Queens' | 21 |
COUNT = 2 | |||
204026 | 'honeylocust' | 'Brooklyn' | 3 |
COUNT = 3 | |||
204337 | 'honeylocust' | 'Brooklyn' | 10 |
COUNT = 4 | |||
189565 | 'American linden' | 'Brooklyn' | 21 |
COUNT = 5 | |||
... and 683783 more | |||
COUNT = 683788 |
SELECT SUM(TREE_DBH) FROM TREES
TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
---|---|---|---|
SUM = 0 | |||
180683 | 'red maple' | 'Queens' | 3 |
SUM = 3 | |||
315986 | 'pin oak' | 'Queens' | 21 |
SUM = 24 | |||
204026 | 'honeylocust' | 'Brooklyn' | 3 |
SUM = 27 | |||
204337 | 'honeylocust' | 'Brooklyn' | 10 |
SUM = 37 | |||
189565 | 'American linden' | 'Brooklyn' | 21 |
SUM = 58 | |||
... and 683783 more |
This is also sometimes called a "fold"
Grey et. al. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
SELECT SPC_COMMON, COUNT(*) FROM TREES GROUP BY SPC_COMMON
Naive Idea: Keep a separate accumulator for each group
TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
---|---|---|---|
{} | |||
180683 | 'red maple' | 'Queens' | 3 |
{ 'red maple' = 1 } | |||
204337 | 'honeylocust' | 'Brooklyn' | 10 |
{ 'red maple' = 1, 'honeylocust' = 1 } | |||
315986 | 'pin oak' | 'Queens' | 21 |
{ 'red maple' = 1, 'honeylocust' = 1, 'pin oak' = 1 } | |||
204026 | 'honeylocust' | 'Brooklyn' | 3 |
{ 'red maple' = 1, 'honeylocust' = 2, 'pin oak' = 1 } |
What could go wrong?
TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
---|---|---|---|
{} | |||
204337 | 'honeylocust' | 'Brooklyn' | 10 |
{ 'honeylocust' = 1 } | |||
204026 | 'honeylocust' | 'Brooklyn' | 3 |
{ 'honeylocust' = 2 } | |||
... and more | |||
315986 | 'pin oak' | 'Queens' | 21 |
{ 'honeylocust' = 3206, 'pin oak' = 1 } | |||
... and more | |||
180683 | 'red maple' | 'Queens' | 3 |
{ 'pin oak' = 53814, 'red maple' = 1 } |
Physical Layouts