February 4, 2019
Replace [Thing A] with better, but equivalent [Thing B].
Replace [Query A] with better, but equivalent [Query B].
... but first a few definitions.
Set
(Unique Only) |
Bag
|
List
(Order Matters) |
Declarative | Imperative |
---|---|
Say what you want | Say how you want to get it |
"Get me the TPS Reports" |
|
SQL, RA, R, … | C, Scala, Java, Python, … |
Declarative languages make it easier to explore equivalent computations to find the best one.
SELECT R.A FROM R, S
WHERE R.B = S.B AND S.C = 10
with open_csv("R.csv") as R:
with open_csv("S.csv") as S:
for r_row in R:
for s_row in S:
if r_row[1] == s_row[0] and s_row[1] == "10":
print(r_row[0])
$$Q(A) := R(A, B), S(B, 10)$$
We start with a database instance with a fixed schema
Queries are applied to Relations $$Q(\textbf{Trees}, \textbf{SpeciesInfo})$$
Queries are also Relations! $$Q_2(\textbf{SpeciesInfo}, Q_1(\textbf{Trees}))$$ (Relational Algebra is Closed)
Operation | Sym | Meaning |
---|---|---|
Selection | $\sigma$ | Select a subset of the input rows |
Projection | $\pi$ | Delete unwanted columns |
Cross-product | $\times$ | Combine two relations |
Set-difference | $-$ | Tuples in Rel 1, but not Rel 2 |
Union | $\cup$ | Tuples either in Rel 1 or in Rel 2 |
Also: Intersection, Join, Division, Renaming
(Not essential, but can be useful)
Input | Query Language | Output | ||
---|---|---|---|---|
Sets of Tuples | $\rightarrow$ | Set Relational Algebra | $\rightarrow$ | Set of Tuples |
Bags of Tuples | $\rightarrow$ | Bag Relational Algebra | $\rightarrow$ | Bag of Tuples |
Lists of Tuples | $\rightarrow$ | Extended Relational Algebra | $\rightarrow$ | List of Tuples |
First we focus on sets and bags.
Delete rows that fail the condition $c$.
TREE_ID | SPC_COMMON | BORONAME | ... |
---|---|---|---|
204026 | 'honeylocust' | 'Brooklyn' | ... |
204337 | 'honeylocust' | 'Brooklyn' | ... |
189565 | 'American linden' | 'Brooklyn' | ... |
192755 | 'London planetree' | 'Brooklyn' | ... |
189465 | 'London planetree' | 'Brooklyn' | ... |
... and 177287 more |
SELECT * FROM Trees [[ WHERE BORONAME = 'Brooklyn' ]]
Delete attributes not in the projection list $A$.
BORONAME |
---|
Queens |
Brooklyn |
Manhattan |
Bronx |
Staten Island |
Only 5 results... not 683788?
Set and Bag Projection are different
SELECT [[ DISTINCT BORONAME ]] FROM Trees;
What are these queries schemas?
Takes two relations that are union-compatible...
(Both relations have the same number of fields with the same types)
... and returns all tuples appearing in either relation
We use $\uplus$ if we explicitly mean bag union
Return all tuples appearing in both
of two union-compatible relations
What is this query asking?
Return all tuples appearing in the first, but not the second
of two union-compatible relations
What is this query asking?
What is the schema of the result of any of these operators?
Create all pairs of tuples.
TreeInfo
SPC_COMMON | AVG_HEIGHT |
---|---|
cedar elm | 60 |
lacebark elm | 45 |
... and more |
SPC_COMMON | BORONAME | SPC_COMMON | AVG_HEIGHT |
---|---|---|---|
honeylocust | Brooklyn | cedar elm | 60 |
honeylocust | Brooklyn | cedar elm | 60 |
American linden | Brooklyn | cedar elm | 60 |
London planetree | Manhattan | cedar elm | 60 |
London planetree | Manhattan | cedar elm | 60 |
... | |||
honeylocust | Brooklyn | lacebark elm | 45 |
honeylocust | Brooklyn | lacebark elm | 45 |
American linden | Brooklyn | lacebark elm | 45 |
London planetree | Manhattan | lacebark elm | 45 |
London planetree | Manhattan | lacebark elm | 45 |
... and more |
What is the schema of the resulting relation?
The relation has a naming conflict
(two attributes with the same name)
What is the schema of the resulting relation?
When writing cross-products on the board,
I will use implicit renaming
Pair tuples according to a condition c.
Equi-joins are joins with only equality tests in the condition.
(Which operators behave differently in Set- and Bag-RA?)
Operator | Symbol | Creates Duplicates? |
---|---|---|
Selection | $\sigma$ | No |
Projection | $\pi$ | Yes |
Cross-product | $\times$ | No |
Set-difference | $-$ | No |
Union | $\cup$ | Yes |
Join | $\bowtie$ | No |
Using Set-Relational Algebra, find the BORONAMEs of all boroughs that do have tree species with an average height of below 45 inches
TreeInfo
SPC_COMMON | AVG_HEIGHT |
---|---|
cedar elm | 60 |
lacebark elm | 45 |
... and more |
Trees
SPC_COMMON | BORONAME |
---|---|
'honeylocust' | 'Brooklyn' |
'American linden' | 'Brooklyn' |
'London planetree' | 'Manhattan' |
... and more |
Not typically supported as a primitive operator,
but useful for expressing queries like:
Find species that appear in all boroughs
$$R / S \equiv \{\; \left<\vec t\right> \;|\; \forall \left<\vec s\right> \in S, \left< \vec t \vec s \right> \in R \;\}$$
BORO | SPC_COMMON |
---|---|
Brooklyn | honeylocust |
Brooklyn | American linden |
Brooklyn | London planetree |
Manhattan | honeylocust |
Manhattan | American linden |
Manhattan | pin oak |
Queens | honeylocust |
Queens | American linden |
Bronx | honeylocust |
/
| |||||||
/
| |||||||
/
|
If time permits: Implement division using other operators.
A simple way to think about and work with
computations over collections.
… simple → easy to evaluate
… simple → easy to optimize
Next time, Optimizing RA