Query Optimization Interpretation and Optimization of Queries. Algebraic Techniques for Query Transformation The Initial Relational Algebra Expression and Query Tree |
Theoretical Knowledge |
|
|
Practical Knowledge |
|
Problem: Find the schedules and rooms of all courses taken by any mathematics major
SQL:
select shed, room
from class, enrolment, student
where major = 'math' and
class.course = enrolment.course# and
enrolment.stuid = student.stuid
Relational Algebra:
join student, enrolment giving temp1
join temp1, class giving temp2
select temp2 where major = 'math' giving temp3
project temp3 over shed, room
p sched, room (s major = 'math' (( student * enrolment) * class))
Relational Algebra:
select student where major = 'math' giving temp1
join temp1, enrolment giving temp2
joint temp2, class giving temp3
project temp3 over shed, room
p sched, room (((ss major = 'math' (student)) * enrolment) * class)
Evaluating Conjunctive Conditions
Problem: Find the schedules and rooms of courses taken by those math majors who have more than 100 credits.
select student where major = 'math' and credits > 100 giving t1
join t1, enrolment giving t2
join t2, class giving t3
project t3 over sched, room
p sched, room (((s major = 'math" and credits >100 (student)) * enrolment) * class)
Problem: Find the schedules and rooms for all mathematics majors who have received a grade of f.
join student, enrolment giving t1
select t1 where major = 'math' and grade = 'f' giving t2
join t2, class giving t3
project t3 over sched, room
RA:
p sched, room ((s major = 'math" and grade = 'f' (student * enrolment)) * class)
join student, enrolment giving t1
select t1 where grade = 'f' giving t2
select t2 where major = 'math' giving t3
s major = 'math' (s grade ='f' (student * enrolment))
select student where major = 'math' giving t1
select enrolment where grade = 'f' giving t2
join t1, t2 giving t3
join t3, class giving t4
project t4 over sched, room
p sched, room (((s major = 'math" (student)) * (s grade = 'f' (enrolment))) * class)
Heuristics For Query Optimization
1. Do selection as early as possible. If appropriate, use cascading of selects, commutativity of selects with projects and products, and distributivity of select over set union, set intersection, and set difference to move the selection as far down the query tree as possible.
2. Do projection early. If appropriate, use cascading of projects, distributivity of projection over product and set union, intersection, and difference, and commutativity of selection and projection to move the projection as far down the query tree as possible. Examine all projections to see if some are unnecessary.
3. If a product appears as an argument for a selection, where the selection involves attributes of the tables in the product, transform the product into a join. If the selection involves attributes of only one of the tables in the product, apply the selection to that table first.
4. If there is a sequence of selections and/or projections, use commutativity or cascading to combine them into one selection, one projection, or a selection followed by a projection and apply them simultaneously to each tuple as it is retrieved. If a join or product is to be followed by a sequence of selections or projections, apply them to each tuple of the join or product as soon as it is constructed.
5. Use associativity of join, product, union, and intersection to rearrange relations in the query tree so that the selection operation that will produce the smallest table will be executed first.
6. If a subexpression appears more than once in the query tree, and the result it produces is not too large, compute it once and save it. This is especially useful when querying on views, since the same subexpression must be used to construct the view each time.
Practical Knowledge
Problem 1. Consider a car service database scheme
customer (name, address, city, customer_number)
vehicle (model, make, car_plate)
service (customer_number, car_plate, date)
And a relational algebra expression of the form
p model (s date = 1/11/92 (p model, make, car_plate, name, address, city, customer_number, date (pcustomer.customer_number =service.customer_number and vehicle.car_plate = service.car_plate (customer x vehicle x service))))
Draw the query tree. Then show the optimization process for this tree by drawing intermediate query trees. (You may combine steps into pairs to save on drawing; however with each optimization step you must explain exactly what you did and why.)