RDB Design Functional Dependencies and Normalization for Relational Databases |
Informal Design Guidelines for Relation Schema
Definition : If a and b are attributes or sets of attributes of relation R, we say that b is Functionally Dependent on a if each value of a in R has associated with it exactly one value of b in R( a ® b)
When a functional dependency exists, the attribute or set of attributes on the left side of the arrow is called a determinant.
Student Table (Each student has at most one major)
stuid | stuname | major | credits | status | socsecno |
s1001 | Smith, Tom | history | 90 | senion | 100429500 |
s1003 | Jones, Mary | math | 95 | senior | 010124567 |
s1006 | Lee, Pamela | csc | 15 | freshman | 088520876 |
s1010 | Burns, Edward | art | 63 | junior | 099320985 |
s1060 | Jones, Mary | csc | 25 | freshman | 064624738 |
student( stuid, stuname, major, credits, status, socsecno )
stuid ® stuname
stuid ® stuname, major, credits, status, socsecno
sosecno ® stuid, stuname, major, credits, status
credits ® statu
Superkeys, Candidate Keys, and Primary Keys
Superkey is an attribute or a set of attributes that identifies an entity uniquely.
{stuid}, { stuid, any other attribute} {socsecno, and other attribute}
Candidate key is a superkey such that no proper subset of its attributes is itself a superkey
{stuid}, {socsecno}, {stuname, major}
Primary key is a candidate key that is used to identify tuples in a relation. None of its attributes may have null values.
{stuid}, {socsecno}
Definition : A relation is in First Normal Form (1NF) if and only if every attribute is single-valued for each tuple.
stuid | stuname | major | credits | status | socsecno |
s1001 | Smith, Tom | history | 90 | sen | 100429500 |
s1003 | Jones, Mary | math | 95 | sen | 010124567 |
s1006 | Lee, Pamela | csc
math |
15 | fresh | 088520876 |
s1010 | Burns, Edward | art
english |
63 | jun | 099320985 |
s1060 | Jones, Mary | csc | 25 | fresh | 064624738 |
Student table (students may have more than one major)
stuid | stuname | major | credits | status | socsecno |
s1001 | Smith, Tom | history | 90 | sen | 100429500 |
s1003 | Jones, Mary | math | 95 | sen | 010124567 |
s1006 | Lee, Pamela | csc | 15 | fresh | 088520876 |
s1006 | Lee, Pamela | math | 15 | fresh | 088520876 |
s1010 | Burns, Edward | art | 63 | jun | 099320985 |
s1010 | Burns, Edward | english | 63 | jun | 099320985 |
s1060 | Jones, Mary | csc | 25 | fresh | 064624738 |
Student table rewritten in 1NF
{stuid, major}
Definition: In a relation R, attribute b of R is Fully Functionally Dependent on an attribute or set of attributes a of R if b is functionally dependent on a but not functionally dependent on any proper subset of a.
class( course#, stuid, stuname, facid, sched, room, grade )
course# | stuid | stuname | facid | sched | room | grade |
art103a | s1001 | Smith, Tom | f101 | mwf9 | h221 | a |
art103a | s1010 | Burns, Edward | f101 | mwf9 | h221 | |
art103a | s1006 | Lee, Pamela | f101 | mwf9 | h221 | b |
csc201a | s1003 | Jones, Marry | f105 | tuthf10 | m110 | a |
csc201a | s1006 | Lee, Pamela | f105 | tuthf10 | m110 | c |
hst205a | s1001 | Smith, Tom | f202 | mwf11 | h221 |
Class Table
fd:
course#, stuid ® stuname, facid, sched, room, grade
course#, stuid - composite key
fd:
course# ® facid, sched, room
stuid ® stuname
Stuname is fd on course#, stuid, it is also fd of stuid
Definition : A relation is in Second Normal Form (2NF) if and only if it is in first normal form and all the nonkey attributes are fully functionally dependent on the key.
class (course, stdid, stuname, facid, sched, room, grade )
Class Table
course# | stuid | stuname | facid | sched | room | grade |
art103a | s1001 | Smith, Tom | f101 | mwf9 | h221 | a |
art103a | s1010 | Burns, Edward | f101 | mwf9 | h221 | |
art103a | s1006 | Lee, Pamela | f101 | mwf9 | h221 | b |
csc201a | s1003 | Jones, Marry | f105 | tuthf10 | m110 | a |
csc201a | s1006 | Lee, Pamela | f105 | tuthf10 | m110 | c |
hst205a | s1001 | Smith, Tom | f202 | mwf11 | h221 |
course # ® facid, sched, room
stuid ® stuname
course#, stuid ® grade ( and, of course, facid, sced, room, stuname );
Using projection we break up the class into:
class2(course#, stuid, grade)
couse(course#, facid, shed, room)
stu(stuid, stuname)
course# | stuid | grade |
art103a | s1001 | a |
art103a | s1010 | |
art103a | s1006 | b |
csc201a | s1003 | a |
csc201a | s1006 | c |
hst205a | s1001 |
Class2 Table
stuid | stuname |
s1001 | Smith, Tom |
s1010 | Burns, Edward |
s1006 | Lee, Pamela |
s1003 | Jones, Mary |
Stu Table
course# | facid | sched | room |
art103a | f101 | mwf9 | h221 |
csc210a | f105 | tuthf10 | m110 |
hst205a | f202 | mwf11 | h221 |
Course Table
Transitive dependency
student (stuid, stuname, major, credits, status )
stuid | stuname | major | credits | status |
s1001 | Smith, Tom | history | 90 | senior |
s1003 | Jones, Mary | math | 95 | senior |
s1006 | Lee, Pamela | csc | 15 | freshman |
s1010 | Burns, Edward | art | 63 | junior |
s1060 | Joenes, Mary | csc | 25 | freshman |
stuid ® credits
credits
(stuid ® credits ) and (credits ® status) => (stuid ® status)
Third Normal Form (3NF)
Definition: A relation is in Third Normal Form (3NF) if it is in second normal form and no nonkey attribute is transitively dependent on the key.
Student Table
stuid | stuname | major | credits | status |
s1001 | Smith, Tom | history | 90 | senior |
s1003 | Jones, Mary | math | 95 | senior |
s1006 | Lee, Pamela | csc | 15 | freshman |
s1010 | Burns, Edward | art | 63 | junior |
s1060 | Joenes, Mary | csc | 25 | freshman |
Stu2 Table
stuid | stuname | major | credits |
s1001 | Smith, Tom | history | 90 |
s1003 | Jones, Mary | math | 95 |
s1006 | Lee, Pamela | csc | 15 |
s1010 | Burns, Edward | art | 63 |
s1060 | Joenes, Mary | csc | 25 |
Stats Table
credits | status |
15 | freshman |
25 | freshman |
63 | junior |
90 | senior |
95 | senior |
Definition: A relation is in Boyce-Codd Normal Form (BCNF) in and only if every determinant is a candidate key.
student(stuid, stuname, major, credits, status)
determinents: stuid, credits
Credits is not a candidate key - the relation is not BCNF.
faculty (facname, dept, office, rank, datehired )
office ® dept
facname, dept ® office, rank, datehired
facname, office ® dept, rank, datehired
fac1 (dept, office)
fac2 (facname, office, rank, datehired)
stu (stuname, stuadd, major, gpa)
Analysis
Synthesis
1. How are data to be actually stored on physical devices?
2. How are data to be accessed and/or modified?
Considerations:
The physical storage designer may or may not be able to choose the data item storage technique.
Specific options available will depend on dbms being used.
Data item storage techniques
Transactions
A single operation as viewed by user involves more than one task
Eg. Update grade of student in relation
Tasks:
...Such a program unit is called a transaction
Properties of a Transaction ('ACID' Tests)
Atomicity
- Runs to completion as indivisible unit
- Result either no change or consistent change
- Updates accessible to other transactions
Consistency
- If database in consistent state before transaction, will be afterwards
Isolation
- Actions hidden until termination
Durability
- Ensures 'commit' action reflected in database/P>
Efficiency in transactions achieved by:
Integrity
Security
Deadlock
Recovery
'Rollback'='Windback' process until its activities with database are sound
Practical Knowledge
Example
The Teachers Relation
tutor | course# | room | rsize | en_limit |
Smith | 353 | a532 | 45 | 40 |
Smith | 351 | c320 | 100 | 60 |
Clark | 355 | h940 | 400 | 300 |
Turner | 456 | b278 | 50 | 45 |
Jameson | 459 | d110 | 50 | 45 |
This is un-normalised because it allows a multiple set of values in column 'course#'
Decompose to 1NF(ie atomic values only)
course# | tutor | room | rsize | en_limit |
353 | Smith | a532 | 45 | 40 |
351 | Smith | c320 | 100 | 60 |
355 | Clark | h940 | 400 | 300 |
456 | Turner | b278 | 50 | 45 |
459 | Jameson | d110 | 50 | 45 |
However, There May be Problems:
1. Insertion Anomaly
Room size cannot be entered in the database unless a course is scheduled in that room
2. Deletion Anomaly
Rsize of a room in which only one course is scheduled will be deleted if that course is deleted
3. Update Anomaly
There is nothing to prevent the same room appearing more than once in the database. Inconsistencies could arise between entries 'room' and 'rsize' if updating is careless
Rsize is not fully functionally dependent on course#
(Dependent on room)
Decompose to 2NF
course_details
course# | tutor | en_limit |
353 | Smith | 40 |
351 | Smith | 60 |
355 | Clark | 300 |
456 | Turner | 45 |
459 | Jameson | 45 |
Room_details
room | rsize |
a532 | 45 |
c320 | 100 |
h940 | 400 |
b278 | 50 |
d110 | 50 |
There is a transitive dependence between room and en_limit>
Decompose 'course_details' to 3NF:
course# | tutor |
353 | Smith |
351 | Smith |
355 | Clark |
456 | Turner |
459 | Jameson |
course# | en_limit |
353 | 40 |
3517 | 60 |
355 | 300 |
456 | 45 |
459 | 45 |
Course
employee_id | course_id | fee |
e130 | c200 | 75 |
e200 | c300 | 100 |
e250 | c200 | 75 |
e425 | c400 | 150 |
e500 | c300 | 100 |
e575 | c500 | 50 |
Redundancy
Insertion anomaly
Deletion anomaly
Update anomaly
Employee_course
employee_id | course_id |
e130 | c200 |
e200 | c300 |
e250 | c200 |
e425 | c400 |
e500 | c300 |
e575 | c500 |
course_id | fee |
c200 | 75 |
c300 | 100 |
c400 | 150 |
c500 | 50 |
For the example from the previous time:
1. Using these assumptions and stating any others you need to make, list all the nontrivial functional dependencies for this relation.
2. What are the candidate keys for this relation? Identify the primary key.
3. Is the relation in Third Normal Form? If not, find a 3NF lossless join decomposition of college that preserves dependencies.
4. Is the relation or resulting set of relations in Boyce-Codd Normal Form? If not, find a lossless join decomposition that is in BCNF, if possible.