Understanding Relational Databases: Foreign Keys
This section of the site features articles published between 2002 and 2004. They remain here for reference purposes and may contain information that is out of date.
This is a free sample chapter from Beginning PHP 4 Databases published by Wrox Press.
Now that the
Project table has a reliable primary key, it makes sense that the
Donation table should refer to the
ProjectID rather than the name of the project, since the whole point is to definitively identify the project to which the donation was made:
|5||Seung Yong Lee||16Dec2002||$150||5|
Donation table, the
ProjectID field is what is known as a foreign key. It is a field which refers to the primary key of another table. Like primary keys, a foreign key may be either simple or composite, depending on whether the foreign table's primary key is simple or composite. A table may contain any number of foreign keys, since a table may contain data that relates to any number of foreign tables.
To further normalize our schema, we should create a
Donor table, since it is likely that we will want to store more information about the donor. We will use a surrogate key, since it is not uncommon for different people to share the same name. In general, fields that represent the names of entities such as the names of people or companies, make poor primary keys due to the possibility of repeated values.
To demonstrate the flexibility of the added data dimension, we'll add a
Country (of residence) field to the new table, but it can be easily imagined that organizations would probably also store many other details, such as contact information and the like:
|2||Victor Gomez||United States|
|3||Seung Yong Lee||South Korea|
Donation table can now refer to the unique
DonorID as a foreign key:
A Key To Keys
With all of the choices of keys that we have discussed in this chapter, a quick review will help us at this stage. A key may be:
- Primary or Foreign
- A primary key uniquely identifies each record in its table. A foreign key uniquely identifies a record in some other table.
- Simple or Composite
- A simple key is composed of a single field. A composite key is a combination of fields whose values together identify the record.
- Surrogate or Logical
- A surrogate (or artificial) key consists of unique, arbitrary values that abstractly represent the records in the table. A logical (or natural) key consists of the actual data, which may identify the record.
Donation table, the
DonationID field is a surrogate or a primary key.
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is hierarchically placed between Third Normal Form and Fourth Normal Form and it is normally considered to be a more restrictive form of Third Normal Form. A table is considered to be in Boyce-Codd Normal Form if every determinant (any attribute whose values determine other values with a row) in the table is a candidate key. If a table contains only one candidate key, the 3NF and BCNF are equivalent.
Fourth and Fifth Normal Forms
Our database schema has come a long way from our original awkward ledger sheet. Third Normal Form is generally considered the acceptable level of normalization for professional database applications. Schemata that conform to Third Normal Form have consistent arrangements of keys and relationships with little to no redundancy within their tables. Hands-on database designers (as opposed to theoreticians) tend to regard Fourth and Fifth Normal Forms as belonging to the realm of academia, although real-world examples do sometimes arise.
Fourth Normal Form (4NF) involves multi-valued dependencies (that may contain multiple values for an entity). For example, a project is an entity within our database schema. Suppose we wish to store information about the project's fund-raising events and milestones (there is no relationship between a fund-raising event and a milestone, but each has a relationship to the project). We will quickly realize that each project may have multiple fund-raising events, as well as multiple milestones. As we are well versed with First Normal Form, we know better than to create a table with fields like
FundEvent2, and so on. However, we might just end up structuring the table like this:
This is really a violation of First Normal Form, because there is no primary key, but for the purpose of this example, we'll allow it to stand for a moment. If it really bothers you, pretend that there is a
RecordID surrogate key. The quest for normalization can admittedly be an obsessive pursuit.
Solar Scholars project (ID 1) has three fund-raising events, and only two milestones. Suppose we wish to remove fund-raising event
89 from the project. Should we replace the value
89 with a
Null (empty) value? Or should we combine the second and third records? This would involve updating one record and deleting the other - an inefficient operation that increases the potential for error.
As you have probably guessed, we should split the table into two. The requirements for Fourth Normal Form are:
- The schema must meet all requirements of Third Normal Form
- No table should contain more than one multi-valued dependency
MilestoneIDare both multi-valued dependencies. Each is dependent on the
ProjectID and each represents a fact for which there may be multiple values per
ProjectID (multiple fund-raising events per
ProjectID, and multiple milestones per
ProjectID). Separating the table into two solves the problem as shown in the two tables below.
Project_FundEvent table, we have:
Project_Milestone table, we have:
This also solves the primary key problem. Each table may now have a composite primary key. The combination of
FundEventID is unique. The reason why database designers rarely invoke Fourth Normal Form is that it is usually rendered difficult as a result of the pursuit of unique identifiers early in the schema-building process. Ensuring that primary keys exist for each table makes it difficult for two multi-valued dependencies to occur in one table in the first place.
Fifth Normal Form (5NF) is the most esoteric and the least applicable in the real world, therefore we will not discuss it in depth. It concerns multiple multi-valued facts that are related to each other as well as to another key (as opposed to the multi-valued dependencies mentioned above, which have relationships only to the
ProjectID, not to each other). Untangling such complex relationships usually requires splitting tables into more than three related tables. Fifth Normal Form attempts to eliminate all remaining redundancy from a schema by decomposing each table to the point where it cannot be decomposed any further. It requires each field to be a candidate key, which is the main reason why it is generally considered unattainable in real-world applications.