Home > Articles > Understanding Relational Databases: Foreign Keys

Understanding Relational Databases: Foreign Keys

skip to navigation

This section of the site features articles published between 2002 and 2004. They remain here for reference purposes and may contain information that is out of date.

Chapter Index

This is a free sample chapter from Beginning PHP 4 Databases published by Wrox Press.

Foreign Keys

Now that the Project table has a reliable primary key, it makes sense that the Donation table should refer to the ProjectID rather than the name of the project, since the whole point is to definitively identify the project to which the donation was made:

DonationID Donor Date Amount ProjectID
1 Marco Pinelli 13Dec2002 $200 1
2 Victor Gomez 15Dec2002 $100 2
3 Victor Gomez 15Dec2002 $100 3
4 Victor Gomez 15Dec2002 $50 4
5 Seung Yong Lee 16Dec2002 $150 5

In the Donation table, the ProjectID field is what is known as a foreign key. It is a field which refers to the primary key of another table. Like primary keys, a foreign key may be either simple or composite, depending on whether the foreign table's primary key is simple or composite. A table may contain any number of foreign keys, since a table may contain data that relates to any number of foreign tables.

To further normalize our schema, we should create a Donor table, since it is likely that we will want to store more information about the donor. We will use a surrogate key, since it is not uncommon for different people to share the same name. In general, fields that represent the names of entities such as the names of people or companies, make poor primary keys due to the possibility of repeated values.

To demonstrate the flexibility of the added data dimension, we'll add a Country (of residence) field to the new table, but it can be easily imagined that organizations would probably also store many other details, such as contact information and the like:

DonorID Name Country
1 Marco Pinelli Italy
NUM NAME COUNTRY
2 Victor Gomez United States
3 Seung Yong Lee South Korea

The Donation table can now refer to the unique DonorID as a foreign key:

DonationID DonorID Date Amount ProjectID
1 1 13Dec2002 $200 1
2 2 15Dec2002 $100 2
3 2 15Dec2002 $100 3
4 2 15Dec2002 $50 4
5 3 16Dec2002 $150 4

A Key To Keys

With all of the choices of keys that we have discussed in this chapter, a quick review will help us at this stage. A key may be:

Primary or Foreign
A primary key uniquely identifies each record in its table. A foreign key uniquely identifies a record in some other table.
Simple or Composite
A simple key is composed of a single field. A composite key is a combination of fields whose values together identify the record.
Surrogate or Logical
A surrogate (or artificial) key consists of unique, arbitrary values that abstractly represent the records in the table. A logical (or natural) key consists of the actual data, which may identify the record.

In the Donation table, the DonationID field is a surrogate or a primary key.

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is hierarchically placed between Third Normal Form and Fourth Normal Form and it is normally considered to be a more restrictive form of Third Normal Form. A table is considered to be in Boyce-Codd Normal Form if every determinant (any attribute whose values determine other values with a row) in the table is a candidate key. If a table contains only one candidate key, the 3NF and BCNF are equivalent.

Fourth and Fifth Normal Forms

Our database schema has come a long way from our original awkward ledger sheet. Third Normal Form is generally considered the acceptable level of normalization for professional database applications. Schemata that conform to Third Normal Form have consistent arrangements of keys and relationships with little to no redundancy within their tables. Hands-on database designers (as opposed to theoreticians) tend to regard Fourth and Fifth Normal Forms as belonging to the realm of academia, although real-world examples do sometimes arise.

Fourth Normal Form (4NF) involves multi-valued dependencies (that may contain multiple values for an entity). For example, a project is an entity within our database schema. Suppose we wish to store information about the project's fund-raising events and milestones (there is no relationship between a fund-raising event and a milestone, but each has a relationship to the project). We will quickly realize that each project may have multiple fund-raising events, as well as multiple milestones. As we are well versed with First Normal Form, we know better than to create a table with fields like FundEvent1, FundEvent2, and so on. However, we might just end up structuring the table like this:

ProjectID FundEventID MilestoneID
1 43 644
1 09 645
1 56
2 43 679
2 110 780

This is really a violation of First Normal Form, because there is no primary key, but for the purpose of this example, we'll allow it to stand for a moment. If it really bothers you, pretend that there is a RecordID surrogate key. The quest for normalization can admittedly be an obsessive pursuit.

The Solar Scholars project (ID 1) has three fund-raising events, and only two milestones. Suppose we wish to remove fund-raising event 89 from the project. Should we replace the value 89 with a Null (empty) value? Or should we combine the second and third records? This would involve updating one record and deleting the other - an inefficient operation that increases the potential for error.

As you have probably guessed, we should split the table into two. The requirements for Fourth Normal Form are:

FundEventID and MilestoneIDare both multi-valued dependencies. Each is dependent on the ProjectID and each represents a fact for which there may be multiple values per ProjectID (multiple fund-raising events per ProjectID, and multiple milestones per ProjectID). Separating the table into two solves the problem as shown in the two tables below.

In the Project_FundEvent table, we have:

ProjectID FundEventID
1 43
1 89
1 56
2 43
2 110

In the Project_Milestone table, we have:

ProjectID FundEventID
1 644
1 6445
2 679
2 780

This also solves the primary key problem. Each table may now have a composite primary key. The combination of ProjectID and FundEventID is unique. The reason why database designers rarely invoke Fourth Normal Form is that it is usually rendered difficult as a result of the pursuit of unique identifiers early in the schema-building process. Ensuring that primary keys exist for each table makes it difficult for two multi-valued dependencies to occur in one table in the first place.

Fifth Normal Form (5NF) is the most esoteric and the least applicable in the real world, therefore we will not discuss it in depth. It concerns multiple multi-valued facts that are related to each other as well as to another key (as opposed to the multi-valued dependencies mentioned above, which have relationships only to the ProjectID, not to each other). Untangling such complex relationships usually requires splitting tables into more than three related tables. Fifth Normal Form attempts to eliminate all remaining redundancy from a schema by decomposing each table to the point where it cannot be decomposed any further. It requires each field to be a candidate key, which is the main reason why it is generally considered unattainable in real-world applications.

> > Denormalization