This is a free sample chapter from Beginning PHP 4 Databases published by Wrox Press.
Now that the Project table has a reliable primary key, it makes sense that the Donation table should refer to the ProjectID rather than the name of the project, since the whole point is to definitively identify the project to which the donation was made:
In the Donation table, the ProjectID field is what is known as a foreign key. It is a field which refers to the primary key of another table. Like primary keys, a foreign key may be either simple or composite, depending on whether the foreign table’s primary key is simple or composite. A table may contain any number of foreign keys, since a table may contain data that relates to any number of foreign tables.
To further normalize our schema, we should create a Donor table, since it is likely that we will want to store more information about the donor. We will use a surrogate key, since it is not uncommon for different people to share the same name. In general, fields that represent the names of entities such as the names of people or companies, make poor primary keys due to the possibility of repeated values.
To demonstrate the flexibility of the added data dimension, we’ll add a Country (of residence) field to the new table, but it can be easily imagined that organizations would probably also store many other details, such as contact information and the like:
DonorIDNameCountry1Marco PinelliItalyNUMNAMECOUNTRY2Victor GomezUnited States3Seung Yong LeeSouth Korea
With all of the choices of keys that we have discussed in this chapter, a quick review will help us at this stage. A key may be:Primary or ForeignA primary key uniquely identifies each record in its table. A foreign key uniquely identifies a record in some other table.Simple or CompositeA simple key is composed of a single field. A composite key is a combination of fields whose values together identify the record.Surrogate or LogicalA surrogate (or artificial) key consists of unique, arbitrary values that abstractly represent the records in the table. A logical (or natural) key consists of the actual data, which may identify the record.
In the Donation table, the DonationID field is a surrogate or a primary key.
Boyce-Codd Normal Form (BCNF) is hierarchically placed between Third Normal Form and Fourth Normal Form and it is normally considered to be a more restrictive form of Third Normal Form. A table is considered to be in Boyce-Codd Normal Form if every determinant (any attribute whose values determine other values with a row) in the table is a candidate key. If a table contains only one candidate key, the 3NF and BCNF are equivalent.
Our database schema has come a long way from our original awkward ledger sheet. Third Normal Form is generally considered the acceptable level of normalization for professional database applications. Schemata that conform to Third Normal Form have consistent arrangements of keys and relationships with little to no redundancy within their tables. Hands-on database designers (as opposed to theoreticians) tend to regard Fourth and Fifth Normal Forms as belonging to the realm of academia, although real-world examples do sometimes arise.
Fourth Normal Form (4NF) involves multi-valued dependencies (that may contain multiple values for an entity). For example, a project is an entity within our database schema. Suppose we wish to store information about the project’s fund-raising events and milestones (there is no relationship between a fund-raising event and a milestone, but each has a relationship to the project). We will quickly realize that each project may have multiple fund-raising events, as well as multiple milestones. As we are well versed with First Normal Form, we know better than to create a table with fields like FundEvent1, FundEvent2, and so on. However, we might just end up structuring the table like this:
This is really a violation of First Normal Form, because there is no primary key, but for the purpose of this example, we’ll allow it to stand for a moment. If it really bothers you, pretend that there is a RecordID surrogate key. The quest for normalization can admittedly be an obsessive pursuit.
The Solar Scholars project (ID 1) has three fund-raising events, and only two milestones. Suppose we wish to remove fund-raising event 89 from the project. Should we replace the value 89 with a Null (empty) value? Or should we combine the second and third records? This would involve updating one record and deleting the other – an inefficient operation that increases the potential for error.
As you have probably guessed, we should split the table into two. The requirements for Fourth Normal Form are:
FundEventID and MilestoneIDare both multi-valued dependencies. Each is dependent on the ProjectID and each represents a fact for which there may be multiple values per ProjectID (multiple fund-raising events per ProjectID, and multiple milestones per ProjectID). Separating the table into two solves the problem as shown in the two tables below.
This also solves the primary key problem. Each table may now have a composite primary key. The combination of ProjectID and FundEventID is unique. The reason why database designers rarely invoke Fourth Normal Form is that it is usually rendered difficult as a result of the pursuit of unique identifiers early in the schema-building process. Ensuring that primary keys exist for each table makes it difficult for two multi-valued dependencies to occur in one table in the first place.
Fifth Normal Form (5NF) is the most esoteric and the least applicable in the real world, therefore we will not discuss it in depth. It concerns multiple multi-valued facts that are related to each other as well as to another key (as opposed to the multi-valued dependencies mentioned above, which have relationships only to the ProjectID, not to each other). Untangling such complex relationships usually requires splitting tables into more than three related tables. Fifth Normal Form attempts to eliminate all remaining redundancy from a schema by decomposing each table to the point where it cannot be decomposed any further. It requires each field to be a candidate key, which is the main reason why it is generally considered unattainable in real-world applications.