In relational database design, there is a concept of database normalization or simply normalization, which is a process of organizing columns (attributes) and tables (relations) to reduce data redundancy and improve data integrity. (as written on Wikipedia).
As most articles are somewhat technical and thus harder to understand, I\'m asking for someone to write an easier to understand explanation based on examples about what 1NF, 2NF, 3NF, even 3.5NF (Boyce-Codd) mean.
1NF is the most basic of normal forms - each cell in a table must contain only one piece of information, and there can be no duplicate rows.
2NF and 3NF are all about being dependent on the primary key. Recall that a primary key can be made up of multiple columns. As Chris said in his response:
The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF] (so help me Codd).
2NF
Say you have a table containing courses that are taken in a certain semester, and you have the following data:
|-----Primary Key----| uh oh |
V
CourseID | SemesterID | #Places | Course Name |
------------------------------------------------|
IT101 | 2009-1 | 100 | Programming |
IT101 | 2009-2 | 100 | Programming |
IT102 | 2009-1 | 200 | Databases |
IT102 | 2010-1 | 150 | Databases |
IT103 | 2009-2 | 120 | Web Design |
This is not in 2NF, because the fourth column does not rely upon the entire key - but only a part of it. The course name is dependent on the Course\'s ID, but has nothing to do with which semester it\'s taken in. Thus, as you can see, we have duplicate information - several rows telling us that IT101 is programming, and IT102 is Databases. So we fix that by moving the course name into another table, where CourseID is the ENTIRE key.
Primary Key |
CourseID | Course Name |
---------------------------|
IT101 | Programming |
IT102 | Databases |
IT103 | Web Design |
No redundancy!
3NF
Okay, so let\'s say we also add the name of the teacher of the course, and some details about them, into the RDBMS:
|-----Primary Key----| uh oh |
V
Course | Semester | #Places | TeacherID | TeacherName |
---------------------------------------------------------------|
IT101 | 2009-1 | 100 | 332 | Mr Jones |
IT101 | 2009-2 | 100 | 332 | Mr Jones |
IT102 | 2009-1 | 200 | 495 | Mr Bentley |
IT102 | 2010-1 | 150 | 332 | Mr Jones |
IT103 | 2009-2 | 120 | 242 | Mrs Smith |
Now hopefully it should be obvious that TeacherName is dependent on TeacherID - so this is not in 3NF. To fix this, we do much the same as we did in 2NF - take the TeacherName field out of this table, and put it in its own, which has TeacherID as the key.
Primary Key |
TeacherID | TeacherName |
---------------------------|
332 | Mr Jones |
495 | Mr Bentley |
242 | Mrs Smith |
No redundancy!!
One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF either. So each additional Normal Form requires everything that the lower normal forms had, plus some extra conditions, which must all be fulfilled.
I\'ve never had a good memory for exact wording, but in my database class I think the professor always said something like:
The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF].
Here\'s a quick, admittedly butchered response, but in a sentence:
1NF : Your table is organized as an unordered set of data, and there are no repeating columns.
2NF: You don\'t repeat data in one column of your table because of another column.
3NF: Every column in your table relates only to your table\'s key -- you wouldn\'t have a column in a table that describes another column in your table which isn\'t the key.
For more detail, see wikipedia...
1NF: Only one value per column
2NF: All the non primary key columns in the table should depend on the entire primary key.
3NF: All the non primary key columns in the table should depend DIRECTLY on the entire primary key.
I have written an article in more detail over here