What is Normalisation (or Normalization)?

2018-12-31 06:50发布

Why do database guys go on about normalisation?

What is it? How does it help?

Does it apply to anything outside of databases?

10条回答
美炸的是我
2楼-- · 2018-12-31 07:11

Normalization is one of the basic concepts. It means that two things do not influence on each other.

In databases specifically means that two (or more) tables do not contain the same data, i.e. do not have any redundancy.

On the first sight that is really good because your chances to make some synchronization problems are close to zero, you always knows where your data is, etc. But, probably, your number of tables will grow and you will have problems to cross the data and to get some summary results.

So, at the end you will finish with database design that is not pure normalized, with some redundancy (it will be in some of the possible levels of normalization).

查看更多
春风洒进眼中
3楼-- · 2018-12-31 07:16

Before directly jumping into the topic ‘Database Normalization and its types’, we need to understand data redundancy, insertion/update/deletion anomalies, Partial dependency and transitive functional dependency.

What is data redundancy and update/modification anomaly?

Data redundancy is the unnecessary duplication of data in multiple tables within the database or even within the same table. It increases the size of the database unnecessarily and decreases the efficiency of the database by causing data inconsistency.

Example:

enter image description here

Here the ‘student_age’ for the student Alex is repeated unnecessarily which naturally increases the data redundancy. When the column ‘student_age’ has to be changed in the future, then update has to be performed on both rows of the student Alex as in the table above. This scenario is known as update anomaly. If the user updates only one row and forgets to update the other row will cause data inconsistency.

What is insertion anomaly?

Insertion anomaly occurs when certain values for an attribute* cannot be inserted into a table without the existence of the additional data related to that particular value.

Example:

enter image description here

Here the ‘student_name’ and ‘exam_registered’ are assumed to be a composite primary key (primary key that contain multiple columns). Primary key should be always unique, should not hold NULL values and it must uniquely identify each row in a table. Now assume that the high school is trying to introduce a new exam called Chemistry. In the beginning no student has been registered in this course. Since the above table will not accept NULL value in the column ‘student_name’ we need wait until at least one student has been registered to make entry for the exam Chemistry in the above table.

What is deletion anomaly?

Deletion anomaly occurs occurs when certain important values of an attribute* are lost because of the deletion of other not required values.

Example:

enter image description here

Here the ‘student_name’ and ‘exam_registered’ are assumed to be a composite primary key (primary key that contain multiple columns). Primary key should be always unique and should not hold NULL values and it must uniquely identify each row in a table. Now assume that the student named John has cancelled his registration for the exam named English. Since the column ‘student_name’ cannot hold NULL value we will be forced to delete the entire row which cost us the loss of the exam named English from our table. But still high school offers the possibility of taking English exam to their students.

What is partial dependency?

A table is said to be in partial dependency when a non-primary key attribute in that table is fully dependent on a part of the composite primary key attribute in that table.

Example:

enter image description here

Consider a table that has 3 columns named ‘student_name’ , ‘student_age’ and ‘exam_registered’ as above. Here ‘student_name’ and ‘exam_registered’ can together form a composite primary key. Normally each non primary key column in a well normalized table should always depend on the complete set of composite primary key. Here ‘student_age’ depends only on the ‘student_name’ and it does not related to ‘exam_registered’ which causes this table to be in partial dependency.

What is transitive functional dependency?

A table is said to be in transitive functional dependency when a non-primary key attribute in that table more strongly depends on another non-primary key attribute in that table.

Example:

enter image description here

In the above table the relation between the non primary key attribute ‘postal_code’ and another non primary key attribute ‘City’ is much more stronger than the relation between the primary key attribute ‘student_id’ and the non primary key attribute ‘postal_code’. This causes the above table to be in transitive functional dependency.

With the better understanding of the above concepts we can now dive into the normalization of tables in databases.

Data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF]

Table without normalization

A sample denormalized table is given below which will be normalized in the incremental steps in this article.

In the below example for the student_id=2, there are 2 entries because of different parent id’s. Here we can assume like Parent_id=1 represents the father and Parent_id=3 represents the mother of this student whose student_id=2.

Example:

enter image description here

First Normal Form (1NF)

Rules: 1. Attributes must contain only atomic values 2. No two rows of data must contain repeating group of information 3. Each table must have a primary key

Step 1:

enter image description here

Rule 1 is satisfied in the above step but still it does not satisfy rule 2 and rule 3.

Step 2: The tables below now satisfies the Rule 1, Rule 2 and Rule 3 of 1NF.

enter image description here

Second Normal Form (2NF)

Rules:

  1. Tables should satisfy the first normal form (1NF)
  2. There should not be any partial dependency within the tables

Except the first table all the other tables from 1NF satisfies 2NF. In the first table the ‘age’ column depends only on column ‘student_id’. This violates the Rule 2 of 2NF. Because all the non key columns should depend the primary key columns completely. So the normalized tables as per 2NF are given below.

enter image description here

Third Normal Form (3NF)

Usually a relational database table is often described as ‘normalized’ if it meets 3NF. Most 3NF tables are free from insert, update and delete anomalies.

Rules:

  1. Tables should satisfy the second normal form (2NF)
  2. There should not be any transitive functional dependencies within the tables

Except the last table all the other tables from the 2NF satisfies 3NF. This is because the column ‘city’ more strongly depends on the column ‘postal_code’ than the primary key ‘student_id’ which makes the column ‘city’ to be transitive functional dependent on the column ‘student_id’. So the final normalized tables as per 3NF are given as below.

enter image description here

*Attribute:

– Consider a table of students. Here student_name, age etc., are considered as the attributes which will be the title for the corresponding columns.

======================================================================== Simple examples - database normalization

查看更多
不流泪的眼
4楼-- · 2018-12-31 07:17

The rules of normalisation (source: unknown)

  • The key (1NF)
  • The whole key (2NF)
  • and nothing but the key (3NF)

... So help me Codd.

查看更多
旧时光的记忆
5楼-- · 2018-12-31 07:17

What is Normalization?

Normalization is a step wise formal process that allows us to decompose database tables in such a way that both data redundancy and update anomalies are minimized.

Normalization Process
enter image description here Courtesy

First normal form if and only if the domain of each attribute contains only atomic values (an atomic value is a value that cannot be divided), and the value of each attribute contains only a single value from that domain(example:- domain for the gender column is: "M", "F". ).

First normal form enforces these criteria:

  • Eliminate repeating groups in individual tables.
  • Create a separate table for each set of related data.
  • Identify each set of related data with a primary key

Second normal form = 1NF + no partial dependencies i.e. All non-key attributes are fully functional dependent on the primary key.

Third normal form = 2NF + no transitive dependencies i.e. All non-key attributes are fully functional dependent DIRECTLY only on the primary key.

Boyce–Codd normal form (or BCNF or 3.5NF) is a slightly stronger version of the third normal form (3NF).

Note:- Second, Third, and Boyce–Codd normal forms are concerned with functional dependencies. Examples

Fourth normal form = 3NF + remove Multivalued dependencies

Fifth normal form = 4NF + remove join dependencies

查看更多
登录 后发表回答