Different Types of Normalization Techniques

Anushkakhatri 25 Jul, 2022 • 6 min read

This article was published as a part of the Data Science Blogathon.

Introduction

In a large dataset, duplicated data leads to resource and disk space waste, and the chance of errors and inconsistencies increases. Therefore, we must decompose redundant data relationships into manageable and more well-structured ones. Through normalization, we can restructure a relational database to reduce redundancy among the data and improve the data integrity, eliminating unwanted characteristics like Insertions, Updates, and Deletions. In this article, we will explain the different types of normalization in detail and explore the advantages and disadvantages of using normalization.

First Normal Form (1NF)

A relation violates the first normal form if it contains composite or multi-valued attributes. Therefore, in order to remove the repetition of values, we convert it into the first normal form. In the first normal form, the attribute values are single-valued and there are no repeating groups. By using the primary key and foreign key, the first normal form creates a functional dependency between the two tables.

Example:

Normalization
Employee Table

The Employee table is not in its first normal form because the column Phone No. contains multiple values.

To convert the Employee table into 1NF we decompose it as follows:

Normalization
Employee Table- 1NF

Second Normal Form (2NF)

In order for a relationship to be in the second normal form, it must first be in the first normal form and should not contain partial dependencies. By using the second normal form the amount of redundant data in the memory is reduced. Those relations not included in the second normal form may suffer anomalies in future updates.

Example:

Normalization

The non-prime attribute AGE in the provided table is reliant on TEACHER ID, a suitable subset of a candidate key. Because of this, it breaks the 2NF rule.

The provided table is divided into two tables to create the second normal form:

Normalization
Normalization

Third Normal Form (3NF)

The relation must first be in the first and second normal form before it can be in the third normal form. The third normal form removes the transitive dependency of the non-primary key attributes on the primary key by moving these attributes along with the copy of the attribute they depend on to a new relation. Removing transitive dependencies of attributes reduces data duplication and improves the integrity of data. In the third normal form, most of the tables are free of anomalies with insertions, updates, and deletions.

Example:

Currently, the following table is not in the third normal form:

Normalization

To decompose the given table into the third normal form we can do the following:

Normalization

Boyce-Codd Normal Form (BCNF)

The Boyce-Codd Normal Form(BCNF) is the fourth form of normalization and is an updated version of the third normal form. The relation first must be in the third normal form to be in BCNF. As per BCNF, if Q is determined by P, then P should be a super key or candidate key for any functional dependency. When we use the third normal form, we can achieve lossless decomposition, but with BCNF, it is very difficult. BCNF is a more restrictive form of normalization, so there are no anomalous results in the database.

Example:

BCNF | Normalization

Think about a relation R that has attributes (student, subject, teacher). Since the teacher is not a candidate key, the subject “teacher” breaches the BCNF.

Divide R into R1 (X, Y) and R2 if X->Y violates the BCNF (R-Y). Consequently, R is split into R1 (Teacher, subject) and R2 (Relation) (student, Teacher).

The dependency preservation property is not always met by BCNF decomposition. If the dependency is not retained following BCNF decomposition, we must choose between staying in BCNF or going back to 3NF.

Fifth Normal Form (5NF)

A relation in a DBMS is in the Fifth Normal Form if it is in the Fourth Normal Form but does not contain any join dependencies. The joining must be lossless. The Fifth Normal Form is satisfied when all the tables present in the DBMS are broken down into as many more tables as possible to reduce redundancy. The Fifth Normal Form is also known as the Project Join Normal Form.

Example:

Mr. Rick takes both C++ and Python classes for Lecture Hall 1, but he does not take Python for Semester 2. For this case, the fields must be combined to identify a valid data set. If we add a new lecture hall as Lecture Hall 3 but do not know what the subject will be, we leave Professor and Subject blank. As all three columns are primary keys we cannot leave the other two columns ( Professor, Subject) blank. As a result, we can decompose the above table into three relations to make 5NF.

Sixth Normal Form (6NF)

In the sixth normal form, the relation variables are decomposed into irreducible components. For a temporal variable or other interval data, this is relatively unimportant for non-temporal relation variables. In many data warehouses, the benefits of the sixth normal form outweigh the downsides.

Example:

For the above, the possible join dependencies are –

(Student ID, First Name)

(Student ID, Last name)

(Student ID, Grade)

So we can decompose it into the sixth normal form as follows:

Advantages of Normalization

Normalization provides the following advantages:

  • Data redundancy can be reduced through normalization.
  • Duplicate data is removed from the database during normalization.
  • In a database, normalization ensures data consistency.
  • The normalization process organizes the database in a better way.
  • The normalization process facilitates faster execution since the database size is reduced.
  • The normalization process involves ensuring that our data is logically stored.
  • A normalized database provides higher security.

Disadvantages of Normalization

The disadvantages of normalization are as follows:

  • Normalization is a long and time-consuming process.
  • You must have the right knowledge of normal forms in order to carry out the normalization process efficiently.
  • An ill-designed database may result from careless decomposition.
  • The lack of duplicate data makes table joins necessary, making queries more complex.
  • Normalization increases the number of tables in the database, so the more there are, the more massive it becomes and the more expensive it becomes to maintain.

Conclusion

In this article, we talked about how normalization helps eliminate anomalies, which can result in data duplication, affecting the integrity of the data and its performance. This article should leave you with the following points:

  • Normalization helps you keep your database organized.
  • It reduces the duplication of data and improves its integrity.
  • In its first normal form, the relation consists of unique values with no-repeat values.
  • The second normal form eliminates all partial dependencies.
  • In the third normal form, no non-prime key attributes should be dependent on anything other than the primary key.
  • BCNF is an updated version of 3NF.
  • The fifth normal form ensures lossless decomposition.
  • The sixth normal form is a breakdown of related variables into irreducible components

As a result of reading this article, hopefully, you now know what normal forms are and how useful they can be when organizing data in a database. Read more articles here!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Anushkakhatri 25 Jul 2022

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear