Depending on the design of a database, the database may contain redundant data. Such a database becomes:
1. Inefficient - the database engine will need to process more data for each query or update
2. Bloated - storage requirements increase due to the redundant data
3. Error-prone - someone needs to input the redundant data into the database. The more times the same data is input into the database, the more chances there are for errors to occur.
In order to remove redundancy in database, normalization is applied. Normalization is the process of eliminating redundant data from database tables. There are 5 levels of normalization - also termed as the 5 normal forms. Most database designers stop at either levels 2 or 3. This is because although normalization reduces data redundancy, it also results in increased complexity which will cause a decrease in performance. This decrease in performance is due to the requirement to join the normalized tables in queries. Levels 4 and 5 of normalization remains largely an academic field of study and is not applied in industry.
The normal forms are progressive - e.g. a table that has achieved the 3rd normal form is by definition already in the 2nd and 1st normal forms.
The 1st Normal Form (1NF)
The
1st normal form deals with data redundancy within each table record. The 1st normal form attempts to rid each table row from repeating data.
The 2nd Normal Form (2NF)
The
2nd normal form deals with data redundancy that occurs vertically across the different rows in a table. The 2nd normal form attempts to rid each table column from repeating data.
The 3rd Normal Form (3NF)
The
3rd normal form requires that each column in a database table must be:
1. dependent on the table's primary key
2. independent on all other non-key columns in the table