Thursday, October 4, 2012

Learn Normalization - 1NF, 2NF & 3NF : DBMS

Article Contributed by


People who have started learning about databases must have frequently come up with the word “Normalization” in quite early stages. The name suggests that it is just about normalizing stuff. But, the story is not that simple. You need to know much about normalization in order to handle databases. It is one of the most important things to learn in database management. (It is assumed that you have the basic knowledge about databases and know how to manage tables). So, let’s straight away start discussing about Normalization in thorough depth.

So what is Normalization?

Normalization is a process of reducing redundancies of data in a database. Quite often we come across tables having a lot of bulk data with many columns. All these data might not be necessary all the time whenever we use those tables. So, a better option is to split up the bulk table into small parts and use only those tables which suit the actual purpose at a given instance of time. In this way, redundancy is reduced. To make the long story short, we can simply say that normalization is a process of dividing a big table into smaller ones in order to reduce redundancy.

To understand the concept in deep, let us take up a simple example.

Suppose we are to manage all the databases of a company (say, My Company). The company must keep track of all the employees, customers, product details and the salary details of all the employees. A simple and straight forward way to do this is to put all this information into a single table and manage all those simultaneously. 

See below.


Looking at the above table, you may feel that it is perfectly fine. After all, what is the problem with it? We have a big table; we have all the information required by the company together in a single space, thus saving a lot of memory. Well and good!

But, now think! If suppose, we need to frequently retrieve/update data about just the employees. Here, does the customer’s information or the product details really matter. Definitely no. So, why use the entire table for using just a part of it? We need a solution to this. And the solution is normalization. What we create using normalization is often called as normal forms. Let study about the popular and most widely used normal forms.

The First Normal Form

To solve the above problem, the first and foremost thing to be done is to divide the entire raw database into smaller tables based on the actual groupings. When each table has been designed, a primary key is assigned to most or all tables. Note that the primary key must be a unique value, so try to select a data element for the primary key that naturally uniquely identifies a specific piece of data.
So, let us take up the same previous example and prepare our First normal form. See the figure below:


As we can see, the big raw database is divided into three smaller tables- one for employee, customers and products details, each.
Thus, to access any one of these tables, we need not handle the other two tables.

The Second Normal Form

The objectives of the second normal form is to take data that is only partly dependent on the primary key and enter that data into another table. Let us take up the same example of Fig 1-2. Consider the table- Employee.

Here, the entire table has information about the personal details as well as the salary information. But, it is well understood that, to pay salary to an employee, the company does not actually need the employee’s personal details. Just his emp_id is sufficient. So, why not use just that? This is the second normal form. Same goes with Customers table. We can separate customer’s information from the order details. 

See the figure below:



The Third Normal Form

The third normal form’s objective is to remove data in a table that is not dependent on the primary key. See the same example of Fig 1-3. For the table named Emp_Pay, the position and position_desc fields are not dependent on primary key (emp_id). So, the better option is to move both these fields to another table. 

See below:





Advantages

As we have already seen in the sections before, normalization has many advantages. Let us list out a few ones:

 Ø  Greater overall database organization
    Ø  Reduction of redundant data
    Ø  Data consistency within the database
    Ø  A much more flexible database design
 Ø  A better handle on database security
    Ø  Enforces concept of referential integrity


Disadvantages

Although there are many advantages of normalization, there are some disadvantages too. After all, the popular saying “All the coins have two sides” still holds true. So, let’s go through the disadvantages too.

There is one substantial drawback of a normalized database: reduced database performance.
The factors compromised include CPU usage, memory usage, and input/output (I/O). To In other words, a normalized database requires much more CPU, memory, and I/O to process transactions and database queries than does a denormalized database.


Denormalizing a Database

Denormalization is the process of taking a normalized database and modifying table structures to allow controlled redundancy for increased database performance. A denormalized database is not the same as a database that has not been normalized. The purpose of denormalization is to get rid of the problems discussed in the previous sections.  Denormalization might involve recombining separate tables or creating duplicate data within tables to reduce the number of tables that need to be joined to retrieve the requested data, which results in less I/O and CPU time. 

This is normally advantageous in larger data warehousing applications in which aggregate calculations are being made across millions of rows of data within tables. There are costs to denormalization, however. Data redundancy is increased in a denormalized database, which can improve performance but requires more extraneous efforts to keep track of related data.


So, now you might be confused whether to normalize your database or not. The solution is to normalize the database upto a certain extent. So that, redundancy is controlled to a great extent, without compromising on other factors like CPU usage, memory usage, and input/output (I/O).


                                Download this Article as PDF

Manoj Pisharody (BE-IT)
 manoj@itportal.in
  

-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-


Reactions:

5 comments:

Post a Comment