How to Normalize Data Between 0 and 1 in Python

In this tutorial, we will look at one of the most fundamental steps in data preprocessing: data normalization.

Data normalization is a technique used in machine learning to bring the values of different features onto a common scale. The variant covered here, min-max normalization, rescales each feature to a fixed range: 0 to 1 in our case.
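Concretely, min-max normalization rescales each value x of a feature using that feature's own minimum and maximum. As a worked example, Feature1 in the sample data below ranges from 5 to 15, so its value 10 maps to 0.5, which is the first entry of the normalized output at the end of this tutorial:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\qquad\text{e.g.}\qquad
\frac{10 - 5}{15 - 5} = 0.5
```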

The main reason we normalize data is to keep features with large numeric ranges from dominating features with small ranges (for example, in distance calculations). Normalization can also help avoid numerical instability, and gradient descent typically converges faster when features are on similar scales.

Step 1: Save the Data to a File

Save the following sample data in a file named data.csv:

Feature1,Feature2,Feature3
10,20,30
5,15,25
8,12,18
15,25,35
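If you would rather create the file from Python than in a text editor, here is a minimal sketch using the standard library's pathlib. The file name data.csv matches the one loaded in Step 2; note that write_text overwrites any existing file with that name.

```python
from pathlib import Path

# Sample data from Step 1: a header row followed by four rows of values
csv_text = """Feature1,Feature2,Feature3
10,20,30
5,15,25
8,12,18
15,25,35
"""

# Write the data to data.csv in the current working directory
Path("data.csv").write_text(csv_text)

# Read the file back to confirm its contents
print(Path("data.csv").read_text())
```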

Step 2: Import the Libraries and Load Your Dataset

After installing the required libraries (pandas and scikit-learn), import them into your Python script by adding the following lines:
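The two libraries this tutorial relies on are pandas, for reading and holding the data, and scikit-learn, which provides MinMaxScaler in its sklearn.preprocessing module. Assuming both are installed (for example with pip install pandas scikit-learn), the imports would look like this:

```python
# pandas holds the tabular data in a DataFrame
import pandas as pd

# MinMaxScaler rescales each feature (column) to a fixed range
from sklearn.preprocessing import MinMaxScaler

# Quick sanity check that the imports worked
print("pandas", pd.__version__)
print(MinMaxScaler())
```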

Next, load your dataset into a pandas DataFrame. Here, let’s assume the dataset is the file data.csv saved in Step 1. This is how we load it:
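A minimal loading sketch, assuming data.csv sits in the current working directory. So that the snippet also runs on its own, it writes the Step 1 sample data only when no data.csv exists yet; the csv_path variable is our own naming.

```python
from pathlib import Path

import pandas as pd

csv_path = Path("data.csv")

# Create the Step 1 sample file only if it is missing,
# so an existing data.csv is never overwritten
if not csv_path.exists():
    csv_path.write_text(
        "Feature1,Feature2,Feature3\n"
        "10,20,30\n"
        "5,15,25\n"
        "8,12,18\n"
        "15,25,35\n"
    )

# read_csv uses the first line as column names by default
df = pd.read_csv(csv_path)
print(df)
```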

Step 3: Initialize the Scaler and Transform the Data

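With the data loaded, we can now start the normalization process. Scikit-learn’s MinMaxScaler scales each feature (column) individually to a given range, which is 0 to 1 by default. Its fit step learns each column’s minimum and maximum, and its transform step uses them to rescale the values; fit_transform does both in one call.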
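A sketch of this step, building the Step 1 sample data directly as a DataFrame so it runs without the file. Because fit_transform returns a NumPy array rather than a DataFrame, the result is wrapped back into a DataFrame with the original column names; the name df_normalized is our own choice.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same values as the Step 1 sample data
df = pd.DataFrame({
    "Feature1": [10, 5, 8, 15],
    "Feature2": [20, 15, 12, 25],
    "Feature3": [30, 25, 18, 35],
})

# feature_range=(0, 1) is the default; it is spelled out here for clarity
scaler = MinMaxScaler(feature_range=(0, 1))

# fit learns each column's min and max; transform rescales with them;
# fit_transform does both in one call and returns a NumPy array
normalized = scaler.fit_transform(df)

# Restore the column names by wrapping the array in a DataFrame
df_normalized = pd.DataFrame(normalized, columns=df.columns)

# The per-column minimums and maximums the scaler learned
print("min:", scaler.data_min_, "max:", scaler.data_max_)
print(df_normalized)
```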

Step 4: Verify the Results

Finally, verify the transformed data by viewing the first few rows of the normalized DataFrame:
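A self-contained sketch of the check, repeating the Step 3 normalization on the sample data. Beyond head(), it prints each column's minimum and maximum, which should be 0 and 1 up to floating-point rounding, and cross-checks the scaler's output against the min-max formula computed directly with pandas.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "Feature1": [10, 5, 8, 15],
    "Feature2": [20, 15, 12, 25],
    "Feature3": [30, 25, 18, 35],
})
df_normalized = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# head() shows the first 5 rows by default (all 4 rows here)
print(df_normalized.head())

# Each column should now run from 0 to 1 (up to floating-point rounding)
print(df_normalized.min())
print(df_normalized.max())

# Cross-check: apply (x - min) / (max - min) column-wise with pandas
manual = (df - df.min()) / (df.max() - df.min())
print("Matches manual formula:", np.allclose(df_normalized.to_numpy(), manual.to_numpy()))
```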

Full Code

Here is the entire code put together:
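A minimal sketch of the complete script, assuming pandas and scikit-learn are installed. The guard that writes the Step 1 sample data is an addition so the script runs as-is; when your own data.csv is already in place, it is skipped. On the sample data, the printed values should match the table below.

```python
from pathlib import Path

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Step 1: create data.csv with the sample data if it does not exist yet
csv_path = Path("data.csv")
if not csv_path.exists():
    csv_path.write_text(
        "Feature1,Feature2,Feature3\n"
        "10,20,30\n"
        "5,15,25\n"
        "8,12,18\n"
        "15,25,35\n"
    )

# Step 2: load the dataset into a DataFrame
df = pd.read_csv(csv_path)

# Step 3: scale every column to [0, 1] and keep the column names
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Step 4: verify by printing the first few rows
print(df_normalized.head())
```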

   Feature1  Feature2  Feature3
0       0.5  0.615385  0.705882
1       0.0  0.230769  0.411765
2       0.3  0.000000  0.000000
3       1.0  1.000000  1.000000

Conclusion

That’s it! You have successfully normalized your dataset to the range 0 to 1 in Python. As you can see, it’s a straightforward process thanks to pandas and scikit-learn. Remember, proper data preprocessing, including data normalization, is key to building a good machine learning model.