How to Import a CSV File in Python Using NumPy

In this tutorial, we will learn how to import a CSV file in Python using the NumPy library. NumPy is a powerful library for numerical computing in Python and provides an efficient way to handle large datasets.

Importing CSV files is a common task in data analysis, and NumPy's np.genfromtxt function can read such a file directly into an array.

Step 1: Install NumPy

Before we start, you need to have NumPy installed on your machine. You can install it using pip if you haven’t installed it yet:
pip install numpy

Step 2: Create a CSV file

For this tutorial, create a sample CSV file named sample_data.csv with the following content:

Name,Age,Gender,Height,Weight
John,20,M,180,75
Jane,25,F,165,60
Thomas,30,M,185,78
Emily,28,F,170,65

Here, each line represents a person with their name, age, gender, height (in centimeters), and weight (in kilograms) separated by commas.
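
If you would rather create the file from Python than in a text editor, a minimal sketch using only the standard library could look like this. It assumes the file should live in the current working directory; the variable names rows and csv_path are illustrative only.

```python
from pathlib import Path

# The header and rows from Step 2, exactly as listed above.
rows = [
    "Name,Age,Gender,Height,Weight",
    "John,20,M,180,75",
    "Jane,25,F,165,60",
    "Thomas,30,M,185,78",
    "Emily,28,F,170,65",
]

# Assumption: the file is written to the current working directory.
csv_path = Path("sample_data.csv")
csv_path.write_text("\n".join(rows) + "\n", encoding="utf-8")
```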

Step 3: Import NumPy and load the CSV file

First, import the NumPy library in your Python script or notebook with the following code:
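
The import is a single line. The alias np is the common convention and is the name the rest of this tutorial assumes.

```python
# Import NumPy under its conventional alias.
import numpy as np
```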

Now, you can use the built-in np.genfromtxt function to load the content of the CSV file. This function can handle different delimiters, missing values, and data types. The following code will load the sample_data.csv file and print its content:
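
A sketch of the loading step, using the parameters explained further down. The first statement only recreates the Step 2 file so that the snippet runs on its own; skip it if sample_data.csv already exists.

```python
from pathlib import Path

import numpy as np

# Recreate the Step 2 file so this snippet is self-contained.
Path("sample_data.csv").write_text(
    "Name,Age,Gender,Height,Weight\n"
    "John,20,M,180,75\n"
    "Jane,25,F,165,60\n"
    "Thomas,30,M,185,78\n"
    "Emily,28,F,170,65\n",
    encoding="utf-8",
)

# Load the file: skip the header row and let NumPy infer each column's type.
data = np.genfromtxt(
    "sample_data.csv",
    delimiter=",",
    skip_header=1,
    dtype=None,
    encoding="utf-8",
)

print(data)
```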

This will produce output similar to the following (depending on the print line width, NumPy may place several records on one line):

[('John', 20, 'M', 180, 75)
 ('Jane', 25, 'F', 165, 60)
 ('Thomas', 30, 'M', 185, 78)
 ('Emily', 28, 'F', 170, 65)]

Let’s break down the parameters used in the np.genfromtxt function:

  • 'sample_data.csv': The path to the CSV file.
  • delimiter=',': The character that separates the values in the file. In our case, it’s a comma.
  • skip_header=1: This parameter indicates the number of lines to skip at the beginning of the file. We skip the header line (the first line in the file).
  • dtype=None: By setting dtype to None, NumPy will try to infer the data type of each column automatically.
  • encoding='utf-8': This ensures that the file is read using the ‘UTF-8’ character encoding.

Step 4: Access and manipulate the data

Once the CSV data is loaded as a NumPy array, you can access and manipulate the data using NumPy’s built-in functions. For example, you can calculate the average age of the dataset with the following code:
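
A minimal sketch of that calculation. It repeats the file creation and loading steps so that it runs on its own; the names ages and average_age are illustrative.

```python
from pathlib import Path

import numpy as np

# Recreate and load the Step 2 file so this snippet is self-contained.
Path("sample_data.csv").write_text(
    "Name,Age,Gender,Height,Weight\n"
    "John,20,M,180,75\n"
    "Jane,25,F,165,60\n"
    "Thomas,30,M,185,78\n"
    "Emily,28,F,170,65\n",
    encoding="utf-8",
)
data = np.genfromtxt(
    "sample_data.csv", delimiter=",", skip_header=1, dtype=None, encoding="utf-8"
)

# The second column (default field name f1) holds the ages.
ages = data["f1"]

# Average the ages with np.mean.
average_age = np.mean(ages)
print(average_age)
```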

This will produce the following output:

25.75

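Here, we first extract the age column from the structured array, then use the np.mean function to calculate the average age. The column is called f1 because the header line was skipped, so NumPy assigned the default field names f0, f1, and so on, and age is the second column.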

Full Code
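
Putting the steps together, the complete script might look like the sketch below. The main function and the if-missing check that writes the sample file are assumptions added so the script runs from a clean directory; they are not required by NumPy.

```python
from pathlib import Path

import numpy as np

CSV_PATH = Path("sample_data.csv")

SAMPLE_CONTENT = (
    "Name,Age,Gender,Height,Weight\n"
    "John,20,M,180,75\n"
    "Jane,25,F,165,60\n"
    "Thomas,30,M,185,78\n"
    "Emily,28,F,170,65\n"
)


def main():
    # Step 2: create the sample file if it is not there yet (assumption).
    if not CSV_PATH.exists():
        CSV_PATH.write_text(SAMPLE_CONTENT, encoding="utf-8")

    # Step 3: load the CSV file into a structured array.
    data = np.genfromtxt(
        str(CSV_PATH),
        delimiter=",",
        skip_header=1,
        dtype=None,
        encoding="utf-8",
    )
    print(data)

    # Step 4: compute the average of the age column (field f1).
    average_age = np.mean(data["f1"])
    print(average_age)
    return data, average_age


if __name__ == "__main__":
    main()
```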

Conclusion

In this tutorial, we learned how to import a CSV file in Python using the powerful NumPy library. NumPy provides an efficient way to handle large datasets and perform complex calculations on them. By following this tutorial, you can easily import and manipulate data from CSV files in your Python projects.