Working with columns of data is a core part of data analysis and manipulation in Python. There will be times when you need to convert a DataFrame column to a list, for example to loop over the values with plain Python or to pass them to a plotting function.
This tutorial walks through the steps for converting a DataFrame column to a list using the pandas library in Python.
Step 1: Import and Read Your Data
First, we need to import the necessary library, pandas. If it is not installed, you can install it with pip (pip install pandas). Next, we read our data using the pandas read_csv function. Note that the data must be in a format pandas can read; CSV is generally a safe choice.
import pandas as pd
data_df = pd.read_csv('data.csv')
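If you do not have a data.csv file on hand, the reading step can still be tried out. The sketch below is a minimal stand-in, assuming a hypothetical file with Name and Age columns; the names and values are made up for illustration, and io.StringIO lets read_csv treat a string as if it were a file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of data.csv.
csv_text = """Name,Age
Alice,28
Bob,35
Cara,22
Dan,42
Eve,31
"""

# read_csv accepts file-like objects as well as file paths.
data_df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks: the first rows and the column names.
print(data_df.head())
print(list(data_df.columns))
```

Checking the column names right after loading makes it easier to pick the correct one in the next step.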
Step 2: Select the Column
Now that we have loaded our data into a pandas DataFrame, we can select the column that we want to convert into a list. Remember that column names in pandas are case-sensitive.
age_column = data_df['Age']
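Because column names are case-sensitive, a small check before selecting can save a confusing error. The following is a sketch only, using a hypothetical in-memory DataFrame in place of the data loaded from data.csv.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from data.csv.
data_df = pd.DataFrame({"Age": [28, 35, 22]})

# Membership tests on the column index are case-sensitive.
has_upper = "Age" in data_df.columns
has_lower = "age" in data_df.columns
print(has_upper, has_lower)

# Selecting a column that does not exist raises a KeyError.
missing_raised = False
try:
    data_df["age"]
except KeyError:
    missing_raised = True
    print("No column named 'age'; available columns:", list(data_df.columns))
```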
Step 3: Convert the Column to a List
To convert the selected column to a list, call the tolist() method on it. A single DataFrame column is a pandas Series, and tolist() returns a list with the same values, in the same order, as the original column.
column_list = age_column.tolist()
Now, column_list is a list containing the entries of the column. You can verify by using Python’s built-in type() function:
print(type(column_list))
The type() function is a built-in Python function that returns the data type of the object passed to it. Here, you should see <class 'list'> in the output.
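The same kind of check can be applied to the items inside the list. The sketch below assumes a hypothetical integer column; for such a column, tolist() yields plain Python int values rather than NumPy integers.

```python
import pandas as pd

# Hypothetical integer column standing in for data_df['Age'].
age_column = pd.Series([28, 35, 22], name="Age")
column_list = age_column.tolist()

# The container is a plain Python list...
print(type(column_list))

# ...and each element is a plain Python int.
print(type(column_list[0]))

# The length and order match the original column.
print(len(column_list) == len(age_column))
```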
Optional Step: Customizing Your List
You might want to further manipulate and convert the data in your list, for instance, converting all items to integers or strings, removing null or duplicate values, and so on. Python’s built-in capabilities, combined with pandas, offer vast possibilities for these operations.
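As an illustration of these clean-up operations, here is a minimal sketch that removes nulls and duplicates and converts the values before building the list. The sample values are hypothetical, and the order of the steps is one reasonable choice rather than the only one.

```python
import pandas as pd

# Hypothetical column with one missing value and one duplicate.
age_column = pd.Series([28, 35, None, 22, 35], name="Age")

cleaned_list = (
    age_column
    .dropna()            # remove null values
    .drop_duplicates()   # keep the first occurrence of each value
    .astype(int)         # the missing value made the column float; convert back
    .tolist()
)
print(cleaned_list)  # [28, 35, 22]

# Converting every item to a string with a list comprehension.
string_list = [str(age) for age in cleaned_list]
print(string_list)  # ['28', '35', '22']
```

Doing the clean-up in pandas before calling tolist() keeps the operations vectorized; the same result could also be reached with plain Python on the finished list.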
Example with Full Code
import pandas as pd
data_df = pd.read_csv('data.csv')
age_column = data_df['Age']
age_list = age_column.tolist()
print(age_list)
print(type(age_list))
Output
[28, 35, 22, 42, 31]
<class 'list'>
Conclusion
In this tutorial, we have seen how to convert a column in a DataFrame to a list using pandas in Python. We also used Python's built-in type() function to check the type of our variables.
Converting columns to lists is a common operation in the analysis, pre-processing, and visualization stages of a data science workflow. At each stage, verify the data types and check the values for missing or unexpected entries so that later steps work on data you can trust.