How to Drop a Column in Pandas DataFrame
As a data scientist, you spend much of your time manipulating data in a DataFrame, and dropping columns that are not needed for analysis is one of the most frequent tasks. In this tutorial, we will look at how to drop a column in a Pandas DataFrame. We will cover removing a column by name, removing multiple columns at once, removing a column by its position (index), and removing columns based on a condition.
Want to quickly create data visualizations from a Pandas DataFrame with no code?
PyGWalker is a Python library for exploratory data analysis with visualization. It can simplify your Jupyter Notebook data analysis and data visualization workflow by turning your pandas DataFrame (or polars DataFrame) into a Tableau-style user interface for visual exploration.
Pandas DataFrame Overview
Before diving into the details of dropping columns, let’s have an overview of the Pandas DataFrame.
A DataFrame is a two-dimensional, table-like data structure with rows and columns. Each column in a DataFrame is a Series, a one-dimensional data structure that holds an array of values, each labeled by an index. A DataFrame has both a row index and a column index, which allow fast and efficient data access.
Pandas DataFrame is a powerful tool for handling and manipulating data in Python. It allows you to perform complex data analysis, data cleaning, data transformation, and data visualization tasks.
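As a quick illustration of the relationship between a DataFrame and its columns, the following minimal sketch builds a tiny DataFrame (the values are made up for this example) and shows that selecting one column yields a Series that shares the DataFrame's row index.

```python
import pandas as pd

# A small, made-up DataFrame used only for illustration.
df = pd.DataFrame({"name": ["Alex", "Bob"], "age": [20, 25]})

# Selecting a single column returns a Series.
ages = df["age"]
print(type(ages).__name__)  # Series

# The column labels live in df.columns; the row labels are shared with the Series.
print(list(df.columns))     # ['name', 'age']
print(list(ages.index))     # [0, 1]
```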
Dropping a Column in Pandas DataFrame
Now let us get started with the process of dropping a column in Pandas DataFrame. There are several ways to drop a column in a DataFrame, depending on the requirement. We will look at some of the popular methods below.
Drop a Column Using the drop Method
The easiest way to remove a column from a DataFrame is the drop method. Pass the column name together with the parameter axis=1 to indicate that you want to remove a column rather than a row.
# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18],'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# drop the column 'city'
df = df.drop('city', axis=1)
print(df.head())
Output:
     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18
In the above example, we created a sample DataFrame with three columns named name, age, and city. We used the drop method with the parameter axis=1 to remove the column city. We then printed the updated DataFrame, which has only two columns, name and age.
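The drop method also accepts a columns keyword, which some readers find clearer than axis=1, and an errors parameter that controls what happens when a label is missing. The sketch below reuses the sample data from this section; the column name country is a hypothetical label chosen because it does not exist in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alex", "Bob", "Clarke", "David"],
    "age": [20, 25, 19, 18],
    "city": ["New York", "Paris", "London", "Tokyo"],
})

# columns="city" is equivalent to passing "city" with axis=1.
by_keyword = df.drop(columns="city")
by_axis = df.drop("city", axis=1)
print(by_keyword.equals(by_axis))  # True

# drop returns a new DataFrame; the original still has all three columns.
print(list(df.columns))  # ['name', 'age', 'city']

# A missing label raises KeyError by default; errors="ignore" skips it instead.
# "country" is a hypothetical column that is not present in df.
safe = df.drop(columns="country", errors="ignore")
print(list(safe.columns))  # ['name', 'age', 'city']
```

Because drop does not modify the original by default, the result has to be assigned back (as in df = df.drop(...)) for the change to persist.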
Drop a Column Using the Subsetting Method
Another way to drop a column from a DataFrame is to use bracket subsetting, [], with the del statement. Unlike drop, which returns a new DataFrame, the del statement removes the column directly from the DataFrame object.
# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18],'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# remove the column 'city'
del df['city']
print(df.head())
Output:
     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18
In the above example, we created a sample DataFrame with three columns named name, age, and city. We used bracket subsetting, [], with the del statement to remove the column city. We then printed the updated DataFrame, which has only two columns, name and age.
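Since del changes the DataFrame object itself, every variable that refers to the same object sees the change. The sketch below demonstrates this and, as a related option not covered above, shows the pop method, which also removes a column in place but returns it as a Series. The variable name same_object is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alex", "Bob", "Clarke", "David"],
    "age": [20, 25, 19, 18],
    "city": ["New York", "Paris", "London", "Tokyo"],
})

# A second reference to the same DataFrame object (not a copy).
same_object = df

# del removes the column in place, so both names see the change.
del df["city"]
print(list(same_object.columns))  # ['name', 'age']

# pop also removes the column in place, and returns it as a Series.
ages = df.pop("age")
print(list(df.columns))  # ['name']
print(ages.tolist())     # [20, 25, 19, 18]
```

If the original DataFrame should stay intact, drop is the safer choice, or the DataFrame can be copied with df.copy() before deleting.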
Drop Multiple Columns
Sometimes it is necessary to remove multiple columns from a DataFrame. You can use the drop method with a list of column names to remove multiple columns.
# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18],'city': ['New York', 'Paris', 'London', 'Tokyo'], 'occupation': ['Engineer', 'Doctor', 'Artist', 'Lawyer']}
df = pd.DataFrame(data)
# drop the columns 'city' and 'occupation'
df = df.drop(['city', 'occupation'], axis=1)
print(df.head())
Output:
     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18
In the above example, we created a sample DataFrame with four columns named name, age, city, and occupation. We used the drop method with a list of column names to remove the columns city and occupation. We then printed the updated DataFrame, which has only two columns, name and age.
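When most columns are being removed, it can be simpler to select the columns to keep rather than list the ones to drop. The following sketch, using the same sample data, shows that both approaches give the same result here; the variable names to_remove and keep are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alex", "Bob", "Clarke", "David"],
    "age": [20, 25, 19, 18],
    "city": ["New York", "Paris", "London", "Tokyo"],
    "occupation": ["Engineer", "Doctor", "Artist", "Lawyer"],
})

# Approach 1: drop the unwanted columns by name.
to_remove = ["city", "occupation"]
dropped = df.drop(columns=to_remove)

# Approach 2: select only the columns to keep.
kept = df[["name", "age"]]
print(dropped.equals(kept))  # True

# The keep-list can also be built from the drop-list, preserving column order.
keep = [col for col in df.columns if col not in to_remove]
print(keep)  # ['name', 'age']
```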
Drop Columns Using a Column Index
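You can also drop a column from a DataFrame using the position (index) of the column. The drop method works with labels rather than positions, so you first look up the label with df.columns, for example df.columns[2], and then pass that label to the drop method with the parameter axis=1.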
# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18],'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# remove the column at index 2, i.e., 'city'
df = df.drop(df.columns[2], axis=1)
print(df.head())
Output:
     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18
In the above example, we created a sample DataFrame with three columns named name, age, and city. We used df.columns[2] to look up the label of the column at index 2, which is city, and passed it to the drop method with the parameter axis=1. We then printed the updated DataFrame, which has only two columns, name and age.
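The same idea extends to several positions at once. The sketch below shows two ways to do it, assuming the positions to remove are 1 and 2 (an arbitrary choice for this example): translating positions into labels for drop, or keeping the remaining positions with iloc.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alex", "Bob", "Clarke", "David"],
    "age": [20, 25, 19, 18],
    "city": ["New York", "Paris", "London", "Tokyo"],
})

# Positions of the columns to remove (chosen arbitrarily for this example).
positions = [1, 2]

# Option 1: translate positions into labels, then drop by label.
labels = df.columns[positions]
by_label = df.drop(columns=labels)
print(list(by_label.columns))  # ['name']

# Option 2: keep every position that is not in the removal list.
keep_positions = [i for i in range(df.shape[1]) if i not in positions]
by_position = df.iloc[:, keep_positions]
print(list(by_position.columns))  # ['name']
```

One difference worth knowing: if a DataFrame has duplicate column names, dropping by label removes every column with that label, while the iloc approach removes only the listed positions.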
Drop Columns Based on a Condition
You can also remove columns based on a condition. For example, you can remove all columns that contain only NaN values by using the dropna method.
# create a sample DataFrame with a column having all NaN values
import pandas as pd
import numpy as np
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18],'city': [np.nan, np.nan, np.nan, np.nan], 'occupation': ['Engineer', 'Doctor', 'Artist', 'Lawyer']}
df = pd.DataFrame(data)
# delete the columns that have all NaN values
df = df.dropna(how='all', axis=1)
print(df.head())
Output:
     name  age occupation
0    Alex   20   Engineer
1     Bob   25     Doctor
2  Clarke   19     Artist
3   David   18     Lawyer
In the above example, we created a sample DataFrame with four columns named name, age, city, and occupation. We set all the values in the city column to NaN. We used the dropna method with the parameters how='all' and axis=1 to remove the columns in which every value is NaN. We then printed the updated DataFrame, which has only three columns, name, age, and occupation.
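Other conditions can be expressed by computing a boolean value per column and using it to choose which columns to drop. The sketch below removes columns in which more than half of the values are missing; the 0.5 threshold and the sample data are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: 'age' has one missing value, 'city' has three.
df = pd.DataFrame({
    "name": ["Alex", "Bob", "Clarke", "David"],
    "age": [20, np.nan, 19, 18],
    "city": [np.nan, np.nan, np.nan, "Tokyo"],
})

# Fraction of missing values in each column: name 0.0, age 0.25, city 0.75.
missing_share = df.isna().mean()

# Labels of the columns that exceed the (assumed) 0.5 threshold.
too_sparse = missing_share[missing_share > 0.5].index
result = df.drop(columns=too_sparse)
print(list(result.columns))  # ['name', 'age']

# The same selection written as a boolean mask with loc.
same = df.loc[:, missing_share <= 0.5]
print(result.equals(same))  # True
```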
Conclusion
Dropping a column from a Pandas DataFrame is an essential operation that you need to master as a data scientist. In this tutorial, we covered removing a column by name with drop and del, removing multiple columns at once, removing a column by its position, and removing columns based on a condition such as containing only NaN values. We hope this tutorial helps you streamline your workflow and improve your data operations with Pandas DataFrame.
Frequently Asked Questions
- How to drop a column in a pandas DataFrame?
  To drop a column in a pandas DataFrame, you can use the drop() method and specify the column name along with the axis parameter set to 1. This returns a new DataFrame without the specified column. Alternatively, you can use the del statement with bracket subsetting, as in del df['city'], to delete the column in place.
- Can multiple columns be dropped simultaneously in a pandas DataFrame?
  Yes, multiple columns can be dropped simultaneously in a pandas DataFrame. You can pass a list of column names to the drop() method, or call the drop() method multiple times with a different column name each time. Either way, all the specified columns are removed from the DataFrame.
- Is it possible to drop columns based on certain conditions in a pandas DataFrame?
  Yes, it is possible to drop columns based on certain conditions in a pandas DataFrame. You can use boolean indexing or the loc indexer to select the columns that meet the desired condition and then use the drop() method to remove those columns from the DataFrame. For columns that contain only NaN values, the dropna() method with how='all' and axis=1 does this in one step.