Sort Pandas DataFrame: Examples and Tips
The Pandas DataFrame is a powerful tool for data analysis in Python. It lets you store and manipulate large datasets with ease. Sorting is a common operation that helps when exploring and visualizing data. In this tutorial, we will cover how to sort data in a Pandas DataFrame by a single column, by multiple columns, by index, and by date.
Want to quickly create Data Visualizations in Python?
PyGWalker is an open-source Python project that can help speed up the data analysis and visualization workflow directly within Jupyter Notebook-based environments.
PyGWalker turns your Pandas DataFrame (or Polars DataFrame) into a visual UI where you can drag and drop variables to create graphs with ease. Simply use the following code:
pip install pygwalker
import pygwalker as pyg
gwalker = pyg.walk(df)
What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional table-like data structure that contains rows and columns. It can hold a variety of data types such as numbers, strings, and dates. You can think of it as a spreadsheet or a SQL table. It is a convenient way to store and manipulate data with Python.
How to Install Pandas in Python?
Before we dive into sorting a Pandas DataFrame, you need to make sure that you have Pandas installed on your system. You can do this by running the following command in your terminal or command prompt:
pip install pandas
This will install the latest version of Pandas on your system.
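To confirm that the installation worked, a quick check like the following can help. It is only a minimal sketch: it imports the library and prints the installed version, which will differ from machine to machine.

```python
import pandas as pd

# Print the installed pandas version; the exact value depends on your environment
print(pd.__version__)
```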
How to Create a Pandas DataFrame?
There are many ways to create a Pandas DataFrame. One of the most common ways is to create it from a dictionary of lists. Here's an example:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Bob', 'Lisa'],
'Age': [25, 30, 45, 23],
'Salary': [50000, 60000, 80000, 40000]}
df = pd.DataFrame(data)
print(df)
Output:
   Name  Age  Salary
0  John   25   50000
1  Jane   30   60000
2   Bob   45   80000
3  Lisa   23   40000
In this example, we created a dictionary of three lists, where each list represents a column in the DataFrame. We then used the pd.DataFrame() function to create a DataFrame from the dictionary.
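A dictionary of lists is column-oriented. As one illustration of another common way to build the same table, the sketch below starts from a list of dictionaries, one per row. The variable name records is just an illustrative choice.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names
records = [
    {"Name": "John", "Age": 25, "Salary": 50000},
    {"Name": "Jane", "Age": 30, "Salary": 60000},
    {"Name": "Bob", "Age": 45, "Salary": 80000},
    {"Name": "Lisa", "Age": 23, "Salary": 40000},
]

df_from_records = pd.DataFrame(records)
print(df_from_records)
```

Both approaches should produce the same three columns and four rows; which one is more convenient usually depends on how the data arrives.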
What is the Difference Between Sorting in Ascending and Descending Order?
Before we start sorting a Pandas DataFrame, it's important to understand the difference between sorting in ascending and descending order. Sorting in ascending order means that the values will be sorted from lowest to highest. Sorting in descending order means that the values will be sorted from highest to lowest.
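To make the two directions concrete, here is a small sketch that sorts the same values both ways. It uses a Pandas Series (a single column) holding the ages from the example above; the variable names are illustrative.

```python
import pandas as pd

ages = pd.Series([25, 30, 45, 23], name="Age")

# Ascending: lowest value first
lowest_first = ages.sort_values(ascending=True)
print(lowest_first.tolist())   # [23, 25, 30, 45]

# Descending: highest value first
highest_first = ages.sort_values(ascending=False)
print(highest_first.tolist())  # [45, 30, 25, 23]
```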
How to Sort a Pandas DataFrame by Column?
Sorting a Pandas DataFrame by column is a common operation. You can use the sort_values() method to sort a DataFrame by a single column. Here's an example:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Bob', 'Lisa'],
'Age': [25, 30, 45, 23],
'Salary': [50000, 60000, 80000, 40000]}
df = pd.DataFrame(data)
# sort by Age column in ascending order
df.sort_values('Age', ascending=True, inplace=True)
print(df)
Output:
   Name  Age  Salary
3  Lisa   23   40000
0  John   25   50000
1  Jane   30   60000
2   Bob   45   80000
In this example, we sorted the DataFrame by the "Age" column in ascending order using the sort_values() method. We set the ascending parameter to True (its default value) to sort in ascending order. The inplace parameter is set to True so that the original DataFrame is modified instead of a sorted copy being returned.
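If you would rather keep the original DataFrame untouched, you can leave out inplace and assign the result to a new variable. The sketch below shows this with a descending sort on "Salary". It also uses ignore_index=True, which is available in pandas 1.0 and later (an assumption about your installed version), to renumber the rows of the sorted result.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Lisa"],
    "Age": [25, 30, 45, 23],
    "Salary": [50000, 60000, 80000, 40000],
})

# Without inplace=True, sort_values() returns a new, sorted DataFrame
by_salary_desc = df.sort_values("Salary", ascending=False)
print(by_salary_desc["Name"].tolist())   # ['Bob', 'Jane', 'John', 'Lisa']

# ignore_index=True relabels the sorted rows 0..n-1 instead of keeping the old labels
by_salary_reset = df.sort_values("Salary", ascending=False, ignore_index=True)
print(by_salary_reset.index.tolist())    # [0, 1, 2, 3]

# The original DataFrame keeps its original row order
print(df["Name"].tolist())               # ['John', 'Jane', 'Bob', 'Lisa']
```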
Can I Sort a Pandas DataFrame by Multiple Columns?
Yes, you can sort a Pandas DataFrame by multiple columns. You need to pass a list of column names to the sort_values() method. Here's an example:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Bob', 'Lisa'],
'Age': [25, 30, 45, 23],
'Salary': [50000, 60000, 80000, 40000]}
df = pd.DataFrame(data)
# sort by Age column in ascending order, then by Salary column in descending order
df.sort_values(['Age', 'Salary'], ascending=[True, False], inplace=True)
print(df)
Output:
   Name  Age  Salary
3  Lisa   23   40000
0  John   25   50000
1  Jane   30   60000
2   Bob   45   80000
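In this example, we sorted the DataFrame by the "Age" column in ascending order, then by the "Salary" column in descending order. We passed a list of column names to the sort_values() method and a list of boolean values to the ascending parameter to specify the sorting direction for each column. The second column only decides the order of rows that tie on the first one. Because every Age in this sample is unique, the Salary ordering never comes into play and the output matches the single-column sort.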
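The second sort key only matters when rows tie on the first one. The sketch below uses hypothetical sample data, with the ages changed so that two pairs of rows share an Age, to show the Salary ordering taking effect within each age group.

```python
import pandas as pd

# Hypothetical data: ages adjusted so that rows tie on "Age"
df_ties = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Lisa"],
    "Age": [30, 30, 25, 25],
    "Salary": [50000, 60000, 80000, 40000],
})

# Age ascending first; within the same Age, higher Salary comes first
sorted_ties = df_ties.sort_values(["Age", "Salary"], ascending=[True, False])
print(sorted_ties["Name"].tolist())    # ['Bob', 'Lisa', 'Jane', 'John']
print(sorted_ties["Salary"].tolist())  # [80000, 40000, 60000, 50000]
```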
How to Sort a Pandas DataFrame by Index?
You can also sort a Pandas DataFrame by its index using the sort_index() method. Here's an example:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Bob', 'Lisa'],
'Age': [25, 30, 45, 23],
'Salary': [50000, 60000, 80000, 40000]}
df = pd.DataFrame(data)
# sort by index in descending order
df.sort_index(ascending=False, inplace=True)
print(df)
Output:
   Name  Age  Salary
3  Lisa   23   40000
2   Bob   45   80000
1  Jane   30   60000
0  John   25   50000
In this example, we sorted the DataFrame by its index in descending order using the sort_index() method. The ascending parameter is set to False to sort in descending order.
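One practical use of sort_index() is getting back to the original row order after sorting by a column, since sort_values() keeps each row's index label. The following is a minimal sketch of that round trip; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Lisa"],
    "Age": [25, 30, 45, 23],
    "Salary": [50000, 60000, 80000, 40000],
})

# Sorting by a column shuffles the rows but keeps their index labels
by_age = df.sort_values("Age")
print(by_age.index.tolist())    # [3, 0, 1, 2]

# Sorting by index (ascending by default) restores the original order
restored = by_age.sort_index()
print(restored.index.tolist())  # [0, 1, 2, 3]
```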
How to Sort a Pandas DataFrame by Date?
Sorting a Pandas DataFrame by date is a common operation in time series analysis. You can use the sort_values() method on a column that has the datetime data type. Here's an example:
import pandas as pd
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'],
'Sales': [100, 200, 150, 300]}
df = pd.DataFrame(data)
# convert Date column to datetime data type
df['Date'] = pd.to_datetime(df['Date'])
# sort by Date column in ascending order
df.sort_values('Date', ascending=True, inplace=True)
print(df)
Output:
        Date  Sales
0 2022-01-01    100
1 2022-01-02    200
2 2022-01-03    150
3 2022-01-04    300
In this example, we created a DataFrame with a "Date" column and a "Sales" column. We used the pd.to_datetime() function to convert the "Date" column from strings to the datetime data type. We then used the sort_values() method to sort the DataFrame by the "Date" column in ascending order. The sample dates were already in chronological order, so the output matches the input.
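The conversion step matters because date strings are compared character by character, which is not always chronological. The sketch below uses hypothetical sample data in day/month/year format to show the difference. The explicit format string passed to pd.to_datetime() is an assumption about how the dates are written.

```python
import pandas as pd

# Hypothetical data: dates written as DD/MM/YYYY strings, out of order
sales = pd.DataFrame({
    "Date": ["15/01/2022", "02/03/2022", "10/02/2022"],
    "Sales": [100, 300, 200],
})

# Sorting the raw strings orders them by their leading characters (the day)
as_text = sales.sort_values("Date")
print(as_text["Sales"].tolist())    # [300, 200, 100]

# After converting to datetime, the sort is chronological
sales["Date"] = pd.to_datetime(sales["Date"], format="%d/%m/%Y")
as_dates = sales.sort_values("Date")
print(as_dates["Sales"].tolist())   # [100, 200, 300]
```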
Pandas DataFrame Sort Values
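The sort_values() method is the primary method for sorting a Pandas DataFrame by its values. It can sort a DataFrame by a single column or by multiple columns, and it sorts date columns chronologically once they have been converted to the datetime data type. To sort by the index rather than by column values, use the sort_index() method.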
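Beyond ascending and inplace, sort_values() accepts a few other parameters that are often useful. The sketch below illustrates two of them on hypothetical sample data: na_position, which controls where missing values are placed, and key, which applies a function to the column before comparing (available in pandas 1.1 and later, an assumption about your installed version).

```python
import pandas as pd

# Hypothetical data with a missing age and mixed-case names
people = pd.DataFrame({
    "Name": ["john", "Jane", "bob", "Lisa"],
    "Age": [25, None, 45, 23],
})

# Missing values are placed last by default; na_position="first" puts them on top
nulls_first = people.sort_values("Age", na_position="first")
print(nulls_first["Name"].tolist())       # ['Jane', 'Lisa', 'john', 'bob']

# key transforms the column before comparison; here it makes the sort case-insensitive
case_insensitive = people.sort_values("Name", key=lambda s: s.str.lower())
print(case_insensitive["Name"].tolist())  # ['bob', 'Jane', 'john', 'Lisa']
```

Without the key function, uppercase letters sort before lowercase ones, so "Jane" and "Lisa" would come ahead of "bob" and "john".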
Conclusion
Sorting data in a Pandas DataFrame is an essential operation for data analysis and visualization. In this tutorial, we covered how to sort a Pandas DataFrame by column, multiple columns, index, and date. We also discussed the difference between sorting in ascending and descending order. By mastering these techniques, you will be able to manipulate data like a pro.