Sorting Pandas DataFrame by Index
Pandas makes it easy to handle and manipulate large amounts of tabular data. This tutorial covers one of its fundamental methods, sort_index(), which sorts a Pandas DataFrame by its index, whether that index is numerical or string-based. By the end, you will know how to use sort_index() to put your rows in a predictable order.
Before we dive into sort_index(), let's talk briefly about what a Pandas DataFrame is.
What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional table that has labeled rows and columns. It is similar to a spreadsheet or a SQL table. In a DataFrame, the rows represent observations or records, while the columns represent variables or features.
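To make the description concrete, here is a minimal sketch of a small DataFrame and its row and column labels. The data and the name people are made up for illustration.

```python
import pandas as pd

# illustrative data: each row is one record, each column is one variable
people = pd.DataFrame({'name': ['John', 'Sara'], 'age': [24, 21]},
                      index=['row1', 'row2'])

print(people.index.tolist())    # row labels
print(people.columns.tolist())  # column labels
print(people.shape)             # (number of rows, number of columns)
```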
Pandas is built on top of NumPy, so many of its operations run as vectorized array computations rather than Python loops, which helps it handle large datasets efficiently. It also provides built-in methods for data cleaning and manipulation, along with basic plotting through Matplotlib.
Now that we have a basic understanding of a Pandas DataFrame, let's move on to the sort_index() method.
Sorting Pandas DataFrame by Index
The sort_index() method sorts a Pandas DataFrame by its index. The index of a DataFrame plays a role similar to the row numbers in a spreadsheet: it labels each row. Unlike row numbers, index labels can be strings, dates, or other values.
Let's take a look at an example.
import pandas as pd
# create a dictionary
data = {'name': ['John', 'Mark', 'Sara', 'Anna', 'Paul'],
        'age': [24, 34, 21, 19, 26],
        'city': ['New York', 'Paris', 'London', 'Berlin', 'San Francisco']}
# create a DataFrame
df = pd.DataFrame(data, index=['b', 'a', 'd', 'c', 'e'])
# sort the DataFrame by index
df = df.sort_index()
print(df)
Output:
   name  age           city
a  Mark   34          Paris
b  John   24       New York
c  Anna   19         Berlin
d  Sara   21         London
e  Paul   26  San Francisco
In the above example, we created a dictionary data with three keys: name, age, and city. We then used this dictionary to create a DataFrame df with the specified index.
After creating the DataFrame, we called sort_index() to sort it by its index. As you can see, sort_index() sorts the rows by index label in ascending order by default.
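One detail worth seeing in isolation: sort_index() returns a new, sorted DataFrame and leaves the original untouched, which is why the example assigns the result back to df. The sketch below uses made-up data, and the names unsorted and sorted_df are illustrative only.

```python
import pandas as pd

# illustrative data with an unordered string index
unsorted = pd.DataFrame({'age': [24, 34, 21]}, index=['b', 'a', 'c'])

# sort_index() returns a new DataFrame; `unsorted` keeps its original order
sorted_df = unsorted.sort_index()

print(list(unsorted.index))   # original order is unchanged
print(list(sorted_df.index))  # labels now in ascending order
```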
To sort the index in descending order, pass ascending=False to sort_index().
# sort the DataFrame by index in descending order
df = df.sort_index(ascending=False)
print(df)
Output:
   name  age           city
e  Paul   26  San Francisco
d  Sara   21         London
c  Anna   19         Berlin
b  John   24       New York
a  Mark   34          Paris
As you can see, passing ascending=False sorts the DataFrame by its index in descending order.
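The same calls work when the index is numerical rather than string-based. The following is a minimal sketch with made-up integer labels and values (not data from the example above); the name scores is hypothetical.

```python
import pandas as pd

# hypothetical data with an unordered integer index
scores = pd.DataFrame({'score': [88, 92, 75, 60]}, index=[30, 10, 40, 20])

ascending = scores.sort_index()                  # index order: 10, 20, 30, 40
descending = scores.sort_index(ascending=False)  # index order: 40, 30, 20, 10

print(ascending)
print(descending)
```

Each row keeps its own values while moving, so the score stored under label 10 is still the one attached to 10 after sorting.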
Sorting Pandas Series by Index
A Pandas Series is a one-dimensional labeled array. It is similar to a column in a spreadsheet. Like a DataFrame, a Series also has an index.
To sort a Pandas Series by its index, we can use the sort_index() method as well.
import pandas as pd
# create a dictionary
data = {'name': ['John', 'Mark', 'Sara', 'Anna', 'Paul'],
        'age': [24, 34, 21, 19, 26],
        'city': ['New York', 'Paris', 'London', 'Berlin', 'San Francisco']}
# create a DataFrame
df = pd.DataFrame(data, index=['b', 'a', 'd', 'c', 'e'])
# select a Series from the DataFrame
s = df['name']
# sort the Series by its index
s = s.sort_index()
print(s)
Output:
a    Mark
b    John
c    Anna
d    Sara
e    Paul
Name: name, dtype: object
In the above code, we first created a DataFrame df with a specified index. We then selected the name column from the DataFrame as a Series, stored it in s, and sorted it by its index using the sort_index() method.
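A Series does not have to come from a DataFrame. As a sketch with made-up labels and values, the following builds one directly with pd.Series and sorts it by index in both directions; the name ages is hypothetical.

```python
import pandas as pd

# hypothetical Series with an unordered string index
ages = pd.Series([24, 34, 21], index=['b', 'c', 'a'], name='age')

print(ages.sort_index())                 # index order: a, b, c
print(ages.sort_index(ascending=False))  # index order: c, b, a
```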
Conclusion
In this tutorial, we learned how to use the sort_index() method to sort a Pandas DataFrame or Series by its index, in ascending or descending order. Keeping the index sorted makes large datasets easier to inspect and manipulate. We hope you found this tutorial helpful and informative.