How to concat two Pandas DataFrames: Explained!

Name: Rajiv Chandra

Published on 8/19/2023

Pandas is one of the most popular open-source tools for Data Science. It provides DataFrame structures that offer high-level performance for tabular data manipulation, cleaning, analysis, and visualization. In Pandas, it is common to have to merge two or more DataFrames to achieve some kind of analysis on the data. We can achieve this by combining the DataFrames either horizontally or vertically. Here, we will illustrate and discuss both methods in detail.

Want to quickly create Data Visualization from Python Pandas Dataframe with No code?

PyGWalker is a Python library for Exploratory Data Analysis with Visualization. PyGWalker (opens in a new tab) can simplify your Jupyter Notebook data analysis and data visualization workflow, by turning your pandas dataframe (and polars dataframe) into a Tableau-style User Interface for visual exploration.

(opens in a new tab)

Concatenating Two DataFrames Vertically

In Pandas, two DataFrames can be concatenated using the concat() method. To demonstrate this, we will start by creating two sample DataFrames. In the first sample DataFrame, let's say we have information on some employees in a company:

import pandas as pd
 
# Creating DataFrame 1
df1 = pd.DataFrame({
    'Name': ['John', 'Jack', 'Steve', 'Sarah'],
    'Age': [24, 32, 19, 29],
    'Gender': ['M', 'M', 'M', 'F']
})

This will create a DataFrame that looks like this:

    Name    Age Gender
0   John    24     M
1   Jack    32     M
2   Steve   19     M
3   Sarah   29     F

Let's say we have another DataFrame that contains information on the departments in the company:

# Creating DataFrame 2
df2 = pd.DataFrame({
    'Department': ['Marketing', 'Sales', 'Human Resources'],
    'Employees': [15, 12, 10],
})

This will create a DataFrame that looks like this:

        Department    Employees
0       Marketing        15
1           Sales        12
2   Human Resources     10

Now, we can use the concat() method to combine the two DataFrames vertically:

# Concatenating vertically
df3 = pd.concat([df1, df2], axis=0)

Here, the axis=0 parameter denotes that we want to concatenate the DataFrames by stacking them on top of each other (i.e., vertically). After concatenation, we get the following output:

        Name    Age Gender   Department    Employees
0       John    24      M     NaN          NaN
1       Jack    32      M     NaN          NaN
2      Steve    19      M     NaN          NaN
3      Sarah    29      F     NaN          NaN
0       NaN     NaN    NaN    Marketing     15
1       NaN     NaN    NaN    Sales         12
2       NaN     NaN    NaN    Human Resources 10

We can see that the two DataFrames are concatenated as desired, but there are some NaN (null) values in the new DataFrame where the column names don't match. We can drop those rows if they don't fit our data analysis, or we can provide some default values to fill the null values.

Concatenating Two DataFrames Horizontally

We can also concatenate two DataFrames horizontally (i.e., combine them side-by-side) using the concat() method, like so:

# Concatenating horizontally
df4 = pd.concat([df1, df2], axis=1)

Here, the axis=1 parameter denotes that we want to concatenate the DataFrames by putting them beside each other (i.e., horizontally). After concatenation, we get the following output:

    Name    Age Gender     Department   Employees
0   John    24     M        Marketing      15
1   Jack    32     M        Sales          12
2   Steve   19     M        Human Resources 10
3   Sarah   29     F        NaN             NaN

We can see that the two DataFrames are concatenated horizontally as desired. However, we have some NaN (null) values in the new DataFrame where the row names don't match. Again, we can drop those rows if they don't fit our data analysis, or we can provide some default values to fill the null values.

Concatenating DataFrames with Different Columns

What happens if the two DataFrames to be concatenated have different columns? In such cases, Pandas will identify the non-matching column names and add them to the new DataFrame as separate columns. Let's illustrate this with an example:

# Creating DataFrame with different columns
df5 = pd.DataFrame({
    'Employee Name': ['John', 'Jack', 'Steve', 'Sarah'],
    'Employee Age': [24, 32, 19, 29],
    'Job Title': ['Manager', 'Assistant', 'Clerk', 'Executive']
})

This will create a DataFrame that looks like this:

    Employee Name   Employee Age    Job Title
0       John          24             Manager
1       Jack          32             Assistant
2       Steve         19             Clerk
3       Sarah         29             Executive

Now we can concatenate df1 and df5:

# Concatenate df1 and df5
df6 = pd.concat([df1, df5], axis=1)

After concatenation, we get the following output:

    Name    Age Gender  Employee Name   Employee Age    Job Title
0   John    24     M       John          24             Manager
1   Jack    32     M       Jack          32             Assistant
2   Steve   19     M       Steve         19             Clerk
3   Sarah   29     F       Sarah         29             Executive

As we can see, Pandas concatenates the two DataFrames by adding the columns that don't match as separate columns in the new DataFrame.

Ignoring the Index When Concatenating DataFrames Horizontally

When concatenating two DataFrames horizontally, the resulting DataFrame retains the original indices of the two DataFrames. This can cause problems while working with this new DataFrame. Thus, it may be necessary in some cases to ignore the index while concatenating horizontally. We can accomplish this by setting the ignore_index parameter to True while concatenating:

# Concatenate df1 and df2, ignoring the index
df7 = pd.concat([df1, df2], axis=1, ignore_index=True)

After concatenation, we get the following output:

    0       1   2   3               4
0   John    24  M   Marketing      15
1   Jack    32  M   Sales          12
2   Steve   19  M   Human Resources 10
3   Sarah   29  F   NaN            NaN

Now the new DataFrame has a new index, which is more convenient for further data analysis.

Conclusion

Concatenating Pandas DataFrames vertically or horizontally provides us with an efficient way to merge data for analysis. With this guide, we have covered the basics of concatenation using the concat() method. With some practice and more knowledge of Pandas DataFrames, you'll be able to take large amounts of data and turn them into insights. If you're interested in learning more about Pandas, check out our Pandas tutorial pages below:

How to Zip Two Lists in Python with Ease Is Python Case Sensitive?