Joining two dataframes in Python Pandas

Aparna Mishra
3 min readSep 24, 2021

Structured tables like dataframes are merged or joined having atleast one column in common between both the dataframes.

This can also be done doing SQL joins.
merge( ) is used to perform joins in Python.

The entire jupyter notebook and data is available on link

Reading data in pandas :

#importing important libraryimport pandas as pd

Creating dataframes df1 and df2 for customer and orders data :

df1 = pd.read_csv('customer_data.csv')
df1
Dataframe df1 with customer_data.csv
df2 = pd.read_csv('orders_data.csv')
df2
Dataframe df2 with orders_data.csv

Now Joining the dataframes using Inner Join

Inner join gives the result only for those rows where the condition is satisfied in both rows.

# df1 and df2 are merged in a single dataframeinner_join = pd.merge(df1, 
df2,
on ='orderId',
how ='inner')
inner_join

Now Joining the dataframes using Left Join

Left join gives all the rows from 1st dataframe (df1) and matches rows from the 2nd dataframe(df2). If the rows are not matched in the 2nd dataframe then they will be replaced by NaN.

#where we have no records in dataframe df2 matching the dataframe df1 so the values are 'NaN'left_join = pd.merge(df1, 
df2,
on ='orderId',
how ='left')
left_join

Now Joining the dataframes using Right Join

Right join gives all the rows from 2nd dataframe (df2) and matches rows from the 1st dataframe(df1). If the rows are not matched in the 1st dataframe then they will be replaced by NaN.

#same as left join only order is changed , here it is 'right'right_join = pd.merge(df1, 
df2,
on ='orderId',
how ='right')
right_join

Now Joining the dataframes using Outer Join

Outer join provides the output dataframe consisting of rows from both the dataframes. If values don’t matched then we get NaN , else we get the corresponding values.

Outer_join = pd.merge(df1, 
df2,
on ='orderId',
how ='outer')
Outer_join

--

--