Joining two dataframes in Python Pandas
Structured tables like dataframes are merged or joined having atleast one column in common between both the dataframes.
This can also be done doing SQL joins.
merge( ) is used to perform joins in Python.
The entire jupyter notebook and data is available on link
Reading data in pandas :
#importing important libraryimport pandas as pd
Creating dataframes df1 and df2 for customer and orders data :
df1 = pd.read_csv('customer_data.csv')
df1
df2 = pd.read_csv('orders_data.csv')
df2
Now Joining the dataframes using Inner Join
Inner join gives the result only for those rows where the condition is satisfied in both rows.
# df1 and df2 are merged in a single dataframeinner_join = pd.merge(df1,
df2,
on ='orderId',
how ='inner')
inner_join
Now Joining the dataframes using Left Join
Left join gives all the rows from 1st dataframe (df1) and matches rows from the 2nd dataframe(df2). If the rows are not matched in the 2nd dataframe then they will be replaced by NaN.
#where we have no records in dataframe df2 matching the dataframe df1 so the values are 'NaN'left_join = pd.merge(df1,
df2,
on ='orderId',
how ='left')
left_join
Now Joining the dataframes using Right Join
Right join gives all the rows from 2nd dataframe (df2) and matches rows from the 1st dataframe(df1). If the rows are not matched in the 1st dataframe then they will be replaced by NaN.
#same as left join only order is changed , here it is 'right'right_join = pd.merge(df1,
df2,
on ='orderId',
how ='right')
right_join
Now Joining the dataframes using Outer Join
Outer join provides the output dataframe consisting of rows from both the dataframes. If values don’t matched then we get NaN , else we get the corresponding values.
Outer_join = pd.merge(df1,
df2,
on ='orderId',
how ='outer')
Outer_join