Pandas
Merging and Joining
Pandas provides powerful and flexible functions for combining datasets. merge and join are two key methods for this purpose.
Merging DataFrames with pd.merge()
The merge() function is the primary entry point for all standard database join operations between DataFrame objects.
Basic Merge (Inner Join)
By default, merge() performs an inner join on common columns.
import pandas as pd
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K4'],
'B': ['B0', 'B1', 'B2', 'B4']})
result = pd.merge(left, right, on='key')
print(result)Specifying Join Types
You can specify the type of join with the how parameter: 'left', 'right', 'outer', 'inner'.
import pandas as pd
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K4'],
'B': ['B0', 'B1', 'B2', 'B4']})
# Left Join
left_join = pd.merge(left, right, on='key', how='left')
print("Left Join:\n", left_join)
# Outer Join
outer_join = pd.merge(left, right, on='key', how='outer')
print("\nOuter Join:\n", outer_join)Joining DataFrames with df.join()
The join() method is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single DataFrame. It performs a left join by default, based on the index.
import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2']},
index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'B': ['B0', 'B1', 'B2']},
index=['K0', 'K1', 'K2'])
result = left.join(right)
print(result)Joining on a Column
You can also join on a column of the left DataFrame.
import pandas as pd
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'B': ['B0', 'B1', 'B2']},
index=['K0', 'K1', 'K3'])
# Join 'left' on its 'key' column with 'right' on its index
result = left.join(right, on='key')
print(result)