
Comparing DataFrames

Pandas provides methods to compare two DataFrames and identify their differences.

The .equals() Method

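The .equals() method checks whether two DataFrames have the same shape and the same elements, and returns a single boolean. The row and column labels must match in value, corresponding columns must have the same dtype, and NaNs in the same location are considered equal.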

import pandas as pd
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df3 = pd.DataFrame({'a': [1, 5], 'b': [3, 4]})

print(f"df1 equals df2: {df1.equals(df2)}")  # True: same shape, values, and dtypes
print(f"df1 equals df3: {df1.equals(df3)}")  # False: column 'a' differs in the second row
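
Two details of .equals() are easy to miss: it is sensitive to dtype, and it treats NaNs in the same position as equal, unlike the == operator. The sketch below illustrates both; the DataFrame names are made up for this example.

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame({'a': [1, 2]})          # int64 column
floats = pd.DataFrame({'a': [1.0, 2.0]})    # float64 column with the same numeric values

# The values match, but the dtypes differ, so .equals() reports False
dtype_sensitive = ints.equals(floats)
# The == operator compares values only, so every cell matches
values_match = bool((ints == floats).all().all())
print(dtype_sensitive, values_match)

with_nan = pd.DataFrame({'a': [1.0, np.nan]})
same_nan = pd.DataFrame({'a': [1.0, np.nan]})

# .equals() treats NaNs in the same location as equal
nan_equal = with_nan.equals(same_nan)
# Element-wise, NaN == NaN is False, so the == check fails
nan_elementwise = bool((with_nan == same_nan).all().all())
print(nan_equal, nan_elementwise)
```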

The .compare() Method

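For a more detailed comparison, the .compare() method returns a new DataFrame containing only the differences. Both DataFrames must have identical row and column labels; otherwise compare raises a ValueError. Rows and columns with no differences are dropped, and cells that are equal within the remaining rows and columns are shown as NaN.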

import pandas as pd
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 9, 3], 'col2': [4, 5, 7]})

# The result shows differences, with 'self' for df1 and 'other' for df2
comparison = df1.compare(df2)
print(comparison)
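
The result of compare is an ordinary DataFrame whose columns form a two-level index: the original column name, then 'self' or 'other'. The sketch below shows one way to inspect it, and what happens when the labels of the two DataFrames do not match. The names diff and df_short are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 9, 3], 'col2': [4, 5, 7]})
diff = df1.compare(df2)

# Row labels that contain at least one difference
print(diff.index.tolist())
# Column labels are (original column, 'self' or 'other') pairs
print(diff.columns.tolist())

# Pull out a single cell: the value df2 holds in 'col1' at row 1
print(diff.loc[1, ('col1', 'other')])

# df_short has one row fewer than df1, so the labels are not identical
df_short = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5]})
try:
    df1.compare(df_short)
    raised = False
except ValueError as exc:
    raised = True
    print(f"compare failed: {exc}")
```

If the two DataFrames may have different labels, one option is to reindex them to a common index and columns before calling compare.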

By default (align_axis=1), compare places the differing values side by side, in 'self' and 'other' columns. Use align_axis=0 to stack them vertically instead, with 'self' and 'other' as alternating rows.

import pandas as pd
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 9, 3], 'col2': [4, 5, 7]})

# Stack differences vertically: 'self' and 'other' become rows
comparison = df1.compare(df2, align_axis=0)
print(comparison)
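
With align_axis=0 the 'self' and 'other' labels move into a second level of the row index. As a rough sketch, that level can be used to pull out only the values from one side; the names stacked and others are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 9, 3], 'col2': [4, 5, 7]})

stacked = df1.compare(df2, align_axis=0)

# Select only the rows labelled 'other' (the values from df2);
# the 'self'/'other' labels sit in the second index level (position 1)
others = stacked.xs('other', level=1)
print(others)
```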

You can also show the original values in cells that are equal, instead of NaN, with the keep_equal parameter. Rows and columns that contain no differences at all are still omitted.

import pandas as pd
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 9, 3], 'col2': [4, 5, 7]})

# Keep equal values in the output
comparison = df1.compare(df2, keep_equal=True)
print(comparison)
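
A related option not covered above is keep_shape, which keeps every row and column of the original DataFrames in the result, including those without differences. The sketch below combines it with keep_equal to get a full side-by-side view; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 9, 3], 'col2': [4, 5, 7]})

# keep_shape=True alone: all rows and columns are kept, equal cells are NaN
same_shape = df1.compare(df2, keep_shape=True)
print(same_shape)

# keep_shape=True with keep_equal=True: all rows and columns, with original values
full_view = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full_view)
```

The first form is handy when the result needs to line up with the original rows; the second reads more like a full before-and-after table.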