Pandas
Pivot Tables
Pivot tables are a powerful tool for summarizing and reshaping data. In pandas, the pivot_table() function provides this functionality.
Creating a Simple Pivot Table
You can create a pivot table to aggregate data from a DataFrame. You need to specify the index, columns, and values you want to use.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
'B': ['one', 'one', 'two', 'two', 'one', 'one'],
'C': np.random.randn(6),
'D': np.random.randn(6)})
# Create a pivot table with 'A' as index, 'B' as columns, and 'C' as values
pivot = pd.pivot_table(df, values='C', index=['A'], columns=['B'])
print(pivot)Aggregate Functions
The default aggregation function is mean. You can specify a different function or multiple functions using the aggfunc parameter.
| Function | Description |
|---|---|
'mean' | Average of values |
'sum' | Sum of values |
'count' | Count of values |
'min' | Minimum value |
'max' | Maximum value |
'std' | Standard deviation |
'var' | Variance |
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
'B': ['one', 'one', 'two', 'two', 'one', 'one'],
'C': np.random.randn(6),
'D': np.random.randn(6)})
# Create a pivot table with sum as the aggregation function
pivot_sum = pd.pivot_table(df, values='C', index=['A'], columns=['B'], aggfunc='sum')
print("Pivot with sum:\n", pivot_sum)
# Use multiple aggregation functions
pivot_multi = pd.pivot_table(df, values='D', index=['A', 'B'], aggfunc=['mean', 'sum'])
print("\nPivot with multiple functions:\n", pivot_multi)
Handling Missing Values
The pivot_table() function allows you to fill missing values using the fill_value parameter.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar'],
'B': ['one', 'two', 'two', 'one'],
'C': np.random.randn(4),
'D': np.random.randn(4)})
# Pivot table will have NaN where combinations don't exist
pivot = pd.pivot_table(df, values='C', index=['A'], columns=['B'])
print("Pivot with NaN:\n", pivot)
# Fill missing values with 0
pivot_filled = pd.pivot_table(df, values='C', index=['A'], columns=['B'], fill_value=0)
print("\nPivot with filled values:\n", pivot_filled)