MLP FU
Pandas

Pivot Tables

Pivot tables are a powerful tool for summarizing and reshaping data. In pandas, the pivot_table() function provides this functionality.

Creating a Simple Pivot Table

You can create a pivot table to aggregate data from a DataFrame. You need to specify the index, columns, and values you want to use.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                 'B': ['one', 'one', 'two', 'two', 'one', 'one'],
                 'C': np.random.randn(6),
                 'D': np.random.randn(6)})

# Create a pivot table with 'A' as index, 'B' as columns, and 'C' as values
pivot = pd.pivot_table(df, values='C', index=['A'], columns=['B'])
print(pivot)

Aggregate Functions

The default aggregation function is mean. You can specify a different function or multiple functions using the aggfunc parameter.

FunctionDescription
'mean'Average of values
'sum'Sum of values
'count'Count of values
'min'Minimum value
'max'Maximum value
'std'Standard deviation
'var'Variance
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                 'B': ['one', 'one', 'two', 'two', 'one', 'one'],
                 'C': np.random.randn(6),
                 'D': np.random.randn(6)})

# Create a pivot table with sum as the aggregation function
pivot_sum = pd.pivot_table(df, values='C', index=['A'], columns=['B'], aggfunc='sum')
print("Pivot with sum:\n", pivot_sum)

# Use multiple aggregation functions
pivot_multi = pd.pivot_table(df, values='D', index=['A', 'B'], aggfunc=['mean', 'sum'])
print("\nPivot with multiple functions:\n", pivot_multi)

Handling Missing Values

The pivot_table() function allows you to fill missing values using the fill_value parameter.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar'],
                 'B': ['one', 'two', 'two', 'one'],
                 'C': np.random.randn(4),
                 'D': np.random.randn(4)})

# Pivot table will have NaN where combinations don't exist
pivot = pd.pivot_table(df, values='C', index=['A'], columns=['B'])
print("Pivot with NaN:\n", pivot)

# Fill missing values with 0
pivot_filled = pd.pivot_table(df, values='C', index=['A'], columns=['B'], fill_value=0)
print("\nPivot with filled values:\n", pivot_filled)