Statistical Functions

Pandas DataFrames and Series come with a rich set of statistical functions. These are essential for summarizing and understanding your data.

Common Statistical Functions

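The most commonly used ones are mean(), median(), sum(), min(), and max(). Called on a DataFrame, each returns a Series with one value per column and skips NaN values by default.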

import pandas as pd
data = {'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Get the mean of each column
print("Mean:")
print(df.mean())

# Get the median of each column
print("\nMedian:")
print(df.median())

# Get the sum of each column
print("\nSum:")
print(df.sum())

# Get the minimum value of each column
print("\nMin:")
print(df.min())

# Get the maximum value of each column
print("\nMax:")
print(df.max())
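
If you want several of these statistics in one table, agg() accepts a list of function names. The following is a minimal sketch that reuses the same data as above; the variable name summary is only illustrative.

```python
import pandas as pd

data = {'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# One row per statistic, one column per original column
summary = df.agg(['mean', 'median', 'sum', 'min', 'max'])
print(summary)

# Individual values can be looked up by statistic name and column name
print(summary.loc['mean', 'col2'])
```

The result is itself a DataFrame, so it can be sliced, rounded, or exported like any other.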

The describe() Method

The describe() method provides a quick summary of the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

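For numeric data, the result's index includes count, mean, std, min, and max, plus the lower, 50th, and upper percentiles. By default the lower percentile is the 25th and the upper is the 75th; the 50th percentile is the same as the median.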

import pandas as pd
data = {'numeric': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)
print(df.describe())
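
The percentiles reported by describe() can be changed with the percentiles parameter, which takes values between 0 and 1. The sketch below first reads individual entries from the default summary, then requests the 10th and 90th percentiles instead; the variable names default_summary and custom_summary are only illustrative.

```python
import pandas as pd

data = {'numeric': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)

# Default summary: the percentile rows are labelled '25%', '50%' and '75%'
default_summary = df.describe()
print(default_summary.loc['25%', 'numeric'])
print(default_summary.loc['75%', 'numeric'])

# Custom summary: ask for the 10th and 90th percentiles
custom_summary = df.describe(percentiles=[0.1, 0.9])
print(custom_summary)
```

Percentiles are computed with linear interpolation by default, so they need not be values that actually occur in the data.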

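For object data (e.g. strings), describe() returns a different set of statistics: count, unique (the number of distinct values), top (the most common value), and freq (how often the most common value occurs):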

import pandas as pd
data = {'letters': ['a', 'b', 'c', 'a', 'a', 'c', 'd', 'd', 'd', 'd']}
df = pd.DataFrame(data)
print(df.describe())
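
To see where the top and freq entries come from, you can compare them with value_counts() on the same column. This is a small sketch using the same letters data; the variable names summary and counts are only illustrative.

```python
import pandas as pd

data = {'letters': ['a', 'b', 'c', 'a', 'a', 'c', 'd', 'd', 'd', 'd']}
df = pd.DataFrame(data)

summary = df.describe()

# Count how often each distinct letter occurs, most frequent first
counts = df['letters'].value_counts()
print(counts)

# 'top' is the most frequent letter and 'freq' is its count
print(summary.loc['top', 'letters'], summary.loc['freq', 'letters'])
```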

Applying Functions Axis-wise

You can choose the direction of the calculation with the axis parameter. axis=0 (the default) works down each column and returns one value per column; axis=1 works across each row and returns one value per row.

import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Calculate sum for each column (axis=0)
print("Column sums:")
print(df.sum(axis=0))

# Calculate sum for each row (axis=1)
print("\nRow sums:")
print(df.sum(axis=1))
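
The axis parameter works the same way for the other statistical functions, and it also accepts the names 'index' (same as 0) and 'columns' (same as 1), which some readers find easier to remember. The sketch below uses the same data; the variable names row_means and col_sums are only illustrative.

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Mean of each row: averages col1 and col2 within the same row
row_means = df.mean(axis='columns')
print(row_means)

# Sum of each column: adds up all rows within the same column
col_sums = df.sum(axis='index')
print(col_sums)
```

Note that the result of a row-wise calculation is indexed by the row labels, while the result of a column-wise calculation is indexed by the column names.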