Statistical Functions

Pandas DataFrames and Series come with a rich set of statistical functions. These are essential for summarizing and understanding your data.

Common Statistical Functions

Here are some of the most commonly used statistical functions.

import pandas as pd
data = {'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Get the mean of each column
print("Mean:")
print(df.mean())

# Get the median of each column
print("\nMedian:")
print(df.median())

# Get the sum of each column
print("\nSum:")
print(df.sum())

# Get the minimum value of each column
print("\nMin:")
print(df.min())

# Get the maximum value of each column
print("\nMax:")
print(df.max())
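
The same pattern extends to measures of spread and to data with missing values. The sketch below is illustrative only: the DataFrame (including the missing value in `col2`) is made up for this example and is not part of the data above. It uses `std()`, `count()`, and the `skipna` argument that pandas reductions accept.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch); None is stored as NaN
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [10.0, None, 30.0, 40.0, 50.0],
})

# Sample standard deviation of each column (pandas uses ddof=1 by default)
col_std = df.std()

# Number of non-missing values in each column
col_count = df.count()

# Missing values are skipped by default, so this averages 10, 30, 40, 50
mean_skip = df['col2'].mean()

# With skipna=False, a single missing value makes the result NaN
mean_noskip = df['col2'].mean(skipna=False)

print(col_std)
print(col_count)
print(mean_skip, mean_noskip)
```

Because missing values are skipped by default, comparing `count()` with the number of rows is a quick way to see how many values each statistic was actually computed from.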

The `describe()` Method

The `describe()` method provides a quick summary of the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

For numeric data, the result's index includes count, mean, std, min, and max, plus the lower, middle, and upper percentiles. By default these are the 25th, 50th (the median), and 75th percentiles.

import pandas as pd
data = {'numeric': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)
print(df.describe())
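
If the default percentiles are not the ones you need, `describe()` accepts a `percentiles` argument. The following is a minimal sketch that reuses the numeric data from the example above; the choice of the 10th, 50th, and 90th percentiles is arbitrary and only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'numeric': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Request the 10th, 50th, and 90th percentiles instead of the defaults.
# Values must be between 0 and 1.
summary = df.describe(percentiles=[0.1, 0.5, 0.9])

print(summary)

# Individual statistics can be looked up by row label and column name
median = summary.loc['50%', 'numeric']
```

The result is an ordinary DataFrame, so single statistics can be pulled out with `.loc` as shown on the last line.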

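For object data (e.g. strings), `describe()` returns a different set of statistics: count, unique (the number of distinct values), top (the most frequent value), and freq (how often the top value occurs):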

import pandas as pd
data = {'letters': ['a', 'b', 'c', 'a', 'a', 'c', 'd', 'd', 'd', 'd']}
df = pd.DataFrame(data)
print(df.describe())
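
When a DataFrame mixes numeric and string columns, `describe()` summarizes only the numeric ones unless told otherwise. The sketch below combines the two example columns from this section into one DataFrame (that combination is an assumption made for illustration) and uses `include='all'` to summarize every column at once.

```python
import pandas as pd

df = pd.DataFrame({
    'letters': ['a', 'b', 'c', 'a', 'a', 'c', 'd', 'd', 'd', 'd'],
    'numeric': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# By default, only the numeric column appears in the summary
default_summary = df.describe()

# include='all' summarizes every column; statistics that do not
# apply to a column (e.g. the mean of strings) are filled with NaN
full_summary = df.describe(include='all')

# Selecting the string column first gives only count/unique/top/freq
letters_summary = df[['letters']].describe()

print(default_summary)
print(full_summary)
print(letters_summary)
```

Here 'd' appears four times, so it is reported as `top` with a `freq` of 4.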

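Applying Functions Axis-wise

Reduction functions such as `sum()`, `mean()`, `min()`, and `max()` accept an `axis` parameter that controls the direction of the calculation. `axis=0` (the default) works down the rows and returns one result per column; `axis=1` works across the columns and returns one result per row.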

import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Calculate sum for each column (axis=0)
print("Column sums:")
print(df.sum(axis=0))

# Calculate sum for each row (axis=1)
print("\nRow sums:")
print(df.sum(axis=1))
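
The `axis` argument works the same way for the other reductions. The sketch below reuses the small DataFrame from the example above and also shows the string aliases `'index'` and `'columns'`, which pandas accepts in place of 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# One mean per column: col1 -> 2.0, col2 -> 5.0
col_means = df.mean(axis=0)

# One mean per row: 2.5, 3.5, 4.5
row_means = df.mean(axis=1)

# axis='columns' is an alias for axis=1: the largest value in each row
row_max = df.max(axis='columns')

# axis='index' is an alias for axis=0: the smallest value in each column
col_min = df.min(axis='index')

print(col_means)
print(row_means)
print(row_max)
print(col_min)
```

Note that the result of a column-wise reduction is indexed by column name, while the result of a row-wise reduction is indexed by the original row labels.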
