MLP FU
Pandas

Data Structures in Pandas

1. Series

A one-dimensional labeled array (like a column in Excel).

import pandas as pd

# Create a Series from a list
temps = [98.6, 99.1, 97.9]
# index= sets the labels (without it, pandas uses 0, 1, 2, ...)
patients = pd.Series(temps, index=['Alice', 'Bob', 'Charlie'])
print(patients)

Key Points:

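  • Holds one column of values, each paired with an index label
  • Operations on the whole Series are vectorized, so they run fast
  • Behaves like a Python dictionary (look up values by label) but also supports position-based access and vectorized math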
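A minimal sketch of the dictionary-like behavior, reusing the patient temperatures from the example above. The Celsius conversion and the 99.0 fever threshold are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Same data as the example above: temperatures labeled by patient name
patients = pd.Series([98.6, 99.1, 97.9], index=['Alice', 'Bob', 'Charlie'])

# Dictionary-style lookup by label
bob_temp = patients['Bob']

# Like a dict, "in" checks the index labels, not the values
has_alice = 'Alice' in patients

# Position-based access (something a plain dict cannot do)
first_temp = patients.iloc[0]

# Vectorized math applies to every value at once and keeps the labels
celsius = (patients - 32) * 5 / 9

# A boolean mask filters by value (hypothetical fever threshold)
feverish = patients[patients > 99.0]

print(bob_temp)
print(celsius.round(1))
print(feverish)
```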

2. DataFrame

A two-dimensional table (like a whole Excel sheet).

# Create DataFrame from dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Temperature': [98.6, 99.1, 97.9],
    'Sick': [False, True, False],
}

df = pd.DataFrame(data)
print(df)

Key Points:

  • The most commonly used pandas object
  • Column-wise (vectorized) operations are fast; row-by-row iteration is slow
  • The default choice for most tabular data work in Python
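A short sketch of common column operations on the same patient DataFrame. The Temp_C column is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Temperature': [98.6, 99.1, 97.9],
    'Sick': [False, True, False],
})

# Selecting a single column returns a Series
temps = df['Temperature']

# New column computed for all rows at once (vectorized, no loop)
df['Temp_C'] = (df['Temperature'] - 32) * 5 / 9

# Boolean filtering keeps only the rows where 'Sick' is True
sick_patients = df[df['Sick']]

# .loc selects by row label and column name
bob_row_temp = df.loc[1, 'Temperature']

print(df.shape)  # (rows, columns)
print(sick_patients)
```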

Creation Methods

You can create these from:

  • Python lists/dictionaries
  • NumPy arrays
  • CSV/Excel files
  • SQL databases

# From CSV (very common)
df = pd.read_csv('data.csv')

# From list of lists
data = [[1, 'A'], [2, 'B']]
df = pd.DataFrame(data, columns=['Number', 'Letter'])
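The creation list also mentions NumPy arrays and SQL databases. A minimal sketch, assuming an in-memory SQLite table created just for the demo and CSV text held in io.StringIO in place of a real file. pd.read_excel follows the same pattern for .xlsx files but needs an extra engine package such as openpyxl installed.

```python
import io
import sqlite3

import numpy as np
import pandas as pd

# From a NumPy array: supply the column names yourself
arr = np.array([[1, 2], [3, 4]])
df_np = pd.DataFrame(arr, columns=['a', 'b'])

# From CSV text; io.StringIO stands in for a real file such as 'data.csv'
csv_text = "Name,Temperature\nAlice,98.6\nBob,99.1\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# From a SQL database: a throwaway in-memory SQLite table for this demo
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE patients (name TEXT, temperature REAL)')
conn.executemany('INSERT INTO patients VALUES (?, ?)',
                 [('Alice', 98.6), ('Bob', 99.1)])
df_sql = pd.read_sql_query('SELECT * FROM patients', conn)
conn.close()

print(df_np)
print(df_csv)
print(df_sql)
```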

Speed Tips

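  • Vectorized operations (whole-column arithmetic and comparisons) are fastest
  • Avoid looping row by row (e.g., with iterrows())
  • Use .apply() only when no vectorized method exists; it still calls a Python function for each element or row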
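A rough comparison of the three approaches on hypothetical random temperatures. All three produce the same values; the printed timings depend on the machine, but the vectorized version is typically much faster than the other two.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 random temperatures in Fahrenheit
rng = np.random.default_rng(0)
df = pd.DataFrame({'temp_f': rng.uniform(97.0, 104.0, size=10_000)})

# 1. Row-by-row loop with iterrows(): slowest, avoid
start = time.perf_counter()
loop_result = []
for _, row in df.iterrows():
    loop_result.append((row['temp_f'] - 32) * 5 / 9)
print(f"iterrows loop: {time.perf_counter() - start:.4f} s")

# 2. .apply() on a Series: calls the Python function once per element
start = time.perf_counter()
apply_result = df['temp_f'].apply(lambda f: (f - 32) * 5 / 9)
print(f"apply:         {time.perf_counter() - start:.4f} s")

# 3. Vectorized: one expression over the whole column, run in compiled code
start = time.perf_counter()
vector_result = (df['temp_f'] - 32) * 5 / 9
print(f"vectorized:    {time.perf_counter() - start:.4f} s")
```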

When to Use

  • Series: a single labeled column of values, such as one measurement tracked over time (a time series)
  • DataFrame: most real-world data (tables, spreadsheets)
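A small sketch of each case, with made-up daily readings for the Series and made-up patient records for the DataFrame.

```python
import pandas as pd

# A Series suits one measurement tracked over time (hypothetical readings)
daily_temp = pd.Series(
    [98.6, 99.1, 97.9, 98.2],
    index=pd.date_range('2024-01-01', periods=4, freq='D'),
)

# Date-aware label slicing; both endpoints are included
jan_window = daily_temp.loc['2024-01-02':'2024-01-03']

# A DataFrame suits records with several attributes each
records = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Temperature': [98.6, 99.1],
    'Sick': [False, True],
})

# Each DataFrame column is itself a Series
print(jan_window)
print(type(records['Temperature']))
```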