Pandas
Data Structures in Pandas
1. Series
A one-dimensional labeled array (like a column in Excel).
import pandas as pd
# Create Series from list
temps = [98.6, 99.1, 97.9]
patients = pd.Series(temps, index=['Alice', 'Bob', 'Charlie'])
print(patients)
Key Points:
- Fast for single-column operations
- Used when you need simple labeled data
- Behaves like a Python dictionary (values are looked up by label) that also supports vectorized operations
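The dictionary comparison can be made concrete with a minimal sketch. It reuses the patient temperatures from the example above; the 99.0 fever threshold and the Celsius conversion are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

patients = pd.Series([98.6, 99.1, 97.9], index=['Alice', 'Bob', 'Charlie'])

# Dictionary-style lookup by label
bob_temp = patients['Bob']

# Membership tests check the index labels, as with dict keys
has_alice = 'Alice' in patients

# Beyond a dict: arithmetic applies to every element at once
celsius = (patients - 32) * 5 / 9

# Boolean masks keep only the matching labels (threshold is an assumption)
feverish = patients[patients > 99.0]

print(bob_temp)
print(feverish)
```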
2. DataFrame
A two-dimensional table (like a whole Excel sheet).
# Create DataFrame from dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Temperature': [98.6, 99.1, 97.9],
    'Sick': [False, True, False]
}
df = pd.DataFrame(data)
print(df)
Key Points:
- Most commonly used pandas object
- Fast for column operations (slower row-by-row)
- Suited to most tabular data tasks in Python
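As a rough sketch of what column operations look like in practice, the snippet below rebuilds the same table and works on whole columns at once. The Celsius column is an added illustration, not something from the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Temperature': [98.6, 99.1, 97.9],
    'Sick': [False, True, False],
})

# Selecting one column returns a Series
temps = df['Temperature']

# A whole-column computation, with no explicit loop
df['Celsius'] = (df['Temperature'] - 32) * 5 / 9

# Filter rows using a boolean column, then pick one column
sick_names = df.loc[df['Sick'], 'Name'].tolist()

print(sick_names)
```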
Creation Methods
You can create these from:
- Python lists/dictionaries
- NumPy arrays
- CSV/Excel files
- SQL databases
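For the NumPy and SQL sources, a minimal sketch might look like the following. The `readings` table, its columns, and the in-memory SQLite database are hypothetical stand-ins for a real database.

```python
import sqlite3

import numpy as np
import pandas as pd

# From a NumPy array: supply the column names yourself
arr = np.array([[1, 2], [3, 4]])
df_np = pd.DataFrame(arr, columns=['a', 'b'])

# From a SQL database: a throwaway in-memory SQLite table (hypothetical data)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE readings (name TEXT, temperature REAL)')
conn.executemany(
    'INSERT INTO readings VALUES (?, ?)',
    [('Alice', 98.6), ('Bob', 99.1)],
)
conn.commit()
df_sql = pd.read_sql_query('SELECT name, temperature FROM readings', conn)
conn.close()

print(df_np)
print(df_sql)
```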
# From CSV (very common)
df = pd.read_csv('data.csv')
# From list of lists
data = [[1, 'A'], [2, 'B']]
df = pd.DataFrame(data, columns=['Number', 'Letter'])
Speed Tips
- Vectorized operations are fastest
- Avoid looping row-by-row
- Use .apply() when no vectorized operation fits
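The three approaches in these tips can be compared side by side. This is a sketch on a tiny made-up column; the `label` helper and the 99.0 threshold are hypothetical, and real speed differences only show up on much larger data.

```python
import pandas as pd

df = pd.DataFrame({'Temperature': [98.6, 99.1, 97.9]})

# Vectorized: one expression over the whole column (usually fastest)
fever_vec = df['Temperature'] > 99.0

# .apply(): calls a Python function once per element; useful when
# the logic has no vectorized equivalent
def label(temp):
    return 'fever' if temp > 99.0 else 'normal'

labels = df['Temperature'].apply(label)

# Row-by-row loop: same answer, but typically the slowest option
fever_loop = [row['Temperature'] > 99.0 for _, row in df.iterrows()]

print(labels.tolist())
```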
When to Use
- Series: Single measurements, time series
- DataFrame: Most real-world data (tables, spreadsheets)
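As a small illustration of the time-series case, a Series can be indexed by dates, and several Series can be combined into a DataFrame once more than one column is needed. The dates and values below are made up for the sketch.

```python
import pandas as pd

# A Series indexed by timestamps: one reading per day (made-up values)
days = pd.date_range('2024-01-01', periods=3, freq='D')
readings = pd.Series([98.6, 99.1, 97.9], index=days)

# Look up a reading by its date label
jan_2 = readings.loc[pd.Timestamp('2024-01-02')]

# Combining Series as columns yields a DataFrame
table = pd.DataFrame({'Temperature': readings, 'Fever': readings > 99.0})

print(table)
```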