Selection & Filtering

Pandas provides several ways to select and filter data from a DataFrame or Series: selecting columns by name, selecting rows by label or position with loc and iloc, and filtering rows with boolean conditions.

Selecting Columns

You can select a single column by its name, which returns a Series.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['col1'])

To select multiple columns, pass a list of column names. This returns a new DataFrame.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8], 'col3': [9, 10, 11, 12]}
df = pd.DataFrame(data)
print(df[['col1', 'col3']])
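
The difference between single and double brackets can be checked directly. The sketch below reuses the example frame from above; the names as_series and as_frame are illustrative only and not part of pandas.

```python
import pandas as pd

data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8], 'col3': [9, 10, 11, 12]}
df = pd.DataFrame(data)

# Single brackets with a column name return a Series
as_series = df['col1']
# Double brackets (a list holding one name) return a one-column DataFrame
as_frame = df[['col1']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```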

Selecting Rows with loc and iloc

Pandas provides two main indexers for selecting rows: loc for label-based indexing and iloc for integer position-based indexing. Both are used with square brackets rather than called like methods.

loc (Label-based)

Select rows and columns by their labels.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
# Select a single row by its label
print(df.loc['a'])

You can also slice rows and select specific columns. Unlike ordinary Python slices, a loc slice includes both the start label and the end label.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
# Select rows 'a' through 'c' (inclusive) and column 'col1'
print(df.loc['a':'c', 'col1'])
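
As a minimal sketch of how loc treats slices and lists of labels, the example below uses the same labelled frame. The names sliced and picked are placeholders chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]},
    index=['a', 'b', 'c', 'd'],
)

# Label slices include the end label, so 'c' is part of the result
sliced = df.loc['a':'c', 'col1']
print(sliced.index.tolist())  # ['a', 'b', 'c']

# A list of labels selects exactly those rows and columns
picked = df.loc[['a', 'd'], ['col1', 'col2']]
print(picked.shape)  # (2, 2)
```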

iloc (Integer-based)

Select rows and columns by their integer position.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Select the first row
print(df.iloc[0])
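
A single position passed to iloc returns that row as a Series indexed by the column names, and negative positions count from the end, as with Python lists. The short sketch below illustrates both; first_row and last_row are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})

first_row = df.iloc[0]   # Series indexed by the column names
last_row = df.iloc[-1]   # negative positions count from the end

print(first_row.index.tolist())  # ['col1', 'col2']
print(last_row.tolist())         # [4, 8]
```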

Slicing with iloc follows the same rules as Python lists: the start position is included and the end position is excluded.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Select the first two rows and the first column
print(df.iloc[0:2, 0])
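
With the default integer index, loc and iloc slices look alike but return different rows, because loc treats 0:2 as labels and iloc treats it as positions. The sketch below compares the two on the same frame; by_position and by_label are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})

by_position = df.iloc[0:2, 0]    # positions 0 and 1; the end is exclusive
by_label = df.loc[0:2, 'col1']   # labels 0, 1 and 2; the end is inclusive

print(by_position.tolist())  # [1, 2]
print(by_label.tolist())     # [1, 2, 3]
```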

Conditional Filtering

You can filter data based on conditions, also known as boolean indexing.

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]}
df = pd.DataFrame(data)
# Select rows where col1 is greater than 2
print(df[df['col1'] > 2])
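
It can help to look at the condition on its own. The comparison produces a boolean Series (often called a mask), and indexing with it keeps the rows where the mask is True. The sketch below splits the steps apart; mask and filtered are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]})

mask = df['col1'] > 2            # boolean Series, one value per row
print(mask.tolist())             # [False, False, True, True]

filtered = df[mask]              # keeps only rows where mask is True
print(filtered.index.tolist())   # [2, 3]

# The same mask works with loc, which also allows choosing columns
print(df.loc[mask, 'col2'].tolist())  # [5, 15]
```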

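You can combine multiple conditions using & (and) and | (or), and negate a condition with ~ (not). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.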

import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]}
df = pd.DataFrame(data)
# Select rows where col1 > 2 and col2 < 20
print(df[(df['col1'] > 2) & (df['col2'] < 20)])
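
For completeness, here is a small sketch of the | and ~ operators on the same data. The conditions are arbitrary examples, and either and negated are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]})

# OR: rows where col1 equals 1 or col2 is below 10
either = df[(df['col1'] == 1) | (df['col2'] < 10)]
print(either.index.tolist())   # [0, 2]

# NOT: rows where col1 is not greater than 2
negated = df[~(df['col1'] > 2)]
print(negated.index.tolist())  # [0, 1]
```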