Selection & Filtering
Pandas provides several ways to select and filter data from a DataFrame or Series: by column name, by row label, by integer position, and by boolean condition.
Selecting Columns
You can select a single column by its name, which returns a Series.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['col1'])
To select multiple columns, pass a list of column names. This returns a new DataFrame.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8], 'col3': [9, 10, 11, 12]}
df = pd.DataFrame(data)
print(df[['col1', 'col3']])
Selecting Rows with loc and iloc
Pandas provides two main indexers for selecting rows: loc for label-based indexing and iloc for integer-position-based indexing.
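The difference between labels and positions is easiest to see when the index labels are integers that do not match the row positions. The following is a minimal sketch; the frame and its index values (10, 20, 30, 40) are made up for illustration and do not come from the examples in this section.

```python
import pandas as pd

# Hypothetical frame: integer labels that differ from row positions
df = pd.DataFrame({'col1': [1, 2, 3, 4]}, index=[10, 20, 30, 40])

by_label = df.loc[10, 'col1']     # the row whose label is 10
by_position = df.iloc[0, 0]       # the row at position 0 (the same row here)

label_slice = df.loc[10:30]       # label slice: both endpoints are included
position_slice = df.iloc[0:2]     # position slice: the stop is excluded

print(by_label, by_position)
print(list(label_slice.index))
print(list(position_slice.index))
```

With this index, df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] still returns the first row.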
loc (Label-based)
Select rows and columns by their labels.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
# Select a single row by its label
print(df.loc['a'])
You can also slice rows by label and select specific columns. Unlike a regular Python slice, a loc slice includes both endpoints.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
# Select rows 'a' through 'c' (inclusive) and column 'col1'
print(df.loc['a':'c', 'col1'])
iloc (Integer-based)
Select rows and columns by their integer position.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Select the first row
print(df.iloc[0])
Slicing with iloc works like slicing a Python list: the stop position is excluded.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Select the first two rows and the first column
print(df.iloc[0:2, 0])
Conditional Filtering
You can filter data based on conditions, also known as boolean indexing.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]}
df = pd.DataFrame(data)
# Select rows where col1 is greater than 2
print(df[df['col1'] > 2])
You can combine multiple conditions using & (and) and | (or). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.
import pandas as pd
data = {'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]}
df = pd.DataFrame(data)
# Select rows where col1 > 2 and col2 < 20
print(df[(df['col1'] > 2) & (df['col2'] < 20)])
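The | operator works the same way as & above. As a small sketch on the same data, the block below also shows two related idioms that the text above does not cover and that are added here only for illustration: negating a mask with ~, and passing a mask to loc together with a column label.

```python
import pandas as pd

data = {'col1': [1, 2, 3, 4], 'col2': [10, 20, 5, 15]}
df = pd.DataFrame(data)

# Rows where col1 < 2 or col2 < 10
either = df[(df['col1'] < 2) | (df['col2'] < 10)]

# ~ inverts a boolean mask: rows where neither condition holds
neither = df[~((df['col1'] < 2) | (df['col2'] < 10))]

# A mask can be the row selector in loc, combined with a column label
col2_where_col1_gt_2 = df.loc[df['col1'] > 2, 'col2']

print(either)
print(neither)
print(col2_where_col1_gt_2)
```

Using loc with a mask filters rows and picks columns in a single step, which avoids chaining two separate selections.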