Pandas
Advanced Indexing
Pandas provides indexing capabilities beyond basic selection, including hierarchical indexes, cross-sections, boolean masks, and string queries. These techniques can make data manipulation code shorter and easier to read.
MultiIndex / Hierarchical Indexing
A MultiIndex gives an axis several index levels, creating a hierarchical structure in which each row is identified by a tuple of labels, one per level.
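As a small complementary sketch, a MultiIndex can also be obtained by moving existing columns into the index with set_index. The table below, including its column names and values, is made up for illustration.

```python
import pandas as pd

# Hypothetical flat table; column names and values are illustrative only
flat = pd.DataFrame({
    'first': ['A', 'A', 'B', 'B'],
    'second': ['one', 'two', 'one', 'two'],
    'value': [10, 20, 30, 40],
})

# Move two columns into the index to get a two-level MultiIndex
hier = flat.set_index(['first', 'second'])

print(hier.index.nlevels)      # number of index levels
print(list(hier.index.names))  # names of the levels
print(hier)
```

Each row of hier is now addressed by a (first, second) tuple rather than by a single label.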
import pandas as pd
import numpy as np
# Create a DataFrame with MultiIndex
arrays = [
    ['A', 'A', 'B', 'B'],
    ['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['C', 'D'])
print(df)

Selection with MultiIndex
You can select data at different levels of the hierarchy.
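A short sketch with fixed values may make the levels easier to follow. It assumes a small two-level index like the one used in this section; the cell values are made up, and pd.IndexSlice is used here as one possible way to select on the inner level.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'),
)
# Illustrative values, chosen so results are easy to verify by eye
df = pd.DataFrame({'C': [1, 3, 5, 7], 'D': [2, 4, 6, 8]}, index=index)

# A full tuple addresses one row; adding a column label returns a single value
cell = df.loc[('A', 'two'), 'D']

# pd.IndexSlice selects on the inner level: every row whose 'second' label is 'one'
idx = pd.IndexSlice
inner = df.loc[idx[:, 'one'], :]

print(cell)
print(inner)
```

Slicing like this generally expects the index to be sorted; the index above is already in sorted order, and sort_index() can be called first when it is not.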
import pandas as pd
import numpy as np
# Create a DataFrame with MultiIndex
arrays = [
    ['A', 'A', 'B', 'B'],
    ['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['C', 'D'])
# Select using .loc with a tuple
print("\nSelect 'A' and 'one':")
print(df.loc[('A', 'one')])
# Select every row under the first-level label 'A'
print("\nSelect all 'A':")
print(df.loc['A'])

Cross-section Selection with xs
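The xs method selects a cross-section: all rows that match a given label at a particular level of a MultiIndex. By default, the level used for selection is dropped from the result.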
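The following sketch, using fixed values that are made up for illustration, shows two variations on xs: keeping the selected level with drop_level=False, and fixing several levels at once with a tuple key.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B'], ['one', 'two'], [2019, 2020]],
    names=['first', 'second', 'year'],
)
# Illustrative values 0..7, one per row
df = pd.DataFrame({'C': range(8)}, index=index)

# Default behaviour: the 'year' level is dropped from the result
dropped = df.xs(2020, level='year')

# drop_level=False keeps all three levels in the result
kept = df.xs(2020, level='year', drop_level=False)

# A tuple key with a list of levels fixes several levels at once
both = df.xs(('A', 2020), level=['first', 'year'])

print(dropped.index.nlevels, kept.index.nlevels)
print(both)
```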
import pandas as pd
import numpy as np
# Create a more complex MultiIndex DataFrame
index = pd.MultiIndex.from_product([
    ['A', 'B'],
    ['one', 'two'],
    [2019, 2020]
], names=['first', 'second', 'year'])
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['C', 'D'])
print(df)
# Select all rows where 'second' level is 'one'
print("\nCross-section for 'one':")
print(df.xs('one', level='second'))

Advanced Boolean Indexing
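You can combine multiple boolean conditions with & (and), | (or), and ~ (not) for more selective filtering. Wrap each comparison in parentheses, because these operators bind more tightly than comparison operators.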
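A minimal sketch with fixed, made-up values shows how such conditions combine and which rows pass. The isin call is included as one common way to test membership; it is an addition for illustration.

```python
import pandas as pd

# Illustrative data, chosen so each row exercises a different case
df = pd.DataFrame({
    'A': [1.0, -1.0, 2.0, -2.0],
    'C': ['X', 'X', 'Y', 'Y'],
    'D': [10, 20, 80, 30],
})

# (A > 0 and D < 50) or C == 'X', with the grouping made explicit
mask_or = ((df['A'] > 0) & (df['D'] < 50)) | (df['C'] == 'X')

# ~ negates a mask; isin tests membership in a list of values
mask_not = ~df['C'].isin(['X'])

print(df[mask_or])
print(df[mask_not])
```

Row 0 passes through the first branch, row 1 passes only because C is 'X', and rows 2 and 3 fail both branches.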
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': np.random.randn(8),
    'B': np.random.randn(8),
    'C': np.random.choice(['X', 'Y', 'Z'], 8),
    'D': np.random.randint(0, 100, 8)
})
print(df)
# Combine conditions: (A > 0 and D < 50) or C == 'X'
# & binds more tightly than |, so the extra parentheses only make the grouping explicit
mask = ((df['A'] > 0) & (df['D'] < 50)) | (df['C'] == 'X')
print("\nFiltered with complex condition:")
print(df[mask])

Query Method
The query method provides a concise way to filter data using a string expression.
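As a sketch with fixed, made-up values, the snippet below checks that a query string and the equivalent boolean mask select the same rows. The limit variable is hypothetical and is referenced inside the expression with the @ prefix.

```python
import pandas as pd

# Illustrative data with known values
df = pd.DataFrame({
    'A': [1.0, -1.0, 2.0, -2.0],
    'C': ['X', 'X', 'Y', 'Y'],
    'D': [10, 20, 80, 30],
})

limit = 50  # hypothetical threshold, referenced in the query as @limit

# String expression: column names are used directly, Python variables via @
by_query = df.query('(A > 0 and D < @limit) or C == "X"')

# The same filter written as an explicit boolean mask
by_mask = df[((df['A'] > 0) & (df['D'] < limit)) | (df['C'] == 'X')]

print(by_query)
print(by_query.equals(by_mask))
```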
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': np.random.randn(8),
    'B': np.random.randn(8),
    'C': np.random.choice(['X', 'Y', 'Z'], 8),
    'D': np.random.randint(0, 100, 8)
})
print(df)
# Use query method
print("\nUsing query method:")
print(df.query('(A > 0 and D < 50) or C == "X"'))

Setting with Indexing
You can update values in place by assigning through an indexer such as .loc, optionally restricted to the rows that satisfy a condition.
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': ['X', 'Y', 'Z', 'W']
})
print("Original DataFrame:")
print(df)
# Set values based on a condition
df.loc[df['A'] > 2, 'B'] = 0
print("\nAfter conditional update:")
print(df)
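The same pattern extends to values computed from other columns. The sketch below reuses the small DataFrame from this section; the factor of 10 and the replacement label 'V' are arbitrary choices for illustration, and .at is shown as one way to set a single cell by label.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': ['X', 'Y', 'Z', 'W']
})

# Reuse one mask on both sides so only the matching rows are read and written
mask = df['A'] > 2
df.loc[mask, 'B'] = df.loc[mask, 'A'] * 10

# .at sets a single cell addressed by row label and column label
df.at[0, 'C'] = 'V'

print(df)
```

Assigning through a single .loc call, rather than chaining two indexing steps, keeps the write on the original DataFrame instead of a temporary copy.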