Pandas

Advanced Indexing

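Pandas provides indexing capabilities beyond basic label and position selection: hierarchical indexes, cross-sections, boolean masks, query strings, and conditional assignment. Each of these expresses a selection or update as a single vectorized operation instead of an explicit Python loop.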

MultiIndex / Hierarchical Indexing

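A MultiIndex stores several index levels on one axis, so each row is identified by a tuple of labels rather than a single label. This creates a hierarchical structure in which outer labels group the inner ones.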

import pandas as pd
import numpy as np

# Create a DataFrame with MultiIndex
arrays = [
  ['A', 'A', 'B', 'B'],
  ['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['C', 'D'])
print(df)
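
Once a MultiIndex exists, it helps to inspect its structure before selecting from it. The following is a minimal sketch that rebuilds the same index as above; the seeded generator `rng` and the variable names `second_labels` and `flat` are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# Same index layout as the example above; a seeded generator keeps runs reproducible
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'),
)
df = pd.DataFrame(rng.standard_normal((4, 2)), index=index, columns=['C', 'D'])

# Number of levels and their names
print(df.index.nlevels)
print(list(df.index.names))

# Labels of a single level, one entry per row
second_labels = df.index.get_level_values('second')
print(list(second_labels))

# Turn the index levels back into ordinary columns
flat = df.reset_index()
print(list(flat.columns))
```

Calling `reset_index` is often the simplest route when a later step expects plain columns rather than a hierarchical index.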

Selection with MultiIndex

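You can select data at different levels of the hierarchy. Pass a tuple to .loc to match a label on every level, or a single outer label to get all rows beneath it. Levels that are fully matched are dropped from the result.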

import pandas as pd
import numpy as np

# Create a DataFrame with MultiIndex
arrays = [
  ['A', 'A', 'B', 'B'],
  ['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['C', 'D'])

# Select using .loc with a tuple
print("\nSelect 'A' and 'one':")
print(df.loc[('A', 'one')])

# Select every row under the outer label 'A'
print("\nSelect all 'A':")
print(df.loc['A'])
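
Selecting by an inner level while keeping every outer label is a common next step. One way to sketch it is with `pd.IndexSlice`; the integer data and the names `idx`, `twos`, and `value` below are assumptions made for illustration so the results are easy to check by eye.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'),
)
# Deterministic values 0..7 instead of random numbers
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['C', 'D'])

# Slicing on a MultiIndex expects a sorted index
df = df.sort_index()

idx = pd.IndexSlice

# Every 'first' label, but only 'two' on the 'second' level
twos = df.loc[idx[:, 'two'], :]
print(twos)

# A full tuple plus a column label addresses a single cell
value = df.loc[('B', 'one'), 'D']
print(value)
```

Unlike `df.loc['A']`, the slice form keeps both index levels in the result, which can make later joins or concatenations simpler.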

Cross-section Selection with xs

The xs method returns the cross-section for a given label at a chosen level of a MultiIndex. By default, the level used for the selection is dropped from the result.

import pandas as pd
import numpy as np

# Create a more complex MultiIndex DataFrame
index = pd.MultiIndex.from_product([
  ['A', 'B'],
  ['one', 'two'],
  [2019, 2020]
], names=['first', 'second', 'year'])
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['C', 'D'])
print(df)

# Select all rows where 'second' level is 'one'
print("\nCross-section for 'one':")
print(df.xs('one', level='second'))
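
Two variations on `xs` are worth knowing: keeping the selected level with `drop_level=False`, and matching on more than one level at once. The sketch below uses integer data and the illustrative names `dropped`, `kept`, and `a_2020`, none of which appear in the original example.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B'], ['one', 'two'], [2019, 2020]],
    names=['first', 'second', 'year'],
)
# Deterministic values 0..15 so the selected rows are easy to identify
df = pd.DataFrame(np.arange(16).reshape(8, 2), index=index, columns=['C', 'D'])

# Default behaviour: the 'second' level disappears from the result
dropped = df.xs('one', level='second')
print(list(dropped.index.names))

# Keep the selected level in the result
kept = df.xs('one', level='second', drop_level=False)
print(list(kept.index.names))

# Match on two levels at once: first == 'A' and year == 2020
a_2020 = df.xs(('A', 2020), level=['first', 'year'])
print(a_2020)
```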

Advanced Boolean Indexing

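You can combine boolean conditions with & (and), | (or), and ~ (not) for more selective filtering. Wrap each comparison in parentheses, because these operators bind more tightly than comparison operators such as > and ==.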

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
  'A': np.random.randn(8),
  'B': np.random.randn(8),
  'C': np.random.choice(['X', 'Y', 'Z'], 8),
  'D': np.random.randint(0, 100, 8)
})
print(df)

# Complex filtering with multiple conditions
# & binds more tightly than |, so the outer parentheses only make the grouping explicit
mask = ((df['A'] > 0) & (df['D'] < 50)) | (df['C'] == 'X')
print("\nFiltered with complex condition:")
print(df[mask])
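
Because `&` is evaluated before `|`, the grouping of a mixed condition changes which rows survive. The sketch below uses a small hand-written DataFrame (an assumption, chosen so each row exercises a different case) and the illustrative names `a_pos`, `d_small`, `c_is_x`, `and_first`, and `or_first`.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1.0, -1.0, 1.0, -1.0],
    'C': ['X', 'X', 'Y', 'Y'],
    'D': [10, 10, 90, 10],
})

# Name each condition once; this keeps the combined expression readable
a_pos = df['A'] > 0
d_small = df['D'] < 50
c_is_x = df['C'] == 'X'

# Grouping that Python applies to `a_pos & d_small | c_is_x`
and_first = (a_pos & d_small) | c_is_x

# A different grouping selects different rows
or_first = a_pos & (d_small | c_is_x)

print(df[and_first])
print(df[or_first])

# ~ negates a mask; isin tests membership in a list of values
not_x = df[~df['C'].isin(['X'])]
print(not_x)
```

Naming the intermediate masks is a stylistic choice, but it tends to make precedence mistakes easier to spot than one long inline expression.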

Query Method

The query method filters rows with a string expression in which column names are used directly as variables. It supports the keywords and, or, and not in place of &, |, and ~.

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
  'A': np.random.randn(8),
  'B': np.random.randn(8),
  'C': np.random.choice(['X', 'Y', 'Z'], 8),
  'D': np.random.randint(0, 100, 8)
})
print(df)

# Use query method; parentheses make the and/or grouping explicit
print("\nUsing query method:")
print(df.query('(A > 0 and D < 50) or C == "X"'))
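
Query strings can also refer to Python variables from the surrounding scope by prefixing them with `@`. The following sketch assumes a small fixed DataFrame and two hypothetical variables, `threshold` and `wanted`, and compares the result with the equivalent boolean mask.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0.5, -0.2, 1.5, -1.0],
    'C': ['X', 'Y', 'Z', 'X'],
    'D': [10, 20, 80, 60],
})

threshold = 50        # hypothetical cut-off, referenced as @threshold
wanted = ['X', 'Y']   # hypothetical category list, referenced as @wanted

# @name looks up a local Python variable instead of a column
result = df.query('D < @threshold and C in @wanted')
print(result)

# Equivalent boolean-mask form for comparison
same = df[(df['D'] < threshold) & df['C'].isin(wanted)]
print(result.equals(same))
```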

Setting with Indexing

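You can update values using indexing operations. Assigning through .loc with a row selector and a column label modifies the matching cells of the DataFrame in place.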

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
  'A': [1, 2, 3, 4],
  'B': [5, 6, 7, 8],
  'C': ['X', 'Y', 'Z', 'W']
})
print("Original DataFrame:")
print(df)

# Set values based on a condition
df.loc[df['A'] > 2, 'B'] = 0
print("\nAfter conditional update:")
print(df)
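
The same pattern extends to several columns at once, and `Series.mask` offers a related way to replace values that meet a condition. This is a minimal sketch on the same data as above; the variable name `rows` and the replacement value -1 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': ['X', 'Y', 'Z', 'W'],
})

# Build the row selector once, before any values change
rows = df['A'] > 2

# A scalar is broadcast to every selected cell in both columns
df.loc[rows, ['A', 'B']] = 0
print(df)

# Series.mask replaces values where the condition is True
df['B'] = df['B'].mask(df['B'] == 0, -1)
print(df)

# Avoid the chained form df[rows]['B'] = 0: it assigns into an
# intermediate object and is not guaranteed to modify df itself
```

Doing the selection and the assignment in a single `.loc` call is what lets pandas write to the original DataFrame rather than to a temporary copy.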