MLP FU
Pandas

Reading Data from Different Sources in Pandas

1. CSV Files (Most Common)

import pandas as pd

# Basic CSV read
df = pd.read_csv('data.csv')

# With options
df = pd.read_csv('data.csv',
                 sep=',',           # Delimiter (',' is the default)
                 skiprows=2,        # Skip the first 2 lines of the file
                 header=0,          # First row after the skipped lines holds the column names
                 na_values=['NA'])  # Extra strings to treat as missing
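To see how these options interact, here is a minimal sketch that parses an in-memory string instead of a file. The file contents, the semicolon delimiter, and the 'missing' marker are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up file contents: two junk lines, then a header row and three records.
raw = StringIO(
    "exported by some tool\n"
    "internal use only\n"
    "id;name;score\n"
    "1;Alice;9.5\n"
    "2;Bob;NA\n"
    "3;Carol;missing\n"
)

df = pd.read_csv(
    raw,
    sep=";",                 # semicolon-delimited instead of the default comma
    skiprows=2,              # drop the two junk lines first
    header=0,                # the first remaining line holds the column names
    na_values=["missing"],   # added to the default markers such as 'NA'
)

print(df)
print(df["score"].isna().sum())  # counts both 'NA' and 'missing'
```

Because both markers become NaN, the score column is parsed as a float column rather than as text.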

2. Excel Files

# Read Excel (requires openpyxl for .xlsx; xlrd only handles legacy .xls)
df = pd.read_excel('data.xlsx',
                   sheet_name='Sheet1',  # or 0 for first sheet
                   skiprows=1,           # Skip the first row
                   usecols='A:C')        # Only columns A to C

3. SQL Databases

# Requires SQLAlchemy plus a driver for your database (SQLite's driver ships with Python)
from sqlalchemy import create_engine

# Create connection
engine = create_engine('sqlite:///database.db')

# Read SQL query
df = pd.read_sql('SELECT * FROM patients', engine)

# Read entire table
df = pd.read_sql_table('patients', engine)
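For quick experiments, pandas also accepts a plain sqlite3 connection for queries, so SQLAlchemy is not strictly needed in that case. The sketch below builds a throwaway in-memory database; the table layout and rows are invented for illustration. It also shows passing values through params rather than formatting them into the SQL string.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so nothing touches the disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO patients (name, age) VALUES (?, ?)",
    [("Alice", 34), ("Bob", 61), ("Carol", 47)],  # made-up rows
)
conn.commit()

# Bind the threshold with params; sqlite3 uses '?' placeholders.
older = pd.read_sql_query(
    "SELECT name, age FROM patients WHERE age > ? ORDER BY age",
    conn,
    params=(40,),
)
conn.close()

print(older)
```

Note that read_sql_table does not work with a raw sqlite3 connection; it needs a SQLAlchemy connectable.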

4. JSON Data

# From JSON file
df = pd.read_json('data.json')

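# From JSON string (JSON needs double quotes; wrap the text in StringIO,
# since passing a literal JSON string is deprecated as of pandas 2.1)
from io import StringIO
df = pd.read_json(StringIO('[{"name":"Alice"},{"name":"Bob"}]'))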
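Real JSON is often nested or stored one record per line (JSON Lines). A minimal sketch of both cases follows, using made-up records: json_normalize flattens nested objects into dotted column names, and lines=True tells read_json to parse each line as a separate record.

```python
import json
from io import StringIO

import pandas as pd

# Made-up nested records: each person has an embedded address object.
payload = (
    '[{"name": "Alice", "address": {"city": "Oslo"}},'
    ' {"name": "Bob", "address": {"city": "Lima"}}]'
)

# Flatten nested objects; the nested key becomes the column "address.city".
flat = pd.json_normalize(json.loads(payload))
print(flat.columns.tolist())

# JSON Lines: one JSON object per line.
jsonl = '{"name": "Alice", "age": 34}\n{"name": "Bob", "age": 61}\n'
people = pd.read_json(StringIO(jsonl), lines=True)
print(people)
```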

5. Web URLs

# Read CSV directly from URL
df = pd.read_csv('https://example.com/data.csv')

# Read HTML tables (returns list of DataFrames; requires lxml, or beautifulsoup4 with html5lib)
tables = pd.read_html('https://example.com/table.html')

6. Parquet (For Big Data)

# Efficient columnar binary format (requires pyarrow or fastparquet)
df = pd.read_parquet('data.parquet')