Reading Data from Different Sources in Pandas

1. CSV Files (Most Common)

import pandas as pd

# Basic CSV read
df = pd.read_csv('data.csv')

# With options
df = pd.read_csv('data.csv',
                 sep=',',           # Field delimiter (comma is the default)
                 header=0,          # Row to use as column names, counted after skipped rows
                 skiprows=2,        # Skip the first 2 lines of the file
                 na_values=['NA'])  # Additional strings to treat as missing
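
To try these options without a file on disk, the following is a minimal sketch that reads from an in-memory buffer. The sample contents, column names, and the variable `raw` are made up for illustration; only the `read_csv` parameters come from the snippet above.

```python
from io import StringIO

import pandas as pd

# Hypothetical file contents: two junk lines, then a header row and three records.
raw = StringIO(
    "report title\n"
    "second junk line\n"
    "id,name,score\n"
    "1,Alice,90\n"
    "2,Bob,NA\n"
    "3,Carol,85\n"
)

# skiprows is applied first, so header=0 refers to the "id,name,score" line.
df = pd.read_csv(raw, sep=",", header=0, skiprows=2, na_values=["NA"])

print(df.columns.tolist())       # column names taken from the third line
print(df["score"].isna().sum())  # number of missing scores
```

Because the two junk lines are skipped before the header is located, the column names come from the third line of the input.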

2. Excel Files

# Read Excel (requires openpyxl for .xlsx, or xlrd for legacy .xls)
df = pd.read_excel('data.xlsx',
                   sheet_name='Sheet1',  # or 0 for the first sheet
                   skiprows=1,           # Skip the first row
                   usecols='A:C')        # Only columns A to C

3. SQL Databases

# Requires SQLAlchemy (SQLite support ships with Python; other databases also need a driver)
from sqlalchemy import create_engine

# Create connection
engine = create_engine('sqlite:///database.db')

# Read SQL query
df = pd.read_sql('SELECT * FROM patients', engine)

# Read an entire table (read_sql_table requires SQLAlchemy)
df = pd.read_sql_table('patients', engine)
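
For a quick experiment without SQLAlchemy, pandas can also read from a plain `sqlite3` connection. The sketch below is one possible setup, not the only way to do it: the in-memory database, the `patients` columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database with a hypothetical patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients (name, age) VALUES (?, ?)",
    [("Alice", 34), ("Bob", 51), ("Carol", 27)],
)
conn.commit()

# Pass values through params instead of formatting them into the SQL string.
older = pd.read_sql_query(
    "SELECT name, age FROM patients WHERE age > ? ORDER BY age",
    conn,
    params=(30,),
)
conn.close()

print(older)
```

Passing values through `params` keeps user input out of the SQL text, which is worth doing even in small scripts.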

4. JSON Data

# From JSON file
df = pd.read_json('data.json')

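# From JSON string (JSON needs double quotes; wrap the string in StringIO,
# since passing a raw JSON string is deprecated as of pandas 2.1)
from io import StringIO
df = pd.read_json(StringIO('[{"name":"Alice"},{"name":"Bob"}]'))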
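
As a runnable illustration of reading JSON held in memory, here is a small sketch with a list of records. The payload and its `age` field are invented for the example. It also shows that a key missing from one record ends up as a missing value in the resulting column.

```python
from io import StringIO

import pandas as pd

# Hypothetical records; the second one has no "age" key.
payload = '[{"name": "Alice", "age": 30}, {"name": "Bob"}]'

# Wrapping the string in StringIO gives read_json a file-like object.
df = pd.read_json(StringIO(payload))

print(df.columns.tolist())
print(df["age"].isna().tolist())  # True where the key was absent
```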

5. Web URLs

# Read CSV directly from URL
df = pd.read_csv('https://example.com/data.csv')

# Read HTML tables (returns a list of DataFrames; requires lxml, or BeautifulSoup4 with html5lib)
tables = pd.read_html('https://example.com/table.html')

6. Parquet (For Big Data)

# Efficient columnar binary format (requires pyarrow or fastparquet)
df = pd.read_parquet('data.parquet')
