Pandas
Reading Data from Different Sources in Pandas
1. CSV Files (Most Common)
import pandas as pd
# Basic CSV read
df = pd.read_csv('data.csv')
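`read_csv` accepts any file-like object as well as a path. This means the options listed below can be tried without a file on disk. The sketch below assumes a small, made-up CSV held in an `io.StringIO` buffer. The two junk lines and the `name`/`age` columns are hypothetical and exist only to show how `skiprows`, `header` and `na_values` interact.

```python
import io

import pandas as pd

# Hypothetical CSV text: two junk lines come before the real header row
raw = io.StringIO(
    "report header line\n"
    "second junk line\n"
    "name,age\n"
    "Alice,30\n"
    "Bob,NA\n"
)

df = pd.read_csv(raw,
                 sep=',',
                 skiprows=2,        # drop the two junk lines first
                 header=0,          # then use the next line as column names
                 na_values=['NA'])  # the string 'NA' is read as NaN

print(df)
```

Because `Bob`'s age is missing, the `age` column is parsed as floats (`30.0`, `NaN`) rather than integers.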
# With options
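df = pd.read_csv('data.csv',
                 sep=',',          # Field delimiter (',' is the default)
                 header=0,         # Row to use as column names (counted after skipped rows)
                 skiprows=2,       # Skip the first 2 lines of the file
                 na_values=['NA']) # Strings to treat as missing (added to pandas' defaults)
2. Excel Files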
# Read Excel (.xlsx files need openpyxl; legacy .xls files need xlrd)
df = pd.read_excel('data.xlsx',
                   sheet_name='Sheet1',  # or 0 for the first sheet
                   skiprows=1,           # Skip the first row
                   usecols='A:C')        # Only Excel columns A to C
3. SQL Databases
# Requires SQLAlchemy or database driver
from sqlalchemy import create_engine
# Create connection
engine = create_engine('sqlite:///database.db')
# Read SQL query
df = pd.read_sql('SELECT * FROM patients', engine)
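`read_sql` also accepts a plain `sqlite3` connection, which is handy for trying queries without SQLAlchemy. (`read_sql_table` still needs SQLAlchemy.) The sketch below assumes an in-memory SQLite database with a made-up `patients` table; its `id`, `name` and `age` columns are hypothetical. It also shows passing values through `params` instead of formatting them into the SQL string.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical 'patients' table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE patients (id INTEGER, name TEXT, age INTEGER)')
conn.executemany('INSERT INTO patients VALUES (?, ?, ?)',
                 [(1, 'Alice', 34), (2, 'Bob', 58), (3, 'Cara', 41)])
conn.commit()

# '?' is sqlite3's placeholder style; pandas forwards `params` to the driver
df = pd.read_sql('SELECT name, age FROM patients WHERE age > ? ORDER BY id',
                 conn, params=(40,))
print(df)

conn.close()
```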
# Read an entire table (read_sql_table requires SQLAlchemy)
df = pd.read_sql_table('patients', engine)
4. JSON Data
# From JSON file
df = pd.read_json('data.json')
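The DataFrame that `read_json` produces depends on how the JSON is laid out, which pandas calls its `orient`. The sketch below assumes two small, made-up payloads that hold the same data. One is a list of row records and the other is keyed by column. Both are wrapped in `StringIO` so that no file is needed.

```python
from io import StringIO

import pandas as pd

# The same hypothetical data in two common JSON layouts
records_json = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'
columns_json = '{"name": {"0": "Alice", "1": "Bob"}, "age": {"0": 30, "1": 25}}'

# orient='records': a list of row objects
df_records = pd.read_json(StringIO(records_json), orient='records')

# orient='columns' (the default for DataFrames): {column: {index: value}}
df_columns = pd.read_json(StringIO(columns_json), orient='columns')

print(df_records)
print(df_columns)
```

If the `orient` does not match the file's actual layout, pandas typically raises an error or builds a DataFrame with an unexpected shape. Checking the structure of a sample first is usually worthwhile.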
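# From a JSON string (JSON needs double quotes; wrap literal JSON in StringIO,
# since passing it directly to read_json is deprecated)
from io import StringIO
df = pd.read_json(StringIO('[{"name": "Alice"}, {"name": "Bob"}]'))
5. Web URLs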
# Read CSV directly from URL
df = pd.read_csv('https://example.com/data.csv')
# Read HTML tables (returns a list of DataFrames; requires lxml, or BeautifulSoup4 + html5lib)
tables = pd.read_html('https://example.com/table.html')
7. Parquet (For Big Data)
# Efficient columnar binary format (requires pyarrow or fastparquet)
df = pd.read_parquet('data.parquet')