Database Performance Optimization: A Practical Reference Guide

Part 1 of 2 in the Database Performance Optimization series

Database performance stems from configuration, query patterns, and indexing discipline—not technology choice. SQLite with correct PRAGMA settings handles 100,000+ queries/second in production. Systems achieve 1200X improvements through strategic indexing and 20X gains from enabling Write-Ahead Logging. This guide synthesizes production data from high-frequency trading systems, Uber's 3M queries/second fleet, and comprehensive benchmarking to provide actionable optimization strategies.

The highest-leverage optimizations, in order: enable WAL mode (2-20X improvement), implement composite indexes (orders of magnitude), batch operations (10-100X), use connection pooling (10-50X), minimize lock contention, eliminate N+1 queries (10X), cache schema metadata (avoid 30X slowdowns).


Quick Reference

| Optimization | Impact | Effort | Priority |
|---|---|---|---|
| Enable WAL mode | 2-20X | 1 line | Critical |
| Add composite indexes | 10-1000X | Low-Medium | Critical |
| Batch operations | 10-100X | Medium | High |
| Connection pooling | 10-50X | Medium | High |
| Eliminate N+1 queries | 10X | Medium | High |
| Cache schema metadata | 30X | Low | High |
| Short transactions (<100ms) | Prevents deadlocks | Medium | High |
| Use covering indexes | 50-90% I/O reduction | Medium | Medium |
| Optimize PRAGMAs | 10-30% | Low | Medium |

SQLite Production Configuration

Essential PRAGMA Settings

Execute immediately when opening any connection:

connection.executescript("""
    PRAGMA journal_mode = WAL;              -- Critical: 2-20X improvement
    PRAGMA synchronous = normal;            -- Reduces fsync overhead
    PRAGMA cache_size = -64000;             -- 64MB memory cache
    PRAGMA temp_store = memory;             -- Keep temp tables in RAM
    PRAGMA mmap_size = 30000000000;         -- 30GB mmap (10-30% read boost)
    PRAGMA busy_timeout = 5000;             -- Wait 5s for locks
    PRAGMA foreign_keys = ON;               -- Enable FK constraints
    PRAGMA journal_size_limit = 67108864;   -- Cap WAL at 64MB
""")

Configuration Impact Table

| PRAGMA | Effect | When to Use |
|---|---|---|
| journal_mode = WAL | Concurrent reads during writes | Always in production |
| synchronous = normal | Reduces transaction latency from 30ms+ to <1ms | With WAL mode |
| cache_size = -64000 | 64MB cache reduces disk I/O | Memory available |
| mmap_size | 10-30% read performance gain | 64-bit systems with RAM |
| busy_timeout = 5000 | Automatic retry on locks | High concurrency |

Memory-Mapped I/O Guidelines

| Platform | Recommended mmap_size | Rationale |
|---|---|---|
| 64-bit server | 30GB | Reserves virtual address space, not RAM |
| Mobile/embedded | 256MB | Limited virtual address space |
| 32-bit systems | 128MB | Address space constraints |
| Network filesystems | 0 (disabled) | Incompatible with remote FS |

Key insight: mmap reserves virtual address space. OS manages physical RAM based on access patterns. No RAM commitment upfront.
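
A short sketch applying these guidelines when opening a connection (the 64-bit detection idiom and the exact sizes are assumptions drawn from the table above, not part of the original text):

import sys

def apply_mmap_setting(conn, on_network_fs=False):
    """Choose an mmap_size following the platform guidelines above."""
    if on_network_fs:
        size = 0                      # mmap is incompatible with remote filesystems
    elif sys.maxsize > 2**32:
        size = 30_000_000_000         # 64-bit: reserve 30GB of virtual address space
    else:
        size = 128 * 1024 * 1024      # 32-bit: stay within address-space limits
    conn.execute(f"PRAGMA mmap_size = {size}")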

Query Planner Optimization

# Before closing connections
connection.execute("PRAGMA optimize")

# For long-running applications (periodic)
connection.execute("PRAGMA optimize=0x10002")

# Verify index usage
cursor.execute("EXPLAIN QUERY PLAN SELECT ...")

Interpret EXPLAIN output:

  • SCAN = full table scan ❌
  • SEARCH = index used ✅
  • COVERING INDEX = optimal (no table lookup) ✅✅
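
To automate this check, a small helper can fail fast when a plan falls back to a full table scan (the helper name and the assertion-based handling are illustrative, not part of any library):

def assert_uses_index(conn, query, params=()):
    """Raise if EXPLAIN QUERY PLAN reports a full table scan for this query."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    for row in plan:
        detail = row[3]  # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
        if detail.startswith("SCAN"):
            raise AssertionError(f"Full table scan detected: {detail}")
    return plan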

Schema Design for Time-Series Data

Optimal Table Structure

CREATE TABLE market_data (
    symbol TEXT NOT NULL,
    timestamp_us INTEGER NOT NULL,  -- Microsecond precision
    price REAL NOT NULL,
    volume INTEGER NOT NULL,
    bid_price REAL,
    ask_price REAL,
    PRIMARY KEY (symbol, timestamp_us)
) WITHOUT ROWID, STRICT;

-- Composite index: equality before range
CREATE INDEX idx_symbol_time
ON market_data(symbol, timestamp_us DESC);

-- Covering index for common queries
CREATE INDEX idx_symbol_time_price_vol
ON market_data(symbol, timestamp_us DESC, price, volume);

-- Partial index for recent data (hot path)
-- Note: SQLite rejects non-deterministic functions such as unixepoch('now', ...)
-- in partial-index WHERE clauses, so use a literal cutoff and rebuild the index
-- periodically (e.g. a maintenance job recomputes "now minus 24 hours").
CREATE INDEX idx_recent_data
ON market_data(symbol, timestamp_us DESC, price, volume)
WHERE timestamp_us > 1704067200000000;  -- example cutoff in microseconds

Schema Optimization Techniques

| Technique | Benefit | Trade-off |
|---|---|---|
| INTEGER timestamps | Efficient B-tree indexing, no float precision issues | Must convert from datetime |
| WITHOUT ROWID | Saves 8 bytes/row, eliminates internal rowid | Only when PK uniquely identifies row |
| STRICT (3.37.0+) | Type checking, better query plans | Requires explicit types |
| NOT NULL | Query optimizer benefits, smaller storage | Must provide values |
| Partial indexes | Faster queries, less maintenance overhead | Only helps matching queries |
| Covering indexes | 2-5X query speedup, no table lookups | 10-30% write overhead |
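
For the INTEGER-timestamp conversion noted in the first row, a minimal sketch (helper names are illustrative):

from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_timestamp_us(dt: datetime) -> int:
    """Convert an aware datetime to integer microseconds since the Unix epoch."""
    return (dt - _EPOCH) // timedelta(microseconds=1)

def from_timestamp_us(ts_us: int) -> datetime:
    """Inverse conversion for display and debugging."""
    return _EPOCH + timedelta(microseconds=ts_us)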

Anti-Patterns to Avoid

| Anti-Pattern | Problem | Solution |
|---|---|---|
| EAV (Entity-Attribute-Value) | Destroys query performance | Proper columns or JSON for dynamic data |
| Comma-separated lists | Un-indexable, un-queryable | Junction tables with foreign keys |
| Polymorphic associations | Ambiguous foreign keys | Separate FKs or inheritance patterns |
| No primary key | Downstream problems, update issues | Always define explicit PK |
| TEXT for numeric data | Type conversion overhead | Use INTEGER/REAL |
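
For the comma-separated-list row, a hedged sketch of the junction-table replacement (the posts/tags schema is hypothetical):

conn.executescript("""
    CREATE TABLE IF NOT EXISTS tags (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE IF NOT EXISTS post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (post_id, tag_id)
    ) WITHOUT ROWID;
""")
# Tags are now indexable and queryable, unlike 'tag1,tag2,tag3' stored in a TEXT column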

Index Design Principles

Composite Index Rules

Leftmost prefix matching: Index on (A, B, C) supports:

  • ✅ WHERE A = ?
  • ✅ WHERE A = ? AND B = ?
  • ✅ WHERE A = ? AND B = ? AND C = ?
  • ❌ WHERE B = ?
  • ❌ WHERE C = ?

Column ordering rules:

  1. Equality conditions before range conditions

    -- GOOD: equality (symbol) before range (timestamp)
    CREATE INDEX idx_trades_symbol_time ON trades(symbol, timestamp DESC, price);
    
    -- BAD: range before equality
    CREATE INDEX idx_trades_time_symbol ON trades(timestamp DESC, symbol, price);
  2. Most selective columns first (when all equality)

    -- If domain_id has 1,000 distinct values and user_id has 10,000, user_id is more selective
    CREATE INDEX idx_events_user_domain ON events(user_id, domain_id, event_type);
  3. Logical hierarchy over selectivity (when structure matters)

    -- Follow the natural hierarchy even if selectivity suggests otherwise
    CREATE INDEX idx_servers_hierarchy ON servers(domain_id, server_id, instance_id);

Covering Index Strategy

Include all queried columns in the index to avoid table lookups:

-- Query that will benefit
SELECT price, volume
FROM market_data
WHERE symbol = ? AND timestamp_us > ?;

-- Covering index eliminates table lookup
CREATE INDEX idx_covering
ON market_data(symbol, timestamp_us DESC, price, volume);

Cost-benefit:

  • Read benefit: 50-90% I/O reduction
  • Write cost: 10-30% slower inserts
  • Use for: Read-heavy workloads, critical query paths

Index Decision Matrix

| Scenario | Index Strategy | Rationale |
|---|---|---|
| High-frequency reads, rare writes | Covering indexes | Maximize read performance |
| Balanced read/write | Standard composite indexes | Reasonable trade-off |
| Write-heavy with range queries | Partial indexes on hot data | Reduce write overhead |
| Time-series with recency bias | Partial index on recent data | Smaller, faster, less maintenance |
| Sparse columns | Partial index with WHERE | Only index non-NULL values |
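
For the sparse-column row, a hedged sketch (the orders table and coupon_code column are hypothetical):

# Index only the rows that actually carry a coupon; NULL rows add no index entries
conn.execute("""
    CREATE INDEX IF NOT EXISTS idx_orders_coupon
    ON orders(coupon_code)
    WHERE coupon_code IS NOT NULL
""")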

Real-World Example: 829X Schema Optimization

Problem: Database initialization taking 6632ms, blocking 37 trading features.

Before:

def _initialize_schema(self):
    cursor.execute("""
        SELECT COUNT(*) FROM information_schema.tables
        WHERE table_schema = 'main'
    """)
    if cursor.fetchone()[0] == 0:
        self._create_tables()

After:

def _initialize_schema(self):
    # SQLite handles existence checks efficiently
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS market_data (
            symbol TEXT NOT NULL,
            timestamp INTEGER NOT NULL,
            bid_price REAL,
            ask_price REAL,
            PRIMARY KEY (symbol, timestamp)
        )
    """)

    # Indexes for trading query patterns
    cursor.execute("""
        CREATE INDEX IF NOT EXISTS idx_market_data_symbol_time
        ON market_data(symbol, timestamp DESC)
    """)

Result: 6632ms → 8ms (829X improvement)

Query performance after indexing:

  • Symbol data retrieval: 450ms → 12ms (37X)
  • Historical trade analysis: 2.3s → 85ms (27X)

Transaction Management & Lock Contention

Isolation Level Guide

| Level | Consistency | Concurrency | Use Case |
|---|---|---|---|
| READ UNCOMMITTED | Dirty reads allowed | Highest | Analytics, approximate results |
| READ COMMITTED | Only committed data | High | OLTP default (recommended) |
| REPEATABLE READ | Consistent reads | Medium | Reports requiring consistency |
| SERIALIZABLE | Perfect consistency | Lowest | Critical operations only |

Recommendation: Use READ COMMITTED for OLTP. Only escalate when consistency requirements demand it.

Short Transaction Pattern

# BAD: Long-running transaction holding locks
def process_order(order_id):
    with connection:
        order = fetch_order(order_id)
        validate_inventory(order)  # External API call
        send_confirmation_email(order)  # Network I/O
        update_inventory(order)

# GOOD: Short transactions, external ops outside
def process_order(order_id):
    order = fetch_order(order_id)
    validate_inventory(order)  # Outside transaction
    send_confirmation_email(order)  # Outside transaction

    with connection:  # Transaction only for DB work
        update_inventory(order)

Target: Keep transactions <100ms to minimize lock contention.

Lock-Free Fast Path Pattern

Problem: Thread contention on cursor creation (high-frequency trading).

Before:

def get_cursor(self, thread_id):
    with self._lock:  # Lock on EVERY access
        if thread_id not in self._cursors:
            self._cursors[thread_id] = self.connection.cursor()
        return self._cursors[thread_id]

After (Double-Checked Locking):

def get_cursor(self, thread_id):
    # Fast path: lock-free read for existing cursors
    if thread_id in self._cursors:
        return self._cursors[thread_id]

    # Slow path: acquire lock only for new cursors
    with self._lock:
        if thread_id not in self._cursors:
            self._cursors[thread_id] = self.connection.cursor()
        return self._cursors[thread_id]

Result: 0.01ms → 0.00ms (below timer resolution) for cached cursors; 200,000 cursors/sec throughput.

Deadlock Prevention

Rule: Access objects in consistent order across all transactions.

# BAD: Inconsistent lock ordering creates deadlocks
# Transaction 1: locks A then B
# Transaction 2: locks B then A

# GOOD: Consistent ordering
def transfer(from_acct, to_acct, amount):
    # Always lock lower account ID first
    accounts = sorted([from_acct, to_acct], key=lambda a: a.id)
    with transaction:
        lock_account(accounts[0])
        lock_account(accounts[1])
        perform_transfer(from_acct, to_acct, amount)

Hot Row Solutions

Problem: Counter increments serialize all updates.

| Solution | Trade-off |
|---|---|
| Batch updates in memory, flush periodically | Eventual consistency, potential loss on crash |
| Shard counters across multiple rows | Must aggregate at read time |
| Optimistic locking with version columns | Retry overhead on contention |
| Event sourcing append-only log | Storage overhead, aggregation cost |
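
A hedged sketch of the counter-sharding row (the counters schema and shard count are assumptions):

import random

SHARDS = 16  # assumption: fixed shard count chosen up front

def init_counters(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS counters (
            name  TEXT NOT NULL,
            shard INTEGER NOT NULL,
            value INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (name, shard)
        ) WITHOUT ROWID
    """)

def increment_counter(conn, name):
    # Spread writes across SHARDS rows so concurrent increments rarely collide on one hot row
    conn.execute("""
        INSERT INTO counters (name, shard, value) VALUES (?, ?, 1)
        ON CONFLICT(name, shard) DO UPDATE SET value = value + 1
    """, (name, random.randrange(SHARDS)))

def read_counter(conn, name):
    # Reads pay the aggregation cost noted in the table above
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM counters WHERE name = ?", (name,)
    ).fetchone()
    return row[0]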

Batch Operations & Bulk Loading

Performance Comparison

| Operation | Time | Throughput | Speedup |
|---|---|---|---|
| Individual INSERTs (auto-commit) | 120s | 167 rows/s | Baseline |
| Batched INSERTs (manual commit) | 1s | 20,000 rows/s | 120X |
| Bulk load API (COPY, SqlBulkCopy) | 0.5s | 40,000 rows/s | 240X |

Batch Insert Pattern

# BAD: Individual commits
for row in data:
    cursor.execute("INSERT INTO trades VALUES (?, ?, ?)", row)
    connection.commit()  # Fsync on EVERY row

# GOOD: Batched with manual transaction
connection.execute("BEGIN")
for row in data:
    cursor.execute("INSERT INTO trades VALUES (?, ?, ?)", row)
connection.commit()  # Single fsync

# BETTER: executemany
connection.execute("BEGIN")
cursor.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    data  # List of tuples
)
connection.commit()

Python sqlite3 benchmark: executemany() is 1.7X faster than loop with execute() (1.6s vs 2.7s for 1M rows).

Index Creation Timing

-- BAD: Create indexes before bulk insert
CREATE INDEX idx_symbol ON trades(symbol);
-- Each INSERT below must also update the index (slow)
INSERT INTO trades ...  -- repeated for millions of rows

-- GOOD: Create indexes after bulk insert
INSERT INTO trades ...  -- fast, no index maintenance
CREATE INDEX idx_symbol ON trades(symbol);  -- single index build

Impact: 6-7X faster for large bulk loads.

Bulk Loading APIs

| Database | Command/API | Speedup |
|---|---|---|
| PostgreSQL | COPY FROM | 50-100X |
| MySQL | LOAD DATA INFILE | 50-100X |
| SQL Server | SqlBulkCopy | 50-100X |
| SQLite | executemany() + transaction | 120X |
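
A hedged sketch of the PostgreSQL COPY path (the psycopg2 driver choice, the DSN, and the trades column list are assumptions):

import psycopg2  # assumption: psycopg2 client library

def bulk_load_csv(dsn, csv_path):
    """Stream a CSV file into PostgreSQL through the server-side COPY path."""
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur, open(csv_path, "r") as f:
            # COPY avoids per-row INSERT overhead (the 50-100X row in the table above)
            cur.copy_expert(
                "COPY trades (symbol, price, volume) FROM STDIN WITH (FORMAT csv)", f
            )
    finally:
        conn.close()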

Python sqlite3 Patterns

Thread-Safe Connection Management

import threading
import sqlite3

class DatabaseManager:
    def __init__(self, db_path):
        self._db_path = db_path
        self._local = threading.local()

    def get_connection(self):
        """Thread-local connections for concurrency."""
        if not hasattr(self._local, 'connection'):
            self._local.connection = sqlite3.connect(
                self._db_path,
                timeout=30.0,
                check_same_thread=False
            )
            # Enable WAL on each connection
            self._local.connection.execute("PRAGMA journal_mode=WAL")
        return self._local.connection

# Usage
db = DatabaseManager("trades.db")
conn = db.get_connection()  # Unique per thread

Why thread-local: Avoids lock contention from a shared connection while leveraging WAL's concurrent reads.

Explicit Transaction Control (Python 3.12+)

# Modern approach (Python 3.12+)
conn = sqlite3.connect("db.sqlite", autocommit=False)

# For fine-grained control
conn = sqlite3.connect("db.sqlite", autocommit=True)
conn.execute("BEGIN IMMEDIATE")  # Acquire write lock immediately
try:
    conn.execute("INSERT ...")
    conn.execute("UPDATE ...")
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

Why BEGIN IMMEDIATE: Prevents upgrade deadlocks where multiple deferred transactions compete for write locks.

Exponential Backoff Retry

import time
import random
import sqlite3

def execute_with_retry(conn, sql, params=None, max_retries=5):
    """Retry with exponential backoff for lock contention."""
    for attempt in range(max_retries):
        try:
            cursor = conn.execute(sql, params or [])
            conn.commit()
            return cursor
        except sqlite3.OperationalError as e:
            if "database is locked" in str(e) and attempt < max_retries - 1:
                # Exponential backoff with jitter
                wait = (2 ** attempt) * 0.1  # 100ms, 200ms, 400ms, 800ms, 1.6s
                jitter = random.uniform(0, wait)
                time.sleep(wait + jitter)
            else:
                raise

Do NOT retry: IntegrityError (constraint violations), ProgrammingError (code bugs).

Context Manager Gotcha

# WRONG: Connection stays open
with sqlite3.connect("db.sqlite") as conn:
    conn.execute("INSERT ...")
# conn is STILL OPEN here!

# CORRECT: Explicitly close
conn = sqlite3.connect("db.sqlite")
try:
    with conn:  # Manages transactions only
        conn.execute("INSERT ...")
finally:
    conn.close()

# OR: Use contextlib
from contextlib import closing
with closing(sqlite3.connect("db.sqlite")) as conn:
    with conn:
        conn.execute("INSERT ...")

Iterator Pattern for Large Results

# BAD: Loads all results into memory
cursor.execute("SELECT * FROM large_table")
rows = cursor.fetchall()  # Memory spike
for row in rows:
    process(row)

# GOOD: Iterate directly
for row in conn.execute("SELECT * FROM large_table"):
    process(row)
    if should_stop():
        break  # Early termination possible

Connection Pooling

Sizing Formula

OLTP workload: connections = 10-20 per application instance

Traditional formula (for reference):

connections = (2 × database_CPU_cores) + disk_spindle_count

Note: Refers to database server CPUs, not application server CPUs.
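
A quick worked example of the formula (the core and spindle counts are assumed values for illustration):

db_cpu_cores = 8           # cores on the database server, not the app server (assumed)
disk_spindles = 2          # assumed value
pool_size = (2 * db_cpu_cores) + disk_spindles
print(pool_size)           # 18, consistent with the 10-20 per-instance OLTP guideline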

Python Connection Pool Example

from queue import Queue, Empty
import sqlite3

class ConnectionPool:
    def __init__(self, db_path, pool_size=10):
        self._db_path = db_path
        self._pool = Queue(maxsize=pool_size)
        for _ in range(pool_size):
            conn = sqlite3.connect(db_path, check_same_thread=False)
            conn.execute("PRAGMA journal_mode=WAL")
            self._pool.put(conn)

    def get_connection(self, timeout=5.0):
        try:
            return self._pool.get(timeout=timeout)
        except Empty:
            raise RuntimeError("Connection pool exhausted")

    def return_connection(self, conn):
        self._pool.put(conn)

    def __enter__(self):
        self._conn = self.get_connection()
        return self._conn

    def __exit__(self, *args):
        self.return_connection(self._conn)

# Usage
pool = ConnectionPool("trades.db", pool_size=20)
with pool as conn:
    conn.execute("INSERT ...")

PgBouncer Transaction Pooling

Multiplexing: 1,000 client connections → 25 database connections (40X)

[databases]
mydb = host=localhost dbname=mydb

[pgbouncer]
pool_mode = transaction    # Assign per transaction, not per session
max_client_conn = 1000
default_pool_size = 25

Modes:

  • session: One DB connection per client (traditional)
  • transaction: One DB connection per transaction (40X multiplexing)
  • statement: One DB connection per statement (breaks prepared statements)

Performance Monitoring

Four Golden Signals

| Signal | Metric | Target | Alert Threshold |
|---|---|---|---|
| Latency | P95, P99 query time | <10ms (OLTP) | P99 > 50ms |
| Traffic | Queries per second | Varies | Sudden 3X spike |
| Errors | Failed query % | <0.1% | >1% failed |
| Saturation | Connection pool usage | <80% | >90% utilization |

Why percentiles over averages: P99 latency shows worst user experience, not hidden by fast queries.
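
A rough sketch of computing the latency signal from raw samples (collect_recent_latencies and alert are hypothetical helpers; the 50ms threshold comes from the table above):

import numpy as np

latencies_ms = collect_recent_latencies()        # hypothetical: recent query times in milliseconds
p95, p99 = np.percentile(latencies_ms, [95, 99]) # percentiles, not averages
if p99 > 50:                                     # alert threshold from the table above
    alert(f"P99 query latency {p99:.1f}ms exceeds 50ms")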

Health Check Tiers

# Tier 1: Liveness (microseconds)
def liveness_check():
    """TCP connectivity only."""
    return {"status": "ok"}

# Tier 2: Local health (milliseconds), shown as a method on the cursor manager above
def local_health(self):
    """Server-specific validation."""
    start = time.perf_counter()
    cursor = self.get_cursor(threading.get_ident())
    cursor.execute("SELECT 1")
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "status": "healthy" if latency_ms < 1.0 else "degraded",
        "latency_ms": latency_ms,
        "active_cursors": len(self._cursors)
    }

# Tier 3: Dependency health (seconds)
def deep_health():
    """Validate external dependencies."""
    db_health = local_health()
    api_health = check_external_api()
    return {
        "database": db_health,
        "api": api_health,
        "status": "healthy" if all_healthy() else "unhealthy"
    }

Pattern: Fail-open, not fail-closed. When all servers report unhealthy due to shared dependency failure, continue routing to all servers.
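
A minimal sketch of the fail-open rule (the servers list and its health attribute are assumptions):

def select_targets(servers):
    """Distrust a health signal that marks the entire fleet down."""
    healthy = [s for s in servers if s.health == "healthy"]
    # If a shared dependency failure makes every server report unhealthy,
    # keep routing to all of them instead of taking the service fully offline.
    return healthy if healthy else servers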

Query Profiling Workflow

# Step 1: Identify slow queries
cursor.execute("EXPLAIN QUERY PLAN " + query)
for row in cursor:
    print(row)

# Step 2: Check for table scans
# Look for "SCAN" in output (bad)
# Look for "SEARCH" or "COVERING INDEX" (good)

# Step 3: Run ANALYZE to update statistics
cursor.execute("ANALYZE")

# Step 4: Re-check query plan
cursor.execute("EXPLAIN QUERY PLAN " + query)

# Step 5: Add missing indexes based on plan
cursor.execute("CREATE INDEX idx_missing ON table(column)")

Sampling Strategy

Problem: Tracing 100% of queries adds significant overhead.

Solution: Sample 1-5% of queries for profiling.

import random

def should_trace():
    return random.random() < 0.05  # 5% sampling

def execute_query(sql):
    if should_trace():
        start = time.perf_counter()
        result = cursor.execute(sql)
        duration = time.perf_counter() - start
        log_slow_query(sql, duration)
        return result
    else:
        return cursor.execute(sql)

Benefit: Statistical insights with 95% less overhead.


Common Anti-Patterns

| Anti-Pattern | Symptom | Impact | Solution | Improvement |
|---|---|---|---|---|
| Schema metadata queries in hot path | information_schema in request processing | 30X slowdown | Cache schema at startup | 30X faster |
| N+1 queries | 1 parent query + N child queries | 10X slower | JOIN or IN clause | 10X faster |
| Individual inserts in loop | Slow bulk operations | 100X slower | Batch + transaction | 100X faster |
| Missing indexes | Full table scans | Minutes vs milliseconds | Add composite indexes | 1000X+ faster |
| Over-indexing | Write degradation | 50-100X slower writes | Remove unused indexes | 50-100X faster writes |
| Long transactions | Lock contention | Cascading delays | Keep <100ms | Prevents deadlocks |
| Hot row contention | Serialized updates | Throughput limits | Shard or batch | 10X+ throughput |
| No connection pooling | Connection overhead | 10-50X slower | Implement pooling | 10-50X faster |

N+1 Query Example

# BAD: N+1 queries (1 + 1000 = 1001 queries)
posts = db.execute("SELECT * FROM posts WHERE user_id = ?", [user_id])
for post in posts:
    # Executes once per post
    comments = db.execute("SELECT * FROM comments WHERE post_id = ?", [post.id])

# GOOD: Single query with JOIN
posts_with_comments = db.execute("""
    SELECT p.*, c.*
    FROM posts p
    LEFT JOIN comments c ON p.id = c.post_id
    WHERE p.user_id = ?
""", [user_id])

Measurement: 800 items across 17 categories took 100ms with the JOIN vs >1s with N+1 queries.
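
The anti-pattern table also lists an IN clause as an alternative to the JOIN; a hedged sketch, assuming the same row-object attribute access as the example above:

posts = db.execute("SELECT * FROM posts WHERE user_id = ?", [user_id]).fetchall()
post_ids = [post.id for post in posts]
if post_ids:
    placeholders = ",".join("?" for _ in post_ids)
    comments = db.execute(
        f"SELECT * FROM comments WHERE post_id IN ({placeholders})",
        post_ids,
    ).fetchall()
else:
    comments = []
# Two queries total, regardless of how many posts the user has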

UPSERT Anti-Pattern

# BAD: Race condition between check and insert
result = db.execute("SELECT id FROM users WHERE email = ?", [email])
if result.fetchone():
    db.execute("UPDATE users SET name = ? WHERE email = ?", [name, email])
else:
    db.execute("INSERT INTO users (email, name) VALUES (?, ?)", [email, name])

# GOOD: Database-native UPSERT
# SQLite
db.execute("""
    INSERT INTO users (email, name) VALUES (?, ?)
    ON CONFLICT(email) DO UPDATE SET name = excluded.name
""", [email, name])

# PostgreSQL
db.execute("""
    INSERT INTO users (email, name) VALUES (%s, %s)
    ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
""", [email, name])

# MySQL
db.execute("""
    INSERT INTO users (email, name) VALUES (%s, %s)
    ON DUPLICATE KEY UPDATE name = VALUES(name)
""", [email, name])

Benchmarking & Measurement

Statistical Requirements

| Requirement | Specification | Rationale |
|---|---|---|
| Iterations | 30-100+ runs | Detect statistical significance |
| Confidence level | p < 0.05 | 95% confidence differences are real |
| Metrics | P50, P90, P99, P99.9 | Tail latency matters |
| Warm-up | 5-10 iterations discarded | Allow caches to stabilize |
| Consistency | Same hardware, load, time | Eliminate confounding factors |

Statistical Tests

from scipy import stats

# Two-sample t-test (for normally distributed data)
t_stat, p_value = stats.ttest_ind(baseline_times, optimized_times)
if p_value < 0.05:
    print(f"Statistically significant improvement (p={p_value:.4f})")

# Mann-Whitney U test (non-parametric)
u_stat, p_value = stats.mannwhitneyu(baseline_times, optimized_times)

# Wilcoxon signed-rank (paired comparison)
w_stat, p_value = stats.wilcoxon(before_times, after_times)

Benchmarking Tools

| Tool | Database | Use Case |
|---|---|---|
| pgbench | PostgreSQL | TPC-B workload, custom scripts |
| mysqlslap | MySQL | Concurrency simulation |
| HammerDB | Multi-DB | TPC-C, TPC-H benchmarks |
| YCSB | NoSQL/distributed | Customizable workloads |
| sysbench | Multi-DB | Lua scriptable benchmarks |

Query Profiling Tools

| Database | Command | Output |
|---|---|---|
| PostgreSQL | EXPLAIN ANALYZE | Actual execution times, row counts, buffers |
| MySQL | EXPLAIN ANALYZE (8.0.18+) | Cost and actual times |
| SQLite | EXPLAIN QUERY PLAN | Index usage, scan types |
| SQL Server | Query Store | Historical execution statistics |

PostgreSQL example:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM trades
WHERE symbol = 'AAPL' AND timestamp > 1234567890;

-- Output shows:
-- Estimated rows vs actual rows (stale stats if mismatch)
-- Shared buffers hit (cache effectiveness)
-- Execution time per node

Real-World Case Studies

1200X Improvement Through Indexing

System: Bencher SQLite database
Problem: Queries taking 38+ seconds
Solution: Added composite indexes matching query patterns, plus a materialized view for expensive aggregations
Result: Sub-second queries (1200X improvement)

Lesson: Optimization gains appear at scale. Same database with small data showed negligible differences.

794X Initialization Speedup

System: Trading system database manager
Problem: 6.6-second initialization blocking 37 features
Root cause: information_schema queries scanning hundreds of tables
Solution: Replaced with CREATE TABLE IF NOT EXISTS
Result: 6644ms → 8.2ms (794X improvement)

Impact: $1,000/day opportunity cost eliminated, 37 trading features restored

Uber MySQL 8.0 Fleet Upgrade

Scale: 2,100+ clusters, 3M queries/second
Results:

  • 29% faster inserts
  • 33% faster reads
  • 47% faster updates
  • 78% faster client queries
  • 94% reduced lock time

Lesson: Even at extreme scale, database upgrades + query optimization provide substantial gains.

SQLite-on-Rails Production

Evolution: Failed at 4 concurrent requests → handles 16+ with stable P99 latency
Keys:

  • BEGIN IMMEDIATE for write locks
  • Custom busy_handler releasing Ruby GVL
  • WAL mode for concurrent reads

Lesson: Configuration alone can enable production viability.


Optimization Workflow

1. Measure Baseline

import time

start = time.perf_counter()
# Execute operation
elapsed = time.perf_counter() - start
print(f"Baseline: {elapsed*1000:.1f}ms")

2. Profile Bottlenecks

# Instrument each component
connection_time = measure(lambda: sqlite3.connect(db))
schema_time = measure(lambda: init_schema())
query_time = measure(lambda: execute_query())

# Identify the bottleneck (80/20 rule)
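
One possible definition for the measure() helper used above (an assumption, not prescribed by the original; time is imported as in step 1):

def measure(fn, repeat=10):
    """Time a callable several times and keep the best run."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)  # the minimum damps transient noise in micro-benchmarks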

3. Apply Optimizations

Priority order:

  1. Configuration (WAL mode, PRAGMAs)
  2. Indexes (composite, covering)
  3. Batching (transactions, bulk ops)
  4. Pooling (connections)
  5. Schema (WITHOUT ROWID, STRICT)

4. Verify Improvement

from scipy import stats
import numpy as np

baseline = [measure_baseline() for _ in range(100)]
optimized = [measure_optimized() for _ in range(100)]

t_stat, p_value = stats.ttest_ind(baseline, optimized)
improvement = (np.mean(baseline) - np.mean(optimized)) / np.mean(baseline)

print(f"Improvement: {improvement*100:.1f}%")
print(f"Statistical significance: p={p_value:.6f}")

5. Monitor Continuously

# Periodic health checks
def monitor_database():
    metrics = {
        "query_p99": measure_p99_latency(),
        "connection_pool_utilization": pool.active / pool.size,
        "cache_hit_ratio": get_cache_hit_ratio(),
        "lock_wait_time": get_lock_wait_time()
    }

    for metric, value in metrics.items():
        if exceeds_threshold(metric, value):
            alert(f"{metric} = {value}")

Conclusion

Database performance optimization follows a clear hierarchy: configuration (WAL mode, PRAGMAs) provides 2-20X gains with minimal effort; proper indexing (composite, covering) delivers 10-1000X improvements; batching operations (transactions, bulk load) achieves 10-100X speedup; connection pooling multiplies throughput 10-50X. These techniques compound—applying all systematically yields orders of magnitude improvement.

The evidence shows technology choice matters less than discipline. SQLite with correct configuration handles 100,000+ queries/second. Systems achieve 1200X speedups through indexing alone. The key barriers are knowledge and measurement rigor, not database limitations.

Optimization workflow: measure baseline, profile bottlenecks, apply highest-leverage optimizations first, verify improvements statistically, monitor continuously. No guessing—let profiler data drive decisions.

Time-series and financial systems benefit from specialized patterns: integer timestamps, WITHOUT ROWID tables, partial indexes on recent data, covering indexes for hot queries. These techniques enable millisecond queries on terabytes when applied correctly.

The path forward: enable WAL mode today (2-20X improvement), add composite indexes to query patterns (orders of magnitude), implement batching for bulk operations (10-100X), establish connection pooling (10-50X), measure everything continuously. Each optimization multiplies the previous gains.


Continue the series: