Part 1 of 2 in the Database Performance Optimization series
Database performance stems from configuration, query patterns, and indexing discipline—not technology choice. SQLite with correct PRAGMA settings handles 100,000+ queries/second in production. Systems achieve 1200X improvements through strategic indexing and 20X gains from enabling Write-Ahead Logging. This guide synthesizes production data from high-frequency trading systems, Uber’s 3M queries/second fleet, and comprehensive benchmarking to provide actionable optimization strategies.
The highest-leverage optimizations, in order: enable WAL mode (2-20X improvement), implement composite indexes (orders of magnitude), batch operations (10-100X), use connection pooling (10-50X), minimize lock contention, eliminate N+1 queries (10X), cache schema metadata (avoid 30X slowdowns).
Quick Reference
Optimization | Impact | Effort | Priority |
---|---|---|---|
Enable WAL mode | 2-20X | 1 line | Critical |
Add composite indexes | 10-1000X | Low-Medium | Critical |
Batch operations | 10-100X | Medium | High |
Connection pooling | 10-50X | Medium | High |
Eliminate N+1 queries | 10X | Medium | High |
Cache schema metadata | 30X | Low | High |
Short transactions (<100ms) | Prevents deadlocks | Medium | High |
Use covering indexes | 50-90% I/O reduction | Medium | Medium |
Optimize PRAGMAs | 10-30% | Low | Medium |
SQLite Production Configuration
Essential PRAGMA Settings
Execute immediately when opening any connection:
connection.executescript("""
PRAGMA journal_mode = WAL; -- Critical: 2-20X improvement
PRAGMA synchronous = normal; -- Reduces fsync overhead
PRAGMA cache_size = -64000; -- 64MB memory cache
PRAGMA temp_store = memory; -- Keep temp tables in RAM
PRAGMA mmap_size = 30000000000; -- 30GB mmap (10-30% read boost)
PRAGMA busy_timeout = 5000; -- Wait 5s for locks
PRAGMA foreign_keys = ON; -- Enable FK constraints
PRAGMA journal_size_limit = 67108864; -- Cap WAL at 64MB
""")
Configuration Impact Table
PRAGMA | Effect | When to Use |
---|---|---|
journal_mode = WAL | Concurrent reads during writes | Always in production |
synchronous = normal | Reduces transaction latency from 30ms+ to <1ms | With WAL mode |
cache_size = -64000 | 64MB cache reduces disk I/O | Memory available |
mmap_size | 10-30% read performance gain | 64-bit systems with RAM |
busy_timeout = 5000 | Automatic retry on locks | High concurrency |
Memory-Mapped I/O Guidelines
Platform | Recommended mmap_size | Rationale |
---|---|---|
64-bit server | 30GB | Reserves virtual address space, not RAM |
Mobile/embedded | 256MB | Limited virtual address space |
32-bit systems | 128MB | Address space constraints |
Network filesystems | 0 (disabled) | Incompatible with remote FS |
Key insight: mmap reserves virtual address space only; the OS pages physical RAM in and out based on access patterns, so there is no upfront RAM commitment.
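As a minimal sketch of applying these guidelines (the helper and its thresholds are illustrative, not a library API), connection setup can pick an mmap_size based on the platform:
import sys
import sqlite3

def recommended_mmap_size(is_network_fs=False):
    # Values mirror the table above; tune for your environment
    if is_network_fs:
        return 0                        # mmap is incompatible with remote filesystems
    if sys.maxsize <= 2**31 - 1:        # 32-bit process: tight address space
        return 128 * 1024 * 1024        # 128MB
    return 30_000_000_000               # 64-bit server: reserve 30GB of address space

conn = sqlite3.connect("trades.db")
conn.execute(f"PRAGMA mmap_size = {recommended_mmap_size()}")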
Query Planner Optimization
# Before closing connections
connection.execute("PRAGMA optimize")
# For long-running applications (periodic)
connection.execute("PRAGMA optimize=0x10002")
# Verify index usage
cursor.execute("EXPLAIN QUERY PLAN SELECT ...")
Interpret EXPLAIN output:
- SCAN = full table scan ❌
- SEARCH = index used ✅
- COVERING INDEX = optimal (no table lookup) ✅✅
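A small helper (a sketch, not a library API; the plan-row layout assumes current SQLite versions) can automate this check by flagging SCAN steps:
import sqlite3

def check_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    for _, _, _, detail in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
        marker = "FULL SCAN" if detail.startswith("SCAN") else "ok"
        print(f"[{marker}] {detail}")

# Usage (assumes the market_data table from the schema section below)
conn = sqlite3.connect("trades.db")
check_plan(conn, "SELECT price FROM market_data WHERE symbol = ?", ("AAPL",))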
Schema Design for Time-Series Data
Optimal Table Structure
CREATE TABLE market_data (
symbol TEXT NOT NULL,
timestamp_us INTEGER NOT NULL, -- Microsecond precision
price REAL NOT NULL,
volume INTEGER NOT NULL,
bid_price REAL,
ask_price REAL,
PRIMARY KEY (symbol, timestamp_us)
) WITHOUT ROWID, STRICT;
-- Composite index: equality before range
CREATE INDEX idx_symbol_time
ON market_data(symbol, timestamp_us DESC);
-- Covering index for common queries
CREATE INDEX idx_symbol_time_price_vol
ON market_data(symbol, timestamp_us DESC, price, volume);
-- Partial index for recent data (hot path)
-- Note: partial-index WHERE clauses must be deterministic, so use a fixed
-- cutoff (in microseconds) and rebuild the index on a schedule
CREATE INDEX idx_recent_data
ON market_data(symbol, timestamp_us DESC, price, volume)
WHERE timestamp_us > 1700000000000000;  -- example cutoff refreshed by a maintenance job
Schema Optimization Techniques
Technique | Benefit | Trade-off |
---|---|---|
INTEGER timestamps | Efficient B-tree indexing, no float precision issues | Must convert from datetime |
WITHOUT ROWID | Saves 8 bytes/row, eliminates internal rowid | Only when PK uniquely identifies row |
STRICT (3.37.0+) | Type checking, better query plans | Requires explicit types |
NOT NULL | Query optimizer benefits, smaller storage | Must provide values |
Partial indexes | Faster queries, less maintenance overhead | Only helps matching queries |
Covering indexes | 2-5X query speedup, no table lookups | 10-30% write overhead |
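The INTEGER-timestamp technique in the table above needs a consistent conversion at the application boundary; a minimal sketch (function names are illustrative):
from datetime import datetime, timezone

def to_timestamp_us(dt: datetime) -> int:
    """Aware datetime -> integer microseconds since the Unix epoch."""
    return int(dt.timestamp() * 1_000_000)

def from_timestamp_us(ts_us: int) -> datetime:
    """Integer microseconds -> aware UTC datetime."""
    return datetime.fromtimestamp(ts_us / 1_000_000, tz=timezone.utc)

# Usage: store to_timestamp_us(datetime.now(timezone.utc)) in timestamp_us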
Anti-Patterns to Avoid
Anti-Pattern | Problem | Solution |
---|---|---|
EAV (Entity-Attribute-Value) | Destroys query performance | Proper columns or JSON for dynamic data |
Comma-separated lists | Un-indexable, un-queryable | Junction tables with foreign keys |
Polymorphic associations | Ambiguous foreign keys | Separate FKs or inheritance patterns |
No primary key | Downstream problems, update issues | Always define explicit PK |
TEXT for numeric data | Type conversion overhead | Use INTEGER/REAL |
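For the comma-separated-list row above, the junction-table fix looks like this (a sketch; table and column names are illustrative):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- BAD: tags stored as 'python,sqlite,performance' in a TEXT column
    -- GOOD: a junction table that is indexable and queryable
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (post_id, tag_id)
    );
    -- Reverse index supports "all posts with tag X" lookups
    CREATE INDEX idx_post_tags_tag ON post_tags(tag_id, post_id);
""")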
Index Design Principles
Composite Index Rules
Leftmost prefix matching: an index on (A, B, C) supports:
- ✅ WHERE A = ?
- ✅ WHERE A = ? AND B = ?
- ✅ WHERE A = ? AND B = ? AND C = ?
- ❌ WHERE B = ?
- ❌ WHERE C = ?
Column ordering rules:
- Equality conditions before range conditions
-- GOOD: equality (symbol) before range (timestamp)
CREATE INDEX idx_trades_symbol_time ON trades(symbol, timestamp DESC, price);
-- BAD: range before equality
CREATE INDEX idx_trades_time_symbol ON trades(timestamp DESC, symbol, price);
- Most selective columns first (when all conditions are equality)
-- If user_id has 10,000 unique values and domain_id has 1,000, lead with user_id
CREATE INDEX idx_events_user_domain ON events(user_id, domain_id, event_type);
- Logical hierarchy over selectivity (when structure matters)
-- Follow the natural hierarchy even if selectivity suggests otherwise
CREATE INDEX idx_servers_hierarchy ON servers(domain_id, server_id, instance_id);
Covering Index Strategy
Include all queried columns in the index to avoid table lookups:
-- Query that will benefit
SELECT price, volume
FROM market_data
WHERE symbol = ? AND timestamp_us > ?;
-- Covering index eliminates table lookup
CREATE INDEX idx_covering
ON market_data(symbol, timestamp_us DESC, price, volume);
Cost-benefit:
- Read benefit: 50-90% I/O reduction
- Write cost: 10-30% slower inserts
- Use for: Read-heavy workloads, critical query paths
Index Decision Matrix
Scenario | Index Strategy | Rationale |
---|---|---|
High-frequency reads, rare writes | Covering indexes | Maximize read performance |
Balanced read/write | Standard composite indexes | Reasonable trade-off |
Write-heavy with range queries | Partial indexes on hot data | Reduce write overhead |
Time-series with recency bias | Partial index on recent data | Smaller, faster, less maintenance |
Sparse columns | Partial index with WHERE | Only index non-NULL values |
Real-World Example: 829X Schema Optimization
Problem: Database initialization taking 6632ms, blocking 37 trading features.
Before:
def _initialize_schema(self):
cursor.execute("""
SELECT COUNT(*) FROM information_schema.tables
WHERE table_schema = 'main'
""")
if cursor.fetchone()[0] == 0:
self._create_tables()
After:
def _initialize_schema(self):
# SQLite handles existence checks efficiently
cursor.execute("""
CREATE TABLE IF NOT EXISTS market_data (
symbol TEXT NOT NULL,
timestamp INTEGER NOT NULL,
bid_price REAL,
ask_price REAL,
PRIMARY KEY (symbol, timestamp)
)
""")
# Indexes for trading query patterns
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_market_data_symbol_time
ON market_data(symbol, timestamp DESC)
""")
Result: 6632ms → 8ms (829X improvement)
Query performance after indexing:
- Symbol data retrieval: 450ms → 12ms (37X)
- Historical trade analysis: 2.3s → 85ms (27X)
Transaction Management & Lock Contention
Isolation Level Guide
Level | Consistency | Concurrency | Use Case |
---|---|---|---|
READ UNCOMMITTED | Dirty reads allowed | Highest | Analytics, approximate results |
READ COMMITTED | Only committed data | High | OLTP default (recommended) |
REPEATABLE READ | Consistent reads | Medium | Reports requiring consistency |
SERIALIZABLE | Perfect consistency | Lowest | Critical operations only |
Recommendation: Use READ COMMITTED for OLTP. Only escalate when consistency requirements demand it.
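SQLite does not expose these levels (its transactions are serializable by default), but server databases accept the level per transaction. A sketch using PostgreSQL via psycopg2 (connection string and table are illustrative):
import psycopg2

conn = psycopg2.connect("dbname=mydb")
with conn:                                   # commit on success, rollback on error
    with conn.cursor() as cur:
        # Must be the first statement of the transaction; READ COMMITTED is
        # PostgreSQL's default, shown here for explicitness
        cur.execute("SET TRANSACTION ISOLATION LEVEL READ COMMITTED")
        cur.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = %s", ("ABC-1",))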
Short Transaction Pattern
# BAD: Long-running transaction holding locks
def process_order(order_id):
with connection:
order = fetch_order(order_id)
validate_inventory(order) # External API call
send_confirmation_email(order) # Network I/O
update_inventory(order)
# GOOD: Short transactions, external ops outside
def process_order(order_id):
order = fetch_order(order_id)
validate_inventory(order) # Outside transaction
send_confirmation_email(order) # Outside transaction
with connection: # Transaction only for DB work
update_inventory(order)
Target: Keep transactions <100ms to minimize lock contention.
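A lightweight guard can enforce that budget during development; a sketch (the helper is illustrative, and the 100ms threshold mirrors the target above):
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def timed_transaction(conn, warn_ms=100):
    """Run one transaction and warn if it held locks past the budget."""
    start = time.perf_counter()
    try:
        with conn:                     # commits on success, rolls back on error
            yield conn
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > warn_ms:
            print(f"WARNING: transaction held locks for {elapsed_ms:.1f}ms")

# Usage
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
with timed_transaction(conn) as c:
    c.execute("INSERT INTO inventory VALUES ('ABC-1', 10)")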
Lock-Free Fast Path Pattern
Problem: Thread contention on cursor creation (high-frequency trading).
Before:
def get_cursor(self, thread_id):
with self._lock: # Lock on EVERY access
if thread_id not in self._cursors:
self._cursors[thread_id] = self.connection.cursor()
return self._cursors[thread_id]
After (Double-Checked Locking):
def get_cursor(self, thread_id):
# Fast path: lock-free read for existing cursors
if thread_id in self._cursors:
return self._cursors[thread_id]
# Slow path: acquire lock only for new cursors
with self._lock:
if thread_id not in self._cursors:
self._cursors[thread_id] = self.connection.cursor()
return self._cursors[thread_id]
Result: cached-cursor lookups drop from 0.01ms to effectively 0ms (below timer resolution), sustaining roughly 200,000 cursor retrievals/sec.
Deadlock Prevention
Rule: Access objects in consistent order across all transactions.
# BAD: Inconsistent lock ordering creates deadlocks
# Transaction 1: locks A then B
# Transaction 2: locks B then A
# GOOD: Consistent ordering
def transfer(from_acct, to_acct, amount):
# Always lock lower account ID first
accounts = sorted([from_acct, to_acct], key=lambda a: a.id)
with transaction:
lock_account(accounts[0])
lock_account(accounts[1])
perform_transfer(from_acct, to_acct, amount)
Hot Row Solutions
Problem: Counter increments serialize all updates.
Solution | Trade-off |
---|---|
Batch updates in memory, flush periodically | Eventual consistency, potential loss on crash |
Shard counters across multiple rows | Must aggregate at read time |
Optimistic locking with version columns | Retry overhead on contention |
Event sourcing append-only log | Storage overhead, aggregation cost |
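A sketch of the counter-sharding row above (schema and shard count are illustrative): writes hit a random shard, reads aggregate across shards.
import random
import sqlite3

SHARDS = 16
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_view_counts (
        page_id INTEGER NOT NULL,
        shard   INTEGER NOT NULL,
        views   INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (page_id, shard)
    )
""")

def increment_views(page_id):
    # A random shard spreads concurrent writers across 16 rows instead of one
    shard = random.randrange(SHARDS)
    conn.execute("""
        INSERT INTO page_view_counts (page_id, shard, views) VALUES (?, ?, 1)
        ON CONFLICT(page_id, shard) DO UPDATE SET views = views + 1
    """, (page_id, shard))

def total_views(page_id):
    # Aggregation at read time is the trade-off noted in the table
    return conn.execute(
        "SELECT COALESCE(SUM(views), 0) FROM page_view_counts WHERE page_id = ?",
        (page_id,)).fetchone()[0]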
Batch Operations & Bulk Loading
Operation | Time | Throughput | Speedup |
---|---|---|---|
Individual INSERTs (auto-commit) | 120s | 167 rows/s | Baseline |
Batched INSERTs (manual commit) | 1s | 20,000 rows/s | 120X |
Bulk load API (COPY, SqlBulkCopy) | 0.5s | 40,000 rows/s | 240X |
Batch Insert Pattern
# BAD: Individual commits
for row in data:
cursor.execute("INSERT INTO trades VALUES (?, ?, ?)", row)
connection.commit() # Fsync on EVERY row
# GOOD: Batched with manual transaction
connection.execute("BEGIN")
for row in data:
cursor.execute("INSERT INTO trades VALUES (?, ?, ?)", row)
connection.commit() # Single fsync
# BETTER: executemany
connection.execute("BEGIN")
cursor.executemany(
"INSERT INTO trades VALUES (?, ?, ?)",
data # List of tuples
)
connection.commit()
Python sqlite3 benchmark: executemany() is 1.7X faster than a loop with execute() (1.6s vs 2.7s for 1M rows).
Index Creation Timing
-- BAD: Create indexes before bulk insert
CREATE INDEX idx_symbol ON trades(symbol);
-- Each INSERT updates the index (slow)
INSERT INTO trades ...  -- Repeated for millions of rows
-- GOOD: Create indexes after bulk insert
INSERT INTO trades ...  -- Fast, no index maintenance
CREATE INDEX idx_symbol ON trades(symbol);  -- Single index build
Impact: 6-7X faster for large bulk loads.
Bulk Loading APIs
Database | Command/API | Speedup |
---|---|---|
PostgreSQL | COPY FROM | 50-100X |
MySQL | LOAD DATA INFILE | 50-100X |
SQL Server | SqlBulkCopy | 50-100X |
SQLite | executemany() + transaction | 120X |
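For PostgreSQL, a COPY-based load might look like the following sketch (assumes psycopg2 and a local CSV; DSN, table, and columns are illustrative):
import psycopg2

conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur, open("trades.csv") as f:
    # Streams the whole file in one COPY command instead of row-by-row INSERTs
    cur.copy_expert("COPY trades (symbol, ts, price) FROM STDIN WITH (FORMAT csv)", f)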
Python sqlite3 Patterns
Thread-Safe Connection Management
import threading
import sqlite3
class DatabaseManager:
def __init__(self, db_path):
self._db_path = db_path
self._local = threading.local()
def get_connection(self):
"""Thread-local connections for concurrency."""
if not hasattr(self._local, 'connection'):
self._local.connection = sqlite3.connect(
self._db_path,
timeout=30.0,
check_same_thread=False
)
# Enable WAL on each connection
self._local.connection.execute("PRAGMA journal_mode=WAL")
return self._local.connection
# Usage
db = DatabaseManager("trades.db")
conn = db.get_connection() # Unique per thread
Why thread-local: Avoids lock contention from shared connection while leveraging WAL’s concurrent reads.
Explicit Transaction Control (Python 3.12+)
# Modern approach (Python 3.12+)
conn = sqlite3.connect("db.sqlite", autocommit=False)
# For fine-grained control
conn = sqlite3.connect("db.sqlite", autocommit=True)
conn.execute("BEGIN IMMEDIATE") # Acquire write lock immediately
try:
conn.execute("INSERT ...")
conn.execute("UPDATE ...")
conn.execute("COMMIT")
except Exception:
conn.execute("ROLLBACK")
raise
Why BEGIN IMMEDIATE: Prevents upgrade deadlocks where multiple deferred transactions compete for write locks.
Exponential Backoff Retry
import time
import random
import sqlite3
def execute_with_retry(conn, sql, params=None, max_retries=5):
"""Retry with exponential backoff for lock contention."""
for attempt in range(max_retries):
try:
cursor = conn.execute(sql, params or [])
conn.commit()
return cursor
except sqlite3.OperationalError as e:
if "database is locked" in str(e) and attempt < max_retries - 1:
# Exponential backoff with jitter
wait = (2 ** attempt) * 0.1 # 100ms, 200ms, 400ms, 800ms, 1.6s
jitter = random.uniform(0, wait)
time.sleep(wait + jitter)
else:
raise
Do NOT retry: IntegrityError (constraint violations), ProgrammingError (code bugs).
Context Manager Gotcha
# WRONG: Connection stays open
with sqlite3.connect("db.sqlite") as conn:
conn.execute("INSERT ...")
# conn is STILL OPEN here!
# CORRECT: Explicitly close
conn = sqlite3.connect("db.sqlite")
try:
with conn: # Manages transactions only
conn.execute("INSERT ...")
finally:
conn.close()
# OR: Use contextlib
from contextlib import closing
with closing(sqlite3.connect("db.sqlite")) as conn:
with conn:
conn.execute("INSERT ...")
Iterator Pattern for Large Results
# BAD: Loads all results into memory
cursor.execute("SELECT * FROM large_table")
rows = cursor.fetchall() # Memory spike
for row in rows:
process(row)
# GOOD: Iterate directly
for row in conn.execute("SELECT * FROM large_table"):
process(row)
if should_stop():
break # Early termination possible
Connection Pooling
OLTP workload: connections = 10-20 per application instance
Traditional formula (for reference):
connections = (2 Ă— database_CPU_cores) + disk_spindle_count
Note: Refers to database server CPUs, not application server CPUs.
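Worked example: a database host with 16 cores and 8 effective disk spindles gives (2 × 16) + 8 = 40 connections as an upper bound; most OLTP applications land in the 10-20 range noted above.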
Python Connection Pool Example
from queue import Queue, Empty
import sqlite3
class ConnectionPool:
def __init__(self, db_path, pool_size=10):
self._db_path = db_path
self._pool = Queue(maxsize=pool_size)
for _ in range(pool_size):
conn = sqlite3.connect(db_path, check_same_thread=False)
conn.execute("PRAGMA journal_mode=WAL")
self._pool.put(conn)
def get_connection(self, timeout=5.0):
try:
return self._pool.get(timeout=timeout)
except Empty:
raise RuntimeError("Connection pool exhausted")
def return_connection(self, conn):
self._pool.put(conn)
def __enter__(self):
self._conn = self.get_connection()
return self._conn
def __exit__(self, *args):
self.return_connection(self._conn)
# Usage
pool = ConnectionPool("trades.db", pool_size=20)
with pool as conn:
conn.execute("INSERT ...")
PgBouncer Transaction Pooling
Multiplexing: 1,000 client connections → 25 database connections (40X)
[databases]
mydb = host=localhost dbname=mydb
[pgbouncer]
pool_mode = transaction # Assign per transaction, not per session
max_client_conn = 1000
default_pool_size = 25
Modes:
- session: One DB connection per client (traditional)
- transaction: One DB connection per transaction (40X multiplexing)
- statement: One DB connection per statement (breaks prepared statements)
Four Golden Signals
Signal | Metric | Target | Alert Threshold |
---|---|---|---|
Latency | P95, P99 query time | <10ms (OLTP) | P99 > 50ms |
Traffic | Queries per second | Varies | Sudden 3X spike |
Errors | Failed query % | <0.1% | >1% failed |
Saturation | Connection pool usage | <80% | >90% utilization |
Why percentiles over averages: P99 latency shows worst user experience, not hidden by fast queries.
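A sketch of turning raw latency samples into these signals (assumes NumPy; the sample values are illustrative):
import numpy as np

def latency_percentiles(samples_ms):
    # Tail percentiles capture worst-case user experience, unlike the mean
    p50, p95, p99 = np.percentile(samples_ms, [50, 95, 99])
    return {"p50_ms": p50, "p95_ms": p95, "p99_ms": p99}

# Usage: alert if P99 exceeds the 50ms threshold
summary = latency_percentiles([1.2, 0.8, 3.4, 47.0, 2.1, 1.9])
if summary["p99_ms"] > 50:
    print("ALERT: P99 latency above threshold")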
Health Check Tiers
# Tier 1: Liveness (microseconds)
def liveness_check():
"""TCP connectivity only."""
return {"status": "ok"}
# Tier 2: Local health (milliseconds)
def local_health():
"""Server-specific validation."""
start = time.perf_counter()
cursor = get_cursor()
cursor.execute("SELECT 1")
latency_ms = (time.perf_counter() - start) * 1000
return {
"status": "healthy" if latency_ms < 1.0 else "degraded",
"latency_ms": latency_ms,
"active_cursors": len(self._cursors)
}
# Tier 3: Dependency health (seconds)
def deep_health():
"""Validate external dependencies."""
db_health = local_health()
api_health = check_external_api()
return {
"database": db_health,
"api": api_health,
"status": "healthy" if all_healthy() else "unhealthy"
}
Pattern: Fail-open, not fail-closed. When all servers report unhealthy due to shared dependency failure, continue routing to all servers.
Query Profiling Workflow
# Step 1: Identify slow queries
cursor.execute("EXPLAIN QUERY PLAN " + query)
for row in cursor:
print(row)
# Step 2: Check for table scans
# Look for "SCAN" in output (bad)
# Look for "SEARCH" or "COVERING INDEX" (good)
# Step 3: Run ANALYZE to update statistics
cursor.execute("ANALYZE")
# Step 4: Re-check query plan
cursor.execute("EXPLAIN QUERY PLAN " + query)
# Step 5: Add missing indexes based on plan
cursor.execute("CREATE INDEX idx_missing ON table(column)")
Sampling Strategy
Problem: Tracing 100% of queries adds significant overhead.
Solution: Sample 1-5% of queries for profiling.
import random
def should_trace():
return random.random() < 0.05 # 5% sampling
def execute_query(sql):
if should_trace():
start = time.perf_counter()
result = cursor.execute(sql)
duration = time.perf_counter() - start
log_slow_query(sql, duration)
return result
else:
return cursor.execute(sql)
Benefit: Statistical insights with 95% less overhead.
Common Anti-Patterns
Anti-Pattern | Symptom | Impact | Solution | Improvement |
---|---|---|---|---|
Schema metadata queries in hot path | information_schema in request processing | 30X slowdown | Cache schema at startup | 30X faster |
N+1 queries | 1 parent query + N child queries | 10X slower | JOIN or IN clause | 10X faster |
Individual inserts in loop | Slow bulk operations | 100X slower | Batch + transaction | 100X faster |
Missing indexes | Full table scans | Minutes vs milliseconds | Add composite indexes | 1000X+ faster |
Over-indexing | Write degradation | 50-100X slower writes | Remove unused indexes | 50-100X faster writes |
Long transactions | Lock contention | Cascading delays | Keep <100ms | Prevents deadlocks |
Hot row contention | Serialized updates | Throughput limits | Shard or batch | 10X+ throughput |
No connection pooling | Connection overhead | 10-50X slower | Implement pooling | 10-50X faster |
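For the first row (schema metadata in the hot path), a sketch of resolving the catalog once at startup (class and names are illustrative):
import sqlite3

class SchemaCache:
    """Load table names once so request paths never touch the catalog."""
    def __init__(self, conn):
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
        self._tables = {name for (name,) in rows}

    def has_table(self, name):
        return name in self._tables        # in-memory lookup, no catalog query

# Usage
conn = sqlite3.connect("trades.db")
schema = SchemaCache(conn)
if not schema.has_table("market_data"):
    conn.execute("CREATE TABLE market_data (symbol TEXT, timestamp INTEGER)")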
N+1 Query Example
# BAD: N+1 queries (1 + 1000 = 1001 queries)
posts = db.execute("SELECT * FROM posts WHERE user_id = ?", [user_id])
for post in posts:
# Executes once per post
comments = db.execute("SELECT * FROM comments WHERE post_id = ?", [post.id])
# GOOD: Single query with JOIN
posts_with_comments = db.execute("""
SELECT p.*, c.*
FROM posts p
LEFT JOIN comments c ON p.id = c.post_id
WHERE p.user_id = ?
""", [user_id])
Measurement: for 800 items across 17 categories, the JOIN version completed in ~100ms versus >1s for the N+1 version.
UPSERT Anti-Pattern
# BAD: Race condition between check and insert
result = db.execute("SELECT id FROM users WHERE email = ?", [email])
if result.fetchone():
db.execute("UPDATE users SET name = ? WHERE email = ?", [name, email])
else:
db.execute("INSERT INTO users (email, name) VALUES (?, ?)", [email, name])
# GOOD: Database-native UPSERT
# SQLite
db.execute("""
INSERT INTO users (email, name) VALUES (?, ?)
ON CONFLICT(email) DO UPDATE SET name = excluded.name
""", [email, name])
# PostgreSQL
db.execute("""
INSERT INTO users (email, name) VALUES (%s, %s)
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
""", [email, name])
# MySQL
db.execute("""
INSERT INTO users (email, name) VALUES (%s, %s)
ON DUPLICATE KEY UPDATE name = VALUES(name)
""", [email, name])
Benchmarking & Measurement
Statistical Requirements
Requirement | Specification | Rationale |
---|---|---|
Iterations | 30-100+ runs | Detect statistical significance |
Confidence level | p < 0.05 | 95% confidence differences are real |
Metrics | P50, P90, P99, P99.9 | Tail latency matters |
Warm-up | 5-10 iterations discarded | Allow caches to stabilize |
Consistency | Same hardware, load, time | Eliminate confounding factors |
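A sketch of a harness that applies these requirements (warm-up runs discarded, 100 measured iterations, tail percentiles reported; NumPy assumed):
import time
import numpy as np

def benchmark(fn, iterations=100, warmup=10):
    """Measure fn repeatedly and report tail latencies in milliseconds."""
    for _ in range(warmup):
        fn()                                 # discard: let caches stabilize
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    p50, p90, p99, p999 = np.percentile(samples, [50, 90, 99, 99.9])
    return {"p50_ms": p50, "p90_ms": p90, "p99_ms": p99, "p99.9_ms": p999}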
Statistical Tests
from scipy import stats
# Two-sample t-test (for normally distributed data)
t_stat, p_value = stats.ttest_ind(baseline_times, optimized_times)
if p_value < 0.05:
print(f"Statistically significant improvement (p={p_value:.4f})")
# Mann-Whitney U test (non-parametric)
u_stat, p_value = stats.mannwhitneyu(baseline_times, optimized_times)
# Wilcoxon signed-rank (paired comparison)
w_stat, p_value = stats.wilcoxon(before_times, after_times)
Benchmarking Tools
Tool | Database | Use Case |
---|---|---|
pgbench | PostgreSQL | TPC-B workload, custom scripts |
mysqlslap | MySQL | Concurrency simulation |
HammerDB | Multi-DB | TPC-C, TPC-H benchmarks |
YCSB | NoSQL/distributed | Customizable workloads |
sysbench | Multi-DB | Lua scriptable benchmarks |
Query Plan Analysis
Database | Command | Output |
---|---|---|
PostgreSQL | EXPLAIN ANALYZE | Actual execution times, row counts, buffers |
MySQL | EXPLAIN ANALYZE (8.0.18+) | Cost and actual times |
SQLite | EXPLAIN QUERY PLAN | Index usage, scan types |
SQL Server | Query Store | Historical execution statistics |
PostgreSQL example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM trades
WHERE symbol = 'AAPL' AND timestamp > 1234567890;
-- Output shows:
-- Estimated rows vs actual rows (stale stats if mismatch)
-- Shared buffers hit (cache effectiveness)
-- Execution time per node
Real-World Case Studies
1200X Improvement Through Indexing
System: Bencher SQLite database
Problem: Queries taking 38+ seconds
Solution: Added composite indexes matching query patterns, materialized view for expensive aggregations
Result: Sub-second queries (1200X improvement)
Lesson: Optimization gains appear at scale. Same database with small data showed negligible differences.
794X Initialization Speedup
System: Trading system database manager
Problem: 6.6-second initialization blocking 37 features
Root cause: information_schema queries scanning hundreds of tables
Solution: Replaced with CREATE TABLE IF NOT EXISTS
Result: 6644ms → 8.2ms (794X improvement)
Impact: $1,000/day opportunity cost eliminated, 37 trading features restored
Uber MySQL 8.0 Fleet Upgrade
Scale: 2,100+ clusters, 3M queries/second
Results:
- 29% faster inserts
- 33% faster reads
- 47% faster updates
- 78% faster client queries
- 94% reduced lock time
Lesson: Even at extreme scale, database upgrades + query optimization provide substantial gains.
SQLite-on-Rails Production
Evolution: Failed at 4 concurrent requests → handles 16+ with stable P99 latency
Keys:
- BEGIN IMMEDIATE for write locks
- Custom busy_handler releasing Ruby GVL
- WAL mode for concurrent reads
Lesson: Configuration alone can enable production viability.
Optimization Workflow
1. Measure Baseline
import time
start = time.perf_counter()
# Execute operation
elapsed = time.perf_counter() - start
print(f"Baseline: {elapsed*1000:.1f}ms")
2. Profile Bottlenecks
import time
def measure(fn):                       # helper assumed by the calls below
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start
# Instrument each component
connection_time = measure(lambda: sqlite3.connect(db))
schema_time = measure(lambda: init_schema())
query_time = measure(lambda: execute_query())
# Identify the bottleneck (80/20 rule)
3. Apply Optimizations
Priority order:
- Configuration (WAL mode, PRAGMAs)
- Indexes (composite, covering)
- Batching (transactions, bulk ops)
- Pooling (connections)
- Schema (WITHOUT ROWID, STRICT)
4. Verify Improvement
from scipy import stats
import numpy as np
baseline = [measure_baseline() for _ in range(100)]
optimized = [measure_optimized() for _ in range(100)]
t_stat, p_value = stats.ttest_ind(baseline, optimized)
improvement = (np.mean(baseline) - np.mean(optimized)) / np.mean(baseline)
print(f"Improvement: {improvement*100:.1f}%")
print(f"Statistical significance: p={p_value:.6f}")
5. Monitor Continuously
# Periodic health checks
def monitor_database():
metrics = {
"query_p99": measure_p99_latency(),
"connection_pool_utilization": pool.active / pool.size,
"cache_hit_ratio": get_cache_hit_ratio(),
"lock_wait_time": get_lock_wait_time()
}
for metric, value in metrics.items():
if exceeds_threshold(metric, value):
alert(f"{metric} = {value}")
Conclusion
Database performance optimization follows a clear hierarchy: configuration (WAL mode, PRAGMAs) provides 2-20X gains with minimal effort; proper indexing (composite, covering) delivers 10-1000X improvements; batching operations (transactions, bulk load) achieves 10-100X speedup; connection pooling multiplies throughput 10-50X. These techniques compound—applying all systematically yields orders of magnitude improvement.
The evidence shows technology choice matters less than discipline. SQLite with correct configuration handles 100,000+ queries/second. Systems achieve 1200X speedups through indexing alone. The key barriers are knowledge and measurement rigor, not database limitations.
Optimization workflow: measure baseline, profile bottlenecks, apply highest-leverage optimizations first, verify improvements statistically, monitor continuously. No guessing—let profiler data drive decisions.
Time-series and financial systems benefit from specialized patterns: integer timestamps, WITHOUT ROWID tables, partial indexes on recent data, covering indexes for hot queries. These techniques enable millisecond queries on terabytes when applied correctly.
The path forward: enable WAL mode today (2-20X improvement), add composite indexes to query patterns (orders of magnitude), implement batching for bulk operations (10-100X), establish connection pooling (10-50X), measure everything continuously. Each optimization multiplies the previous gains.
Continue the series: