Knobs · adjust the inputs
Request arrival rate
161 req/s
How often clients spawn new requests. The offered load on the system.
One-way network delay
50 ms
Pure travel time on the wire, one way. The minimum possible RTT is 2 × delay.
CPU time per request
66 ms
How long a worker holds the request to actually do work. CPU-bound part of latency.
Pipe bandwidth
20 Mbps
Maximum data rate the pipe can carry, in bits per second. Caps throughput regardless of how many workers you have.
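To see how the pipe caps throughput, the bandwidth can be converted into a request-rate ceiling once a payload size is fixed. The sketch below is only illustrative: the 10,000-byte payload (request plus response) and the function name are assumptions, not values taken from the simulator.

```python
def bandwidth_ceiling_rps(bandwidth_mbps: float, bytes_per_request: int) -> float:
    """Requests per second the pipe can carry, ignoring protocol overhead."""
    bits_per_second = bandwidth_mbps * 1_000_000
    bits_per_request = bytes_per_request * 8
    return bits_per_second / bits_per_request


# Assumed payload: 10,000 bytes per request (hypothetical, not from the simulator).
ceiling = bandwidth_ceiling_rps(20, 10_000)
print(ceiling)        # 250.0 req/s
print(161 / ceiling)  # fraction of the pipe used at the default arrival rate
```

Under that assumed payload size, the default 161 req/s would use roughly two thirds of the 20 Mbps pipe, and adding workers could never push throughput past 250 req/s.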
Servers (behind LB)
horizontal scaling
3 hosts
Horizontal scaling (scale out). Add more app hosts behind a round-robin load balancer. Each host has its own CPU + RAM.
Workers per server
vertical scaling
4 procs
Vertical scaling (scale up). App worker processes per server. Each is a separate CPU lane and uses ~220 MB of RAM (Puma cluster / Unicorn model).
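The RAM cost of scaling up can be estimated with simple arithmetic. This sketch counts worker process memory only, using the ~220 MB figure above and the 4096 MB cap shown on the RAM readout. Treating those as the only inputs is an assumption; the simulator may also count other memory, such as piled-up connections.

```python
WORKER_RAM_MB = 220  # approximate per-worker figure from the knob description
RAM_CAP_MB = 4096    # cap shown on the RAM readout


def worker_ram_mb(workers_per_server: int) -> int:
    """RAM used by worker processes on one server (workers only)."""
    return workers_per_server * WORKER_RAM_MB


def max_workers_under_cap() -> int:
    """Largest worker count whose process memory still fits under the cap."""
    return RAM_CAP_MB // WORKER_RAM_MB


print(worker_ram_mb(4))         # 880
print(max_workers_under_cap())  # 18
```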
Threads per worker
vertical scaling
4 slots
Concurrency within a worker. Only one thread per worker runs CPU at a time (the GVL). Extra threads pay off only when a request blocks on I/O: while one thread waits on the DB, another runs CPU. With zero DB time they add nothing; with I/O they lift throughput toward the CPU ceiling.
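A minimal sketch of why threads help only when requests block on I/O. It assumes a simplified steady-state model (not necessarily the one the simulator uses): each request needs `cpu_ms` of CPU under the GVL plus `io_ms` of waiting, so a worker is limited either by its single CPU lane or by how many requests its threads can overlap. The function name and the model are illustrative assumptions.

```python
def worker_throughput_rps(threads: int, cpu_ms: float, io_ms: float) -> float:
    """Approximate req/s for one worker process under a GVL."""
    if threads < 1 or cpu_ms <= 0:
        raise ValueError("need at least one thread and positive CPU time")
    cpu_ceiling = 1000.0 / cpu_ms                       # one CPU lane per worker
    thread_limit = threads * 1000.0 / (cpu_ms + io_ms)  # requests the threads can overlap
    return min(cpu_ceiling, thread_limit)


# Zero DB time: extra threads add nothing.
print(worker_throughput_rps(1, 66, 0) == worker_throughput_rps(4, 66, 0))  # True

# With a 40 ms DB wait, threads lift throughput toward the CPU ceiling.
print(round(worker_throughput_rps(1, 66, 40), 2))  # 9.43
print(round(worker_throughput_rps(4, 66, 40), 2))  # 15.15

# 3 servers x 4 workers, each at the CPU ceiling.
print(round(3 * 4 * worker_throughput_rps(4, 66, 40), 2))  # 181.82
```

Under this model the default configuration tops out near 182 req/s, slightly above the default arrival rate of 161 req/s.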
Max queue depth (backpressure)
50 requests
Hard cap on requests waiting for a worker. Above the cap, the LB returns 503 Service Unavailable immediately: backpressure to protect downstream from overload.
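The queue cap can be sketched as a bounded FIFO that refuses new work once it is full. The class and method names here are hypothetical; a real load balancer implements this internally, and this is only a minimal illustration of the reject-when-full rule.

```python
from collections import deque


class BoundedQueue:
    """FIFO of waiting requests with a hard depth cap."""

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth
        self.waiting = deque()
        self.rejected = 0  # count of requests answered with 503

    def admit(self, request_id: int) -> bool:
        """Queue the request, or reject it (503) if the queue is full."""
        if len(self.waiting) >= self.max_depth:
            self.rejected += 1
            return False
        self.waiting.append(request_id)
        return True

    def next_request(self) -> int:
        """Hand the oldest waiting request to a free worker."""
        return self.waiting.popleft()


queue = BoundedQueue(max_depth=50)
accepted = [queue.admit(i) for i in range(60)]
print(sum(accepted), queue.rejected)  # 50 10
```

Rejecting immediately keeps the wait bounded: no admitted request ever has more than 50 requests ahead of it.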
DB connection pool
24 conns
Shared Postgres max_connections. A thread doing I/O checks out a connection; when the pool is exhausted, queries queue at the DB, and a blocking migration can pile them up.
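A rough way to size the pool is to compare average connection demand with the worst case. The sketch assumes each request holds exactly one connection for the duration of its DB wait (a simplification) and uses Little's Law for the average; the function names are illustrative.

```python
def mean_connections_in_use(arrival_rps: float, query_ms: float) -> float:
    """Average busy connections: arrival rate x time each query holds one."""
    return arrival_rps * query_ms / 1000.0


def worst_case_waiters(servers: int, workers: int, threads: int, pool_size: int) -> int:
    """Threads left waiting if every thread wants a connection at once."""
    total_threads = servers * workers * threads
    return max(0, total_threads - pool_size)


# Default knobs: 161 req/s, 40 ms queries, 3 x 4 x 4 threads, 24 connections.
print(round(mean_connections_in_use(161, 40), 2))  # 6.44
print(worst_case_waiters(3, 4, 4, 24))             # 24
```

In steady state the default pool has headroom, but the 48 threads outnumber the 24 connections, so anything that stretches query time (such as a blocking migration) can exhaust the pool.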
DB query time (I/O)
40 ms
Mean time a request waits on the database (jittered per query). During this wait the thread releases the GVL, so another thread in the same worker can run CPU — this is what makes threads pay off.
Database event
Inject a DB event: a row lock serialises writes; a blocking CREATE INDEX takes a lock that stalls writes until it finishes (connections pile up, RAM climbs); CREATE INDEX CONCURRENTLY builds without blocking writes but makes queries run slower while it builds.
Flow visualizer
N requests in flight
Latency
—ms
RTT for one request
Throughput
—req/s
completed / second
Bandwidth
—% used
— of 20 Mbps
Concurrency
—
in-flight requests
Parallelism
—
/ — workers busy
CPU
—%
mean across servers
RAM
—MB
/ 4096 MB cap
503s
—
rejected requests
Little's Law:
Concurrency = Throughput × Latency
— ≈ — × —
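As a worked instance of Little's Law with the default knob values: if the system keeps up with the offered load (throughput equals the 161 req/s arrival rate) and no request waits in a queue, latency is the sum of the round trip, CPU time and DB time. Both conditions are assumptions made for illustration; the live readouts will differ once queueing or 503s appear.

```python
def littles_law_concurrency(throughput_rps: float, latency_ms: float) -> float:
    """Average requests in flight = throughput x latency (latency in seconds)."""
    return throughput_rps * latency_ms / 1000.0


# Latency with no queueing: 2 x 50 ms wire + 66 ms CPU + 40 ms DB.
latency_ms = 2 * 50 + 66 + 40
print(latency_ms)                                           # 206
print(round(littles_law_concurrency(161, latency_ms), 3))   # 33.166
```

So roughly 33 requests are in flight on average under these assumptions.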