Scaling Web Servers: latency · throughput · bandwidth · concurrency · parallelism

Knobs · adjust the inputs
Request arrival rate 161 req/s
How often clients spawn new requests. The offered load on the system.
One-way network delay 50 ms
Pure travel time on the wire, one way. The minimum possible RTT is 2 × delay.
CPU time per request 66 ms
How long a worker holds the request to actually do work. CPU-bound part of latency.
Pipe bandwidth 20 Mbps
Maximum data rate the pipe can carry, in megabits per second. Caps throughput regardless of how many workers you have.
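The bandwidth cap can be put into numbers with a rough sketch. The payload size is not given anywhere on this page, so `bytes_per_request` below is a hypothetical value chosen only for illustration.

```python
def bandwidth_ceiling_rps(bandwidth_mbps: float, bytes_per_request: float) -> float:
    """Upper bound on requests/second imposed by the pipe alone."""
    bits_per_second = bandwidth_mbps * 1_000_000
    bits_per_request = bytes_per_request * 8
    return bits_per_second / bits_per_request


# 20 Mbps pipe; 10 kB per request + response is an assumed payload size.
print(bandwidth_ceiling_rps(20, 10_000))  # 250.0
```

Under that assumption the pipe saturates at 250 req/s, however many hosts or workers sit behind it.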
Servers (behind LB) horizontal scaling 3 hosts
Horizontal scaling (scale out). Add more app hosts behind a round-robin load balancer. Each host has its own CPU + RAM.
Workers per server vertical scaling 4 procs
Vertical scaling (scale up). App worker processes per server. Each is a separate CPU lane and uses ~220 MB of RAM (Puma cluster / Unicorn model).
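As a small sketch of how the two scaling knobs combine, the arithmetic below counts CPU lanes across the fleet and worker RAM per host. The function name `fleet_shape` is made up for this example, and the 220 MB figure is the approximate per-worker cost quoted above.

```python
def fleet_shape(servers: int, workers_per_server: int, mb_per_worker: int = 220):
    """Return (total CPU lanes in the fleet, worker RAM in MB on each host)."""
    cpu_lanes = servers * workers_per_server
    worker_ram_mb = workers_per_server * mb_per_worker
    return cpu_lanes, worker_ram_mb


print(fleet_shape(servers=3, workers_per_server=4))  # (12, 880)
print(fleet_shape(servers=6, workers_per_server=4))  # (24, 880)   scale out
print(fleet_shape(servers=3, workers_per_server=8))  # (24, 1760)  scale up
```

Both routes reach 24 lanes, but scaling up doubles the RAM each host must hold, while scaling out leaves per-host RAM unchanged.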
Threads per worker vertical scaling 4 slots
Concurrency within a worker. Only one thread per worker runs CPU at a time, because of Ruby's Global VM Lock (GVL). Extra threads pay off only when a request blocks on I/O: while one thread waits on the DB, another runs CPU. With zero DB time they add nothing; with I/O they lift throughput toward the CPU ceiling.
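A back-of-the-envelope model of that behaviour is sketched below. It is an idealisation (no jitter, no lock contention overhead), not the simulator's actual code, and `worker_throughput` is a name invented here.

```python
def worker_throughput(threads: int, cpu_s: float, io_s: float) -> float:
    """Idealised requests/second one worker sustains under a GVL-style lock."""
    cpu_ceiling = 1.0 / cpu_s                 # only one thread runs CPU at a time
    thread_limit = threads / (cpu_s + io_s)   # each thread is tied up for cpu + io
    return min(cpu_ceiling, thread_limit)


# 66 ms CPU and 40 ms DB wait per request, as on the knobs above.
print(round(worker_throughput(1, 0.066, 0.040), 2))  # 9.43
print(round(worker_throughput(4, 0.066, 0.040), 2))  # 15.15

# With zero DB time, extra threads add nothing.
print(worker_throughput(1, 0.066, 0.0) == worker_throughput(4, 0.066, 0.0))  # True
```

In this model a second thread is already enough to reach the CPU ceiling of about 15 req/s per worker; threads beyond that only help when the I/O share of a request grows.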
Max queue depth (backpressure) 50 requests
Hard cap on requests waiting for a worker. Above the cap, the LB returns 503 Service Unavailable immediately — backpressure to protect downstream from overload.
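The queue cap can be illustrated with a minimal bounded queue. This is a sketch under assumptions: the class name `BoundedQueue` and the 202 status for an accepted request are choices made here, not details taken from the simulator.

```python
from collections import deque


class BoundedQueue:
    """Waiting line with a hard cap; overflow is rejected instead of queued."""

    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.waiting = deque()
        self.rejected = 0

    def offer(self, request_id: int) -> int:
        """Return 202 if the request was queued, 503 if it was shed."""
        if len(self.waiting) >= self.max_depth:
            self.rejected += 1
            return 503
        self.waiting.append(request_id)
        return 202

    def take(self):
        """Hand the oldest waiting request to a free worker, or None if empty."""
        return self.waiting.popleft() if self.waiting else None


queue = BoundedQueue(max_depth=50)
statuses = [queue.offer(i) for i in range(53)]
print(statuses.count(202), statuses.count(503))  # 50 3
```

Rejecting at the door keeps the wait time of accepted requests bounded, at the price of visible errors for the overflow.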
DB connection pool 24 conns
Shared pool of Postgres connections, standing in for max_connections. A thread doing I/O checks out a connection; when the pool is exhausted, queries queue at the DB, and a blocking migration can pile them up.
DB query time (I/O) 40 ms
Mean time a request waits on the database (jittered per query). During this wait the thread releases the GVL, so another thread in the same worker can run CPU — this is what makes threads pay off.
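To see how pool size and query time interact, the sketch below estimates the average number of connections in use as throughput × query time. It assumes one query per request and a steady arrival rate; the 500 ms stalled query is a hypothetical figure, not one taken from the page.

```python
def mean_connections_in_use(throughput_rps: float, query_s: float) -> float:
    """Average connections checked out, assuming one query per request."""
    return throughput_rps * query_s


def pool_ceiling_rps(pool_size: int, query_s: float) -> float:
    """Highest request rate the pool can serve before queries start to queue."""
    return pool_size / query_s


print(round(mean_connections_in_use(161, 0.040), 2))  # 6.44
print(round(pool_ceiling_rps(24, 0.040)))             # 600

# Hypothetical: a lock stretches each query to 500 ms.
print(mean_connections_in_use(161, 0.5))              # 80.5
```

At 40 ms per query a 24-connection pool has ample headroom, but once queries stall the demand of about 80 connections far exceeds the pool and requests back up.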
Database event
Inject a DB event: a row lock serialises writes; a blocking CREATE INDEX takes a lock that stalls writes until it finishes (connections pile up, RAM climbs); CREATE INDEX CONCURRENTLY builds the index without blocking writes, but queries run slower while it builds.
Flow visualizer · N requests in flight
Latency (ms): RTT for one request
Throughput (req/s): requests completed per second
Bandwidth (% used): share of the 20 Mbps pipe
Concurrency: in-flight requests
Parallelism: workers busy, out of the total worker count
CPU (%): mean across servers
RAM (MB): out of a 4096 MB cap
503s: rejected requests
Little's Law: Concurrency = Throughput × Latency
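A worked example of Little's Law with the knob values from this page follows. It uses the best-case latency (two network hops plus CPU time plus DB time, with no queueing), which is an assumption; real latency under load will be higher, and so will the concurrency.

```python
def latency_floor_s(one_way_delay_s: float, cpu_s: float, db_s: float) -> float:
    """Best-case latency: round trip on the wire plus CPU and DB time, no queueing."""
    return 2 * one_way_delay_s + cpu_s + db_s


def littles_law_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: mean requests in flight = throughput x latency."""
    return throughput_rps * latency_s


floor = latency_floor_s(0.050, 0.066, 0.040)
print(round(floor * 1000))                            # 206 (ms)
print(round(littles_law_concurrency(161, floor), 1))  # 33.2 (requests in flight)
```

So at 161 req/s and a 206 ms floor, roughly 33 requests are in flight at any moment, even before any queueing delay is added.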