Concurrency battle! Which language wins?

Every language claims it can “do many things at once.” Almost none of them mean the same thing by it. Some run code on every core simultaneously; some only look like they do; one of our contenders has no idea what a thread even is and somehow still ships to a billion devices a day.

Before the bell rings, the one distinction that decides the whole fight:

A language can be brilliant at concurrency and terrible at parallelism (hello Ruby, Python, JavaScript). That’s not always a flaw: most web work is I/O-bound — waiting on a database, a network call, a disk — and during that wait there’s no CPU work to parallelise anyway.

[Figure: Concurrency interleaves tasks on one core; parallelism runs them on several at once. Top panel (concurrency, one time-sliced core): tasks A, B and C take turns on core 1; one task makes progress, yields, and another resumes, so they overlap in time, not in space. Bottom panel (parallelism, three cores at once): tasks A, B and C each run on their own core.]

Concurrency is about dealing with many things; parallelism is about doing many things. The fight below is really "how much of each, and how safely?"
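To make the "overlap in time, not in space" idea concrete, here is a minimal sketch using Python's asyncio. The names `worker` and `run_interleaved` are made up for illustration, and `asyncio.sleep(0)` stands in for any point where a real task would wait on I/O. Everything runs on one thread, yet the three tasks take turns.

```python
import asyncio


async def worker(name, log):
    """Run two steps, handing control back to the event loop in between."""
    log.append(f"{name}: step 1")
    await asyncio.sleep(0)  # yield point: other tasks get to run on the same thread
    log.append(f"{name}: step 2")


async def run_interleaved():
    log = []
    # gather() schedules all three coroutines on one thread and one event loop.
    await asyncio.gather(*(worker(name, log) for name in "ABC"))
    return log


if __name__ == "__main__":
    for line in asyncio.run(run_interleaved()):
        print(line)
```

Every task finishes step 1 before any of them reaches step 2: that is concurrency without parallelism. No second core is involved at any point.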

The scorecard at a glance

| Language | Concurrency model | True CPU parallelism in one process? | Core primitive | Shared mutable state? | Sweet spot |
| --- | --- | --- | --- | --- | --- |
| Ruby (MRI) | Threads under a GVL; Ractors/processes for parallelism | No with threads; yes via Ractors/processes | Thread, Fiber, Ractor | Yes (but GVL hides many races) | I/O-bound web apps |
| Python (CPython) | Threads under the GIL; asyncio; multiprocessing | No in classic builds; yes via free-threaded 3.13+/processes | threading, asyncio, multiprocessing | Yes | I/O-bound, scripting, data (C ext) |
| Elixir (BEAM) | Actor model: preemptive lightweight processes | Yes, one scheduler per core | process + message passing | No, share-nothing | Millions of connections, fault-tolerance |
| Java (JVM) | 1:1 OS threads + shared memory; virtual threads (Loom) | Yes | Thread, virtual threads, java.util.concurrent | Yes (JMM, locks) | CPU-bound + high-throughput servers |
| C++ | Native OS threads, manual everything | Yes | std::thread, std::async, atomics, coroutines | Yes (a data race is undefined behaviour) | Maximum performance, systems, low latency |
| JavaScript | Single-thread event loop + async I/O; Workers | No in one thread; yes via Workers | Promise/async, Worker, SharedArrayBuffer | No in one thread (message-passing Workers) | I/O-bound, UIs, glue code |
| HTML 😄 | None. It is a markup language. | (it does not compute) | <marquee>, arguably | Nothing to share | Winning by refusing to play |


Now the contenders, one at a time.

Ruby — one pass through the GVL

MRI Ruby has a Global VM Lock: only one thread can execute Ruby code at a time, no matter how many cores you own. The lock exists to keep the interpreter’s internals (object allocation, GC, C extensions) safe without fine-grained locking everywhere.

[Figure: Four Ruby threads, but the GVL lets only one run CPU at a time. Thread 1 holds the GVL and keeps the one busy CPU core; thread 2 is on I/O, so it has released the GVL and the next thread can run; threads 3 and 4 are ready and waiting for their turn.]
The GVL is released during blocking I/O, so threads still overlap database and network waits — which is exactly what a web request spends most of its time doing. For CPU parallelism you reach for processes (Puma cluster, Sidekiq) or experimental Ractors; JRuby and TruffleRuby drop the GVL entirely.

Verdict: great at I/O concurrency, no in-process CPU parallelism. The “win” is that for typical Rails workloads you rarely notice — you scale CPU with processes, not threads.

Python — the GIL’s near-identical twin

CPython’s Global Interpreter Lock (GIL) is the counterpart of Ruby’s GVL: only one thread executes bytecode at a time, and the lock is released around blocking I/O and inside many C extensions (NumPy releases it during heavy math, which is why scientific Python feels parallel). The story is so similar that the diagram above could be relabelled and reused.

[Figure: Python: the GIL serialises threads; multiprocessing starts real parallel processes. Top panel (threading, one GIL): one thread runs while the others wait on the lock, giving 1× CPU regardless of thread count in classic builds. Bottom panel (multiprocessing, real parallelism): one process per core, each with its own GIL, giving N× CPU at the cost of IPC and memory.]

The 2024+ twist: PEP 703 ships an optional free-threaded build (3.13+) that removes the GIL, giving true threaded parallelism at some single-thread overhead. The decades-old footnote is finally changing.
Same lock, same escape hatches: asyncio for I/O concurrency, multiprocessing for CPU parallelism, C extensions that drop the lock — and now an experimental no-GIL interpreter.
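A small sketch of the "released around I/O" part, using only the standard library. The helper names `fake_io` and `timed_io` are invented for this example, and `time.sleep` stands in for a database or network wait (it releases the GIL while sleeping). The timing is illustrative and will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking call; the GIL is released while this sleeps."""
    time.sleep(delay)
    return delay


def timed_io(n_tasks=4, delay=0.2):
    """Run n_tasks blocking waits on a thread pool and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io, [delay] * n_tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_io()
    print(f"{len(results)} waits of 0.2s each took {elapsed:.2f}s in total")
```

Run one after another, the four waits would take about 0.8 s; on threads they overlap and the batch should finish in roughly the time of a single wait. Swap `time.sleep` for a pure-Python CPU loop and, on a classic GIL build, that advantage goes away.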

Verdict: ties Ruby. The differentiator is momentum — the free-threaded build means Python may quietly leave this weight class.

Elixir — a scheduler on every core, a process for everything

Now it gets interesting. Elixir runs on the BEAM (the Erlang VM), built for telephone switches that must never go down. Its concurrency unit is the process: not an OS process, not a thread, but a featherweight green process costing a few KB. You can spawn millions of them. Each has its own heap (so GC is per-process and never stops the world), they share nothing, and they talk only by sending immutable messages to each other’s mailboxes.

[Figure: BEAM runs one scheduler per core, each preemptively juggling many lightweight processes. Three schedulers (cores 1 to 3) each hold thousands of processes; processes communicate by message passing with immutable copies, so there is no shared memory and there are no data races.]

Preemptive scheduler: no single process can hog a core, because the VM reschedules every ~2000 reductions. One crashing process is isolated and restarted by a supervisor. "Let it crash."
Share-nothing + preemptive scheduling + per-process heaps = concurrency that scales to millions of connections and shrugs off crashes. The model that WhatsApp and Discord lean on for exactly this reason.
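The mailbox discipline can be imitated in Python as a rough analogy. This is a sketch only: `CounterActor` and `demo` are hypothetical names, the "process" here is an ordinary OS thread, and none of BEAM's preemption, per-process heaps or supervisors are modelled. What it does show is the core rule that state is private and everything, including replies, travels as a message.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, and one loop that drains it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._total = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """The only way to interact with the actor: drop a message in its mailbox."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            tag, payload = self._mailbox.get()  # messages are handled one at a time
            if tag == "add":
                self._total += payload
            elif tag == "get":
                payload.put(self._total)  # reply by message, not by shared access
            elif tag == "stop":
                return


def demo():
    actor = CounterActor()
    for n in (1, 2, 3):
        actor.send(("add", n))
    reply = queue.Queue()
    actor.send(("get", reply))
    total = reply.get(timeout=1)
    actor.send(("stop", None))
    return total


if __name__ == "__main__":
    print(demo())
```

Because the mailbox is first-in, first-out and only the actor's loop touches `_total`, no lock is needed around the counter. The "get" message is processed after the three "add" messages that were sent before it.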

Verdict: the heavyweight champion of concurrency. It won’t out-crunch C++ on a tight numeric loop, but for “hold a million live connections and never fall over,” nothing here is close.

Java — real threads, shared memory, and now virtual threads

The JVM gives you honest 1:1 OS threads: schedule them on every core, get genuine parallelism. The price is shared mutable memory, governed by the Java Memory Model — you coordinate with synchronized, volatile, locks, and the excellent java.util.concurrent toolbox (thread pools, concurrent collections, CompletableFuture). Powerful, fast, and absolutely able to deadlock or race if you’re careless.
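The coordination burden looks much the same in any shared-memory language. As a hedged illustration in Python rather than Java, the sketch below uses `threading.Lock` in the role that `synchronized` plays on the JVM. `SafeCounter` and `hammer` are hypothetical names chosen for this example.

```python
import threading


class SafeCounter:
    """Shared mutable state guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below must not interleave with another thread's.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = SafeCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(hammer())
```

With the lock in place the total is always `n_threads * n_increments`. Remove the `with self._lock` lines and the result is no longer guaranteed, which is the kind of bug the paragraph above warns about.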

[Figure: Java platform threads map 1:1 to OS threads, which the OS schedules across cores; virtual threads multiplex many onto few carriers. Left panel (platform threads, true parallelism): one thread per core on cores 1 to 3, sharing one heap coordinated with locks (JMM, synchronized, volatile, java.util.concurrent). Right panel (virtual threads, Project Loom, Java 21+): millions of virtual threads run on a few carrier threads and park on I/O for free, giving blocking code at async scale.]
Virtual threads let you write straightforward blocking code that scales like async — the runtime unmounts a virtual thread from its carrier whenever it blocks. Java effectively bought Elixir's concurrency ergonomics while keeping shared-memory parallelism.

Verdict: the best all-rounder. Real parallelism and, since Loom, cheap massive concurrency. The catch is the one constant of shared-memory threading: you can still write the bug.

C++ — maximum power, zero seatbelts

std::thread is a thin wrapper over an OS thread. No GVL, no GIL, no runtime, no garbage collector pausing you — your threads hit the metal on every core. You get atomics, mutexes, condition variables, a formal memory model (since C++11), and coroutines (C++20). You also get to personally guarantee there are no data races, because a data race in C++ is undefined behaviour — not a crash, not an exception, but “the compiler may do anything.”

[Figure: C++ threads go straight to cores with no runtime, and you own all the safety. Three std::thread instances run on bare cores with no runtime and no GC. Warning panel: you own all of it (lifetimes, mutexes, memory ordering); a data race is undefined behaviour. The fastest path here, and the sharpest.]
If the workload is CPU-bound and every nanosecond counts — trading engines, game engines, codecs — C++ wins on raw throughput. Modern tooling (ThreadSanitizer, RAII locks, std::jthread) softens the edges, but the safety is on you.

Verdict: the raw-power champion. Highest ceiling, lowest guard rails. You win the benchmark and accept the responsibility.

JavaScript — one thread, an event loop, and zero data races

JavaScript runs your code on a single thread. It feels concurrent because almost nothing blocks: I/O is handed off (to the browser’s Web APIs, or libuv in Node), and when it finishes, a callback is queued and the event loop picks it up between turns. Promises and async/await are sugar over this queue. Because there’s only one thread touching your variables, data races on shared state are impossible by construction — a genuinely underrated win.

[Figure: JavaScript's single-threaded event loop offloads I/O and drains callback queues. The call stack (one thread, runs to completion) hands await fetch / fs / timer calls to async I/O (Web APIs or libuv); finished work lands in the callback / microtask queue, which the event loop drains between turns. For real parallelism: Web Workers / worker_threads, which are separate isolates with their own loop that talk by postMessage() (or SharedArrayBuffer + Atomics).]
One thread keeps your logic race-free; offloaded I/O keeps it responsive. For CPU-bound work you spin up Workers — which, like Elixir, communicate by message passing rather than shared memory.
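The queue-and-drain mechanism is simple enough to sketch as a toy. The version below is written in Python to match the other examples in this article. `ToyEventLoop`, `on_request` and `on_response` are invented names, and the sketch ignores real I/O, timers and the microtask/macrotask distinction. It only shows that each callback runs to completion and that follow-up work waits its turn in the queue.

```python
from collections import deque


class ToyEventLoop:
    """A single-threaded loop: a queue of callbacks, drained one at a time."""

    def __init__(self):
        self._queue = deque()

    def call_soon(self, callback, *args):
        """Queue a callback; it will not run until the current one has finished."""
        self._queue.append((callback, args))

    def run(self):
        while self._queue:
            callback, args = self._queue.popleft()
            callback(*args)  # runs to completion; nothing can interrupt it


def demo():
    loop = ToyEventLoop()
    log = []

    def on_request(name):
        log.append(f"{name}: start")
        # Pretend the I/O finishes later: queue the continuation instead of waiting.
        loop.call_soon(on_response, name)
        log.append(f"{name}: end")

    def on_response(name):
        log.append(f"{name}: response")

    loop.call_soon(on_request, "a")
    loop.call_soon(on_request, "b")
    loop.run()
    return log


if __name__ == "__main__":
    print("\n".join(demo()))
```

Both request handlers finish before either response handler runs, because the responses were queued behind them. Since only one callback is ever executing, `log` needs no lock.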

Verdict: punches far above its single thread for I/O-bound and UI work, with a safety guarantee the shared-memory languages envy. It just can’t crunch numbers on every core without leaving the comfort of one event loop.

HTML — the undefeated champion (by forfeit)

HTML is not a programming language. It has no variables, no loops, no threads, no locks. It cannot deadlock. It has shipped exactly zero race conditions in its entire history.

[Figure: HTML has no concurrency model and therefore no concurrency bugs. A lone <div>hello</div> just sits there. Perfectly. 🏆 0 threads · 0 locks · 0 data races · 0 incidents at 3am (the browser does the concurrent parsing, layout and painting for you).]
The only contender that wins by never entering the ring. There's a serious point hiding in the joke: the most reliable concurrent code is the concurrency you don't write — push it down into a runtime (BEAM), a browser, or a database that has already solved it.

So who wins?

There’s no single belt — it depends on what you’re fighting for. Plotting raw CPU-parallel power against how safe and ergonomic the concurrency is:

[Figure: Languages plotted by CPU-parallel power versus concurrency safety and ergonomics. Horizontal axis: raw CPU parallelism; vertical axis: safe / ergonomic. Plotted: Elixir, Java, C++, JavaScript, Ruby, Python, HTML*. Region labels: "safe but single-core-bound" and "parallel but you-break-it-you-buy-it". *HTML is off the scale: maximum safety, zero computation; it isn't really competing.]
No universal winner — pick the corner that matches the job.

A cheat sheet for picking your fighter:

The honest conclusion: most “concurrency problems” in everyday backends are really I/O-overlap problems, and almost every language here handles those fine. True CPU parallelism is the rarer need — and the moment you reach for it, the language’s model (shared memory vs. share-nothing) matters far more than its raw speed.

See it in motion. Web servers are where most of us actually meet this fight: processes (workers) for CPU parallelism, threads for I/O overlap, a connection pool to the database, all behind a load balancer. Tune the worker and thread counts and watch throughput, the GVL, and the database queue react live in the Scaling Web Servers simulator.
