LLM model scaling — real-time decode flow · VRAM ↔ RAM ↔ SSD

tick 0
×1.0
Setup
Model
Quant
Context
GPU
GPUs
System RAM
SSD
mmap from SSD