Zero-config runs
Auto-detects your platform and backend: uvx vlmbench run just works with Ollama on macOS or vLLM on Linux.
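The zero-config claim boils down to a single command; the invocation below is exactly the one named above, with no extra flags assumed.

```bash
# Auto-detects the platform and backend (Ollama on macOS, vLLM on Linux)
# and runs the benchmark with defaults; uvx fetches the CLI on demand.
uvx vlmbench run
```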
A single-file VLM benchmark CLI. Auto-detects your platform, starts the right backend, and prints reproducible throughput, TTFT, and TPOT results as JSON.
[Leaderboard highlight: peak throughput for LightOnOCR-2-1B · NVIDIA RTX PRO 6000 · vLLM 0.15.1 · tok/s, TTFT, and TPOT from 49 runs across 7 models]
The same command works across Ollama on macOS, vLLM on Linux (Docker or native), and any cloud OpenAI-compatible endpoint. Zero config changes, identical metrics.
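As a sketch of pointing the same run at a cloud endpoint: only uvx vlmbench run is taken from this page; the --base-url and --model flags and the OPENAI_API_KEY variable are illustrative assumptions, not confirmed vlmbench options.

```bash
# Hypothetical flags for targeting a hosted OpenAI-compatible endpoint;
# flag names are assumptions, check the CLI help for the real options.
OPENAI_API_KEY=sk-... uvx vlmbench run \
  --base-url https://api.example.com/v1 \
  --model my-vlm
```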
Ollama, vLLM (Docker and native), and any OpenAI-compatible server, including Orion.
Benchmark across concurrency levels (--concurrency 4,8,16,32,64) to find your peak throughput in one run (see the example after this list).
Run directly against HF datasets (hf://vlm-run/FineVision-vlmbench-mini), with image or text-only prompts.
Throughput, tokens/sec, TTFT, TPOT, latency percentiles, VRAM, and reliability are all measured.
Export JSON, compare models, and contribute to the public leaderboard of VLM performance.
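A combined sketch of the sweep, dataset, and export features listed above. The --concurrency values and the hf:// URI are quoted from the list; the --dataset and --output flag names are assumptions rather than confirmed options.

```bash
# Sweep concurrency to find peak throughput in one run.
uvx vlmbench run --concurrency 4,8,16,32,64

# Hypothetical flags: benchmark against a Hugging Face dataset
# and export the metrics as JSON for comparison or leaderboard submission.
uvx vlmbench run \
  --dataset hf://vlm-run/FineVision-vlmbench-mini \
  --output results.json
```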
Quick Start
Benchmark VLMs on your own hardware: locally with Ollama or vLLM (native or Docker), or against any OpenAI-compatible server.
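A minimal quick-start sketch: the uvx one-liner is the documented entry point, while the pip install variant assumes the package is published on PyPI under the same name.

```bash
# One-off run with no install step; backend is auto-detected.
uvx vlmbench run

# Or install it first (assumes a PyPI package named vlmbench).
pip install vlmbench
vlmbench run
```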