vlm-run/vlmbench

Benchmark any VLM on your hardware, in one command.

A single-file VLM benchmark CLI. It auto-detects your platform, starts the right backend, and prints reproducible throughput, time-to-first-token (TTFT), and time-per-output-token (TPOT) results as JSON.

vlmbench leaderboard · v0.5.0

NVIDIA RTX PRO 6000 · vLLM 0.15.1

2,439.8 tok/s

Peak throughput · LightOnOCR-2-1B

  • lightonai/LightOnOCR-2-1B: 2,439.8 tok/s · TTFT 1.4 s · TPOT 22 ms
  • Qwen/Qwen3-VL-2B-Instruct: 2,409.3 tok/s · TTFT 440 ms · TPOT 14 ms
  • PaddlePaddle/PaddleOCR-VL: 2,341.9 tok/s · TTFT 6.4 s · TPOT 49 ms
  • deepseek-ai/DeepSeek-OCR: 1,195.8 tok/s · TTFT 3.6 s · TPOT 16 ms
  • Qwen/Qwen3-VL-8B-Instruct: 953.8 tok/s · TTFT 448 ms · TPOT 26 ms

49 runs across 7 models

Reproducible VLM benchmarks, without the boilerplate.

The same command works across Ollama on macOS, vLLM on Linux (Docker or native), and any cloud OpenAI-compatible endpoint. Zero config changes, identical metrics.

Zero-config runs

Auto-detects platform and backend. uvx vlmbench run just works with Ollama on macOS or vLLM on Linux.

Multi-backend

Ollama, vLLM (Docker and native), and any OpenAI-compatible server, including Orion.

Concurrency sweeps

Benchmark across concurrency levels (--concurrency 4,8,16,32,64) to find your peak throughput in one run.

HuggingFace datasets

Run directly against HF datasets (hf://vlm-run/FineVision-vlmbench-mini), with image or text-only samples.

Rich metrics

Throughput in tokens/sec, TTFT, TPOT, latency percentiles, VRAM usage, and reliability, all measured.

Leaderboard-ready

Export results as JSON (sketched below), compare models, and contribute to the public leaderboard of VLM performance.
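
To make those metrics concrete, here is a minimal sketch of inspecting a run's report, assuming vlmbench prints its JSON to stdout as described above. The field names and numbers in the comments are illustrative assumptions, not the tool's documented schema.

# Capture the JSON report and inspect it with jq
# (the shape shown below is an illustrative assumption, not the documented schema)
uvx vlmbench run -m qwen3-vl:2b -i ./images/ > results.json
jq '.' results.json
# => {
#      "model": "qwen3-vl:2b",           # hypothetical fields and numbers
#      "throughput_tok_s": 2409.3,
#      "ttft_ms": { "p50": 440, "p95": 610 },
#      "tpot_ms": { "p50": 14, "p95": 19 }
#    }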

# Local: benchmark a model with Ollama on macOS
uvx vlmbench run -m qwen3-vl:2b -i ./images/

# Linux + vLLM Docker (auto-starts with --gpus all)
uvx vlmbench run -m Qwen/Qwen3-VL-2B-Instruct -i ./images/

# HuggingFace dataset + concurrency sweep
uvx vlmbench run -m Qwen/Qwen3-VL-8B-Instruct \
    -d hf://vlm-run/FineVision-vlmbench-mini \
    --max-samples 64 \
    --concurrency 4,8,16,32,64

# Benchmark a cloud API (OpenAI-compatible)
uvx vlmbench run -m Qwen/Qwen3-VL-2B-Instruct -i ./images/ \
    --base-url https://api.openai.com/v1 \
    --api-key $OPENAI_API_KEY

Quick Start

One command, reproducible results.

Benchmark VLMs on your hardware: locally with Ollama or vLLM (native or Docker), or against any OpenAI-compatible server.

  • No install required: uvx vlmbench just works
  • Supports Ollama, vLLM, and OpenAI-compatible APIs
  • Concurrency sweeps and percentile metrics out of the box
  • MIT-licensed, contribute benchmarks to the leaderboard (comparison sketch below)
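
To produce comparable numbers for the leaderboard, a minimal comparison workflow might look like the sketch below, again assuming the JSON report goes to stdout; the file names and jq field paths are illustrative assumptions, not the documented schema.

# Benchmark two models on the same inputs, then compare headline throughput
# (file names and jq field paths are illustrative assumptions)
uvx vlmbench run -m Qwen/Qwen3-VL-2B-Instruct -i ./images/ > qwen2b.json
uvx vlmbench run -m Qwen/Qwen3-VL-8B-Instruct -i ./images/ > qwen8b.json
jq -s 'map({model, throughput_tok_s})' qwen2b.json qwen8b.json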

Benchmark any VLM in one command.