Introducing the VLM Run Playground

Nov 12, 2025

Sudeep Pillai

We're excited to launch the VLM Run Playground, a new way to see what Vision Language Models (VLMs) can do for your use case when they are specialized for your industry vertical. Whether you're working with scanned documents in healthcare, product catalogs in retail, or hours-long B-roll in media, the Playground gives you an intuitive, zero-setup interface for extracting structured data from visual content.

What You Can Do

Extracting structured outputs from visually rich content like documents, forms, presentations, and videos has traditionally required stitching together multiple OCR tools, vision APIs, and custom pipelines. This process is slow, complex, and often out of reach for teams without deep computer vision expertise.

VLM Run changes that. With a single, unified interface, you can parse, classify, and analyze visual inputs without heavy engineering work. The Playground brings these capabilities to your browser, so you can explore, prototype, and validate use cases in minutes.

Structured Document Parsing

🗂️ For general documents: Convert long reports, presentations, or contracts into structured markdown and JSON using the MarkdownPage schema. Extract page-wise content, tables, and figures while preserving the document hierarchy.
🧾 For invoices and financial documents: Parse all key fields—invoice numbers, dates, totals, line items, tax, and addresses—in a single call.
📄 For forms with grounding: Extract structured data from complex forms (such as patient intake, insurance, or onboarding forms) with high accuracy.
🏥 For healthcare: Extract patient information, medical history, insurance details, and referral reasons from multi-page forms and medical documents.
🔗 Learn More: Document Parsing, Visual Grounding, Healthcare Intake Forms

Video & Audio Analysis

🕓 Long-context transcription: Transcribe and segment hours-long videos or podcasts with accurate chapters and timestamps.
✂️ Scene and topic breakdowns: Generate rich, time-aligned insights from media, educational content, or product demos.
⌛ Temporal grounding: Jump to key moments or build interactive video experiences with precise start/end times.
🔗 Learn More: Video Transcription Guide, Temporal Grounding
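
To make the temporal grounding idea concrete, here is a minimal sketch of how time-aligned segments could drive a "jump to this moment" feature. The `Segment` type, the `segment_at` helper, and the sample chapters are hypothetical and for illustration only; they are not the VLM Run response format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical time-aligned segment (times in seconds)."""
    start: float
    end: float
    title: str


def segment_at(segments, t):
    """Return the segment whose [start, end) range contains t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return segments[i]
    return None


# Made-up sample data standing in for chapters extracted from a video.
chapters = [
    Segment(0.0, 95.5, "Intro"),
    Segment(95.5, 600.0, "Product demo"),
    Segment(600.0, 1800.0, "Q&A"),
]

print(segment_at(chapters, 120.0).title)  # Product demo
```

Because each segment carries explicit start and end times, a player can seek straight to `segment.start`, and a lookup like the one above can tell you which chapter a given playback position belongs to.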

Product Cataloging & Image Structuring

🔍 Image & document classification: Automatically tag and organize PDFs or images by type, content, or custom categories.
📦 Custom extraction schemas: Define the exact fields or tags you need — VLM Run ensures outputs conform to your schema.
🏷️ Catalog enrichment: Turn product photos into structured, searchable JSON for your e-commerce or analytics stack.
🔗 Learn More: Cataloging Images, Custom Schemas
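
As an illustration of what a custom extraction schema can look like on the consuming side, the sketch below checks that a JSON payload contains exactly the expected fields with the expected types. The schema, field names, and `validate` helper are hypothetical and use only the Python standard library; this is a defensive downstream check, not the VLM Run API.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
PRODUCT_SCHEMA = {
    "title": str,
    "category": str,
    "price_usd": (int, float),
    "tags": list,
}


def validate(payload: str, schema: dict) -> dict:
    """Parse a JSON string and check it against a flat field/type schema.

    Raises ValueError if fields are missing, unexpected, or of the wrong type.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = sorted(schema.keys() - data.keys())
    extra = sorted(data.keys() - schema.keys())
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")

    for name, expected in schema.items():
        if not isinstance(data[name], expected):
            actual = type(data[name]).__name__
            raise ValueError(f"field {name!r} has unexpected type {actual}")
    return data


# Made-up sample payload standing in for a structured catalog entry.
sample = '{"title": "Trail Shoe", "category": "footwear", "price_usd": 89.5, "tags": ["outdoor"]}'
listing = validate(sample, PRODUCT_SCHEMA)
print(listing["category"])  # footwear
```

Even when the extraction step already enforces the schema, a small check like this at the boundary of your e-commerce or analytics stack makes any schema drift fail loudly instead of silently.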

Try It Now

The best way to understand the power of VLMs is to experience them yourself.

Whether you need to extract fields from a messy PDF, structure a product grid from an image, or analyze speaker turns in a podcast video, the Playground lets you stop guessing and start building.

👉 Launch the VLM Run Playground

Ready to integrate Visual AI into your stack?
Explore our API docs or contact our team to get started.
