🧑‍🍳 Check out our guides
Accurately extract JSON from images, videos, documents, and presentations.
Confidently integrate visual AI into your apps with our simple, unified API.
No prompt engineering ninjas required.
Purpose-built for Agents.
Our visual models are not designed for chat.
Instead, we built them with accuracy, low latency, and the ability to take reliable actions in mind.
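// Request body: the image payload (placeholder "<str>") and the model name.
const options = {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({image: '<str>', model: 'vlm-1'})
};

fetch('https://api.vlm.run/v1/image/generate', options)
  // fetch only rejects on network failures, so surface HTTP errors explicitly.
  .then(response => {
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(data => console.log(data))
  .catch(err => console.error(err));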
Strongly Typed
Extract strongly-typed, validated JSON from visual content.
Confidently connect the results to databases, software automation, and agentic workflows.
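Before extracted JSON reaches a database or an automation step, a client may still want to confirm that each record has the expected fields and types. The sketch below shows one minimal way to do that in plain JavaScript; the `invoiceSchema` fields and the `validateRecord` helper are hypothetical illustrations, not a documented VLM Run schema or SDK function.

```javascript
// Hypothetical schema for an extracted record: field name -> expected typeof.
// These fields are illustrative only, not a documented VLM Run schema.
const invoiceSchema = {
  invoice_id: 'string',
  total: 'number',
  paid: 'boolean',
};

// Returns a list of problems; an empty list means the record matches the schema.
function validateRecord(record, schema) {
  if (record === null || typeof record !== 'object' || Array.isArray(record)) {
    return ['record must be a plain object'];
  }
  const errors = [];
  for (const [field, expectedType] of Object.entries(schema)) {
    if (!(field in record)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof record[field] !== expectedType) {
      errors.push(`${field}: expected ${expectedType}, got ${typeof record[field]}`);
    }
  }
  return errors;
}

// Usage: only pass records with no errors on to the database layer.
const good = { invoice_id: 'INV-001', total: 129.5, paid: false };
const bad = { invoice_id: 'INV-002', total: '129.50' };
console.log(validateRecord(good, invoiceSchema)); // []
console.log(validateRecord(bad, invoiceSchema));
// [ 'total: expected number, got string', 'missing field: paid' ]
```

Rejecting a record at this boundary keeps a malformed value (such as a total returned as a string) from being written silently.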
Fast
Incorporate fast, cost-efficient, and accurate visual AI into your workflows within minutes.
Accurate
Fine-tune visual AI on your specific use case to improve accuracy with ease.
Your data advantage should benefit you.
Agent-Ready
Build powerful agents on our low-latency, private visual AI infrastructure.
Don't worry about ballooning costs or data leakage.
Fast, Accurate, Use-case driven.
Incorporating visual AI in your app can be powerful, but making it work in production is hard.
We make it easy for you to fine-tune our APIs for your specific use case.
Object Identification
Extract accurate bounding boxes over scenes and documents.
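Bounding boxes are often returned in coordinates normalized to the image size and must be mapped back to pixels before drawing or cropping. The sketch below assumes a hypothetical box format of `[x, y, width, height]` normalized to [0, 1] with the origin at the top-left corner; this is an assumption for illustration, not the documented response format of the API, and `toPixelBox` is a hypothetical helper.

```javascript
// Hypothetical detection shape: a label plus a box normalized to [0, 1]
// as [x, y, width, height] with the origin at the top-left corner.
// This format is an assumption for illustration, not a documented response.
function toPixelBox([x, y, w, h], imageWidth, imageHeight) {
  // Clamp to the image bounds in case a box slightly overshoots the edges.
  const clamp = (v, max) => Math.min(Math.max(v, 0), max);
  return {
    x1: clamp(Math.round(x * imageWidth), imageWidth),
    y1: clamp(Math.round(y * imageHeight), imageHeight),
    x2: clamp(Math.round((x + w) * imageWidth), imageWidth),
    y2: clamp(Math.round((y + h) * imageHeight), imageHeight),
  };
}

const detection = { label: 'table', box: [0.1, 0.25, 0.5, 0.5] };
console.log(toPixelBox(detection.box, 1000, 800));
// { x1: 100, y1: 200, x2: 600, y2: 600 }
```

Returning corner coordinates (`x1, y1, x2, y2`) rather than width and height makes the result easy to pass to most cropping and drawing routines.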
Visual Web Scraping
Extract deep, visual metadata for product catalogs and content search.
Task-Based Pricing.
Pay for usage, get granular with billing to keep your costs in check.
Task              Price per 1K images
Captioning        $4.00
Detection         $0.80
OCR               $0.40
Tables            $0.40
Classification    $0.40
Embeddings        $0.10
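Because pricing is per task and per 1K images, a monthly usage estimate is a weighted sum: for each task, images ÷ 1,000 × rate. The sketch below encodes the rates from the table in cents to avoid floating-point drift; the assumption that partial thousands are pro-rated linearly is ours, not something the pricing table states, and `estimateUsageCost` is a hypothetical helper.

```javascript
// Per-1K-image rates in US cents, taken from the pricing table.
const RATE_CENTS_PER_1K = {
  captioning: 400,
  detection: 80,
  ocr: 40,
  tables: 40,
  classification: 40,
  embeddings: 10,
};

// Estimate usage cost in US dollars for a map of task -> image count.
// Assumption: partial thousands are pro-rated linearly.
function estimateUsageCost(imagesByTask) {
  let cents = 0;
  for (const [task, images] of Object.entries(imagesByTask)) {
    const rate = RATE_CENTS_PER_1K[task];
    if (rate === undefined) throw new Error(`unknown task: ${task}`);
    cents += (images * rate) / 1000;
  }
  return cents / 100;
}

// 25K OCR ($10) + 25K detection ($20) + 10K captioning ($40)
console.log(estimateUsageCost({ ocr: 25000, detection: 25000, captioning: 10000 })); // 70
```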
Pro
$500/month
+ Usage pricing based on task
+ $400/month credits included
Pre-configured Models
< 50K Requests / Month
Shared Deployment
Up to 2 Custom Models
Community Slack Support
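The Pro plan combines a fixed fee with task-based usage and included credits. The sketch below assumes the $400 of monthly credits is applied against usage first and only the remainder is billed on top of the $500 base fee; the page does not spell out how credits are applied (or whether unused credits roll over), so treat this as an illustrative assumption, with `proMonthlyBill` as a hypothetical helper.

```javascript
// Pro plan figures from the listing (US dollars per month).
const PRO_BASE_FEE = 500;
const PRO_INCLUDED_CREDITS = 400;

// Assumption: included credits are applied against task usage first,
// and any usage beyond the credits is billed on top of the base fee.
// Unused credits are assumed not to carry over.
function proMonthlyBill(usageDollars) {
  const overage = Math.max(0, usageDollars - PRO_INCLUDED_CREDITS);
  return PRO_BASE_FEE + overage;
}

console.log(proMonthlyBill(70));  // 500: usage is fully covered by credits
console.log(proMonthlyBill(650)); // 750: $250 of usage beyond the credits
```

Under this reading, a Pro workload costs $500 until usage passes $400 in a month, after which each extra dollar of usage adds a dollar to the bill.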
Enterprise
Custom
+ Usage pricing based on task
Model Customization
Unlimited Requests / Month
In-VPC Deployments
Unlimited Custom Models
Dedicated Slack Support
SOC 2 and HIPAA Compliance
FAQ
Frequently Asked Questions
What do you mean by structured JSON extraction?
How do you compare to other foundation vision APIs?
Can I fine-tune on my own images?
Do you support real-time or streaming use cases?
How do you keep data private?
Do you offer a free trial?
VLM Run