mm-ctx

Fast, multi-modal context for agents.

Familiar UNIX tools (find, grep, cat, wc) with unfamiliar multi-modal powers. A Rust-core CLI that indexes, searches, and extracts content from images, video, audio, and documents at speed.

mm · multi-modal cli · v0.5.3

mmbench-tiny · 700 files · rust core · pyo3 bindings

98 ms mean latency · 3.59 Gbps peak throughput · mm find · 700 files

Command          Purpose                               Mean latency   Throughput
mm find .        list files, like fd                   98.2 ms        3.6 Gbps
mm wc .          count tokens for LLM context          109 ms         3.2 Gbps
mm sql           query metadata + chunks               151 ms         2.3 Gbps
mm grep          text + semantic search                143 ms         2.3 Gbps
mm cat <img>     image caption + object detection      1.76 s         4.2 Mbps
mm cat <video>   video keyframe captioning + summary   1.58 s         172.1 Mbps
mm cat <pdf>     PDF text extraction                   1.75 s         1.6 Mbps
mm cat <audio>   audio transcription                   131 ms         872.3 Mbps

Mean latency and throughput across 8 mm commands · image · video · pdf · audio

UNIX philosophy, multi-modal powers.

Rust core for speed, Python for developer experience, and composable commands you already know. Pipe, filter, and query across every file type on disk.

60ms across 700 files

Rust-core index and fast path deliver sub-100ms metadata commands (find, wc) on realistic workloads.

UNIX-familiar commands

find, grep, cat, wc, sql. Same mental model, extended to images, video, audio, and docs.

Implicit indexing

No "build index" step. Every command auto-indexes on first use and re-uses the cache afterwards.
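To make the idea of index-on-first-use concrete, here is a minimal conceptual sketch in Python. It is not mm's implementation: the `LazyIndex` class, its `scans` counter, and the choice of (mtime, size) as the change signal are all assumptions made for illustration. It only shows the general pattern of building an index inside the first query and skipping unchanged files afterwards.

```python
import os
import tempfile
from pathlib import Path


class LazyIndex:
    """Toy index that builds itself on first query and reuses cached entries.

    Hypothetical illustration only; mm's real index format is not shown here.
    """

    def __init__(self, root):
        self.root = Path(root)
        self._entries = {}  # path -> (mtime_ns, size) recorded at index time
        self.scans = 0      # number of files (re)indexed so far

    def _refresh(self):
        seen = set()
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                key = (st.st_mtime_ns, st.st_size)
                seen.add(path)
                if self._entries.get(path) != key:
                    # New or modified file: index it (again).
                    self._entries[path] = key
                    self.scans += 1
        # Forget files that were deleted since the last query.
        for path in set(self._entries) - seen:
            del self._entries[path]

    def find(self, suffix=""):
        # The index is refreshed implicitly; there is no separate build step.
        self._refresh()
        return sorted(p for p in self._entries if p.endswith(suffix))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "photo.png").write_bytes(b"")
        index = LazyIndex(tmp)
        index.find(".png")   # first call indexes the directory
        index.find(".png")   # second call reuses the cached entry
        print("files indexed:", index.scans)
```

The second `find` call still walks the directory to check for changes, but it does no per-file indexing work, which is where a real tool would save most of its time.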

Content extraction

mm cat extracts text from PDFs, captions from images, and descriptions from video. Supports fast or accurate LLM mode.

SQL over your files

mm sql runs SQL against the file index, extraction results, chunks, and embeddings, all in one place.
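As a rough picture of what a query over the file index looks like, the sketch below runs a `GROUP BY kind` count against an in-memory SQLite table. Everything about the storage is an assumption: mm's actual engine and schema are not described here, and the two-column `files` table (`path`, `kind`) and the sample rows are hypothetical. An `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3


def count_by_kind(rows):
    """Count files per kind. `rows` is an iterable of (path, kind) tuples."""
    con = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; the real index likely has more columns.
    con.execute("CREATE TABLE files (path TEXT, kind TEXT)")
    con.executemany("INSERT INTO files VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT kind, COUNT(*) FROM files GROUP BY kind ORDER BY kind"
    ).fetchall()
    con.close()
    return result


sample = [
    ("a.png", "image"),
    ("b.jpg", "image"),
    ("paper.pdf", "document"),
    ("talk.mp4", "video"),
]
print(count_by_kind(sample))
# [('document', 1), ('image', 2), ('video', 1)]
```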

Pipeline-friendly

JSON output (--format json) everywhere. Pipe into jq, feed into agents, or build bigger workflows.
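For agents written in Python, the JSON output can be consumed through a thin subprocess wrapper. The sketch below is one possible shape, not an official binding: `build_find_cmd` and `run_json` are hypothetical helper names, and the code assumes only that `mm` is on the PATH and that `--format json` writes a single valid JSON document to stdout. The structure of that JSON is not specified here, so the caller receives whatever `json.loads` returns.

```python
import json
import os
import subprocess


def build_find_cmd(root, kind):
    """Build the argv for `mm find <root> --kind <kind> --format json`."""
    # No shell is involved, so "~" must be expanded explicitly.
    return ["mm", "find", os.path.expanduser(root),
            "--kind", kind, "--format", "json"]


def run_json(cmd):
    """Run a command and parse its stdout as JSON.

    Raises CalledProcessError on a non-zero exit status and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


if __name__ == "__main__":
    cmd = build_find_cmd("~/data", "image")
    print(" ".join(cmd))
    # With mm installed, the result could be fetched with:
    # images = run_json(cmd)
```

Passing the command as a list instead of a shell string avoids quoting problems when paths contain spaces.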

# Overview of a directory
mm find ~/data --tree --depth 2

# Find all images, output as JSON (60ms on 700 files)
mm find ~/data --kind image --format json

# Extract text from a PDF
mm cat paper.pdf

# Accurate mode: LLM caption for an image
mm cat photo.png -m accurate

# Video metadata + LLM-described keyframe mosaic
mm cat video.mp4 -m accurate

# Content search across documents
mm grep "attention" ~/data --kind document

# SQL over the index
mm sql "SELECT kind, COUNT(*) FROM files GROUP BY kind"

Quick Start

Your filesystem, as structured context.

Index every image, video, PDF, and document on disk, then query them with familiar commands. Built for agents that need multi-modal context without orchestration overhead.

  • Rust-core speed with Python dev-ex
  • Works on images, video, audio, PDFs, and text
  • SQL over file metadata, content, and embeddings
  • MIT-licensed, composable with any agent framework

Give your agent a real filesystem.