Fast, multi-modal context for agents.

Familiar UNIX tools (find, grep, cat, wc) with unfamiliar multi-modal powers. A Rust-core CLI that indexes, searches, and extracts content from images, video, audio, and documents at speed.

Install from PyPI Read the Docs

mm · multi-modal CLI · v0.5.3

mmbench-tiny · 700 files · Rust core · PyO3 bindings

98 ms mean latency · 3.59 Gbps peak throughput · mm find · 700 files

Command          Purpose                               Mean latency   Throughput
mm find .        list files, like fd                   98.2 ms        3.6 Gbps
mm wc .          count tokens for LLM context          109 ms         3.2 Gbps
mm sql           query metadata + chunks               151 ms         2.3 Gbps
mm grep          text + semantic search                143 ms         2.3 Gbps
mm cat <img>     image caption + object detection      1.76 s         4.2 Mbps
mm cat <video>   video keyframe captioning + summary   1.58 s         172.1 Mbps
mm cat <pdf>     PDF text extraction                   1.75 s         1.6 Mbps
mm cat <audio>   audio transcription                   131 ms         872.3 Mbps

mean latency · throughput in bits per second · 8 mm commands · image, video, PDF, and audio files
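The page does not say how the throughput column is computed. One plausible reading, assumed here and not confirmed by the source, is total bytes read × 8 divided by mean wall-clock latency. The sketch applies that formula; `throughput_bps` and `format_bps` are hypothetical helpers, not part of mm.

```python
def throughput_bps(total_bytes: int, mean_seconds: float) -> float:
    """Bits per second, ASSUMING throughput = bytes processed * 8 / mean latency."""
    if mean_seconds <= 0:
        raise ValueError("mean_seconds must be positive")
    return total_bytes * 8 / mean_seconds


def format_bps(bps: float) -> str:
    """Render a bit rate with a decimal SI prefix, e.g. '3.6 Gbps'."""
    for factor, unit in ((1e9, "Gbps"), (1e6, "Mbps"), (1e3, "kbps")):
        if bps >= factor:
            return f"{bps / factor:.1f} {unit}"
    return f"{bps:.1f} bps"


# Under the assumption above, 3.6 Gbps at a 98.2 ms mean would imply
# about 44 MB of input per `mm find` run (3.6e9 * 0.0982 / 8 bytes).
implied_bytes = round(3.6e9 * 0.0982 / 8)
print(f"{implied_bytes / 1e6:.1f} MB")
print(format_bps(throughput_bps(implied_bytes, 0.0982)))
```

If the benchmark instead counts only the bytes a command actually decodes (for example, just the images for `mm cat <img>`), the implied sizes change accordingly.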

UNIX philosophy, multi-modal powers.

Rust core for speed, Python for developer experience, and composable commands you already know. Pipe, filter, and query across every file type on disk.

60ms across 700 files

Rust-core index and fast path deliver sub-100ms metadata commands (find, wc) on realistic workloads.

UNIX-familiar commands

find, grep, cat, wc, sql. Same mental model, extended to images, video, audio, and docs.

Implicit indexing

No "build index" step. Every command auto-indexes on first use and re-uses the cache afterwards.
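The page does not describe how the cache is keyed or invalidated. A common way to get index-on-first-use, shown here only as an illustration and not as mm's actual design, is to remember each file's size and modification time and re-extract only when they change. `IndexCache` and `read_text` are hypothetical names.

```python
import os
import tempfile
from typing import Callable, Dict, Tuple

Key = Tuple[int, int]  # (size in bytes, mtime in nanoseconds)


class IndexCache:
    """Hypothetical implicit index: extract on first use, reuse until the file changes."""

    def __init__(self, extract: Callable[[str], str]) -> None:
        self._extract = extract
        self._entries: Dict[str, Tuple[Key, str]] = {}
        self.misses = 0  # how many times extraction actually ran

    def get(self, path: str) -> str:
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]  # hit: file unchanged since it was indexed
        self.misses += 1
        value = self._extract(path)  # first use or stale entry: (re)index
        self._entries[path] = (key, value)
        return value


def read_text(path: str) -> str:
    """Stand-in extractor; a real tool would caption, transcribe, or parse here."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    cache = IndexCache(read_text)
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "note.txt")
        with open(p, "w", encoding="utf-8") as f:
            f.write("hello")
        cache.get(p)
        cache.get(p)
        print(cache.misses)  # 1: the second call reused the cached entry
```

Keying on (size, mtime) avoids hashing file contents on every call, at the cost of missing edits that preserve both values.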

Content extraction

mm cat extracts text from PDFs, captions from images, descriptions from video, and transcripts from audio. It supports a fast mode and an accurate, LLM-backed mode (-m accurate).
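An agent written in Python can reach both extraction paths by shelling out to the CLI. This sketch relies only on what the page shows, `mm cat <file>` and the `-m accurate` flag; it assumes `mm` is on `PATH`, and the helper names `mm_cat_args` and `mm_cat` are hypothetical.

```python
import subprocess
from typing import List


def mm_cat_args(path: str, accurate: bool = False) -> List[str]:
    """Build an `mm cat` command line. Only `-m accurate` is taken from the docs;
    without it, whatever mode the CLI uses by default applies."""
    args = ["mm", "cat", path]
    if accurate:
        args += ["-m", "accurate"]  # LLM-backed caption / description
    return args


def mm_cat(path: str, accurate: bool = False) -> str:
    """Run `mm cat` (hypothetical wrapper) and return stdout.
    Raises subprocess.CalledProcessError if the command exits non-zero."""
    result = subprocess.run(
        mm_cat_args(path, accurate), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `mm_cat("paper.pdf")` mirrors `mm cat paper.pdf`, and `mm_cat("photo.png", accurate=True)` mirrors `mm cat photo.png -m accurate`.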

SQL over your files

mm sql runs SQL against the file index, extraction results, chunks, and embeddings, all in one place.
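To illustrate the kind of query `mm sql` accepts, the sketch builds a toy in-memory SQLite table and runs the `SELECT kind, COUNT(*) FROM files GROUP BY kind` query from the examples. Only the `files` table and `kind` column come from the page; the `path` column, the sample rows, and the use of SQLite are assumptions, since mm's schema and SQL engine are not documented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only `files` and `kind` appear in the mm examples.
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, kind TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO files (path, kind) VALUES (?, ?)",
    [
        ("photo.png", "image"),
        ("scan.jpg", "image"),
        ("video.mp4", "video"),
        ("paper.pdf", "document"),
    ],
)

# Same aggregation as the mm example, plus ORDER BY for deterministic output.
rows = conn.execute(
    "SELECT kind, COUNT(*) FROM files GROUP BY kind ORDER BY kind"
).fetchall()
for kind, count in rows:
    print(kind, count)
conn.close()
```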

Pipeline-friendly

JSON output (--format json) everywhere. Pipe into jq, feed into agents, or build bigger workflows.
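A Python consumer of `--format json` output might look like the sketch below. The page does not document the JSON shape, so it assumes either one JSON array of objects or JSON Lines (one object per line), each with a `path` field; `parse_mm_json` and `paths` are hypothetical helpers, and the key names should be adjusted to the real output.

```python
import json
from typing import Any, Dict, List


def parse_mm_json(text: str) -> List[Dict[str, Any]]:
    """Parse mm JSON output, accepting a JSON array or JSON Lines (assumed shapes)."""
    text = text.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Several top-level objects: fall back to one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def paths(records: List[Dict[str, Any]]) -> List[str]:
    """Collect the assumed `path` field, skipping records without one."""
    return [r["path"] for r in records if "path" in r]
```

In practice the input would be the stdout of `subprocess.run(["mm", "find", os.path.expanduser("~/data"), "--kind", "image", "--format", "json"], capture_output=True, text=True)`; `expanduser` is needed because `~` is only expanded by a shell.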


```shell
# Overview of a directory
mm find ~/data --tree --depth 2

# Find all images, output as JSON (60ms on 700 files)
mm find ~/data --kind image --format json

# Extract text from a PDF
mm cat paper.pdf

# Accurate mode: LLM caption for an image
mm cat photo.png -m accurate

# Video metadata + LLM-described keyframe mosaic
mm cat video.mp4 -m accurate

# Content search across documents
mm grep "attention" ~/data --kind document

# SQL over the index
mm sql "SELECT kind, COUNT(*) FROM files GROUP BY kind"
```

Quick Start

Your filesystem, as structured context.

Index every image, video, PDF, and document on disk, then query them with familiar commands. Built for agents that need multi-modal context without orchestration overhead.

Rust-core speed with Python developer experience
Works on images, video, audio, PDFs, and text
SQL over file metadata, content, and embeddings
MIT-licensed, composable with any agent framework
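`mm grep` combines text and semantic search, and the index stores embeddings. The page does not say how semantic matches are ranked; a standard technique, shown here as a generic sketch rather than mm's method, is cosine similarity between a query embedding and each chunk embedding. The vectors and the `file#chunk` IDs are hypothetical toy values; a real system would produce embeddings with a model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(
    query: Sequence[float], chunks: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank chunk IDs by similarity to the query and keep the best k."""
    scored = [(cid, cosine(query, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings keyed by hypothetical chunk IDs.
chunks = {
    "paper.pdf#0": [0.9, 0.1, 0.0],
    "paper.pdf#1": [0.2, 0.8, 0.1],
    "notes.txt#0": [0.85, 0.2, 0.05],
}
print(top_k([1.0, 0.0, 0.0], chunks))
```

A brute-force scan like this is fine for a few thousand chunks; larger indexes usually switch to an approximate nearest-neighbour structure.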


Give your agent a real filesystem.

Install from PyPI Book a Demo