Embedded vector database · Apache 2.0 · arXiv:2504.19874

Massive embedding
datasets.
Lightweight hardware.

Embedded vector DB in Rust. 10× smaller than raw f32, zero training, sub-10 ms search at d=1536.

$ pip install tqdb
Quick Start → · Read the paper
The compression story · 100k vectors at d=1536

Same dataset. Pick your trade-off.

Raw f32 (baseline) · 586 MB · no quantization
b=4 codes · 84 MB · 7× smaller · R@1 0.96
b=2 codes · 47 MB · 12× smaller · R@1 0.84
b=4 + residual_int4 (ri4) rerank · 181 MB · 3.2× smaller · R@1 0.995
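The sizes above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not the project's own accounting: it assumes "MB" means MiB (the 586 figure only matches under that reading) and that a b-bit code occupies d·b/8 bytes per vector. The gap between that payload and the reported file sizes is presumably per-vector metadata and index overhead, which is a guess.

```python
# Back-of-the-envelope check of the sizes quoted above (assumes MB == MiB).
N, D = 100_000, 1536
MIB = 1024 ** 2

raw_bytes = N * D * 4            # float32 = 4 bytes per dimension
raw_mib = raw_bytes / MIB        # about 586


def code_payload_mib(bits: int) -> float:
    """Packed code payload only: D * bits / 8 bytes per vector, no metadata."""
    return N * (D * bits // 8) / MIB


def ratio(reported_mib: float) -> float:
    """Compression ratio of a reported on-disk size against raw float32."""
    return raw_mib / reported_mib


print(f"raw f32: {raw_mib:.0f} MB")
for label, reported in [("b=4 codes", 84), ("b=2 codes", 47), ("b=4 + ri4", 181)]:
    print(f"{label}: {reported} MB on disk, {ratio(reported):.1f}x smaller than raw")
print(f"b=4 payload alone: {code_payload_mib(4):.1f} MB")
```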
01 · How it works

One vector in.
A handful of bytes out.

RAW · d = 1536 · float32 · 6144 B
→ rotate · R · x
→ MSE quantize · Lloyd-Max · b bits
→ QJL residual · 1-bit Gaussian sketch
→ TurboQuant · b=4 codes · codes only, no rerank · 768 B (−87%)

No training. No eval set. No fit(). Codebook is closed-form on a Beta(d/2) marginal. Paper →
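The first two stages of the pipeline above (rotate, then scalar-quantize each coordinate) can be sketched in NumPy. This is an illustration under stated assumptions, not TQDB's implementation: the codebook here is fitted by running Lloyd-Max on synthetic N(0, 1/d) samples as a stand-in for the closed-form Beta marginal, the input is assumed to be unit-norm, and the QJL residual stage is left out. All function names are hypothetical.

```python
import numpy as np


def random_rotation(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # column sign fix for a uniformly random rotation


def lloyd_max_gaussian(bits: int, sigma: float, rng: np.random.Generator,
                       iters: int = 30) -> np.ndarray:
    """1-D Lloyd-Max levels for N(0, sigma^2); uses synthetic samples, never user data."""
    samples = rng.normal(0.0, sigma, 100_000)
    levels = np.quantile(samples, (np.arange(2 ** bits) + 0.5) / 2 ** bits)
    for _ in range(iters):
        edges = (levels[:-1] + levels[1:]) / 2          # decision boundaries
        idx = np.searchsorted(edges, samples)           # nearest level per sample
        levels = np.array([samples[idx == k].mean() for k in range(levels.size)])
    return levels


def encode(x: np.ndarray, R: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Rotate, then map every coordinate to the index of its nearest level."""
    edges = (levels[:-1] + levels[1:]) / 2
    return np.searchsorted(edges, R @ x).astype(np.uint8)


def pack4(codes: np.ndarray) -> np.ndarray:
    """Pack two 4-bit codes into each byte (d must be even)."""
    return (codes[0::2] << 4) | codes[1::2]


def decode(codes: np.ndarray, R: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Look up the levels and undo the rotation."""
    return R.T @ levels[codes]


rng = np.random.default_rng(0)
d, bits = 1536, 4
R = random_rotation(d, rng)
levels = lloyd_max_gaussian(bits, 1 / np.sqrt(d), rng)

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                                  # unit-norm input (assumption)
codes = encode(x, R, levels)
packed = pack4(codes)
x_hat = decode(codes, R, levels)
print(packed.nbytes, "bytes per vector; cosine to original:", float(x @ x_hat))
```

At d=1536 and b=4 the packed code is 768 bytes, matching the figure quoted above. Because the codebook depends only on d and b, nothing is fitted to the stored vectors.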

02 · Benchmarks

Real numbers.
Three datasets.

Config R@1 R@4 p50 Disk

100k vectors · 1k queries · single laptop · brute-force · python benchmarks/paper_recall_bench.py
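For readers who want to reproduce the metrics on their own data, here is a minimal sketch of how R@k and p50 could be measured. It assumes R@k means "the exact nearest neighbour appears in the approximate top-k" and p50 means median per-query latency; the benchmark script may define them differently. The coarse rounding below is only a stand-in for a lossy index, and the data is synthetic.

```python
import time

import numpy as np


def brute_force_topk(queries: np.ndarray, base: np.ndarray, k: int) -> np.ndarray:
    """Exact inner-product search; indices of the k best matches per query."""
    scores = queries @ base.T
    return np.argsort(-scores, axis=1)[:, :k]


def recall_at_k(exact_top1: np.ndarray, approx_topk: np.ndarray) -> float:
    """Fraction of queries whose exact nearest neighbour is in the approximate top-k."""
    hits = (approx_topk == exact_top1[:, None]).any(axis=1)
    return float(hits.mean())


rng = np.random.default_rng(0)
base = rng.standard_normal((2000, 64)).astype(np.float32)
queries = rng.standard_normal((100, 64)).astype(np.float32)

exact = brute_force_topk(queries, base, 1)[:, 0]        # ground truth
coarse = np.round(base * 2) / 2                          # crude lossy copy of the base
approx = brute_force_topk(queries, coarse, 4)

r1 = recall_at_k(exact, approx[:, :1])
r4 = recall_at_k(exact, approx)

latencies = []
for q in queries:                                        # time one query at a time
    t0 = time.perf_counter()
    brute_force_topk(q[None, :], coarse, 4)
    latencies.append(time.perf_counter() - t0)
p50_ms = float(np.median(latencies) * 1e3)

print(f"R@1={r1:.3f} R@4={r4:.3f} p50={p50_ms:.3f} ms")
```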

04 · Code

Six recipes.
Copy. Paste. Run.

examples/quickstart.py
05 · Rerank precision

Disk per vector,
at d=1536.

d=1536 · b=4 brute · DBpedia
rerank mode · disk per vector · R@1 recall
rerank=False (codes only) · 0 B · 0.958
"residual_int4" (new in v0.8.3, recommended) · 772 B · 0.995
"int8" (previous default) · 1540 B · 0.997
"f16" (non-normalized vectors) · 3076 B · 0.997
"int4" (deprecated in v0.8.3) · 772 B · 0.92
Low dim (≤384): rerank usually needed — codes alone often give R@1 < 0.85.
High dim (≥1024): codes alone may be enough — pick rerank=False for the smallest footprint.
Either way: when you do enable rerank, prefer residual_int4 — half the disk of int8, ~0.2 pp recall cost.
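The per-vector rerank sizes quoted above follow a simple pattern: d × bits / 8 plus 4 bytes. The sketch below reproduces them; what the extra 4 bytes hold (a float32 scale or norm would fit) is an assumption, and `rerank_bytes` is a hypothetical helper, not part of the tqdb API.

```python
# Bits stored per dimension for each rerank mode named above.
RERANK_BITS = {"int4": 4, "residual_int4": 4, "int8": 8, "f16": 16}


def rerank_bytes(d: int, mode: str, header_bytes: int = 4) -> int:
    """Rerank storage per vector: packed payload plus an assumed 4-byte header."""
    return d * RERANK_BITS[mode] // 8 + header_bytes


for mode in ("residual_int4", "int8", "f16"):
    print(f"d=1536 {mode}: {rerank_bytes(1536, mode)} B")

# The same arithmetic at a lower dimension, where rerank is usually needed.
print(f"d=384 residual_int4: {rerank_bytes(384, 'residual_int4')} B")
```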
v0.8.3 2× faster brute search · residual_int4 rerank · half the disk, near-int8 recall. Changelog →
05 · Drop-in

Plays with
your stack.

LangChain
pip install 'tqdb[langchain]'
LlamaIndex
pip install 'tqdb[llamaindex]'
Chroma → TQDB
python -m tqdb.migrate chroma /old /new
HTTP server
tqdb-server # bundled in the wheel
06 · Built for

Local. Embedded.
Fast on a laptop.

Private RAG

Data stays on your box.

Laptop-scale

1M × d=1536 in ~1 GB.

Edge / on-device

One binary. No daemon.

Sub-10ms ANN

HNSW when N grows.

Pick the right config in seconds

The Config Advisor scores every combo.