TurboQuantDB — Massive embedding datasets on lightweight hardware

The compression story · 100k vectors at d=1536

Same dataset. Pick your trade-off.

Raw f32 (baseline)

586MB

no quantization

→

b=4 codes

84MB

7× smaller · R@1 0.96

→

b=2 codes

47MB

12× smaller · R@1 0.84

→

b=4 + ri4 rerank

181MB

3.2× smaller · R@1 0.995

01 · How it works

One vector in.
A handful of bytes out.

RAW

d = 1536 · float32

6144 B

rotate R · x

MSE quantize Lloyd-Max · b bits

QJL residual 1-bit Gaussian sketch

TurboQuant · b=4 codes

codes only — no rerank

768 B −87%

No training. No eval set. No fit(). Codebook is closed-form on a Beta(d/2) marginal. Paper →

02 · Benchmarks

Real numbers.
Three datasets.

Config	R@1	R@4	p50	Disk

100k vectors · 1k queries · single laptop · brute-force · python benchmarks/paper_recall_bench.py

04 · Code

Six recipes.
Copy. Paste. Run.

examples/quickstart.py

05 · Rerank precision

Disk per vector,
at d=1536.

disk per vector R@1 recall d=1536 · b=4 brute · DBpedia

rerank=False codes only

0 B

0.958

"residual_int4" new · v0.8.3 · recommended

772 B

0.995

"int8" previous default

1540 B

0.997

"f16" non-normalized vectors

3076 B

0.997

"int4" deprecated · v0.8.3

772 B

0.92

▸ Low dim (≤384): rerank usually needed — codes alone often give R@1 < 0.85.

▸ High dim (≥1024): codes alone may be enough — pick rerank=False for the smallest footprint.

▸ Either way: when you do enable rerank, prefer residual_int4 — half the disk of int8, ~0.2 pp recall cost.

05 · Drop-in

Plays with
your stack.

LangChain

pip install 'tqdb[langchain]'

LlamaIndex

pip install 'tqdb[llamaindex]'

Chroma → TQDB

python -m tqdb.migrate chroma /old /new

HTTP server

tqdb-server # bundled in the wheel

06 · Built for

Local. Embedded.
Fast on a laptop.

⌂

Private RAG

Data stays on your box.

◇

Laptop-scale

1M × d=1536 in ~1 GB.

◐

Edge / on-device

One binary. No daemon.

⏵

Sub-10ms ANN

HNSW when N grows.

Massive embedding
datasets.
Lightweight hardware.

Same dataset. Pick your trade-off.

One vector in.
A handful of bytes out.

Real numbers.
Three datasets.

Six recipes.
Copy. Paste. Run.

Disk per vector,
at d=1536.

Plays with
your stack.

Local. Embedded.
Fast on a laptop.

Private RAG

Laptop-scale

Edge / on-device

Sub-10ms ANN

The Config Advisor scores every combo.

Same dataset. Pick your trade-off.

One vector in. A handful of bytes out.

Real numbers. Three datasets.

Six recipes. Copy. Paste. Run.

Disk per vector, at d=1536.

Plays with your stack.

Local. Embedded. Fast on a laptop.

Private RAG

Laptop-scale

Edge / on-device

Sub-10ms ANN

The Config Advisor scores every combo.

One vector in.
A handful of bytes out.

Real numbers.
Three datasets.

Six recipes.
Copy. Paste. Run.

Disk per vector,
at d=1536.

Plays with
your stack.

Local. Embedded.
Fast on a laptop.