v0.4.0 / now on PyPI

Your documents,
answerable.
On your hardware.

Drop in PDFs, code, spreadsheets, or URLs. Ask anything, get cited answers from a local LLM. Nothing leaves your machine.

Hybrid + GraphRAG retrieval · Sealed cloud-drive sharing · Zero telemetry
pip install axon-rag · View on GitHub
~/research — bash axon · 0.4.0 · python 3.11
(axon) $ pip install "axon-rag[starter]"
✓ Successfully installed axon-rag-0.4.0
(axon) $ axon --doctor
✓ Python 3.11 · Ollama reachable · llama3.1:8b cached · store writable
(axon) $ axon ingest ~/research/papers/
→ Scanning 42 files [PDF, MD, DOCX, ipynb] · SHA-256 dedup
✓ Indexed 42 files · 0 duplicates · 1,847 chunks · 312 entities · 89 relations
(axon) $ axon query "What were the v0.3.2 graph backend changes?"
↳ hybrid search · BGE rerank · GraphRAG local · cite
v0.3.2 added four user-visible graph capabilities [Doc 1] [Doc 2]:
• Capability flags on FinalizationResult [3]
• Point-in-time /graph/retrieve route [4]
• Conflict inspection via graph_conflicts [5]
• Per-query federation weights

✶ Runs on Ollama / vLLM locally · OpenAI · Gemini · Grok · Copilot · MIT licensed

01 / Ingest
Drop your files
54 formats. SHA-256 dedup, content-aware chunking.
02 / Index
Build the graph
Hybrid BM25 + dense. GraphRAG, RAPTOR, code-graph. All on CPU.
03 / Retrieve
Route the query
Auto-router picks HyDE, multi-query, step-back, or graph. BGE rerank.
04 / Cite
Answer with proof
Structured citations array — Claude / OpenAI compatible shapes.
05 / Share
Seal & distribute
AES-256-GCM at rest. Cloud sees ciphertext. Rotate to revoke.
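The SHA-256 dedup from step 01 boils down to hashing file contents and skipping repeats. A minimal sketch of the idea; `index_unique` is a hypothetical helper for illustration, not Axon's actual API:

```python
import hashlib
from pathlib import Path

def index_unique(paths: list[Path]) -> tuple[list[Path], int]:
    """Return files worth indexing, skipping byte-identical duplicates."""
    seen: set[str] = set()
    unique: list[Path] = []
    skipped = 0
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            skipped += 1  # same bytes already indexed under another name
        else:
            seen.add(digest)
            unique.append(path)
    return unique, skipped
```

Hashing content rather than comparing names is why the hero transcript can report "0 duplicates" even across renamed copies.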
51 MCP Tools · 54 File Formats · 3 Graph Backends · 8 LLM Providers · 0 Telemetry
— Why Axon —

Five axes we refuse to compromise on.

Most RAG tools force a trade between cloud power and data privacy. Axon delivers full capability, local-first — zero egress when you run on Ollama or vLLM; cloud providers optional — and stacks four more constraints on top.

01 Axis
Private by default
All inference local. No API key, no upload, no telemetry. local_assets_only preflights at startup.
// air_gapped: true
02 Axis
Sealed by design
AES-256-GCM at rest, per-grantee AES-KW. Master key in OS keyring. Cloud sees ciphertext.
// AES-256-GCM + AES-KW
03 Axis
Shareable without infra
No server. Shares ride any cloud drive. 5-level hierarchy, instant rotate-to-revoke.
// no infra required
04 Axis
Graph native
Three backends: graphrag, bi-temporal dynamic_graph, federated RRF. Plus orthogonal code-graph.
// 3 backends, 1 protocol
05 Axis
Agent ready
Designed for autonomous agents from day one — MCP tools, in-process retrievers, and citation shapes all speak the protocols agents already expect.
// agent_native: true
— Architecture —

Five layers, one mental model.

Every surface routes to a single AxonBrain via composable mixins. Layers don't leak.

L05 · Surfaces
REPL
REST · /query · /graph/*
MCP · 51 tools
VS Code · @axon
Streamlit
LangChain
LlamaIndex
L04 · Brain
AxonBrain
QueryRouterMixin
GraphRagMixin
CodeGraphMixin
Governance
RateLimit
L03 · Pipeline
HyDE
multi-query
step-back
hybrid retrieve
BGE rerank
graph expand
cite
compress
L02 · Storage
TurboQuantDB
BM25 (Rust)
graphrag.json
dynamic.sqlite
code_graph.json
.entity_graph
L01 · Sealing
AES-256-GCM
AES-KW wrap
HKDF-SHA256
OS keyring
version.json
audit log
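HKDF-SHA256 in the sealing layer is the extract-then-expand construction from RFC 5869, implementable with nothing but the standard library. A sketch of the algorithm itself (how Axon derives its sealing subkeys internally is an assumption; only the primitive is shown):

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF with SHA-256: derive `length` bytes from input key material."""
    # Extract: concentrate entropy from ikm into a pseudorandom key.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: stretch prk into as many context-bound blocks as needed.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

The `info` parameter is what makes derived keys purpose-specific: deriving with `info=b"chunk-store"` and `info=b"audit-log"` from the same master key yields unrelated subkeys, so compromising one doesn't expose the other.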

Cloud RAG sells convenience. Vanilla local RAG sells control. We get most of the convenience and all of the control — and we add three constraints few local-first RAG engines combine: the graph remembers when, the share survives without infrastructure, and the conflicts get surfaced instead of swept aside.

— Axon design constraint
— Versus —

How it stacks up against the alternatives.

Cloud sells convenience, local sells control. Axon: most of the convenience, all of the control.

Feature                         | Axon                         | Cloud RAG (e.g. SaaS)   | Vanilla local stack
Privacy / egress                | Zero — air-gap mode optional | All queries to provider | Local
Cited answers                   | Structured citations array   | Provider-specific shape | Roll your own
GraphRAG (community + temporal) | 3 backends + federation      | Limited / paid tier     | Build it yourself
Sealed cloud sharing            | AES-256-GCM, no server       | Per-org account         | Not a thing
MCP / agent surface             | 51 tools drop-in             | None                    | None
Recurring cost                  | $0 · MIT                     | Per-token / seat        | $0
— Showcase / 6 surfaces —

See it in action.

— Use it from anywhere —

Six surfaces. One brain.

Same brain, every surface.

# Install the recommended bundle (UI + sealed sharing + extra loaders)
$ pip install "axon-rag[starter]"

# Run health check before your first query
$ axon --doctor
✓ Python 3.11 · Ollama reachable · llama3.1:8b cached · store writable

# First run auto-launches the setup wizard, then drops into the REPL
$ axon
axon> /ingest ~/research/papers/
axon> /graph status
axon> What were the v0.3.2 graph backend changes?
axon> /graph retrieve "alice" --at 2025-06-01
axon> /share generate research alice --ttl-days 30
# Start the API server (set RAG_API_KEY first when binding to 0.0.0.0)
$ export RAG_API_KEY="$(openssl rand -hex 32)"
$ axon-api --host 0.0.0.0 --port 8000

# Query with structured citations enabled (default)
$ curl -X POST localhost:8000/query \
    -H "Content-Type: application/json" \
    -d '{ "query": "What were the v0.3.2 graph changes?", "top_k": 8, "include_citations": true }'
# Response: { "response", "sources", "citations", "provenance", "settings" }
# Each citation: { marker, document_index, document_title, start_in_response, end_in_response }

# Point-in-time graph query — new in v0.3.2
$ curl -X POST localhost:8000/graph/retrieve \
    -d '{ "query": "who led acme?", "point_in_time": "2025-06-01T00:00:00Z", "federation_weights": { "graphrag": 0.3, "dynamic_graph": 0.7 } }'
# ~/.config/claude/mcp_servers.json — Claude Code, Codex CLI, Cursor, etc.
{
  "mcpServers": {
    "axon": {
      "command": "axon-mcp",
      "env": { "RAG_API_KEY": "$AXON_KEY" }
    }
  }
}

# 51 tools become available to your agent. A few highlights:
#   query_knowledge  — RAG query with citations
#   search_knowledge — raw chunk search
#   ingest_path      — async ingest job
#   graph_retrieve   — point-in-time graph query (v0.3.2)
#   graph_conflicts  — list conflicted facts (v0.3.2)
#   graph_finalize   — community rebuild
#   share_project    — generate read-only share
#   seal_project     — encrypt at rest with AES-256-GCM
#   governance_audit — query the audit log
# pip install "axon-rag[langchain]"
from axon import AxonBrain, AxonConfig
from axon.integrations.langchain import AxonRetriever

brain = AxonBrain(AxonConfig.from_yaml("config.yaml"))
retriever = AxonRetriever(brain=brain, top_k=5)

# Drop into any LangChain chain that takes a Retriever:
docs = retriever.invoke("What does the project do?")
# → list[langchain_core.documents.Document]

# Per-call override (one-off, doesn't mutate base)
docs = retriever.with_overrides({"hyde": True, "rerank": True}).invoke(query)

# All RAG features (hybrid, rerank, HyDE, multi-query, GraphRAG budget)
# apply automatically — same codepath as REST and REPL.
# pip install "axon-rag[llama-index]"
from axon import AxonBrain, AxonConfig
from axon.integrations.llama_index import AxonLlamaRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

brain = AxonBrain(AxonConfig.from_yaml("config.yaml"))
retriever = AxonLlamaRetriever(brain=brain, top_k=5)

# Drop into any LlamaIndex query engine:
engine = RetrieverQueryEngine.from_args(retriever)
nodes = retriever.retrieve("What does the project do?")
# → list[NodeWithScore]

# Each NodeWithScore preserves Axon's hybrid + rerank scoring,
# plus is_web tagging for CRAG-Lite web fallback rows.
# pip install axon-rag
from axon import AxonBrain, AxonConfig

# Bring your own config or load from YAML
cfg = AxonConfig.from_yaml("config.yaml")
brain = AxonBrain(cfg)

# Ingest a directory (sync) or kick off an async job
brain.ingest_path("~/research/papers")

# Full RAG query — returns the LLM-synthesised answer string
answer = brain.query(
    "What's new in v0.4.0?",
    overrides={"hyde": True, "graph_rag": True},
)

# Need raw chunks instead? Skip the LLM:
results, diagnostics, trace = brain.search_raw(
    "alice", overrides={"top_k": 10, "rerank": True}
)

# Citations + sources are stashed for the API to surface
print(brain._last_citations)  # {"sources": [...], "citations": [...]}
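The span-based citation shape from the REST surface (`marker`, `start_in_response`, `end_in_response`) can be consumed without any Axon code. A toy renderer that re-inserts markers into the answer text, assuming only the documented fields and that `marker` is a bare index (an assumption, the exact value type isn't shown above):

```python
def annotate(response: str, citations: list[dict]) -> str:
    """Re-insert [n] markers into the response text using citation spans."""
    out = response
    # Apply right-to-left so earlier character offsets stay valid after each insert.
    for cite in sorted(citations, key=lambda c: c["end_in_response"], reverse=True):
        end = cite["end_in_response"]
        out = out[:end] + f' [{cite["marker"]}]' + out[end:]
    return out
```

Span offsets rather than inline markers keep the raw answer clean for downstream agents while still letting a UI highlight exactly which sentence each source supports.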
— Install paths —

Pick your flavor.

The [starter] bundle ships with sealed sharing enabled by default — recommended for most users. Power users can add granular extras instead.

Minimal
axon-rag
Bare engine only — TurboQuantDB + BM25 + Ollama. No sealed sharing, no UI. Add [sealed] separately if you skip the bundle.
turboquantdb · ollama-py
Agent
axon-rag[langchain]
Drop-in AxonRetriever. Hybrid + rerank + HyDE flow through.
langchain-core ≥ 0.1
Agent
axon-rag[llama-index]
Drop-in AxonLlamaRetriever returning native NodeWithScore.
llama-index-core ≥ 0.10
Graph
axon-rag[graphrag]
Leiden community detection. Use when GraphRAG is primary.
leidenalg · igraph
Stores
axon-rag[qdrant,chroma]
Alternative vector stores. Default turboquantdb works for most. Pick remote Qdrant or familiar Chroma here.
qdrant-client · chromadb