Drop-in RAG · FastAPI + Qdrant
Retrieval-grounded chat in five minutes flat.
A multi-tenant RAG backend and a Shadow-DOM chat widget. One script tag on the page, one Docker Compose on your box, one bearer token per tenant. Provider-agnostic via LiteLLM — OpenAI, Anthropic, or Gemini, your key, your call.
Integration time
≤ 5 min
Backend version
v1.0.0
Tenant-scoped tables
11
Widget script tag
1 line
LLM providers
3+
License
OSS
Embed in 5 minutes
The chat widget is a custom element — <minirag-widget> — shipped as a single JS file. It renders inside Shadow DOM so it can't fight your site's CSS, it streams tokens over SSE, and it authenticates with a per-bot bearer token you scope on the server.
On the page
<!-- Anywhere in your HTML -->
<script src="https://your-host/dashboard/widget/minirag-widget.js"
        data-bot-id="BOT_PROFILE_ID"
        data-api-url="https://your-host"
        data-api-token="nrag_live_...">
</script>

<!-- Or drop the element manually -->
<minirag-widget
    bot-id="BOT_PROFILE_ID"
    api-url="https://your-host"
    api-token="nrag_live_...">
</minirag-widget>
Harmonia: Shadow DOM means your page CSS and the widget CSS can never collide. Accessible by default — labelled roles, focus trap, Esc to close.
On the server
# 1. Bootstrap a tenant (one-time)
curl -X POST $HOST/v1/tenants \
  -H 'content-type: application/json' \
  -d '{"tenant_slug":"acme","owner_email":"ops@acme.io",
       "owner_password":"...","tenant_name":"Acme"}'

# 2. Create a bot, pick a model, pin a system prompt
curl -X POST $HOST/v1/bot-profiles \
  -H "authorization: Bearer $TOKEN" \
  -d '{"name":"Support","model":"gpt-4o-mini",
       "system_prompt":"Ground every answer in the sources."}'

# 3. Add a source, trigger ingestion — returns 202
curl -X POST $HOST/v1/sources/$ID/ingest \
  -H "authorization: Bearer $TOKEN"
Metis: ingestion returns 202 and hands off to an ARQ worker. The widget works the moment the source flips to status=ready.
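Since ingestion is asynchronous, a client that wants to block until the source is usable has to poll. A minimal sketch of that loop, written against an injected `fetch_status` callable so it stays transport-agnostic — the `"ready"`/`"failed"` status values come from the flow above, but the exact endpoint and field names are assumptions to check against the API docs:

```python
import time

def wait_until_ready(fetch_status, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_status() until the source reaches a terminal state.

    fetch_status is any zero-argument callable returning the source's
    current status string, e.g. a GET on /v1/sources/{id} (hypothetical
    shape) that extracts the "status" field from the JSON body.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        sleep(interval)
    raise TimeoutError(f"source not ready after {timeout}s")
```

Injecting the fetcher also makes the loop trivially testable without a running stack.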
What's actually running
MiniRAG is deliberately unfashionable. FastAPI for the API, Qdrant for vectors, Postgres for metadata, Redis + ARQ for the ingestion queue, LiteLLM for provider-agnostic completion and embedding. Nothing exotic — everything you'd have reached for anyway, wired once and tested.
Async Python, Pydantic v2, OpenAPI out of the box.
Single collection, tenant isolation via payload filters.
Async SQLModel, Alembic migrations, 11 tenant-scoped tables.
Ingestion queue, cron auto-refresh, backpressure by design.
OpenAI, Anthropic, Gemini — swap provider, keep the prompt.
Token-by-token in the widget; no polling, no websockets.
source.ingested · source.failed · chat.message, signed.
Postgres, Qdrant, Redis, web, worker — one file.
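The SSE stream the widget consumes can also be read by any other client. A sketch of the parsing side — the wire format here is plain SSE (`data:` lines, blank-line event boundary); the chat endpoint path and per-event payload shape are not specified above, so treat them as assumptions:

```python
def parse_sse(lines):
    """Yield the data payload of each SSE event from an iterable of lines.

    Follows the SSE framing rules: consecutive "data:" lines accumulate,
    a blank line terminates the event.
    """
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:  # blank line ends an event
            yield "\n".join(buf)
            buf = []
    if buf:  # flush a trailing event with no final blank line
        yield "\n".join(buf)
```

Feed it the response body line iterator of your HTTP client of choice and print tokens as they arrive.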
Request path
Bearer token → get_auth_context() → AuthContext(tenant_id, user_id, role) → route → service layer. Tokens containing a dot are decoded as JWTs; anything else is matched by a SHA-256 hash lookup.
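The dot-based dispatch above can be sketched in a few lines — the helper names here are illustrative, not MiniRAG's actual internals:

```python
import hashlib

def classify_token(token: str) -> str:
    """Decide the auth path: JWTs contain dots, opaque API tokens don't."""
    return "jwt" if "." in token else "sha256"

def lookup_key(token: str) -> str:
    """Opaque tokens are stored hashed, so compute the hash to look one up."""
    return hashlib.sha256(token.encode()).hexdigest()
```

Hashing before lookup is what lets the plaintext token be shown exactly once at creation and never stored.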
RAG turn
embed(query) → Qdrant search top_k=5, filtered by tenant_id + bot_profile_id → prompt = system + context + last 10 turns → LiteLLM acompletion → persist message + UsageEvent.
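The prompt-assembly step of that turn can be sketched as a pure function — the message shapes and field names are assumptions; only the "system + context + last 10 turns" layering comes from the flow above:

```python
def build_messages(system_prompt, chunks, history, user_query, max_turns=10):
    """Fold retrieved chunks into the system message, cap history at max_turns."""
    context = "\n\n".join(c["text"] for c in chunks)
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nContext:\n{context}"}]
    messages += history[-max_turns:]  # only the last 10 turns survive
    messages.append({"role": "user", "content": user_query})
    return messages
```

The resulting list is what a LiteLLM `acompletion` call would take as its `messages` argument.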
Your data, your vectors
Nemesis audited this section. Multi-tenancy isn't a feature bolted on top — it's a column on every table and a payload filter on every vector. The LLM provider keys you bring are encrypted at rest; the widget token never authenticates more than one bot profile; the webhooks you receive are signed before you have to trust anything.
Every table carries a tenant_id. Every query is automatically scoped. Cross-tenant FK references are validated before creation, not after the leak.
A single minirag_chunks collection, but every search is filtered by tenant_id + bot_profile_id. Vectors for tenant A are unreachable from tenant B's token.
Per-bot provider credentials stored Fernet-encrypted in encrypted_credentials. Read schemas expose has_credentials: bool — the ciphertext never leaves the database.
API tokens shown exactly once at creation, then hashed. Passwords hashed with Argon2. No plaintext in logs, no secrets in code.
Every outbound event is signed SHA-256 with the per-webhook secret. Your receiver verifies; MiniRAG doesn't ask you to trust the transport.
Docker Compose on your VM, Traefik-fronted, your Postgres, your Qdrant, your Redis. Nothing leaves the tenant perimeter you deploy.
BYOK: OPENAI_API_KEY at process scope, or per-bot Fernet-encrypted credentials. Nothing touches nyxCore infrastructure; MiniRAG runs entirely on the box you deploy.
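On the receiving end, the webhook signing contract above boils down to recomputing the digest and comparing in constant time. A sketch, assuming the signature is an HMAC-SHA256 over the raw request body sent as a hex string — the header name and exact scheme are assumptions to verify against the repo's webhook docs:

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Always verify against the raw bytes before JSON-decoding; re-serialised JSON will not reproduce the digest.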
Grounded or silent
A RAG system that hallucinates on a miss is a chatbot with extra steps. MiniRAG's orchestrator treats retrieval as a gate — not as decoration for a model that will answer anyway. Citations ship with the response; the refusal path ships with the product.
Every answer is grounded in top-k chunks retrieved from Qdrant — by default five. The orchestrator passes them into the prompt as context; the UI surfaces the source rows.
If the retrieved context is empty or below the similarity floor, the bot says so instead of filling the gap with a hallucination. Refusal is a feature, not a failure.
When a bot has active nyxCore sources, chunks come back tagged [MANDATORY] or [GUIDELINE]. The LLM sees which rules override which suggestions — Aletheia wrote that contract.
Normalised text, 512/64 chunk/overlap, deterministic IDs. The same document ingested twice doesn't double your retrieval distribution.
Paired with Ipcha Mistabra, MiniRAG inherits the 0.15 % hallucination floor — adversarial red teaming runs before deployment, not after the first support ticket.
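The refusal gate described above — answer only when retrieval clears the similarity floor — can be sketched like this. The 0.35 threshold and the refusal text are made-up illustrations, not MiniRAG defaults:

```python
REFUSAL = "I don't have a grounded answer for that in my sources."

def gate(hits, floor=0.35):
    """Keep hits at or above the similarity floor; refuse if none survive."""
    grounded = [h for h in hits if h["score"] >= floor]
    if not grounded:
        return {"refused": True, "answer": REFUSAL, "sources": []}
    return {"refused": False, "sources": grounded}
```

The key design point is that the gate runs before the LLM is called, so a miss never reaches a model that would answer anyway.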
What this is not
Ipcha Mistabra wrote this section. Before you drop the script tag onto production, know what MiniRAG refuses to be.
Disclosure
MiniRAG has one chat surface — the widget — and an API. It won't hand you a drag-and-drop flow designer, and it won't try to. Configure bots via JSON, ship in minutes, skip the UI theatre.
Disclosure
We run on Qdrant. We don't reinvent ANN. If you already have a vector store you trust, MiniRAG is the wrong layer — go read the Qdrant client code and build your own orchestrator.
Disclosure
Retrieval serves the conversation, not the other way round. If you want a Google-box for your docs, use a search engine. MiniRAG is grounding for an assistant, nothing more.
Quick start
Python 3.11+, Docker, and one OpenAI (or Anthropic, or Gemini) key for the model you pick. That's the whole prerequisite list. The compose file brings Postgres 16, Qdrant 1.13, Redis 7, the API, and the ARQ worker — boot in under two minutes on a laptop.
# Clone & configure
git clone https://github.com/nyxCore-Systems/mini-chat-rag
cd mini-chat-rag
cp .env.example .env
# Set ENCRYPTION_KEY (Fernet), JWT_SECRET_KEY, OPENAI_API_KEY

# Bring up the whole stack
docker compose up -d
docker compose exec web alembic upgrade head

# Sanity check
curl http://localhost:8000/v1/system/health

# Dashboard at /dashboard — Alpine + Tailwind, no build step
open http://localhost:8000/dashboard
Widget URL
/dashboard/widget/minirag-widget.js
Vector collection
minirag_chunks
Chunking
512 / 64 overlap
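The 512/64 figures above pair with deterministic chunk IDs, which is what makes re-ingestion idempotent. A sketch — whether the sizes are characters or tokens, and the exact ID recipe, are assumptions; the point is only that the same text at the same offset always yields the same ID, so an upsert overwrites rather than duplicates:

```python
import hashlib

def chunk(text: str, size: int = 512, overlap: int = 64):
    """Slide a fixed window with overlap; derive each chunk ID from content + offset."""
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + size]
        if not piece:
            break
        # Deterministic ID: same source text + offset -> same ID on re-ingest.
        cid = hashlib.sha256(f"{start}:{piece}".encode()).hexdigest()[:16]
        yield {"id": cid, "start": start, "text": piece}
```

Ingesting the same document twice then produces identical IDs, so the vector store's upsert keeps the retrieval distribution unchanged.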
Before you paste the script into production
Scope the bot-specific API token to one bot profile, put MiniRAG behind your own TLS terminator (the repo ships configs for both Traefik and Caddy), and run the Postman collection before you expose the widget. Five minutes to embed means five minutes of integration — not five minutes of calibration.
Metis says: ingest one source, watch it go to status=ready, then expose the widget. Not before.