Drop-in RAG · FastAPI + Qdrant
Retrieval-grounded chat in five minutes flat.
A multi-tenant RAG backend and a Shadow-DOM chat widget. One script tag on the page, one Docker Compose on your box, one bearer token per tenant. Provider-agnostic via LiteLLM — OpenAI, Anthropic, or Gemini, your key, your call.
Integration time
≤ 5 min
Backend version
v1.0.0
Tenant-scoped tables
11
Widget script tag
1 line
LLM providers
3+
License
OSS
Embed in 5 minutes
The chat widget is a custom element — <minirag-widget> — shipped as a single JS file. It renders inside Shadow DOM so it can't fight your site's CSS, it streams tokens over SSE, and it authenticates with a per-bot bearer token you scope on the server.
On the page
<!-- Anywhere in your HTML -->
<script src="https://your-host/dashboard/widget/minirag-widget.js"
        data-bot-id="BOT_PROFILE_ID"
        data-api-url="https://your-host"
        data-api-token="nrag_live_...">
</script>

<!-- Or drop the element manually -->
<minirag-widget
    bot-id="BOT_PROFILE_ID"
    api-url="https://your-host"
    api-token="nrag_live_...">
</minirag-widget>
Harmonia: Shadow DOM means your page CSS and the widget CSS can never collide. Accessible by default — labelled roles, focus trap, Esc to close.
On the server
# 1. Bootstrap a tenant (one-time)
curl -X POST $HOST/v1/tenants \
  -H 'content-type: application/json' \
  -d '{"tenant_slug":"acme","owner_email":"ops@acme.io",
       "owner_password":"...","tenant_name":"Acme"}'

# 2. Create a bot, pick a model, pin a system prompt
curl -X POST $HOST/v1/bot-profiles \
  -H "authorization: Bearer $TOKEN" \
  -d '{"name":"Support","model":"gpt-4o-mini",
       "system_prompt":"Ground every answer in the sources."}'

# 3. Add a source, trigger ingestion — returns 202
curl -X POST $HOST/v1/sources/$ID/ingest \
  -H "authorization: Bearer $TOKEN"
Metis: ingestion returns 202 and hands off to an ARQ worker. The widget works the moment the source flips to status=ready.
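Since ingestion is asynchronous, a client that wants to block until the source is usable has to poll. A minimal sketch of that loop, written against an injected `fetch_status` callable so it stays transport-agnostic — the `"ready"`/`"failed"` status values come from the flow above, but the exact endpoint and field names are assumptions to check against the API docs:

```python
import time

def wait_until_ready(fetch_status, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_status() until the source reaches a terminal state.

    fetch_status is any zero-argument callable returning the source's
    current status string, e.g. a GET on /v1/sources/{id} (hypothetical
    shape) that extracts the "status" field from the JSON body.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        sleep(interval)
    raise TimeoutError(f"source not ready after {timeout}s")
```

Injecting the fetcher also makes the loop trivially testable without a running stack.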
What's actually running
MiniRAG is deliberately unfashionable. FastAPI for the API, Qdrant for vectors, Postgres for metadata, Redis + ARQ for the ingestion queue, LiteLLM for provider-agnostic completion and embedding. Nothing exotic — everything you'd have reached for anyway, wired once and tested.
Async Python, Pydantic v2, OpenAPI out of the box.
Single collection, tenant isolation via payload filters.
Async SQLModel, Alembic migrations, 11 tenant-scoped tables.
Ingestion queue, cron auto-refresh, backpressure by design.
OpenAI, Anthropic, Gemini — swap provider, keep the prompt.
Token-by-token in the widget; no polling, no websockets.
source.ingested · source.failed · chat.message, signed.
Postgres, Qdrant, Redis, web, worker — one file.
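The SSE stream the widget consumes can also be read by any other client. A sketch of the parsing side — the wire format here is plain SSE (`data:` lines, blank-line event boundary); the chat endpoint path and per-event payload shape are not specified above, so treat them as assumptions:

```python
def parse_sse(lines):
    """Yield the data payload of each SSE event from an iterable of lines.

    Follows the SSE framing rules: consecutive "data:" lines accumulate,
    a blank line terminates the event.
    """
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:  # blank line ends an event
            yield "\n".join(buf)
            buf = []
    if buf:  # flush a trailing event with no final blank line
        yield "\n".join(buf)
```

Feed it the response body line iterator of your HTTP client of choice and print tokens as they arrive.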
Request path
Bearer token → get_auth_context() → AuthContext(tenant_id, user_id, role) → route → service layer. Tokens containing a dot are decoded as JWTs; anything else is matched by a SHA-256 hash lookup.
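The dot-based dispatch above can be sketched in a few lines — the helper names here are illustrative, not MiniRAG's actual internals:

```python
import hashlib

def classify_token(token: str) -> str:
    """Decide the auth path: JWTs contain dots, opaque API tokens don't."""
    return "jwt" if "." in token else "sha256"

def lookup_key(token: str) -> str:
    """Opaque tokens are stored hashed, so compute the hash to look one up."""
    return hashlib.sha256(token.encode()).hexdigest()
```

Hashing before lookup is what lets the plaintext token be shown exactly once at creation and never stored.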
RAG turn
embed(query) → Qdrant search top_k=5, filtered by tenant_id + bot_profile_id → prompt = system + context + last 10 turns → LiteLLM acompletion → persist message + UsageEvent.
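The prompt-assembly step of that turn can be sketched as a pure function — the message shapes and field names are assumptions; only the "system + context + last 10 turns" layering comes from the flow above:

```python
def build_messages(system_prompt, chunks, history, user_query, max_turns=10):
    """Fold retrieved chunks into the system message, cap history at max_turns."""
    context = "\n\n".join(c["text"] for c in chunks)
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nContext:\n{context}"}]
    messages += history[-max_turns:]  # only the last 10 turns survive
    messages.append({"role": "user", "content": user_query})
    return messages
```

The resulting list is what a LiteLLM `acompletion` call would take as its `messages` argument.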
Your data, your vectors
Nemesis audited this section. Multi-tenancy isn't a feature bolted on top — it's a column on every table and a payload filter on every vector. The LLM provider keys you bring are encrypted at rest; the widget token never authenticates more than one bot profile; the webhooks you receive are signed before you have to trust anything.
Every table carries a tenant_id. Every query is automatically scoped. Cross-tenant FK references are validated before creation, not after the leak.
A single minirag_chunks collection, but every search is filtered by tenant_id + bot_profile_id. Vectors for tenant A are unreachable from tenant B's token.
Per-bot provider credentials stored Fernet-encrypted in encrypted_credentials. Read schemas expose has_credentials: bool — the ciphertext never leaves the database.
API tokens shown exactly once at creation, then hashed. Passwords hashed with Argon2. No plaintext in logs, no secrets in code.
Every outbound event is signed SHA-256 with the per-webhook secret. Your receiver verifies; MiniRAG doesn't ask you to trust the transport.
Docker Compose on your VM, Traefik-fronted, your Postgres, your Qdrant, your Redis. Nothing leaves the tenant perimeter you deploy.
BYOK: OPENAI_API_KEY at process scope, or per-bot Fernet-encrypted credentials. Nothing touches nyxCore infrastructure; MiniRAG runs entirely on the box you deploy.
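On the receiving end, the webhook signing contract above boils down to recomputing the digest and comparing in constant time. A sketch, assuming the signature is an HMAC-SHA256 over the raw request body sent as a hex string — the header name and exact scheme are assumptions to verify against the repo's webhook docs:

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Always verify against the raw bytes before JSON-decoding; re-serialised JSON will not reproduce the digest.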
Grounded or silent
A RAG system that hallucinates on a miss is a chatbot with extra steps. MiniRAG's orchestrator treats retrieval as a gate — not as decoration for a model that will answer anyway. Citations ship with the response; the refusal path ships with the product.
Every answer is grounded in top-k chunks retrieved from Qdrant — by default five. The orchestrator passes them into the prompt as context; the UI surfaces the source rows.
If the retrieved context is empty or below the similarity floor, the bot says so instead of filling the gap with a hallucination. Refusal is a feature, not a failure.
When a bot has active nyxCore sources, chunks come back tagged [MANDATORY] or [GUIDELINE]. The LLM sees which rules override which suggestions — Aletheia wrote that contract.
Normalised text, 512/64 chunk/overlap, deterministic IDs. The same document ingested twice doesn't double your retrieval distribution.
Paired with Ipcha Mistabra, MiniRAG inherits the 0.15 % hallucination floor — adversarial red teaming runs before deployment, not after the first support ticket.
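The refusal gate described above — answer only when retrieval clears the similarity floor — can be sketched like this. The 0.35 threshold and the refusal text are made-up illustrations, not MiniRAG defaults:

```python
REFUSAL = "I don't have a grounded answer for that in my sources."

def gate(hits, floor=0.35):
    """Keep hits at or above the similarity floor; refuse if none survive."""
    grounded = [h for h in hits if h["score"] >= floor]
    if not grounded:
        return {"refused": True, "answer": REFUSAL, "sources": []}
    return {"refused": False, "sources": grounded}
```

The key design point is that the gate runs before the LLM is called, so a miss never reaches a model that would answer anyway.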
What this is not
Ipcha Mistabra wrote this section. Before you drop the script tag onto production, know what MiniRAG refuses to be.
Disclosure
MiniRAG has one chat surface — the widget — and an API. It won't hand you a drag-and-drop flow designer, and it won't try to. Configure bots via JSON, ship in minutes, skip the UI theatre.
Disclosure
We run on Qdrant. We don't reinvent ANN. If you already have a vector store you trust, MiniRAG is the wrong layer — go read the Qdrant client code and build your own orchestrator.
Disclosure
Retrieval serves the conversation, not the other way round. If you want a Google-box for your docs, use a search engine. MiniRAG is grounding for an assistant, nothing more.
Quick start
Python 3.11+, Docker, and one OpenAI (or Anthropic, or Gemini) key for the model you pick. That's the whole prerequisite list. The compose file brings Postgres 16, Qdrant 1.13, Redis 7, the API, and the ARQ worker — boot in under two minutes on a laptop.
# Clone & configure
git clone https://github.com/nyxCore-Systems/mini-chat-rag
cd mini-chat-rag
cp .env.example .env
# Set ENCRYPTION_KEY (Fernet), JWT_SECRET_KEY, OPENAI_API_KEY

# Bring up the whole stack
docker compose up -d
docker compose exec web alembic upgrade head

# Sanity check
curl http://localhost:8000/v1/system/health

# Dashboard at /dashboard — Alpine + Tailwind, no build step
open http://localhost:8000/dashboard
Widget URL
/dashboard/widget/minirag-widget.js
Vector collection
minirag_chunks
Chunking
512 / 64 overlap
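The 512/64 figures above pair with deterministic chunk IDs, which is what makes re-ingestion idempotent. A sketch — whether the sizes are characters or tokens, and the exact ID recipe, are assumptions; the point is only that the same text at the same offset always yields the same ID, so an upsert overwrites rather than duplicates:

```python
import hashlib

def chunk(text: str, size: int = 512, overlap: int = 64):
    """Slide a fixed window with overlap; derive each chunk ID from content + offset."""
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + size]
        if not piece:
            break
        # Deterministic ID: same source text + offset -> same ID on re-ingest.
        cid = hashlib.sha256(f"{start}:{piece}".encode()).hexdigest()[:16]
        yield {"id": cid, "start": start, "text": piece}
```

Ingesting the same document twice then produces identical IDs, so the vector store's upsert keeps the retrieval distribution unchanged.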
Before you paste the script into production
Scope the bot-specific API token to one bot profile, put MiniRAG behind your own TLS terminator (the repo ships configs for both Traefik and Caddy), and run the Postman collection before you expose the widget. Five minutes to embed means five minutes of integration — not five minutes of calibration.
Metis says: ingest one source, watch it go to status=ready, then expose the widget. Not before.