asman.malikov_

System · ai

Grounded RAG assistant running in production

AI Automation · AI Development Culture · Go · Astro · OpenRouter · RAG · SSE · MCP

site content ──► knowledge base (per-lang, cached) ──► retrieval ──► guarded LLM (OpenRouter) ──► cited answer
                                                                     │
                                            prompt-injection defense · anti-fabrication · history sanitization

Problem

Visitors want a fast, honest answer on whether there's a fit — but generic chatbots hallucinate, leak their system prompt, and can be hijacked by instructions hidden inside user messages (prompt injection).

Approach

Built a grounded RAG assistant that answers using only a per-language knowledge base compiled from the site's own content. It resists prompt injection (ignores instructions embedded in user messages), refuses to fabricate metrics, clients, or availability, sanitizes conversation history, and runs a structured lead-qualification flow. Model routing and knowledge-base caching keep per-conversation cost bounded.

Result

A live, self-hosted production LLM system — not a demo — serving real visitor traffic on this site, grounded enough to cite its sources and safe enough to refuse out-of-scope or injected requests.

Evidence

Running live on this page — open the assistant panel and try it, including a prompt-injection attempt.

Available for: live demo

The engineering is in the guardrails, not the model call: grounding on retrieved context only, prompt-injection resistance, anti-fabrication rules, and a routed, cached model path that keeps cost predictable. Go’s goroutines make handling many simultaneous and streaming requests straightforward, and the design is SSE- and MCP-ready.
