DiaWise: AI-Powered Diabetes Education Assistant
A solo-built AI assistant that gives people affected by diabetes clear, source-backed answers drawn from curated medical literature. It uses a RAG pipeline with real-time streaming, a two-mode adaptive response system (Learning vs. Newly Diagnosed), and a four-layer safety architecture. Built as a deliberate product exercise in a domain where safety and trust requirements are substantially higher than for a typical AI product.
Challenge
537 million adults worldwide live with diabetes. For the newly diagnosed, the gap between leaving a doctor's office and feeling genuinely informed can be months wide. Most fill it with Google and Reddit, where misinformation is common. Existing tools fail to separate two distinct user types: newly diagnosed patients who are emotionally overwhelmed and need reassurance in plain language, and lifelong learners or caregivers who want clinical depth and evidence. Most health chatbots treat these as the same user. Designing around that distinction, and doing so safely, became the central product decision.
Approach
- Binary mode toggle (Learning vs. Newly Diagnosed): Mode is a proxy for emotional state, not knowledge level. A binary toggle removes calibration burden and covers the majority of real use cases. The mode is injected into every LLM call, controlling tone at the system prompt level.
- Four-layer safety architecture: Emergency keyword scan (13 crisis terms trigger a 999/911 directive in under 1ms, bypassing the LLM entirely) → system prompt scope constraints → frontend disclaimer on every response → hallucination prevention through structured citation parsing. The emergency fast-path was the first thing built and the last thing that could ever be removed.
- Radical source transparency: Every response surfaces which documents were retrieved and whether the answer came from the knowledge base or general LLM knowledge. For a health product, this is an ethical requirement, not a feature.
- No authentication: Target users skew older and less digitally confident. Removing sign-up friction was a deliberate conversion decision. Chat history stored in localStorage delivers 90% of the persistence benefit at none of the privacy cost.
- Follow-up chips: The LLM outputs a structured FOLLOWUPS block in every response, parsed and rendered as tappable chips, addressing the 'what do I not know to ask?' problem for newly diagnosed users and improving session depth.
- RAG over pure LLM: Fixed a critical bug discovered in Sprint 1 where the RAG chain was instantiated but never called, so every response had been generated from pure LLM knowledge with no retrieval. Fixing one line transformed response quality. Integration tests for the critical path are not optional.
Outcome
- Shipped a functioning RAG-powered health assistant proving that safety and trust requirements can be designed into product architecture rather than added post-launch.
- Streaming token-by-token responses reduced perceived latency dramatically without changing actual LLM generation time, validating that in a health context, a blank screen while a user worries is an active harm, not just a UX inconvenience.
- Personalisation via patient context block (name, age, gender) changed the emotional experience of the product without changing its information content, delivered at the cost of two localStorage reads and a string prefix.
- Learnt that a system prompt is a product requirement: the original prompt had three contradictory strategies that produced inconsistent output. Rewriting it as a single clear strategy with explicit fallback ordering was the highest-leverage change.
- Built a complete CSS design system from scratch (60+ custom properties, glassmorphism panels, spring animations, responsive layout) with no framework, a deliberate decision to understand constraints rather than abstract them away.
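The streaming point above rests on a simple mechanism: the first token reaches the user immediately instead of after full generation, so total time is unchanged but time-to-first-content collapses. A minimal generator sketch (the transport, e.g. SSE or WebSocket, is orthogonal and omitted):

```python
from collections.abc import Iterator

# Illustrative sketch only: a generator standing in for a streaming LLM
# endpoint. The UI renders each chunk on arrival, so the screen is never
# blank while the rest of the response is still being generated.
def stream_tokens(full_response: str) -> Iterator[str]:
    """Yield one whitespace-delimited token at a time."""
    for token in full_response.split():
        yield token + " "
```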