there's something that's been bothering me for a while, and i couldn't quite name it until recently. i spent months designing PALACE — a memory architecture for high-stakes domains, legal, medical, regulatory, the places where hallucination isn't a benchmark failure but an actual liability. parallel retrieval layers, citation verification loops, reverse provenance tracing, a chief model synthesizing the whole thing. i'm proud of what it does. but reading it back now, i keep running into the same uncomfortable thought: all of it is built on top of a brain that was never trained to do any of it.
the sticky note problem

every RAG system ever built has the same design: you retrieve externally, you shove the retrieved stuff into the context window, and the model generates from whatever landed there. the model's weights, its actual architecture, have no idea retrieval is happening. it just sees a context window that now has extra text in it. that's not memory. that's a sticky note taped to someone who's been told to read it and incorporate it. PALACE fixes the verification problem on top of that. but the underlying model is still frozen. it doesn't know how to retrieve. it knows how to process what was retrieved for it. those sound similar. they're not. a model that was trained on retrieval builds internal representations that anticipate retrieved context. a model that wasn't trained on retrieval has to improvise every time. the improvisation usually works well enough for demos. in legal and clinical settings it's the difference between a system you can actually deploy and one you're always managing around.
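to make the sticky-note point concrete, here's the whole pattern in a few lines. this is a toy sketch, not any particular framework's API (the corpus, the keyword-overlap retriever, and call_model are all hypothetical stand-ins), but the structure is the point: retrieval happens entirely outside the model, which only ever sees a longer prompt.

```python
# toy illustration of the "sticky note" pattern: retrieval lives entirely
# outside the model. the corpus, the scoring, and `call_model` are
# hypothetical stand-ins, not any specific library's API.

CORPUS = [
    "retro conditions generation on retrieved chunks via cross-attention.",
    "standard rag concatenates retrieved passages into the prompt.",
    "frozen models are not trained to anticipate retrieved context.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # crude keyword-overlap scoring, standing in for a real dense retriever
    q = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda d: len(q & set(d.split())), reverse=True)
    return scored[:k]

def answer(query: str, call_model) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"passages:\n{context}\n\nquestion: {query}\nanswer:"
    # the model's weights have no idea retrieval happened; they just see
    # a longer prompt. that's the sticky note.
    return call_model(prompt)
```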
what deepmind quietly figured out in 2022

RETRO. retrieval-enhanced transformer. the core move was jarring in a quiet way: instead of adding retrieval at inference time, they baked it into pretraining. trained from scratch with retrieval over a 2 trillion token database, conditioning generation directly on retrieved chunks through something called chunked cross-attention. the result was GPT-3 level performance on standard benchmarks with 25x fewer parameters. sit with that number for a second. 25x fewer parameters. the implication is obvious and sort of devastating for how everyone's been building: storing knowledge as weights is wildly inefficient. we've been building bigger and bigger brains when the smarter move was building better access to external storage. RETRO's retrieval wasn't bolted on as a pipeline step. it was the architecture. the model learned to use retrieval because retrieval was there during training. this is the difference between someone who grew up reading in libraries and someone who grew up memorizing books alone and then got handed a library card at 25.
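for intuition, the shape of chunked cross-attention is roughly this. a simplified pytorch sketch, not deepmind's implementation: the real RETRO layer handles the causal offset between chunks, relative positions, and multiple neighbours per chunk, all of which this ignores. the only point is that each chunk of the decoder's hidden states attends directly to encodings of the text retrieved for that chunk, inside the forward pass.

```python
import torch
import torch.nn as nn

class ChunkedCrossAttention(nn.Module):
    """simplified sketch of retro-style chunked cross-attention.
    an illustration of the idea, not the published architecture."""

    def __init__(self, d_model: int, n_heads: int, chunk_len: int):
        super().__init__()
        self.chunk_len = chunk_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden, neighbours):
        # hidden:     (batch, n_chunks * chunk_len, d_model)  decoder states
        # neighbours: (batch, n_chunks, retrieved_len, d_model) encoded retrieval
        b, t, d = hidden.shape
        assert t % self.chunk_len == 0, "sequence length must be a multiple of chunk_len"
        n_chunks = t // self.chunk_len
        h = hidden.reshape(b * n_chunks, self.chunk_len, d)
        r = neighbours.reshape(b * n_chunks, -1, d)
        # every token in a chunk cross-attends to that chunk's retrieved text
        out, _ = self.attn(query=h, key=r, value=r)
        # residual connection, then restore the original sequence layout
        return (h + out).reshape(b, t, d)

# usage with made-up sizes:
cca = ChunkedCrossAttention(d_model=64, n_heads=4, chunk_len=16)
states = torch.randn(2, 64, 64)       # 2 sequences, 4 chunks of 16 tokens each
nbrs = torch.randn(2, 4, 32, 64)      # 32 encoded retrieved tokens per chunk
out = cca(states, nbrs)               # (2, 64, 64)
```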
what RETRO left on the table

RETRO solved "can the model retrieve?" it didn't touch the problem i spent the most time on in PALACE: can the model verify? hallucinations in high-stakes domains aren't just retrieval failures. they're failures to know what's unsupported. models trained to generate fluent text will generate fluent text even when the evidence doesn't back it. the confidence is completely decoupled from the grounding. that's the actual bug. and RETRO, elegant as it is, left this entirely untouched. it retrieves better. it still doesn't know when it's wrong. the whole RALM (retrieval-augmented language model) research line basically got stuck on one dimension: retrieval quality. better retrievers, better rerankers, better fusion into the forward pass. nobody trained a model where the objective included "you should also know whether your output is actually supported by what you just retrieved." that's the gap. it's a real one.
what generation 5 looks like

PALACE tries to fix verification at inference time. a loop that decomposes claims into atomic pieces, searches for citations, does entailment checking, traces provenance backward. it works. it costs $15 a query and takes two minutes. that's fine for a law firm doing M&A diligence. it's not fine for a hospital running thousands of queries a day on clinical decision support. the cleaner version — call it generation 5 — would train the model so that retrieval, verification, and attribution are first-class training objectives, not post-hoc pipeline engineering. concretely, during pretraining the model doesn't just predict the next token. it also learns:
- whether each claim it generates is actually supported by what it retrieved
- which part of the retrieved corpus a given output traces to
- when the honest answer is "the evidence doesn't support this" rather than a confident confabulation
not three fine-tuning tasks slapped on top. baked into the pretraining loop the same way RETRO baked retrieval in. you'd need claim-evidence training datasets at scale, a different loss function, probably a separate verification head in the architecture. i've been calling this Certified Retrieval Language Models. the word "certified" matters because the claim isn't just "this model retrieves." it's "this model knows whether its retrieval actually supports what it's saying." that's a different capability. nobody's built it.
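to be clear about what "baked into the pretraining loop" would even mean mechanically, here's a rough sketch, all hypothetical: a verification head over pooled claim representations and a joint loss that weights claim-support classification alongside next-token prediction. nobody has trained this; the names, shapes, and the three-way label scheme are assumptions, not an existing recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VerificationHead(nn.Module):
    """hypothetical head: maps pooled per-claim representations to three
    labels (supported / unsupported / evidence-absent). a sketch of the
    proposed objective, not a component that exists in any model today."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 3)

    def forward(self, claim_states):          # (batch, n_claims, d_model)
        return self.proj(claim_states)        # (batch, n_claims, 3)

def crlm_loss(lm_logits, targets, claim_logits, claim_labels, alpha: float = 0.5):
    """joint objective: next-token prediction plus per-claim support
    classification, weighted by alpha. an attribution term (which retrieved
    chunk a claim traces to) would be a third component, omitted here.

    lm_logits:    (batch, seq, vocab)   targets:      (batch, seq) long
    claim_logits: (batch, n_claims, 3)  claim_labels: (batch, n_claims) long
    """
    lm = F.cross_entropy(lm_logits.transpose(1, 2), targets)
    verify = F.cross_entropy(claim_logits.transpose(1, 2), claim_labels)
    return lm + alpha * verify
```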
why nobody has done this

partly the infrastructure. training retrieval-native models from scratch requires building things most labs skip because fine-tuning a frozen model is cheaper and faster. adding verification objectives on top of that means you also need large-scale claim-evidence datasets that don't really exist in structured form yet. partly the benchmark problem. we don't have good evals for "did the model correctly identify that this claim was unsupported?" harder to measure than perplexity. labs optimize for what they can measure. and partly because inference-time compute is getting so cheap that people keep building PALACE-style pipelines instead of rethinking the training. which i get. i built the pipeline. but there's a ceiling to that approach and it shows up precisely in the domains where it matters most.
PALACE as the prototype

in retrospect, PALACE is a spec for what a CRLM should do natively. every layer in the pipeline is a capability that should eventually live in the model's weights instead of the surrounding engineering. the structure-preserving retrieval is a training objective. the citation verification loop is a training objective. the reverse provenance tracing is an architecture feature. the chief council synthesizing conflicting evidence streams — that's an executive reasoning capability that a sufficiently trained model should develop internally, not something you need to orchestrate with prompts. this is how a lot of the best ideas in ML have progressed. someone builds an ugly inference-time pipeline that proves the capability is real and worth having. then someone trains it end-to-end and the pipeline collapses into the weights. RAG → RETRO followed that arc. PALACE → CRLM would follow the same one.
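for contrast, the inference-time version of that loop, the thing the pipeline proves before it collapses into the weights, looks schematically like this. every helper here (decompose_claims, search_citations, entails) is a hypothetical stand-in for an llm call, a retriever, and an entailment model; this is the shape of the approach, not PALACE's actual code.

```python
def verify_answer(answer: str, corpus_index, decompose_claims,
                  search_citations, entails):
    """schematic of an inference-time verification loop.
    every callable passed in is a hypothetical stand-in."""
    report = []
    for claim in decompose_claims(answer):             # atomic claims
        evidence = search_citations(claim, corpus_index)
        supported = any(entails(passage, claim) for passage in evidence)
        report.append({
            "claim": claim,
            "supported": supported,
            # provenance: keep the passages so a human can trace the claim back
            "evidence": evidence if supported else [],
        })
    # the expensive part: each claim triggers its own retrieval and entailment
    # calls, which is where the per-query cost and latency come from
    return report
```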
i don't know when this gets built. the training data problem is real, the benchmarks don't exist, and the infrastructure requirements are serious. but the direction feels clear enough that i wanted to write it down before someone at a frontier lab publishes it with a different name. the field has spent three years getting better at retrieval quality and generation fluency. the third thing is still missing: a model that genuinely knows what it doesn't know. not as a prompted behavior. as a trained one. that's the next wall.