Legal AI Has a Design Problem, Not a Hallucination Problem

Hallucinations in legal AI are a workflow problem, not an AI problem. Separating research from drafting eliminates fabricated cases, wrong citations, and misapplied authority.

Tommy Eberle

The conventional wisdom in legal AI is that hallucinations are an AI problem that requires an AI solution. Better models, bigger databases, smarter verification. We think that's wrong. Hallucinations are a workflow problem, and the fix has nothing to do with the AI.

But before we get to the fix, let's talk about what hallucinations actually are in the context of legal work. There are several distinct types, and the differences between them matter.

The Hallucination Spectrum

1. Completely Fabricated Cases

This is the one everyone talks about. The AI invents a case out of thin air. Smith v. Johnson, 342 F.3d 891 (2d Cir. 2019). These "cases" sound real and look real, but they are completely made up.

This is the most egregious type, and it's what got lawyers sanctioned in Mata v. Avianca and elsewhere. It's also, frankly, the easiest to prevent.

2. Real Case, Wrong Citation

The AI knows the case exists but gets the citation wrong. It might cite something like:

Marbury v. Madison, 123 U.S. 1111 (1963)

The actual citation is Marbury v. Madison, 5 U.S. 137 (1803). The case is real, but the volume, page number, or year is fabricated. If you looked it up, you'd find nothing at that citation.
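This failure mode is mechanically checkable. As a rough illustration, here is a minimal Python sketch that validates a citation string against a table of citations a human has already verified. The regex and the lookup table are stand-ins, not a real Bluebook parser or a real legal database:

```python
import re

# Stand-in for a human-verified lookup table: case name -> correct citation.
VETTED_CITATIONS = {
    "Marbury v. Madison": "5 U.S. 137 (1803)",
}

# Naive pattern for "Name v. Name, Vol Reporter Page (Year)" citations.
# Real citation formats are far messier; this is a sketch, not a parser.
CITE_RE = re.compile(r"^(?P<case>.+?),\s*(?P<cite>\d+ .+ \(\d{4}\))$")

def check_citation(raw: str) -> str:
    m = CITE_RE.match(raw.strip())
    if not m:
        return "UNPARSEABLE: route to a human"
    case, cite = m.group("case"), m.group("cite")
    expected = VETTED_CITATIONS.get(case)
    if expected is None:
        return f"UNKNOWN CASE: {case!r} is not in the vetted table"
    if cite != expected:
        return f"WRONG CITATION: expected {expected!r}, got {cite!r}"
    return "OK"

print(check_citation("Marbury v. Madison, 123 U.S. 1111 (1963)"))
# WRONG CITATION: expected '5 U.S. 137 (1803)', got '123 U.S. 1111 (1963)'
```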

3. Right Case, Right Citation, Wrong Application

This is the sneaky one. The case is real. The citation is correct. But the AI misuses it. This can happen in a few ways:

  • The case is from a different jurisdiction and has no binding authority
  • The facts are nothing like the facts of your case
  • The AI fabricates or distorts a quote

Here's a concrete example. The real quote from Marbury v. Madison is: "It is emphatically the province and duty of the judicial department to say what the law is." An AI might generate: "It is emphatically the province of the judiciary to ensure that no party is denied their day in court." That sounds like something a court would say. It's close enough to the real language that you might not catch it. But it's wrong, and opposing counsel will.

4. The Case Is No Longer Good Law

The AI cites a case that was real and correctly cited, but has since been overturned, superseded, or otherwise abrogated. This isn't a "hallucination" in the traditional sense, but the practical effect is the same: you've cited authority that doesn't support your argument.

Why This Happens

AI is not a database.

Large language models like ChatGPT and Claude were trained on the internet. Whatever they "know" about case law is more or less an accident of what appeared in their training data. Ask them about Marbury v. Madison or Brown v. Board of Education and they'll probably nail it, because those cases are discussed on thousands of web pages. The 1L canon is all over the internet.

But ask about recent appellate decisions in a specific practice area? A 2025 Second Department opinion on whether a defense IME doctor addressed exacerbation of a preexisting cervical spine injury? The AI probably hasn't seen it. And if it hasn't seen it, it will do what language models are literally designed to do: generate something that sounds right based on patterns. That's where the fabrications come from.

The Problem With Combining Research and Drafting

Most AI legal tools ask the AI to find the law and apply it at the same time.

Research is finding the statutes, cases, and rules that support your client's position. Drafting is taking a known set of legal authorities and applying them to the facts of your case.

When you ask an AI to do both at the same time, you're asking it to pull cases from memory and then reason about them. That's where every type of hallucination on the spectrum comes into play. The AI might invent a case. It might get the citation wrong. It might grab a real case and misapply it. It might cite something that's been overturned.

Some tools try to solve this by connecting an LLM to a legal database like Westlaw or Lexis. This eliminates fabricated cases, but it introduces a subtler problem. The AI is now deciding which cases matter. An attorney doing their own research on Westlaw might run 15 different queries, refining as they go based on what they're finding and what they're not finding. A RAG tool runs whatever searches its algorithm generates and presents the results as if they're comprehensive. You don't know what searches it ran. You don't know what it missed. And the output looks polished and complete, which makes those blind spots harder to catch than an obvious hallucination. A fabricated case is easy to spot. A plausible but incomplete selection of authority is not.

If you do the research first and hand the AI a bounded set of vetted authorities, you dramatically shrink the surface area for hallucinations. The AI has no reason to make up a case if you've given it the only cases it's allowed to cite. It's far less likely to get a citation wrong if you've provided the correct citations upfront. It won't misquote a case if you've pulled the key quotes ahead of time. And it won't cite bad law if you've already confirmed everything is current. Across all four failure modes on the spectrum, pre-vetted research cuts off hallucinations at the source.
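To make "bounded set of vetted authorities" concrete, here is a minimal sketch of the workflow split in Python. The prompt builder and the commented-out LLM call are placeholders for whatever model API you use, the case names come from the 5102(d) example later in this post, and the naive regex would miss plenty of real-world citation forms. The membership check at the end is the actual point:

```python
import re

# The only cases the model may cite. In the workflow described above, an
# attorney has already verified each one (it exists, the citation is right,
# the quotes are accurate, it's still good law) before drafting begins.
VETTED_AUTHORITIES = {
    "Dufel v. Green",
    "Petric v. Retsina Cab Corp",
    "Cosme-Almandoz v. Alejandrino",
}

# Naive "X v. Y" matcher; misses lowercase party words like "ex rel."
CASE_NAME_RE = re.compile(
    r"[A-Z][\w.'-]+(?: [A-Z][\w.'-]+)* v\. [A-Z][\w.'-]+(?: [A-Z][\w.'-]+)*"
)

def build_prompt(facts: str) -> str:
    allowed = "\n".join(sorted(VETTED_AUTHORITIES))
    return (
        "Draft the argument using ONLY the following authorities. "
        "Cite nothing else.\n\n"
        f"Authorities:\n{allowed}\n\nFacts:\n{facts}"
    )

def citations_outside_vetted_set(draft: str) -> set[str]:
    """Any 'X v. Y' name in the draft that isn't in the vetted set."""
    return set(CASE_NAME_RE.findall(draft)) - VETTED_AUTHORITIES

# draft = your_llm_call(build_prompt(facts))  # placeholder for any LLM API
# if citations_outside_vetted_set(draft):
#     reject the draft: regenerate, or escalate to the attorney
```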

Any Off-the-Shelf AI Can Already "Do the Law"

There's a misconception in legal tech that you need some kind of specialized, fine-tuned legal AI to handle legal work. You don't.

Think about what a motion actually involves. Take a serious injury threshold motion under Insurance Law § 5102(d), which is bread and butter for NY insurance defense. When you break it down, a 5102(d) defense motion is a series of discrete tasks.

  1. Read the bill of particulars and extract which body parts and serious injury categories are claimed. That's text extraction.
  2. Review the IME report and check that every claimed body part was addressed. Check whether the expert's own ROM findings contradict the expert's conclusions, and flag any deficits of 20% or more. That's comparison and consistency checking.
  3. Comb through deposition testimony for admissions that undermine the 90/180 claim (the plaintiff kept working, was never confined). Look for prior accidents. Look for statements about when treatment stopped and why. That's text analysis with specific targets.
  4. Compare pre- and post-accident medical records to determine whether the defense expert reviewed and compared the imaging, and whether they addressed exacerbation of preexisting conditions. That's comparison.
  5. Check for gaps in treatment and whether there's an explanation (no-fault benefits terminated, for example). That's timeline analysis.
  6. Match the facts to the right case law from a vetted library of authorities. The expert didn't identify the measurement method? That maps to Cosme-Almandoz v. Alejandrino. The expert's own findings show 20%+ ROM loss? That's Dufel v. Green. Exacerbation not addressed? That's Petric v. Retsina Cab Corp. This is pattern matching against a known set of cases (sketched in code after this list).
  7. Draft the argument, applying the facts to the matched cases. That's structured writing.

Not a single one of those tasks requires "legal training" or a specialized legal AI model. They require careful reading and reasoning about text. That is exactly what large language models are built to do.
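To show how mechanical a couple of those steps are, here is a hedged sketch of step 2's threshold flag and step 6's case matching. The 20% threshold and the flag-to-case pairings come from the examples above; the data shapes, flag names, and numbers are illustrative assumptions:

```python
ROM_LOSS_THRESHOLD = 0.20  # flag deficits of 20% or more (step 2)

# Attorney-built mapping from a detected fact pattern to a vetted authority,
# mirroring the pairings in step 6 above. Flag names are our invention.
FLAG_TO_CASE = {
    "no_measurement_method": "Cosme-Almandoz v. Alejandrino",
    "rom_loss_20_percent_plus": "Dufel v. Green",
    "exacerbation_not_addressed": "Petric v. Retsina Cab Corp",
}

def rom_flags(deficits: dict[str, float]) -> set[str]:
    """deficits maps body part -> measured ROM loss as a fraction of normal."""
    if any(d >= ROM_LOSS_THRESHOLD for d in deficits.values()):
        return {"rom_loss_20_percent_plus"}
    return set()

def matched_authorities(flags: set[str]) -> dict[str, str]:
    """Step 6: map each detected flag to the vetted case that addresses it."""
    return {f: FLAG_TO_CASE[f] for f in sorted(flags) if f in FLAG_TO_CASE}

# Illustrative inputs: a 25% cervical deficit in the IME's own findings,
# plus a flag set earlier by the records-comparison step.
flags = rom_flags({"cervical spine": 0.25, "lumbar spine": 0.10})
flags.add("exacerbation_not_addressed")
print(matched_authorities(flags))
# {'exacerbation_not_addressed': 'Petric v. Retsina Cab Corp',
#  'rom_loss_20_percent_plus': 'Dufel v. Green'}
```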

How We Think About This at DocketDrafter

We built DocketDrafter around this exact principle of separating research from drafting.

All we need to get started is three NYSCEF index numbers from motions your firm has already filed. From those three real examples of your work product, we extract the cases you cite, the arguments you make, and the way you structure your motions, and encode it all into a structured guide. When a new case comes in, attorneys upload the new facts and our system applies the attorney-approved law. The AI doesn't do research. It doesn't pull cases from memory. Every citation comes from your own vetted work product.
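Conceptually, each authority in a guide is a small verified record. Here's a simplified sketch of what one entry could look like; the field names are illustrative for this post, not the actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class GuideAuthority:
    """One vetted case in a firm's guide (simplified, illustrative fields)."""
    case_name: str
    citation: str                 # verified at guide-build time
    key_quotes: list[str]         # pulled from the opinion, not from memory
    fact_patterns: list[str]      # the flags this case maps to
    good_law_checked_on: date     # when currency was last confirmed
    source_index_no: str          # the NYSCEF filing it was extracted from

entry = GuideAuthority(
    case_name="Dufel v. Green",
    citation="<verified citation>",            # filled in by the attorney
    key_quotes=["<verified quote>"],
    fact_patterns=["rom_loss_20_percent_plus"],
    good_law_checked_on=date(2025, 1, 15),     # illustrative date
    source_index_no="<NYSCEF index no.>",
)
```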

This means we can validate against every type of hallucination on the spectrum. Is the case real? Yes, the attorney put it there. Is the citation correct? Yes, it was verified when the guide was built. Is it applied correctly? The system matches facts to cases based on the guide's structure, not the AI's imagination. Is it still good law? That's confirmed before anything goes into the guide.

Every factual claim in the output references specific paragraph numbers from affidavits, exhibit letters, and NYSCEF document numbers. The work product is fully auditable.
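In data terms, that means every sentence of the draft carries its provenance. An illustrative shape (again, the field names are a sketch, not a published format):

```python
# Illustrative: one factual claim from a draft, with its provenance attached.
claim = {
    "statement": "The IME did not address exacerbation of the preexisting "
                 "cervical spine injury.",
    "sources": [
        {
            "document": "IME report of the defense examiner",
            "paragraph": 14,          # illustrative paragraph number
            "exhibit": "C",           # illustrative exhibit letter
            "nyscef_doc_no": "<doc no.>",
        },
    ],
}
```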

DocketDrafter is sophisticated document assembly that executes playbooks that already work.