I Did Substantive Legal Research Without a Law Degree or a Westlaw Subscription. Here's How.

A software engineer's approach to legal research: treating court opinions as data, using free tools and AI for bounded tasks, and keeping human judgment in control at every step.

Tommy Eberle

I'm a software engineer. I've never been to law school. I don't have a Westlaw or Lexis subscription.

I'm also the founder and CEO of DocketDrafter, a motion drafting platform for high-volume NY defense firms. DocketDrafter does not do legal research. It does not find cases, run searches, or generate legal arguments. Every citation in every motion comes from the attorney's own prior work.

We did not start with that thesis. We discovered it.

When we started building DocketDrafter, we spent hours and hours on NYSCEF, New York's electronic filing system, looking at what attorneys actually file. We found pockets of practice where the volume was staggering. We found many examples of single attorneys adding over a hundred new cases a month, filing the same types of motions over and over. We wanted to understand what it actually takes to write winning summary judgment motions in these areas, so we started doing the legal research ourselves. We started by studying the briefs that won and the ones that lost. Then, we wanted to figure out how to write a good brief in the first place.

That is when we realized that figuring out what wins and applying it at scale are two completely different problems. The research is hard, judgment-intensive work. But once you know what wins, the per-case drafting is assembly. Same arguments, new facts, every time. Most legal AI tools try to do both at once. We noticed that in high volume litigation, the assembly takes up far more time than the research. We decided to only do the assembly, and to do it well.

The irony is that this decision has forced me personally to do a massive amount of legal research. I've read many appellate opinions across multiple practice areas and published several articles sharing my findings.

Each of these required reading dozens of appellate decisions, grouping them by issue, extracting key holdings and quotes, and synthesizing them into something useful for practitioners. The 90/180-day article alone covers six decisions across three appellate departments.

How does a software engineer do legal research? By treating it as a data problem, of course.

Court Opinions Are Data

Legal research, at its core, is finding primary sources, reading them, and organizing what they say. That is a data pipeline. The inputs are court opinions. The transformations are filtering, grouping, and extracting. The output is structured analysis.

The traditional way to do this is Westlaw: type in a query, read cases one at a time, follow citations, refine your search, repeat. This works well when you need to find one case that supports one argument. It does not work well when you want to answer a systematic question like "across all four appellate departments, what are the most common reasons defense motions fail on the 90/180-day serious injury category?"

Answering it requires a set of recent appellate decisions that address the 90/180-day category on summary judgment. The challenge is finding those opinions.

The Pipeline

Here is the actual process I use to produce legal research. Every article linked above was built this way.

Step 1: Read Real Briefs

I start by reading actual motions filed on NYSCEF. Not case law. Not secondary sources. Real briefs that real attorneys filed in real cases.

For the 5102(d) serious injury research, I read through several summary judgment motions filed by different firms. After a few, patterns emerge: they all cite Insurance Law 5102(d), they almost all address the 90/180-day category, they cite many of the same cases, and they follow a similar structure.

AI is useful here. Feed it a few real briefs and ask it to surface patterns. This is exactly what language models are designed for, and it is easy to verify (command-F the brief and check). But the decision of which briefs to read and what patterns matter is mine.

Step 2: Write Search Queries

From those patterns, I write search queries for CourtListener, a free legal database with an excellent API. I'm usually looking for recent decisions from NY state courts, so I set the court filters to the Appellate Division, the Appellate Term, and Supreme Court, and I always include unpublished decisions, since recent state court opinions often fall in that category.

This step is an art. I'll sometimes have AI help brainstorm query variations, but the actual query design is intentionally manual. I run the search, scroll through results in the CourtListener UI, click into opinions to confirm relevance. I'll copy full opinions into a fresh Claude or ChatGPT conversation along with one of the real briefs I found on NYSCEF and ask: "Is this opinion relevant to the legal issues in this brief?" What I'm trying to do is build a set of cases that are all relevant and filter out anything that is not. The less noise, the better everything works downstream.

If the results are not right, I adjust the filters, add terms, try different approaches. But notice: even though AI is helping me filter, I am in control the whole time. I am using AI as a tool to read faster, not having the AI decide what matters.
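As a sketch, a search like this boils down to a URL against CourtListener's REST search endpoint. The parameter names and court identifiers below are my reading of the API docs, not guaranteed to match the current API, and the query itself is illustrative:

```python
# Sketch of a CourtListener opinion search. Parameter names
# ("stat_Unpublished", court IDs like "nyappdiv") are assumptions
# based on the API docs, not verified against the live API.
import urllib.parse

BASE = "https://www.courtlistener.com/api/rest/v4/search/"

def build_search_url(query, courts, filed_after):
    """Build a search URL for opinions ('type=o') from the given courts."""
    params = {
        "q": query,
        "type": "o",                  # opinions, not dockets or oral arguments
        "court": " ".join(courts),    # space-separated court identifiers
        "filed_after": filed_after,
        "stat_Unpublished": "on",     # include unpublished decisions
    }
    return BASE + "?" + urllib.parse.urlencode(params)

url = build_search_url(
    '"90/180" "serious injury" "summary judgment"',
    ["nyappdiv", "nyappterm", "ny"],  # hypothetical NY court identifiers
    "2020-01-01",
)
```

The point is that the query is a first-class artifact I can version, tweak, and re-run, not a string typed into a search box and lost.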

Step 3: Download the Dataset

Once I have a solid search, I download the opinions. I wrote a Python script that uses the CourtListener API to pull all the opinions from a search URL into a folder on my laptop. The opinions are HTML, which is an excellent format for language models to work with. The script tags each opinion with metadata: citation, court, date, docket number.
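The script itself is nothing fancy. Here is a minimal sketch of the save step, assuming CourtListener's JSON result shape (field names like `caseName` and `dateFiled` are my recollection of the API, not verified; the real script also paginates through search results and rate-limits):

```python
# Sketch: write one opinion's HTML plus a metadata sidecar to disk.
# The "result" dict mimics an assumed CourtListener search result;
# fetching it over the network is omitted here.
import json
import re
from pathlib import Path

def slug(name: str) -> str:
    """Filesystem-safe version of a case name."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def save_opinion(result: dict, out_dir: Path) -> Path:
    """Save opinion HTML and a JSON sidecar with the key metadata."""
    out_dir.mkdir(parents=True, exist_ok=True)
    base = out_dir / f"{result['dateFiled']}-{slug(result['caseName'])}"
    base.with_suffix(".html").write_text(result.get("html", ""), encoding="utf-8")
    meta = {k: result.get(k)
            for k in ("caseName", "court", "dateFiled", "docketNumber", "citation")}
    base.with_suffix(".json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return base
```

Keeping the metadata in a sidecar file next to each opinion means every downstream step can cite the exact court, date, and docket number without re-parsing the HTML.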

At this point I have a folder on my laptop with, say, 40 opinions. Each one is a verified, real court decision. This is my dataset.

If the AI also needs access to a statute or rule, I just go to an official source and copy-paste it into a text file. Insurance Law 5102(d) is published on the NY Senate website. The Federal Rules of Evidence are on Cornell Law. These are public documents. You do not need a Westlaw subscription to read the law itself.

Step 4: Group by Issue

Now I fire up Claude Code and point it at the folder. Time to organize.

For 5102(d) serious injury, there are many sub-issues: 90/180-day category, quantitative ROM thresholds, failure to address all claimed body parts, exacerbation of preexisting conditions, treatment gaps, and more. I want to group the opinions by which issues they substantively address. I should note that I did not know about all the categories at first. Claude helped me define and refine them.

There is a critical technical detail here. If you feed all 40 opinions to the AI at once, the quality degrades because it is using too much context. I have it work on 2-4 opinions at a time. It reads each one, identifies which issues it addresses, and adds it to a running grouping. I keep a checklist of opinions it has already read and loop through until everything is grouped.

This is the "data engineering" part. Batch size matters. Context management matters. It's a balancing act: batches small enough to minimize errors, but large enough that the process doesn't take forever.
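The control flow is simple enough to sketch. `classify` below stands in for whatever model call does the per-batch reading (it is a hypothetical placeholder, not a real API); the point is the structure: small batches, one running grouping, and a checklist of what has been read:

```python
# Sketch of the batch-and-merge loop. classify(batch, issues) is a stub
# for a model call that returns {opinion_path: [issues it addresses]}.
def batches(items, size=3):
    """Yield successive fixed-size slices (2-4 opinions at a time)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def group_opinions(opinion_paths, classify, issues):
    """Merge per-batch classifications into one running grouping."""
    grouping = {issue: [] for issue in issues}
    done = set()                                  # checklist of opinions read
    for batch in batches(opinion_paths, size=3):
        for path, hits in classify(batch, issues).items():
            for issue in hits:
                grouping[issue].append(path)
            done.add(path)
    assert done == set(opinion_paths)             # nothing skipped
    return grouping
```

Each batch gets a fresh, small context; the merged grouping lives in plain Python, outside the model, so it never degrades as the dataset grows.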

Step 5: Synthesize

Once everything is grouped, I can write a real article or practitioner checklist. The opinions themselves tell you what needs to happen in winning briefs. A case where the defense lost because the IME report did not address the first 180 days tells you: your IME report must address the first 180 days.

During this step, I always have Claude re-read each opinion in full before writing about it (again, limiting to 2-4 at a time to keep the context clean). I have it pull exact quotes and include clickable links to CourtListener. The output is a draft article with every claim traceable to a specific opinion.

Step 6: Verify

I have Claude fact-check its own work. This sounds circular, but it works very well.

When writing the article, the AI is juggling a lot of context: dozens of opinions, the article structure, prior sections, the grouping analysis. The more context in play, the more likely errors become. From what I've seen, the quality cliff hits around 55% context usage. Past that point, hallucinations spike. I haven't run controlled experiments, but the pattern is reliable enough that I build my workflow around it.

But if I start a fresh conversation and give the AI a single, narrow task ("here is one paragraph from my article and here is the full opinion it references; verify that the quote is accurate, the holding is correctly characterized, and the citation is right"), I have dramatically reduced the context. The AI is now doing mechanical comparison, not creative synthesis.

The things I'm fact-checking all have concrete right-or-wrong answers:

  1. Are all quotes verbatim? This is text comparison. Language models are very good at this.
  2. Did we accurately represent the holding? This is IRAC analysis on a single case. Also something language models handle well.
  3. Is the citation correct? Does our article use the same citation that CourtListener shows? We are not asking the AI to "find" the right citation. We are telling it what the real one is and asking if it matches.
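Concretely, each fact-check round is just a narrow prompt assembled in a fresh conversation. A sketch (the wording is illustrative, not a magic prompt; the helper name is mine):

```python
# Sketch: build the single-task fact-check prompt for one paragraph
# against one full opinion. Pasted into a fresh conversation so the
# model carries no other context.
def verify_prompt(paragraph: str, opinion_html: str, citation: str) -> str:
    return (
        "You are fact-checking one paragraph of an article against the full "
        "text of the single opinion it cites.\n\n"
        f"PARAGRAPH:\n{paragraph}\n\n"
        f"OPINION ({citation}):\n{opinion_html}\n\n"
        "Answer three questions, each yes/no with a one-line reason:\n"
        "1. Is every quote in the paragraph verbatim from the opinion?\n"
        "2. Is the holding characterized accurately?\n"
        f"3. Does the paragraph cite the opinion as \"{citation}\"?\n"
    )
```

Note that the true citation is supplied to the model, not requested from it: the task is matching, never recall.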

After the AI fact-check, I read the article myself and verify quotes and holdings against the original opinions. I also cross-reference against the real briefs I started with. Since those briefs cite many of the same seminal cases, they serve as a sanity check on whether my research lines up with what practitioners actually rely on.

One thing this process does not cover is Shepardizing. Whether cited cases are still good law has to be verified outside this pipeline. That is a real limitation.

Why It Works: Bounded Questions on Curated Documents

The reason AI works well in this pipeline is that I never ask it to do open-ended legal reasoning. Every task is bounded:

  • "Is this opinion relevant to the issues in this brief?" (Yes/no on a specific document.)
  • "Which of these 12 sub-issues does this opinion address?" (Classification against a defined list.)
  • "Pull the key quote where the court states its holding on the 90/180-day category." (Extraction from a specific section.)
  • "Does this paragraph in my article accurately reflect this opinion?" (Comparison between two documents.)

These are all things language models are excellent at by design. None of them require "legal AI" or a fine-tuned model. I use Claude. I'm sure ChatGPT and Gemini work just as well. The model does not need to know the law. It needs to read carefully and answer specific questions about specific documents. That is exactly what language models were built to do.

The judgment lives with me. Which practice area to research. What search queries to write. Whether a case is actually relevant. How to frame the analysis for practitioners. Those decisions require understanding both the legal landscape and the audience. The AI never makes those calls.

I have never used Westlaw's or Lexis's AI tools. I do not know if they are good. But I suspect that when you ask them a question, the system tries to do everything I just described in a single pass: find cases, filter them, analyze them, and present a polished answer. That is convenient. But the attorney is not in the loop for any of the key decisions. Which cases did it find? Which did it miss? How did it decide what was relevant? The output looks authoritative, but you cannot see the pipeline that produced it.

When I research this way, I can see every step. I chose the search query. I reviewed the results. I verified relevance. I grouped the opinions. I wrote the synthesis. I fact-checked the output. At no point did the AI make a judgment call I could not inspect. I would even go so far as to say that the search query at the beginning is the most important step, and the one I would not feel comfortable outsourcing to AI. Everything else depends on a good search.

This approach requires programming. Most attorneys are not going to write Python scripts to download opinions from an API and batch them through Claude Code. But the point of this article is not to prescribe a workflow. It is to show that legal research, broken into its component parts, is a data engineering problem. And data engineering problems benefit from engineering.


About the Author

I'm Tommy Eberle, CEO and co-founder of DocketDrafter. Before starting DocketDrafter, I was a senior software engineer. If you want to discuss this research process or see how DocketDrafter applies the same principles to motion drafting, email me at tommy@docketdrafter.com.