I'm kicking off a series of posts on hallucinations, a topic I frankly find tiresome, as it consumes a plurality, if not a majority, of the oxygen in the legal AI discourse. But the problem is not a trivial one, and I've found that writing about it can be a useful segue into broader discussions about technology, information, and their impact on legal practice.
We'll start with Mata v. Avianca, the case that gave us the first story about a lawyer submitting fake case citations generated by ChatGPT. Although the story broke in spring 2023, it's still the case that comes up when you search for news about lawyers making up cases:
The case feels, in retrospect, like the tipping point when the profession realized it needed to account for AI: I remember a partner in my firm at the time sending the NYT article around in a staff-wide email; many courts issued rules and orders to either prohibit or limit the use of "generative artificial intelligence." Personally—despite the email—I don't believe I read so much as one news article about the case itself, but I did read and hear plenty of *discussion* about it. So, to make up for lost time, I decided to read the actual filings—from the complaint on down—in detail.
As Legaltech Hub noted in its 2024 wrap-up, reflecting on the impact of Mata:
When the first case involving an attorney submitting fake case citations generated by ChatGPT to a court hit the news in May 2023, it rocked the world—and not just the legal world, as the story was first widely reported by The New York Times. Many were quick to blame the tech, but eventually sentiment swayed to realizing this was a case of human error, not a tech failure. The incident was shocking since it flew in the face of all attorney professional and ethical obligations.
I agree that rather than blaming the technology, we should focus more on the humans. But the notion of "human error" is squishy and deserves to be unpacked. Let’s frame it more directly, and provocatively: Is the attorney sanctioned in Mata a bad lawyer?
“A highly sophisticated search engine”
Mata is a run-of-the-mill personal injury case brought by Steven A. Schwartz, who seems like a run-of-the-mill New York personal injury and worker's compensation plaintiff's attorney. In the case he's now infamous for, Mr. Schwartz, on behalf of Roberto Mata, sued Avianca (a Colombian airline) after a metal serving cart rolled down the aisle and struck Mr. Mata's knee in August 2019 on a flight from El Salvador to New York City.
In May 2020—before any lawsuit was filed—Avianca entered Chapter 11 bankruptcy proceedings in the United States. Under U.S. bankruptcy law, when an individual or corporate entity enters bankruptcy, an automatic stay halts all claims against it, and any new lawsuit filed outside the proceedings is void. Bankruptcy works out poorly for plaintiffs with claims against the debtor because, in general, tort creditors (i.e., lawsuit plaintiffs) are last in line to receive any assets. Since the point of bankruptcy is to distribute the assets of a party with more liabilities than assets, there is little or no money left over.
So, despite the stay being in place—or because his lawyer was unaware of the bankruptcy—Mr. Mata filed a complaint on July 7, 2020, and it was quickly dismissed as void because of the bankruptcy. Likely for the reasons given above, Mr. Schwartz didn't file a claim in the bankruptcy proceeding, and instead waited Avianca out. Avianca exited bankruptcy in December 2021, more than two years after Mr. Mata was injured. With the bankruptcy over, Mr. Mata filed his complaint again in February 2022. He filed in New York state court, which has a three-year statute of limitations for personal injury claims.
Upon filing, Avianca "removed" the case to federal court for the Southern District of New York. Under federal law, defendants to state court lawsuits may remove a case to federal court if the plaintiff could have filed the lawsuit in federal court (in legal terms: the federal court had "jurisdiction") in the first instance. Avianca removed because the case raised "federal question" jurisdiction, wherein federal courts have the power to hear cases dealing with issues of federal law. The specific federal question raised by Avianca was the Montreal Convention, an international treaty that provides uniform rules for awarding damages to injury victims on international flights.
Crucially, the Montreal Convention has a two-year statute of limitations. Thus, after removing the case, Avianca moved to dismiss it, arguing that Mr. Mata's lawsuit was filed outside the statute of limitations. In its brief, Avianca anticipated the obvious follow-up questions, arguing that (i) the filing of the previous lawsuit did not matter, and (ii) the bankruptcy stay did not pause (in legal terms: "toll") the statute of limitations. To save Mr. Mata's lawsuit, Mr. Schwartz needed to get up to speed on federal bankruptcy law and the Montreal Convention.
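To make the timing concrete, here is a minimal sketch of the date arithmetic. The specific days are illustrative assumptions (the filings have the exact dates; the months above are all we need), and the only point is the contrast between the two-year and three-year limits.

```python
from datetime import date

# Illustrative dates: the post gives only months, so the specific days are assumptions.
injury = date(2019, 8, 27)    # serving cart strikes Mr. Mata's knee (August 2019)
refiled = date(2022, 2, 2)    # complaint re-filed in New York state court (February 2022)

elapsed_years = (refiled - injury).days / 365.25
print(f"elapsed: {elapsed_years:.2f} years")                        # ~2.4 years

print("timely under New York's 3-year limit?", elapsed_years < 3)   # True
print("timely under the Montreal Convention's 2-year limit?",
      elapsed_years < 2)                                             # False, unless the stay tolled it
```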
Avianca filed its motion to dismiss in January 2023, roughly two months after ChatGPT went viral. In the sanctions filings, we learn that Mr. Schwartz's firm did not pay for access to Westlaw or Lexis legal research tools, which are widely considered the "best" research tools but are also cost-prohibitive for smaller firms. The firm instead had a subscription to a platform called Fastcase, which it used primarily to research New York state law. In fact, according to testimony from the firm's principal, the firm's access to federal case law had been inadvertently deactivated "for years" because of a billing error, which was only discovered after the ChatGPT incident.
Lacking another route, Mr. Schwartz turned to Google. Only then did he ask ChatGPT. As he explained at the sanctions hearing:
THE COURT: Now, did you prepare the memorandum of March 1, 2023?
MR. SCHWARTZ: Yes.
THE COURT: How did you go about finding the cases that you cited in your memoranda?
MR. SCHWARTZ: First, I went to Fastcase, which is the research tool that our office subscribes to. It did not have access to federal cases that I needed to find, so I began to attempt to try to find another source to find the cases. I tried Google. Again, I didn't have access to Westlaw or Lexis. And it had occurred to me that I heard about this new site which I assumed -- I falsely assumed was like a super search engine called ChatGPT, and that's what I used.
THE COURT: What did ChatGPT produce for you, sir?
MR. SCHWARTZ: First, I asked it questions about the topic that I was researching, in this case the Montreal Convention and the issue of statute of limitations, and then I asked it to provide case law, and it did.
THE COURT: Case law under the Montreal Convention or case law supporting the position you wanted to take in opposition?
MR. SCHWARTZ: Case law supporting the position I wanted to take in opposition, which was, how does the statute of limitations -- what is the statute of limitations according to the Montreal Convention? And then ultimately the issue of whether or not a bankruptcy tolls that statute of limitations.
THE COURT: So you were not asking ChatGPT for an object; you were asking them to produce cases that support the proposition you wanted to argue, right?
MR. SCHWARTZ: Right. First, I asked it for an analysis, and then I asked it for the cases.
THE COURT: What did it say when it gave you an analysis?
MR. SCHWARTZ: Well, it told me what the Montreal Convention stood for. It told me that in many cases the bankruptcy can toll the statute of limitations, what the statute of limitations was. And then I asked it for case law to support what its analysis -- what the analysis that was given to me was.
THE COURT: Did it say to you that a bankruptcy stay can toll the statute of limitations under the Montreal Convention?
MR. SCHWARTZ: Yes.
In his declaration submitted before the hearing, Mr. Schwartz wrote:
I had never used ChatGPT for any professional purpose before this case. I was familiar with the program from my college-aged children as well as the articles I had read about the potential benefits of artificial intelligence (AI) technology for the legal and business sectors.
At the time I used ChatGPT for this case, I understood that it worked essentially like a highly sophisticated search engine where users could enter search queries and ChatGPT would provide answers in natural language based on publicly available information.
I realize now that my understanding of how ChatGPT worked was wrong. Had I understood what ChatGPT is or how it actually worked, I would have never used it to perform legal research.
Ultimately, the court fined Mr. Schwartz and his firm $5,000. Contrary to the grave tone in which this case was received, the whole case reads to me like a farce: An international corporation uses an obscure international treaty and arcane bankruptcy proceedings to avoid paying a dollar on a very straightforward claim; meanwhile, the hapless passenger is represented by overmatched attorneys who do not even have the basic research tools needed to write a federal court brief.
A bad lawyer, or just a poor lawyer?
The details of this case suggest we should not be worried about a scourge of hallucinated citations. Even from the written transcripts, the reader can tell that Mr. Schwartz is contrite and embarrassed; he clearly understands that he misunderstood the purpose of ChatGPT and what it was capable of.
What is both more troubling and more interesting is that Mr. Schwartz apparently could not find the cases he needed—mired in an obscure world of bankruptcy and international aviation treaties, he tried Fastcase and Google and came up short. It's possible that he was embellishing for sympathy, but the fact that the firm's principal testified that, for years, the firm thought it was paying for federal case law access suggests there really was an access problem (and, probably, that the firm needs a good office manager).
Does this make Mr. Schwartz a bad lawyer? Attorney ethicists would note that—when presented with this novel situation—Mr. Schwartz could (and probably should) have brought in co-counsel who were prepared to handle the matter. At the very least, he should have notified his firm's principal that he needed access to this body of case law. Although we don't know exactly what he did or did not do, we can likely agree that Mr. Schwartz was not diligent in his research.
On the other hand, Mr. Schwartz faced a structural disadvantage in terms of information access. While his firm was overpaying Fastcase, his opponent hired an attorney who specializes in aviation litigation at a firm "best known for [its] expertise in aviation and aerospace." That firm not only knows what the Montreal Convention is—I had not heard of it until I read about this case—it has probably written an almost identical version of this statute-of-limitations argument before. And it probably uses Westlaw. So even if he was not completely diligent, the deck was stacked against Mr. Schwartz and his client.
Thus, rather than deciding whether Mr. Schwartz did a "good" or "bad" job, we might describe his efforts as follows:
In practice, of course, this is more nuanced. For one, there is selection bias. Well-resourced firms will hire and retain diligent attorneys, who thus appear more competent because their firms are well-resourced. The issues also may compound. If a plaintiff's lawyer, through experience, knows that they are unlikely to find a case, they will learn not to look.
But this provides us a helpful way of thinking about "hallucination risk"—who are the attorneys that even ask ChatGPT to give them a case in the first place? It's probably those who have the desire to be, or at least appear, diligent but struggle to access information. Competent lawyers are competent, in part, because they don't need to go to ChatGPT for information.
What's in the plaintiffs' lawyers' water?
Looking for a way through this question, I found a 2013 law review article, Bad Briefs, Bad Law, Bad Markets: Documenting the Poor Quality of Plaintiffs' Briefs, Its Impact on the Law, and the Market Failure It Reflects by Scott A. Moss, a law professor at the University of Colorado.
Moss analyzes 102 briefing pairs from the Northern District of Illinois (i.e., Chicago) and the Southern District of New York (i.e., Manhattan) in which a fired employment-discrimination plaintiff writes a brief responding to the defendant's motion for summary judgment. In every pair, the defendant-employer raises the "same-actor" defense, wherein it argues that it did not "discriminate" because the same person both hired and fired the employee.1
Moss culls the "same-actor" briefs because, at the time, the relevant federal appellate courts for these two districts (the Seventh and Second Circuits) had "intra-circuit" splits on whether the same-actor defense is even viable. In other words, while the defendant could cite "precedent" in its favor to argue that it should win because of this defense, the plaintiff could just as easily cite "precedent" to argue that the defense was not viable; if both parties do their jobs, the trial court faces a hard decision.
Thus, Moss argues there is no excuse for not citing these cases.
In sum, the premise of this study is this: when a defendant presses the same-actor defense in a circuit with split authority, there is no excuse for a plaintiff failing to rebut with caselaw rejecting and criticizing the defense. Saying “no excuse” is blunt but, per scholarly and judicial authorities on briefings, fair.
Put differently, Moss has neutralized the "information access" column in my table above. Now we can just decide whether these lawyers are competent or incompetent.
Moss then (1) identifies all defense briefs matching certain search terms, (2) reviews the plaintiffs' opposing briefs for citations to the available contrary caselaw, and (3) reviews the court's decision for its outcome and any holding on the same-actor defense.
Disturbingly, he finds that only 27% of plaintiffs' lawyers (28/102) actually cite the cases supporting their position when the same-actor defense is raised. Tragically for the other 73% of plaintiffs, he also finds that citing these cases is quite effective: lawyers who cite the good cases win (meaning that the court does not dismiss their case) roughly half the time, while those who don't win only 14% of the time.
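To put Moss's figures side by side, here is a quick back-of-envelope sketch; the win rates are the approximate percentages quoted above, not Moss's underlying counts.

```python
# Back-of-envelope comparison of the figures quoted from Moss's study.
total_pairs = 102
cited = 28                       # briefing pairs where the plaintiff cites the contrary caselaw

print(f"plaintiffs citing the contrary caselaw: {cited / total_pairs:.0%}")   # ~27%

win_rate_cited = 0.50            # approximate: "roughly half the time"
win_rate_not_cited = 0.14        # as reported

print(f"survive summary judgment when citing:     {win_rate_cited:.0%}")
print(f"survive summary judgment when not citing: {win_rate_not_cited:.0%}")
print(f"citing plaintiffs are ~{win_rate_cited / win_rate_not_cited:.1f}x "
      "more likely to survive summary judgment")
```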
What to make of this? As the title of his paper suggests, Moss seeks answers in markets, and proposes a "darker" explanation for this phenomenon: the economics of a contingency-fee practice (which would apply to personal injury cases as well) gives lawyers little incentive to perform additional research to win a case on briefing. That's interesting, and I would like to return to it eventually.
However, he only briefly—and dismissively—addresses the information access question.
Failing to find and cite available caselaw has become especially inexcusable with online research costs having substantially decreased. From the late 1990s to present, basic Lexis or Westlaw for a solo practitioner has cost as little as $100-$175 a month. Caselaw is also free online, if in less easily searchable form.
We'll pick up there next time.
For example, if a plaintiff argues he was fired because he is Black, the employer would point out that the same person both hired and fired him, and argue that this fact is a defense to the discrimination claim, on the inference that the employer's agent is not racist if they hired the Black employee to begin with. Moss correctly notes that the logic is dubious and—as the existence of the intra-circuit split suggests—not a defense all courts are willing to credit.