A confusing thing about our current "AI" moment is that when ChatGPT was released we collectively switched from using "AI" as a shorthand for "feeding in a bunch of data to predict outcomes" to talking about magic word- and image-generating machines. We now often call the former "machine learning," while the word machines are "large language models." Although many people have a bee in their bonnet about the exact definition of "AI," I don't mind this. The definition is always in flux; in the 1950s we were building "anticipatory tortoises" that learned from their environment and called it "cybernetics."
I'm recalling the distinction here because I'm going to put a stake in the ground and loudly proclaim my pessimism about AI, but only insofar as we use it to predict legal outcomes. That critique doesn't necessarily extend to other parts of AI, including the language models, so for purposes of the argument we'll call this "predictive AI."1
Why stake a position at all? As part of my ongoing research into legal ethics issues, I read a 2019 article called The Model Rules of Autonomous Conduct: Ethical Responsibilities of Lawyers and Artificial Intelligence. Consistent with the era, it does not discuss LLMs. It does discuss predictive AI, suggesting that these tools could be used to "choose the most favorable forum in which to file suit, assess whether to pursue particular claims in front of certain judges, assess the settlement strategy of an opposing party, and much more." The article takes a remarkably bullish view, suggesting that lawyers who do not use predictive AI technology will soon fall below "the standard of competent practitioners," in violation of professional ethics rules.
Although I was never quite that optimistic, I previously held a guarded optimism that these kinds of predictive AI tools would become useful once the data was "good enough." But with time and more reading, I've come to believe the premise of predictive AI is faulty at its core.
To provide an example, I'll talk about something I've studied previously: tenant screening algorithms. In the U.S., landlords typically run background checks on prospective tenants that look at credit history as well as court records. The companies that sell background checks to landlords were marketing products that purported to predict whether a prospective renter would be a good or bad tenant. (Literally: for a fee, a landlord can purchase a report that shows a green checkmark or a red X, or that gives the tenant a "risk score" similar to a credit score.)
At the time I researched these algorithms, I was focused on whether these tools were legally biased—i.e., whether their use in this context violated the Fair Housing Act's disparate impact doctrine. The answer to the narrow legal question is "probably," while the answer to the question of whether they exhibit some kind of bias is "almost certainly."
The reason we can be fairly certain about bias is what's colloquially known as the "garbage in, garbage out" problem. At their core, all machine learning models, including LLMs, are created by "training" on a set of data: the researcher uses an algorithm to estimate the probability that a given outcome occurs when certain characteristics appear in the dataset. The trained model can then, in theory, be applied in the real world to predict the outcome of a new event when presented with similar data.
In the case of our tenant screening algorithms, the models are proprietary and we don't actually know what they consist of. But since the companies creating them are background check companies that sell information about court records and private credit information to landlords, we can assume that their dataset consists largely of court records and private credit information. To predict whether someone is likely to be evicted, the company presumably takes data about who has been evicted in the past (from court records) and then uses an algorithm to see which features (prior credit history, past evictions, criminal records, etc.) are associated with that outcome.
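To make the mechanics concrete, here is a minimal sketch of that kind of model. The features, data, and library choice are my own invention for illustration, not anything a screening company has disclosed; the point is only that the "risk score" is a probability estimated from whatever historical records happen to be in the training set.

```python
# Toy version of a tenant "risk score" model. Hypothetical features and
# labels, not any vendor's actual data or method.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per past applicant: [prior_evictions, collection_accounts, months_at_last_address]
X = np.array([
    [0, 0, 36],
    [1, 2, 10],
    [0, 3, 18],
    [2, 1, 6],
    [0, 0, 48],
    [1, 4, 12],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = an eviction filing appears in the court records

model = LogisticRegression().fit(X, y)

# The "score" for a new applicant is just the model's estimated probability
# of the outcome it saw associated with similar records in the past.
applicant = np.array([[1, 1, 24]])
print(model.predict_proba(applicant)[0, 1])
```

In a setup like this, the green checkmark or red X would simply be a threshold applied to that estimated probability.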
Even ignoring the racially disparate nature of eviction, an astute observer can notice any number of assumptions about reality that are being built into this model. We might note, for example, that:
Data besides court and credit records (which the company happens to own) are not being used to predict the outcome.
The data is not collected in a uniform or consistent manner. "Court data" is not one thing but information from thousands of different courts that apply different laws and collect data in different formats. Likewise, "credit" information can encompass information from courts (debt collection lawsuits, bankruptcy), utility companies, credit card companies, etc., which all collect different kinds of information relevant to their business.
Any individual data point may not reflect the "reality" of a situation. For example, two people may lose their housing because they are three months behind on rent; one settles with their landlord before going to court, while the other fights it and loses at trial. The circumstances leading to the eviction are the same, but the outcome appears differently in the data (the sketch after this list makes the point concrete).
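To illustrate that last point, here are two hypothetical records (invented field names and values) describing the same underlying situation as it might appear in court data:

```python
# Two hypothetical tenants in identical circumstances: three months behind
# on rent, and both ultimately displaced.
tenant_a = {
    "months_behind": 3,
    "case_filed": True,
    "disposition": "dismissed",  # settled with the landlord and moved out before judgment
    "eviction_judgment": False,
}
tenant_b = {
    "months_behind": 3,
    "case_filed": True,
    "disposition": "judgment for plaintiff",  # fought the case and lost at trial
    "eviction_judgment": True,
}
```

A model trained on the eviction_judgment field treats these as opposite outcomes, even though the reality behind them is the same.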
This is a long way of saying something that should be very obvious, which is that "data" is a highly attenuated representation of reality, and reality is very complicated.
To use another example from housing, I've been thinking recently about the idea of "market value" in real estate. (As a side project, I've been learning about land value taxes and how property tax assessors model the value of land and buildings.) In Illinois (and most other states/jurisdictions), the assessor has the legal duty to figure out the "market value" of a property for purposes of taxing it.
This seems simple enough, but actually calculating this value for every single property on a fixed cycle is horribly complicated in practice. Contrary to my first intuition (and probably that of many others), assessors should not just look at recent sales in the market and extrapolate from those numbers. This is called "sales chasing" in the biz, and it doesn't really work if your goal is a fair, uniform, annual assessment. Thus, even if you just bought a house, the purchase price is not the "correct" number for purposes of assessing the market value.
Sounds weird! But consider that the final price of any individual property purchase is the product of a totally idiosyncratic process. Even in a relatively simple residential transaction, sellers are motivated to sell for completely different reasons: a seller who is moving across the country with young children is probably more motivated to "get rid of" the house at whatever price, while a retired couple that owns the house free and clear but is downsizing may negotiate more sharply. If the owner is an investor or a bank, the motivations are obviously much different. Buyers, of course, try to figure all this out and price their offers accordingly, based on their perceptions of what other buyers will do, their own resources, and their own motivation to purchase the house.
The assessor uses data to average out all those variables in an effort to perform the social function of collecting revenue for the state, but in a way that doesn’t lead to a citizens’ revolt. The "prediction" only makes sense in the context of that revenue-collecting function, and it's not a good prediction of what your house will sell for.
For that purpose, real estate websites such as Zillow have the "Zestimate," which is sort of good at predicting the sale price of listed homes but not very good at predicting the sale price of off-market homes. With a listed home, the "machine" has the advantage of looking at the property listing, which reflects what someone who buys and sells homes for a living believes is a good number to get people to bid on the house. Thus, the Zestimate is most "useful" when it serves as a lagging indicator of real-time human evaluations of the market value of the house. Which means that if you are buying or selling a house and want to make an informed offer, you're better off talking to a human professional who pays attention to the market for a living.
I use this example because I recently heard an interview with the founder of a legal predictive AI company who says that they are trying to create a "Zestimate" for lawyers that will predict the settlement value of a case. Because settlement values are generally confidential, they are apparently using synthetic data to train the model and make it more accurate.
Just based on my own experience negotiating settlements through a formal mediation process, the final settlement value is determined through an even more idiosyncratic process than an ordinary residential real estate sale. Usually, each party comes in with a back-of-the-napkin estimate that represents a reasonable but aggressive statement of its position, and those numbers are shared through offer and counter-offer. Then the parties use a few rounds of "negotiation" to establish a hypothetical midpoint and either i) agree that the midpoint is reasonable, ii) engage in some gamesmanship to move the other party towards their side, or iii) walk away.
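A toy sketch of that dance, with invented numbers and a made-up concession rule, shows how much the final figure depends on where each side chooses to start rather than on some underlying "true value" a model could learn:

```python
# Hypothetical offer/counter-offer rounds: each side concedes a fixed
# fraction of the remaining gap, then the parties meet at the midpoint.
def negotiate(plaintiff_demand: float, defendant_offer: float,
              rounds: int = 3, concession: float = 0.25) -> float:
    demand, offer = plaintiff_demand, defendant_offer
    for _ in range(rounds):
        gap = demand - offer
        demand -= gap * concession  # plaintiff comes down
        offer += gap * concession   # defendant comes up
    return (demand + offer) / 2

# The same dispute with different opening anchors lands in very different places:
print(negotiate(300_000, 50_000))  # 175000.0
print(negotiate(500_000, 50_000))  # 275000.0
```

Change either opening anchor and the "value" of an otherwise identical case moves by a hundred thousand dollars, which is the kind of path-dependence that a dataset of final settlement numbers doesn't capture.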
I can see the market for these products in industries such as litigation finance, where firms making investment decisions (read: bets) benefit from having data about average outcomes, much like the assessor does. But I doubt that it's possible to make a "good-enough" general-purpose predictive AI that will supplant human lawyers' predictions about individual case values, and I seriously doubt that a time will come when failing to use one amounts to an ethics violation.
Credit for this distinction comes from a recently published book, AI Snake Oil, which provides much of the inspiration for this post. I don't discuss the book and its underlying research in detail here, but I may return to it at some point once I've read the literature more thoroughly.