AI vs. human judges. The US experiment surprised researchers

Researchers from the University of Chicago conducted a very interesting experiment on 61 federal judges and two AI models. They asked all of them to play the role of a judge deciding a hypothetical case involving a car accident with tragic consequences.
Judges vs. AI. Surprising results of the American experiment
The task was seemingly simple. It concerned the choice of applicable law (choice of law). The judges and the AI had to decide which state's law, Kansas or Nebraska, should be applied when calculating compensation for pain and suffering. Because of the serious damage to the injured party's health, the compensation amounted to $750,000.
However, Kansas caps such compensation at $250,000, while Nebraska law imposes no limit. Under Nebraska law, the defendant would have to pay the full amount; under Kansas law, it would pay at most $250,000. Even if the plaintiff's actual losses were assessed at a higher amount, she could not receive more than the cap.
The researchers presented the judges with various combinations of the accident location and the places of residence of the parties involved, along with the relevant case files and statutes, and recorded how they ruled.
AI formalism versus judges' realism
The latest-generation language models from OpenAI and Google, i.e. GPT-5 and Gemini 3 Pro, gave an answer consistent with the letter of the law in 100% of cases. The judges adhered strictly to the letter of the law only 52 percent of the time.
Why such differences? For GPT-5 or Gemini, the victim's tragic injuries and the loss of a huge sum of money were completely irrelevant. The algorithm applied the compensation cap in exactly the 50% of cases where the letter of the law required it. It was a perfect formalist, indifferent to nuance.
The judges, seeing the drastic description of the victim's injuries, considered it unfair to cut her compensation by $500,000. Consciously or subconsciously, they steered their rulings toward applying Nebraska law. The judges behaved like "realists", taking into account extra-legal factors (social goals, the victim's sense of injustice), rather than like pure "formalists" sticking to dry regulations.
In Poland, the result of the experiment would be different
Legal advisor Tomasz Zalewski, founder and president of the LegalTech Polska Foundation, emphasizes that law does not consist of rules alone. – In addition to "if, then" norms, there are general clauses that serve to correct the result of applying the rules. The article shows that judges deviate from the rules precisely for this purpose: to correct a result they believe would be unfair. It's hard to call that a "mistake", he says. In his opinion, the result of the experiment in Poland would be different. – I suspect there would be fewer situations of the kind the article describes as judges' "mistakes", he adds.
The expert also points out that what we actually have is only a description of the experiment, not a full analysis of how AI could be used in the judiciary.
– For example, there is no analysis of how using AI would affect the behavior of judges, who might stop analyzing cases on their own, which would be bad for the system as a whole. Nor is there any analysis of how differences in the facts of a case could affect the AI's results, or of the problem that models may behave differently after they are updated, Tomasz Zalewski emphasizes.
Can AI replace judges?
Tomasz Zalewski believes the experiment is an interesting illustration of both the potential and the limitations of AI, but it does not prove that AI is fit to replace judges. In Poland, this will not happen soon for one more reason: the Constitution does not allow it.
– Article 45 of the Constitution guarantees the right to have one's case heard by an independent and impartial court, and Article 6 of the European Convention on Human Rights requires an independent and impartial tribunal established by law, which under the current legal framework means only a court with human participation, explains Włodzimierz Chróścik, president of the National Council of Legal Advisors.
Tomasz Zalewski also has no doubt that removing humans from adjudication would require amending the Constitution.
However, both experts believe that a clear line must be drawn between an "algorithm-judge" and a "judge's assistant".
– We want doctors to make decisions about our health, a pilot to sit in the cockpit of an airplane, and designs to be created by engineers and architects who use technology but do not hand over responsibility for the final choice to it. We apply the same logic to law: technology can help, but the final judgment should always rest with humans. Experience shows that artificial intelligence is a support, and responsibility for the strategy and consequences of decisions always rests with the legal representative, says Chróścik.
In his view, however, pilot programs in registry matters or order-for-payment proceedings, where the facts are simple, repetitive and largely objective, could be a reasonable step.
The authors of the experiment reached similar conclusions. They point out that the law is sometimes deliberately constructed to leave room for maneuver and interpretation. In the hands of an experienced judge, such flexibility in the rules can be used to render a verdict that is consistent not necessarily with the letter of the law but rather with its broader spirit, which AI simply does not understand. This is another thing to remember before replacing "carbon-based" judges with "silicon" ones on a massive scale.
AI can speed up adjudication
However, both experts have no doubt that AI can make adjudication easier and faster.
– If AI handles rules (regulations) well, it could support judges in preparing proceedings, checking formal requirements and generating orders, says Zalewski.
Włodzimierz Chróścik sees the role of AI in courts similarly. – AI can be used as a tool to support judges' work and could relieve them of the burden of analyzing case files, searching for case law, preparing draft statements of reasons and checking court filings. These are time-consuming and repetitive tasks, perfect for an algorithm, which creates an opportunity to speed up the issuing of judgments, says Chróścik.
The American experiment shows that the latest generations of large language models can already be a helpful tool in courts.
AI is very good at "connecting the dots", i.e. finding common elements and relationships. If you give it high-quality data on the rules to follow and information about previous rulings, it will reliably stick to the letter of the law and to precedent, because that is in its "nature". This could be a way to reduce or even eliminate the well-known weaknesses of human judges, whose decisions are influenced by various undesirable factors such as prejudice, fatigue or simple hunger.
Beware of blind law
Włodzimierz Chróścik notes, however, that the EU AI Act classifies AI systems used in the administration of justice as high-risk (Annex III), which imposes obligations on their providers and deployers regarding transparency, human oversight and risk management.
He stresses that a precondition for deploying the technology is clearly defined limits on its use, because excessive trust in such systems carries three key risks.
– First, the automation of biases contained in training data, which may lead to systemic discrimination. Second, so-called automation bias, i.e. the risk that judges will begin to rely uncritically on the algorithm's suggestions and give up their own in-depth analysis of the case files. Third, the loss of the justice system's legitimacy in the eyes of citizens: the parties to the proceedings must feel that their case has been heard and considered by a human being, Chróścik adds.
Everyone would like the courts to be efficient and impartial, in keeping with the old principle that justice is blind. In this context, a perfect, formalistic algorithm that never tires, always takes into account all the regulations relevant to a given case and issues consistent judgments sounds like a dream come true to many people.
However, the authors of the American experiment warn that you should be careful what you wish for, because you may get it. "As LLMs evolve, the direction of change is clear: unerring adherence to formalism rather than human, sometimes clumsy, discretion that softens the sharp edges of the law. Does this mean that LLMs are becoming better or worse than human judges?" we read at the end of the report.





