AI Outperforms Law Professors in Legal Reasoning Study

A Stanford study finds that law professors preferred AI-generated legal responses over those from their peers 75% of the time, indicating AI's growing role in legal reasoning.

Summary

Approximately 75% of law professors favored AI-generated responses over those from their peers.
AI outputs were identified as harmful less frequently than those authored by professors.
Researchers concluded that large language models can meet professional standards.

A study from Stanford University indicates that law professors often prefer responses generated by artificial intelligence to those produced by their colleagues, shedding light on the performance of large language models in legal reasoning tasks.

The research involved 16 professors from 14 law schools across the U.S., including prestigious institutions like Stanford, Yale, and NYU. They developed 40 contract law questions that spanned various topics such as legal doctrine and case law, providing an effective framework to assess the capabilities of contemporary AI.

The authors noted, “Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth. Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test.”

Across 2,918 blinded evaluations, professors chose the AI-generated answer over the human instructor's 75.92% of the time for Google's Gemini 2.5 Pro and 74.75% of the time for NotebookLM, or roughly three-quarters of cases for both systems.

To assess whether these outcomes reflected a broader consensus among professionals, the researchers analyzed the agreement levels when professors rated the same answer pairs. They found that “observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs’ success reflects alignment with common disciplinary criteria.”
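To make the idea of agreement "above chance" concrete, here is a minimal Python sketch. The article does not say which statistic the researchers used, so Cohen's kappa is swapped in purely as an illustration. The function names and the two rating lists are hypothetical and are not data from the study.

```python
from collections import Counter


def observed_agreement(ratings_a, ratings_b):
    """Fraction of answer pairs on which two raters made the same choice."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def chance_agreement(ratings_a, ratings_b):
    """Agreement expected if each rater chose independently at their own base rates."""
    n = len(ratings_a)
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    labels = set(freq_a) | set(freq_b)
    return sum((freq_a[label] / n) * (freq_b[label] / n) for label in labels)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement beyond chance, scaled so 0 means chance level and 1 means perfect."""
    p_observed = observed_agreement(ratings_a, ratings_b)
    p_expected = chance_agreement(ratings_a, ratings_b)
    if p_expected == 1.0:
        # Both raters always gave the same single label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical choices by two professors on the same eight answer pairs.
professor_1 = ["ai", "ai", "human", "ai", "ai", "human", "ai", "ai"]
professor_2 = ["ai", "ai", "human", "ai", "human", "human", "ai", "ai"]

print(observed_agreement(professor_1, professor_2))
print(chance_agreement(professor_1, professor_2))
print(cohens_kappa(professor_1, professor_2))
```

In this made-up example the two professors agree more often than their individual base rates alone would predict, which is the kind of pattern the quoted finding describes.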

AI models were also found to surpass human instructors in various categories, including recall questions related to case law, hypotheticals, and policy discussions.

The study also sought to determine if the AI's perceived advantage stemmed from superficial writing style rather than substantive content. Researchers developed a range of lexico-syntactic features—such as answer length and clarity—and evaluated how much these factors influenced preference patterns.

Interestingly, AI-generated answers were flagged as harmful at a lower rate, with Gemini recording a 3.41% harmfulness rate and NotebookLM at 3.64%, compared to 12.06% for human-written responses. In further analyses, Anthropic’s Claude Opus 4.7 topped the rankings, followed by OpenAI’s ChatGPT 5.4 and Gemini 2.5 Pro, with all AI models outperforming human instructors on average.

The researchers did caution that their study did not assess whether the responses aligned with each professor's specific teaching style, which suggests AI-generated answers may be deemed generally acceptable rather than specifically tailored to individual preferences. They stated, “While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied.”

This study emerges at a time when the legal profession is increasingly exploring the integration of AI tools. For instance, the Los Angeles Superior Court began testing AI tools to assist judges with growing workloads, and law schools are incorporating AI training into their curricula.

John P. Anderson, Dean of Mississippi College School of Law, previously remarked to Decrypt, “The potential benefits of these new technologies as a force multiplier in the practice of law just can’t be ignored. Whether our students plan to be litigators or transactional attorneys, their future employers will expect familiarity with these AI tools.”

Despite the advantages, law firms still face challenges due to inaccuracies associated with AI, such as hallucinations. For example, in April, Sullivan & Cromwell, a prominent law firm, admitted that a recent filing contained fictitious citations generated by AI.
