Volume 91.1
Measuring Clarity in Legal Text
Jonathan H. Choi
Professor of Law, University of Southern California Gould School of Law.

Thanks to Aaron-Andrew Bruhl, Bill Eskridge, Abbe Gluck, Lilai Guo, Kristin Hickman, Claire Hill, Dongyeop Kang, Michael Livermore, Stephen Mouritsen, Julian Nyarko, Arden Rowell, Brian Slocum, Larry Solum, Jed Stiglitz, and the participants in the Harvard/Stanford/Yale Junior Faculty Forum, the Junior Faculty Forum for Law and STEM, the Cornell Law School Faculty Workshop, the University of Virginia School of Law Faculty Workshop, the University of Minnesota Faculty Squaretable, the University of Minnesota Public Law Workshop, the Online Workshop for the Computational Analysis of Law, the Singapore Management University Conference on Computational Legal Studies, the Conference on Empirical Legal Studies, the University of Illinois College of Law Faculty Workshop, the Max Planck Institute Law and Economics Seminar, the American Law and Economics Association Annual Meeting, the Association of American Law Schools Annual Meeting, and the Georgetown Legislation Roundtable, for their helpful comments. Thanks to David Lamb, Jay Kim, and Chad Nowlan for outstanding research assistance. Thanks as well to the editors at the University of Chicago Law Review for their careful work.

Legal cases often turn on judgments of textual clarity: when the text is unclear, judges allow extrinsic evidence in contract disputes, consult legislative history in statutory interpretation, and more. Despite this, almost no empirical work considers the nature or prevalence of legal clarity. Scholars and judges who study real-world documents to inform the interpretation of legal text primarily treat unclear text as a research problem to be solved with more data rather than as a fundamental feature of language. This Article makes both theoretical and empirical contributions to the legal concept of textual clarity. It first advances a theory of clarity that distinguishes between information and determinacy. A judge might find text unclear because she personally lacks sufficient information to decide which interpretation is best; alternatively, she might find it unclear because the text itself is fundamentally indeterminate. Fundamental linguistic indeterminacy explains ongoing interpretive debates and limits the potential for text-focused methods (including corpus linguistics) to decide cases. With this theoretical background, the Article then proposes a new method to algorithmically evaluate textual clarity. Applying techniques from natural language processing and artificial intelligence that measure the semantic similarity between words, we can shed valuable new light on questions of legal interpretation. This Article finds that text is frequently indeterminate in real-world legal cases. Moreover, estimates of similarity vary substantially from corpus to corpus, even for large and reputable corpora. This suggests that word use is highly corpus-specific and that meaning can vary even between general-purpose corpora that theoretically capture ordinary meaning. These empirical findings have important implications for ongoing doctrinal debates, suggesting that text is less clear and objective than many textualists believe. Ultimately, the Article offers new insights both to theorists considering the role of legal text and to empiricists seeking to understand how text is used in the real world.
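
The abstract describes the method only at a high level, but the general technique it invokes, comparing word-embedding similarity estimates across corpora, can be illustrated with a short sketch. The code below is not the Article's actual pipeline: it simply loads two pretrained GloVe models from gensim's downloader and prints the cosine similarity of the same word pair under each corpus. The model names and the word pair (drawn from the classic "no vehicles in the park" hypothetical) are illustrative choices, not taken from the Article.

```python
# Illustrative sketch only: how embedding-based similarity estimates for
# the same word pair can differ between two general-purpose corpora.
import gensim.downloader as api

# Two pretrained GloVe models from gensim-data; chosen for illustration,
# not because the Article uses them.
models = {
    "wiki-gigaword": api.load("glove-wiki-gigaword-100"),
    "twitter": api.load("glove-twitter-100"),
}

# A word pair from a familiar interpretive puzzle.
pair = ("vehicle", "bicycle")

for name, kv in models.items():
    # Skip a corpus whose vocabulary lacks either word.
    if pair[0] not in kv or pair[1] not in kv:
        print(f"{name}: word pair not in vocabulary")
        continue
    # Cosine similarity between the two word vectors in this corpus.
    print(f"{name}: similarity{pair} = {kv.similarity(*pair):.3f}")
```

Running a sketch like this typically yields noticeably different similarity scores for the two models, the kind of corpus-to-corpus variation in estimated meaning that the abstract reports.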