Article

Algorithmic Fair Use

Dan L. Burk

Chancellor’s Professor of Law, University of California, Irvine; 2017–2018 US-UK Fulbright Cybersecurity Scholar.

My thanks to members of the Oxford Internet Institute’s Digital Ethics Lab, participants in the Cambridge Faculty of Law CIPIL Intellectual Property Seminar Series, participants in the session on “Data Commons, Privacy, and Law” at the ECREA Digital Culture and Communication Section Conference, as well as to Oren Bracha, Pamela Samuelson, and participants in the CyberProf listserv conversation on algorithmic fair use for helpful discussion in preparation of this Essay. Portions of this research were made possible by support from the US-UK Fulbright Commission.

Legal governance and regulation are becoming increasingly reliant on data collection and algorithmic data processing. In the area of copyright, online protection of digitized works is frequently mediated by algorithmic enforcement systems intended to purge illicit content and limit the liability of YouTube, Facebook, and other content platforms. But unauthorized content is not necessarily illicit content. Many unauthorized digital postings may claim legitimacy under statutory exceptions like the legal balancing standard known as fair use. Such exceptions exist to ameliorate the negative effects of copyright on public discourse, personal enrichment, and artistic creativity. Consequently, it may seem desirable to incorporate fair use metrics into copyright policing algorithms, both to protect against automated overdeterrence and to inform users of their compliance with copyright law. In this Essay, I examine the prospects for algorithmic mediation of copyright exceptions, warning that the design values embedded in algorithms will inevitably become embedded in public behavior and consciousness. Thus, algorithmic fair use carries with it the very real possibility of habituating new media participants to its own biases and so progressively altering the fair use standard it attempts to embody.

Introduction

Law, like other human artifacts, is costly to produce, to distribute, and to apply. Like other human artifacts, the marginal cost of law benefits from economies of scale; standardized, one-size-fits-all regulations can be economically produced and promulgated, with perhaps, like a made-to-measure suit, a bit of tailoring at the end of the supply chain by a court or other arbiter. But even moderate judicial tailoring adds enormously to the cost of applied law, and rare instances of bespoke regulation are even more socially costly.1

This Symposium examines the proposition that technological advances might dramatically lower the cost of bespoke regulation. The potential for such “personalized law” is dependent on the development of ubiquitous data collection and algorithmic data processing coupled with dramatically lower costs in real-time communication.2 Applications of these technologies have emerged in numerous areas, including criminal law, immigration, taxation, and contract.3 In the area of copyright, protection of digitized works is already increasingly mediated by algorithmic enforcement systems that are intended to effectuate the rights of copyright owners while simultaneously limiting the liability of content intermediaries. On YouTube, Google, and many other online platforms, both internet service providers (ISPs) and copyright owners have deployed detection and removal algorithms that are intended to purge illicit content from their sites.4

But unauthorized content is not necessarily illicit content. In particular, many unauthorized digital postings may claim legal legitimacy under one or more exceptions to the rights of the copyright holder, most notably under the legal balancing standard known as fair use.5 Exceptions such as fair use exist to ameliorate the negative effects of exclusive control over expression on public discourse, personal enrichment, and artistic creativity. Consequently, it may seem desirable to incorporate context-specific fair use metrics into copyright-policing algorithms, both to protect against automated overdeterrence and to inform users of their compliance with copyright law.6 Fair use was intended to “personalize” copyright to individual contexts; hence the question arises whether old-style statutory personalization can be translated into data-driven, machine-mediated personalization.

In this Essay, I examine the prospects for personalized law, taking the outlook for algorithmic mediation of fair use as a vehicle. A large and growing literature on algorithmic regulation already warns us of the pitfalls inherent in reliance on such technology, including ersatz objectivity, diminished decisional transparency, and design biases.7 Drawing on this literature, I argue that automated implementation of legal standards is problematic as a practical and technical matter, and these limitations will inevitably serve to shape user expectations regarding the processes they govern. It seems clear that this effect is already occurring in conjunction with automated enforcement of copyright, as the design values embedded in automated systems become embedded in public behavior and consciousness. Thus, algorithmic fair use carries with it the very real possibility of habituating new media participants to its own biases and so progressively altering the fair use standard it attempts to embody. Critical analysis of algorithmic fair use offers a cautionary tale that should give us pause, not only regarding the development of such systems but also regarding the development of algorithmic law generally.

I. Copyright’s Fair Use Standard

Copyright allows authors to restrict reproduction, performance, and related uses of their original works as a pecuniary incentive.8 But copyright, like any property right, is never absolute. Jurisdictional copyright systems typically include some number of user privileges or exemptions—circumstances under which the statute will condone or authorize particular uses of a copyrighted work even if the copyright owner has not done so.9 These vary between jurisdictions but typically cluster around socially beneficial uses of the work, such as education, news reporting, scholarship, personal enrichment, or public commentary.10 Often known in British Commonwealth countries as “fair dealing” provisions, these exceptions to the authorization of the copyright holder entail a specific laundry list of discrete, statutorily defined circumstances under which a protected work can be used without permission.

In the United States, the Copyright Act11 also includes a number of such discrete statutory carve-outs. For example, § 110 of the statute allows otherwise unauthorized performances of certain nondramatic works for classroom instruction, or for religious services, or for the benefit of blind or handicapped persons.12 Section 110 also permits uses that might or might not be judged socially beneficial but that, in any event, were judged by Congress for whatever reason to be statutorily permissible without the authorization of the copyright holder, such as the “performance of a nondramatic musical work by a governmental body or a nonprofit agricultural or horticultural organization, in the course of an annual agricultural or horticultural fair or exhibition conducted by such body or organization.”13

Additionally, the United States, together with a small handful of other nations, includes in its copyright limitations an indeterminate exception known as “fair use.”14 Codified into the current statute from common law precedent, fair use is not categorically or specifically defined but rather is decided based on adjudicatory assessment of four factors. Roughly speaking, a court determining whether an otherwise infringing use might be fair is to consider how much of the work was taken, what was done with it, what kind of work was subjected to the taking, and what effect the taking likely had on the market for the work.15 Determination as to whether unauthorized use of a copyrighted work falls under this provision varies from situation to situation depending on the contextual assessment of the four factors.

Copyright’s multifactor fair use balancing test thus presents a classic example of what has been dubbed a legal standard.16 Scholars have long divided legal imperatives into the categories of “rules” and “standards,” the former constituting discrete and defined legal requirements and the latter constituting malleable and fact-dependent directions. These have reciprocal virtues and vices. Rules are simple to understand and enforce but lack nuance and flexibility; standards are flexible and context-sensitive but lack clarity. Institutionally, rules tend to be promulgated ex ante by legislative enactment; standards tend to be determined ex post by courts or other adjudicatory fora. The major institutional costs for rules are typically incurred in development in advance of administration; the major institutional costs for standards are typically incurred during enforcement or administration.17

In an influential discussion of the topic, Professor Carol Rose noted that these are typically not distinct modes of imperative but lie on a continuum, and legal imperatives tend to process between the two.18 Because formal rules are too rigid to fairly accommodate unforeseen circumstances, they tend to accumulate exceptions until they begin to resemble standards. At the same time, because standards are expensive to administer, adjudicators begin to develop shortcuts or per se doctrines that are automatically applied when certain recurring circumstances arise, creating de facto rules. Thus, regulation incorporates some combination of ready-to-wear and bespoke regulation, reaping the cost savings from legal economies of scale while attempting to minimize the pinch or the gaps that result from one-size-fits-all.

Fair use and similar standards represent attempts by the institutional legal system to personalize copyright usage by allowing a tribunal to take into account the individualized circumstances of the unauthorized use, after the fact, in rendering a decision on infringement. As with other standards-based legal doctrine, fair use carries with it the disadvantage of ex ante uncertainty; no one can be entirely certain in advance how a court will weigh the four factors, and hence there is always some apprehension that a use may be found infringing rather than fair. Risk-averse content users, unable to confidently predict the ultimate decision on their activities, may forgo some socially beneficial uses. But at the same time, this strategy extends copyright exceptions to new or unforeseen scenarios that the legislature would have been unable to anticipate under a discrete “fair dealing” approach.

II. Algorithmic Copyright

Recent commentary has argued that the doctrinal deployment of rules and standards either has come to an end or will be drastically altered by imminent changes in technological cost structures.19 This change is expected to be driven by ubiquitous data collection and algorithmic data processing, coupled with dramatically lower costs of communication. The argument postulates a coming world of “microdirectives,” in which automated systems supply citizens with tailored directives, thus capturing both the ex ante advantages of rules and the ex post advantages of standards.20

Such speculations likely overstate any foreseeable capability of the relevant technology and certainly understate the role of other social agents in the deployment and implementation of algorithmic systems.21 Perhaps not surprisingly, this vision of personalized law largely replicates the neoclassical economist’s nirvana of zero transaction costs and perfect information by postulating a world in which data-processing and communication technologies realize the simplifying assumptions of the simplest economic models. As in much of the hypothetical discussion surrounding big data and artificial intelligence, this speculation partakes of the “magic[al]” worldview22 of trending technology, which promises costless production without the disadvantageous investment of time and resources that technological activity inevitably entails.23

A more grounded framing for algorithmic fair use, then, is to ask whether old-style legal personalization can be translated into data-driven, machine-mediated personalization. Clearly technical practice is already trending in such a direction. Commentators such as Professor Matthew Sag have observed that such algorithmic agents are already commonly deployed to detect and effectively determine cases of digital copyright infringement.24 In some cases, such agents are deployed by copyright-intensive industries, such as the recorded music or motion picture industries, to trawl the internet for potentially unauthorized copies of their proprietary works in order to enforce their copyright.25 In other cases these search devices are deployed by intermediaries, such as YouTube, to remove or deter infringing copies so as to avoid contributory liability, meet their obligations as content hosts, and maintain an ostensible public image of vigilance against lawlessness.26

Sag argues that online algorithmic policing has already changed the nature of copyright enforcement and so effectively changed the nature of copyright infringement.27 Identification and removal of allegedly infringing content is automated, and human oversight or involvement in the removal process is infrequent and perfunctory. Algorithmic removal decisions are seldom challenged due to severe cost asymmetries. Automated identification and removal, whether accurate or mistaken, is relatively cheap, whereas legal and institutional engagement is comparatively expensive. Most removal decisions are effectively final, and all the parties involved—whether users, service providers, or content owners—have altered their legal expectations in light of these realities.28

Copyright enforcement algorithms typically make no provision for user privileges or exceptions, and the removal decision is effectively final before the dispute reaches any forum in which defenses such as fair use might be considered. Thus, far from greater personalization of the copyright notice and takedown procedure, the cost structure of algorithmic content policing has created a largely impersonal process, in which the context-specific factors that should be taken into account in fair use analysis are absent and go unconsidered. The question then arises whether automated copyright policing can and should incorporate determinations of fair use or other statutory exceptions.29

III. Automating Fair Use Analysis

Such questions are implicated in decisions like the opinion of the Ninth Circuit Court of Appeals in Lenz v Universal Music Corp.30 Lenz posted to the video platform YouTube a twenty-nine-second clip of her baby dancing to the Prince song “Let’s Go Crazy,” which is heard playing distantly in the background audio of the clip.31 The unauthorized use of the music was detected by the recording label, resulting in a demand under the Digital Millennium Copyright Act32 (DMCA) that it be removed from the platform and leading to a countersuit by Lenz over the propriety of the demand.33 A major legal question in the case was whether Universal had a “good faith belief” that the clip was infringing before demanding that it be taken down from the platform.34 Arguably, the formation of such a belief would require some consideration of fair use or other copyright exceptions because the use could not be infringing if excused by such exceptions.

The court held that consideration of fair use was required before demanding removal of online content.35 But clearly with the use of automated detection and removal algorithms in mind, the court continued: “We note, without passing judgment, that the implementation of computer algorithms appears to be a valid and good faith middle ground for processing a plethora of content while still meeting the DMCA’s requirements to somehow consider fair use.”36

Perhaps not surprisingly, the court later withdrew this particular passage of dicta from the published opinion. The record label’s copyright enforcement search and judgment in Lenz was done manually,37 and it is unclear whether fair use consideration can in fact be automated. In 2001, Professor Julie Cohen and I argued in the context of secured copyrighted content that, because fair use standards could not be programmed into technical protection systems, some type of human oversight or institutional infrastructure would be required to ensure continued access for such uses.38 Prominent computer scientists similarly expressed their deep skepticism that fair use could be programmed into a technical system.39 Much of this skepticism was based on the ability or inability to translate inchoate legal imperatives into executable computer code.40 These challenges would include not only the limited ability of human programmers to define the parameters and characteristics of legal texts but also the inherent limitations of computer languages, their operating environments, and the capabilities of the hardware available to execute coded instructions.41

In particular, the ex ante indeterminacy of a legal standard such as fair use, which in the institutional operation of the law constitutes a benefit, presents a challenge for operational machine coding.42 Rule-oriented legal imperatives may better lend themselves to automated instructions. It is perhaps not too far-fetched to imagine a programmable exception of the fair dealing laundry list sort—although even for supposedly discrete statutory exceptions, concepts like “educational use” or “news reporting” might be unexpectedly tricky to reduce to computable code. But one can, for example, imagine programming a system to determine, perhaps on the basis of geolocational data and scraped calendaring or advertising data, whether a nondramatic musical work is being performed at an agricultural fair.43 It is far more difficult to envision how one might program a system to determine whether a given use has a relevant degree of impact on the actual or potential market for the work being used or whether an excerpt from the work is so significant as to constitute the “heart” of an author’s creation.44
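The contrast can be made concrete with a small illustration. The following is a minimal sketch only, written in Python, of how a rule of the agricultural fair sort might be reduced to a predicate. The `Performance` record, its field names, and the organizer categories are hypothetical assumptions introduced here for illustration; they are not statutory definitions and do not describe any deployed system. The sketch also presumes that the difficult classification questions (what counts as “nondramatic,” or who qualifies as a nonprofit agricultural organization) have already been answered by someone before the code runs.

```python
from dataclasses import dataclass

# Hypothetical organizer categories; deciding which real-world entities
# belong in each category is a human judgment made outside this code.
ELIGIBLE_ORGANIZERS = {
    "governmental_body",
    "nonprofit_agricultural",
    "nonprofit_horticultural",
}


@dataclass
class Performance:
    """Illustrative record of a performance; every field is an assumed input."""
    work_is_nondramatic_musical: bool
    organizer_type: str
    event_is_annual_fair_or_exhibition: bool
    event_conducted_by_organizer: bool


def fair_exemption_applies(p: Performance) -> bool:
    """Return True only when every condition of the rule-like exception holds."""
    return (
        p.work_is_nondramatic_musical
        and p.organizer_type in ELIGIBLE_ORGANIZERS
        and p.event_is_annual_fair_or_exhibition
        and p.event_conducted_by_organizer
    )


county_fair = Performance(True, "nonprofit_agricultural", True, True)
nightclub = Performance(True, "commercial_venue", False, False)

print(fair_exemption_applies(county_fair))  # True
print(fair_exemption_applies(nightclub))    # False
```

Each Boolean field conceals an interpretive judgment made upstream of the code. No comparable set of fields readily suggests itself for market harm or for the “heart” of a work.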

Thus, the prospects for deploying what Fred von Lohmann has called “a judge on a chip” are at best remote.45 Current machine learning techniques attempt to sidestep such difficulties by creating routines that recognize data patterns and by allowing the routine to operate according to the values in the pattern found, rather than attempting to specify values in advance.46 This raises the possibility that algorithmic fair use parameters might not have to be explicitly defined and coded. Empirical investigation of the corpus of fair use decisions from federal courts suggests that fair use outcomes are neither random nor unpredictable but may follow particular patterns of judicial decision-making.47 One can imagine that a neural network or other machine learning system could detect these or other patterns in the data surrounding past cases, matching them to similar patterns in the data surrounding future fair use incidents, situations, and scenarios without formal programming definition of the fair use factors.48 Such a system might provide the kind of fair use assessments envisioned by the Ninth Circuit, if not prior to the actual use, at least in conjunction with online copyright enforcement decisions.

While such algorithmic decision-making would lower the immediate cost associated with fair use balancing and may be, as the court observes, the only feasible way to deal with petabytes of online content, it cannot be expected to eliminate the costs associated with fair use determinations.49 As I suggest in this Essay, it would at best reallocate such costs. Law and attendant legal institutions are embedded in a complex web of sociotechnical actors; technological realignment of costs in one section of the network inevitably results in realignment throughout other sections of the network.50 Costs do not disappear; they are redistributed. As the saying goes, there is no free lunch, and pressing down on the network at one point inevitably causes protrusion at some other point. Rather than imagining that costs vanish, it is imperative to estimate where, and to whom, and in what form they will occur.

IV. The Social Cost of Algorithms

There already exists a fairly large body of literature attempting to determine, predict, and assess the social impact and cost of algorithmic governance. Professor Tarleton Gillespie, in an influential article, encapsulates and categorizes several social effects that are already apparent in algorithmic deployment.51 Such effects entail hidden or unexpected costs of algorithmic deployment:

Patterns of Inclusion: “Big data” does not simply mean a lot of data; data must be collected, structured, and groomed for processing. The explicit or implicit biases of these procedures, including the choice of what data are included or excluded before algorithmic processing, are determinative of algorithmic output.52
Cycles of Anticipation: Data processing routines are structured with particular audiences and purposes in mind; they are tailored and retailored according to predicted uses. Such predictive designs determine who is likely to find the output useful, and the characteristics of the user pool recursively shape future updates to the algorithm.53
Evaluation of Relevance: Presentation of algorithmic output necessarily entails assignment of relevance; assigned relevance is meaningful only when adopted by users. Data processing routines thus effectively enact policy choices through their determination of what is relevant or irrelevant.54
Illusion of Objectivity: Design and execution of algorithmic processes are typically hidden from their audience. Machine-generated outputs thus appear to materialize without human bias, often creating the unwarranted perception of impartiality and objectivity. This perception further obscures the origins and the biases of the algorithm, lending it unwarranted authority.55
Patterns of Entanglement: Audiences will inevitably alter their behavior under the influence of the algorithms they depend on, and these behavioral changes then impact the data and data relationships that form the inputs to the same algorithms, a mirrored parallel to the cycles of anticipation in design.56
Production of Calculated Publics: Presentation of algorithmic results to an entangled public reshapes the public’s sense of self, propriety, and purpose. But the audience expectation embedded in algorithmic systems may be taken up by other institutions—by courts, schools, businesses, legislatures—reinforcing both the social position of the algorithm and its assumptions about its audience.57

In this Essay, I am primarily concerned with the last two issues, although these concatenated effects are deeply intertwined and the preceding four topics are undoubtedly also matters of serious concern. There is no question that the open and hidden biases introduced in the construction of algorithmic systems are bound to have an effect on their social relevance.58 As Professor N. Katherine Hayles points out, the products of data processing have no inherent meaning; they require some explanatory narrative that lends them significance.59 Thus, as Professor Malte Ziewitz explains, algorithms are developed in terms of the problem they are expected to address, and so their design is inevitably framed in terms of a particular narrative about the algorithm’s purpose.60 Such “ontological gerrymandering” manipulates the boundary between the problematic and unproblematic by presenting selected assumptions about the problem as given or by cloaking their presence altogether.61

It is therefore somewhat alarming to read legal commentators who confidently assert that “human decision makers are flawed and biased. The biases and inconsistencies found in individual judgments can largely be washed away using advanced data analytics.”62 On the contrary, the observation that “[r]aw data is . . . an oxymoron,” famously coined by Professor Geoffrey Bowker in his influential work on scientific classification,63 has become something of a catchphrase among critical analysts of big data and its attendant algorithmic processes. The data are always cooked, before algorithmic processing and certainly during algorithmic processing, as indeed they must be in order to be useful in any way.64 The question is never whether the data are biased but rather how, by whom, and for what purposes.

When the four factors of the fair use standard are concerned, many of the points at which such design choices must be made quickly become obvious. Determining the impact of the unauthorized use of a work on the actual or potential market for the underlying work requires a model of the market and decisions about the data that properly populate that model. The amount of the work used can be mapped to the percentage of lines or words or pixels or bits taken for a given use, but some weight or significance must be accorded to that number, whether defined by explicit programming values or by algorithmically learned data patterns. The type of work used and the use to which the protected taking is put require some categorization of works and uses. These and a multitude of other design choices made in advance would determine the allowance or disallowance of uses for protected content; algorithms do not make judgments; they are rather the products of human judgment.
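The point about the amount taken can be illustrated with a short sketch. What follows is an assumption-laden illustration in Python, not a description of any actual enforcement system: the function names, the word-count measure, and both threshold values are hypothetical and chosen only to show that the same measured taking produces opposite outputs under two equally plausible design choices.

```python
def fraction_taken(original: str, excerpt: str) -> float:
    """Ratio of excerpt word count to original word count, capped at 1.0."""
    original_words = original.split()
    excerpt_words = excerpt.split()
    if not original_words:
        raise ValueError("original work is empty")
    return min(len(excerpt_words) / len(original_words), 1.0)


def amount_factor_favors_use(fraction: float, threshold: float) -> bool:
    """The threshold is a policy choice made by the designer, not a legal rule."""
    return fraction <= threshold


original = "word " * 1000  # stand-in for a 1,000-word work
excerpt = "word " * 120    # stand-in for a 120-word quotation
share = fraction_taken(original, excerpt)

# Same taking, two hypothetical design choices, opposite outputs.
print(share)                                  # 0.12
print(amount_factor_favors_use(share, 0.10))  # False
print(amount_factor_favors_use(share, 0.15))  # True
```

Measuring the fraction is the easy part. Choosing words rather than lines, pixels, or bits as the unit, and choosing where the threshold sits, are the human judgments that determine the result.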

The need for human interpretation stems from the disjunction between data representation and reality: the correlations found by data mining algorithms have meaning within the formal properties of the data set but have unknown significance outside the data set.65 Thus, for example, facial recognition algorithms, employed to further security, law enforcement, immigration screening, and other purposes, have been much discussed in relation to algorithmic governance.66 But as computer scientist Bill Smart reminds us, such systems are not in fact “face detectors,” they are actually “set-of-pixel-values-that-often-correlate-well-with-the-presence-of-faces-in-the-training-data-that-you-collected-detector[s].”67 Similarly, fair use algorithms would be more accurately understood as something like “patterns-of-numerical-values-that-often-correlate-well-with-similar-patterns-of-numerical-values-related-to-judicial-findings-of-fair-use-in-the-training-data-that-you-collected-detectors.” Patterns detected by a machine evaluating fair use–related data should not be confused with a legal institutional determination of fair use.

Thus, again, algorithms do not make judgments; they are the products and the tools of human judgments. Human narrative may be baked into the system ex ante or it may be assigned to the output ex post, but at some point someone must put policy and ideology to work to declare the numbers relevant.68 Data analysis may indicate that certain data occurrences coincide, but the explanation as to why this occurs is a human narrative or categorization, not a technical determination.69 Thus, when data mining (in one famous example) shows a strong correlation between movements in the S&P 500 Stock Index and the production of butter in Bangladesh, a human decisionmaker is required to designate the trend as spurious rather than meaningful.70
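The butter example can be reproduced in miniature. The following Python sketch is illustrative only: the two series are invented numbers that simply rise over time, not real index or production data, and the `pearson` helper is a plain textbook implementation of the correlation coefficient written here for self-containment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented figures for illustration; both series merely trend upward.
index_level = [100, 108, 115, 121, 130, 142, 150, 163]
butter_output = [50, 53, 57, 60, 66, 70, 76, 81]

r = pearson(index_level, butter_output)
print(round(r, 3))

# The coefficient is close to 1, yet nothing in the computation indicates
# whether the relationship is meaningful or spurious. That label is supplied
# by a human reader, not by the arithmetic.
```

Any two quantities that happen to grow over the same period will score highly on this measure, which is why the designation of a trend as spurious or significant remains a narrative judgment.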

Equally problematic is the realization that fair use is not a static concept. Even if the engineering vision of fair use, whether it is in defined programming values or as machine learned patterns, is somehow entirely faithful to the relevant legal doctrines, we are left with the question as to exactly which version of fair use is being instantiated in the machine. The common law evolves, whether from purely judicial reasoning or from judicial riffing off of legislative enactments. Fair use today does not look entirely as it did in the past, either as a common law doctrine prior to its 1976 statutory codification or as it stood when codified by Congress. Neither has the codified version remained static, as the Supreme Court has added a variety of judicial glosses, most notably the concept of transformativity.71 No doubt the official, judicially articulated understanding of the doctrine’s character can be expected to continue to change in the face of developing technical and social circumstances.

Thus, one concern that could stem from the dynamic legal nature of fair use is whether automated instantiation of fair use freezes the standard as of the time it was encoded, so that the law and the algorithm diverge. The algorithm could of course be updated to learn or incorporate shifts in the legal standard. But far from preventing divergence, updating almost assures it. Whether to incorporate new data or to accommodate new equipment, digital processes require continual updating that creates unexpected dynamism as the deployed algorithm evolves.72 Maintenance and upgrades to the system inevitably deviate from the expectations of the original design. This is a source of inconsistency, as are the multiple serendipitous interactions of the particular algorithm with other hardware, software, and devices with their ongoing updates.73

To be sure, one might argue that judicial determinations of fair use factors require the same set of judgments, and no matter what a judge may articulate in her written opinion, the actual process of judicial reasoning is never fully transparent.74 But as Professor Lawrence Lessig long ago pointed out, when technical design is effectively legal regulation, the major difference between legal code and computer code may be that the latter type of regulation devolves policy choices from the hands of publicly accountable officials to those of largely invisible and unaccountable software engineers.75 Or as Professor McKenzie Wark suggested, technology is merely politics by different means; when speaking of the technical or of the political, one is speaking of the same systems viewed through different lenses.76 Political, ideological, and even unconscious biases are well understood to permeate traditional legal codes developed in the legislative arena, but they are equally present in deployment of computer code developed in the technical arena.

V. Institutional Infrastructure Redux

To guard against intentional or unintentional algorithmic error, the natural suggestion is to require some degree of human oversight.77 And thus, the suggestion of automated fair use assessment, notwithstanding any changes in machine learning technology, circles back to the finding by Professor Cohen and myself nearly twenty years ago that automated fair use systems require human institutional oversight.78 But once again we are confronted with the observation of Professor Paul Dourish and others that algorithms may be best regarded as “convening” objects79 that interact with a complex ecology of hardware, software, social institutions, and human actors.80 The social impact of algorithmic fair use depends on the assemblage of actions and actors that are tied together in such an infrastructure. We should therefore consider closely how any putative fair use detector came into existence and consider what entities have the motivation and the resources to construct such a system.

Designing, maintaining, repairing, gathering, curating, and updating a database and its attendant algorithm are not costless activities.81 They are, to the contrary, likely to be expensive as standalone activities, or might constitute marginal costs related to the investment in a larger undertaking. Copyright algorithms are currently deployed, as the Ninth Circuit underscores in its Lenz dictum, in order to manage the overwhelming job of policing digital content.82 In the vision of algorithmic fair use casually articulated by the Ninth Circuit, the fair use algorithm might constitute part of a good faith effort by content owners to evaluate likely infringement.83 Alternatively, one could imagine service providers, such as Google or Facebook, creating and deploying algorithmic fair use as part of their effort to comply with their responsibilities under the Copyright Act and to justify their decisions to remove or allow content on their platforms. Far less likely is any scenario in which the users or consumers of copyrighted content deploy a fair use algorithm, or even in which fair users would have any hand in designing or crafting the systems that assess the applicability of the exemption to their activities.

Given the inordinate cost associated with reviewing online content for infringement, what type of human oversight might we expect from the likely originators of automated fair use assessment? Would human oversight guard against Type I or Type II error?84 False algorithmic fair use positives are the likely concern of content holders, whereas false algorithmic fair use negatives are most detrimental to the public good. Screening for both types of error would effectively mean human review of every algorithmic decision, negating any cost advantage from the algorithmic review, and so is utterly implausible. The statute considered in Lenz clearly contemplates human decision-making in the formation of a “good faith belief,”85 and judicial enforcement of that expectation might require human review before a takedown demand; but this does not address algorithmic policing of uploads, downloads, or access.86 Certainly, content owners are no more likely to engage expert human oversight of fair use analysis than they do now for automated decision-making regarding content blocking or removal.
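The arithmetic behind this point can be made concrete with a minimal sketch. Everything in it is hypothetical: the `Decision` record, the function names, and the toy counts are assumptions invented for illustration, not a description of any deployed system. The sketch shows only that false positives hide among the uses the algorithm labels fair, and false negatives among the uses it labels not fair, so that a policy screening for both must examine every decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One hypothetical algorithmic fair use determination."""
    flagged_fair: bool   # what the (hypothetical) algorithm decided
    actually_fair: bool  # the legally correct answer, unknown without human review


def count_errors(decisions):
    """Return (false positives, false negatives) for a batch of decisions."""
    false_pos = sum(1 for d in decisions if d.flagged_fair and not d.actually_fair)
    false_neg = sum(1 for d in decisions if not d.flagged_fair and d.actually_fair)
    return false_pos, false_neg


def review_workload(decisions, check_positives, check_negatives):
    """Count the decisions a human must examine under a screening policy.

    Errors cannot be identified in advance, so catching one type of error
    means reviewing every decision carrying the corresponding label.
    """
    return sum(
        1
        for d in decisions
        if (d.flagged_fair and check_positives)
        or (not d.flagged_fair and check_negatives)
    )


# Toy data, invented purely for illustration.
decisions = (
    [Decision(True, True)] * 60      # correctly labeled fair
    + [Decision(True, False)] * 5    # false positives (Type I)
    + [Decision(False, False)] * 30  # correctly labeled not fair
    + [Decision(False, True)] * 5    # false negatives (Type II)
)

only_positives = review_workload(decisions, True, False)
only_negatives = review_workload(decisions, False, True)
both_types = review_workload(decisions, True, True)
```

On these invented numbers, screening for both error types requires reviewing all one hundred decisions to find ten errors, which is the cost problem described above.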

In other contexts, the idea of third party audits for algorithmic decisions has been advanced.87 But ex ante assessment of intended or unintended biases will prove difficult or impossible. Transparency of algorithmic systems is obscured in at least three different interlocking dimensions.88 First is the explicit or intentional obscurity stemming from trade secrecy and protection of confidential business information—to the extent that algorithms are commissioned or developed by commercial entities, they may attempt to shield proprietary aspects of the technology from misappropriation or competitive copying.89 A second barrier to transparency stems from the esoteric nature of the technology, requiring technical expertise to understand its workings. Even if the code is openly available, judges and lay consumers are unlikely to understand how the algorithm operates, so any assessment of the technology’s operation or suitability is at best reliant on expert interpretation and translation of the algorithm’s features into understandable lay terms.

Third, and closely related to the point regarding updates: the complexity of the algorithm in operation creates opacity. Even if the system is entirely open to inspection by experts, the experts are unlikely to understand how it operates.90 Because machine learning codes for routines and leaves the routine to develop values, it is often impossible to predict or even to know what the machine has learned.91 Additionally, the algorithm itself is embedded in a larger technical structure, including other software and hardware components that will affect its operation, often in unexpected or inscrutable ways.92 Thus, when Google’s image recognition algorithm infamously labeled pictures of African American people as “gorillas,” it might have been due to some unconscious racial bias in the training data, or it might have been due to some kind of bias in the system design, or it might have been an unfortunate but inadvertent occurrence caused by a random technical glitch somewhere in the system.93 But the most significant lesson from the debacle, whatever the origin of the offensive output, may be that the only solution was to block the system from labeling any image as a “gorilla” because Google’s technical staff simply had no ability to locate, isolate, or remedy the source of the problem.94

VI. Self-Fulfilling Algorithms

Because the technical limitations I briefly sketch above are opaque, there is a strong tendency for them to remain invisible and unconsidered, and the lack of apparent limits lends to the machine a magical aura of automated objectivity. And yet the limitations will be there, preventing automation of what we now call fair use. While we should be deeply concerned with these inevitable biases that attend algorithmic design and implementation as well as with the distracting myth of algorithmic objectivity, my primary concern here is with their combination to produce recursive biases that change public practice and so change social meaning. As I note above, this type of effect is already seen in the algorithmic copyright policing of online content, in which the algorithmic removal action has become a de facto finding of infringement, the public has begun to internalize such outcomes, and formal copyright law may be incorporating those expectations into its weft.95 As one video creator has described the development of an online guide to moviemaking:

You could make a video that meets the criteria for fair use, but YouTube could still take it down because of their internal system (Copyright ID) which analyzes and detects copyrighted material.

So I learned to edit my way around that system.

Nearly every stylistic decision you see about the channel — the length of the clips, the number of examples, which studios’ films we chose, the way narration and clip audio weave together, the reordering and flipping of shots, the remixing of 5.1 audio, the rhythm and pacing of the overall video — all of that was reverse-engineered from YouTube’s Copyright ID.

I spent about a week doing brute force trial-and-error. I would privately upload several different essay clips, then see which got flagged and which didn’t. This gave me a rough idea what the system could detect, and I edited the videos to avoid those potholes.96

Whatever form algorithmic fair use might take would likely become a similar social, legal, and creative default.

The effective loss of any user exceptions in current online copyright enforcement might be seen to favor incorporation of some approximation of fair use into policing algorithms, however far it may depart from the actual legal grant to users, on the theory that occasional and biased user access is better than none.97 I have become increasingly chary of such interventions, however well-intentioned. We have some historic experience with the effect of fair use approximations in the context of old-fashioned, nonautomated legal formulas. In the context of the 1976 revision of the US Copyright Act, educators, publishers, and other stakeholders met to discuss the application of fair use standards to the photocopying of classroom materials.98 After considerable discussion, the groups reported to the relevant congressional committee that they had agreed on certain guidelines for photocopying.

The guidelines were not enacted into law, nor endorsed or approved by Congress, although they were discussed in committee reports.99 Rather, the guidelines were effectively an agreed-upon metric, conformity with which would be considered by the group to be “fair” and so excused from infringement liability. For example, the guidelines specified that material taken without authorization must be brief and offered definitions of permissible “brevity” for various types of literary works, such as:

A complete poem if less than 250 words and if printed on not more than two pages or from a longer poem, an excerpt of not more than 250 words.100
Either a complete article, story or essay of less than 2,500 words, or an excerpt from any prose work of not more than 1,000 words or 10 percent of the work, whichever is less, but in any event a minimum of 500 words.101
Each of the numerical limits stated above may be expanded to permit the completion of an unfinished line of a poem or of an unfinished prose paragraph.102

Note that, under the statutory test, these metrics might or might not be deemed “fair.” Depending on the circumstances, 250 words from a poem might fall within the statutory determination of “fair,” or 250 words might be too much. Certainly, many uses not recognized within such guidelines would be fair under the statute. The guidelines comprised a set of simple, discrete, quantitative (and, not coincidentally, eminently programmable) substitutions of private rules for the statutory standard, offering certainty rather than flexibility.
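To illustrate how readily such metrics reduce to code, the brevity definitions quoted above might be sketched as follows. This is a rough illustration under stated assumptions, not a rendering of any actual system: the function names are hypothetical, any prose work under 2,500 words is treated as a complete article, story, or essay, word counts are rounded down, and the allowance for completing an unfinished line or paragraph is left out.

```python
def poem_word_limit(total_words: int, pages: int) -> int:
    """Words that may be copied from a poem under the brevity guideline.

    Assumption: where the guideline text is silent (a short poem printed
    on more than two pages), the 250-word excerpt cap is applied.
    """
    # A complete poem qualifies if under 250 words and on at most two pages.
    if total_words < 250 and pages <= 2:
        return total_words
    # Otherwise, an excerpt of not more than 250 words.
    return min(250, total_words)


def prose_word_limit(total_words: int) -> int:
    """Words that may be copied from a prose work under the brevity guideline."""
    # A complete article, story, or essay of less than 2,500 words.
    if total_words < 2500:
        return total_words
    # An excerpt of 1,000 words or 10 percent of the work, whichever is less,
    # but in any event a minimum of 500 words.
    return max(500, min(1000, total_words // 10))


def within_brevity(copied_words: int, limit: int) -> bool:
    """The entire 'analysis': a single numeric comparison."""
    return copied_words <= limit
```

A few lines of arithmetic exhaust the guideline. Nothing in the sketch asks about the purpose of the use, the nature of the work, or the effect on the market, which marks the distance between the private rule and the statutory standard.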

Despite the fact that they were not legally required, and copyright users were likely entitled to more than the guidelines offered, the guidelines quickly found considerable purchase with copyright users, who were often advised by their employers or by professional societies to remain within the guidelines in order to avoid the uncertainties of the actual statute’s multifactor calculus. This is perhaps not surprising, as the discrete metrics of the guidelines were easier to communicate and to understand than the inchoate factors of the actual legal test. Somewhat more surprisingly, the guidelines began to show up in infringement litigation, were cited by copyright owners as marking the boundaries of fair use, and were adopted by some courts as indicative of fair use.103 Fair use analysis is costly in terms of judicial resources, and the guidelines offered a ready-made rule for some judges to use.

In short, implementation of algorithmic fair use will inevitably, and probably detrimentally, change the nature of fair use.104 Much as in the historical case of the fair use guidelines, we should expect that the deployment of any algorithm purporting to assess fair use would engage strong incentives toward the adoption of a quick and easy substitute for a complicated legal test. Adoption might be explicit, as in the case of the fair use guidelines, or tacit, as courts and the public internalize the activity of the algorithm. Indeed, our experience to date with algorithmic systems suggests that the incentives toward de facto definition of fair use as equivalent to its automated doppelganger would be much stronger. In practice then, whatever choices or biases, inclusions or exclusions, expectations or oversights were engineered into the algorithm would become a self-fulfilling prophecy as to the nature of fair use.

Thus, the problem is not so much the concern advanced by Professor Roger Brownsword that regulation such as fair use by design forecloses a population’s moral and normative choices,105 as the concern is that the moral and normative choices embraced by the population are informed, manufactured, and ultimately distorted by the architecture of regulation. Indeed, Professor Mireille Hildebrandt has suggested that algorithmic technologies cannot support, and may be inimical to, the public values that are fundamental to democratic and civil society.106 Machine learning may seem a novelty, and technology may have changed, but basic human nature and institutional practice have not. Careful consideration of these and related effects is necessarily part of any realistic assessment of algorithmic fair use, or indeed of any movement toward automated governance.

Conclusion

The conclusion compelled by our current understanding of algorithmic governance is stark but real. I have outlined two possible roads ahead, and on neither of them does it appear that viable fair use survives intact in the algorithmic twenty-first century.107 Failure to incorporate fair use into copyright enforcement algorithms likely means the de facto loss of the fair use exception, making it available only as a rarified defense to the few litigants who can afford to persevere until favorable judicial review. However, the alternative of attempting to incorporate fair use into enforcement algorithms threatens to degrade the exception into an unrecognizable form. Worse yet, social internalization of a bowdlerized version of fair use deployed in algorithmic format is likely to become the new legal and social norm. We can of course try to shoehorn some type of infringement forgiveness into enforcement algorithms, and we might even label such mechanized user latitude “fair use,” but it will not resemble fair use or serve the goals of fair use, in any sense that we now know them.

Essentially, because fair use cannot be automated, algorithmic fair use simply cannot be fair use at all. There is perhaps some cold comfort, when drawing upon the very deep literature examining algorithmic governance, to realize that this situation is unique neither to fair use in particular nor to copyright in general. The reality of what Professor Jack Balkin has called the “Algorithmic Society”108 is that these processes are operating across vast swaths of legal governance, from privacy to consumer welfare to freedom of speech.109 In the little corner of the world concerned with copyright, it has been clear for some time that copyright in the information age is probably not fulfilling its mandate of encouraging authors while promoting human flourishing;110 the dysfunction inherent in algorithmic copyright is only the latest sign of a system in dissolution.

It should, therefore, come as no surprise that the fair use component of copyright is no more amenable to automation than is the overall copyright system itself. Moreover, I have focused here only on the difficulties of algorithmic fair use, but similar difficulties would attend automation of the idea/expression distinction,111 exhaustion,112 functionality,113 and other doctrines that likely do far more than fair use to control the shape and scope of copyright. Rather than attempting to salvage the accustomed analog copyright balance by substituting some changeling form of fair use for the familiar doctrine, the reality of algorithmic governance may instead mean radically rethinking the goals of copyright as a whole.

1See generally Note, Private Bills in Congress, 79 Harv L Rev 1684 (1966).
2See Natascha Just and Michael Latzer, Governance by Algorithms: Reality Construction by Algorithmic Selection on the Internet, 39 Media, Culture & Society 238, 247–48 (2017) (describing algorithmic personalization); Paul Dourish, Algorithms and Their Others: Algorithmic Culture in Context, 3 Big Data & Society *3 (July–Dec 2016) (discussing algorithms in the context of digital automation). As Professor Paul Dourish points out, the concept of the “algorithm” is slippery, and usage is loose, encompassing everything from actual computer code to systems of digital control and management. Id at *3–4. Because the idea of a “fair use algorithm” currently lies somewhere between conjecture and fantasy, making it impossible to predict just what technology might accommodate such a system, I use the term here in the broad sense of “encoded procedures for transforming input data into a desired output, based on specified calculations.” Tarleton Gillespie, The Relevance of Algorithms, in Tarleton Gillespie, Pablo J. Boczkowski, and Kirsten A. Foot, eds, Media Technologies: Essays on Communication, Materiality, and Society 167, 167 (MIT 2014).
3See generally Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information (Harvard 2015) (surveying use of algorithmic controls across multiple sectors).
4See Matthew Sag, Internet Safe Harbors and the Transformation of Copyright Law, 93 Notre Dame L Rev 499, 543–44 (2017); Maayan Perel and Niva Elkin-Koren, Accountability in Algorithmic Copyright Enforcement, 19 Stan Tech L Rev 473, 478–81 (2016); Annemarie Bridy, Copyright’s Digital Deputies: DMCA-Plus Enforcement by Internet Intermediaries, in John A. Rothchild, ed, Research Handbook on Electronic Commerce Law 185, 195–98 (Edward Elgar 2016).
5See 17 USC § 107.
6See Sag, 93 Notre Dame L Rev at 522–26 (cited in note 4); Niva Elkin-Koren, Fair Use by Design, 64 UCLA L Rev 1082, 1093–99 (2017).
7See, for example, danah boyd and Kate Crawford, Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon, 15 Info, Commun & Society 662, 667–75 (2012) (surveying the challenges attending deployment of big data systems); Gernot Rieder and Judith Simon, Big Data: A New Empiricism and Its Epistemic and Socio-political Consequences, in Wolfgang Pietsch, Jörg Wernecke, and Maximillian Ott, eds, Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data 85, 91–94 (Springer 2017).
8See Dan L. Burk, Law and Economics of Intellectual Property: In Search of First Principles, 8 Ann Rev L & Soc Sci 397, 401 (2012).
9See Pamela Samuelson, Justifications for Copyright Limitations and Exceptions, in Ruth L. Okediji, ed, Copyright Law in an Age of Limitations and Exceptions 12, 18–24 (Cambridge 2017).
10P. Bernt Hugenholtz, Fierce Creatures—Copyright Exemptions: Toward Extinction?, in David Vaver, ed, 2 Intellectual Property Rights: Critical Concepts in Law 231, 232 (Routledge 2006).
11Pub L No 94-553, 90 Stat 2541 (1976), codified at 17 USC § 101 et seq.
1217 USC § 110(1), (3), (8).
1317 USC § 110(6).
1417 USC § 107. See also Jennifer M. Urban, How Fair Use Can Help Solve the Orphan Works Problem, 27 Berkeley Tech L J 1379, 1429 n 219 (2012) (noting similar provisions in Israeli and Philippine law).
1517 USC § 107.
16See, for example, Jason Scott Johnston, Bargaining under Rules versus Standards, 11 J L Econ & Org 256, 269–70 (1995); Louis Kaplow, Rules versus Standards: An Economic Analysis, 42 Duke L J 557, 575–77 (1992); Pierre Schlag, Rules and Standards, 33 UCLA L Rev 379, 381–83 (1985).
17See Kaplow, 42 Duke L J at 599–601 (cited in note 16) (discussing how context can change the cost of rule development or standard application).
18Carol M. Rose, Crystals and Mud in Property Law, 40 Stan L Rev 577, 601–04 (1988). Although they do not use Rose’s terminology, some scholars have observed the same modulating effect in fair use doctrine. See Niva Elkin-Koren and Orit Fischman-Afori, Rulifying Fair Use, 59 Ariz L Rev 161, 177–86 (2017) (discussing the procession between rules and standards in fair use).
19See generally Anthony J. Casey and Anthony Niblett, The Death of Rules and Standards, 92 Ind L J 1401 (2017); Anthony J. Casey and Anthony Niblett, Self-Driving Laws, 66 U Toronto L J 429 (2016).
20See Casey and Niblett, 92 Ind L J at 1411–12 (cited in note 19).
21See Lucas D. Introna, Algorithms, Governance, and Governmentality: On Governing Academic Writing, 41 Sci, Tech & Hum Values 17, 20 (2015) (describing algorithmic governance mechanisms as embedded in a complex flow of social practices); Kate Crawford, Can an Algorithm Be Agonistic? Ten Scenes from Life in Calculated Publics, 41 Sci, Tech & Hum Values 77, 79 (2015) (observing that algorithms function, are produced, and are modified in complex political environments).
22M.C. Elish and danah boyd, Situating Methods in the Magic of Big Data and AI, 85 Commun Monographs 57, 63–64 (2017). See also Malcolm Campbell-Verduyn, Marcel Goguen, and Tony Porter, Big Data and Algorithmic Governance: The Case of Financial Practices, 22 New Polit Econ 219, 220 (2016) (labeling as “techno-utopian” the optimistic view that algorithmic governance will “overcome the imperfections of politics and faulty forms of knowledge”).
23See Alfred Gell, Technology and Magic, 4 Anthropology Today 6, 9 (1988).
24See Sag, 93 Notre Dame L Rev at 543–44 (cited in note 4).
25Id at 543–44.
26Id at 545–46.
27Id at 504–05, 543–44.
28Sag, 93 Notre Dame L Rev at 503 (cited in note 4). See also Roger Brownsword, Lost in Translation: Legality, Regulatory Margins, and Technological Management, 26 Berkeley Tech L J 1321, 1328–29 (2011) (noting that technological regulation measures allow only nonnormative practical responses). Thus, while Professor Karen Yeung argues that digital content filtering is a tool of identification and selection rather than control, students of Foucault understand that identification and selection technologies are indeed control technologies. Compare Karen Yeung, Toward an Understanding of Regulation by Design, in Roger Brownsword and Karen Yeung, eds, Regulating Technologies: Legal Futures, Regulatory Frames and Technological Fixes 79, 88 (Hart 2008), with Oscar H. Gandy Jr, The Panoptic Sort: A Political Economy of Personal Information 71–80 (Westview 1993) (extending Foucault’s observations on surveillance to data mining technologies).
29See Sag, 93 Notre Dame L Rev at 531–32 (cited in note 4); Elkin-Koren, 64 UCLA L Rev at 1093–99 (cited in note 6).
30815 F3d 1145 (9th Cir 2016).
31Id at 1149.
32Pub L No 105-304, 112 Stat 2860 (1998).
33Lenz, 815 F3d at 1150.
34Id at 1151.
35Id at 1153.
36Lenz v Universal Music Corp, 801 F3d 1126, 1135 (9th Cir 2015). This passage was withdrawn and superseded by Lenz, 815 F3d at 1148.
37Lenz, 815 F3d at 1149.
38Dan L. Burk and Julie E. Cohen, Fair Use Infrastructure for Rights Management Systems, 15 Harv J L & Tech 41, 55–58 (2001). See also Timothy K. Armstrong, Digital Rights Management and the Process of Fair Use, 20 Harv J L & Tech 49, 82–85 (2006) (critiquing the Burk and Cohen proposal).
39See, for example, John S. Erickson, Fair Use, DRM, and Trusted Computing, 46 Commun ACM 34, 37–38 (2003); Edward W. Felten, A Skeptical View of DRM and Fair Use, 46 Commun ACM 57, 58 (2003). See also Deirdre Mulligan and Aaron Burstein, Implementing Copyright Limitations in Rights Expression Languages, in Joan Feigenbaum, ed, Digital Rights Management—ACM CCS-9 Workshop 137, 140–41 (Springer 2002) (discussing the possibilities for expression of automated fair use permissions).
40See John S. Erickson and Deirdre K. Mulligan, The Technical and Legal Dangers of Code-Based Fair Use Enforcement, 92 Proceedings of the IEEE 985, 992 (2004) (observing that “copyright law is difficult (if not impossible) to reduce to code”).
41See id; Mulligan and Burstein, Digital Rights Management at 144 (cited in note 39).
42See Elish and boyd, 85 Commun Monographs at 73 (cited in note 22) (“Because computational systems require precise definitions and mathematically sound logics, sociocultural phenomena that are typically nuanced and fuzzy are rendered in coarse ways when implemented into code.”).
43See note 13 and accompanying text.
44See Harper & Row, Publishers, Inc v Nation Enterprises, 471 US 539, 564–65 (1985) (holding that an unauthorized publication of a short excerpt constituting the “heart” of a biography weighed against fair use).
45See Mulligan and Burstein, Digital Rights Management at 139 (cited in note 39), quoting Fred von Lohmann, Reconciling DRM and Fair Use: Preserving Future Fair Uses? *1 (Computers, Freedom, and Privacy Conference, 2002), archived at http://perma.cc/UF8R-VHZP. See also Armstrong, 20 Harv J L & Tech at 108–20 (cited in note 38) (reviewing limitations of a range of embedded fair use architecture possibilities).
46See Elish and boyd, 85 Commun Monographs at 63 (cited in note 22) (describing the current movement toward machine learning techniques); Marion Fourcade and Kieran Healy, Seeing Like a Market, 15 Socio-Econ Rev 9, 24 (2017) (observing that artificial intelligence research abandoned the idea of machines that can think in favor of machines that can learn).
47See Matthew Sag, Predicting Fair Use, 73 Ohio St L J 47, 75–81 (2012); Barton Beebe, An Empirical Study of U.S. Copyright Fair Use Opinions, 1978–2005, 156 U Pa L Rev 549, 594–621 (2008). See also generally Pamela Samuelson, Unbundling Fair Uses, 77 Fordham L Rev 2537 (2009) (arguing that fair use decisions fall into regularized patterns).
48See Elkin-Koren, 64 UCLA L Rev at 1096–97 (cited in note 6) (speculating about this type of fair use algorithm). Ironically, because machine learning techniques inevitably involve the digital reproduction of training content, algorithmic fair use may be dependent on the fair use doctrine in order to acquire and process the materials needed to learn fair use. See Benjamin L.W. Sobel, Artificial Intelligence’s Fair Use Crisis, 41 Colum J L & Arts 45, 80–81 (2017) (discussing the dependence of artificial intelligence learning on fair use); Amanda Levendowski, How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem, 93 Wash L Rev 579, 619–29 (2018) (same). See also James Grimmelmann, Copyright for Literate Robots, 101 Iowa L Rev 657, 665–67 (2016) (arguing that large-scale robotic scanning or copying is permissible as a fair use).
49See Felten, 46 Commun ACM at 58–59 (cited in note 39).
50See Bryan Pfaffenberger, Technological Dramas, 17 Sci, Tech & Hum Values 282, 291 (1992).
51See generally Gillespie, The Relevance of Algorithms (cited in note 2).
52See id at 169–72.
53See id at 172–75.
54See id at 175–79.
55See Gillespie, The Relevance of Algorithms at 179–82 (cited in note 2).
56See id at 183–88.
57See id at 188–91.
58See Anton Vedder and Laurens Naudts, Accountability for the Use of Algorithms in a Big Data Environment, 31 Intl Rev L Computers & Tech 206, 208–09 (2017).
59N. Katherine Hayles, How We Think: Digital Media and Contemporary Technogenesis 176 (Chicago 2012).
60Malte Ziewitz, How to Think about an Algorithm: Notes from a Not Quite Random Walk *10 (draft discussion paper, Sept 29, 2011), archived at http://perma.cc/S3CZ-866V. See also Hayles, How We Think at 176 (cited in note 59) (observing that database outputs acquire meaning only with narrative explanation).
61Steve Woolgar and Dorothy Pawluch, Ontological Gerrymandering: The Anatomy of Social Problems Explanations, 32 Soc Problems 214, 217–18 (1985).
62Casey and Niblett, 66 U Toronto L J at 437 (cited in note 19). While this rather astonishing assertion lacks any citation to supporting authority, one possible source could be Professor Viktor Mayer-Schönberger and Kenneth Cukier, who have similarly claimed that the completeness of big data sets obviates the messiness of the data. Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think 33–35 (Houghton Mifflin 2013). This claim has been strongly criticized as inaccurate by a number of subsequent commentators. See, for example, Carl Lagoze, Big Data, Data Integrity, and the Fracturing of the Control Zone, 1 Big Data & Society *5 (July–Dec 2014); S. Leonelli, What Difference Does Quantity Make? On the Epistemology of Big Data in Biology, 1 Big Data & Society *6–8 (Apr–June 2014).
63Geoffrey C. Bowker, Memory Practices in the Sciences 184 (MIT 2005) (“Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care.”).
64See Lisa Gitelman and Virginia Jackson, Introduction, in Lisa Gitelman, ed, “Raw Data” Is an Oxymoron 1, 3 (MIT 2013).
65See Dourish, 3 Big Data & Society at *7 (cited in note 2); Elish and boyd, 85 Commun Monographs at 70 (cited in note 22) (developing machine-readable code is “not about a search for meaning, but about the construction and depiction of statistical models”).
66See generally Clare Garvie, Alvaro Bedoya, and Jonathan Frankle, The Perpetual Lineup: Unregulated Police Face Recognition in America (Georgetown Law Center on Privacy & Technology, Oct 18, 2016), archived at http://perma.cc/L3CC-BNS6 (discussing use of facial recognition algorithms).
67See Elish and boyd, 85 Commun Monographs at 69–70 (cited in note 22) (quoting Smart).
68See boyd and Crawford, 15 Info, Commun & Society at 667–68 (cited in note 7). See also John Symons and Ramón Alvarado, Can We Trust Big Data? Applying Philosophy of Science to Software, 3 Big Data & Society at *4–6 (July–Dec 2016) (discussing the epistemic difficulties in error correction for big data systems).
69See Dourish, 3 Big Data & Society at *7–8 (cited in note 2).
70See David J. Leinweber, Stupid Data Miner Tricks: Overfitting the S&P 500, 16 J Investing 15, 16 (2007).
71See Campbell v Acuff-Rose Music, Inc, 510 US 569, 579 (1994).
72See Dourish, 3 Big Data & Society at *8–9 (cited in note 2) (discussing the co-evolution of algorithms and data sets as implemented).
73Id at *9.
74See Casey and Niblett, 66 U Toronto L J at 437 (cited in note 19).
75See Lawrence Lessig, Code and Other Laws of Cyberspace 99 (Basic 1999). See also generally Joel R. Reidenberg, Lex Informatica: The Formulation of Information Policy Rules through Technology, 76 Tex L Rev 553 (1997) (discussing implementations of regulation through technology); Bruno Latour, Where Are the Missing Masses? The Sociology of a Few Mundane Artifacts, in Wiebe Bijker and John Law, eds, Shaping Technology-Building Society: Studies in Sociotechnical Change 225 (MIT 1992) (discussing the behavioral imperatives embedded in technical design).
76McKenzie Wark, #Celerity: A Critique of the Manifesto for an Accelerated Politics ¶ 3.7 (Speculative Heresy, May 14, 2013), archived at http://perma.cc/2YKW-X5NA. See also Langdon Winner, The Whale and the Reactor: A Search for Limits in an Age of High Technology 29 (Chicago 1986) (“[T]echnological innovations are similar to legislative acts or political foundings that establish a framework for public order.”).
77See Elkin-Koren, 64 UCLA L Rev at 1098 (cited in note 6) (suggesting such mixed oversight in the context of automated removal decisions).
78See Burk and Cohen, 15 Harv J L & Tech at 59 (cited in note 38). See also Erickson and Mulligan, 92 Proceedings of the IEEE at 993–94 (cited in note 40) (discussing the prospects for integrating algorithmic and human oversight of fair use). Professor Yeung argues that such institutional oversight is a generalized requirement for regulation by design. See Yeung, Toward an Understanding of Regulation by Design at 93–94 (cited in note 28).
79Dourish, 3 Big Data & Society at *3 (cited in note 2), quoting Mike Ananny, Toward an Ethics of Algorithms: Convening, Observation, Probability, and Timeliness, 41 Sci, Tech & Hum Values 93, 100–02 (2016).
80Dourish, 3 Big Data & Society at *3 (cited in note 2); Daniel Neyland and Norma Möllers, Algorithmic IF . . . THEN Rules and the Conditions and Consequences of Power, 20 Info, Commun & Society 45, 47 (2017).
81See Elish and boyd, 85 Commun Monographs at 69 (cited in note 22).
82Lenz, 801 F3d at 1135.
83See notes 30–31 and accompanying text.
84See J. Neyman and E.S. Pearson, The Testing of Statistical Hypotheses in Relation to Probabilities a Priori, 29 Mathematical Proceedings of the Cambridge Phil Society 492, 497–98 (1933) (labeling false positives and false negatives rejecting null hypotheses as Type I or Type II errors, respectively).
85Lenz, 801 F3d at 1135.
86See generally Bridy, Copyright’s Digital Deputies (cited in note 4). As Professor Annemarie Bridy has documented in some detail, the statutory notice and takedown procedure considered in Lenz has morphed into a network of voluntary filtering, blocking, and removal practices that are cheaper and more convenient for the businesses involved. Id at 195–98.
87See generally, for example, Pasquale, The Black Box Society (cited in note 3) (arguing for audits of search engine and financial accounting algorithms).
88See Jenna Burrell, How the Machine “Thinks”: Understanding Opacity in Machine Learning Algorithms, 3 Big Data & Society *9 (Jan–June 2016) (questioning the likely value and efficacy of proposed algorithmic audits).
89This is a central and signature concern of Professor Frank Pasquale’s analysis. See Pasquale, The Black Box Society at 4 (cited in note 3).
90Id at 6–7.
91See Burrell, 3 Big Data & Society at *11 n 17 (cited in note 88).
92See id at *5.
93See id at *7.
94See Marion Fourcade and Kieran Healy, Categories All the Way Down, 42 Hist Soc Rsrch 286, 293–94 (2017).
95See Sag, 93 Notre Dame L Rev at 503 (cited in note 4).
96Tony Zhou, Postmortem: Every Frame a Painting (Medium, Dec 2, 2017), archived at http://perma.cc/U5WU-M6ZZ. Hat tip to Professor James Grimmelmann for pointing out this example.
97See Burk and Cohen, 15 Harv J L & Tech at 65 (cited in note 38) (suggesting that some de minimis access rules might be incorporated into digital rights management algorithms). See also Barbara L. Fox and Brian A. LaMacchia, Encouraging Recognition of Fair Uses in DRM Systems, 46 Commun ACM 61, 62–63 (2003) (advocating inclusion of “safe harbor” uses in digital rights management algorithms).
98See Kenneth D. Crews, The Law of Fair Use and the Illusion of Fair-Use Guidelines, 62 Ohio St L J 599, 615–18 (2001); Ann Bartow, Educational Fair Use in Copyright: Reclaiming the Right to Photocopy Freely, 60 U Pitt L Rev 149, 149–63 (1998).
99See Crews, 62 Ohio St L J at 636 (cited in note 98).
100Agreement on Guidelines for Classroom Copying in Not-for-Profit Educational Institutions with Respect to Books and Periodicals, HR Rep No 94-1476, 94th Cong, 2d Sess 68 (1976).
101Id at 68–69.
102Id at 69.
103See Crews, 62 Ohio St L J at 662–63 (cited in note 98) (summarizing judicial uses of the guidelines).
104See Deirdre K. Mulligan, John Han, and Aaron J. Burstein, How DRM-Based Content Delivery Systems Disrupt Expectations of “Personal Use”, DRM 2003: Proceedings of the Third ACM Workshop on Digital Rights Management 77, 85 (2003) (arguing that digital rights management constraints may change consumer expectations for personal use of secured content).
105See Roger Brownsword, Disruptive Agents and Our Onlife World: Should We Be Concerned?, 4 Critical Analysis L 61, 66–67 (2017). See also Dan L. Burk and Tarleton Gillespie, Autonomy and Morality in DRM and Anti-circumvention Law, 4 Triple C: Cognition Commun Cooperation 239, 241 (2006) (arguing that technical copyright protections impact user autonomy).
106Mireille Hildebrandt, Smart Technologies and the End(s) of Law: Novel Entanglements of Law and Technology 184–85 (Edward Elgar 2015).
107But see Elkin-Koren, 64 UCLA L Rev at 1100 (cited in note 6) (advocating the development of fair use algorithms for the twenty-first century).
108Jack M. Balkin, Free Speech in the Algorithmic Society: Big Data, Private Governance, and New School Speech Regulation, 51 UC Davis L Rev 1149, 1151 (2018). See also Ian Bogost, The Cathedral of Computation (The Atlantic, Jan 15, 2015), archived at http://perma.cc/3K2G-FCNN (“We’re not living in an algorithmic culture so much as a computational theocracy.”).
109See generally Pasquale, The Black Box Society (cited in note 3). See also Joshua A. Kroll, et al, Accountable Algorithms, 165 U Pa L Rev 633, 678–94 (2017) (attempting to articulate general principles of accountability for algorithmic regulation).
110The literature grappling with this problem is now immense, but a sampling of major works would include Hector Postigo, The Digital Rights Movement: The Role of Technology in Subverting Digital Copyright (MIT 2012); John Tehranian, Infringement Nation: Copyright 2.0 and You (Oxford 2011); Lawrence Lessig, Remix: Making Art and Commerce Thrive in the Hybrid Economy (Penguin 2008); Tarleton Gillespie, Wired Shut: Copyright and the Shape of Digital Culture (MIT 2007); Jessica Litman, Digital Copyright (Prometheus 2001).
111See Pamela Samuelson, Reconceptualizing Copyright’s Merger Doctrine, 63 J Copyright Society 417, 419 (2016).
112See Aaron Perzanowski and Jason Schultz, Digital Exhaustion, 58 UCLA L Rev 889, 912 (2011).
113See Pamela Samuelson, Why Copyright Law Excludes Systems and Processes from the Scope of Its Protection, 85 Tex L Rev 1921, 1951–52 (2007).