Essay

Data Mining and Internet Profiling: Emerging Regulatory and Technological Approaches

Ira S. Rubinstein

Associate General Counsel, Microsoft Corporation (ret.)

Ronald D. Lee

Partner, Arnold & Porter LLP

Paul M. Schwartz

Professor of Law, UC Berkeley School of Law

The views expressed in this article are those of the authors alone. All three authors received their JD degrees from Yale Law School in 1985.

The 9/11 terrorists, before their deadly attacks, sought invisibility through integration into the society they hoped to destroy. In a similar fashion, the terrorists who carried out subsequent attacks in Madrid and London attempted to blend into their host lands. This strategy has forced governments, including the United States, to rethink counterterrorism strategies and tools.

One of the currently favored strategies involves data mining. In its pattern-based variant, data mining searches select individuals for scrutiny by analyzing large data sets for suspicious data linkages and patterns. Because terrorists do not “stand out,” intelligence and law enforcement agents want to do more than rely exclusively on investigations of known suspects. The new goal is to search “based on the premise that the planning of terrorist activity creates a pattern or ‘signature’ that can be found in the ocean of transaction data created in the course of everyday life.” Accordingly, to identify and preempt terrorist activity, intelligence agencies have begun collecting, retaining, and analyzing voluminous and largely banal transactional information about the daily activities of hundreds of millions of people.

Private organizations have their own reasons for gathering widespread information about individuals. With the expansion of internet-based services, companies can track and document a broad range of people’s online activities and can develop comprehensive profiles of these people. Advertisers and marketing firms likewise have strong incentives to identify and reach internet users whose profiles have certain demographic, purchasing behavior, or other characteristics. The construction, storage, and mining of these digital dossiers by internet companies pose privacy risks. Additional privacy issues arise when the government obtains this information, which it can currently do without much legal process.

This essay begins by examining governmental data mining; its particular focus is on pattern-based searches of databases according to a model of linkages and data patterns that are thought to indicate suspicious behavior.

TABLE OF CONTENTS