Robot Assisted Discovery—ESI and the Use of “Predictive Coding” Software in Civil Litigation

For reasons that include storage costs, ease of access, and environmental concerns, many businesses have moved away from keeping hard copies of files, and most files are now stored electronically. The shift from paper to electronic files has created new challenges for litigation discovery, the process of gathering information for a lawsuit. Increasingly, discovery requests seek electronically stored information (“ESI”). ESI is a broad term that can mean nearly anything in electronic form, but in litigation it most frequently includes documents (Word files, PDFs, spreadsheets, and JPEG or TIFF images), text messages, and email. The use of electronic media means there is far more discoverable information available, but ESI can also be a nightmare to sift through. In a lawsuit, sifting through ESI costs money, whether the client is paying contract attorneys, associates, or partners to sort the relevant from the irrelevant and the privileged from the nonprivileged.

A traditional method of sorting relevant from irrelevant ESI is to run keyword searches across the platform where the documents are stored. In many cases, litigants will negotiate (1) which custodians hold relevant data; and (2) which keyword and Boolean searches will be used to parse the data. After the search terms are applied to the data, there are privilege reviews and then productions, sometimes in overwhelming volume. With any form of electronic discovery, litigants must be careful what they ask for, because they may just get it: overly broad search terms tremendously increase the burden and expense for the litigants. Underinclusive search terms have the opposite problem: you may never find the smoking gun. The keyword format has further limitations. A keyword search yields only documents that satisfy the exact criteria searched, that is, documents that contain the keyword or meet the Boolean conditions built around it. It may miss a document that is conceptually similar to a relevant document but does not contain the exact keyword phrasing. It can also return documents that contain the keyword but are irrelevant.
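To make those two limitations concrete, here is a minimal sketch in Python of a Boolean keyword search. The three documents, the product-defect scenario, and the function names are all invented for illustration; real e-discovery platforms are far more elaborate, and this is not meant to represent any particular vendor's search engine.

```python
import re

# Hypothetical mini-corpus, invented purely for illustration.
DOCUMENTS = {
    "doc1": "The brake defect was reported to engineering in March.",
    "doc2": "Customers complain the car fails to stop on wet roads.",
    "doc3": "Lunch break moved to noon; brake for the food truck!",
}


def tokenize(text):
    """Lowercase the text and return the set of whole words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(documents, all_of=(), any_of=(), none_of=()):
    """Return the IDs of documents that satisfy a simple Boolean query.

    all_of  -- every term must appear (AND)
    any_of  -- at least one term must appear (OR); ignored if empty
    none_of -- no term may appear (NOT)
    """
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        if not all(term in words for term in all_of):
            continue
        if any_of and not any(term in words for term in any_of):
            continue
        if any(term in words for term in none_of):
            continue
        hits.append(doc_id)
    return hits


# Searching for the exact keyword "brake":
print(keyword_search(DOCUMENTS, all_of=["brake"]))  # ['doc1', 'doc3']
```

The search returns doc3, which contains the keyword but is irrelevant, and misses doc2, which describes the same problem without ever using the word "brake." Adding a NOT term (for example, `none_of=["lunch"]`) drops doc3, but no amount of Boolean tuning around "brake" will surface doc2.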

A new, smarter form of document searching is emerging, known as “technology assisted review” and also called predictive coding or computer assisted coding. In predictive coding, attorneys, typically partners or associates with extensive knowledge of the litigation, select several documents that are the most relevant to the case. Software algorithms analyze these documents for recurring patterns and identify other documents that follow those patterns. The relevant documents may be selected by one attorney, or several attorneys may each select a group of documents. For example, if three attorneys are selecting representative documents, some documents may be selected by all three, some by only two, and some by just one. The software can treat the documents selected by more attorneys as the most relevant and factor that information into the search. Thus, in theory, predictive coding will yield smarter and far more accurate results than traditional keyword searches, since the algorithms “learn” what documents are needed.

It has also been argued that, because the only human input needed is the selection of a few key documents, predictive coding is much less time consuming and more cost effective than keyword searching. Of course, whether this is true will depend on how much the vendor of the software charges and whether the court will go along with the new technology. And even if predictive coding saves time and cost in separating relevant from irrelevant documents, it does not replace human review entirely. The documents must still be reviewed for privileged, confidential, and/or trade secret information, and human effort is still required to select the documents that “train” the software.
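For readers curious what “learning” from attorney-selected documents might look like, the following Python sketch ranks unreviewed documents by their similarity to a set of seed documents, giving more weight to seeds chosen by more attorneys. It is a deliberately simplified stand-in (word counts compared by cosine similarity), not a description of how any commercial predictive coding product works. The sample documents, the vote counts, and every function name are assumptions made up for the example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_profile(seed_docs):
    """Combine seed documents into one weighted word profile.

    seed_docs is a list of (text, votes) pairs, where votes is the number
    of attorneys who selected that document as relevant.
    """
    profile = Counter()
    for text, votes in seed_docs:
        for term, count in term_counts(text).items():
            profile[term] += votes * count
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count mappings (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(seed_docs, unreviewed):
    """Return (doc_id, score) pairs, most similar to the seed profile first."""
    profile = build_profile(seed_docs)
    scored = [(doc_id, cosine(term_counts(text), profile))
              for doc_id, text in unreviewed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical seed set: (document text, number of attorneys who picked it).
SEEDS = [
    ("brake failure reported by test driver", 3),
    ("brake pads wear warning ignored", 2),
    ("quarterly sales meeting agenda", 1),
]

# Hypothetical unreviewed documents.
UNREVIEWED = {
    "a": "driver reported brake failure on highway",
    "b": "cafeteria menu for next week",
    "c": "sales agenda for quarterly meeting",
}

print([doc_id for doc_id, _ in rank_documents(SEEDS, UNREVIEWED)])  # ['a', 'c', 'b']
```

Document "a" ranks first because it resembles the seeds that all three (or two) attorneys selected. Document "c" matches only the seed picked by a single attorney and ranks lower, and document "b" shares no vocabulary with the seeds at all. A human reviewer would still need to check the top-ranked documents for privilege before production.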

As with many new technologies, courts were initially somewhat reluctant to embrace predictive coding. In October 2011, U.S. Magistrate Judge Andrew Peck of the Southern District of New York wrote an article supporting the use of predictive coding.[1] At the time of the article, no court opinion had addressed the subject. Proponents looked to the article as a sign of judicial support and expected that it would not be long before predictive coding became an accepted practice.

Since 2011, a few court opinions have addressed the use of predictive coding. The first was Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y.). In that case, the parties had stipulated to the use of predictive coding but disputed how to use it. In an opinion by Judge Peck, the court approved the use of predictive coding because of the large volume of documents, the need for cost effectiveness and proportionality, and the transparency of the predictive coding process.

In another case, Global Aerospace Inc. v. Landow Aviation, L.P., No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012), a judge approved the defendants’ use of predictive coding as a means of searching discoverable documents. The defendants wanted to use predictive coding, but the plaintiffs protested that it was not as effective as manual (keyword) review and was a radical departure from the standards of practice. The defendants argued that manual review would cost $2 million more and would yield fewer of the relevant documents. The judge sided with the defendants and approved the use of predictive coding, but gave the plaintiffs the option of paying the additional costs of manual review.

While predictive coding is not a perfect solution for sorting through large amounts of electronic information, it may soon be more cost effective than traditional keyword searches, at least in some cases. The courts are slowly embracing the technology, and as they do, it may become the norm for electronic discovery. As with most technology, it should get better, cheaper, and faster with time. The future may yield a new world of discovery, with much of the hard search work outsourced to robots (computer algorithms) instead of associates, clerks, and contract attorneys.

The law firm of Buche & Associates, P.C. is based in Southern California with offices in San Diego and Los Angeles, and specializes in litigation of patent, trademark, copyright and other intellectual property matters.  www.pacificpatentlawyers.com

By Jennifer Blanton, J.D. and John Buche, J.D.



[1] Predictive Coding: Reading the Judicial Tea Leaves (Law Tech. News, Oct. 17, 2011)