Applying Generative AI to Modern eDiscovery Workflows

Aug 16, 2024

Technology-assisted review (TAR) software has remained much the same over the last decade. Until now. A new wave of generative AI solutions, including Relativity aiR and DISCO’s Cecilia, is coming to market. The hype surrounding these products has been immense, prompting teams to ask: Will generative AI replace TAR? Will it replace human review entirely?

Traditional TAR Workflows “Learn” by Example

Established TAR software uses machine learning to distinguish likely relevant from likely irrelevant documents. The software “learns” the distinction from exemplar documents classified as relevant and irrelevant by attorney reviewers. From these examples, all the remaining unreviewed documents receive rankings (typically on a scale of 0 – 100). The higher the ranking, the more likely a document will be relevant. Depending on client needs, those rankings can be used to create a variety of workflows. The two most common workflows are known as TAR 1.0 and TAR 2.0.

TAR 1.0

The TAR 1.0 workflow uses the TAR rankings as a classifier in large matters like Second Requests, where it would be infeasible to review all relevant documents before production. Documents that score above a determined threshold, plus any attachments, qualify for review or production (subject to a screen for sensitive information and privilege). Documents below that threshold are discarded. The threshold rank is typically determined by reviewing a random sample, which is used to estimate the rank that will capture a sufficient percentage of the relevant documents, i.e., sufficient “recall”.
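
The threshold selection described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method: the function name `threshold_for_recall`, the sample data, and the 80% target are all assumptions made for the example. It takes a random, attorney-coded sample of (rank, relevant?) pairs and walks down the ranks until the estimated recall reaches the target.

```python
def threshold_for_recall(sample, target_recall):
    """Return (cutoff, estimated_recall) for the highest cutoff meeting the target.

    sample: list of (score, is_relevant) pairs from a random, attorney-coded sample.
    Hypothetical helper for illustration only.
    """
    total_relevant = sum(1 for _, is_relevant in sample if is_relevant)
    if total_relevant == 0:
        raise ValueError("sample contains no relevant documents")
    # Try candidate cutoffs from the highest rank downward.
    for cutoff in sorted({score for score, _ in sample}, reverse=True):
        captured = sum(1 for score, rel in sample if rel and score >= cutoff)
        recall = captured / total_relevant
        if recall >= target_recall:
            return cutoff, recall
    raise ValueError("target_recall must be at most 1.0")


# Illustrative sample: (TAR rank 0-100, attorney's relevance call).
sample = [(95, True), (88, True), (72, True), (65, False), (51, True),
          (40, False), (33, True), (20, False), (12, False), (5, False)]
cutoff, recall = threshold_for_recall(sample, target_recall=0.8)
print(f"Review or produce documents ranked {cutoff}+ (estimated recall {recall:.0%})")
```

In practice the sample would be far larger, and the estimate would be reported with a margin of error.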

TAR 2.0

TAR 2.0 uses TAR rankings to sort documents for review. After the software has generated its initial rankings using a small sample set of reviewed documents, the highest-ranking unreviewed documents are presented to the review team first. As the review progresses and the team identifies additional relevant and irrelevant documents, the TAR software updates its rankings periodically to incorporate the new exemplars, reshuffling the document sort order. By front-loading the review with the most likely relevant documents rather than sorting the documents randomly, teams are able to move into second level review or complete their production requirements much more efficiently.
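
A rough sketch of that review loop follows. The scoring model here is a deliberately simple word-count stand-in, not the algorithm any TAR product uses, and every name (`train`, `score`, `prioritized_review`, the `reviewer` callback) is hypothetical. What it shows is the cycle itself: rank the unreviewed documents, review the top batch, then re-rank with the new exemplars.

```python
from collections import Counter


def train(coded):
    """Toy model: count words seen in relevant vs. irrelevant exemplars."""
    rel, irr = Counter(), Counter()
    for text, is_relevant in coded:
        (rel if is_relevant else irr).update(set(text.lower().split()))
    return rel, irr


def score(model, text):
    """Rank 0-100: share of a document's known words that lean relevant."""
    rel, irr = model
    words = set(text.lower().split())
    r = sum(rel[w] for w in words)
    i = sum(irr[w] for w in words)
    return 50.0 if r + i == 0 else 100.0 * r / (r + i)


def prioritized_review(docs, seed_ids, reviewer, batch_size=2):
    """Review documents in ranked batches, re-ranking after each batch.

    docs: dict of doc_id -> text.
    reviewer: callable doc_id -> bool, standing in for the attorney's call.
    Returns the order in which documents were reviewed.
    """
    coded = {doc_id: reviewer(doc_id) for doc_id in seed_ids}
    order = list(seed_ids)
    while len(coded) < len(docs):
        # Retrain on everything coded so far, then re-rank what is left.
        model = train([(docs[d], label) for d, label in coded.items()])
        unreviewed = [d for d in docs if d not in coded]
        unreviewed.sort(key=lambda d: score(model, docs[d]), reverse=True)
        for doc_id in unreviewed[:batch_size]:
            coded[doc_id] = reviewer(doc_id)
            order.append(doc_id)
    return order
```

Given a small seed set, the loop surfaces documents that share vocabulary with the relevant exemplars before the rest, which is the front-loading effect described above.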

Since the most likely relevant documents get reviewed first, the high-ranking documents eventually run out and the review team begins to see predominantly low-ranking and likely irrelevant documents. When this occurs, clients commonly consider stopping review before the entire set has been completed. This can be defensibly accomplished by reviewing a random “Elusion Sample” from the unreviewed pile to estimate how many relevant documents would elude identification by cutting off the review.
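
The arithmetic behind an elusion sample can be sketched as follows. The function name and all of the figures are illustrative assumptions, and the margin of error uses a simple normal approximation, which is rough when the elusion rate is very low. The idea is to scale the share of relevant documents found in the sample up to the whole unreviewed pile, then compare that with what the review has already found.

```python
import math


def elusion_estimate(sample_size, relevant_in_sample, unreviewed_count, relevant_found):
    """Estimate elusion rate, missed documents, and recall from an elusion sample.

    Hypothetical helper for illustration; inputs are simple counts.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    elusion_rate = relevant_in_sample / sample_size
    # Scale the sample rate up to the whole unreviewed pile.
    estimated_missed = elusion_rate * unreviewed_count
    total_relevant = relevant_found + estimated_missed
    estimated_recall = relevant_found / total_relevant if total_relevant else None
    # Approximate 95% margin of error on the elusion rate (normal approximation).
    margin = 1.96 * math.sqrt(elusion_rate * (1 - elusion_rate) / sample_size)
    return {
        "elusion_rate": elusion_rate,
        "margin_of_error": margin,
        "estimated_missed": estimated_missed,
        "estimated_recall": estimated_recall,
    }


# Illustrative numbers: 15 relevant documents in a 1,500-document sample drawn
# from 200,000 unreviewed documents, after the review found 18,000 relevant ones.
result = elusion_estimate(1500, 15, 200_000, 18_000)
print(f"Elusion rate: {result['elusion_rate']:.1%}")
print(f"Estimated recall: {result['estimated_recall']:.0%}")
```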

Generative AI for Document Review

Here is, in brief, what generative AI does: a human user enters a prompt into OpenAI’s ChatGPT, Microsoft Copilot, Google’s Gemini, or some other chat interface. The interface submits the prompt to a large language model (LLM), which generates a response. The responses are often of such high quality that the model seems to have understood the prompt as a human would. However, the model processes information far more quickly than a human ever could, and its responses sometimes include errors, termed “hallucinations,” that a human would not make.

So how can this technology help with document review?

Many eDiscovery platforms have developed integrations with LLMs using a technique known as Retrieval Augmented Generation (RAG). RAG combines a generative model prompt with access to a knowledge base of documents. This combination allows the model to ground its responses in that knowledge base, rather than relying only on its original training data.
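
A stripped-down sketch of the RAG pattern is shown below. It makes several simplifying assumptions: retrieval is plain keyword overlap (production systems typically use embeddings or a search index), the function names are hypothetical, and the final call to an LLM is left out entirely. It shows only the two steps described above, which are retrieving relevant documents and placing them in the prompt.

```python
def retrieve(query, knowledge_base, k=2):
    """Return the IDs of the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in knowledge_base.items():
        overlap = len(query_words & set(text.lower().split()))
        scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all.
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]


def build_grounded_prompt(query, knowledge_base, k=2):
    """Assemble a prompt that asks the model to answer from retrieved documents."""
    doc_ids = retrieve(query, knowledge_base, k)
    context = "\n".join(f"[{doc_id}] {knowledge_base[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the documents below and cite document IDs.\n"
        f"Documents:\n{context}\n"
        f"Question: {query}"
    )


# Illustrative knowledge base.
knowledge_base = {
    "DOC-1": "the supply contract was signed in march",
    "DOC-2": "lunch is at noon",
    "DOC-3": "contract renewal terms for the supply agreement",
}
print(build_grounded_prompt("when was the supply contract signed", knowledge_base))
```

The resulting prompt would then be sent to whichever LLM the platform uses.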

In this example using Google Gemini Advanced, AI extracts the information from the source material.

Screenshot: Google Gemini Advanced. The prompt asks for the leadership team list from the cdslegal.com website.

It is also possible to tell the LLM to classify a document as relevant or irrelevant according to a set of instructions.

Screenshot: Google Gemini Advanced. The prompt asks whether CDS supplies eDiscovery services.

Relativity aiR for Review features a much more robust and useful set of outputs than the one-word reply above, making it possible to:

Input a set of instructions detailing the type of content that is relevant to the review. The instructions can be in plain English, in a format that is nearly identical to a protocol written for a review team.
Use aiR (via GPT-4o) to classify all the documents in the database on a scale of 0-4, with higher ranks likely being more relevant.
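
To make the general idea concrete, here is a hypothetical sketch of prompt-based classification. It is not Relativity's API or aiR's actual prompt. The prompt wording, the `SCORE:` reply format, and the `call_llm` parameter (a caller-supplied function that sends text to some LLM and returns its reply) are all assumptions made for illustration.

```python
import re


def build_review_prompt(protocol, document_text):
    """Combine a plain-English review protocol and a document into one prompt."""
    return (
        "You are assisting with a document review.\n"
        f"Review protocol:\n{protocol}\n\n"
        f"Document:\n{document_text}\n\n"
        "Rate the document's relevance from 0 (lowest) to 4 (highest).\n"
        "Reply in the form 'SCORE: <number>' followed by a one-sentence rationale."
    )


def parse_score(reply):
    """Extract the 0-4 score from a reply, or None if the reply is malformed."""
    match = re.search(r"SCORE:\s*([0-4])\b", reply)
    return int(match.group(1)) if match else None


def classify(protocol, document_text, call_llm):
    """Classify one document; call_llm is a hypothetical caller-supplied function."""
    return parse_score(call_llm(build_review_prompt(protocol, document_text)))


# A canned stand-in for a real model, used only to show the flow.
def fake_llm(prompt):
    return "SCORE: 3. The document discusses pricing terms covered by the protocol."


print(classify("Documents about pricing agreements are relevant.",
               "Attached is the revised pricing schedule.", fake_llm))
```

Returning None for a malformed reply, rather than guessing, lets those documents be routed to a human reviewer.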

Using aiR Classifications to Drive Other Workflows

As with traditional TAR software, it is possible to use the classifications to drive different workflows.

TAR 1.0 Replacement

Using aiR as a replacement for relevance review, in the same way TAR 1.0 has been applied for most of the last decade, is the easiest use case for this workflow. The Sedona Conference provides a deeper dive into this framework. In testing, aiR for Review has regularly achieved 85-95% recall, significantly better than the 80-85% typically achieved with traditional TAR software. Although aiR for Review is much faster and less costly than manual review, it is more expensive than traditional TAR 1.0 software. Other potential downsides: it might only be helpful for outgoing production reviews, and there is not yet any Da Silva Moore-type court approval of this workflow.

Figure: TAR 1.0 vs. Gen AI workflow

TAR 2.0 Replacement

Although aiR for Review generates only four rankings, the rankings can be useful for prioritizing a review in the same way as traditional TAR software. It’s also possible to turn on traditional TAR 2.0 software, such as Review Center, within each rank to prioritize each subset more efficiently. However, because this workflow incurs both attorney review costs and the higher software fees, it will likely be most appealing for smaller data sets with more expensive attorney reviewers.
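
The two-level prioritization described above amounts to a sort on two keys. In this small sketch, the field names `ai_score` and `tar_rank` are hypothetical, as is the sample data. Documents are ordered first by the generative AI classification, then by the traditional TAR rank within each classification.

```python
def review_order(documents):
    """Sort by generative AI score (highest first), then TAR rank within each score."""
    return sorted(documents, key=lambda d: (-d["ai_score"], -d["tar_rank"]))


# Illustrative documents with made-up scores.
documents = [
    {"id": "DOC-1", "ai_score": 2, "tar_rank": 91},
    {"id": "DOC-2", "ai_score": 4, "tar_rank": 40},
    {"id": "DOC-3", "ai_score": 4, "tar_rank": 85},
    {"id": "DOC-4", "ai_score": 1, "tar_rank": 99},
]
for doc in review_order(documents):
    print(doc["id"], doc["ai_score"], doc["tar_rank"])
```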

Figure: TAR 2.0 vs. Gen AI workflow

Issue Coding

One of the biggest benefits of aiR for Review over traditional TAR software is that it is not limited to a binary relevant/not relevant designation. Instead, it can classify for 10 different issues in a single analysis, which could be used after review to help with fact discovery and deposition preparation.
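
Because a document can carry several issue tags rather than a single relevant/not relevant call, the output is naturally a one-to-many mapping. The sketch below assumes a hypothetical export format (document ID mapped to a list of issue names) and uses made-up issue names. It inverts that mapping so each issue lists its documents, which is the shape a team preparing for a deposition would likely want.

```python
from collections import defaultdict


def group_by_issue(issue_coding):
    """Invert doc_id -> issues into issue -> sorted list of doc_ids."""
    by_issue = defaultdict(list)
    for doc_id, issues in issue_coding.items():
        for issue in issues:
            by_issue[issue].append(doc_id)
    return {issue: sorted(doc_ids) for issue, doc_ids in by_issue.items()}


# Illustrative coding: one document can belong to several issues, or to none.
issue_coding = {
    "DOC-1": ["Pricing", "Contract Formation"],
    "DOC-2": ["Pricing"],
    "DOC-3": [],
    "DOC-4": ["Contract Formation", "Damages"],
}
for issue, doc_ids in sorted(group_by_issue(issue_coding).items()):
    print(issue, doc_ids)
```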

Gen AI Offers an Exciting Assortment of Review Options

Generative AI has expanded the already bountiful array of review options. While this might make the best approach a little more challenging to identify, the benefits of this software will likely be a major win for the industry going forward. As with any new technology, the costs will go down and the quality of the product will go up. Keep checking in with CDS to learn more about the latest advancements in Gen AI.

CDS provides a full range of advisory services related to document review and production. To discuss how we can streamline your organization’s workflow, contact us for a consultation today.

About the Author

Dan Diette, Esq.

Daniel Diette is an eDiscovery Data Scientist specializing in Technology Assisted Review (TAR) and eDiscovery Analytics at CDS. He has over 13 years of experience applying analytics to all phases of eDiscovery. As head of CDS’ Advisory Services Analytics Team, Daniel manages the TAR process for CDS’ large, complex projects, from Second Requests to multi-billion dollar litigations and investigations. He advises clients on efficient document search and review in Relativity, Brainspace, DISCO, and Reveal, and on the defensible use of predictive coding software and workflows.
