Generative AI is Rocking the Legal World: Here’s How

Aug 3, 2023

If you’re involved in eDiscovery, you’ve probably overheard battle-hardened veterans reminiscing about the old days, when the litigation support world was tangible – attorney teams sifting through banker’s boxes to (hopefully) find the relevant documents, paralegals photocopying and manually Bates-stamping every page.

With the advent of the digital age, electronically stored information (ESI) replaced paper documents and “e-” became forever attached to discovery. The word on the street in 2023 is that everything is going to radically change again. Why? Generative AI. LLMs. ChatGPT.

What is Generative AI?

Generative Artificial Intelligence (AI), a type of technology that uses machine learning to generate novel content in response to user prompts, traces its roots back to the 1960s, when ELIZA, one of the first chatbots, was developed at MIT. Most generative software used today works interactively: the user types or speaks an instruction in natural language, and the software outputs an attempt to fulfill the request in the form of human-language text, programming code, images, audio, video, or 3D models.

While the concept has been around for decades, generative AI didn’t make much news outside the computer science community until recently. The lack of prior fanfare was appropriate: people could draw images and perform music that were better and cheaper than anything the software produced, so Generative AI wasn’t creating value.

In the last few years, the content produced by Generative AI has become extremely realistic. The technology is now capable of creating images that look like photographs, songs that sound like Drake, and countless other deepfakes that continue making headlines. Investors have noticed the potential value: venture capital investment in generative AI was estimated at $4.8 billion in 2022, nearly 12x growth since 2018.

The legal profession isn’t in the business of generating deepfakes. Beyond the litigation that deepfakes spur, legal professionals are not likely to see much direct impact from that side of the technology. However, the legal industry generates enormous amounts of text, and will be directly impacted by the most well-known Generative AI software – ChatGPT.

ChatGPT and LLMs

Released by OpenAI in November 2022, ChatGPT reportedly attracted over 100 million users in its first two months, making it the fastest growing consumer application in history. ChatGPT is one of several similar Large Language Model (LLM) products (others include Bard by Google and You.com) made possible by key advancements in deep learning and computing power over the last five to ten years.

The key deep learning development that led to LLMs was the creation of the transformer, an artificial neural network (ANN) architecture made famous by the 2017 Google paper “Attention Is All You Need.” Before 2017, language tasks were typically handled by a model called a Recurrent Neural Network (RNN). These models “read” text sequentially, word by word. Although the order of words is very important to understanding language (this is how people read), sequential processing is limiting for a computer for several reasons:

Speed. RNNs are slow to train because each word must be processed after the previous one, making it impractical to train on large data sets.
Accuracy. RNN models exhibited memory issues, e.g., when processing a long paragraph, they might forget what happened at the beginning.
Complexity. RNNs worked well for translating a word or short phrase but would fail when faced with too much complexity or content.

The transformer solved these issues by processing entire passages of text at once (while still accounting for the order of the words) instead of reading training data word by word. Processing text this way dramatically reduced forgotten context, increased the quality of training, and made it possible to multiply the number of processors running in parallel to speed up the model creation process and enable training on larger datasets. LLMs have been trained using the entirety of Wikipedia, web pages linked from Reddit, Common Crawl, and many other massive datasets. The result: state-of-the-art performance across the spectrum of natural language processing (NLP) tasks, including language translation, question answering, summarization, sentiment analysis, named entity recognition, and more.
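To make the idea of "processing text at once" a little more concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of the transformer. It is an illustration under simplifying assumptions, not a production implementation: the word vectors are hypothetical two-number embeddings, and the learned projections and positional encodings that real transformers use are left out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Assumption: real transformers first apply learned projections to get
    separate queries, keys, and values; that step is skipped here.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Compare this word with every word in the passage, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new representation is a weighted blend of all word vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs

# Hypothetical 2-dimensional embeddings for a three-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(sentence)
print(contextual)
```

Each word's output is computed from all of the words in the passage, and no word has to wait for the previous one to be processed. That independence is what allows the work to be spread across many processors.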

Generative AI and GPT

Perhaps the most impressive and impactful application of LLMs has been in Generative AI, in the form of Generative Pre-trained Transformers (GPTs). A GPT generates text by using probabilities to guess what the next word should be following a user prompt, then repeating the process word by word. Because it was pretrained on a sizable portion of the content on the internet, the responses can be quite impressive.
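The "guess the next word" idea can be illustrated with a toy model. The sketch below is not how GPT works internally (GPT uses a transformer trained on vast amounts of text); it simply counts which word follows which in a tiny, made-up corpus and then generates text by always choosing the most probable next word. The function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def next_word_probabilities(following, word):
    """Convert the follow-up counts for `word` into probabilities."""
    counts = following[word]
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

def generate(following, prompt_word, length):
    """Greedy generation: repeatedly append the most probable next word."""
    output = [prompt_word]
    for _ in range(length):
        probs = next_word_probabilities(following, output[-1])
        if not probs:  # this word never had a successor in training
            break
        output.append(max(probs, key=probs.get))
    return " ".join(output)

# Made-up training text, for illustration only.
corpus = "the court granted the motion and the court granted the appeal"
model = train_bigrams(corpus)

print(next_word_probabilities(model, "the"))
# {'court': 0.5, 'motion': 0.25, 'appeal': 0.25}
print(generate(model, "the", 3))
# the court granted the
```

Note that the toy model can only continue with words it saw during training, a small-scale version of the limitation that the models only "know" what they were trained on.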

Unlike earlier chatbots that could only respond to one-sentence questions, the largest GPT model from OpenAI is able to take in up to 50 pages of text at a time, and unlike the few-sentence replies of earlier chatbots, ChatGPT is capable of responding with long-form content. For example, it can compose an email in the style of Shakespeare, write program code, translate text between several languages, summarize documents, tell jokes, edit an essay to fix grammatical errors, and solve math problems. ChatGPT has reportedly even passed the bar exam.

However, as impressive as ChatGPT and other LLMs seem, they have some very real limitations. The models only “know” what they were trained on, which has several effects:

Time limitations: Models are trained with collected data and then deployed, sometimes without updates. For example, ChatGPT was trained with data collected before 2022, and it isn’t able to answer questions about current events the way that Google can.
Hallucinating: The models do not “know” what they don’t know. They are designed to output the sequence of words with the highest estimated probability given the prompt and their training data, whether or not that sequence is correct. When given a prompt for which the model does not “know” the correct answer, LLMs are prone to what is referred to as “hallucination,” a phenomenon where the LLM generates a confident-sounding response anyway. Hallucinations are not just simple summarization errors: the models might invent events, articles, or even fake legal precedent.
“Bad” Information: Since the models only “know” what they were trained on, and they were trained with content written by humans, the training content might be biased, private, illegal, or otherwise dangerous. Although this “bad” information might also be found via Google search, the models don’t always provide source information to help users separate the “good” from the “bad.”

What do LLMs and other Generative AI Mean for Businesses?

McKinsey estimates that the “increases in productivity that are likely to materialize when generative AI is applied across knowledge workers’ activities amounts to $6.1 trillion to $7.9 trillion annually.” While it might be tough to envision how a chatbot with dubiously reliable pre-2022 internet information bolsters productivity, the reason for this projection is the models’ flexibility and adaptability – it is possible to use the models as natural language “foundations” in other software.

Although the software may work in a variety of ways, at a very basic level the targeted LLM software being developed today might lead to the following workflow:

A set of data or documents specific to a business need (for example, a set of contracts) is pre-processed so that an LLM “knows” its content.
A user can enter prompts in the business-specific software (for example, a question about the contracts).
The LLM “reads” the prompt and generates a response using information from the business-supplied documents.

In this type of deployment, known as “grounding,” LLMs will not be constrained by pre-2022 internet knowledge, and are much less likely to present issues with hallucinations and bad information. Instead, they will be able to generate content specifically tailored to the needs of different businesses, and users will be able to use their natural spoken language to tell the software what they need it to generate. For businesses that generate content, which is most, the potential use cases are only beginning.
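The three-step workflow above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of any particular product: the contract snippets are invented, the "pre-processing" is plain word matching (real systems typically use more sophisticated search), and the final call to an LLM is left out, so the sketch stops at building the grounded prompt.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(documents):
    """Step 1: pre-process the business documents (here, just word sets)."""
    return {name: tokenize(body) for name, body in documents.items()}

def retrieve(index, question):
    """Pick the document that shares the most words with the question."""
    question_words = tokenize(question)
    return max(index, key=lambda name: len(index[name] & question_words))

def grounded_prompt(documents, index, question):
    """Steps 2 and 3: pair the user's question with the retrieved text."""
    source = retrieve(index, question)
    return (
        "Answer using only the document below and cite it by name.\n"
        f"Document ({source}): {documents[source]}\n"
        f"Question: {question}"
    )

# Hypothetical contract snippets, invented for illustration.
contracts = {
    "lease.txt": "The tenant shall pay rent of 2000 dollars on the first day of each month.",
    "nda.txt": "The recipient shall keep confidential information secret for five years.",
}
index = build_index(contracts)
question = "How long must confidential information be kept secret?"
prompt = grounded_prompt(contracts, index, question)
print(prompt)
```

Because the prompt carries the relevant source text and its name, the model has less need to rely on what it memorized during training, and the user can check the answer against the cited document.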

What About the Legal Implications of Generative AI?

There will likely be many legal implications connected to the use of generative AI. Here are some of the open questions:

What are the liabilities for companies that scrape data from the internet? Did they need consent?
Are user interactions with the model being used to improve the models/product? What level of confidentiality should users expect?
Who owns the copyright to content generated by the models?
Who is liable for harmful content the models generate?
What are the requirements and possibilities for preserving and collecting employee interactions with the models?
What data security precautions should corporations take if their employees might be using these products?

Are you wondering how to harness generative AI and how it might impact your organization? CDS provides a full range of advisory services related to analytics, early case assessment, and AI. To discuss how we can help your organization navigate the legal implications surrounding this technology, contact us for a consultation today.

About the Author

Dan Diette, Esq.

Daniel Diette is an eDiscovery Data Scientist specializing in Technology Assisted Review (TAR) and eDiscovery Analytics at CDS. He has over 13 years of experience applying analytics to all phases of eDiscovery. As head of CDS’ Advisory Services Analytics Team, Daniel manages the TAR process for CDS’ large, complex projects, from Second Requests to multi-billion dollar litigations and investigations. He advises clients on efficient document search and review in Relativity, Brainspace, DISCO, and Reveal, and on the defensible use of predictive coding software and workflows.

