Personally identifiable information (PII) – data that can be used to identify, contact, or locate someone directly or indirectly – is a popular target for identity theft. In the context of litigation, huge amounts of data are routinely exchanged between the parties during the eDiscovery process, and technological safeguards like PII detection and extraction must be in place to avoid disclosing sensitive PII and, as a result, incurring steep court sanctions.
What constitutes PII?
The answer to this question relies heavily on context: the surrounding circumstances or background information that provide the “why” and “where” to words and phrases in a dataset. Context offers the clarity needed to identify exactly what is (and isn’t) PII.
Traditional PII detection systems often struggle to identify sensitive information because they use simple keyword or regex matching, methods that cannot detect sensitive information when it is subtle or inferred. In contrast, generative AI entity extraction allows organizations to quickly and easily scan vast volumes of structured and unstructured data to detect numerous types of PII.
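To make the limitation of pattern matching concrete, here is a minimal sketch of a regex-based detector. The patterns, the `regex_pii` function name, and the sample sentences are illustrative assumptions, not the rules of any particular product.

```python
import re

# Hypothetical, deliberately simple patterns of the kind a
# keyword/regex-based detector might rely on.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def regex_pii(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


# Explicit PII follows a recognizable shape, so the patterns find it.
explicit = "Reach Dana at dana.lee@example.com or 212-555-0147."

# Inferred PII has no fixed shape: the description may single out one
# person, yet none of the patterns above can match it.
inferred = "The only cardiologist on Elm Street in Tiny Falls missed the hearing."

print(regex_pii(explicit))
print(regex_pii(inferred))
```

The first sentence yields an email and a phone number, while the second yields nothing even though it may identify a specific individual. That gap is where context-aware extraction is meant to help.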
The U.S. has not yet passed a comprehensive federal data privacy law. As a result, several states have enacted their own measures to regulate how consumer data may be collected, accessed, and shared. This map shows which states have already passed (or are considering) privacy legislation.
What is Generative AI Entity Extraction?
Generative AI entity extraction is an advanced method that uses large language models (LLMs) to automatically identify, extract, and categorize specific pieces of information (entities) from unstructured data such as documents, email messages, and social media posts. It then converts this raw data into structured formats such as JSON for easier analysis and integration into databases or other systems.
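As a rough sketch of the "unstructured text in, structured JSON out" step, the example below parses a model reply into typed records. The `Entity` fields, the `parse_entities` helper, and the hard-coded `model_reply` string are all assumptions made for illustration; no real LLM is called, and an actual model's output format would depend on the prompt and the provider.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Entity:
    text: str                         # the extracted span
    label: str                        # entity category, e.g. PERSON or PHONE
    related_to: Optional[str] = None  # another entity this one belongs to


def parse_entities(raw_json, source_text):
    """Turn a JSON reply into Entity records, keeping only spans that
    literally occur in the source text (a simple guard against
    entities the model may have invented)."""
    entities = []
    for record in json.loads(raw_json):
        entity = Entity(
            text=record["text"],
            label=record["label"],
            related_to=record.get("related_to"),
        )
        if entity.text in source_text:
            entities.append(entity)
    return entities


source = "Please call Maria Ortiz on 555-0132 about the invoice."

# Stand-in for an LLM reply (hypothetical); the third item is not in
# the source text and is filtered out by parse_entities.
model_reply = json.dumps([
    {"text": "Maria Ortiz", "label": "PERSON"},
    {"text": "555-0132", "label": "PHONE", "related_to": "Maria Ortiz"},
    {"text": "Acme Corp", "label": "ORG"},
])

entities = parse_entities(model_reply, source)
print(json.dumps([asdict(e) for e in entities], indent=2))
```

The `related_to` field is one simple way to record that a phone number belongs to a named person, so the structured output carries the relationship and not just the raw match.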
Generative AI entity extraction models can adapt to new tasks without extensive retraining, identify not only entities but also their relationships, and extract relevant information from diverse data formats regardless of presentation style. Instead of merely flagging a sequence of numbers as a potential phone number, generative AI analyzes the surrounding sentences to determine the context of the information, for example, which person or organization the phone number belongs to. These capabilities allow users to:
- Automate complex workflows that previously required manual review to understand context
- Save significant time and resources
- Reduce the chance of missing sensitive information
- Gain a deeper understanding of operations, customer attitudes, and industry trends
- Use actionable insights to make better-informed decisions
Unfortunately, pattern recognition of personal data and entities can only detect so much. Well-trained generative AI entity extraction models that can learn and adapt across contexts and variations can be far more effective at accurately detecting PII than traditional approaches.
CDS Vision Tools Go Beyond Pattern Recognition to Detect Nuanced Entities
CDS offers AI-powered redaction tools built to reduce the time and cost of the PII detection and redaction process. Two 2025 Relativity Innovation Award-nominated tools, CDS Vision AI Toolkit and CDS Vision AI-Empowered DSARs, offer effective PII management.
CDS Vision Personal Data Detection
CDS Vision Personal Data Detection, part of the CDS Vision AI Toolkit, uses built-in filters, workflows, and analytics to identify key entities that fall outside typical patterns. It applies AI to:
- Reduce PII review time from weeks to hours with automated detection
- Minimize breach risk through comprehensive personal data identification
- Cut redaction costs by 80% compared to manual review
- Ensure compliance with GDPR, CCPA, and other privacy regulations
CDS Vision AI – Personal Data Detection allows organizations to automatically identify and redact key personal data such as names, email addresses, job titles, and physical addresses.
CDS Vision AI-Empowered DSARs
CDS Vision AI-Empowered DSARs is a workflow for data subject access requests (DSARs) that combines Relativity technology with generative AI solutions to reduce data volumes, time, cost, and human review. The AI-powered workflow includes data normalization, deduplication, analytics-based culling, and document review with Relativity aiR.
Vision AI-Empowered DSARs leverages Vision AI Toolkit Data Detection and automated redaction across all formats and languages to intelligently protect third-party information, minimize human review, and maintain compliance quality. First-level review can typically be completed within a single day, leaving only human quality control checks.
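The normalization and deduplication steps in a workflow like this can be pictured with a short, generic sketch. This is not the CDS or Relativity implementation; the `normalize` and `deduplicate` helpers and the sample documents are assumptions, and production tools typically handle near-duplicates, metadata, and attachments as well.

```python
import hashlib
import unicodedata


def normalize(text):
    """Unicode-normalize, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


def deduplicate(docs):
    """Return the IDs of documents whose normalized text has not been
    seen before, preserving the original order."""
    seen = set()
    unique_ids = []
    for doc_id, text in docs:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_ids.append(doc_id)
    return unique_ids


# Hypothetical sample set: A and B differ only in case and whitespace.
docs = [
    ("A", "Quarterly  Report\n2024"),
    ("B", "quarterly report 2024"),
    ("C", "Quarterly report 2025"),
]

print(deduplicate(docs))
```

Hashing the normalized text means that copies differing only in formatting collapse to a single document before any reviewer or model sees them.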
Use Case: Large DSAR with Limited Internal Resources
A manufacturer received a DSAR resulting in more than 75,000 documents. Without external support or advanced technology, meeting the regulatory deadline would have been impossible. CDS stepped in to support the matter end-to-end.
What CDS achieved:
- Proprietary workflows and Relativity analytics reduced the dataset from 75,000 documents to 11,671 documents.
- After only two small validation samples, the client was comfortable applying the review criteria, and Relativity aiR for Review processed the entire set within minutes, flagging 2,222 documents as relevant or borderline.
- From these documents, CDS Vision AI – Personal Data Detection identified 221,278 unique personal data points, including 35,120 names, 67,643 phone numbers, and 25,010 addresses.
- Using Relativity Redact, nearly 9 million redactions were applied to images and Excel files overnight.
- The client’s internal resource only needed to perform the final QC on the 2,222 documents.
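For a sense of scale, the figures above can be turned into percentages with a few lines of arithmetic. The variable names are illustrative; the numbers are taken directly from the list above.

```python
# Figures reported in the use case above.
collected = 75_000      # documents returned for the DSAR
after_culling = 11_671  # documents left after workflows and analytics
flagged = 2_222         # documents flagged as relevant or borderline
data_points = 221_278   # unique personal data points identified

# Share of the original set removed before AI review.
culling_reduction_pct = round((collected - after_culling) / collected * 100, 1)

# Share of the culled set that needed redaction and final QC.
flagged_pct = round(flagged / after_culling * 100, 1)

# Average personal data points per flagged document.
points_per_doc = round(data_points / flagged, 1)

print(culling_reduction_pct)  # 84.4
print(flagged_pct)            # 19.0
print(points_per_doc)         # 99.6
```

In other words, culling removed roughly 84% of the collection, about one in five of the remaining documents went forward, and each of those carried close to 100 personal data points on average.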
Client outcome:
The entire process, from data receipt to final production, was completed in just seven days, requiring only one internal staff member to perform validation samples and QC the final output.
Handling sensitive information and the nuances involved must be addressed early in the eDiscovery process. Remediating mistakes can be costly, and ignoring the regulations or believing they don’t apply to “your data” is not a sufficient excuse. Without technology, identifying personal and sensitive information for redaction can be extremely tedious, time-consuming, and error-prone.
Fortunately, CDS Vision dashboards allow clients to efficiently pinpoint and address sensitive data across collections. To learn more about how you can elevate your organization’s handling of sensitive data with generative AI entity extraction, contact us.


