Adapting eDiscovery Tools for Smarter Post-Data Breach Reviews
eDiscovery technology isn’t limited to helping with document reviews in a litigation context. As discussed in a prior post, it can also be used to comply with notification requirements after a data breach. Once a company knows it has experienced a data breach, various state laws and, in some cases, international laws require it to investigate what specific data was compromised and to notify affected individuals. A smart review strategy can use eDiscovery analytics software to reduce volume and organize the data for efficient review, so companies can meet their obligations quickly, accurately and cost-effectively.
Specific tools that can be employed in such post-data breach reviews include the following:
- De-duplication. This is the process of removing duplicate files from a collection of electronically stored information (ESI) based on their hash values. A hash value is the output of an algorithm that generates a practically unique value for each document from its content. It acts like a digital fingerprint and is used to authenticate documents and to identify duplicates. If two documents, or two families of documents, in a collection have the same hash value, one of them is removed.

  Traditional processing-level de-duplication should always be used in a data breach review, but other analytics options are also worth considering. For example, near-duplicate documents, meaning documents that share a high percentage of their text, are likely to contain mostly the same Personally Identifiable Information (PII) and can be grouped together for efficiency. Each group of near duplicates then only needs to be reviewed once to locate PII and satisfy notification obligations.

  In addition, email threading can be used to ensure that the same content in an email chain is not reviewed multiple times. Email threading groups email conversations together and sorts them logically. That way, only the emails in a thread with unique content, such as the last email in a conversation or the last email in which an attachment appears, are reviewed.
- Prioritized Review Using Pattern Recognition, Analytics and TAR. The fastest way to move through any document review is to have the review team look at the documents most likely to be relevant first. Discovery software such as Relativity and Brainspace includes tools that can automate this prioritization before a single document has been reviewed. For example, pattern (or regular expression) search in Relativity can be used to find documents with recognizable patterns such as social security numbers, employee identifiers and phone numbers.

  For a deeper dive, Brainspace uses Natural Language Processing to conduct entity extraction. In addition to simple patterns like social security numbers, entity extraction can identify documents with names, locations, organizations, and more. This can even be used to identify documents with multiple identifiers that together meet several states’ thresholds for PII.

  The software-identified documents above can be prioritized initially, and this can be supplemented using Technology Assisted Review (TAR) to prioritize documents for the review team. TAR broadly refers to many methods of technology assistance, including analytics and Predictive Coding, used to organize or expedite reviews. Machine Learning tools such as Continuous Multi-Modal Learning from Brainspace and Active Learning from Relativity can learn, from decisions made by the review team, which of the remaining documents are most likely to contain PII. The highest-scoring documents can be continuously prioritized for review until the team runs out of relevant material. At that point, a review of a random sample from the unreviewed documents can be used to determine whether further review is merited.
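
For readers who want to see the mechanics, the hash-based de-duplication and pattern-search steps described above can be illustrated with a short Python sketch. This is not how Relativity or Brainspace implement these features. The sample documents, the simplified patterns and the function names are all hypothetical, and the sketch hashes extracted text, whereas processing tools may hash file contents or selected metadata instead.

```python
import hashlib
import re

# Hypothetical sample collection: document id -> extracted text.
documents = {
    "doc1": "Employee John Roe, SSN 123-45-6789, phone 555-867-5309.",
    "doc2": "Employee John Roe, SSN 123-45-6789, phone 555-867-5309.",  # exact duplicate of doc1
    "doc3": "Agenda for the quarterly planning meeting.",
    "doc4": "Please update the record for SSN 987-65-4321.",
}

# Simplified illustrative patterns (assumptions); real PII patterns need tuning.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def deduplicate(docs):
    """Keep the first document seen for each SHA-256 hash of its text."""
    seen = set()
    unique = {}
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = text
    return unique


def count_pii_hits(text):
    """Count the matches for each named pattern in one document."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}


def prioritize(docs):
    """Return document ids ordered by total pattern hits, highest first."""
    totals = {doc_id: sum(count_pii_hits(text).values()) for doc_id, text in docs.items()}
    return sorted(totals, key=lambda doc_id: totals[doc_id], reverse=True)


unique_docs = deduplicate(documents)
review_order = prioritize(unique_docs)
```

In this toy collection, the exact duplicate is dropped before review, and the documents with the most pattern hits move to the front of the queue. In practice, pattern hits are only a starting signal and still need human review to confirm that PII is present.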
Not all eDiscovery service providers are equipped to help companies with a post-breach audit to identify compromised data. That’s why it’s important to discuss these issues with a provider.
About the Author