
    Our Insight

    Thought Leadership and Industry Trends

    Adapting eDiscovery Tools for Smarter Post-Data Breach Reviews

    July 31, 2018

    eDiscovery technology isn’t limited to helping with document reviews in a litigation context. As discussed in a prior post, it can also be used to comply with notification requirements after a data breach. Once a company knows it has experienced a data breach, various state laws and, in some cases, international law require it to investigate what specific data was compromised and to notify affected individuals. A smart review strategy can use eDiscovery analytics software to reduce volume and organize the data for efficient review, so companies can meet their obligations quickly, accurately and cost-effectively.

    Specific tools that can be employed in such post-data breach reviews include the following:

    • De-duplication. This refers to the process of removing duplicate files from a collection of ESI based on their hash values. A hash value is the output of an algorithm that generates an effectively unique value from each document’s content. It acts like a digital fingerprint and is used to authenticate documents and to identify duplicates. If two documents, or two document families, in a collection have the same hash value, only one copy is kept.
      Traditional processing-level de-duplication should always be used in a data breach review, but other analytics options are also worth considering. For example, near-duplicate documents, meaning documents that share a high percentage of their text, are likely to contain mostly the same Personally Identifiable Information (PII) and can be grouped together for efficiency. Each group of near duplicates then needs to be reviewed only once to locate PII and satisfy notification obligations.
      In addition, email threading can be used to ensure that the same content in an email chain is not reviewed multiple times. Email threading groups email conversations together and sorts them logically. That way, only the emails in a thread with unique content are reviewed, such as the last email in a conversation or the last email in which an attachment appears.
    • Prioritized Review Using Pattern Recognition, Analytics and TAR. The fastest way to move through any document review is to have the review team look at the most likely relevant documents first. Discovery software such as Relativity and Brainspace has tools that can automate this prioritization before a single document has been reviewed. For example, pattern (or regular expression) search in Relativity can be used to find documents with recognizable patterns such as social security numbers, employee identifiers and phone numbers.
      For a deeper dive, Brainspace uses Natural Language Processing to conduct entity extraction. In addition to simple patterns like social security numbers, entity extraction can identify documents with names, locations, organizations, and more. This can even be used to identify documents with multiple identifiers that meet several states’ thresholds for PII.
      The software-identified documents above can be prioritized initially, and this can be supplemented with Technology Assisted Review (TAR) to prioritize documents for the review team. TAR broadly refers to many methods of technology assistance, including analytics and Predictive Coding, used to organize or expedite reviews. Machine Learning tools such as Continuous Multi-Modal Learning from Brainspace and Active Learning from Relativity can learn, from decisions made by the review team, which of the remaining documents are most likely to contain PII. The highest-scoring documents can be continuously prioritized for review until the team runs out of relevant material. At that point, a review of a random sample of the unreviewed documents can be used to determine whether further review is merited.
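
    The first two ideas above, hash-based de-duplication and pattern search, can be illustrated with a minimal Python sketch. It is not how Relativity or Brainspace implement these features and uses none of their APIs. The document IDs, sample text and helper names (deduplicate, pii_hits, prioritize) are hypothetical, the regular expressions are deliberately simplified, and hashing the extracted text with SHA-256 is an assumption made for illustration (processing tools typically hash file contents or selected email fields).

```python
import hashlib
import re

# Hypothetical in-memory collection: document ID -> extracted text.
documents = {
    "DOC-001": "Payroll record for Jane Roe, SSN 123-45-6789.",
    "DOC-002": "Payroll record for Jane Roe, SSN 123-45-6789.",  # exact duplicate
    "DOC-003": "Lunch menu for Friday.",
    "DOC-004": "Call John Doe at 212-555-0147 about the badge request.",
}

# Simplified, illustrative patterns; real-world formats vary far more.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def deduplicate(docs):
    """Keep the first document seen for each SHA-256 hash of its text."""
    first_seen = {}
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        first_seen.setdefault(digest, doc_id)
    return {doc_id: docs[doc_id] for doc_id in first_seen.values()}


def pii_hits(text):
    """Return the names of the patterns that match the text."""
    return sorted(name for name, rx in PII_PATTERNS.items() if rx.search(text))


def prioritize(docs):
    """De-duplicate, then order documents by number of pattern types hit."""
    unique = deduplicate(docs)
    scored = [(doc_id, pii_hits(text)) for doc_id, text in unique.items()]
    # sorted() is stable, so ties keep their original collection order.
    return sorted(scored, key=lambda item: len(item[1]), reverse=True)


review_queue = prioritize(documents)
for doc_id, hits in review_queue:
    print(doc_id, hits)
```

    In this toy collection, the exact duplicate is dropped before review and the documents with pattern hits move to the front of the queue. A TAR or Active Learning model would go further and re-rank the remaining documents as reviewers make coding decisions, which this sketch does not attempt.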

    Not all eDiscovery service providers are equipped to help companies with a post-breach audit to identify compromised data. That’s why it’s important to discuss these issues with a provider.

    CDS provides a full range of advisory services. To discuss how we can assist you with a data breach, contact us for a consultation.

    About the Author

    Dan Diette, Esq., Data Scientist, CDS

    Dan is an eDiscovery Data Scientist specializing in Technology Assisted Review and eDiscovery Analytics at CDS. He has over five years of experience focusing on the application of machine learning and predictive coding technology to eDiscovery. He has designed TAR workflows and validation reporting that have been presented to and approved by the DOJ and FTC for HSR Second Requests, as well as in multibillion-dollar civil litigation in federal courts. Dan has managed the Technology Assisted Review process for all of CDS’s large and complex Second Request Reviews during his tenure at CDS. Dan is additionally an attorney admitted to the New York State Bar Association.

         ddiette@cdslegal.com