Decoding Continuous Active Learning with CDS London
This article was co-authored by CDS London’s Mark Anderson and CDS New York’s William Wallace Belt, Jr., Esq.
Last week, Complete Discovery Source’s London office hosted a breakfast seminar on Continuous Active Learning (CAL). Joining Mark Anderson and Bill Belt of CDS were Zoe Davies, Legal eDiscovery Manager at Barclays; Jeffrey Shapiro, eDiscovery Manager at Clifford Chance; and Paul Gordon, International Solutions at Relativity. The panel detailed CAL’s key analytics advancements and its impact on eDisclosure practices for corporations, law firms, and consultancy teams. Practitioners have long anticipated the benefits of applying cutting-edge analytics technology to the massive burden of eDisclosure. Before CAL, however, time and budget limitations held back adoption, even though the technology could analyse the content of large datasets in depth. Pre-planned “seed” and “control” set workflows demanded a significant up-front commitment of time and resources, with no guarantee of significant savings by the end of the project.
Panelists explained that what makes CAL different is how seamlessly it integrates into the workflows review teams already use. CAL uses analytics to learn from review decisions (e.g. documents coded as “relevant” or “privileged”) and to calculate the likelihood that other documents in the dataset are also relevant or privileged. Before CAL, eDisclosure teams had to spend time reviewing statistical samples containing both relevant and non-relevant documents to “train” the algorithm before the rest of the dataset could be scored. With CAL, documents are scored in real time as coding decisions are applied.
In the panel’s opinion, there is really no reason NOT to switch on CAL as a review starts. At worst, and rarely, the team may have to review the entire dataset if it turns out to have a high richness level, i.e. the majority of the documents are relevant. Even then, CAL pushes the most important documents to the front of the review queue, so the team learns the key information in the relevant documents sooner. In most cases, the technology pushes the documents most likely to be relevant to the front of the queue, and reviewers continue to work through the dataset until no more “relevant” or “privileged” documents are being pushed forward. In one case, a legal team had reviewed fewer than 2% of a 7-million-document set (under 140,000 documents) when it reached the end of the responsive documents; the decision could then be made to stop the review, saving immensely on project cost.
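To make the mechanism concrete, here is a toy Python sketch of the CAL loop described above: the model is retrained after every batch of coding decisions, the unreviewed documents are re-ranked, and review stops once the top of the queue stops yielding relevant documents. It is not how any commercial CAL tool works internally. The smoothed naive Bayes scorer, the batch size, the helper names (`train`, `relevance_score`, `cal_review`) and the simple stopping rule are all illustrative assumptions, and real deployments validate the stopping point statistically rather than relying on a single empty batch.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Coded = List[Tuple[List[str], bool]]  # (tokens, is_relevant) for every coded document


def train(coded: Coded) -> Tuple[Dict[bool, Counter], Counter]:
    """Count terms and documents separately for relevant and not-relevant codings."""
    term_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    doc_counts: Counter = Counter()
    for tokens, label in coded:
        term_counts[label].update(tokens)
        doc_counts[label] += 1
    return term_counts, doc_counts


def relevance_score(tokens: List[str], term_counts, doc_counts, vocab_size: int) -> float:
    """Log-odds of relevance under a Laplace-smoothed multinomial naive Bayes model."""
    total = doc_counts[True] + doc_counts[False]
    log_odds = math.log((doc_counts[True] + 1) / (total + 2)) - math.log(
        (doc_counts[False] + 1) / (total + 2))
    n_rel = sum(term_counts[True].values())
    n_not = sum(term_counts[False].values())
    for t in tokens:
        p_rel = (term_counts[True][t] + 1) / (n_rel + vocab_size)
        p_not = (term_counts[False][t] + 1) / (n_not + vocab_size)
        log_odds += math.log(p_rel / p_not)
    return log_odds


def cal_review(docs: List[List[str]], reviewer: Callable[[int], bool],
               batch_size: int = 2) -> List[int]:
    """Re-rank after every batch of coding decisions; return the order documents were reviewed."""
    vocab_size = len({t for d in docs for t in d})
    unreviewed = set(range(len(docs)))
    coded: Coded = []
    review_order: List[int] = []
    found_any = False
    while unreviewed:
        term_counts, doc_counts = train(coded)  # retrain on every decision so far
        ranked = sorted(
            unreviewed,
            key=lambda i: (-relevance_score(docs[i], term_counts, doc_counts, vocab_size), i),
        )
        hits = 0
        for i in ranked[:batch_size]:
            label = reviewer(i)  # the human coding decision
            coded.append((docs[i], label))
            unreviewed.remove(i)
            review_order.append(i)
            hits += int(label)
        found_any = found_any or hits > 0
        # Naive stopping rule (an assumption): once relevant documents have been
        # found, stop at the first batch that contains none.
        if found_any and hits == 0:
            break
    return review_order


# Hypothetical toy dataset: documents 0, 2 and 4 are the relevant ones.
docs = [
    ["merger", "price", "board"], ["lunch", "menu"], ["merger", "board", "vote"],
    ["holiday", "party", "menu"], ["price", "merger", "deal"], ["lunch", "party"],
    ["football", "score"], ["weather", "lunch"],
]
relevant = {0, 2, 4}
order = cal_review(docs, reviewer=lambda i: i in relevant)
print(order)
```

With this toy data, all three relevant documents surface within the first four reviewed, and the loop stops after six of the eight documents. At scale, the same dynamic is what allows a review to stop well short of the full dataset.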
Panelists also described the importance of recent legal authority permitting litigants to use analytics technologies, including predictive coding. Pyrrho v. MWB Property, Brown v. BCA Trading, Tchenguiz v. Grant Thornton, and this year’s Triumph Controls v. Primus International Holding Co. all favour the deployment of predictive coding technologies as a way to comply with the Overriding Objective, set by the Civil Procedure Rule Committee (“CPRC”), to “deal with cases justly and at proportionate cost.” With the launch of the CPRC Disclosure Pilot in January 2019, predictive coding technologies, including CAL, are expected to receive an additional boost as practitioners see the advantages and cost savings CAL offers.
Finally, the panel discussed best practices for deploying CAL. First, the technology is not a “black box”: it offers transparent and verifiable QC capabilities. It does face technical limitations when the data does not easily lend itself to analytics (e.g. documents with little text, large spreadsheets, chat, and audio and video data), and technology service providers can offer the expertise to work through these obstacles and optimise results. In addition, the tools simplify QA with easy-to-understand visual reporting, supplemented by metrics such as confidence levels and project progress. The panel also discussed integrating CAL with other eDisclosure technologies such as keyword searches, concept searching, and clustering tools. CAL doesn’t replace these tools; on the contrary, it works best alongside them. The key is making sure you have access to the right expertise to identify potential workarounds and catch any snags that arise in your matter.
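One common way to turn QC into the kind of confidence-level metric the panel mentioned (an assumption here; the panel did not prescribe a method) is an elusion test: draw a random sample from the documents CAL left unreviewed, count how many turn out to be relevant, and put a confidence interval around that rate. The Python sketch below uses the Wilson score interval and converts the result into a recall range. The function names (`wilson_interval`, `recall_range`) and all figures are hypothetical, and commercial platforms compute and report their own validation metrics.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95% confidence."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


def recall_range(relevant_found: int, unreviewed_count: int,
                 sample_n: int, sample_hits: int, z: float = 1.96) -> Tuple[float, float]:
    """Bound recall from the elusion rate observed in a random sample of unreviewed documents."""
    if relevant_found <= 0:
        raise ValueError("need at least one relevant document found")
    low_rate, high_rate = wilson_interval(sample_hits, sample_n, z)
    # A higher elusion rate means more relevant documents were left behind,
    # so the upper rate bound gives the lower recall bound and vice versa.
    worst = relevant_found / (relevant_found + high_rate * unreviewed_count)
    best = relevant_found / (relevant_found + low_rate * unreviewed_count)
    return worst, best


# Hypothetical figures: 5,000 relevant documents found in review, 100,000 left
# unreviewed, and 2 relevant documents in a random QC sample of 400 of them.
low, high = wilson_interval(2, 400)
worst, best = recall_range(5000, 100_000, 400, 2)
print(f"elusion rate: {low:.2%} to {high:.2%}")
print(f"estimated recall: {worst:.1%} to {best:.1%}")
```

The Wilson interval is used here rather than the simpler normal approximation because elusion rates are usually close to zero, where the normal approximation can produce intervals that dip below 0%.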
What does the future hold for CAL? The panel agreed that increased adoption is already under way as corporations, law firms, and the judiciary learn more about the technology. The developers, meanwhile, have their work cut out for them. Audio and video files, those tough-to-parse chats, and even image files are already amenable to analysis in other analytics applications, so eDisclosure teams shouldn’t have to wait too long before those capabilities arrive in a CAL offering.
CAL presents an opportunity for wider adoption of analytics technology, greater judicial acceptance, and more proportionate costs. And it couldn’t come soon enough. As questions from the audience made clear, developments such as GDPR requirements (e.g. data subject access requests, or DSARs) and the growing complexity of today’s disclosure projects are increasing the demands on corporate and law firm disclosure teams. CAL represents a powerful and much-needed technological advancement in meeting legal teams’ needs.
If you are interested in learning more about CAL and its potential applications in your eDisclosure process, contact us for further expertise and guidance.
About the Author