Our Insights

Thought Leadership and Industry Trends


Decoding Continuous Active Learning with CDS London

Oct 25, 2018

This article was co-authored by CDS London’s Mark Anderson and CDS New York’s William Wallace Belt, Jr., Esq.

Last week, Complete Discovery Source’s London office hosted a breakfast seminar on Continuous Active Learning (CAL). Joining Mark Anderson and Bill Belt of CDS were Zoe Davies, Legal eDiscovery Manager at Barclays; Jeffrey Shapiro, eDiscovery Manager at Clifford Chance; and Paul Gordon, International Solutions at Relativity. The panel detailed CAL’s key analytics advancements and the impact of CAL on eDisclosure practices for corporations, law firms, and consultancy teams. Practitioners have long anticipated the benefits of applying cutting-edge analytics technology to the massive burden of eDisclosure. Before CAL, time and budget limitations restrained adoption of the technology despite its ability to penetrate the content in datasets. Pre-planned “seed” and “control” set workflows required a commitment of time and resources with no guarantee of significant project-end savings.

Continuous Active Learning Panel

Panelists explained that what sets CAL apart is its seamless integration into the traditional workflows that teams already use. CAL uses review decisions (e.g. coding documents as “relevant” or “privileged”) to calculate the likelihood that other documents in the data set are also “relevant” or “privileged”. Prior to CAL, eDisclosure teams had to spend time reviewing statistical samples containing both relevant and not relevant documents in order to “train” the algorithm before it scored the rest of the documents in the data set. This is not the case with CAL, where documents are scored in real time as coding decisions are applied. In the panel’s opinion, there is really no reason NOT to switch on CAL as a review starts. At worst, and this is rare, the team may have to review the entire dataset if it turns out to have a high richness level, i.e. the majority of the dataset is relevant. Even in those cases, CAL pushes the most important documents to the front of the review queue, so the team learns the key information from the relevant documents more quickly. In most cases, the technology pushes the most likely relevant documents to the front of the review queue, and reviewers continue to progress through the dataset until they find that no more “relevant” or “privileged” documents are being pushed forward. In one case, a legal team had reviewed less than 2% of a 7 million document set (fewer than 140,000 documents) when it reached the end of the responsive documents. The determination could then be made to stop the review, saving immensely on the project cost.
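For readers who like to see the idea in code, the loop the panel described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Relativity or any other product implements CAL: the scoring model (a simple smoothed word-count comparison), the names `ToyRelevanceModel` and `cal_review`, the sample documents, and the "stop after three consecutive not relevant documents" rule are all hypothetical. The sample queue also happens to start with a relevant document; a real review would need its own starting point and a defensible stopping decision.

```python
import math
from collections import Counter


class ToyRelevanceModel:
    """Hypothetical scorer that is updated after every coding decision."""

    def __init__(self):
        # Number of coded documents containing each word, per coding value.
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def update(self, text, relevant):
        """Record one reviewer decision."""
        self.doc_counts[relevant] += 1
        self.token_counts[relevant].update(set(text.lower().split()))

    def score(self, text):
        """Sum of smoothed log-odds that each word appears in relevant
        rather than not relevant documents. Higher means more likely relevant."""
        total = 0.0
        for tok in set(text.lower().split()):
            p_rel = (self.token_counts[True][tok] + 1) / (self.doc_counts[True] + 2)
            p_not = (self.token_counts[False][tok] + 1) / (self.doc_counts[False] + 2)
            total += math.log(p_rel / p_not)
        return total


def cal_review(docs, reviewer, stop_after=3):
    """Serve the highest-scoring unreviewed document, learn from the
    reviewer's decision, and re-rank. Stops after `stop_after` consecutive
    not relevant documents (an assumed, simplified stopping rule)."""
    model = ToyRelevanceModel()
    unreviewed = dict(docs)
    coded = {}
    misses = 0
    while unreviewed and misses < stop_after:
        # Scores are recomputed on every pass, so each decision
        # immediately reshapes the review queue.
        doc_id = max(unreviewed, key=lambda d: model.score(unreviewed[d]))
        text = unreviewed.pop(doc_id)
        relevant = reviewer(doc_id)
        coded[doc_id] = relevant
        model.update(text, relevant)
        misses = 0 if relevant else misses + 1
    return coded


# Made-up sample data: three relevant documents among ten.
docs = {
    1: "merger pricing agreement with competitor",
    2: "lunch menu for friday",
    3: "pricing agreement draft attached",
    4: "office party friday",
    5: "competitor pricing call notes",
    6: "parking permit renewal",
    7: "friday lunch order",
    8: "gym membership renewal",
    9: "holiday party menu",
    10: "parking garage closed friday",
}
relevant_ids = {1, 3, 5}

# The "reviewer" here is simulated by looking up the known answer.
coded = cal_review(docs, reviewer=lambda doc_id: doc_id in relevant_ids)
print(f"Reviewed {len(coded)} of {len(docs)} documents; "
      f"{sum(coded.values())} coded relevant.")
```

The point of the sketch is the shape of the workflow: there is no separate training phase, every coding decision feeds straight back into the ranking, and the review can stop once the queue stops surfacing relevant material.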

Panelists also described the importance of recent legal authority allowing litigants to use analytics technologies including predictive coding. Pyrrho v. MWB Property, Brown v. BCA Trading, Tchenguiz v. Grant Thornton, and this year’s Triumph Controls v. Primus International Holding Co. all favor deployment of predictive coding technologies to better comply with the Civil Procedure Rule Committee’s (“CPRC”) Overriding Objective to “deal with cases justly and at proportionate cost.” With the launch of the CPRC Disclosure Pilot in January 2019, predictive coding technologies including CAL are expected to see an additional boost as practitioners experience the advantages and cost savings from CAL.

Finally, the panel discussed some best practices when deploying CAL. First, the technology is not a “black box” and offers transparent and verifiable QC capabilities. It also faces certain technical limitations when the data does not easily lend itself to analytics (e.g. documents with little text, large spreadsheets, chat, audio and video data). Technology service providers can offer the expertise to work through the obstacles and optimise results. In addition, the tools simplify QA capabilities with easy-to-understand visual reporting, supplemented with metrics like confidence levels and project progress. The panel also discussed integration of CAL with other eDisclosure technologies such as keyword searches, concept searching, and clustering tools. CAL doesn’t replace other tools; on the contrary, CAL can work best when used alongside other tools. The key is making sure you have access to the right expertise to identify potential workarounds and catch any snags you come across in your matter.

What does the future hold for CAL? The panel agreed that increased adoption is here, as corporations, law firms, and the judiciary learn more about the technology. In addition, the developers have their work cut out for them. Audio and video files, those tough-to-parse chats, and even image files are already amenable to analysis in other analytics applications. We shouldn’t have to wait too long before eDisclosure teams have access to those capabilities in a CAL offering.

CAL panelists

CAL presents an opportunity in terms of analytics technology adoption, judicial acceptance, and more proportionate costs. And it couldn’t come soon enough. As questions from the audience made clear, developments like GDPR requirements (e.g. DSARs) and the growing complexity of today’s disclosure projects are increasing demands on corporate and law firm disclosure teams. CAL represents a powerful and much-needed advancement in technological solutions to legal team needs.

If you are interested in learning more about CAL and its potential applications in your eDisclosure process, contact us for further expertise and guidance.

About the Author

Mark Anderson

In his role as Director of UK Operations for CDS, Mark Anderson provides project management and expert consulting through all stages of eDisclosure and eDiscovery. Mark works alongside corporate and law firm clients to identify data for collection and advises on best practices for collection of data, data processing, and document review workflows. He has supervised multi-national teams and has experience working on some of the largest, most challenging matters, including cases involving cross-border issues and the application of technology assisted review (TAR). Prior to joining CDS, Mark conducted forensic collections, assisted with data investigations, and served as a project team lead for multiple international legal technology service providers. Mark holds multiple Relativity certifications, including Relativity Master, and is an EnCase Certified Examiner.


