Document reviews face intense time and cost constraints, all the while data volumes grow, new data types flourish, and communications become more complex. Data visualizations help review teams make fast progress on complicated matters with enormous, varied datasets. Managers can use visualizations to drive crucial decisions, smarter workflows, more efficient classifications and better QC.
CDS experts Chris O’Connor, Cory Logan, Michael Milicevic, Esq., and Sue-Deelia Tang recently discussed the impact of data visualizations on litigation and investigations in our webinar, Seeing is Believing: Why Visualization Matters in eDiscovery?
Read on for Part II of a three-part blog series, where our team provides specific examples and use cases where eDiscovery practitioners can leverage data visualization dashboards. To watch the recorded webinar, click here.
Data visualizations can help drive key legal and business decisions.
If you’re in high-level management, a relationship partner, a general counsel of a company, or if you simply want to check on the progress of discovery for litigation, this is a way to do it without having to get into the dirt too much. So from a management perspective, the benefits are pretty significant.
Visualizations can help attorneys and litigation support professionals of all technical proficiencies make the most of eDiscovery technology with minimal training.
Visualizations can also provide key business insights to corporate legal stakeholders in real time. A matter-specific billing dashboard built into a review database, for example, can help case teams track costs and provide budget information in real time, at a glance, as opposed to waiting for invoicing or reviewing static monthly reporting.
Multi-matter billing dashboards can provide budget insights across a company’s eDiscovery portfolio to help control costs, and an attorney work product scoreboard can help track coding decisions and review progress and make sure production deadlines are met.
A cluster wheel and communications network web can identify internal corporate communication patterns and help identify irregularities in investigations.
I can go on and on, but those are a few examples. The point is that visualizations can play an integral role in analyzing and driving certain business and legal decisions.
What are you going to get from that information? How will we share it and what’s the immediate value? What can I do with this information that I get at the outset?
From a 30,000-foot view, I think visualizations will help you draw high-level conclusions about data sets that would be pretty difficult, if not impossible in some cases, to fully capture linearly.
For example, you can readily identify outliers like missing date ranges, incomplete data sets, and unusual communication patterns between custodians and unrelated departments.
You can better categorize documents, such as potentially privileged, responsive, and non-responsive, to help expedite review.
You can also identify patterns that may help inform review strategy.
Make decisions on large volumes of data without getting deep in the weeds, doc by doc.
One example that comes to mind from a recent case: let’s say you see a spike in email communication at the end of every month, usually with a high count of PDF attachments. Those might be automated notices or invoices going out to clients that you can pull from a review set. That’s a real-life example. I think we pulled 15,000 of those out of a set of 100,000 documents. Visualization can give you the insight to make those broad decisions on large volumes of data in a much more informed way, without having to get so deep in the weeds doc by doc. You may still have to do that, but having an idea of where you’re going early on is a key benefit of visualization.
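To make the month-end example concrete, here is a minimal Python sketch of the kind of filter a dashboard is doing behind the scenes. It is not how any particular review platform implements it. The record layout (`id`, `sent`, `attachments`), the two-day window, and the sample documents are all assumptions made up for illustration.

```python
import calendar
from collections import Counter
from datetime import date


def is_month_end(d, window=2):
    """True if d falls within the last `window` days of its month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.day > last_day - window


def month_end_pdf_candidates(docs, window=2):
    """Docs sent at month end that carry at least one PDF attachment."""
    return [
        doc for doc in docs
        if is_month_end(doc["sent"], window)
        and any(name.lower().endswith(".pdf") for name in doc["attachments"])
    ]


# Hypothetical sample records, not taken from any real matter.
docs = [
    {"id": 1, "sent": date(2021, 1, 31), "attachments": ["invoice_0131.pdf"]},
    {"id": 2, "sent": date(2021, 1, 12), "attachments": []},
    {"id": 3, "sent": date(2021, 2, 28), "attachments": ["Invoice_0228.PDF"]},
    {"id": 4, "sent": date(2021, 2, 27), "attachments": ["notes.docx"]},
    {"id": 5, "sent": date(2021, 3, 15), "attachments": ["deck.pdf"]},
]

candidates = month_end_pdf_candidates(docs)
candidate_ids = [doc["id"] for doc in candidates]

# Count candidates per (year, month) to confirm the pattern repeats monthly.
per_month = Counter((doc["sent"].year, doc["sent"].month) for doc in candidates)
```

A set like `candidates` would still need a quick sample check before anything is pulled from the review, since a month-end PDF is only a hint that a message is an automated notice.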
Right, organizing the information in a manner that lets people proceed through the review in a sensible way.
How can you organize the big picture for an investigation or litigation by leveraging visualizations?
Obviously, privilege is a consideration within that. I think there are many different ways that you can go about organizing it.
For example, you can create widgets that follow the Electronic Discovery Reference Model (EDRM) to help guide the discovery process and workflow. So in the review phase of the EDRM, you might want to set up a visualization that searches for and displays categories of PII, for instance, information like phone numbers and Social Security numbers, which you can then feed into redaction tools like Relativity Redact in RelativityOne.
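As a rough illustration of what a PII category search looks for, the Python sketch below flags US-style phone numbers and Social Security numbers with regular expressions. The patterns are deliberately simplified assumptions, not the rules any redaction tool actually uses, and the sample sentence is invented.

```python
import re

# Simplified, US-format-only patterns (assumptions for illustration).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(
        r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
    ),
}


def find_pii(text):
    """Return {category: [matched strings]} for each category that hits."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


# Invented sample text using reserved or well-known dummy numbers.
sample = "Call me at (212) 555-0147 or 646-555-0199. SSN on file: 078-05-1120."
pii_hits = find_pii(sample)
```

Counting the hits per category across a document set is what would feed a dashboard widget. Real matters need broader patterns (international numbers, unformatted digits) and human QC before redaction.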
And if we’re talking about internal investigations, you might want to deploy the cluster wheel paired with network communications analysis, maybe a timeline chart, and perhaps a sensitive term filter, so you can see who’s talking to who and when they’re having potentially inappropriate conversations. For example, you can use it to help spot bad actors within an organization in employment-related investigations.
I think if we’re talking about the production phase of a matter, you might want a dashboard with widgets displaying review progress, or work product designation so you can track responsiveness rates and begin to put together initial production sets. I think the use cases are pretty vast for visualizations. And I also believe that visualizations are really underutilized considering the value that they can add.
Are there limits to what we can see, so long as there’s a data point for it?
I would say that there are. But I think there are far fewer limitations to visualizations in discovery today, with the power of RelativityOne and Aero UI, than there were in the past. If you can dream it, or if you can display it in a static format in a report, we can probably find a way to build it into Relativity and make it dynamic and interactive to really bring the data to life in useful and intuitive ways.
Use visualizations to reveal patterns and gaps in your dataset, without being a technologist.
Excellent. So, I’m going to transition to Sue Tang. If I’m not a data science technologist, how can I leverage the visualization to accomplish my goals for a particular matter?
I’m a technologist, and I can tell you that in a server environment, it’s going to take a lot of time to build out some of these complex searches. If, for example, you’re saying, “I want this phrase within three words of this phrase, and I want it only within this timeframe and sent between these two people,” it’s going to take time to build out that search. And depending on the tool, you might need to learn different syntaxes.
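To give a sense of what that search involves when written by hand, here is a minimal Python sketch combining a proximity condition, a date range, and a pair of participants. It simplifies "phrase" to single terms, treats "within three" as at most three word positions apart, and uses hypothetical message fields (`from`, `to`, `sent`, `body`) and invented messages. Each review tool has its own syntax and semantics, so treat this as an illustration only.

```python
import re
from datetime import date


def tokenize(text):
    """Lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_n_words(text, term_a, term_b, n=3):
    """True if term_a and term_b occur at most n word positions apart."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return any(abs(a - b) <= n for a in pos_a for b in pos_b)


def matches(msg, term_a, term_b, n, start, end, people):
    """Apply the proximity, date-range, and participant conditions together."""
    participants = {msg["from"], *msg["to"]}
    return (
        start <= msg["sent"] <= end
        and people <= participants  # both people must be on the message
        and within_n_words(msg["body"], term_a, term_b, n)
    )


# Invented messages for illustration.
messages = [
    {"id": 1, "from": "a@example.com", "to": ["b@example.com"],
     "sent": date(2020, 3, 5),
     "body": "Please move the payment into the account today."},
    {"id": 2, "from": "b@example.com", "to": ["a@example.com"],
     "sent": date(2020, 3, 9),
     "body": "The payment cleared. Separately, we should discuss the new account manager."},
    {"id": 3, "from": "a@example.com", "to": ["c@example.com"],
     "sent": date(2020, 3, 6),
     "body": "Move the payment into the account."},
    {"id": 4, "from": "a@example.com", "to": ["b@example.com"],
     "sent": date(2021, 6, 1),
     "body": "Payment account details attached."},
]

hit_ids = [
    m["id"] for m in messages
    if matches(m, "payment", "account", 3,
               date(2020, 1, 1), date(2020, 12, 31),
               {"a@example.com", "b@example.com"})
]
```

Only the first message satisfies all three conditions: the second has the terms too far apart, the third involves a different recipient, and the fourth falls outside the date range.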
Visualization allows the teams to quickly and easily click on predefined and prebuilt graphs, charts, and tables, allowing you to identify those key patterns, easily categorize your datasets and quickly make mass decisions on large volumes of data.
So, imagine logging into a platform and within seconds you’re able to see the date range of all the documents within your dataset. You can immediately identify communication patterns between individuals, and quickly see terms within the documents that can be conceptually grouped together with other terms.
Using the Enron data as an example, say we wanted to filter our data to only show communications during the date range of 1999 to 2001. You can just go to your timeline, click on it, and filter it to 1999 through 2001. And then say we only wanted to see communications between Jeff Skilling and Ken Lay within that time frame. Now we’re looking at all the communications between 1999 and 2001 between Jeff Skilling and Ken Lay. Within the cluster wheel, you’re going to be able to see multiple groups of words containing the word Raptor, but in two different color schemes. In one, the word Raptor is grouped with terms like basketball or NBA, referring to the Toronto Raptors. In another, you’ll see Raptor together with terms like entity or fees, things like that, because Raptor was a special business entity where they hid losses.
So right away, you can mass tag a bunch of non-responsive communications, any communications between those individuals within that timeframe that are referencing the Toronto Raptors. It takes three to four mouse clicks to get to that point, whereas without visualization, it would take a lot longer.
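The Raptor example boils down to telling apart two senses of one word by the terms around it. The Python sketch below is a crude keyword stand-in for that idea, not the conceptual clustering that sits behind a cluster wheel. The context terms come from the example above; the sample messages are invented and are not from the Enron data.

```python
import re

# Context terms taken from the example above.
SPORTS_CONTEXT = {"basketball", "nba", "toronto"}
BUSINESS_CONTEXT = {"entity", "fees", "losses"}


def context_label(text, keyword="raptor"):
    """Label a message by which context terms co-occur with the keyword.

    Returns None if the keyword is absent, otherwise "sports",
    "business", or "unclear" when neither context dominates.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    if not any(tok.startswith(keyword) for tok in tokens):
        return None
    sports = len(tokens & SPORTS_CONTEXT)
    business = len(tokens & BUSINESS_CONTEXT)
    if sports > business:
        return "sports"
    if business > sports:
        return "business"
    return "unclear"


# Invented sample messages.
samples = [
    "Did you catch the Raptors game? NBA tickets are on me.",
    "Raptor entity fees are due before quarter end.",
    "Lunch on Friday?",
    "Let's talk about Raptor tomorrow.",
]
labels = [context_label(s) for s in samples]
```

Messages labeled "sports" are the candidates for a non-responsive mass tag, while anything "unclear" would still go to a reviewer.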
Information that took the DOJ investigators quite a while to find. It’s not a knock on the DOJ; it was just the technology at the time.
At that time. Yes. And so this same workflow can be used to identify other patterns and gaps within your dataset, and you don’t have to be a technologist to do it.
Drill down into key communications in a few clicks, not a few hundred docs.
Let’s talk a little bit more about the content that you can gain out of clustering and communication analysis at the outset. I think they’ve done a nice job over at Relativity with the wheel and the bubble thing. But what benefits do we see up front? Outliers are going to be the things to look at first, right? If there are small bubbles floating way up by themselves and only one person communicating with them, it could be someone’s mom, it could be someone’s cousin, whatever, “let’s have dinner,” “happy birthday” kind of stuff, or it could be a Gmail address they’re floating their protected IP into. Where do you start?
Right away, you can see that the larger bubbles are where most communications are occurring. So, sometimes you want to just focus on that – clicking on one of those would then tell you who that person is mostly communicating with.
Then, looking at the cluster, it’ll tell you what types of communications are occurring between those individuals. So, that’s the benefit, just being able to look at it and start drilling down right away.
You can see who’s communicating with who, what they’re talking about, and then if you add other widgets along with these widgets on a single dashboard, you can also zoom in on other things like timeline, whether or not those documents have been reviewed and who they were reviewed by and how they were tagged.
Were people tagging most of those communications between those individuals within this topic as privileged? Were they tagging them as responsive? And then it just really allows the team to have an overall picture of the entire dataset quickly and easily just by using these visualizations.
How we approach the data really depends on the case and what they’re looking for. Are they doing this for litigation? Are they doing this for an internal investigation?
So, if it’s an internal investigation, maybe you want to group this together with some sensitive terms and see whether there are any communications involving those terms. Maybe you want to see if there is any PII being sent to and from certain individuals that shouldn’t be.
And then of course, as soon as you see communications with people that have conflicting business interests, you definitely want to zoom in on that and flag that and see why they’re communicating and what they’re communicating about.
In addition, you can see gaps in communications where people have switched from email to mobile chat or text messaging. So then, you’re going to say all of a sudden within this timeframe, these two individuals went offline and started texting with each other. I want to see what those text messages are about. So, that’s something else that we can do here.
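As a small illustration of that gap analysis, the Python sketch below finds stretches where two people stopped emailing and then checks which text messages fall inside them. The 14-day threshold and the sample dates are assumptions chosen for the example. On a dashboard, a timeline widget would surface the same thing visually.

```python
from datetime import date


def find_gaps(dates, min_gap_days=14):
    """Return (start, end) pairs where consecutive dates are far apart."""
    ordered = sorted(dates)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if (later - earlier).days >= min_gap_days:
            gaps.append((earlier, later))
    return gaps


def in_any_gap(d, gaps):
    """True if d falls strictly inside one of the gaps."""
    return any(start < d < end for start, end in gaps)


# Invented dates for emails and texts between the same two people.
email_dates = [
    date(2020, 1, 3), date(2020, 1, 6), date(2020, 1, 9),
    date(2020, 2, 20), date(2020, 2, 21),
]
text_dates = [date(2020, 1, 15), date(2020, 1, 28), date(2020, 2, 22)]

email_gaps = find_gaps(email_dates)
texts_in_gaps = [d for d in text_dates if in_any_gap(d, email_gaps)]
```

Here the email silence runs from January 9 to February 20, and two of the three text messages land inside it, which is the signal that the conversation may have moved channels.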
Read Part 3 of the blog series: Visualization in eDiscovery: CDS Vision Synthesizes Proprietary Workflows and Custom Analytics.