Global privacy expert Jonathan Armstrong of Cordery addressed the role of analytics and AI in identifying and protecting data in our recent webinar, Global Data Privacy Update: GDPR Walks the Walk. This blog, adapted from his comments, is the second of a 3-part series. To start with Part 1, click here. To view the entire webinar on demand, click here.
Growing tension between AI and data privacy compliance
First, from a regulatory point of view, we need to be clear about what AI is, as opposed to a very sophisticated algorithm running a logical process. Of course, either can be useful in eDiscovery and in all sorts of other areas. The first AI case we had with the UK regulator was a hospital project to predict serious illness in patients based on patterns observed in other patients.
There’s undoubtedly some good to be done with AI, but we’ve now had a number of warnings: from the UK regulator in the Google health case I mentioned and, more recently, from a campaign by the Spanish regulator, which persuaded the Italian regulator to join investigations focused, initially at least, on food delivery apps. A couple of those cases illustrate some of the issues.
AI under fire in the EU
The first case involves Glovo, a Spanish food delivery company. Its Italian subsidiary, Foodinho, runs a popular app in Italy, with some 19,000 riders delivering food through it. The second involves Deliveroo, a UK-based app with around 8,000 riders in Italy. The cases are similar, and so are the fines: 2.6 million euros for Glovo’s Foodinho and 2.5 million euros for Deliveroo.
In effect, both companies were allocating riders to jobs using an algorithm they described as AI. The regulator, the Italian Data Protection Authority, found that they weren’t transparent about how the algorithm worked, that the algorithm wasn’t fair, and that it collected too much data from riders. Collecting some geolocation data was justified, the regulator said, but collecting it constantly was not.
The regulator also found it hard to justify practices such as capturing riders’ battery levels. That’s something we’ve highlighted as a concern in the past, and a couple of other companies have gotten into trouble for it. My understanding is that the cause is often simple: many developers take a standard set of data from Android or Apple, and that set includes battery data. Some don’t even want the data; they just get it as part of the standard package.
But the more fundamental concern here, I think, was that at least one of the apps scored riders on whether they worked on Fridays, Saturdays or Sundays. The delivery company said this was necessary because those were busy times and it wanted to incentivize riders who would turn up and work when demand was high.
The concern is that the Sabbath or holy day for most major religions falls on a Friday, Saturday or Sunday. So you may be discriminating against, say, a Jewish or Catholic rider who doesn’t want to work on that day because they want to observe their Sabbath.
“Fine-plus” cases are catching on
What we’re increasingly seeing under GDPR are these “fine-plus” cases, and Italy has been the pioneer. The Telecom Italia Mobile case starts with a fine of 20 million euros plus an order to do five specific things. The food delivery cases are “fine-plus” cases too: the regulator fined both operators and also dictated what their algorithms should look like going forward. They have to carry out a data protection impact assessment. They have to be prepared to justify the lines in the code, if you like, or the AI parameters set there. And they can’t replicate human bias in machines.
Transparency sounds great, until it comes to source code
I think that’s instructive for the eDiscovery world as well, because we often set our keywords and rules and configure the engine to go and do its work. Sometimes we teach the engine, and in more sophisticated settings the engine learns by itself, but we still have to know what’s going on. One of the real challenges here, of course, is that many people developing these applications don’t want to say how they’re built, because that’s their secret sauce. A startup, for example, wants to hold on to that so it can use it to attract customers and funding.
I think we’re going to see a real fight over AI in the next year or so as regulators get tougher and demand that source code and coding criteria be released. Developers will resist, and data subjects will sit in the middle saying, “Somebody made bad decisions about me, and nobody’s being transparent.”
This is complicated by the fact that we’re living in a world where conspiracy theorists thrive. If you’re not transparent about AI, people will assume, or simply make up, the criteria being used. My prediction is that there are going to be solid business reasons for being more transparent as well, so that companies can say: no, you weren’t excluded on the basis of your religion; you were excluded because you always delivered 30 minutes later than Giuseppe. The more transparent we are about how we make those decisions, the less pushback we may get in this conspiracy-oriented world.
Click here to read Part III: 5 Proactive Steps Toward GDPR Compliance.