Google recently announced that it would add a feature to Gmail that automatically updates emails stored in users’ mailboxes. The feature would allow senders to incorporate dynamic information within emails using Google’s Accelerated Mobile Pages (AMP) technology. Potential use cases include allowing users to shop or respond to surveys within an email, or to see automatically updated event or travel information within a pre-existing confirmation email. Aakash Sahney, Product Manager for Gmail, said the feature would “…make it possible for information to easily be kept up-to-date, so emails never get stale and the content is accurate when a user looks at it”. This technology is exciting for both web developers and Gmail users, but it raises concerns for eDiscovery professionals.
In addition to being used for personal emails, Gmail is currently being used by many businesses as an alternative to traditional email applications such as Microsoft Outlook and IBM Notes. Gmail’s ease of use, low cost, and robust feature set make it a competitive offering, and many users are already familiar with the platform from personal use.
Once dynamic emails go into regular use, eDiscovery professionals will face the challenge of recovering past versions of the information contained within these communications. Currently, a forensic collection of an email account includes all emails from a chain, assuming they have not been permanently deleted. If changes were made to an earlier email in the thread, the original email is still preserved for comparison. Analytical applications such as email threading and textual comparison tools can then be used to identify where the content differs. For example, if a custodian books a flight to a location on a specific date but later changes the date of that flight, both the original confirmation email and the updated confirmation email may be retained (if the original was not deliberately deleted by the custodian). With an automatically updating email, only the most recent flight information may be contained within the single original email. It’s easy to imagine a scenario in litigation where a dispute arises over the original content of an email and the ability to trace these changes within a custodian’s communications becomes critical.
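As a minimal sketch of the kind of textual comparison described above, the following Python snippet uses the standard library's difflib module to show what changed between two retained versions of a confirmation email. The flight number, dates, and destination are invented for illustration, and the comparison is only possible when both versions were preserved.

```python
import difflib

# Two retained versions of a confirmation email body (illustrative values only).
original = ["Flight: CD123", "Date: 12 March", "Destination: Chicago"]
updated = ["Flight: CD123", "Date: 19 March", "Destination: Chicago"]

# unified_diff yields header lines, hunk markers, and the changed lines.
diff_lines = difflib.unified_diff(
    original, updated, fromfile="original", tofile="updated", lineterm="", n=0
)

# Keep only removed ("-") and added ("+") lines, skipping the file headers.
changes = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

for line in changes:
    print(line)
```

If only the updated version exists in the mailbox, there is nothing to compare it against, which is the gap an automatically updating email could create.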
Beyond the challenge this poses for a legal team making a case where evidence may have been changed after the fact, there is a further challenge for the eDiscovery professionals who manage the data. This challenge comes in the form of deduplication, the method of comparing and removing multiple copies of the same file in order to reduce a reviewable population. When deduplicating non-email files, an algorithm (most commonly MD5 or SHA-256) is used to generate an alphanumeric value for a file, known as a hash. Any change to a document, whether to its content or to metadata stored within the file, will result in a different hash value being generated. Hashing two identical copies of the same file will generate the same hash, allowing secondary copies to be culled and the overall reviewable population to be reduced.
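The file-level process described above can be sketched in a few lines of Python using the standard hashlib module. This is an illustration under simple assumptions rather than how any particular processing platform works: the function names file_hash and deduplicate and the sample file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file's bytes, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths, algorithm="sha256"):
    """Keep the first file seen for each hash; return (unique, duplicates)."""
    seen = {}
    duplicates = []
    for path in paths:
        key = file_hash(path, algorithm)
        if key in seen:
            duplicates.append(path)
        else:
            seen[key] = path
    return list(seen.values()), duplicates


# Demonstration with three throwaway files: two identical, one changed.
with tempfile.TemporaryDirectory() as folder:
    a = Path(folder, "contract_a.txt")
    b = Path(folder, "contract_b.txt")
    c = Path(folder, "contract_c.txt")
    a.write_bytes(b"Payment due in 30 days.")
    b.write_bytes(b"Payment due in 30 days.")
    c.write_bytes(b"Payment due in 60 days.")
    unique, duplicates = deduplicate([a, b, c])
    unique_names = [p.name for p in unique]
    duplicate_names = [p.name for p in duplicates]

print(unique_names, duplicate_names)
```

The two identical files share a hash, so the second is culled, while the file with a single changed character is kept as unique. The hash is computed over the file's bytes only, so the differing file names do not affect the result.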
When hashing emails, an alternative method is used: a deduplication hash is generated from a selection of email metadata attributes, most commonly the Sender, Recipient(s), Sent Date/Time, Subject and Attachment values. The body text of the email is not commonly used, because text added in transit, such as confidentiality or virus-protection footers, makes otherwise identical copies hash differently, so that large numbers of duplicates are wrongly treated as unique.
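A metadata-based email hash of this kind might look like the following Python sketch. The field normalization (lowercasing addresses, sorting recipients, joining with a separator character) and the function name email_dedup_hash are assumptions made for illustration; real processing tools each define their own rules, and the addresses and values shown are invented.

```python
import hashlib


def email_dedup_hash(sender, recipients, sent_utc, subject, attachment_hashes):
    """Build a deduplication hash from metadata only; the body is ignored."""
    fields = [
        sender.strip().lower(),
        ";".join(sorted(r.strip().lower() for r in recipients)),
        sent_utc.strip(),
        subject.strip(),
        ";".join(sorted(attachment_hashes)),
    ]
    # A control character as separator keeps adjacent fields from running together.
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same message as collected from two mailboxes (recipient order differs).
hash_a = email_dedup_hash(
    "alice@example.com",
    ["bob@example.com", "carol@example.com"],
    "2018-03-01T09:30:00Z",
    "Flight confirmation",
    [],
)
hash_b = email_dedup_hash(
    "Alice@example.com",
    ["carol@example.com", "bob@example.com"],
    "2018-03-01T09:30:00Z",
    "Flight confirmation",
    [],
)
# A different message: only the subject has changed.
hash_c = email_dedup_hash(
    "alice@example.com",
    ["bob@example.com", "carol@example.com"],
    "2018-03-01T09:30:00Z",
    "Flight cancellation",
    [],
)

print(hash_a == hash_b, hash_a == hash_c)
```

Because the body never enters the calculation, a footer appended by one mail server does not prevent the two copies from matching.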
This method of metadata comparison may yield incorrect hash matches for an automatically updating email, where only the content of the email changes. If Google provides an archival feature that stores a copy of each version of an email prior to the content update, the metadata of those copies is likely to be identical. Any deduplication using the standard metadata fields will then match all copies of the email regardless of their content. There may also be times when the sender’s copy of the email has been updated to the latest version but a recipient’s copy has not, for example because a device was offline and had not synchronized to the latest version before the forensic collection. In this instance, only one copy of the email would be provided for review, even though the content differs across versions of the same email.
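The risk can be illustrated with a short Python sketch in which two copies of the same email share their metadata but hold different body content. Adding a secondary hash of the body within each metadata group is one possible way to flag such copies for review; it is a suggestion for illustration, not an established industry practice, and the dictionary layout, function names, and sample values are all hypothetical.

```python
import hashlib


def metadata_hash(email):
    """Hash the standard metadata fields only."""
    fields = [
        email["sender"],
        ";".join(sorted(email["recipients"])),
        email["sent"],
        email["subject"],
    ]
    return hashlib.sha256("\x1f".join(fields).encode("utf-8")).hexdigest()


def body_hash(email):
    """Hash the body after collapsing whitespace differences."""
    normalized = " ".join(email["body"].split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_versions(emails):
    """Group emails by metadata hash, then by body hash within each group."""
    groups = {}
    for email in emails:
        by_body = groups.setdefault(metadata_hash(email), {})
        by_body.setdefault(body_hash(email), []).append(email)
    return groups


common = {
    "sender": "bookings@example.com",
    "recipients": ["custodian@example.com"],
    "sent": "2018-03-01T09:30:00Z",
    "subject": "Flight confirmation",
}
# The sender's copy has updated; the recipient's offline copy has not.
sender_copy = dict(common, body="Your flight departs on 19 March.")
recipient_copy = dict(common, body="Your flight departs on 12 March.")

groups = group_versions([sender_copy, recipient_copy])
# Metadata groups holding more than one distinct body need a closer look.
flagged = [key for key, bodies in groups.items() if len(bodies) > 1]

print(len(groups), len(flagged))
```

Metadata-only deduplication would treat these two copies as one document and suppress the other, while the secondary body comparison shows that the group contains two different versions.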
There is no news at this time as to whether Google plans to enable an archival feature designed to preserve corporate data in light of the push towards dynamic content. Many communication applications that were initially designed with personal use in mind have since been adapted to business use, and many of these now offer archiving options and litigation hold features at an additional cost. Courts and lawmakers are challenged to keep rules up to date with the latest technology developments, so it remains to be seen whether this automatic updating feature will result in a shift in attitudes towards email preservation.
As adoption of cloud-based technologies increases due to their ease of use, low maintenance, scalability, and low cost, companies considering moving their corporate communications onto new technology platforms must keep a close eye on the long-term development plans for these platforms. Technology companies move at a fast pace and companies must determine their level of comfort with the risks this presents, and how and where their data may be stored.
CDS has deep experience with email collection and data preservation. Contact us today to learn more about how we can help your company manage its data challenges.