IST Discover-E White Paper:

Data Expansion

We all know additional data can pile onto a case throughout the eDiscovery lifecycle. But did you know that the data you already have may grow, too?

Here’s a sample scenario: You identify the custodians relevant to the case and collect files from each. In total, roughly 50 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” are collected. You process the files and load them into Relativity so that you can perform first-pass review and, eventually, linear review, and then produce the files to opposing counsel. After processing, you get a bill for more than you were expecting. What happened?



Many of the files in most ESI collections are stored in what are known as “archive” or “container” files.

For example, as noted above, Outlook emails are typically saved for each custodian in a Personal Storage Table (.PST) file, which is an expanding container file. For most custodians, all of their email (and the corresponding attachments, if present) resides in a few PST files. The scanned size of a PST file is its size on disk.


Have you ever seen one of those vacuum bags that you store clothes in and then suck all the air out of so that the clothes won’t take up as much space?

The PST file is like one of those vacuum bags – it typically stores the emails and attachments in a compressed format to save space. When the emails and attachments are processed into a review tool, they are expanded to their normal size. This expanded size can be double the scanned size (or more), so a 50 GB collection could expand to 100 GB or more.


There are other types of archive container files that compress the contents – .zip and .rar files are two examples of compressed container files.

These files are often used not only to compress files for storage on hard drives but also to compact or group a set of files for transmission, usually in – you guessed it – email. Because email makes up the majority of most ESI collections and other archive container files are popular for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk. It’s important to be prepared for that and to know your options when processing that data, so you can effectively anticipate processing costs.
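For readers who want a rough sense of expansion before processing, here is a minimal Python sketch that compares the on-disk (compressed) size of a .zip archive with the size its contents would occupy once expanded. It relies only on Python's standard zipfile module. The function names expansion_report and build_sample_archive are our own illustrative names, and the sample archive is generated in memory purely for demonstration. It does not look inside nested archives or PST files, so treat the result as a lower-bound estimate rather than a processing quote.

```python
import io
import zipfile


def expansion_report(archive):
    """Return (compressed_bytes, expanded_bytes, ratio) for a zip archive.

    `archive` may be a file path or a file-like object. Only the sizes
    recorded in the zip directory are read; nothing is extracted.
    """
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries, which carry no data of their own.
        infos = [info for info in zf.infolist() if not info.is_dir()]
        compressed = sum(info.compress_size for info in infos)
        expanded = sum(info.file_size for info in infos)
    ratio = expanded / compressed if compressed else 0.0
    return compressed, expanded, ratio


def build_sample_archive():
    """Build a small in-memory zip of repetitive text (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("memo.txt", "Please see the attached schedule.\n" * 2000)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    compressed, expanded, ratio = expansion_report(build_sample_archive())
    print(f"compressed: {compressed} bytes")
    print(f"expanded:   {expanded} bytes")
    print(f"expansion:  {ratio:.1f}x")
```

To check a real collection, pass the path of a .zip file to expansion_report instead of the sample archive. Highly repetitive text, like the sample, compresses far more than typical mixed collections, so the ratio printed for the sample is not representative of real data.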



IST Discover-E Feels Your Pain

Your IST Discover-E Project Managers (PMs) can offer options to offset the increased cost of data expansion, because they too want your eDiscovery project workflow and pricing to be as predictable as possible. Further, IST Discover-E’s tiered pricing model was created to counterbalance unforeseen costs associated with data expansion. As data sets expand past predetermined thresholds, IST Discover-E pricing decreases. This built-in mechanism provides considerable cost savings on every project and adds predictability back into the equation when it comes to data processing.
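To illustrate how a tiered model of this general kind can soften the effect of expansion, here is a small Python sketch. Everything in it is an assumption made for illustration: the thresholds, the rates (in arbitrary price units per GB), the name tiered_cost, and the choice to apply each lower rate only to the volume above its threshold. None of it reflects actual IST Discover-E pricing, so ask your PM for the real tiers.

```python
# Hypothetical tiers: (upper bound in GB, price units per GB).
# These numbers are invented for illustration and are NOT real pricing.
HYPOTHETICAL_TIERS = [
    (50, 100.0),            # first 50 GB
    (100, 80.0),            # next 50 GB
    (float("inf"), 60.0),   # everything above 100 GB
]


def tiered_cost(expanded_gb, tiers=HYPOTHETICAL_TIERS):
    """Cost of processing `expanded_gb`, charging each tier's rate
    only on the volume that falls inside that tier (an assumption)."""
    cost = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if expanded_gb <= lower:
            break
        billable = min(expanded_gb, upper) - lower
        cost += billable * rate
        lower = upper
    return cost


if __name__ == "__main__":
    scanned_gb = 50.0
    expanded_gb = scanned_gb * 2  # the "double the scanned size" case
    flat = expanded_gb * HYPOTHETICAL_TIERS[0][1]
    print(f"flat rate: {flat:.0f} units")
    print(f"tiered:    {tiered_cost(expanded_gb):.0f} units")
```

With these made-up numbers, a 50 GB collection that doubles to 100 GB costs 9000 units under the tiers versus 10000 units at the flat first-tier rate. The gap widens as the expanded volume grows.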


At IST Discover-E, we have years of experience helping our clients with their eDiscovery needs, along with full-scale legal support management systems. We are experts in creating and customizing eDiscovery processes that best fit our clients’ needs and expectations. Our model is uniquely transparent, easy to understand, and effective in helping our clients get the decisions they want for their own clients.

Talent Acquisition Team

Innovative • Service • Technology • Passion