Active Data: Data on a computer that can be accessed without restoration. This term is often used to describe information currently displayed on a computer screen. The more technical usage refers to information stored on local storage media or a device that is visible to the operating system and/or application software with which it was created. Active data is accessible to users immediately and without modification or restoration.
Application (commonly used in place of “program” or “software”): A program or group of programs, such as Word or Excel, designed to enable end users to manage computer resources and/or perform specific tasks.
Archival Data: Information that is maintained for long-term storage and record keeping purposes, but is not immediately accessible. Archived data can be stored in a number of ways. For example, it can be written onto removable media, like a DVD or backup tape, or maintained on a system hard drive.
Attachments (or Child): Most commonly, electronic files appended to an email message. More generally, the term refers to a file or record that is attached to or associated with another, often for purposes of retention, transfer, processing, review, production and/or routine records management. Multiple attachments can be associated with a single file or record (referred to as the “parent” or “master” record).
Author (or Originator): The person or office that created or issued an item.
Cache (pronounced “cash”): A special high-speed storage mechanism that usually is utilized for frequently used data. Website contents, for example, often reside in cached storage locations on a hard drive.
Carving: The process of searching through the unused parts of a disk for files that haven’t been overwritten and recovering those files. Word to the wise: “deleted” does not mean gone--deleting a file usually just unlinks it from your computer’s file system. With the right software, the deleted files can usually be recovered.
Chain of Custody: Process of documenting and tracking possession, movement, handling and location of evidence. Chain of custody is tracked from the time evidence is obtained, until presentation in court or other submission. A clear chain of custody is important when issues of admissibility and authenticity arise, as it can establish that the evidence was not altered or tampered with in any way.
Clawback Agreement or Quick Peek: A very handy agreement which states that if you accidentally give the other side your privileged documents, they have to give them back and can’t use them against you or claim they aren’t privileged anymore. There are no known reasons for not having a clawback agreement, but there are very good reasons to have one in place. A serviceable clawback agreement can be written in one paragraph.
Checksum: A sequence of numbers and letters that is essentially unique for each and every file in the world. Comes in several different flavors, including MD5 and SHA1. Extremely useful for finding duplicates, determining if someone has files they shouldn’t have, and identifying evidence.
Cloning: Making a copy of a drive, for example to put into another machine without having to install everything from scratch, or for backup purposes. Cloning programs typically are not configured to capture all areas of the drive. There is also a problem with later authentication: there is no way to tell whether anything was deleted from or added to the clone after the day it was made.
Cloud: The National Institute of Standards and Technology (NIST) defines cloud computing, in part, as: “… a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
Cloud Computing: Internet-based computing wherein services - such as storage and applications - are delivered through the Internet, as opposed to using local servers or devices.
Coding: Process of examining and evaluating documents through the use of predetermined codes, and recording the results.
Compression: The reduction in the size of data to save storage space and reduce bandwidth necessary for access and transmission. “Lossless” compression preserves the integrity of the data (e.g., ZIP and RLE), while “lossy” compression does not (e.g., JPEG and MPEG).
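To make “lossless” concrete, here is a minimal sketch using Python’s standard zlib module, which implements the DEFLATE method commonly used inside ZIP files. The sample text is made up for illustration. The point is simply that the restored bytes are identical to the original, while the compressed copy is much smaller.

```python
import zlib

# Hypothetical sample data: repetitive text compresses very well.
original = b"privileged and confidential " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the round trip gives back exactly the same bytes.
print(restored == original)
# The two sizes differ, which is why "expanded" volume matters for billing.
print(len(original), len(compressed))
```

A lossy format such as JPEG would fail the equality check by design, because it discards detail to save space.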
Collection: The process of retrieving or collecting electronic data from storage media or devices; an E-Discovery vendor “collects” electronic data from computer hard drives, file servers, CDs, and backup tapes for processing and loading to storage media or a database management system.
Corrupted File: A corrupted file is one that has been damaged and cannot be read by a computer in part or in whole. Common causes include viruses, hardware or software failures, and degradation due to the passage of time.
Cost Shifting: When the responding party forces the requesting party to pay for the costs of responding to certain discovery. Often a Solomonic remedy imposed by the Judge when one party is asking for too much but maybe shouldn’t be prevented outright from getting it. Under so-called American rules of discovery, cost shifting is unlikely to be applied to well-drafted and reasonable discovery requests.
Culling: Processing a large set of data and removing the junk data so that it’s easier to search and less expensive to host or transfer. It’s best for the parties to agree on the criteria that will be used to cull the data, e.g., date range, file types, domains, and file sizes.
Custodian: A custodian is a person from whom the data is collected.
ECA – Early Case Assessment: The implementation of litigation analysis and management protocol that provides for an aggressive stance, gathering information as quickly as possible to ensure the company can determine the most favorable way to resolve the case instead of simply reacting to opposing counsel. This process involves making a concerted effort to complete all the major work within the first 90 to 120 days of a lawsuit’s filing. In order to truly conduct an ECA with a return on investment (or ROI), you need to start the process before you collect your first gigabyte of data. ECA should occur when you are assessing whether to file or defend a suit, or immediately thereafter. Conducting your case’s assessment at this point will ensure that you not only have a defensible approach, but that you minimize eDiscovery costs across the lifecycle of the case, especially down the line when you’re conducting a document/information review.
eDiscovery or Electronic Discovery: A process where the parties to litigation exchange electronic evidence. eDiscovery has been the subject of much teeth-gnashing and hair-pulling, with many lawyers and commentators complaining about its cost and difficulty, but eDiscovery is inescapable unless the parties live in caves and do not use computers. If a lawyer wants to prove that certain facts did or did not occur, then eDiscovery is strongly recommended.
EDRM or Electronic Discovery Reference Model: The EDRM is not the be-all and end-all of eDiscovery, but it is a good place to start. While every law firm, consultant, and ESI provider claims they follow the EDRM, few actually cover the entire EDRM, which includes:
Electronic Document Management: This refers to the process utilized in the management of documents, whether hard copy or electronic. In the case of hard copy documents, it includes those steps necessary to make them available electronically, such as images, archiving, etc.
Electronic Image: An electronic or digital picture of a document; the most common image format used in E-Discovery is TIFF (Tagged Image File Format).
ESI – Electronically Stored Information: ESI is used to refer to all information or data found in computers, tablets, mobile phones, servers, the cloud, social media sites, etc. (Basically, it refers to anything stored digitally that may need to be processed, hosted, and reviewed.)
Email: Electronic mail or computer-based mail
Email String (or Email Thread): A series of e-mails linked together by e-mail responses or “forwards”
Email Archiving: The process of preserving and storing email.
Embedded Metadata: Metadata embedded with content. See Metadata.
Encryption: A procedure that makes the contents of a file or message scrambled and unintelligible to anyone not authorized to read it.
Expanded Data: Compressed data that has been expanded or restored to its original size and format. Practice Note: most eDiscovery vendors bill on expanded data.
Extracted Text: Because a native file is designed to be electronically usable, it tends to be inherently electronically searchable. Converting to TIFF removes application metadata. To restore a measure of electronic searchability, parties extract text from the electronic document and supply it in a file accompanying the TIFF images. It’s called multi-page extracted text because, although the single-page TIFFs capture an image of each page, the text extraction spans all of the pages in the document. A recipient runs searches against the extracted text file and then seeks to correlate the hits in the text to the corresponding page image.
Hard Drive: Self-contained storage device, generally with a high capacity, that has a read-write mechanism and one or more hard disks.
Hash: A hash value (or hash) is an alphanumeric string that is generated by an algorithm and, for practical purposes, uniquely identifies the original data. It is useful for authenticating data (such as a file) for evidence admissibility in court, for identifying duplicate documents, and for detecting alterations to documents. Common hash algorithms are MD5 and SHA. An example of a hash value: d41d8cd98f00b204e9800998ecf8427e.
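A short sketch of how such values are produced, using Python’s standard hashlib module. The helper name `fingerprints` and the sample memo text are illustrative assumptions. As it happens, the example value quoted above is the MD5 hash of empty (zero-length) data, which makes it an easy one to reproduce.

```python
import hashlib

def fingerprints(data: bytes) -> dict:
    """Return the MD5 and SHA-256 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# MD5 of empty input reproduces the example value in the definition.
print(fingerprints(b"")["md5"])  # d41d8cd98f00b204e9800998ecf8427e

# Changing even one character yields a completely different hash,
# which is how alterations are detected.
memo = b"Payment approved on March 3."      # hypothetical content
altered = b"Payment approved on March 8."   # one character changed
print(fingerprints(memo)["md5"] == fingerprints(altered)["md5"])  # False
```

In practice the bytes would be read from the file being authenticated, and the resulting hash recorded alongside the chain of custody.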
Hash Coding: Method of coding that provides quick access to data items capable of being distinguished through use of a key term, like the name of a person. Each data item to be stored is associated with the key term, the hash function is applied to that term, and the resulting hash value may then be used as an index that permits users to select one of several “hash buckets” in a hash table. The table contains pointers to the original item.
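The bucket idea can be sketched in a few lines. This is a toy illustration, not how any particular product works: the bucket count, the function names, and the document IDs are all assumptions made up for the example.

```python
import hashlib

NUM_BUCKETS = 8  # assumed size; real tables are far larger

def bucket_for(key: str) -> int:
    """Apply a hash function to the key term and map it to a bucket index."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

# Each bucket holds (key, pointer-to-item) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]

def store(key: str, item_id: str) -> None:
    buckets[bucket_for(key)].append((key, item_id))

def lookup(key: str) -> list:
    # Only one bucket is scanned, not the whole table.
    return [item for k, item in buckets[bucket_for(key)] if k == key]

store("Jane Doe", "DOC-0001")
store("John Roe", "DOC-0002")
store("Jane Doe", "DOC-0007")
print(lookup("Jane Doe"))  # ['DOC-0001', 'DOC-0007']
```

The speed comes from the fact that a lookup hashes the key once and then examines a single bucket.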
Hyperlink: An element in an electronic document (usually appearing as an underlined word or image) that links to another place in the same document or to an entirely different document when clicked.
Image: Refers to an exact replica. Image may also refer to a type of document, such as a .tif or .jpeg. To image a hard drive means to make an identical copy. Forensic imaging is a bit-for-bit copy and can be made at a logical level (for example, just the C drive or the D drive) or at a physical level (the entire disk, including the unallocated space, which is where deleted data resides). The main advantage of forensic imaging is the checksums and verification by digital fingerprint contained in the image format, which show that the image has not been altered in any way since the day it was made. If the image has been altered, the CRC values (checksums) and the digital fingerprint (such as the MD5 hash) will change and not match, and the image will not verify.
Image Processing: Capturing an image, usually from data in its native format, so it can be entered into another computer system for processing and, often, manipulation.
Import: To bring information or data to one environment or application from another.
Inactive Record: Records that no longer are routinely referenced but must be retained, usually for audit or reporting purposes.
Index: In the context of electronic discovery, index refers to database fields used to categorize, organize and identify each document or record.
Information Governance: The entire kit and caboodle relating to a company’s information/data, be it paper documents or ESI – not just document retention policies and document retention, but the true life cycle of information, which includes:
Media or Medium: An object on which data is stored. Examples include disks, backup tapes, servers and hard drives.
Meet & Confer: A meeting (or phone call) at the beginning of a case for lawyers to talk about discovery and try to reach agreement on preliminary matters like forms of production and dates for depositions. Required in federal court. Most often the meet & confer session is “phoned-in” both literally and figuratively, to the detriment of everyone involved. Best if counsel prepare beforehand, talk with their clients about eDiscovery and the evidence that’s likely to be sought, and come with a game plan.
Metadata: Least helpful definition: “data about data.” More helpful definition: contextual information about computer files that helps explain how/when/where/why they were created. Metadata can also prove that a piece of evidence “is what it purports to be”--e.g., “even though he denies it, ladies and gentlemen, this email is in fact an email written by Mr. X on [insert date] from his home computer.” Metadata comes in two main categories, embedded metadata and system metadata. The handy thing about embedded metadata is that it travels with the file, so that if you copy the file to transfer media and give it to your opponent, it will still be there. In contrast, system metadata does not travel, and is therefore difficult to produce in discovery. Examples of system metadata are: directory paths, last-modified dates, and created dates. System metadata is often produced in a load file that accompanies the discovery response.
Mirroring: Duplicating data or a disk in a manner that results in an exact copy. This is often done for backup purposes.
Native: A file that is in the form in which it was originally created. If the file started its life by someone opening Microsoft Word, typing something, and then hitting “save,” then the native file will have a “.doc” or “.docx” extension. The opposite of a native file is printing a “.doc” file to paper or to “virtual” paper--e.g., TIFF (see below) or PDF.
Natural Language Search: A manner of searching that does not require formulas or special connectors (e.g., “origin” and “basketball”), but can be performed by using plain statements (e.g., “What is the origin of basketball?”).
Near Duplicates: The process of identifying and culling documents that are nearly duplicate. Deduplication software can group near duplicate documents by percentage of similarity, so reviewers can quickly review and code documents for responsiveness or privilege. See Deduplication.
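A rough sketch of grouping by percentage of similarity, using Python’s standard difflib module. Commercial tools use their own (usually faster) similarity measures; the 90 percent threshold, the function names, and the sample documents here are assumptions for illustration only.

```python
from difflib import SequenceMatcher

def similarity_pct(a: str, b: str) -> float:
    """Similarity of two texts as a percentage (0 to 100)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def group_near_duplicates(docs: dict, threshold: float = 90.0) -> list:
    """Put each document in the first group whose lead document it resembles."""
    groups = []
    for doc_id, text in docs.items():
        for group in groups:
            if similarity_pct(docs[group[0]], text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])  # no similar group found: start a new one
    return groups

docs = {
    "A": "The meeting is scheduled for Monday at 10 am in the main conference room.",
    "B": "The meeting is scheduled for Tuesday at 10 am in the main conference room.",
    "C": "Quarterly revenue figures are attached for your review.",
}
print(group_near_duplicates(docs))  # [['A', 'B'], ['C']]
```

A reviewer could then code document A once and apply the same decision to B after a quick look at the difference.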
Near Native: Functionally the same as native. Because some things can’t really be produced in the application that created them, we call the next best thing near-native. An example is an email generated in Gmail.
OCR Text: Optical Character Recognition; searchable text that corresponds to a document image; an OCR software program designed to “read” a document image generates OCR text. OCR serves to pair the images of letters with their electronic counterparts and impart a rough approximation of searchability.
Off-line Storage: Storage of electronic records on a removable disc or other device for disaster-recovery purposes.
Overwrite: To copy or record new data over existing data, as with backup-tape recycling or when updating a file or directory. Practice Note: Overwritten data cannot be retrieved, making it important to suspend policies and procedures likely to result in the overwriting of potentially relevant data, once on notice of the duty to preserve.
Records Manager: The person responsible for implementation of a records management and information governance program.
Records Retention Period (or Retention Period): The length of time a given set or series of records must be maintained. The retention period is often expressed as a period of time (such as six years), an event or action (such as completion of an audit), or both (six years after completion of an audit).
Redaction: The intentional concealing of a portion of a document or image, done for the purpose of preventing its disclosure. Practice Note: Redactions and their basis should be clearly indicated and disclosed to avoid an appearance of bad faith.
Restore: Transferring data from a backup medium to an active system. Data is often restored for the purpose of recovering the data after a problem, failure or disaster, or where the data is relevant and has not been preserved or cannot be accessed elsewhere.
Review: Process used to read or otherwise analyze documents in order to determine content, relevance or applicability of some other objective or subjective standard.
Review Platform: Software like Relativity for examining electronic evidence--either your own or the other side’s. Can be hosted in a “cloud” environment--in which case expect to pay by GB, and don’t say I didn’t warn you. Alternatively, software that runs on one’s desktop. Ranges from inexpensive to insanely expensive. More often the latter. We’re trying to change this.
TAR (Technology Assisted Review): See Predictive Coding.
Temporary (Temp) File: Files created by applications and stored temporarily on a computer. Temp files can improve performance by keeping frequently needed data close at hand. In the case of temporary Internet files, for example, a browser stores website data so that the next time the same website is accessed it can be loaded directly from the temporary Internet file. Stored data may also be viewed even in the absence of an Internet connection.
Thread: A series of related communications, usually on a particular topic.
TIFF: A mild form of disagreement among opposing counsel, usually caused by bickering about forms of production. Ok, sorry for the pun. A TIFF is an image file, like a JPEG, PNG, or GIF, except that it has almost no legitimate purpose for existing. (At least one can make hilarious cat videos with GIFs!) In very backward, retrograde forms of eDiscovery, native files (see above) are converted to TIFF images and produced as such, with a load file (see above) provided to make up for the fact that the TIFF conversion process strips out almost every useful piece of information contained in the original file! Responding parties: please stop giving us TIFFs. Requesting parties: don’t accept TIFFs.
Unallocated Space: The area of computer media, such as a hard drive, that does not contain normally accessible data. Unallocated space frequently results from deletion, wherein data resides but is not generally accessible, until being overwritten, wiped or retrieved through utilization of forensic techniques.
Vendor: People who firmly believe that you cannot survive without them. Sometimes, but not always, they are right. If you are about to try carving and are not yourself trained in this field, please call a vendor.
VPN (Virtual Private Network): Secure networks that utilize mechanisms, such as encryption, to ensure access by authorized users only and prevent data interception.
ZIP: A common file format that compresses one or more files into a single archive, allowing fast and simple storage for the purposes of archiving or transmitting.
The amended Federal Rules of Civil Procedure (FRCP), recent headlines highlighting spoliation sanctions, and rising costs have made eDiscovery a top priority for legal, IT, security, and records management teams. However, there seem to be no precise, concise definitions available that can truly put boundaries around the process. “eDiscovery” becomes an umbrella phrase that tends to stall decision making by senior non-technical decision makers. The following index of eDiscovery terms breaks eDiscovery lingo down into its simplest terms and provides a common basis to help guide eDiscovery decision makers through the process.
Backup: A copy of data for preservation purposes; data is often backed-up on a network file server or backup tapes. Active electronically stored information (ESI) copied onto a second medium (like a CD, DVD or backup tape) in its exact form, often intended as a source for recovery should the first medium fail. Usually, backup data is stored separately from active data, and differs from archival data (though may be a copy of archival data) in method and structure of its storage.
Backup-tape Recycling: The process of overwriting backup tapes with new data, typically on a fixed schedule (referred to as a “rotation”). Rotation schedules vary depending on the type and purpose of the backup tape. Practice Note: Once a duty to preserve arises, parties must suspend all routine deletion practices likely to result in the alteration or loss of potentially relevant evidence. This includes backup-tape overwriting, which commonly is overlooked.
Bates Number: Sequential numbering used to track documents, images or production sets (as with productions made in native format), which often includes a suffix or prefix to help identify the producing party, case name or similar information. Practice Note: You cannot Bates label a native document. Best practice is to include a labelled placeholder.
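A minimal sketch of sequential numbering with a prefix. The prefix “ABC” and the seven-digit width are made-up choices; parties typically agree on their own format.

```python
def bates_numbers(prefix: str, start: int, count: int, width: int = 7) -> list:
    """Generate `count` sequential Bates numbers, zero-padded to `width` digits."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]

# Hypothetical producing-party prefix "ABC", starting at 1.
print(bates_numbers("ABC", 1, 3))
# ['ABC0000001', 'ABC0000002', 'ABC0000003']
```

Zero-padding keeps the numbers in the right order when they are sorted as text, which is how most review platforms and file systems will sort them.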
Data: Electronic information stored on a computer/Electronically Stored Information or ESI.
Data Extraction: Process of parsing data from electronic documents into separate fields, such as “Date Created,” “Date Modified,” “Author,” etc. In a database, this allows for searches across data or by sorting respective fields.
Data Filtering: Use of specified parameters to identify specific data.
Data Mapping: Method used to capture information relating to how ESI is stored, both virtually and physically. A basic data map will include name and location information, while a more complex data map may include several, if not all, of the following: software and formatting information; description of backup procedures in place; interconnectivity and utilization of each type of ESI within the organization; accessibility, policies and protocols for retention and management; and record custodian information.
Data Set: Named or defined collection of data.
Database (DB): Refers to a collection of information, organized so that specific data or groups of data can be identified and searched quickly.
Decompression: To expand or restore compressed data to its original size and format. Practice Note: most eDiscovery vendors bill on expanded data.
Deduplication: A process that removes multiple copies of the same file from a set of files, leaving you with only one of the copies. This is super helpful when you have to review a large number of files and you don’t want to waste your time going line-by-line through files to see if they are the same. Horizontal or Global deduplication means removing all the duplicates across the board. Vertical or Custodial deduplication means keeping a copy of a duplicate if it belongs to a different custodian.
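The difference between global and custodial deduplication is easiest to see side by side. The sketch below hashes file contents and keeps the first copy it sees; the custodians, file names, and the `dedupe` helper are hypothetical.

```python
import hashlib

def dedupe(files, per_custodian: bool = False) -> list:
    """files: iterable of (custodian, filename, content bytes).

    Global (horizontal) mode keeps one copy across everyone.
    Custodial (vertical) mode keeps one copy per custodian.
    """
    seen = set()
    kept = []
    for custodian, name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        key = (custodian, digest) if per_custodian else digest
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept

files = [
    ("alice", "budget.xlsx", b"Q3 budget"),
    ("alice", "budget_copy.xlsx", b"Q3 budget"),
    ("bob", "budget.xlsx", b"Q3 budget"),
    ("bob", "notes.txt", b"call opposing counsel"),
]

print(dedupe(files))
# [('alice', 'budget.xlsx'), ('bob', 'notes.txt')]
print(dedupe(files, per_custodian=True))
# [('alice', 'budget.xlsx'), ('bob', 'budget.xlsx'), ('bob', 'notes.txt')]
```

Note that the comparison is on content, not file name: alice’s renamed copy is still caught as a duplicate.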
Deleted Data: Refers to live data that has been deleted by a computer system or user activity. “Soft deletion” refers to data marked for deletion that may no longer be accessible to the user (such as the emptying of one’s “recycle bin”), but has not yet been physically removed or overwritten. Soft deleted data may be recoverable. Further, deleted data in general may remain on storage media, in whole or in part, until it is overwritten or “wiped” and, even after being wiped, it may be possible to recover information relating to the deleted data.
Deletion: The process of removing or erasing data from active files and other data storage structures, although some or all of it may be recoverable with special data recovery tools.
DeNISTing: One way of culling data (see above). One takes a huge list of checksums (see above) for known junk files and removes any matching files from the data set. The NIST part derives from the National Institute of Standards and Technology, which, among other things, maintains the list of junk files.
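In code, DeNISTing boils down to a set lookup. The sketch below is illustrative only: the one-entry `known_hashes` set stands in for the real (vastly larger) list, and the file names and contents are invented.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def denist(files: dict, known_hashes: set) -> dict:
    """Drop every file whose checksum appears on the known-file list."""
    return {name: data for name, data in files.items()
            if sha1_hex(data) not in known_hashes}

# Hypothetical stand-in for the real list of known junk-file checksums.
known_hashes = {sha1_hex(b"stock operating system file")}

collected = {
    "system32/stock.dll": b"stock operating system file",
    "Documents/memo.docx": b"memo about the merger",
}

remaining = denist(collected, known_hashes)
print(list(remaining))  # ['Documents/memo.docx']
```

Because matching is by checksum, a user-created file that merely shares a name with a system file survives the cull.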
Directory: An organizational unit or container used to organize folders and files into a hierarchical or tree-like structure. Some user interfaces use the term “folder” instead.
Discovery: General term used to describe the process of identifying, locating, obtaining, reviewing, evaluating, and/or producing information and other evidence for use in the legal process.
Document (or Document Family): Pages or files produced either in hard copy or through a software application, which constitute a logical single communication of information. For example, fax cover letter, faxed letter and attachments. For document review purposes, the cover letter often is referred to as the “parent,” and the letter and attachments as the “child.”
Document Type (or Doc Type): A typical field used in coding, examples include “correspondence,” “memo,” “agreement,” etc.
Download: The process of moving data from another location to one’s own, typically over a network or the Internet.
Duty to Preserve: Duty arising under state and federal law, upon reasonable anticipation of litigation, to preserve documents, electronic records and data, and any other evidence or information potentially relevant to a dispute. The duty also arises in the context of audits, government investigations and similar matters. The scope of the duty and what is required under a specific set of circumstances is determined by considerations of reasonableness and proportionality. Practice Tip: While the duty to preserve unquestionably arises upon the filing of a formal complaint, it also often arises sooner, such as when an investigation takes place or after receipt of a credible complaint or demand letter.
Fielded: A form of production (usually native, or nearly native) wherein the fields that hold discrete bits of information remain in place. For example, an email when converted to a PDF file is no longer fielded because the “to:” and “from:” fields of the email in a PDF document have the same status as any of the other text on the page. In contrast, when email is produced in a native or near-native format, the “to:” and “from:” fields retain their special status, and it is possible to construct searches like ‘from:email@example.com to:firstname.lastname@example.org subject:conspir!’ using a review platform. This can be very effective.
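To see why fields matter, here is a small sketch using Python’s standard email parser. The message, the addresses, and the `fielded_match` helper are made up for illustration, and the root matching is only a loose imitation of the “!” expander in the query above.

```python
from email import message_from_string

# Hypothetical message. Note that the body also contains the text "From:".
RAW = """From: alice@example.com
To: bob@example.org
Subject: Conspiracy theories

Nothing to see here. From: nobody in particular.
"""

def fielded_match(raw: str, sender: str, subject_root: str) -> bool:
    """True if the From field contains `sender` and any word in the
    Subject field starts with `subject_root`."""
    msg = message_from_string(raw)
    from_field = (msg["From"] or "").lower()
    subject_words = (msg["Subject"] or "").lower().split()
    return (sender.lower() in from_field
            and any(w.startswith(subject_root.lower()) for w in subject_words))

print(fielded_match(RAW, "alice@example.com", "conspir"))  # True
# "nobody" appears in the body text, but not in the From field.
print(fielded_match(RAW, "nobody", "conspir"))  # False
```

Once the same email is flattened to PDF or TIFF, the second search can no longer tell the header from the body.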
File: A collection of data or information stored under a specific name, called a filename.
File Extension: A suffix to the name of a file, separated by a dot. Often an abbreviated version of the name of the program in which the file was created or saved, the suffix indicates the program that may be used to open the file.
File Format: The organization or characteristics of a file that allow it to be used with certain software programs.
File Server: Refers to a computer attached to a network, of which the main purpose is to provide a centralized location for shared storage of computer files (such as documents and images) that can be accessed by the workstations attached to the same network. File servers are the heart of any server network. They can contain data for other programs or direct access to documents themselves.
Flash Drive: A small, data storage device used to store files or transport them from one computer to another, also commonly referred to as a USB or thumb drive.
Forensics: The application of scientifically proven methods to retrieve, examine and/or analyze data in a way that can be used as evidence. Practice Note: this is the most defensible collection method.
Forms of Production: Electronic evidence can be “produced” (i.e., exchanged) in multiple forms. For example, if there is a Word file on your client’s laptop, and you need to produce it to another party, you have several choices: (1) you can copy the file to some sort of transfer media (e.g., a thumb drive) to produce an exact copy; (2) you can convert the file to PDF and produce the PDF file; (3) you can print the file to TIFF (see below) and also produce a load file that contains searchable text; or (4) you can literally print the file out on a piece of paper using a printer and deliver a copy of the paper to the other party. There are pros and cons to each form of production. If you are billing hourly, the only known “pro” of option 4 (printing) is that it wastes a lot of time and paper, and often results in motion practice. For reasons that we do not comprehend, some attorneys are flustered by native production and instead choose to have files produced as PDF.
Format (noun): The internal structure of a file that defines the way it is stored and the programs in which it can be used.
Format (verb): To prepare a storage medium for first use.
FTP (File Transfer Protocol): The protocol for transferring files over a network or the Internet.
Full-text Search: When a data file can be searched for specific words and/or numbers.
Fuzzy Search: Searches that use approximate, rather than exact, matches.
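A quick sketch of approximate matching using Python’s standard difflib module. The word list, the 0.8 cutoff, and the `fuzzy_hits` helper are assumptions for illustration; review platforms implement fuzzy search in their own ways.

```python
from difflib import get_close_matches

def fuzzy_hits(term: str, words: list, cutoff: float = 0.8) -> list:
    """Return words that approximately match `term` (1.0 = exact match)."""
    return get_close_matches(term, words, n=len(words), cutoff=cutoff)

# Hypothetical words pulled from a document set, including a typo.
words = ["conspircy", "constituency", "invoice", "conspiracy"]

hits = fuzzy_hits("conspiracy", words)
print(hits)
```

An exact keyword search for “conspiracy” would miss the misspelled “conspircy”; the fuzzy search picks it up while still ignoring unrelated words like “invoice.”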
Keyword Search: A search – of the text of documents in a database – designed to retrieve documents containing a “keyword;” generally the most basic of a number of searches; depending on the software application’s capabilities, a variety of advanced searches can be performed.
Linear Review: Assume that your client has 1TB of data that could be responsive to discovery requests. Assume that you agree on some keywords with the other party. Assume that those keywords are “hits” for 500,000 documents. Linear review is the process of having a human--usually a lawyer--set eyes on each of the documents before any of them are produced to the other side. On average, human reviewers can review 55 documents per hour, and the average hourly cost for a reviewer is $70 per hour. That means you’ll spend, ahem, more than $600,000 on document review! The process will also take several months, even for a large review team. But the legal system doesn’t have time for this. Discovery is supposed to finish … it can’t drag on and on for years while reviewers strain their eyes and wonder if this is what they went to law school for. In short, linear review is a bad idea, and it’s prohibitively expensive and time-consuming. Alternatives include technology-assisted review and creative use of keyword searches, selective review, and clawback agreements.
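The arithmetic behind that figure, as a quick sketch. The document count, review speed, and hourly rate come from the example above; the team size of ten and the 40-hour week are added assumptions used only to estimate the timeline.

```python
documents = 500_000
docs_per_hour = 55        # average review speed from the example
rate_per_hour = 70        # average reviewer cost in dollars

review_hours = documents / docs_per_hour      # about 9,091 hours
total_cost = review_hours * rate_per_hour     # about $636,364

# Assumed for illustration: ten reviewers working 40 hours a week.
team_size = 10
hours_per_week = 40
weeks = review_hours / (team_size * hours_per_week)   # about 22.7 weeks

print(f"{review_hours:,.0f} hours, ${total_cost:,.0f}, {weeks:.1f} weeks")
```

Under those assumptions a ten-person team needs more than five months, which is the “several months” the entry complains about.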
Litigation Hold: A document provided to a custodian when litigation is on the horizon or already happening that instructs him or her how to avoid deleting or corrupting evidence. Sometimes litigation hold letters confuse ordinary people by telling them things like “cease rotating backup tapes.” Ideally, a litigation hold should be readable and comprehensible by its target audience, and compliance with the hold should be monitored. Watch out for company-sponsored paper shredding or hard-drive dumping events!
Load File: A special file that you get (or give) with other files that provides additional information about those files, such as the directories they came from, metadata not contained in the files themselves, Bates numbers corresponding to the files, and information about the requests to which the files are supposed to be responsive. Even though load files are essentially “flat”--i.e., non-relational databases (like Excel files)--they appear in any number of bizarre proprietary formats. There is no agreed-upon standard for formatting load files, and unless one happens to own the same software that was used to generate the load file, viewing one can be a serious pain in the hindquarters. If you don’t own the software that generated the load file, you may want to ask for a comma-delimited (CSV) file instead, which at least you can open in Excel.
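For a sense of what a CSV load file looks like, here is a minimal sketch using Python’s standard csv module. The column names (BegBates, Custodian, OriginalPath, LastModified) and the rows are hypothetical; since there is no agreed-upon standard, the parties should settle the columns themselves.

```python
import csv
import io

# Hypothetical columns and rows for illustration only.
FIELDS = ["BegBates", "Custodian", "OriginalPath", "LastModified"]
rows = [
    {"BegBates": "ABC0000001", "Custodian": "alice",
     "OriginalPath": "C:/Users/alice/Documents/budget.xlsx",
     "LastModified": "2015-03-03"},
    {"BegBates": "ABC0000002", "Custodian": "bob",
     "OriginalPath": "C:/Users/bob/Desktop/notes, final.txt",
     "LastModified": "2015-03-08"},
]

# Write the load file (to memory here; a real one would go to disk).
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

# Read it back, as the receiving party would.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded[1]["OriginalPath"])  # C:/Users/bob/Desktop/notes, final.txt
```

The second path contains a comma on purpose: a proper CSV writer quotes such values, so the system metadata survives the round trip intact.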
Peripheral: Refers to a device that attaches to a computer, such as a printer, modem or disk drive.
Predictive Coding: A method of culling relevant documents for production or review. Predictive coding uses algorithms to determine the relevance of documents based on linguistic and other properties and characteristics. It relies on the coding from a human sampling of documents called a “seed set.” The seed set allows the computer to identify and evaluate the remaining documents. Also referred to as IRT or TAR.
Preservation: The process of managing, identifying, and retaining documents and other data for legal purposes. Practice Note: Reasonable efforts to preserve include the suspension of routine deletion policies, issuance of adequate preservation instructions, and oversight as appropriate. Delegation is not a defense when evidence is lost, altered and/or destroyed after a party’s duty to preserve arose.
Preservation Demand: A letter or email to your adversary demanding that he or she keep evidence safe and prevent it from being destroyed. Sometimes critical to point to when seeking sanctions at a later date if the other side “lost” some evidence. Preservation demands are often wildly overbroad, but hey, how is the sender supposed to know what the receiver has and doesn’t have?
Privilege Data Set: Documents withheld from production despite being relevant and/or responsive on the grounds of legal privilege. Parties are generally required to produce a privilege log, identifying enough information about each document so that the opposing party can determine whether or not to challenge the withholding (e.g., senders and recipients, creation date, general description of subject matter and privileges asserted).
Processing: This is the second most overused and under-defined term in our industry. Every attorney, corporation, and service provider defines processing differently. It can include the Electronic Discovery Reference Model (EDRM), information governance, the cloud, social media, and big data.
Production: The process of producing or making available for another party’s review the documents and/or other ESI deemed responsive to one or more discovery requests.
Program: See Application and Software (synonymous with software).
Propagation: An eDiscovery coding term for the action of applying a document’s coding to related documents, such as its document family, duplicates, near-duplicates, etc.
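A minimal sketch of propagation across document families might look like the following. The document IDs, tag names, and the rule of skipping families whose coded members disagree are all illustrative assumptions; review platforms each have their own propagation settings.

```python
def propagate(coding, families):
    """Copy a coding decision to uncoded members of the same family.

    coding:   dict of doc_id -> tag for documents a reviewer has coded
    families: dict of family_id -> list of doc_ids (parent and attachments)

    Assumption: a family is only propagated when its coded members all
    share one tag; conflicting families are left for a human to resolve.
    """
    result = dict(coding)
    for members in families.values():
        tags = {coding[doc] for doc in members if doc in coding}
        if len(tags) != 1:
            continue  # nothing coded yet, or reviewers disagreed
        tag = tags.pop()
        for doc in members:
            result.setdefault(doc, tag)
    return result


coding = {"email-1": "responsive", "email-3": "responsive", "attach-3": "privileged"}
families = {
    "fam-A": ["email-1", "attach-1a", "attach-1b"],
    "fam-B": ["email-2", "attach-2"],
    "fam-C": ["email-3", "attach-3", "attach-3b"],
}
print(propagate(coding, families))
```

Here the attachments in fam-A inherit "responsive", fam-B stays uncoded, and the uncoded attachment in fam-C is left alone because the family's existing coding conflicts.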
PST (or Personal Storage Table): A personal folder file in Microsoft Outlook with a super handy file format for wrapping up huge numbers of emails and attachments in a way that preserves their ability to be searched. We like PSTs. Ask for them often, but remember that compressed files expand during processing.
Sampling: Usually refers to any process by which a large collection of ESI or a database is tested to determine the existence and/or frequency of specific data or types of information.
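One common form of sampling is drawing a random subset and measuring how often something turns up in it. The sketch below assumes a simple random sample and a hypothetical `is_responsive` check standing in for a human reviewer's call; the collection and its 10% responsiveness rate are made up for the example.

```python
import random


def estimate_prevalence(doc_ids, is_responsive, sample_size, seed=0):
    """Estimate the fraction of responsive documents from a random sample.

    doc_ids:       list of document identifiers in the collection
    is_responsive: function(doc_id) -> bool (stands in for human review)
    seed:          fixed so the same sample can be redrawn and audited
    """
    rng = random.Random(seed)
    sample = rng.sample(doc_ids, sample_size)
    hits = sum(1 for doc in sample if is_responsive(doc))
    return hits / sample_size


# Hypothetical collection in which every tenth document is responsive.
collection = list(range(10_000))
rate = estimate_prevalence(collection, lambda doc: doc % 10 == 0, sample_size=500)
print(f"Estimated responsive rate from 500 documents: {rate:.1%}")
```

Recording the seed and sample size makes it easier to explain later how the sample was drawn.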
Seed Set: The initial set of data/documents used in predictive coding. The seed set is “trained” by learning algorithms to cull data down to a potentially relevant set for reviewers to analyze for production or privilege. See Predictive Coding. Practice Note: Cooperation between counsel as to methodology and selection of the seed set(s) can avoid challenges down the road as to whether sufficient efforts were made to cull the data for relevant documents, as well as allegations that relevant evidence was withheld.
Social Network: A group of people who utilize social media, typically based on a specific theme or interest. Facebook is an example of a popular social network.
Social Media: Web sites and other online means of communication that are used by large groups of people to share information and to develop social and professional contacts. Many businesses are utilizing social media to generate sales. Do NOT neglect this source of ESI. It is easy to collect and review, and it can potentially change the outcome of a case.
Sources: The places where electronic evidence lives--computer disks, smart phones, thumb drives, Dropbox. Custodians (see above) have been known to have them.
Spoliation: The destruction of evidence and information that may be relevant to ongoing or reasonably anticipated litigation, government audit or investigation. Courts differ as to the requisite level of intent required for imposition of sanctions, with fault (possession and failure to preserve) on one end and willfulness on the other.
System Files: Non-user-created files that permit computer systems to run.
Index of Terms
IST Discover-E :