Processing O2

Orange Legal Technologies provides you with the ability to prepare relevant files for subsequent use while ensuring that the techniques used are defensible.

With a detailed approach to electronic discovery processing, we focus on and deliver key tasks, including:

  • Chain of Custody Security and Tracking
  • Data Staging
  • Data Filtering
  • Deduplication
  • Metadata Extraction
  • Full Text Extraction
  • Exception Handling
  • Data Conversion
  • Load File Production

Our Processing Services also include:

  • Scanning and Coding
  • Custom Database Development

With our internal, best-of-breed technology, we are able to process data in a variety of formats, including:

  • Electronic Text To Electronic Text Via Text Extraction Engines
  • Electronic Images To Electronic Text Via OCR
  • Hard Copy Images To Electronic Text Via OCR
  • Audio To Electronic Text Via Sound Extraction Engines

We support these services with:

  • Secure Hosted Repositories
  • Extensive/Customizable Reporting
  • Integrated Review Services
  • Dedicated Project Management/Customer Support

To determine how we can help you with your specific processing requests, contact us and we can immediately help you translate your requests into action.

Processing

What is “Processing”?

In the realm of electronic discovery, “Processing” is any operation or set of operations which is performed upon data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction. [i]

Why is “Processing” important?

The principal objective of electronic discovery processing is to prepare relevant files for efficient and expedient review (in most instances by attorneys), production and subsequent use while ensuring that the techniques and processes used are both defensible with respect to clients’ legal obligations and appropriately cost-effective and expedient in the context of the matter. [ii]

What are the major tasks that take place in electronic discovery processing?

While there are many ways to define, describe, and organize the tasks that take place in electronic discovery processing, for the purpose of this discussion we will focus on the following nine major tasks and how they interrelate to accomplish electronic discovery processing:

  • Chain of Custody Security and Tracking
  • Data Staging
  • Data Filtering
  • Deduplication
  • Metadata Extraction
  • Full Text Extraction
  • Exception Handling
  • Data Conversion
  • Load File Production

Chain of Custody Security and Tracking

Defined by The Sedona Conference as “the documentation and testimony regarding the possession, movement, handling and location of evidence from the time it is obtained to the time it is presented in court; used to prove that evidence has not been altered or tampered with in any way; necessary both to assure admissibility and probative value”, Chain of Custody is the part of electronic discovery processing that ensures the evidence is authentic.

Developing, documenting, and following a tracking process for the physical media that contains electronically stored information (ESI) throughout the entire electronic discovery process can help organizations ensure their evidence is viewed as authentic. Additionally, just as physical media containing electronic documents must be treated as evidence, so must each individual file.

Automating technical chain of custody activities can help substantiate the exact process that files go through before admission in a case. The benefits of automation can be even greater when the case consists of millions of files, because automation ensures that each file goes through exactly the same process. [iii]
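
As an illustration of automated, file-level tracking, the following minimal Python sketch records a content hash each time a file is handled. The `CustodyLog` class, its field names, and the choice of SHA-1 are assumptions made for this example; they do not describe any particular vendor's system.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return a SHA-1 hex digest used to show that content has not changed."""
    return hashlib.sha1(data).hexdigest()


@dataclass
class CustodyLog:
    """Append-only record of each handling step for one file (hypothetical)."""
    file_name: str
    entries: list = field(default_factory=list)

    def record(self, action: str, handler: str, data: bytes) -> None:
        """Log who did what, when, and the content hash seen at that step."""
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "handler": handler,
            "sha1": fingerprint(data),
        })

    def is_unaltered(self) -> bool:
        """True when every recorded step saw the same content hash."""
        return len({entry["sha1"] for entry in self.entries}) <= 1
```

If the hash recorded at any later step differs from the one recorded at collection, `is_unaltered()` returns False and the file can be flagged for investigation.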

Data Staging

Data Staging is the process by which original ESI files are copied, isolated, and stored in a forensically sound manner for future use.

This staging typically occurs in three phases:

  1. Copying and storage of original ESI files on a closed and isolated network file server.
  2. Storage of original media and ESI files in a forensically sound manner.
  3. Storage of copied ESI files for use in further electronic discovery processing.
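
The copy step in the phases above can be sketched as follows. This is a simplified illustration, not a forensic tool: the function names and directory layout are hypothetical, and a real staging environment would add write protection, access controls, and logging.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_file(original: Path, staging_dir: Path) -> Path:
    """Copy one original file into the staging area and verify the copy."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    working_copy = staging_dir / original.name
    # copy2 copies the content and also attempts to preserve timestamps.
    shutil.copy2(original, working_copy)
    if sha1_of(original) != sha1_of(working_copy):
        raise IOError(f"Staged copy of {original} does not match the original")
    return working_copy
```

The original file is never modified; all later processing would operate on the verified working copy.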

Data Filtering

Data Filtering is the process of identifying specific data for extraction based on specific parameters. Filtering can occur at many different levels, including:

• System File Filtering: This type of filtering is designed to exclude those files known as system files from the filtering results data set.

• Date Range Filtering: This type of filtering is designed to either include or exclude prescribed date and time ranges from the filtering results data set.

• Extension Filtering: This type of filtering is designed to either include or exclude specific files based on their extension and typically includes file type validation.

• Custodian Filtering: This type of filtering is designed to either include or exclude specific custodians from the filtering results data set.

• Key Word Filtering: This type of filtering is commonly referred to as “keyword search” and is designed to filter data by prescribed keywords and/or keyword driven concepts.
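
Several of these filters can be combined in a single pass. The sketch below assumes a simple, hypothetical `FileRecord` structure and applies date range, extension, custodian, and keyword filters; it omits system file filtering and file type validation, which require reference data beyond the scope of a short example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileRecord:
    """Minimal stand-in for a collected file (hypothetical fields)."""
    path: str
    custodian: str
    modified: datetime
    text: str = ""


def filter_records(records, start=None, end=None, extensions=None,
                   custodians=None, keywords=None):
    """Return the records that satisfy every filter that was supplied."""
    kept = []
    for rec in records:
        if start and rec.modified < start:
            continue
        if end and rec.modified > end:
            continue
        # Extensions are expected in lowercase with a leading dot, e.g. ".doc".
        if extensions and not rec.path.lower().endswith(tuple(extensions)):
            continue
        if custodians and rec.custodian not in custodians:
            continue
        if keywords and not any(k.lower() in rec.text.lower() for k in keywords):
            continue
        kept.append(rec)
    return kept
```

A filter left as `None` is skipped, so the same function can apply one filter or all of them.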

Deduplication

Deduplication is the process of identifying and segregating those files that are exact duplicates of one another. The goal is to provide a deliverable that contains one copy of each original document, while maintaining the information associated with each instance of that document within the collection.

Several ways duplicates can be identified are:

• A combination of metadata information can be compared to match files.

• An electronic fingerprint of each file can be taken and compared using a mathematical hashing algorithm such as MD5, SHA-1, or another algorithm in the SHA family.

In some cases, a hashing algorithm is used in combination with metadata. [iv]

In addition to deduplication, the advent of near-deduplication technologies allows for an even higher level of data reduction by identifying files that are materially similar but are not bit-level duplicates. These near-deduplication technologies help identify and group or tag electronic files that have “near duplicate” similarities yet differ somewhat in content, metadata, or both. Examples include document versions, emails sent to multiple custodians, different parts of email chains, or similar proposals sent to several clients.
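
Exact-duplicate identification by hash can be sketched in a few lines of Python. The input layout (path, custodian, content) and the function name are assumptions for illustration; SHA-1 is used here, but any of the hashing algorithms mentioned above would work the same way. Near-duplicate detection requires similarity measures and is not shown.

```python
import hashlib
from collections import defaultdict


def deduplicate(files):
    """Group (path, custodian, content_bytes) tuples by the hash of their content.

    Returns a dict mapping each hash to the list of (path, custodian)
    instances, so that one copy can be delivered while the information
    about every instance in the collection is retained.
    """
    groups = defaultdict(list)
    for path, custodian, content in files:
        digest = hashlib.sha1(content).hexdigest()
        groups[digest].append((path, custodian))
    return dict(groups)
```

Each key in the result represents one unique document; a list with more than one entry shows where the duplicates were found and which custodians held them.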

Metadata Extraction

Metadata is used to describe data or information. Metadata can describe just about anything you find on a computer, and the term is often used to refer to information about things that aren’t on the computer.

  • The National Information Standards Organization (NISO) defines metadata as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”
  • The World Wide Web Consortium (W3C) defines metadata as “machine understandable information for the web.”
  • The Federal Geographic Data Committee (FGDC) defines metadata as describing, “the content, quality, condition, and other characteristics of data.”
  • The Sedona Conference (TSC) defines metadata as “Data typically stored electronically that describes characteristics of ESI, found in different places in different forms. Can be supplied by applications, users or the file system. Metadata can describe how, when and by whom ESI was collected, created, accessed, modified and how it is formatted. Can be altered intentionally or inadvertently. Certain metadata can be extracted when native files are processed for litigation. Some metadata, such as file dates and sizes, can easily be seen by users; other metadata can be hidden or embedded and unavailable to computer users who are not technically adept. Metadata is generally not reproduced in full form when a document is printed to paper or electronic image.”

Put simply, metadata is data about data. It provides a context for data, ideally in a machine-readable format.

Metadata is extracted and archived as part of processing the source data so that it is available during review. Although metadata may not be used during processing, it is still critical that it be maintained for purposes of electronic discovery. If not, the integrity and authenticity of the data can be brought into question.
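
As a small illustration, the sketch below collects file system metadata of the kind mentioned in the Sedona definition (file dates and sizes). The function and field names are hypothetical, and application-level metadata embedded inside a document (for example, author or revision history) would require format-specific parsers that are not shown.

```python
import os
from datetime import datetime, timezone


def file_system_metadata(path: str) -> dict:
    """Collect basic file system metadata without opening the file."""
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        # Stored as an ISO 8601 string in UTC so records compare consistently.
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

The resulting dictionary can be archived alongside the file so that the values are available during review.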

Full Text Extraction

Full text extraction consists of the automated and/or non-automated processes of retrieving text from electronic text, hard copy, and/or sound files and presenting the data in a form suitable for further electronic discovery processing.

The extraction of data from electronic files commonly takes one of several paths:

  • Electronic Text To Electronic Text Via Text Extraction Engines
  • Electronic Images To Electronic Text Via OCR
  • Hard Copy Images To Electronic Text Via OCR
  • Audio Files To Electronic Text Via Sound Extraction Engines

These electronic files typically range from e-mail (and attachments), databases, text documents, spreadsheets, text messages, and instant messages to digital voice mail messages, all of which are considered electronically stored information (ESI) and are potentially discoverable under the current Federal Rules of Civil Procedure.

Exception Handling

As in any process, there are times when standard processes are not effective in completing a task.

In electronic discovery processing, the extraction of full text is no exception to this fact.

With that fact in mind, most organizations have what is commonly referred to as an “exception handling” process that allows for further, non-standard text extraction tasks and also ensures the full documentation and reporting of files that cannot be successfully processed.
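
The documentation and reporting side of exception handling can be sketched as follows. The `extract_text` function is a deliberately simple stand-in that only handles UTF-8 text; it is an assumption for illustration, and a real extraction engine would support many formats and failure types.

```python
def extract_text(raw: bytes) -> str:
    """Stand-in extractor that only handles UTF-8 text (illustrative assumption)."""
    return raw.decode("utf-8")


def process_with_exceptions(files):
    """Attempt extraction for each (name, bytes) pair and log every failure."""
    extracted, exceptions = {}, []
    for name, raw in files:
        try:
            extracted[name] = extract_text(raw)
        except UnicodeDecodeError as error:
            # Record the file and the reason so it can be routed to
            # non-standard handling and included in exception reporting.
            exceptions.append({"file": name, "reason": str(error)})
    return extracted, exceptions
```

Files that fail standard extraction are not dropped silently: each one appears in the exceptions list with the reason it could not be processed.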

Data Conversion

After electronically stored information (ESI) has been processed from the “Chain of Custody” stage to the “Full Text Extraction” Stage, the ESI is usually converted into a normalized format that allows for the review of the information by legal professionals.

Typical ESI conversions are to formats that include:

  • “Tagged Image File Format” or TIFF: An electronic copy of a document in the form of an image, and as such contains no embedded text, fonts, images, or graphics. TIFFs do not retain metadata from a source electronic document.
  • “Portable Document Format” or PDF: Developed by Adobe Systems, Inc., ‘PDF’ is the de facto standard for the exchange of electronic documents. PDF preserves the fonts, images, graphics, and layout of any source document, regardless of how the original document was created. PDF files can be shared, viewed, and printed with Adobe Reader, a viewer application available free from Adobe Systems. Documents can be converted to PDF using software products created by Adobe and others. Depending on how they are created, PDFs can also be searchable, either by retaining text from the source document or by having a source image file converted by OCR. Depending on capture methodology, PDFs may retain some metadata.

Additionally, there are times when legal professionals choose not to convert ESI into TIFF or PDF formats.

In those situations they may choose to review ESI in its original format, commonly referred to as a Native Format or Native File.

Simply defined, a Native File is an electronic document produced as it was originally maintained and used.

Load File Production

In the last phase of electronic discovery processing, data is exported to a desired review tool format through the production of Load Files. Defined by The Sedona Conference as “a file that relates to a set of scanned images and indicates where individual pages belong together as documents”, a load file may also contain data relevant to the individual documents, such as metadata and coding data. To ensure usability by reviewers, load files must be obtained and provided in prearranged formats to ensure transfer of accurate and usable images and data.

When the reviewers opt to use TIFF or PDF formats, two load files are typically generated. The first load file contains a record entry for each document and its associated metadata as well as parent-child relationships. The second load file links all TIFFs or PDFs to each record in the first load file. These two load files are then delivered, along with all of the associated TIFF or PDF files, to the reviewers so that the files can be imported into a selected review application.

If the review involves native files, then the export process will generate a single load file that contains a record entry for every native file and all of its associated metadata. The load file also contains document relationships such as parent-child and a link to the native file itself. Next, the native files are gathered and built into a structure that is compatible with the load file generated so that when the native file link is selected during review, the proper native file appears.
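
To make the native-file case concrete, the sketch below writes a single comma-delimited load file with one record per native file. The column names and the delimiter are assumptions for illustration only; actual review tools define their own load file formats, and the prearranged format for a given matter should always be followed.

```python
import csv
import io


def build_native_load_file(records) -> str:
    """Write one delimited row per native file, including the parent-child
    relationship and a link (path) to the native file itself."""
    fields = ["doc_id", "parent_id", "custodian", "native_path"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

An attachment would carry its parent e-mail's identifier in `parent_id`, while a top-level document would leave that field empty.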

Why is “Processing” important? Reprise

As stated earlier, the principal objective of electronic discovery processing is to prepare relevant files for efficient and expedient review (in most instances by attorneys), production and subsequent use. Through this overview of electronic discovery processing, the hope is that you will have enough information to ask the right questions and to evaluate the presented process tasks so as to ensure that the techniques and processes used in your specific electronic discovery matters are defensible with respect to clients’ legal obligations as well as appropriately cost-effective and expedient in the context of the matter.

To learn more, contact us.

Resources

[i] The Sedona Conference Glossary, December 2007
[ii] The Electronic Discovery Reference Model, March 2008
[iii] The Electronic Discovery Reference Model, March 2008
[iv] The Electronic Discovery Reference Model, March 2008
[v] The Sedona Conference Glossary, December 2007
[vi] The Marine Metadata Interoperability (MMI) Guide
[vii] The Electronic Discovery Reference Model, March 2008
[viii] Lexbe Glossary, www.lexbe.com, March 2008
[ix] Electronic Discovery Processing: What You Need To Know To Maximize Success In Winning Cases And Cutting Costs, Metropolitan Corporation Counsel October 2007
[x] The Electronic Discovery Reference Model, March 2008
