Use Excel to Count the Number of Emails in Each Email Chain

Courts and litigants have long struggled with the question of how to describe email chains on a privilege log.  Should you log only the most recent email, or log every email in the chain–or something in between?  New York has recently adopted a potentially burdensome rule on this topic–one that cries out for an Excel solution.

Effective September 2, 2014, Commercial Division Rule 11-b imposes new obligations on litigants in New York Supreme who create document-by-document privilege logs, as opposed to the now-preferred “categorical privilege logs.” See here to read the rule.  Among other things, entries for email chains should now indicate “the number of e-mails within the dialogue.”  Rule 11-b (b)(3)(iii).  That means you can log only the most recent email in a given chain, but you need to also disclose how many emails are in the chain.

How, exactly, does one figure out the number of emails in every email chain?

Option 1:  Have your paralegal page through every page of every email on your log and count how many email headers occur in each chain.   Time required:  Hours, days . . . weeks?

Option 2:  Use the available email threading metadata and a couple simple formulas.   Time required:  about 8 minutes.   

This article will address Option 2.

Your document review platform should contain metadata fields that reflect email-threading properties for all of the emails in your collection. This information helps to understand how emails in the same chain or family are related to each other.  Chances are you’ve never thought twice about those fields.  The field commonly called Conversation Index is a unique email identifier that has a common beginning for related emails, and grows predictably longer as the email chain grows longer.  You can parse this field to determine how many emails are in any given email chain.

Here’s a sample of metadata for several emails in an email chain with the Conversation Index displayed:

ThreadIndex-example1

Here’s how the Conversation Index works:

When you create a brand new email in Outlook, Outlook assigns a new Conversation Index, which is a unique 44-digit alphanumeric code, like this:

01C9A33C787774C8A92015CA4C26A02A143446BDC766

When someone replies to (or forwards) this email, Outlook tacks on 10 more characters, like so:

01C9A33C787774C8A92015CA4C26A02A143446BDC76600045A69D0

Every reply or forward will have an index length 10 digits longer than that of the previous email.  Importantly, if the same email gets three different responses, those three will have the same index length because they occupy the same position in the hierarchy.

Meanwhile, Excel has a convenient function called LEN that will count the number of characters in a cell.  For example, if the 44-digit code is in cell G2, the function =LEN(G2) will return a value of 44:

ThreadIndex-LEN-formula

Next let’s copy this formula down to calculate the length of the Conversation Index value for each email in our example:

ThreadIndex-LEN-formula-copydown

Notice how, with each new email in the chain, the index length gets 10 characters longer.

The next step requires a bit of arithmetic: Since the first email has 44 characters, and every subsequent email in the chain has 10 more, we should subtract 34 and then divide by 10 to get the number of emails in any given email chain. Thus:

number of emails:  =(H2-34)/10

Applying this formula to the four emails in this example yields the following results:

ThreadIndex-number-of-emails

Now here is a messier, but more realistic, example.  It includes all the logged emails in our fictitious conversation thread:

ThreadIndex-messy2

Notice several features of this set of emails:

  • Not every email in the chain is actually represented on the log (perhaps they were not privileged, or were deleted pre-collection, not collected, etc.)
  • Some emails have the same length because they are distinct responses to the same email
  • When listed in chronological order, as they are here, they are not all in numerical (# of emails) order, because people are sometimes responding to older parts of the chain.  For example, the penultimate email listed above is a late response to the original email, so it contains only 2 emails.

The algorithm does exactly what we wanted it to do: it correctly determines the number of emails in any given email chain, based on the Conversation Index.  The emails need not be grouped together by conversation thread either–I used examples from the same thread above merely for illustrative purposes.

Here are a few more pointers to help you carry out this analysis:

1.  I explained the process above in two steps for clarity, but the two functions can be combined into a single function that generates the number of emails directly:

=(LEN(G2)-34)/10 
where G is the Conversation Index

2.  If some of your rows have attachments, those will not have a Conversation Index at all, so your LEN will be 0, and the number of emails will appear as -3.4. To avoid seeing these strange values, you can use this formula instead:

=IF(G2<>0,(LEN(G2)-34)/10,”n/a”)

This formula says, “As long as there is a Conversation Index, use the same formula; otherwise show “n/a”.”

3.  Finally, if a user cuts an email from one window and pastes it into the body of a new email, Outlook will obviously not adjust the thread index to account for the pasted material.  Do a reality check on your results to see if there might be some anomalies.

Thanks for reading! Please comment, follow, share, or re-post if you found it helpful.

Feel free to contact me if you have any questions about using Excel to create a bullet-proof privilege log.

Advertisements

About excelesquire

NYC attorney and Excel enthusiast.
This entry was posted in eDiscovery, formulas, Lit Support, Privilege Logs and tagged , . Bookmark the permalink.

7 Responses to Use Excel to Count the Number of Emails in Each Email Chain

  1. ktvc15 says:

    this looks awesome

    Liked by 1 person

  2. David says:

    Message headers:
    View the Internet header information for an email message

    Email message headers provide a list of technical details about the message, such as who sent it, the software used to compose it, and the email servers that it passed through on its way to the recipient.

    About message headers

    After a message is sent, it is processed by the Microsoft Exchange Server or email server used by your organization or Internet service provider (ISP). If the message is for someone who does not have a mailbox on your email server, the server forwards the message to another email server. The message is forwarded from server to server. Frequently, the message passes through several email servers until it reaches the email server on which the recipient of the message has a mailbox.

    From the time when the message is first created, information about it is added to a hidden section of the message known as the Internet header. The information includes technical details, such as who created the message, the software used to compose it, and the email servers it passed through on its way to the recipient. These details can identify problems with the message or help discover the sources of unsolicited commercial messages.

    Top of Page

    View message headers

    1.In an open message, click the File tab.

    2.Click Properties.

    Header information appears in the Internet headers box.

    Note The practice of providing false information in message headers is a growing problem. This is also known as spoofing. For example, a message might indicate that it is from Eric Lang at Alpine Ski House (eric@alpineskihouse.com) when it is actually from a bulk email service that promotes schemes to get rich quickly. Therefore, before you send a complaint response to someone about his or her message, remember that the header information might be forged.

    Like

  3. David says:

    Parsing Internet Headers aka Automatic Extraction of Metadata from Microsoft Outlook Emails
    http://www.filehold.com/help/library/email-extraction-rule

    Like

  4. Reblogged this on The Barrister's Toolbox and commented:
    This is vexing problem which can take weeks and cost clients thousands of dollars in attorney time. Please read this practical solution…

    Like

  5. I will be passing this on to my paralegal and those that follow my blog, TheBarristersToolbox.com
    Thanks for share such a time saving idea.

    Like

  6. Matthew Golab says:

    A very elegant and simple method of threading. Thank you.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s