Scenario: You’re doing QC on a batch of reviewed documents that needs to be released for production. One thing you need to check for is whether family members (i.e., emails and their attachments) have been coded consistently. If an email is tagged Responsive, then its attachments should be too, and vice-versa. In the screen shot below there are two document families that are tagged correctly, and one that is tagged incorrectly:
The first document family was consistently tagged Non-Responsive, and the second family was consistently tagged Responsive, but the third family is tagged inconsistently: the parent email is tagged Responsive, while the attachments are tagged Non-Responsive. This family will have to be reviewed, and the discrepancy resolved, before this batch of documents can be released.
In this post I describe an easy technique to identify every family in a metadata set that has inconsistently tagged members. This makes it easy to isolate and re-review all of them at once. The same technique can be applied for privilege searches, redaction searches, or any other situation when you want to identify rows of data that match on one criteria but not another.
Download the Excel file here so you can follow along.
When performing this routine on your own data, make sure the metadata you download from your document review platform includes the FamilyID or GroupID field–this is the field that tells you which documents belong to the same document family.
Step 1: Generate a Pivot Table
With your cursor anywhere in the table of metadata, go to the Insert Ribbon and select Pivot Table, and Pivot Table again:
Accept the default settings, and click OK.
Excel will send you to another worksheet with the following display:
DO NOT PANIC. This is easy. Drag the Family ID field into the Row Labels area and into the Values area, like so:
This generates a tally of the number of documents in each family. This tells us there are 342 document families comprising 795 documents. So far, so good.
In order to see how many documents in each family are responsive vs. non-responsive, simply drag the “Resp” field into the Column Labels box. This is where the magic happens:
Now the pivot table tells you, for each document family, how many members are tagged responsive and how many are tagged non-responsive. Most of the families (those that are tagged consistently) will have a number in only one of these columns. The ones that are mismatched, however, will have a number in both columns. The first culprit here is the family ENRON-000001750:
Here’s a nifty trick: Double-clicking on the number 3 in the Grand Total column will prompt Excel to extract the three rows in question and place them on a new worksheet for you to review:
Step 2: Use a Formula to Identify the Rows with Values in Both Columns
Now the trick is to identify every row that has numbers in each column, without simply scrolling down and looking at the numbers. A bit of arithmetic comes in handy here; if you multiply the two numbers together, you can easily spot when both tag types occur on the same FamilyID. The reason is that if either column is blank–signifying that all the documents in that family have the same tag–the product will be zero. Otherwise you’ll get some other number! For example, if you have 2 of each tag type, you will get a product of 4 (because 2 X 2 = 4). If you have 4 of one type and none of the other, you will get a product of 0 (because 4 X 0 = 0).
Generating the formulas to evaluate the product in each row can be a little tricky. Clicking the cells inside a pivot table to build a function will not work well under the default settings, because Excel wants to generate a function called GetPivotData instead of just grabbing the value you see. The function you create also will not copy properly. There are several ways around this:
- Type the formula from the keyboard rather than clicking cells in the pivot table to generate the references. Then you will get a working formula that can be copied down. (My method of choice.)
- Disable the GetPivotData feature by going to the Option menu of the Pivot Table Tab, selecting the dropdown on the far left for “Options” and deselecting the option for Generate GetPivotData.
- You can also copy the whole pivot table and paste it into another worksheet, where it will no longer behave as a pivot table.
I opt to simply type in the formula; I add the header “check” in Column E and put my formulas there. Since the first row of the pivot table is Row 5, and two numbers I want to multiply are in cells B5 and C5, I just type =B5*C5 in E5:
The formula returns zero, because there is a zero in C5, reflecting the zero responsive documents in this family. Copy this formula down to the other rows by locating the cross-hairs on the lower right-hand corner of cell E5 and dragging it down to the bottom (or double-clicking on it!):
The formula copied properly, and is working as intended–rows with two numbers are generating non-zero products, and the other ones are coming up zero:
Step 3: Filter the Results to Target the Mismatched Families
Of course, you don’t want to scroll down hundreds (or thousands) of rows looking for those non-zero values either. Instead, use the filter to display all those values. (Yes–you can filter pivot table results!)
Turn on the filter tool by navigating to the Data ribbon and hitting the Filter button. Make sure your cursor is in the pivot table somewhere.
Now click the dropdown on the Check field and select the values you want to see. In this example we have deselected 0 and the large number at the bottom, which represents the grand total of all the check products. All the other options are non-zero products that indicate a family mismatch. Hit OK.
The result, in Column A, is the list of 12 document families with inconsistent tags. As expected, each one has results in both the Responsive and the Non-responsive columns. Voilà!
Copy and paste this list of Family ID numbers back into a Relativity search, select the ‘include family members’ option, and you will be able to re-review and re-tag this group of documents in one fell swoop.
As with any new skill, explaining this process takes much longer than actually doing it. Once you get the hang of setting up a pivot table, this entire routine can be done in a couple minutes. Performing this simple check can spare you the dreaded 6 PM email from the vendor, telling you that your production has “issues” and cannot be burned yet. And you’ll save your client a few hundred dollars in “tech time” while you’re at it.
Learn more about pivot tables, formula writing, and much more in my 2-hour workshop Excel Essentials for Busy Lawyers. Students have called it an “excellent presentation” that is “hugely useful and easy to follow.” New York CLE accreditation is pending. Contact Excel Esquire to learn about hosting an in-house presentation of Excel Essentials at your law firm, company, bar association, or public interest organization.
I think you can also do it (perhaps more simply / quicky) by inserting a new column D (to the right of the “Resp.” column, typing the formula =+IF(AND(A2=A3,C2C3),1,””) into cell D3, and then copy that formula down column D. 1s will appear in all rows that have errors i.e. the formula places a 1 in column D if successive rows have family names that are the same but do not have successive “Resp.” codings that are the same.