Does this scenario sound familiar? Your adversary drops a production on you in the form of a large zip file, and you need to know what’s in that production NOW. For example, you might have a motion to compel hearing later that day–are these the documents you moved on, or just a bunch of junk? There’s no time to have the documents loaded in Relativity before the hearing, but if you could inspect the metadata you’d be able to learn a lot of helpful things. For example, you could examine the filenames, determine the date range of the production, see how many emails, attachments, and loose files are in the production, see which custodians are involved, etc. Fortunately, there is an easy way to open the “.dat” file–the part of a standard electronic document production which contains all this metadata–in Excel. Follow these instructions, and you’ll be able to review the metadata before you can say “document dump.”
I’ve created a sample .dat file that you can download here if you’d like to follow along (WordPress will not host a .dat or .txt file, so I’ve saved this as a .doc file). When you receive an actual electronic production, you will probably need to unzip the production. You are looking for a file with a .dat extension (i.e., BigBank_005.dat)–it is often placed in a folder called “Data.”
Once the file is located, right-click on the .dat file and use “Open with” to open it in Notepad.
In Notepad, hit Ctrl+A to select all the text, and Ctrl+C to copy it onto the clipboard.
In a new Excel spreadsheet, copy the contents into Column A. Take a look at the funny trio of characters used between the headings. The “thorn” character is a delimiter used to separate the fields when the metadata is loaded. We will use it to split the text into separate columns in Excel.
Click into the formula bar and copy a single instance of the thorn symbol onto the clipboard with Ctrl+C. It should be the very first character in the first row. Then hit return to get out of the formula bar.
Select Column A and navigate to the Data ribbon. Select the Text-to-Columns option.
Choose the Delimited option (not fixed width) and click next.
Paste the thorn character you copied into the “Other” field like so. Click Next and Finish.
The data should now look something like this. Now you’ve got the 30-odd metadata fields split into separate columns (good), but you’ve also got blank columns interspersed throughout (bad). The blank columns are actually filled with a non-printing character. Here it shows up as a narrow rectangle, but you may see a paragraph symbol (¶) or other character here.
The last step is to get rid of those columns. First, add a blank row above the top row. Right-click on row A and click Insert.
Now type “a” and “b” in the first four cells above the data like so:
Select those four cells and copy that over to the right and Excel will repeat the a/b pattern all the way across:
Now the fun part: You will sort the whole table horizontally, with the ‘a’ columns first. Select the entire table and go to Data –> Sort.
Under Options, choose “Sort left to right” and hit OK
Select Row 1 as the Sort criterion and use the default order A–>Z
Voila! All the columns with metadata fields are now adjacent.
Meanwhile, all the junk columns (with the b’s at the top) have been shifted over to the far right, where you can delete them all easily.
Now each metadata field is in its own column–you can sort, filter and pivot to your heart’s content. Here are some things to look for:
- How many actual emails are there (as opposed to attachments)? Filtering the “To” field for non-blanks will usually give you the answer
- Sort the data by Sent Date to preview the date range of the production
- Run a pivot table against the custodian field to see how many documents were produced for each custodian in the production
Thank you for reading. To learn more about how to use Excel in the practice of law, subscribe to the Excel Esquire blog. To learn about hosting a CLE-approved Excel workshop at your law firm, company, public interest organization, or bar association, click here or email Ben Kusmin from Excel Esquire.
Very good explanation.