Metadata has been with us since the beginning. It is not magical, nor is it limited to Microsoft Office products. Here we describe and define the term “metadata” to provide a clearer understanding of what it is and how it may be used in litigation. The intended audience for this paper is anyone who wants a deeper understanding of what a computer knows about the information it stores.
Definition of Metadata
The most repeated definition of metadata appears to be “data about data”. However, this definition is overly simplistic and leaves too much room for confusion. It’s true that metadata is simply data itself, but what makes it metadata is its purpose.
The purpose of metadata is to enhance the understanding of the information to which it is related. A more accurate definition of metadata would be “the facts that describe, enhance, categorize, identify, explain, secure, symbolize, augment, or classify the when, why, what, where, who, and how of related data.” Some examples of different types of metadata will help clarify this definition.
While metadata has always been with us, it has only recently begun to attract attention. With the increasing visibility of computer security issues, and the strong focus on Microsoft’s security faults, metadata finally hit the headlines. Magazine articles give the impression that metadata is found only in Microsoft Office applications, but in reality, metadata exists everywhere. Some metadata is created by the user, but most is created and maintained by the computer system or the installed applications.
Here’s a simplified example of metadata: in “Zip Code: 75025-2653”, the label “Zip Code” is the metadata. Without the label, you might not recognize the set of numbers as a zip code. You might believe it was a mathematical expression, especially without the surrounding address data. So, “Zip Code” describes the information that follows it.
Metadata can exist internally within the file it enhances or externally in other files. In some cases, the exact same metadata will exist in both places, such as file name and date stamps in Microsoft Word. Metadata can tell us many things including:
- User Identification
- File Identification
- File Size
- Publishing Organization
- Data Structures
- Organizational Names
- File Naming Conventions
- Directory Structure
- Computer Names
- Network Server Devices
- Creation Dates
- Access Dates
- Modification Dates
- Access Security
- Print Dates
- Directory Contents
Example 1 – External Metadata
Operating systems (Windows, MacOS, Linux, etc.) and the file systems they use (FAT, NTFS, EXT2, EXT3, etc.) are a primary source of metadata. This metadata is outside the control of the typical computer user, and most users don’t even realize it exists. Figure 1 below provides a graphical view of some of the external metadata provided by Windows XP. From the metadata, we can determine:
- The filename
- Type of file
- Folder location
- Size of the file
- Amount of space reserved on disk for the file
- Created, modified, and last accessed dates and times
- The file was not “read only” or “hidden”
- The file was modified since its last backup and needs archiving
- The file was not “compressed” or “encrypted”
- We even have a good idea of who may have created the file
Figure 1: External metadata from Windows XP for file “Metadata.doc”
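Much of the external metadata listed above can also be read programmatically. The following is a minimal sketch in Python, assuming only the standard library; the function name external_metadata and the dictionary keys are illustrative and are not part of any tool discussed in this paper. It asks the operating system for a file's attributes without opening the file itself.

```python
import os
import stat
from datetime import datetime, timezone


def external_metadata(path):
    """Return file-system metadata for `path` without reading the file's contents.

    The function name and dictionary keys are illustrative assumptions.
    """
    info = os.stat(path)

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "filename": os.path.basename(path),
        "folder": os.path.dirname(os.path.abspath(path)),
        "size_bytes": info.st_size,
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # st_ctime is the creation time on Windows, but the time of the last
        # metadata change on Unix-like systems, so it is labeled cautiously.
        "created_or_changed": as_utc(info.st_ctime),
        # True when the owner-write permission bit is not set.
        "read_only": not (info.st_mode & stat.S_IWUSR),
    }


if __name__ == "__main__":
    # Describe this script file itself as a simple demonstration.
    for key, value in external_metadata(__file__).items():
        print(f"{key}: {value}")
```

Which timestamps are available, and what they mean, depends on the operating system and file system, so values gathered this way should be interpreted with the platform in mind.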
Example 2 – Internal Metadata
The metadata shown in Figure 2 is stored internally within the Microsoft Word document. The “Modified” and “Accessed” dates should duplicate the metadata maintained by the operating system. If they do not, further investigation would be prudent, as the discrepancy may indicate tampering. However, the “Creation” date can be and often is different, because a Word document will keep its internal creation date even when the file is copied to a new name. A Word document could be copied for many reasons, including revisions, versioning, and template utilization. The “Printed” date could be matched to a Windows event log entry, providing evidence as to the user’s identity.
“Last saved by” can be a valuable piece of information, assuming the field hasn’t been tampered with. “Revision number” and “Total editing time” can potentially lead to discovery of spoliation or copy/paste operations.
Figure 2: Internal metadata from Microsoft Word for file “Metadata.doc”
There are numerous options within Word, for example, “fast saves”, “track changes”, “comments”, and “versions”, among others, that will cause an enormous amount of information to be stored within the file, invisible to the normal user. This behavior is not limited to Word; many applications, including those from vendors other than Microsoft, have similar functionality.
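Internal metadata can be extracted without opening the document in its authoring application. The sketch below rests on an assumption that differs from the figure above: it targets the newer .docx format (a ZIP package that stores core properties in docProps/core.xml), not the legacy binary .doc format of “Metadata.doc”. The function name internal_metadata and the output keys are illustrative only.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative output keys mapped to the XML elements that hold the values.
FIELDS = {
    "creator": "dc:creator",
    "last_saved_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
}


def internal_metadata(docx_file):
    """Read core properties from a .docx file (a path or a binary file object).

    Assumes the package contains docProps/core.xml; fields that are absent
    from the document are returned as None.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        key: root.findtext(tag, namespaces=NAMESPACES)
        for key, tag in FIELDS.items()
    }
```

Comparing the “modified” value returned here with the file system's own modification time is one way to perform the consistency check described for Figure 2.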
Example 3 – Database Metadata
Without metadata, databases and their associated programs would be useless. A database schema enables both computer programs and computer users to understand the fields contained within the database. This metadata (see Figure 3) provides critical information such as field names, labels, indexes, default field values, field types (e.g., numeric or text), and special formatting.
Figure 3: Sample Database Schema for Employee Table
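A database can be asked to report its own schema. The sketch below uses SQLite through Python's standard sqlite3 module purely as an illustration; the employee table and its columns are hypothetical and are not taken from Figure 3. The helper names create_employee_table and describe_table are likewise assumptions.

```python
import sqlite3


def create_employee_table(connection):
    """Create a hypothetical employee table for demonstration purposes."""
    connection.execute(
        """
        CREATE TABLE employee (
            employee_id INTEGER PRIMARY KEY,
            last_name   TEXT NOT NULL,
            hire_date   TEXT,
            status      TEXT DEFAULT 'active'
        )
        """
    )


def describe_table(connection, table):
    """Return one dictionary of schema metadata per column of `table`."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass only trusted names.
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "name": row[1],
            "type": row[2],
            "not_null": bool(row[3]),
            "default": row[4],
            "primary_key": bool(row[5]),
        }
        for row in rows
    ]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_employee_table(db)
    for column in describe_table(db, "employee"):
        print(column)
```

Other database systems expose the same kind of information through their own catalog tables or views, so the query would differ even though the idea is the same.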
Example 4 – Policies and Procedures as Metadata
Corporate policies, standards, procedures, and documentation also fit within the metadata definition. These documents will often describe directory structures, file naming conventions, directory contents, and other information that enhances the company’s ability to function efficiently and quickly find the information required to operate.
This paper provides only a small sampling of the vast amount of metadata that exists within computing systems. All data has the potential to be metadata and is limited only by one’s perspective.