To set the stage for this issue’s discussion of e-discovery developments, we begin with an overview of elementary computer science principles. How does a computer work, and what do computer forensic experts do to discover electronically stored information?
Data Processing Fundamentals
Computer hardware includes input devices (e.g., keyboard, mouse), a hard drive for long-term storage of information, a processor (which runs everything), the main circuit board (or motherboard) that connects the key components, memory chips for short-term storage of information, and an output device (e.g., a monitor) that displays the work. These hardware devices are connected by electrical circuits controlled by switches (transistors). (Computer processing basically involves opening and closing those switches at the right time and in the correct sequence, as directed by the software program.) The hardware is managed by a software program known as the “operating system” (e.g., Microsoft Windows). The operating system is a set of instructions (software is simply a stored sequence of instructions) that allows application software (e.g., Microsoft Word) to run on the computer’s hardware while the user works at the keyboard.
When the computer is turned on, its “boot” (or start-up) process begins. Power to the machine triggers a small set of built-in instructions (often stored on a read-only memory or “ROM” chip on the motherboard) that first check that all the hardware is connected and operating properly, and then retrieve the operating system software from a storage device and load it into the computer’s main memory (the RAM chips described below). The user then, via keyboard or mouse clicks, selects application software to be loaded into memory as well. When the user begins to use the application, the operating system directs a microprocessor known as the central processing unit or “CPU” (also mounted on the motherboard) to execute the software’s instructions. A computer purchased today uses a processor (such as the Intel Core i7) that can handle upwards of 80,000 MIPS, that is, millions of instructions per second. As the CPU processes the data, the results are displayed on the output device, or monitor.
The memory devices near the CPU include cache memory and the random access memory or “RAM” chips. These chips contain integrated circuits that store information electronically. RAM holds the programs and data placed there by the operating system in response to the user’s commands, and cache memory holds copies of the portions the CPU is using most often. Both feed instructions and data to the CPU at high speed (cache even faster than RAM), where they pass through small storage locations inside the CPU called “registers” while processing occurs. These devices are fast because they store data electronically, with no moving parts, unlike the slower hard drive, which stores data magnetically on spinning disks. But RAM and cache storage is temporary: its contents vanish almost instantly when power to the computer is turned off. This trade-off is key to the computer’s design. Because the CPU can read from and write to RAM and cache far faster than it could fetch information from magnetic hard drive storage, programs run quickly while they are held in memory; the price is that anything not saved to the hard drive is lost when the power goes off.
Data Storage Mechanics
Computers store information in two forms: in primary storage devices and in secondary storage devices. Primary storage is so called because the computer turns to it first: it is faster, so the computer prefers to process data held there. The computer reaches into secondary storage only when the data it needs is not already available in primary memory.
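The “primary first, secondary only if needed” rule can be sketched in a few lines of Python. This is a minimal toy model, not how any operating system is actually written: the class name `ToyMemory`, the use of dictionaries for the two kinds of storage, and the counters are all illustrative assumptions. It also shows why RAM’s contents are lost at power-off while the drive’s survive.

```python
# A toy model of "check primary storage first, fall back to secondary storage".
# Dictionaries stand in for RAM and the hard drive; this is illustrative only.

class ToyMemory:
    def __init__(self, secondary):
        self.secondary = secondary  # slow, persistent storage (e.g., the hard drive)
        self.primary = {}           # fast, volatile storage (e.g., RAM)
        self.primary_hits = 0
        self.secondary_reads = 0

    def read(self, address):
        if address in self.primary:       # fast path: data already in primary memory
            self.primary_hits += 1
            return self.primary[address]
        self.secondary_reads += 1         # slow path: fetch from secondary storage...
        value = self.secondary[address]
        self.primary[address] = value     # ...and keep a copy in primary memory
        return value

    def power_off(self):
        self.primary.clear()              # RAM empties; the drive keeps its data


disk = {0: "boot code", 1: "word processor", 2: "letter.docx"}
mem = ToyMemory(disk)
for addr in [1, 2, 1, 1, 2]:
    mem.read(addr)
print(mem.primary_hits, mem.secondary_reads)  # 3 2

mem.power_off()
print(len(mem.primary), len(disk))  # 0 3
```

Only the first request for each item goes to the slow storage; every repeat is served from the fast copy, which is the whole point of preferring primary memory.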
Primary storage devices are the RAM and cache memory chips. A RAM chip contains millions (in today’s chips, billions) of capacitors (components that store electrical energy), each paired with a transistor (a switch), etched into its silicon surface in a pattern of rows and columns. Each capacitor holds one “bit” of data (short for binary digit), with the presence or absence of electrical charge distinguishing a 1 from a 0. (Cache memory instead stores each bit in a small circuit of transistors, which is faster but takes up more space.) Grouped into eight-bit “bytes,” these bits can hold large amounts of information; a single byte can represent any of 256 different values. Readers are familiar with the terms “megabyte” (a million bytes) and “gigabyte” (a billion bytes) that describe the storage capacity of the computer’s memory devices and hard drive.
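The arithmetic behind bits and bytes is simple enough to check directly. In this sketch (the function name `distinct_values` is hypothetical), each added bit doubles the number of patterns a group of bits can hold, so an eight-bit byte can take 256 values. The unit sizes use the decimal definitions given in the text.

```python
def distinct_values(bits: int) -> int:
    """Number of different patterns a group of `bits` binary digits can hold."""
    return 2 ** bits


for bits in (1, 2, 8):
    print(bits, "bit(s):", distinct_values(bits), "possible values")
# 1 bit(s): 2 possible values
# 2 bit(s): 4 possible values
# 8 bit(s): 256 possible values

# Storage units using the decimal definitions in the text.
MEGABYTE = 10 ** 6  # a million bytes
GIGABYTE = 10 ** 9  # a billion bytes
BITS_PER_BYTE = 8

print(GIGABYTE * BITS_PER_BYTE)  # 8000000000 bits in one gigabyte
print(GIGABYTE // MEGABYTE)      # 1000 megabytes in one gigabyte
```

Some operating systems report sizes in binary multiples instead, treating a “gigabyte” as 2 to the 30th power, or 1,073,741,824 bytes (formally a “gibibyte”), which is why a drive can appear smaller on screen than its advertised capacity.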
Secondary storage devices include hard disk drives and flash drives. A hard drive contains “media”: surfaces coated with magnetic material divided into tiny storage cells. Like primary electronic storage, each magnetic cell holds one “bit” of data, recorded when the drive’s write head, driven by an electrical signal from the computer, magnetizes the cell in one of two directions. Unlike RAM and cache, where data vanishes when power is turned off, data saved on a hard drive remains because the magnetization persists after the electricity is turned off. (Flash drives are not magnetic; they store bits as electrical charges trapped in special memory cells that hold their charge without power, which is why they, too, retain data when unplugged.)
A hard drive contains a stack of magnetically coated disks, or platters, that spin on a spindle inside a sealed enclosure. “Heads” mounted on an arm just above the spinning disks can “read” the bytes of data stored in the cells on the disk, or “write” data bytes sent from the CPU onto open cells for storage (saving). The cells are grouped into blocks called “sectors,” each of which has its own “address” on the disk. As the electromagnetic heads pass over the disk’s magnetic surface, they can polarize tiny regions of it in one of two directions, which allows the computer to distinguish the number 1 from the number 0. As the disks spin, the read/write heads follow circular tracks of cells that are either open or contain bits of data. The heads deposit (write) data into open cells, or retrieve (read) blocks of data from the disk, as the CPU processes the software’s instructions.
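In practice a drive does not address individual bits; it reads and writes fixed-size blocks called sectors, numbered consecutively from 0 (often called logical block addressing). The sketch below assumes the traditional 512-byte sector size (many newer drives use 4,096 bytes) and uses a hypothetical `read_sector` helper to treat a raw disk image as a numbered list of sectors, which is broadly how a raw, sector-by-sector forensic image is organized.

```python
SECTOR_SIZE = 512  # traditional sector size; many newer drives use 4,096-byte sectors


def read_sector(image: bytes, address: int, sector_size: int = SECTOR_SIZE) -> bytes:
    """Return the contents of one numbered sector from a raw disk image."""
    start = address * sector_size  # a sector's address maps directly to a byte offset
    if address < 0 or start + sector_size > len(image):
        raise ValueError("sector address out of range")
    return image[start:start + sector_size]


# Build a tiny four-sector "disk" of zeros and place a message in sector 2.
disk = bytearray(4 * SECTOR_SIZE)
message = b"Quarterly report"
disk[2 * SECTOR_SIZE:2 * SECTOR_SIZE + len(message)] = message

sector = read_sector(bytes(disk), 2)
print(sector[:len(message)])  # b'Quarterly report'
```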
A CD or DVD is also a form of secondary storage, known as optical storage. It works much like a hard disk drive, except that a laser is used in place of electromagnetism to record or retrieve data. The CD or DVD contains microscopic pits formed on the surface of the disk during manufacturing (or, in recordable discs, equivalent marks “burned” into a dye layer by a laser). As the laser passes over the disk, the drive distinguishes data based on how the beam is reflected from the pits and from the flat areas between them. Again, the reflections distinguish the number 1 from the number 0 in the data sequences.
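On CDs, a common textbook description is that the edge of a pit (a change between pit and flat “land”) is read as a 1 and a stretch with no change is read as a 0. The sketch below decodes a made-up sequence of reflection readings that way; the function name and the letter codes are hypothetical, and real drives layer further encoding and error correction on top of this.

```python
def decode_transitions(reflectivity: list[str]) -> list[int]:
    """Turn a sequence of reflection readings into bits.

    A change between pit and land is read as 1; no change is read as 0.
    This is a simplified model that ignores the extra encoding real discs use.
    """
    bits = []
    for previous, current in zip(reflectivity, reflectivity[1:]):
        bits.append(1 if current != previous else 0)
    return bits


# "L" = land (strong reflection), "P" = pit (weaker reflection)
samples = ["L", "L", "P", "P", "P", "L", "L"]
print(decode_transitions(samples))  # [0, 1, 0, 0, 1, 0]
```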
Why are computers so focused on 1s and 0s?
Binary Number System
All computers store and process information using the binary number system. (Computers do not recognize letters directly.) When a key is pressed, the keyboard generates a numeric code representing that key and sends it to the computer, where, like all other data, it is handled as a binary number: the digit “1” or “0” alone, or a sequence combining the two. Input that does not start out in numeric form, such as sound or a scanned image, is first “digitized” by breaking the signal into small parts and converting each part into binary numbers.
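How letters become 1s and 0s can be seen with Python’s built-in `ord`, `format`, `int`, and `chr` functions, which work with each character’s standard numeric code (ASCII, the first 128 values of Unicode). This illustrates character encoding as used in stored documents; it is not the keyboard’s own key code, which is a separate numbering scheme. The helpers `to_binary` and `from_binary` are hypothetical and assume plain ASCII text.

```python
def to_binary(text: str) -> list[str]:
    """Return each character's numeric code written as 8 binary digits (assumes ASCII text)."""
    return [format(ord(ch), "08b") for ch in text]


def from_binary(codes: list[str]) -> str:
    """Reverse the process: read each binary code as a number and look up its character."""
    return "".join(chr(int(code, 2)) for code in codes)


codes = to_binary("Hi")
print(codes)               # ['01001000', '01101001']  (H = 72, i = 105)
print(from_binary(codes))  # Hi
```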
The binary number system serves two functions in a computer. First, the computer treats 1 as “on” and 0 as “off.” These numbers therefore determine whether, and when, each electrical switch on a circuit is turned on or off. Second, binary arithmetic works like the decimal system (digits 0 through 9), except that each place in a binary number is worth twice the place to its right rather than ten times. Representing any number larger than 1, or a letter, therefore requires a longer sequence of 1s and 0s; for example, the decimal number 5 is written 101 in binary (4 + 0 + 1). Thus most of the work of a computer involves processing long sequences of 1s and 0s that represent numbers and letters (alphanumeric data).
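The way binary builds numbers beyond 1 out of longer sequences of 1s and 0s, each place worth twice the one to its right, can be made concrete with two short conversion functions (both names are hypothetical). One converts a decimal number to binary by repeatedly dividing by 2 and collecting the remainders; the other rebuilds the decimal value by doubling a running total for each binary digit. Python’s built-in `bin` gives the same answer and serves as a check.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # each remainder is the next binary digit, right to left
        n //= 2
    return "".join(reversed(digits))


def binary_to_decimal(bits: str) -> int:
    """Add up place values: each position is worth twice the one to its right."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total


print(decimal_to_binary(5))       # 101   (4 + 0 + 1)
print(decimal_to_binary(13))      # 1101  (8 + 4 + 0 + 1)
print(binary_to_decimal("1101"))  # 13
print(bin(13))                    # 0b1101 (built-in check)
```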
Computing as we know it would not exist without the binary number system. Its ability to represent numbers above 1, and letters, while requiring the computer to distinguish only two states is the foundation of computer science: a switch that need only be on or off can be made small, fast, and reliable. If computing required the computer to distinguish the 10 individual digits of the decimal system, we would still be using slide rules.
File Storage and Metadata
As noted above, the read/write heads in the hard drive are involved whenever the user saves data or loads a software program onto the magnetic disks in the hard drive. When the user saves a collection of data, a “file” is created. The sequence of bytes in a file, stored on the disk in one or more “blocks,” might represent a program, a graphical image, or the text of a document. Thus, a file may be a data file, a text file, a program file, a directory file, and so on. When the user clicks “save,” the operating system’s file management system consults its record of which storage cells are open, or “unallocated,” and directs the heads of the hard drive to that space. The data to be saved is then deposited in (or “written” to) those cells, where it is recorded magnetically. The file management system also records the file’s location on the disk (on Windows computers using the NTFS file system, for example, in an index called the Master File Table) so the file can be found again on demand.
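The save process described above can be modeled with a toy file system. Everything here is a hypothetical simplification: the class `ToyFileSystem`, the 4-byte block size, and the set-based record of free blocks stand in for the far more elaborate structures real file systems use. The point is the sequence: find unallocated blocks, write the data there, and record the file’s location so it can be found again.

```python
BLOCK_SIZE = 4  # tiny blocks for readability; real file systems commonly use 4,096-byte blocks


class ToyFileSystem:
    """A hypothetical, greatly simplified file system for illustration only."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]  # the "disk"
        self.free = set(range(num_blocks))  # record of unallocated blocks
        self.directory = {}                 # file name -> (block numbers, size in bytes)

    def save(self, name: str, data: bytes) -> list[int]:
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(self.free):
            raise OSError("disk full")
        chosen = sorted(self.free)[:len(chunks)]  # consult the free-space record
        for block, chunk in zip(chosen, chunks):
            self.blocks[block] = chunk.ljust(BLOCK_SIZE, b"\x00")  # "write" to the disk
            self.free.discard(block)              # the block is now allocated
        self.directory[name] = (chosen, len(data))  # remember where the file lives
        return chosen

    def read(self, name: str) -> bytes:
        chosen, size = self.directory[name]  # look up the recorded location
        return b"".join(self.blocks[b] for b in chosen)[:size]


fs = ToyFileSystem(num_blocks=8)
print(fs.save("memo.txt", b"Meet at noon"))  # [0, 1, 2]
print(fs.save("note.txt", b"Call Bob"))      # [3, 4]
print(fs.read("memo.txt"))                   # b'Meet at noon'
```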
When a file is created, background information about the file is collected automatically. The operating system’s file system records such details as the date and time the file was created and when it was last modified or accessed, and application programs often store additional details inside the file itself, such as who created or edited it. This information is known as “metadata,” or “data about the data.” Users do not ordinarily see the metadata, as it is not typically displayed on the monitor. A Word, Excel, or PowerPoint document, for example, carries dozens of metadata fields. As one might expect, metadata can be important in computer forensics because it can reveal whether evidence has been altered or covered up in anticipation of, or during, the litigation.
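File-system metadata can be read with Python’s standard `os.stat` function, which returns a file’s size and its last-modified and last-accessed times (as seconds since 1970). The sketch creates a throwaway file so it runs anywhere; the helper `as_utc` and the sample text are illustrative. Two caveats: the meaning of the “creation time” field varies by operating system (on Linux, `st_ctime` is the time of the last metadata change, not creation), and some systems update the last-accessed time only occasionally, or not at all. Application metadata such as a Word document’s author is stored inside the file and requires a format-specific tool to read.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Draft settlement terms")
    path = f.name

# Ask the file system for the metadata it keeps about the file.
info = os.stat(path)


def as_utc(seconds: float) -> str:
    """Convert a file-system timestamp (seconds since 1970) to a readable UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print("size in bytes:", info.st_size)  # 22
print("last modified:", as_utc(info.st_mtime))
print("last accessed:", as_utc(info.st_atime))

os.remove(path)  # clean up the throwaway file
```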
Deleting Data
The search for discoverable evidence also considers whether data or documents have been deleted from storage and destroyed. This may happen intentionally (e.g., to hide evidence) or accidentally (e.g., through auto-purge systems). Pressing the “delete” key does not by itself erase a document or e-mail from the e-mail program, network server, or hard drive. On a Windows computer, for example, deleted files go to the Recycle Bin, where they may remain for some time, available to be restored. Similarly, as most readers know, an e-mail deleted from the inbox in Microsoft Outlook is moved to the “Deleted Items” folder. Deleting it from that folder calls up a warning box asking whether the user wishes to delete the message “permanently.” If the user clicks “yes,” many would think the message has been permanently removed from the system.
But it has not been. Deleting the message merely removes the file system’s record of where it is stored (the pathway the computer would use to return to it) and marks the space it occupies as available to be written over by new data. The message (or document) itself remains in storage on the network server or in the cells of the hard disk drive. When new e-mail messages are sent or received, or new documents are saved, the computer looks for “available” storage space: cells that have never been used, as well as cells that contain “deleted” but not yet erased data. The heads then write the new data to the empty cells, or “overwrite” the cells containing deleted data. If the new data completely overwrites the old data, the old data is effectively erased permanently. But the overwriting is often incomplete (for example, when a new file is smaller than the deleted one it replaces), leaving fragments of the old data untouched in storage. One job of the computer forensic specialist is to restore deleted data that has not yet been overwritten, and to find fragments left by incomplete overwrites that may contain relevant information.
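The difference between deleting and erasing can be demonstrated on a tiny simulated disk. In this sketch (the `ToyDisk` class, its directory format, and the sample file names are hypothetical), deleting a file removes only its directory entry; the bytes stay put until new data overwrites them, and a shorter replacement file leaves a fragment of the old one behind. The `search_unallocated` method is a crude stand-in for the keyword searches forensic tools run over unallocated space.

```python
class ToyDisk:
    """A hypothetical byte-level disk with a simple directory, for illustration only."""

    def __init__(self, size: int):
        self.data = bytearray(b"." * size)  # "." marks space that has never been written
        self.directory = {}                 # file name -> (start offset, length)

    def write_file(self, name: str, content: bytes, start: int) -> None:
        if start < 0 or start + len(content) > len(self.data):
            raise ValueError("file does not fit on the disk")
        self.data[start:start + len(content)] = content
        self.directory[name] = (start, len(content))

    def delete_file(self, name: str) -> None:
        # Deleting removes only the directory entry (the "pathway");
        # the bytes themselves stay on the disk until something overwrites them.
        del self.directory[name]

    def search_unallocated(self, keyword: bytes) -> list[int]:
        """Return offsets where keyword begins in space no current file claims."""
        allocated = set()
        for start, length in self.directory.values():
            allocated.update(range(start, start + length))
        hits = []
        pos = self.data.find(keyword)
        while pos != -1:
            if pos not in allocated:
                hits.append(pos)
            pos = self.data.find(keyword, pos + 1)
        return hits


disk = ToyDisk(32)
disk.write_file("plan.txt", b"SHRED THE MEMO", start=0)
disk.delete_file("plan.txt")
print("plan.txt" in disk.directory)  # False: the file no longer "exists"
print(bytes(disk.data[:14]))         # b'SHRED THE MEMO': but its contents remain

disk.write_file("lunch.txt", b"TACOS", start=0)  # a shorter file reuses the space
print(bytes(disk.data[:14]))                     # b'TACOS THE MEMO': a fragment survives
print(disk.search_unallocated(b"MEMO"))          # [10]
```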
Data Search Scope
In addition to running search programs to find deleted or fragmented data on the hard drive, the computer forensic specialist may also seek access to the computer’s or network’s backup tapes, whose contents are known as “archival” data. Backups copy the system at regular intervals and may contain relevant data not otherwise available. Searching backup tapes is more difficult and expensive because data on such media is stored in a “sequential” or “linear” manner and typically is not organized for ease of access. To reach the desired data, the tape must be read in sequence from the beginning until the data to be retrieved is reached. By contrast, “random access” storage, such as a hard drive, lets the searcher retrieve records from anywhere in the file in any order, jumping directly to a record without first passing over everything stored before it.
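The cost difference between sequential and random access shows up clearly when counting reads. This sketch (the function names, the 16-byte record size, and the record labels are hypothetical) stores 10,000 fixed-length records in an in-memory buffer. The sequential search reads from the start, as a tape drive must, while the random-access read computes the record’s position and jumps straight to it with `seek`.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed-length records keep the arithmetic simple


def make_archive(n: int) -> io.BytesIO:
    """Build an in-memory stand-in for a tape holding records REC0000, REC0001, ..."""
    buf = io.BytesIO()
    for i in range(n):
        buf.write(f"REC{i:04d}".encode().ljust(RECORD_SIZE, b" "))
    return buf


def sequential_find(tape: io.BytesIO, label: bytes) -> tuple[bytes, int]:
    """Read from the beginning, one record at a time, as a tape drive must.

    Returns the matching record and how many records had to be read.
    """
    tape.seek(0)
    reads = 0
    while True:
        record = tape.read(RECORD_SIZE)
        if not record:
            raise KeyError(label)
        reads += 1
        if record.startswith(label):
            return record, reads


def random_access_read(storage: io.BytesIO, index: int) -> bytes:
    """Jump directly to record `index` by computing its byte offset (one read)."""
    storage.seek(index * RECORD_SIZE)
    return storage.read(RECORD_SIZE)


archive = make_archive(10_000)
record, reads = sequential_find(archive, b"REC9999")
print(reads)                                      # 10000 records read to reach the last one
print(random_access_read(archive, 9999).strip())  # b'REC9999', found with a single seek
```

The buffer itself permits random access; the sequential function simply declines to skip ahead, which mimics the constraint a tape imposes.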
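Relevant data may also be found in other parts of the user’s computer. Programs often keep copies of frequently accessed data in “cache” files on the hard drive (not to be confused with the cache memory chip described earlier, whose contents vanish when the power is turned off), and the Internet browser retains a history, a cache, and other records of the user’s visits to Internet sites.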
Discoverable information may reside in more places than merely the key witness’s desktop or laptop. Those devices are likely part of a network whose servers also store data. The user’s computer may be connected to a personal digital assistant, BlackBerry, iPhone, tablet, or other mobile device that has its own memory, sometimes including removable “memory cards.” The user’s e-mails, along with their attachments, may reside on recipients’ computers or mobile devices. Data may also exist on the systems of the user’s Internet service provider. The user may have “burned” or copied data to a CD or DVD via laser technology, or loaded data onto a flash drive or other portable storage media. Copiers and scanners may contain storage media holding digitized copies of the documents they processed. And the user may have left voice-mails on systems whose storage media may still hold the digitized messages. Forensic data searches can thus range over a wide array of potential information sources.
Conclusion
In conclusion, computers process sequences of 1s and 0s along electrical circuits, as directed by software instructions, using inputs from the user’s keyboard or mouse clicks and information stored electronically or magnetically in the computer’s memory chips and disk drives. The results are then converted into understandable form and displayed on the monitor. Computer forensics in litigation focuses on the capacity of computer storage to retain information that has been deleted but not yet erased. It is interested in the who-what-and-when metadata of documents that can show a witness’s involvement with them. It also extends the search for discoverable information to media beyond the litigant’s laptop or network server. All of this awaits service of the request to produce “electronically stored information.”
Reference information supporting this discussion was found in a variety of web-based sources. For an excellent and readable explanation of computer science, see Irv Englander, The Architecture of Computer Hardware and Systems Software: An Information Technology Approach (1996).