Data archaeology


Data archaeology is the art and science of recovering computer data encoded or encrypted in now-obsolete media or formats. The term can also refer to recovering information from damaged electronic formats after natural disasters or human error.

The term originally appeared in 1993 as part of the Global Oceanographic Data Archaeology and Rescue Project (GODAR). The original impetus for data archaeology came from the need to recover computerized records of climatic conditions stored on old computer tape, which can provide valuable evidence for testing theories of climate change. Such recovery work allowed the reconstruction of an image of the Arctic that had been captured by the Nimbus 2 satellite on September 23, 1966, in higher resolution than ever seen before from this type of data.[1]

NASA also uses data archaeologists to recover information stored on 1960s-era computer tape, as exemplified by the Lunar Orbiter Image Recovery Project (LOIRP).[2]


There is a distinction between data recovery and data intelligibility. One may be able to recover data but not understand it. For data archaeology to be effective, the data must be intelligible.[3]
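The gap between recovery and intelligibility can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the recovered bytes were written in EBCDIC (code page 037, available in Python as the `cp037` codec); the sample text and the helper name `decode_recovered` are hypothetical and do not come from any of the projects mentioned above.

```python
def decode_recovered(raw: bytes, encoding: str) -> str:
    """Interpret recovered bytes using an assumed character encoding."""
    return raw.decode(encoding)


# Hypothetical sample: bytes as they might be read off an old tape,
# here written in EBCDIC (code page 037).
recovered = "CLIMATE RECORD 1966".encode("cp037")

# Every byte was recovered, but a modern default encoding yields gibberish.
unintelligible = decode_recovered(recovered, "latin-1")

# Knowing the original format makes the same bytes intelligible.
intelligible = decode_recovered(recovered, "cp037")

print(repr(unintelligible))
print(intelligible)
```

Both calls read exactly the same bytes; only knowledge of the original format turns them into usable information.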

Disaster recovery

Data archaeologists can also use data recovery after natural disasters such as fires, floods, earthquakes, or hurricanes. For example, in 1995, during Hurricane Marilyn, the National Media Lab assisted the National Archives and Records Administration in recovering data at risk due to damaged equipment. The hardware had been damaged by rain, salt water, and sand, yet it was possible to clean some of the disks and refit them with new cases, thus saving the data within.[3]

Recovery techniques

When deciding whether to attempt data recovery, the cost must be taken into account. Given enough time and money, most data can be recovered. In the case of magnetic media, which are the most common type used for data storage, various techniques can be used to recover the data, depending on the type of damage.[3]: 17

Humidity can make tapes unusable as they begin to deteriorate and become sticky. In this case, a heat treatment can be applied, which causes the oils and residues either to be reabsorbed into the tape or to evaporate off its surface. However, this should be done only to provide access to the data so that it can be extracted and copied to a more stable medium.[3]: 17–18

Lubrication loss is another source of damage to tapes. This is most commonly caused by heavy use, but can also be a result of improper storage or natural evaporation. As a result of heavy use, some of the lubricant can remain on the read-write heads which then collect dust and particles. This can cause damage to the tape. Loss of lubrication can be addressed by re-lubricating the tapes. This should be done cautiously, as excessive re-lubrication can cause tape slippage, which in turn can lead to media being misread and the loss of data.[3]: 18 

Water exposure will damage tapes over time. This often occurs in a disaster situation. If the media have been in salty or dirty water, they should be rinsed in fresh water. Cleaning, rinsing, and drying wet tapes should be done at room temperature to prevent heat damage. Older tapes should be recovered before newer tapes, as they are more susceptible to water damage.[3]: 18
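The water-damage guidance above can be sketched as a small triage routine. This is only an illustration under stated assumptions: the `Tape` record, its fields, and both function names are hypothetical, and the rules encode nothing beyond what the paragraph states (rinse after salty or dirty water, work at room temperature, oldest tapes first).

```python
from dataclasses import dataclass


@dataclass
class Tape:
    label: str          # hypothetical identifier for the tape
    year_recorded: int  # used as a stand-in for the tape's age
    water: str          # "fresh", "salty", or "dirty"


def recovery_order(tapes):
    """Return tapes oldest first, since older tapes are more susceptible."""
    return sorted(tapes, key=lambda t: t.year_recorded)


def treatment_steps(tape):
    """List the handling steps for one wet tape, all at room temperature."""
    steps = []
    if tape.water in ("salty", "dirty"):
        steps.append("rinse in fresh water")
    steps.append("clean at room temperature")
    steps.append("dry at room temperature")
    return steps


tapes = [
    Tape("B", 1985, "fresh"),
    Tape("A", 1966, "salty"),
]
for tape in recovery_order(tapes):
    print(tape.label, treatment_steps(tape))
```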


To avoid the need for data archaeology, creators and holders of digital documents should take care to employ digital preservation.



  1. Techno-archaeology rescues climate data from early satellites. U.S. National Snow and Ice Data Center (NSIDC), January 2010.
  2. LOIRP Overview. NASA website, November 14, 2008.
  3. [1] Study on website October 23, 2011
  • World Wide Words: Data Archaeology
  • O'Donnell, James Joseph. Avatars of the Word: From Papyrus to Cyberspace. Harvard University Press, 1998.
  • Ross, Seamus & Gow, Ann (1999). Digital archaeology : rescuing neglected and damaged data resources (PDF). Electronic libraries programme studies. London & Bristol: British Library and Joint Information Systems Committee. ISBN 1-90050-851-6.