Using DVD technology for archiving astronomical data.

 

Benoît Pirenne (1, 2), Miguel Albrecht (1)

(1) ESO/DMD, (2) ESO/ST-ECF
 

Background

Because some astrophysical phenomena evolve slowly, long-term preservation of observations has always been a major concern of observatories around the world. Be it hand drawings on paper or 19th-century glass-plate photographs, the issue at stake is how best to preserve data for future generations.
The advent of digital imaging and recording equipment in the second half of this century has provided both more observations and denser data storage media. These media can no longer be read by the unaided eye: digital recordings require specific equipment to decipher their content. While much progress has been made in the past decades in manufacturing long-lived data storage media, the same is not true of the reading/writing equipment, which quickly becomes obsolete and soon becomes impossible to repair. This apparent contradiction between durable media and transient reading equipment is easy to understand if one realizes that the medium is usually "passive" whereas the reading device is always active, with mechanical components.

Archivists must therefore reconsider storage technology every few years: the archive content has to be transposed from endangered media to the newest technology almost every 3-5 years. Another major factor pushing towards migration of data to new technology is cost: compared with the old technology, the new one often brings savings per unit of volume of up to an order of magnitude, which is a strong motivation for migration.

Current Situation

In the case of the ESO/HST Science Archive in Garching, three different storage media have been used since 1988, with the data migrated from one to the next: the 2GB LMSI 12" optical disk, the 6.4GB Sony 12" optical disk and the current 0.64GB CD-R in juke boxes. The reasons for migrating from one to the other are given in table 1 below.

 

    Table 1: The various data storage technologies used so far at the ESO/ST-ECF Archive Facility. Shaded areas represent the solutions actually implemented. Costs are expressed in an arbitrary monetary unit set to 100 per GB for the most expensive solution.

Medium name | Reason for choice/migration to | Cost per GB without juke box | Cost per GB with juke box
2GB/vol LMSI 12" optical disk | Direct access, best of technology back then. In sync with STScI and HST archive. | 17 | 100
6.4GB/vol Sony 12" optical disk | Direct access, factor of 2-3 cheaper to operate, previous technology difficult to maintain. In sync with STScI and HST archive. | 8 | 34
0.6GB/vol 5 1/4" CD-R | Juke box allows for online, no-operator-required access, ISO standard for file system. | 0.6 | 7.8
4.0GB/vol 5 1/4" DVD-R | Much higher density, keeps direct access advantage of CD-ROMs. | 2.2 | 2.8
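The cost argument can be made concrete with a little arithmetic on the table values. The sketch below is only an illustration: the dictionary layout and the function name savings_factor are ours, and the figures are the arbitrary cost units of table 1, not real prices.

```python
# Relative cost per GB taken from table 1 (arbitrary units, 100 = most expensive).
# Each entry holds (cost without juke box, cost with juke box).
COST_PER_GB = {
    'LMSI 12" OD (2GB)': (17.0, 100.0),
    'Sony 12" OD (6.4GB)': (8.0, 34.0),
    "CD-R (0.6GB)": (0.6, 7.8),
    "DVD-R (4.0GB)": (2.2, 2.8),
}


def savings_factor(old: str, new: str, with_jukebox: bool = True) -> float:
    """Return how many times cheaper per GB `new` is compared with `old`."""
    column = 1 if with_jukebox else 0
    return COST_PER_GB[old][column] / COST_PER_GB[new][column]


if __name__ == "__main__":
    media = list(COST_PER_GB)
    # Compare each medium with its successor, juke box included.
    for old, new in zip(media, media[1:]):
        factor = savings_factor(old, new)
        print(f"{old} -> {new}: {factor:.1f}x cheaper per GB with juke box")
```

With the juke box included, each migration step comes out roughly three to four times cheaper per GB. Without the juke box the DVD-R medium is more expensive per GB than the CD-R (a factor below 1), which shows that the saving comes from the juke box space rather than from the medium itself.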
 
We abandoned the 12" optical disk in favour of the more common 5 1/4" CD-R for two reasons. On the one hand, CD-Rs benefit from an international standard defining the way their content should be laid out (ISO9660). This was a guarantee of durability and multi-vendor support. On the other hand, having all the data on-line in juke boxes finally became affordable, whereas the cost of juke boxes for 12" optical disks was prohibitive for our archive system (see right-most column of table 1). This last reason sealed the fate of hardware compatibility with the HST archive at the STScI, where the data is still on 12" optical disks in juke boxes.
However, now that we have completed the migration to CD-R, we are faced with another concern: the growth of the data rate. The VLT and HST instruments, soon to be commissioned, will produce several TB worth of raw data per year. We could not practically keep this data on CD-Rs in juke boxes without making a major infrastructure investment in storage buildings!

The solution that addresses the density problem and keeps the advantage of the CD-R technology (direct access medium, cheap juke box capability) is the DVD.
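To give a feeling for the density problem, the following sketch counts the volumes needed to hold one year of data on each medium. It uses the capacities quoted in this article (0.64GB per CD-R, 3.95GB per DVD-R). The yearly volume of 3 TB is a purely hypothetical stand-in for "several TB", and decimal terabytes are assumed.

```python
import math

GB_PER_TB = 1000   # assumption: decimal terabytes

CD_R_GB = 0.64     # CD-R capacity quoted in the text
DVD_R_GB = 3.95    # DVD-R capacity quoted in the acronym table


def volumes_needed(data_tb: float, capacity_gb: float) -> int:
    """Number of volumes needed to store `data_tb` terabytes."""
    return math.ceil(data_tb * GB_PER_TB / capacity_gb)


if __name__ == "__main__":
    yearly_tb = 3.0  # hypothetical yearly data volume, for illustration only
    print("CD-R volumes per year: ", volumes_needed(yearly_tb, CD_R_GB))
    print("DVD-R volumes per year:", volumes_needed(yearly_tb, DVD_R_GB))
```

For this hypothetical 3 TB per year, the count drops from several thousand CD-Rs to several hundred DVD-Rs, about a factor of six fewer juke box slots.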

Digital Versatile Disk (DVD)

The DVD technology has been a long time coming, having been heralded by the specialized press for a number of years already. Various disagreements within the industry and disputes around copyright issues have considerably slowed its introduction. Over the past few months, however, equipment to record one's own media (the DVD-R; see table 2 for a brief description of the variants) has become available. Our archive facility was understandably quick to procure and test the equipment (from Pioneer Corp.) and to prepare the necessary software to support the device. Even now, little support is available. A disk can only be called a DVD-R if its file system complies with the UDF standard. However, software drivers that support this format for both reading and writing are hardly available. To our knowledge, only the latest version of the MacOS operating system has genuine support for it. The Unix world so far enjoys no support.
In order to obtain quick results and to be as compatible as possible with our existing archive tools and procedures, we took a pragmatic approach: we contacted the developer of the public-domain CD-R recording tool "cdrecord" (a popular Linux tool, see below) and arranged with him to extend his software to produce DVD-Rs as well. Within a few months, a workable system was delivered to us. However, due to the lack of software support for the native DVD UDF file system, we are using the standard CD-ROM format (650MB ISO9660) extended to 4GB. To the host computer, our "DVD-R", once written, simply looks like an unusually large CD-ROM.
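What "an unusually large CD-ROM" means in practice can be illustrated by reading the volume size from the ISO9660 primary volume descriptor. The sketch below is not part of our archive software or of cdrecord. It assumes the primary descriptor is the first one on the disk, at sector 16, and the helper make_test_image is a hypothetical stand-in that fabricates a minimal image for demonstration.

```python
import io
import struct
from typing import BinaryIO

SECTOR_SIZE = 2048   # logical sector size used by ISO9660
PVD_SECTOR = 16      # the first 16 sectors form the system area


def iso9660_volume_bytes(image: BinaryIO) -> int:
    """Return the volume size in bytes recorded in the primary volume descriptor."""
    image.seek(PVD_SECTOR * SECTOR_SIZE)
    pvd = image.read(SECTOR_SIZE)
    # Descriptor type 1 with identifier "CD001" marks the primary descriptor.
    if len(pvd) < SECTOR_SIZE or pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("no ISO9660 primary volume descriptor at sector 16")
    # Both fields are stored in little- and big-endian form; read the former.
    (block_count,) = struct.unpack_from("<I", pvd, 80)   # volume space size
    (block_size,) = struct.unpack_from("<H", pvd, 128)   # logical block size
    return block_count * block_size


def make_test_image(block_count: int) -> BinaryIO:
    """Hypothetical helper: build an in-memory image holding only a minimal descriptor."""
    pvd = bytearray(SECTOR_SIZE)
    pvd[0] = 1
    pvd[1:6] = b"CD001"
    struct.pack_into("<I", pvd, 80, block_count)
    struct.pack_into("<H", pvd, 128, SECTOR_SIZE)
    return io.BytesIO(bytes(PVD_SECTOR * SECTOR_SIZE) + bytes(pvd))


if __name__ == "__main__":
    size = iso9660_volume_bytes(make_test_image(2_000_000))
    print(f"volume size: {size / 1e9:.2f} GB")
```

The block count is a 32-bit field, so the format itself does not limit a volume to the 650MB of a CD; a 4GB volume is described in exactly the same way as a small one.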

Projects and schedules

The most pressing and demanding project in our archive for high density storage media at the moment is the future 2.2m telescope mosaic camera, which will be commissioned at La Silla starting this October. If our tests and prototypes, together with juke box support, are positive, DVD technology will be the system of choice for this particular archive. Also, we have started to migrate the NTT archive from the current Sony 12" optical disks to DVD. By the time this issue of the Messenger is distributed, we will have copied a few dozen Sony 12" optical disks onto the new medium.
We still expect to have full UDF support later in 1999. Our current experience suggests that computer operating systems will probably identify and mount media using any of the standards transparently. The co-existence in the same juke box of CD-Rs, DVD-Rs with ISO9660 and plain DVD-Rs with UDF should therefore be no problem.
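How a host might tell the formats apart can be sketched as follows. Both ISO9660 and UDF announce themselves through a sequence of 2048-byte descriptors starting at sector 16, each carrying a five-character identifier. This is a minimal illustration, not what any particular operating system does; the function names and the in-memory test image are ours.

```python
import io
from typing import BinaryIO, Iterable, Set

SECTOR_SIZE = 2048
FIRST_DESCRIPTOR_SECTOR = 16

# Standard identifiers that may appear in the descriptor sequence.
KNOWN_IDS = {b"CD001", b"BEA01", b"NSR02", b"NSR03", b"TEA01", b"BOOT2", b"CDW02"}


def detect_formats(image: BinaryIO, max_descriptors: int = 64) -> Set[str]:
    """Return the file system formats announced on the medium."""
    found: Set[str] = set()
    for i in range(max_descriptors):
        image.seek((FIRST_DESCRIPTOR_SECTOR + i) * SECTOR_SIZE)
        header = image.read(6)
        if len(header) < 6:
            break
        ident = header[1:6]  # byte 0 is the descriptor type
        if ident not in KNOWN_IDS:
            break            # end of the descriptor sequence
        if ident == b"CD001":
            found.add("ISO9660")
        elif ident in (b"NSR02", b"NSR03"):
            found.add("UDF")
    return found


def make_image(identifiers: Iterable[bytes]) -> BinaryIO:
    """Hypothetical helper: build an image with one descriptor per identifier."""
    sectors = [bytes(SECTOR_SIZE)] * FIRST_DESCRIPTOR_SECTOR
    for ident in identifiers:
        sectors.append(b"\x00" + ident + bytes(SECTOR_SIZE - 6))
    return io.BytesIO(b"".join(sectors))


if __name__ == "__main__":
    print(detect_formats(make_image([b"CD001", b"CD001"])))
    print(detect_formats(make_image([b"BEA01", b"NSR02", b"TEA01"])))
```

Because both formats are recognized from the same place on the medium, a single probe of this kind is enough to decide how to mount each volume in a mixed juke box.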

The next step, in 1999 or 2000, will be the gradual migration of our CD-Rs onto the new medium to save juke box storage space, as this is still by far the largest part of the storage cost of CD-Rs.

For more information about this system, please contact the authors (bpirenne@eso.org or malbrech@eso.org). Information about "cdrecord" can be obtained from Jörg Schilling (schilling@fokus.gmd.de). The DVD-R recording device we are using is Pioneer model DVR-S101.


 

    Table 2: The jungle of acronyms

Acronym | Meaning | Description
CD | Compact Disc | Mass-produced (audio) CD
CD-ROM | Compact Disc - Read-Only Memory | Mass-produced (silver) CD-ROM
CD-R | CD Recordable | Write-once, read-many CD
CD-RW | CD ReWriteable | Re-writeable CD-ROM
DVD | Digital Versatile Disc | Mass-reproduced video medium
DVD-ROM | DVD Read-Only Memory | Mass-reproduced data disc (4.7, 9.4, 18.8 GB)
DVD-R | DVD Recordable | Recordable DVD (3.95GB)
DVD-RAM | DVD Random Access Memory | Re-writeable DVD (2.6 GB)
DVD-RW | DVD Re-Writeable | Re-writeable DVD (??)
 
