New media for archive data distribution

Benoît Pirenne, ESO DMD & ST-ECF


The ESO/ST-ECF archive has, in its 13 years of existence, gone through a fairly large number of data storage media and systems. For data distribution in particular, besides electronic (FTP) transfer, we have offered our users a choice among a number of media, from the now defunct 9-track tape to the most recent DVD-R. In between, all sorts of formats have come and gone: Quarter-inch cartridge (QIC), DAT-DDS 1 & 2, Exabyte. Still available today are the DAT-DDS 3, the DLT 4000 and DLT 7000, as well as CD-R.

The reasons for the changes are of course the adoption of the most economical technologies and the adaptation to data volumes. The former is important for our users, who must be able to afford reading the data we provide. The latter is essential if we want to limit the total number of volumes to be written, shipped to the users and later read by them. So our choices have always been, to a large extent, cost- and user-driven.


We have now come to a point where a technology change is again necessary: the tape formats we are using are outdated and still suffer from the same lack of user-friendliness as in the past. The DVD-R is still fine and economical but is less and less attractive when it comes to delivering data volumes such as those produced by the large mosaic cameras of ground-based telescopes (e.g., the Wide Field Imager on ESO's 2.2m telescope) or data from HST's ACS with its many by-products. And the future promises an even worse situation on ESO's side: OmegaCAM will produce data by the "truck load", and VISTA and ALMA are already on the horizon. Based on those considerations, we now see the need for something as convenient as a DVD, with a better price per GB and a low handling cost for the archive site, without requiring a large investment in media-reading equipment on the user's side. A study was carried out last year with those requirements as a driving force.

The study produced a table (reproduced below) listing the devices currently in use as well as promising new technologies.


Table 1: Comparison of currently available digital media suitable for data transport. Besides the media type, the capacity, practical transfer rate, cost per volume and manpower cost per volume are provided. The re-use factor indicates whether a medium would be given away or requested back from the user. TCO stands for "Total cost of ownership". In this case, it indicates how much a typical data package of about 1 TB would cost on a per GB basis.

Type               GB/vol (uncompr.)  Transfer rate (MB/s)  Cost/vol  Manpower (hours/vol)  Re-use factor  TCO/GB
DAT-DDS5           36                 3.0                   24        0.2                   1              1.00
LTO-Ultrium2       200                20.0                  109       0.2                   1              0.60
S-DLT              160                16.0                  77        0.2                   1              0.55
SAIT               500                20.0                  205       0.3                   1              0.44
DVD-R              4                  2.7                   2         0.04                  1              1.07
USB/FW Disk 250GB  250                14.0                  265       0.2                   10             0.50
FTP                11                 1.0                   1.4       0.003                 1              0.15
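To make the relationship between the columns of Table 1 more concrete, here is a minimal Python sketch of one plausible way to derive a per-GB cost from them. The article does not state the formula actually used, so the formula is an assumption, as are the function name `cost_per_gb` and the `hourly_rate` parameter (a made-up labour cost). Its results should therefore not be expected to match the TCO column.

```python
import math

def cost_per_gb(package_gb, gb_per_vol, vol_cost, hours_per_vol,
                reuse_factor=1, hourly_rate=50.0):
    """Estimate the per-GB cost of delivering a data package on one medium.

    Assumed formula (not given in the article):
    (media cost spread over re-uses + operator time) / package size.
    hourly_rate is a hypothetical labour cost per hour.
    """
    volumes = math.ceil(package_gb / gb_per_vol)    # whole volumes needed
    media = volumes * vol_cost / reuse_factor       # media cost shared across re-uses
    labour = volumes * hours_per_vol * hourly_rate  # manpower for all volumes
    return (media + labour) / package_gb

# Inputs taken from the DVD-R and USB/FW disk rows, for a 1 TB (1000 GB) package
dvd = cost_per_gb(1000, gb_per_vol=4, vol_cost=2, hours_per_vol=0.04)
disk = cost_per_gb(1000, gb_per_vol=250, vol_cost=265, hours_per_vol=0.2,
                   reuse_factor=10)
print(f"DVD-R: {dvd:.2f} per GB, USB/FW disk: {disk:.2f} per GB")
```

Under these assumptions the sketch shows why the re-use factor matters: a disk that is far more expensive per volume than a DVD-R can still come out cheaper per GB once its cost is spread over several shipments and far fewer volumes have to be handled.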


The obvious conclusion is that more and more emphasis should be put on network data transfer, as it is by far the cheapest method available. However, purely electronic data transfer cannot address all possible cases, as very large data packages cannot conveniently be copied across the network given the bandwidth currently available. Moreover, the user or the archive center might have good reasons to want a physical medium to be delivered, even for smaller datasets (e.g., Principal Investigator data).
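The bandwidth limitation can be illustrated with a short calculation based on the transfer rates of Table 1. The sketch below assumes a sustained rate and decimal units (1 GB = 1000 MB). The function name `transfer_days` is ours.

```python
def transfer_days(package_gb, rate_mb_per_s):
    """Days needed to move package_gb at a sustained rate (1 GB = 1000 MB)."""
    seconds = package_gb * 1000 / rate_mb_per_s
    return seconds / 86400  # 86400 seconds per day

ftp_days = transfer_days(1000, 1.0)    # 1 TB over FTP at 1.0 MB/s
disk_days = transfer_days(1000, 14.0)  # 1 TB read from a USB/FW disk at 14.0 MB/s
print(f"FTP: {ftp_days:.1f} days, USB/FW disk: {disk_days * 24:.1f} hours")
```

At the 1.0 MB/s quoted for FTP, a 1 TB package would keep a connection busy for about 11.6 days, whereas reading the same volume from a disk at 14.0 MB/s takes less than a day.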

If we consider the other arguments that determine the suitability of a given medium for data transfer, it is quite clear that some of the tape systems available today, such as the Super AIT from Sony or the "Super DLT", are quite cheap. However, besides the inconvenience of sequential access and the loss of file names caused by the use of the FITS tape format, the tape drives required to read the media on the user side are exceedingly expensive (between 5 and 10 K€). This is a burden that would not make recipients of our tapes very happy.

This is why we have investigated a new technology: the USB/FireWire external magnetic disk. The disk is still quite expensive, but much less so if re-used a number of times. The burden and manual cost of keeping track of the disks' whereabouts are compensated by the ease of use for both the archive center and the user. As a matter of fact, those disks are mostly "plug-and-play" and are detected by the computer as soon as they are plugged in. For the widest possible compatibility across operating systems and hardware platforms, an ISO 9660-compatible file system is written on the disks, so that a disk looks like a very large CD to the operating system. The disks we are considering using are either 80GB or 300GB units with USB and/or FireWire interfaces. They have been tested for compatibility with several platforms. Table 2 below summarizes what can be offered.


Table 2: Compatibility of the USB/FireWire disks with the platforms tested.

Operating system  USB                         FireWire       Comments                                                      Plug-and-play
Linux RH 9        OK (USB 1.1 min. required)  Not tested     May work with other versions of Linux. USB 2.0 recommended    Need to run a mount command
Solaris 8         OK (USB 1.1 min. required)  Not available  Addonics USB 2.0 PCI host controller (ADUSB2PCI) recommended  Need to run a mount command
MacOS X           OK (USB 1.1 min. required)  OK             USB 2.0 recommended                                           Yes
Windows           Not supported               Not supported
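For the platforms in Table 2 that need a manual mount, the command could look roughly like the one built by the following Python sketch. It relies on the standard read-only ISO 9660 mount options of Linux (`-t iso9660`) and Solaris (`-F hsfs`). The helper name `mount_command`, the device path and the mount point are hypothetical, since the actual device name depends on the machine and on how the disk is detected.

```python
def mount_command(platform, device, mount_point):
    """Build a read-only mount command for an ISO 9660 volume."""
    if platform == "linux":
        return ["mount", "-t", "iso9660", "-o", "ro", device, mount_point]
    if platform == "solaris":
        # Solaris calls its ISO 9660 file system type "hsfs"
        return ["mount", "-F", "hsfs", "-o", "ro", device, mount_point]
    raise ValueError(f"no manual mount command known for {platform!r}")

# Hypothetical device and mount point; adjust to the local system
cmd = mount_command("linux", "/dev/sda1", "/mnt/archive")
print(" ".join(cmd))
# prints: mount -t iso9660 -o ro /dev/sda1 /mnt/archive
```

The command is only printed here. Running it normally requires administrator privileges and an existing mount point directory.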




At the time of writing, the necessary software and procedures to manipulate, write and keep track of those magnetic disks have been prepared and tested. Further tests with a few "power users" are currently taking place. The new service will be offered starting April 1st, 2004, only for requests whose total volume exceeds several dozen gigabytes. Recipients of the disks are requested to return them to ESO within about 10 working days.


In parallel with the introduction of the new magnetic disks, we will also stop offering tapes as an archive data distribution medium. This means that neither DAT nor DLT 4000 and 7000 will be available in the media options of the request submission procedure.