ESO

  The VLT Science Archive System


The ESO Very Large Telescope (VLT) will deliver a Science Archive of astronomical observations that will exceed 80 Terabytes within its first six years of operations. ESO is undertaking the design and development of both On-Line and Off-Line Archive Facilities. This paper reviews the current planning and development state of the VLT Science Archive project.

Introduction

The VLT Archive System goals can be summarized as follows:
  1. Record the history of VLT observations in the long term and provide the memory of observatory operations.

    The VLT observatory, with its wide-ranging instrumental capabilities and unique data assurance tools, will generate a large reservoir of science data with well-understood instrument performance and stable calibration. An archive of these observations will allow astronomers to re-analyze the data from different perspectives. Such systems have proven in the past to be of enormous scientific value. The best-known examples are the science archives of the IUE satellite and of the Hubble Space Telescope. In both cases, the data archives have delivered to the community many times more data than the telescopes themselves.

    The experience of building and operating a science archive for the ESO New Technology Telescope (NTT at La Silla) has allowed us to understand and master the issues involved in dealing with weather dependency and evolving instrument configuration of a ground-based telescope.

  2. Provide a research tool - make the Science Archive another VLT instrument.

    Due to the very large data volume expected from the VLT (see Table 1), handling data for research projects based on archive data becomes a task manageable only through large facilities (disk farms, CPU processing power, etc.) and a corresponding data handling environment (common algorithms, housekeeping databases, results databases, catalog correlation tools, etc.). In order to support the maximum scientific exploitation of the archive, ESO is planning and developing a Science Archive Research Environment that will support the massive processing of archive data for selected Archive Research Programmes. Such programmes would be peer reviewed and graded by a time allocation committee and, when selected, would be given the support needed to execute the data processing project.

  3. Help VLT operations to be predictable by providing traceability of instrument performance.

    The assessment of instrument and telescope performance relies heavily on the memory of the system. Monitoring the engineering and scientific throughput requires access to a large database of runtime parameters, physical characteristics and configuration data. Such a database must also keep essential information over the long term, because slowly varying trends and long-running processes can only be understood when viewed over long time periods. The VLT Engineering Archive will cover these needs and include tools to analyze and correlate this large data warehouse.

  4. Support observation preparation and analysis.

    The VLT Science Archive system will provide tools and interfaces to astronomers that will enable them to access survey data, common astronomical catalogs, archive data, publications, observatory ambient condition measurements and other information sources. Such tools will be geared to support the preparation of observation projects but also support the post observation analysis phase.

The data volume expected from the different instruments over the next years is listed in Table 1. Figures are given in gigabytes for a typical night during steady state operations. Estimated total rates per night are derived by making assumptions on a mixture of instrument usage for a typical night.

Table 1: Estimated VLT data rates.

In order to achieve the goals listed above, a system is being built that will include innovative features both in the areas of technology and functionality. Among its most distinct features, the system

System Architecture

The main components of the VLT Archive System are (see figure 1): the On-Line Archive Facility (OLAF) and the off-line Science Archive Facility (SAF).

The On-Line Archive System (OLAS) takes care of receiving the data and creates the Observations Catalog while the Archive Storage system (ASTO) saves the data products onto safe, long-term archive media. The SAF includes a copy of ASTO used mainly for retrieval and user request handling, the Science Archive System (SAS) and the Science Archive Research Environment (SARE).

All the data is described in an observations catalog which typically describes the instrument setup that was used for the exposure. Weather and seeing information is recorded in an ambient conditions database. Engineering data, instrument configurations and the operations logs are stored in the Engineering Archive databases.

In addition to the raw science data, all calibration files will be available from the calibration database. The calibration database includes the most suitable data for calibrating an observation at any given time.

Figure 1: Overview of the VLT Archive System Architecture.

The Archive System Features

The archive system being developed for the VLT includes a number of innovative features, both in technology and in information handling.
Replication
The data package delivered to Principal Investigators collects, in addition to raw and reduced data, all the associated information. That includes weather parameters, quality assurance flags for every data product, moon brightness and distance to target, operational events such as alarms and errors and, last but not least, the summary of observation blocks and their execution status. Such a feature is possible due to the system's ability to correlate information originating in different sources within the data flow system. From the technical point of view, this has been implemented through an application-layer transparent replication of databases between the observatory site and ESO's headquarters.

Safe Store and Data Volume
The safe storage of large data volumes has been implemented as a cascading model. Data is collected on-line on an intermediate staging area that is dimensioned to hold typically 3-5 days' worth of data. This staging area resides on a fault-tolerant, redundant RAID Level-5 system that offers a high degree of availability. Each of the four VLT Unit Telescopes is equipped with such a system. As part of the daily science operations, the data is transferred to a central high-throughput processing stage (RAID-0), where it is compressed and written onto permanent media (DVD-ROM). By using ATM network interfaces, RAID systems and parallel processing, a maximum throughput of about 5 MB/s can be achieved; the major bottlenecks are compression (1 MB/s per task) and DVD writing (1.5 MB/s per task).
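
The throughput figures quoted above can be turned into a rough sizing estimate. The following is only a back-of-the-envelope sketch: it assumes that compression and DVD-writing tasks run fully in parallel and scale linearly, and the nightly data volume used at the end is a hypothetical value chosen for illustration, not a figure from Table 1.

```python
import math

# Figures quoted in the text.
TARGET_MB_S = 5.0       # aggregate throughput of the processing stage
COMPRESS_MB_S = 1.0     # throughput of one compression task
DVD_WRITE_MB_S = 1.5    # throughput of one DVD-writing task


def tasks_needed(target_mb_s: float, per_task_mb_s: float) -> int:
    """Smallest number of parallel tasks whose combined rate meets the target.

    Assumes ideal linear scaling, which real hardware will not quite reach.
    """
    return math.ceil(target_mb_s / per_task_mb_s)


def hours_to_drain(volume_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move volume_gb at rate_mb_s (1 GB taken as 1000 MB)."""
    return volume_gb * 1000.0 / rate_mb_s / 3600.0


compress_tasks = tasks_needed(TARGET_MB_S, COMPRESS_MB_S)
dvd_tasks = tasks_needed(TARGET_MB_S, DVD_WRITE_MB_S)

# Hypothetical nightly volume, used only to illustrate the calculation.
NIGHTLY_GB = 20.0
drain_hours = hours_to_drain(NIGHTLY_GB, TARGET_MB_S)

print(f"compression tasks: {compress_tasks}, DVD writers: {dvd_tasks}")
print(f"time to process {NIGHTLY_GB:.0f} GB: {drain_hours:.2f} h")
```

Under these assumptions, sustaining 5 MB/s takes five compression tasks and four DVD writers, which is why these two steps rather than the network or the disks dominate the sizing.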

Distributed Modularity
The core Archive System is designed as a basic set of supplier and subscriber services for on-line processing and a multi-site data server for archive retrieval. This model allows suppliers and subscribers to be added easily while retaining application-transparent access to data and databases. It also supports transparent transfer of data between the Paranal mountain and ESO's headquarters and is easily reconfigurable according to science operations needs.
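
The supplier/subscriber idea can be illustrated with a minimal in-process sketch. This is not the actual archive software: the `DataBroker` class, the topic name `raw_frame` and the file name are hypothetical, and a real deployment would span sites and processes rather than run inside one program.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class DataBroker:
    """Hypothetical broker connecting data suppliers to subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler; suppliers need no knowledge of it."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a message to every subscriber and return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = DataBroker()
catalog: List[str] = []        # stands in for the observations catalog
storage_queue: List[str] = []  # stands in for the archive storage queue

# Two independent subscribers to the same kind of data.
broker.subscribe("raw_frame", lambda m: catalog.append(m["file"]))
broker.subscribe("raw_frame", lambda m: storage_queue.append(m["file"]))

# A supplier announces a new frame without naming its consumers.
delivered = broker.publish("raw_frame", {"file": "frame_0001.fits"})
print(delivered, catalog, storage_queue)
```

Adding a further subscriber is a single `subscribe` call and requires no change to the supplier, which is the property the model is meant to provide.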

The Science Archive Research Environment

Observation data will be stored within the VLT Science Archive Facility and will be available to Science Archive Research programmes one year after the observation was made. However, in the face of the very large data volume, the selection of data for a particular archive research project quickly becomes an unmanageable task. This is because the observations catalog, although it gives a precise description of the conditions under which an observation was made, says nothing about the scientific content of the data. Hence, archive researchers first have to pre-select the potentially interesting data sets on the basis of the catalog, and then assess each observation by looking at it (preview) and/or by running some automated task to determine its suitability. Such a procedure is currently used for archive research with the HST Science Archive and is acceptable when the data volume is limited (e.g. 270 GB of WFPC2 science data within the last 3.5 years of HST operations).

Already during the first year of operations, the VLT will be delivering data quantities that make it infeasible to follow the same procedure for archive research. New tools and data management facilities are required. The ESO/CDS Data Mining Tools project aims at closing the gap by developing methods and techniques that will allow a thorough exploitation of the VLT Science Archive.

The Science Archive Research Environment (SARE) provides the infrastructure to support research programmes on archive data. Figure 2 shows an overview of the SARE setup.

Figure 2: Overview of the VLT Science Archive Research Environment.

Archive Research Programmes are user-defined processing chains that are applied to the raw data. Each of the processing steps is called a Reduction Block (RB). Typically, the first reduction block would be the re-calibration of data according to the standard calibration pipeline. A reduction block consists of one or more processes which are treated by the system as black boxes, i.e. without any knowledge of their implementation. However, the reduction block interface (input and output data) complies with a well-defined specification. This feature allows any reduction module to become part of the chain. In fact, this flexible architecture also allows a research programme to analyze different kinds of data, from images and spectra to catalogs and tables of physical quantities. The output of an archive research programme will be derived parameters that are fed into the data mining database. From there on, the archive research programme will be able to use cross-correlation tools to sort and analyze object parameters for a large sample.
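
The reduction block concept can be sketched as follows. This is an illustration under stated assumptions, not the SARE interface: the `ReductionBlock` class, its `requires`/`provides` fields and the two toy processing functions are all hypothetical. The point is only that the chain checks the declared inputs and outputs while treating each process as a black box.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

DataSet = Dict[str, Any]


@dataclass
class ReductionBlock:
    """Hypothetical reduction block: an opaque process with a declared interface."""
    name: str
    process: Callable[[DataSet], DataSet]  # black box, never inspected
    requires: List[str]                    # keys that must be present on input
    provides: List[str]                    # keys that must be present on output

    def run(self, data: DataSet) -> DataSet:
        missing = [key for key in self.requires if key not in data]
        if missing:
            raise ValueError(f"{self.name}: missing input {missing}")
        result = self.process(data)
        absent = [key for key in self.provides if key not in result]
        if absent:
            raise ValueError(f"{self.name}: missing output {absent}")
        return result


def run_chain(blocks: List[ReductionBlock], data: DataSet) -> DataSet:
    """Apply the blocks in order, feeding each output into the next block."""
    for block in blocks:
        data = block.run(data)
    return data


# Toy processes standing in for real reduction software.
def recalibrate(data: DataSet) -> DataSet:
    bias = data["bias"]
    return {**data, "calibrated": [pixel - bias for pixel in data["raw"]]}


def measure(data: DataSet) -> DataSet:
    pixels = data["calibrated"]
    return {**data, "mean_flux": sum(pixels) / len(pixels)}


chain = [
    ReductionBlock("recalibrate", recalibrate, ["raw", "bias"], ["calibrated"]),
    ReductionBlock("measure", measure, ["calibrated"], ["mean_flux"]),
]
result = run_chain(chain, {"raw": [12, 14, 16], "bias": 2})
print(result["mean_flux"])
```

In this sketch the derived parameter `mean_flux` plays the role of a quantity that would be fed into the data mining database.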

Access to catalogs and survey data

An essential service offered by the VLT Archive System is the on-line access to survey data such as the Digitized Sky Surveys I and II and to very large astrometric catalogs such as the HST Guide Star Catalog and the US Naval Observatory USNO-A1.0 catalog. Such services are used by the Telescope Control System for telescope guiding, by astronomers when preparing observations and by survey projects to quality check processing results.

The development of search engines for large catalogs is an ongoing activity that has already shown a high level of acceptance by the community. As an example, the search engine for the USNO-A1.0 catalog has been accessed more than 60,000 times in its first six months of operations. ESO is now capitalizing upon this expertise and will develop the search engine for the GSC-II export catalog. This catalog is expected to include more than two billion objects with positions and colors.
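
The basic query such a search engine answers is a cone search: return all catalog sources within a given angular radius of a sky position. The sketch below shows the geometry only. It scans every source, which would not scale to catalogs of the size discussed here, where a spatial index would be needed; it makes no claim about how the ESO search engines are implemented. The `Source` type and the catalog entries are made up for illustration.

```python
import math
from typing import List, NamedTuple


class Source(NamedTuple):
    name: str
    ra_deg: float   # right ascension in degrees
    dec_deg: float  # declination in degrees


def angular_separation_deg(ra1: float, dec1: float,
                           ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def cone_search(catalog: List[Source], ra_deg: float, dec_deg: float,
                radius_deg: float) -> List[Source]:
    """Return sources within radius_deg of the position, nearest first."""
    hits = [(angular_separation_deg(ra_deg, dec_deg, s.ra_deg, s.dec_deg), s)
            for s in catalog]
    hits.sort(key=lambda pair: pair[0])
    return [s for sep, s in hits if sep <= radius_deg]


# Made-up positions, for illustration only.
toy_catalog = [
    Source("far", 50.0, -30.0),
    Source("near", 10.01, 20.0),
    Source("centre", 10.0, 20.0),
]
matches = cone_search(toy_catalog, ra_deg=10.0, dec_deg=20.0, radius_deg=0.05)
print([s.name for s in matches])
```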

Figure 3 shows a screen dump of the ESO SkyCat Tool displaying an image of NGC 1275 obtained from the on-line Digitized Sky Survey server at ESO, with superimposed sources from the USNO-A1.0 catalog (circles) and from the NED database (squares).

Figure 3: The ESO SkyCat Tool.

SkyCat is a tool that combines visualization of images and access to catalogs and archive data for astronomy. Main features of SkyCat are described below:

Conclusions

The VLT Archive System being developed will provide the infrastructure needed to offer the Science Archive as an additional instrument of the VLT. The main capabilities of the system will be
The VLT Science Archive Project is developed by the Science Archive Group within ESO's Data Management Division.

Send comments to Miguel Albrecht
Last Updated on Wed March 4, 1998