The VLT Science Archive System
The ESO Very Large Telescope (VLT) will deliver a Science Archive of
astronomical observations well in excess of 80 terabytes
within its first six years of operations. ESO is undertaking
the design and development of both On-Line and Off-Line Archive
Facilities. This paper reviews the current planning and development
state of the VLT Science Archive project.
Introduction
The VLT Archive System goals can be summarized as follows:
- Record the history of VLT observations in the long term and provide
the memory of observatory operations.
The VLT observatory, with its wide-ranging instrumental
capabilities and unique data assurance tools, will generate a
large reservoir of science data with well understood instrument
performance and stable calibration. An archive of these
observations will allow astronomers to re-analyze the data from
different perspectives. Such archives have proven in the past to
be of enormous scientific value. The best known examples are the
science archives of the IUE satellite and of the Hubble Space
Telescope. In both cases, the data archives have delivered to
the community many times more data than the telescopes
themselves.
The experience of building and operating a science archive for
the ESO New Technology Telescope (NTT at La Silla) has allowed
us to understand and master the issues involved in dealing with
weather dependency and evolving instrument configuration of a
ground-based telescope.
- Provide a research tool - make the Science Archive another
VLT instrument.
Due to the very large data volume expected from the VLT (see
Table 1), handling data for archive research projects
becomes a task manageable only through large
facilities (disk farms, CPU processing power, etc.) and a
corresponding data handling environment (common algorithms,
housekeeping databases, results databases, catalog correlation
tools, etc.). In order to support the maximum scientific
exploitation of the archive, ESO is planning and developing a
Science Archive Research Environment that will support the
massive processing of archive data for selected Archive Research
Programmes. Such programmes would be peer reviewed and graded by
a time allocation committee and, when selected, would be given the
support needed to execute the data processing project.
- Help VLT operations to be predictable by providing traceability
of instrument performance.
The assessment of instrument and telescope performance relies
heavily on the memory of the system. Monitoring the
engineering and scientific throughput requires access to a large
database of runtime parameters, physical characteristics and
configuration data. Such a database must also retain essential
information over the long term, because slowly varying trends
and long-running processes can only be understood when viewed
over long time periods. The VLT Engineering Archive will cover these
needs and include tools to analyze and correlate this large data
warehouse.
- Support observation preparation and analysis.
The VLT Science Archive system will provide tools and interfaces
to astronomers that will enable them to access survey data,
common astronomical catalogs, archive data, publications,
observatory ambient condition measurements and other information
sources. These tools are geared towards the preparation of
observation projects, but will also support the post-observation
analysis phase.
The data volume expected from the different instruments over the coming years
is listed in Table 1. Figures are given in gigabytes for a typical night
during steady state operations. Estimated total rates per night are derived
by making assumptions on a mixture of instrument usage for a typical night.
Table 1: Estimated VLT data rates.
In order to achieve the goals listed above, a system is being built
that will include innovative features both in the areas of technology and
functionality. Among its most distinct features, the system
- will be scalable through quasi on-line data storage with
DVD Jukeboxes and on-line storage with RAID arrays and HFS;
- will include transparent replication across sites;
- will be data mining-aware through meta-databases of extracted
features and derived parameters.
System Architecture
The main components of the VLT Archive System are (see Figure 1): the
On-Line Archive Facility (OLAF) and the off-line Science Archive
Facility (SAF).
The On-Line Archive System (OLAS) takes care of receiving the data and
creates the Observations Catalog while the Archive Storage system
(ASTO) saves the data products onto safe, long-term archive media. The
SAF includes a copy of ASTO used mainly for retrieval and user request
handling, the Science Archive System (SAS) and the Science Archive
Research Environment (SARE).
All the data is described in an observations catalog which typically
describes the instrument setup that was used for the exposure. Weather
and seeing information is recorded in an ambient conditions
database. Engineering data, instrument configurations and the
operations logs are stored in the Engineering Archive databases.
In addition to the raw science data, all calibration files will be
available from the calibration database. The calibration database
includes the best suitable data for calibrating an observation at any
given time.
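As an illustration of what "best suitable" calibration data could mean in practice, the following minimal sketch picks, for a given observation, the calibration frame with the same instrument setup that lies closest in time. The record layout, the field names and the nearest-in-time rule are assumptions made for this example only; the actual selection criteria of the VLT calibration database are not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class CalibrationFrame:
    """Hypothetical calibration database record (illustrative fields only)."""
    instrument: str
    setup: str          # e.g. a filter or grating name
    taken_at: datetime
    filename: str


def best_calibration(
    frames: Sequence[CalibrationFrame],
    instrument: str,
    setup: str,
    observed_at: datetime,
) -> Optional[CalibrationFrame]:
    """Return the frame matching instrument and setup that is closest in time.

    Returns None when no frame matches the requested instrument setup.
    """
    candidates = [
        f for f in frames if f.instrument == instrument and f.setup == setup
    ]
    if not candidates:
        return None
    # Nearest in time, whether taken before or after the observation.
    return min(candidates, key=lambda f: abs(f.taken_at - observed_at))


# Hypothetical database content.
frames = [
    CalibrationFrame("INSTR-A", "filter-V", datetime(1998, 3, 1, 20, 0), "flat_0301.fits"),
    CalibrationFrame("INSTR-A", "filter-V", datetime(1998, 3, 3, 20, 0), "flat_0303.fits"),
    CalibrationFrame("INSTR-A", "filter-R", datetime(1998, 3, 2, 20, 0), "flat_r_0302.fits"),
]
chosen = best_calibration(frames, "INSTR-A", "filter-V", datetime(1998, 3, 3, 2, 0))
```

Because the choice depends on which calibration frames exist at query time, the same observation may be matched to a different frame later on, which is one way to read "at any given time".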
Figure 1: Overview of the VLT Archive System
Architecture.
The Archive System Features
The archive system being developed for the VLT includes a number of
innovative features, both in technology and in
information handling.
- Replication
The data package delivered to Principal Investigators collects, in
addition to raw and reduced data, all the associated information. This
includes weather parameters, quality assurance flags for every data
product, moon brightness and distance to target, operational events
such as alarms and errors and, last but not least, the summary of
observation blocks and their execution status. This feature is
possible due to the system's ability to correlate information
originating from different sources within the data flow system. From
the technical point of view, this has been implemented through
application-layer transparent replication of databases between the
observatory site and ESO's headquarters.
- Safe Store and Data Volume
The safe storage of large data volumes has been implemented as
a cascading model. Data is collected on-line on an intermediate
staging area that is dimensioned to hold typically 3-5 days' worth of
data. This staging area resides on a fault-tolerant, redundant RAID
Level-5 system that offers a high degree of availability. Each of the
four VLT Unit Telescopes is equipped with such a system. As part of
the daily science operations, the data is transferred to a central
high-throughput processing stage (RAID-0) where the data is compressed
and written onto permanent media (DVD-ROM). Through the use of ATM network
interfaces, RAID systems and parallel processing, a maximum throughput
of about 5 MB/s can be achieved; the major bottlenecks are
compression (1 MB/s per task) and DVD writing (1.5 MB/s per task).
- Distributed Modularity
The core Archive System is designed in terms of a basic set of
supplier and subscriber services for on-line processing and a
multi-site data server for archive retrieval. This model
allows the easy addition of suppliers and subscribers while retaining
application-transparent access to data and databases. It also
supports transparent transfer of data between the Paranal mountain and
ESO's headquarters and is easily reconfigurable according to science
operations needs.
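The throughput figures quoted under Safe Store and Data Volume can be checked with a short back-of-the-envelope calculation. The sketch below assumes that each stage scales linearly with the number of parallel tasks and that the pipeline runs at the rate of its slowest stage; both are simplifying assumptions of this example, not statements about the actual implementation.

```python
import math

# Figures taken from the text.
TARGET_MB_PER_S = 5.0
COMPRESSION_MB_PER_S_PER_TASK = 1.0
DVD_WRITE_MB_PER_S_PER_TASK = 1.5


def tasks_needed(target_rate: float, per_task_rate: float) -> int:
    """Smallest number of parallel tasks that reaches the target rate."""
    return math.ceil(target_rate / per_task_rate)


def pipeline_throughput(stages: dict) -> float:
    """Rate of the slowest stage; each stage is (n_tasks, per_task_rate)."""
    return min(n_tasks * rate for n_tasks, rate in stages.values())


compression_tasks = tasks_needed(TARGET_MB_PER_S, COMPRESSION_MB_PER_S_PER_TASK)
dvd_tasks = tasks_needed(TARGET_MB_PER_S, DVD_WRITE_MB_PER_S_PER_TASK)

throughput = pipeline_throughput({
    "compression": (compression_tasks, COMPRESSION_MB_PER_S_PER_TASK),
    "dvd_writing": (dvd_tasks, DVD_WRITE_MB_PER_S_PER_TASK),
})

# Volume that could be archived in 24 hours at that rate (1 GB = 1000 MB).
gb_per_day = throughput * 86400 / 1000

print(compression_tasks, dvd_tasks, throughput, gb_per_day)
```

Under these assumptions, five compression tasks and four DVD writers running in parallel sustain 5 MB/s, which corresponds to 432 GB over a full day.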
The Science Archive Research Environment
Observation data will be stored within the VLT Science Archive
Facility and will be available to Science Archive Research programmes
one year after the observation was made. However, in the face of the very
large data volumes, the selection of data for a particular archive
research project quickly becomes an unmanageable task. This is due to
the fact that even though the observations catalog gives a precise
description of the conditions under which the observation was made, it
says nothing about the scientific content of the
data. Hence, archive researchers first have to pre-select
potentially interesting data sets on the basis of the catalog, then
assess each observation by looking at it (preview) and/or by
running some automated task to determine its suitability. Such a
procedure is currently used for archive research with the HST Science
Archive and is acceptable when the data volume is limited (e.g. 270 GB
of WFPC2 science data within the last 3.5 years of HST operations).
Already during the first year of operations, the VLT will be
delivering data quantities that make it infeasible to follow the
same procedure for archive research. New tools and data management
facilities are required. The ESO/CDS Data Mining Tools project aims at
closing this gap and developing methods and techniques that will allow a
thorough exploitation of the VLT Science Archive.
The Science Archive Research Environment (SARE) provides the
infrastructure to support research programmes on archive data. Figure
2 shows an overview of the SARE setup.
Figure 2: Overview of the VLT Science Archive Research
Environment.
Archive Research Programmes are user defined
processing chains that are applied to the raw data. Each of the
processing steps is called a Reduction Block (RB). Typically the first
reduction block would be the re-calibration of data according to the
standard calibration pipeline. A reduction block consists of one or
more processes which are treated by the system as black boxes,
i.e. without any knowledge of their implementation. However, the
reduction block interface (input and output data) complies with a well
defined specification. This feature allows any reduction module to
become part of the chain. In fact, this flexible architecture also
allows the research programme to analyze different kinds of data from
images and spectra to catalogs and tables of physical quantities. The
output of an archive research programme will be derived parameters
that are fed into the data mining database. From there on, the archive
research programme will be able to use cross-correlation tools to sort
and analyze object parameters for a large sample.
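The chaining of reduction blocks can be sketched as follows. This is only an illustration of the idea: the dictionary-based data set, the two example blocks and the list standing in for the data mining database are hypothetical and do not reflect the actual SARE interface specification.

```python
from typing import Any, Callable, Dict, List

# Assumed interface: every reduction block maps one data set to another.
DataSet = Dict[str, Any]
ReductionBlock = Callable[[DataSet], DataSet]


def recalibrate(data: DataSet) -> DataSet:
    """Stand-in for the standard calibration pipeline (first block)."""
    return {**data, "calibrated": True}


def extract_parameters(data: DataSet) -> DataSet:
    """Stand-in for a feature extraction step producing derived parameters."""
    pixels = data["pixels"]
    parameters = {
        "mean_flux": sum(pixels) / len(pixels),
        "peak_flux": max(pixels),
    }
    return {**data, "parameters": parameters}


def run_programme(blocks: List[ReductionBlock], raw: DataSet) -> DataSet:
    """Apply the blocks in order; each block is treated as a black box."""
    data = raw
    for block in blocks:
        data = block(data)
        if not isinstance(data, dict):
            raise TypeError("reduction block violated the assumed interface")
    return data


# Hypothetical raw observation and a list acting as the data mining database.
mining_db: List[Dict[str, Any]] = []
result = run_programme(
    [recalibrate, extract_parameters],
    {"id": "obs-001", "pixels": [1.0, 2.0, 6.0]},
)
mining_db.append({"id": result["id"], **result["parameters"]})
```

Since the runner only relies on the input and output contract, any module that honours it can be inserted into the chain without changes to the rest of the programme.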
Access to catalogs and survey data
An essential service offered by the VLT Archive System is the on-line
access to survey data such as the Digitized Sky
Surveys I and II and to very large astrometric catalogs such as
the HST Guide Star Catalog and the US Naval Observatory A-1.0 catalog.
Such services are used by the Telescope Control System for telescope
guiding, by astronomers when preparing observations and by survey
projects to quality check processing results.
The development of search engines for large catalogs is an ongoing
activity that has already shown a high level of acceptance by the
community. As an example, the search engine for the USNO-A1.0 catalog
has been accessed more than 60,000 times in its first six months of
operations. ESO is now capitalizing upon this expertise and will
develop the search engine for the GSC-II export catalog. This catalog
is expected to include more than two billion objects with positions
and colors.
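The basic query served by such a catalog search engine is a cone search: return all sources within a given angular radius of a position. The sketch below is a brute-force illustration using the haversine formula for the angular separation; the source tuples are made up, and a production engine for a catalog of this size would rely on spatial indexing rather than a linear scan.

```python
import math
from typing import Iterable, List, Tuple

# Assumed record layout: (identifier, RA in degrees, Dec in degrees).
Source = Tuple[str, float, float]


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(
    catalog: Iterable[Source], ra_deg: float, dec_deg: float, radius_deg: float
) -> List[Source]:
    """Return the sources within radius_deg of the position, nearest first."""
    hits = []
    for source in catalog:
        _, src_ra, src_dec = source
        sep = angular_separation_deg(ra_deg, dec_deg, src_ra, src_dec)
        if sep <= radius_deg:
            hits.append((sep, source))
    hits.sort(key=lambda hit: hit[0])
    return [source for _, source in hits]


# Hypothetical catalog entries.
catalog = [
    ("src-3", 50.0, -30.0),
    ("src-2", 10.01, 20.0),
    ("src-1", 10.0, 20.0),
]
nearby = cone_search(catalog, 10.0, 20.0, 0.05)
```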
Figure 3 shows a screen dump of the ESO SkyCat Tool displaying
an image of NGC 1275 obtained from the on-line Digitized Sky Survey
server at ESO, and superimposed sources from the USNO-A1.0 catalog
(circles) and from the NED database (squares).
Figure 3: The ESO SkyCat Tool.
SkyCat is a tool that combines visualization of images and
access to catalogs and archive data for astronomy.
The main features of SkyCat are described below:
- visualize a variety of FITS images including support
for World Coordinate System (WCS), interactive measurement of offsets and
other standard visualization functions (SAOimage-like);
- overlay and edit color graphic objects on the image, like
`tagging' sources with text, arrows, circles or other graphic elements such
as masks;
- PostScript color printing of the display (image + graphics);
- access and load an image from a network server of the Digitized
Sky Survey scans;
- access and load catalog information from a number of popular
astronomical catalogs like the HST Guide Star Catalog, the USNO-A1.0
catalog and others;
- access local user catalogs, either from a file or through a local
search engine, and save catalog data to a local catalog (file);
- interact with Netscape to display more object information when
available;
- access the observations catalog from the NTT, HST and CFHT
Science Archives including access to preview data when available (access
to the VLT Science Archive is planned for the future);
- access SIMBAD and NED both as name resolvers as well
as for information on known objects;
- calculate, display and plot the center position (centroid),
FWHM, angle and other information for a selected star/object;
- support plugins developed externally;
- load lists of catalogs from sites (ESO, CADC, CDS, local) and
allow users to select their preferred default catalog list.
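To illustrate the kind of measurement behind the centroid and FWHM feature listed above, the sketch below computes an intensity-weighted centroid and a moment-based FWHM for a small image cut-out. It assumes a background-subtracted cut-out and a circular Gaussian profile; it is not SkyCat's actual algorithm, which is not described here.

```python
import math
from typing import Sequence, Tuple


def centroid_and_fwhm(image: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (x, y, fwhm) in pixel units from intensity-weighted moments.

    Assumes the background has already been subtracted and that the
    object is well described by a circular Gaussian.
    """
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("image has no positive flux")

    # First moments: the intensity-weighted mean position.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total

    # Second moment: for a circular Gaussian the mean squared radius is
    # 2 * sigma^2, hence the division by 2 * total.
    variance = sum(
        ((x - cx) ** 2 + (y - cy) ** 2) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    ) / (2.0 * total)

    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * math.sqrt(variance)
    return cx, cy, fwhm


# Hypothetical 3x3 cut-out around a star.
cutout = [
    [0.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 0.0],
]
x_c, y_c, fwhm = centroid_and_fwhm(cutout)
```

Moment-based estimates are sensitive to residual background and to the size of the cut-out, so a profile fit is generally preferable for faint or crowded sources.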
Conclusions
The VLT Archive System being developed will provide the infrastructure
needed to offer the Science Archive as an additional instrument of the
VLT. The main capabilities of the system will be
- handling of very large data volumes,
- routine computer-aided feature extraction from raw data,
- a data mining environment on both data and extracted parameters and
- an Archive Research Programme to support user defined projects.
The VLT Science Archive Project is developed by the Science
Archive Group within ESO's Data Management Division.
Send comments to Miguel Albrecht
Last Updated on Wed March 4, 1998