The HST Cache
What is the HST Cache?
The cache is an envelope around HST archive file production. It is a set of database tables and software agents that ensure that all science pipeline products are preprocessed and readily available from local storage at all times. This includes mechanisms to discover newly observed datasets and insert them, and to automatically reprocess datasets that benefit from updated reference files, newly available meta-data and general upgrades to the processing software.
Why do we need a cache?
Since 2002, all data from active instruments has been produced from scratch, triggered by user requests. The reasoning behind the On The Fly Reprocessing (OTFR) and On The Fly Calibration (OTFC) pipelines was that they would guarantee that archive users always received their data with the newest set of meta-data, calibrated according to the best methods available. This was a clear advantage over the previous system, in which the raw data was produced centrally at STScI and delivered to the partner sites, essentially freezing the data in time. Another advantage of the system was that it conserved storage space, as only the Hubble Space Telescope telemetry files and a few smaller auxiliary files needed to be stored, an important consideration when data is stored on optical disks in jukeboxes.
With the advent of cheap mass storage in the form of hard-disk arrays, this aspect became less important, and over time a number of other drawbacks of the on-the-fly paradigm became apparent as well: live processing of data requires that support be available at all times to resolve errors and bugs in the pipeline, an inevitable task when a system becomes as complex as this one, with such a heterogeneous set of input data. Another drawback is processing speed: producing a dataset could take anywhere from several minutes to hours, which might not be an issue for the patient astronomer but makes it impossible to expose the data through synchronous VO protocols. Next-level efforts such as data mining, metadata harvesting and the production of high-level data products are also enormously difficult in the on-the-fly world.
The advantages of the HST Cache are:
- Faster access speed
- Shields users from processing errors
- Direct programmatic & VO protocol access to the data
- Makes the archive less prone to overall system breakdowns
- Allows site interoperability and redundancy
- Less maintenance in the long run
- Allows meta-data harvesting and data mining
Programmatic access to the data:
As the files are now available preprocessed, they can be downloaded directly via the ST-ECF file proxy using a web browser, a command-line tool or similar.
The URL to download any given file is:
http://archive.eso.org/archive/hst/proxy/ecfproxy?file_id=<FILE ID>
The file ID to give is the dataset name plus extension, without '.fits'. For example, to get o6d701030_x1d.fits the URL would be:
http://archive.eso.org/archive/hst/proxy/ecfproxy?file_id=o6d701030_x1d
Please note that the separator between dataset name and extension can be either "." or "_": all older instruments, up to and including WFPC2, use a dot, while STIS, NICMOS and ACS use underscores. For an overview of the possible extensions and examples of file IDs, please consult the instrument-specific pages describing the filenames.
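The URL rule above can be captured in a small shell helper. This is a minimal sketch assuming a POSIX shell; the function names ecf_file_id and ecf_url are hypothetical. The only transformation applied is dropping a trailing ".fits", so the "." or "_" separator between dataset name and extension is passed through unchanged.

```shell
#!/bin/sh
# Base URL of the ST-ECF file proxy, as given in the text above.
ECF_PROXY='http://archive.eso.org/archive/hst/proxy/ecfproxy'

# ecf_file_id (hypothetical helper): turn a filename into a proxy file ID
# by removing a trailing ".fits"; names without it are returned unchanged.
ecf_file_id() {
    printf '%s\n' "${1%.fits}"
}

# ecf_url (hypothetical helper): build the proxy download URL for a
# filename or file ID.
ecf_url() {
    printf '%s?file_id=%s\n' "$ECF_PROXY" "$(ecf_file_id "$1")"
}

# prints http://archive.eso.org/archive/hst/proxy/ecfproxy?file_id=o6d701030_x1d
ecf_url o6d701030_x1d.fits
```

Passing a bare file ID such as o6d701030_x1d yields the same URL, so the helper accepts either form.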
If you use a command-line tool such as curl or wget to download a file, you may have to specify an output name, as these tools do not always take the filename from the HTTP header. For example:
curl -o o6d701030_x1d.fits 'http://archive.eso.org/archive/hst/proxy/ecfproxy?file_id=o6d701030_x1d'
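To fetch several files in one go, the curl command above can be wrapped in a loop. This is a sketch assuming a POSIX shell and that each file should be saved as "<file ID>.fits", which holds for the example above but may not for every instrument; the function names download_file and download_all are hypothetical.

```shell
#!/bin/sh
# Base URL of the ST-ECF file proxy, as given in the text above.
ECF_PROXY='http://archive.eso.org/archive/hst/proxy/ecfproxy'

# download_file (hypothetical helper): fetch one file ID, naming the output
# explicitly. Assumption: the saved name is "<file ID>.fits".
download_file() {
    file_id=$1
    # --fail makes curl return a non-zero status on HTTP errors.
    curl --fail --silent --show-error -o "${file_id}.fits" \
        "${ECF_PROXY}?file_id=${file_id}"
}

# download_all (hypothetical helper): fetch every file ID given as an
# argument, report failures on stderr, and return 1 if any download failed.
download_all() {
    status=0
    for id in "$@"; do
        if ! download_file "$id"; then
            echo "download failed: $id" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: download_all o6d701030_x1d. The --fail option is used so that an HTTP error is reported as a failure rather than silently saving the server's error page under the output filename.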
