This section of the "ESO Science Archive Programmatic: HOWTOs" shows how to programmatically query the database table that exposes the metadata of the reduced data, using Python.
_Usage: You can access this file either as a static HTML page (download it here), or as an interactive jupyter notebook (download it here) which you can run on your machine (instructions). To interact with the jupyter notebook (if you have downloaded it): move up and down the various cells using the arrow keys, execute the code by pressing CTRL+ENTER; you can also modify the code and execute it at will._
The table is called ivoa.ObsCore and it is fully compliant with the ObsCore Virtual Observatory standard: Observation Data Model Core Components and its Implementation in the Table Access Protocol, Version 1.1, IVOA Recommendation 09 May 2017, Louys et al.
In the following, it will be shown how to query the ivoa.ObsCore table using the IVOA Astronomical Data Query Language, Version 2, IVOA Recommendation 30 October 2008, Ortiz et al.
First of all, let's import the required modules and instantiate the ESO TAP service:
import os
import sys
import numpy as np
from astropy.visualization import astropy_mpl_style
from astropy import table
from astropy.coordinates import SkyCoord
from astropy.units import Quantity
from pyvo.dal import tap
import matplotlib.pyplot as plt
import pandas
# ESO TAP endpoint exposing the observational metadata, used throughout this page
ESO_TAP_OBS = "http://archive.eso.org/tap_obs"
tapobs = tap.TAPService(ESO_TAP_OBS)
Each dataset in the archive has been assigned a unique ESO identifier. For reduced products this is a 27-character name whose first 4 characters are 'ADP.' followed by a datetime string, down to the millisecond. Example: ADP.2013-12-10T16:12:17.697 (That timestamp is **not** the datetime when the observation was taken; treat it simply as a synthetic string with no associated meaning.)
Completely equivalently, a Virtual Observatory identifier (colloquially called "ivorn") exists; it takes the form of the constant string 'ivo://eso.org/ID?' prepended to the ESO identifier. Example: ivo://eso.org/ID?ADP.2013-12-10T16:12:17.697
To look for a given reduced dataset, the query can be expressed using either the ESO or the VO identifier; the ESO identifier is stored in the ESO-specific non-standard column 'dp_id', while the ivorn is stored in the ObsCore standard column 'obs_publisher_did'. Let's execute the query by 'dp_id', while showing in the comments the 'obs_publisher_did' equivalent:
dp_id = 'ADP.2013-12-10T16:12:17.697'
query = """SELECT *
FROM ivoa.ObsCore
WHERE dp_id = '%s'""" % (dp_id)
# ivorn = 'ivo://eso.org/ID?' + dp_id
# query = """SELECT *
# FROM ivoa.ObsCore
# WHERE obs_publisher_did = '%s'""" % (ivorn)
print(query)
Send the ADQL query to the ESO TAP service and get the resulting table
Beware: the next cell could return some warnings about wrong UCDs; that simply means that you have not yet updated to the latest version of astropy. In any case, those warnings are not important, and you can safely disregard them.
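Should you prefer to silence those warnings, here is a minimal, optional sketch (the module pattern used in the filter is an assumption and may need adjusting for your astropy version):
import warnings
# Optional: ignore VOTable-related warnings (e.g., about unknown UCDs) raised by astropy
warnings.filterwarnings('ignore', module='astropy.io.votable')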
Passing the query to the tapobs service and getting its results is easy, thanks to the pyvo module. The search returns the table of results (res), consisting of a single record given that we constrained the query by a unique identifier.
res = tapobs.search(query=query)
print(res)
Before continuing, it is useful to take a closer look at the previous result set, so as to get a feel for the kind of information/columns that can be found in the ObsCore table.
Given the 'SELECT *' in the query above, the result set contains all the ObsCore columns.
The ESO ObsCore table contains the 41 columns of the ObsCore standard, plus 9 ESO-specific columns (such as the dp_id column used above).
The following shows all the returned columns (and their values for the specific dataset). You can use any of those columns to add constraints in your queries.
To gather full information on each column (like units, description, UCD, etc), please refer to the section The ivoa.ObsCore columns at the bottom of this page.
for colname in res.to_table().colnames:
print("{:20s} \t = \t {}".format(colname, res[colname][0]))
If you know the original file name defined by the data provider before ingestion into the ESO archive, you must use the 'obs_creator_did' column (which is also an ivorn), as shown here:
origfile = 'UV_SFLX_981908_2013-10-31T02:09:09.322_BLU390d1_2x2_11.fits'
ivorn = 'ivo://eso.org/origfile?' + origfile
query = """SELECT *
FROM ivoa.ObsCore
WHERE obs_creator_did = '%s'""" % (ivorn)
print(query)
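You can then run this query exactly as before; here is a quick sketch (reusing the tapobs service) that also prints the corresponding ESO identifier:
# Run the query above; if a dataset with that original file name exists,
# the result carries its ESO identifier in the dp_id column
res = tapobs.search(query=query)
if len(res):
    print("ESO identifier (dp_id): %s" % res['dp_id'][0])
else:
    print("No dataset found with that original file name")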
You are interested in a positional/spatial query when you want to find products around a certain position in the sky. There are multiple ways one can pose such a question; it all depends on the use case at hand and which data you are looking for.
On one side, the data...
The reduced data are spatially characterised by their footprint (standard ObsCore column: s_region). For example, the footprint of the source table ADP.2017-09-14T08:39:35.649 is described by a polygon. More generally, a footprint in the ESO archive can be either a point (as for spectra and visibilities) or an extended shape such as a polygon (as for images and cubes).
On the other side, your use case...
You might want to apply different criteria, e.g.:
3.A Give me any dataset closer than N arcsec to a certain point
3.B Give me any dataset covering a certain point
3.C Give me any dataset covering a user-defined region in its entirety
3.D Give me any dataset intersecting a complex region, i.e. a user-defined polygon
Suppose you are looking for datasets closer than 2.5 arcmin from NGC 4666.
Either you know the equatorial coordinates of your object, or you rely on a name resolver like SESAME (CDS) to get them. In the end you have a circle defined by three quantities: ra, dec, and radius, all expressed in degrees.
The cone search constraint is expressed via the INTERSECTS operator, which takes two footprints as input and returns true (1) if the two intersect in at least one point, false (0) if they don't.
In a cone search query, one of the two footprints is the input circle, while the other is the _s_region_ column, which represents the footprint of any dataset.
# Defining the position via SESAME name resolver, and the search radius
target = "NGC 4666"
pos = SkyCoord.from_name(target)
# pos now contains the coordinates of NGC 4666
print("SESAME coordinates for %s: %s (truncated to millidegrees)\n" % (target, pos.to_string()))
sr = 2.5/60. # search radius of 2.5 arcmin, always expressed in degrees
# Cone search: looking for footprints of reduced datasets intersecting a circle of 2.5' around NGC 4666
query = """SELECT *
FROM ivoa.ObsCore
WHERE intersects(s_region, circle('', %f, %f, %f))=1
""" % (pos.ra.degree , pos.dec.degree, sr)
print(query)
res = tapobs.search(query=query, maxrec=1000)
print("Num matching datasets: %d" % (len(res)))
#print(res)
The above shows that the table of results contains 89 records. Below, the distribution of results by 'dataproduct_type' is illustrated graphically.
from collections import Counter
dp_types = res['dataproduct_type']
dp_types_counts = Counter(dp_types)
df = pandas.DataFrame.from_dict(dp_types_counts, orient='index')
df.plot(kind='bar', rot=45)
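When running these cells as a plain Python script rather than inside the notebook, you may also need to render the figure explicitly:
plt.show()  # display the bar chart when not running inside a notebook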
If you were looking specifically for spectra, whose footprint is a point, you could have used the stricter CONTAINS operator instead, to ensure that only footprints entirely contained in the defined circle are returned.
Just remember, while INTERSECTS is commutative, the order of the CONTAINS operands is important and defined as:
CONTAINS( contained, container )=1
You might want to repeat the cone search query, replacing INTERSECTS with CONTAINS, to see the difference. [Answer: images, and measurements derived from those images, no longer show up in the result table, given their larger field of view (>2.5').]
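For reference, here is a minimal sketch of that CONTAINS variant, reusing the pos and sr variables defined above:
# Stricter variant of the cone search: only datasets whose whole footprint lies
# inside the 2.5 arcmin circle are returned
query = """SELECT *
FROM ivoa.ObsCore
WHERE CONTAINS(s_region, circle('', %f, %f, %f))=1
""" % (pos.ra.degree, pos.dec.degree, sr)
res = tapobs.search(query=query, maxrec=1000)
print("Num matching datasets: %d" % (len(res)))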
Suppose you are interested in finding extended(*) datasets that cover a certain point in the sky (e.g., datasets that could have imaged the progenitor of a supernova).
(*) "Extended" means that their footprint area is > 0.0, excluding from the result set spectra and visibilities whose footprint is a point in the sky.
How can you find the datasets that include such a point? You can use either the INTERSECTS or the CONTAINS operator; for a point there is no difference. Here we use CONTAINS.
# Looking for either images or cubes containing the location of SN2016x;
query = """SELECT t_min, abmaglim, dataproduct_type as type, dp_id, obs_release_date
FROM ivoa.ObsCore
WHERE CONTAINS(point('', 193.815, 0.099819), s_region)=1
AND dataproduct_type in ('image', 'cube')
ORDER BY t_min asc"""
# The above query returns all there is, whether observed before or after the SN discovery (which happened on t_min=57408)
# One could limit the results to only the data taken before the event, by adding the constraint:
# AND t_min < 57408
print(query)
res = tapobs.search(query=query, maxrec=1000)
print("Num matching datasets: %d" % (len(res)))
print(res.to_table())
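As an aside, t_min and t_max are expressed in Modified Julian Date; should you prefer human-readable dates, here is a quick sketch using astropy:
from astropy.time import Time
# Convert the start time (MJD) of the first (earliest) matching dataset to an ISO date string
print(Time(res['t_min'][0], format='mjd').iso)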
If you want to ensure that the matching datasets contain an entire region, you must use the CONTAINS operator. In the first operand you place the entire region you want to have covered, while the second operand is the s_region column.
The covered region could be a simple circle, or a more complex shape (e.g. a polygon). Here we show a circle around NGC 253:
# The active query checks coverage of a circle centred on NGC 253; the commented-out
# alternative below instead uses a polygon that embraces the (optical) disk of the galaxy:
# see the purple polygon in: https://tinyurl.com/wws5wgd
query = """SELECT t_min, s_fov, dataproduct_type as type, dp_id, obs_release_date
FROM ivoa.ObsCore
WHERE CONTAINS(CIRCLE('', 11.888002, -25.288220, 0.21), s_region)=1
AND dataproduct_type in ('image', 'cube')
ORDER BY t_min asc"""
#CONTAINS(POLYGON('J2000', 11.69167, -25.42802, 11.70837, -25.39377, 11.82948, -25.28317, 11.95405, -25.1965, 12.04864, -25.15928, 12.05789, -25.19052, 11.95461, -25.28743, 11.84107, -25.37337, 11.75762, -25.41363), s_region)=1
print(query)
res = tapobs.search(query=query, maxrec=1000)
print("Num matching datasets: %d" % (len(res)))
print(res.to_table())
Above you see the list of datasets whose footprints are large enough to entirely embrace the provided circle.
You can search for all datasets intersecting a polygon. This is useful, for example, when looking for optical, infrared, or radio counterparts of a gravitational wave (GW) event. GW spatial probability maps exist, e.g., see: LIGO Skymap. Those maps can be converted into confidence contours at a certain probability level, resulting in counterclockwise polygons that can be used to search for ESO data potentially covering the GW event.
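If you already have such contour vertices as (ra, dec) pairs in degrees, a small hypothetical helper (not part of pyvo or of this service) can assemble the ADQL POLYGON string used in the query below:
# Hypothetical helper: build an ADQL POLYGON('J2000', ...) string from a list of
# (ra, dec) vertices in degrees, listed counterclockwise
def adql_polygon(vertices, frame='J2000'):
    coords = ", ".join("%.4f,%.4f" % (ra, dec) for ra, dec in vertices)
    return "POLYGON('%s', %s)" % (frame, coords)

# Example with three made-up vertices (for illustration only):
print(adql_polygon([(196.8, -23.5), (195.0, -20.0), (198.0, -25.0)]))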
The query example below uses a polygon constructed as explained above for the GW170817 event.
query = """SELECT t_min, snr, abmaglim, dataproduct_type as type, dp_id
FROM ivoa.ObsCore
WHERE INTERSECTS(s_region, POLYGON('J2000', 196.8311,-23.5212, 196.7432,-23.3586, 196.6553,-23.1962, 196.4795,-23.0339, 196.3916,-22.8719, 196.3037,-22.71, 196.2158,-22.5484, 196.1279,-22.3869, 196.04,-22.2257, 195.9521,-22.0646, 195.8643,-21.9037, 195.7764,-21.7429, 195.7764,-21.5824, 195.6885,-21.422, 195.6006,-21.2618, 195.5127,-21.1018, 195.4248,-20.942, 195.3369,-20.7823, 195.3369,-20.6228, 195.249,-20.4634, 195.1611,-20.3043, 195.1611,-20.1452, 195.0732,-19.9864, 194.9854,-19.8277, 194.8975,-19.6692, 194.8975,-19.5108, 194.8096,-19.3526, 194.7217,-19.1945, 194.6338,-19.0366, 194.6338,-18.8788, 194.5459,-18.7212, 194.458,-18.5637, 194.458,-18.4064, 194.3701,-18.2492, 194.458,-18.0922, 194.458,-17.9353, 194.6338,-18.0137, 194.8096,-18.1707, 194.9854,-18.3278, 195.0732,-18.4851, 195.1611,-18.6425, 195.249,-18.8, 195.3369,-18.9577, 195.4248,-19.1155, 195.5127,-19.2735, 195.6006,-19.4317, 195.6885,-19.59, 195.8643,-19.7484, 195.9521,-19.907, 196.1279,-20.0658, 196.2158,-20.2247, 196.3916,-20.3838, 196.4795,-20.5431, 196.5674,-20.7025, 196.6553,-20.8621, 196.7432,-21.0219, 196.8311,-21.1818, 196.9189,-21.3419, 196.9189,-21.5022, 197.0068,-21.6626, 197.0947,-21.8233, 197.1826,-21.9841, 197.2705,-22.1451, 197.3584,-22.3063, 197.4463,-22.4676, 197.5342,-22.6292, 197.6221,-22.7909, 197.71,-22.9529, 197.7979,-23.115, 197.7979,-23.2773, 197.8857,-23.4399, 197.9736,-23.6026, 198.0615,-23.7655, 198.1494,-23.9287, 198.2373,-24.092, 198.3252,-24.2556, 198.4131,-24.4193, 198.501,-24.5833, 198.501,-24.7475, 198.5889,-24.9119, 198.6768,-25.0765, 198.7646,-25.2414, 198.8525,-25.4064, 198.9404,-25.5717, 199.0283,-25.7373, 199.0283,-25.903, 199.1162,-26.069, 199.1162,-26.2352, 199.2041,-26.4017, 199.2041,-26.5684, 199.2041,-26.7353, 199.2041,-26.9025, 199.1162,-26.9025, 198.9404,-26.7353, 198.7646,-26.5684, 198.5889,-26.4017, 198.501,-26.2352, 198.4131,-26.069, 198.2373,-25.903, 198.1494,-25.7373, 198.0615,-25.5717, 197.9736,-25.4064, 197.8857,-25.2414, 197.7979,-25.0765, 197.71,-24.9119, 197.6221,-24.7475, 197.5342,-24.5833, 197.3584,-24.4193, 197.2705,-24.2556, 197.1826,-24.092, 197.0947,-23.9287, 197.0068,-23.7655, 196.9189,-23.6026))=1
ORDER BY t_min asc"""
print(query)
res = tapobs.search(query=query, maxrec=2000)
print("Num matching datasets: %d" % (len(res)))
print(res.to_table())
Are you interested in finding images of the same sky region taken in different bands, for photometric studies?
The following example shows how you can compose a spatial join, so as to find HAWKI source tables (dataproduct_subtype = 'srctbl') close to the galactic plane (-10 < gal_lat < 10) that were observed both in the J band (covering 1.265E-6 m) and in the H band (covering 1.66E-6 m), with the two footprints overlapping by at least 80% of the area of the J image.
query = """SELECT J.* FROM
(select * FROM ivoa.Obscore WHERE dataproduct_subtype ='srctbl'
AND obs_collection = 'HAWKI'
AND gal_lat < 10 AND gal_lat > -10
AND em_min < 1.265E-6 AND em_max > 1.265E-6 ) J,
(select * FROM ivoa.Obscore WHERE dataproduct_subtype ='srctbl'
AND obs_collection = 'HAWKI'
AND gal_lat < 10 AND gal_lat > -10
AND em_min < 1.66E-6 AND em_max > 1.66E-6 ) H
WHERE INTERSECTS( J.s_region , H.s_region)=1 and
ESO_INTERSECTION( J.s_region , H.s_region) > 0.8*AREA( J.s_region )"""
# Let's get a maximum of 20000 images
res = tapobs.search(query=query, maxrec=20000)
print("")
print("Num matching datasets: %d" % (len(res)))
print(res.to_table())
Notice the J.* in the SELECT part: that is to retrieve only the information about the J images. You could then repeat the query for the H band. This is useful, for example, to visualise the results in a tool like Aladin, using different colours for the different queries, and hence for the different bands.
Otherwise, you could simply get all of them at once using SELECT * instead, or use a more selective statement, prefixing the column names you want with the J or H alias, as in: SELECT J.dp_id, H.dp_id, etc.
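For illustration, here is a minimal sketch of that last variant, assuming the query string defined above is still in scope:
# Sketch: same spatial join as above, but returning the identifiers of both the J and the H image
# of each matching pair
query_jh = query.replace("SELECT J.*", "SELECT J.dp_id AS j_dp_id, H.dp_id AS h_dp_id")
res_jh = tapobs.search(query=query_jh, maxrec=20000)
print(res_jh.to_table())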
Please notice that the IVOA ObsCore standard requires the wavelengths to be stored in meters.
You can always output them in different units by rescaling them in the SELECT statement, as in the following query, but you need to use meters when setting a constraint on the em_min and em_max columns. This holds even for data whose spectral axis is expressed in frequency (e.g. radio data). That choice makes it possible to standardise queries across all observatories. Please remember to use meters.
query="""SELECT obs_collection as collection, dataproduct_type as type,
dataproduct_subtype as subtype, em_min*1E9 min_wavel_nm, em_max*1E9 max_wavel_nm, em_res_power
FROM ivoa.ObsCore
WHERE target_name = 'a370'
and em_res_power < 3000"""
res = tapobs.search(query=query, maxrec=3)
print(res.to_table())
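As a further, purely illustrative sketch (the H-alpha value and the 'spectrum' data product type are just example choices), here is a wavelength constraint expressed in meters:
# Sketch: spectra covering the H-alpha rest wavelength (656.28 nm = 6.5628E-7 m);
# the em_min/em_max constraints are expressed in meters, as required by ObsCore
halpha_m = 656.28e-9
query = """SELECT dp_id, em_min*1E9 min_wavel_nm, em_max*1E9 max_wavel_nm, em_res_power
FROM ivoa.ObsCore
WHERE dataproduct_type = 'spectrum'
AND em_min < %E AND em_max > %E""" % (halpha_m, halpha_m)
res = tapobs.search(query=query, maxrec=5)
print(res.to_table())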
The ivoa.ObsCore table contains many scientifically interesting parameters describing the reduced data, from the signal-to-noise ratio of the processed spectra, to the limiting magnitude of images and cubes, to the spectral resolution, the min/max wavelengths (for most data types), the (linear) spatial field of view, etc.
Query constraints can be built on any of those parameters, using the corresponding column names; the values in the constraints must use the correct units.
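For illustration only (the thresholds below are arbitrary assumptions), such a constraint could look like this:
# Sketch with arbitrary thresholds: spectra with signal-to-noise ratio above 50,
# or images deeper than AB limiting magnitude 24
query = """SELECT dp_id, dataproduct_type, snr, abmaglim
FROM ivoa.ObsCore
WHERE (dataproduct_type = 'spectrum' AND snr > 50)
   OR (dataproduct_type = 'image' AND abmaglim > 24)"""
res = tapobs.search(query=query, maxrec=10)
print(res.to_table())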
The following query can be used to find out the columns of the ObsCore table, along with their units, UCDs, descriptions, etc.
query="""SELECT column_name, unit, ucd -- and others like description, etc.
from TAP_SCHEMA.Columns
where table_name='ivoa.ObsCore'"""
res = tapobs.search(query=query)
print("{:20s} \t {}\t{}".format('column_name', 'unit', 'UCD'))
print("{:20s} \t {}\t{}".format('--------------------','------','---------------------------'))
for row in res:
print("{:20s} \t {}\t{}".format(row['column_name'].decode(), row['unit'].decode(), row['ucd'].decode()))