Authentication & Authorisation

How to access private data and metadata

 

1. Introduction

The ESO science archive contains, for its major part, data that can be searched and downloaded anonymously.

However, some datasets can be downloaded (1.1), and sometimes even searched (1.2), only by authorised users.

 

1.1 Authorised data access

The most typical case is observational data within the so-called "proprietary period": the ESO data policy foresees a period of time (typically one year) during which only the principal investigators (PIs) can download their own scientific data. During that time, the existence and description of those data (the metadata) remain visible to the entire community; that is, the proprietary period covers only the scientific data files, not their metadata. At ESO, we call "data access" the permissions related to accessing the science data files. Note: PIs can delegate data access for all the data of a given observing run to other persons; the PIs do this themselves by assigning specific permissions to registered(1) ESO users via the Access Control at ESO service.

Another typical case is an instrument team that, for a certain amount of time, is given data access permission to all data files obtained by a specific instrument, for example for operational reasons.

 


Footnote (1): Users can freely register via the ESO User Portal and so receive the ESO user's credentials.

 

1.2 Authorised data searches

In some cases, not only the data but also their metadata (the records describing those data) can be protected, such that an anonymous user is not able to know that some observations exist.

This is done, for example, for observations taken during the commissioning phase of a new instrument, when the data are not yet considered of sufficiently guaranteed quality to permit scientific exploitation by casual users, who could draw wrong conclusions from them.

Another example is the case of scientific observing programmes of sensitive targets, where merely revealing what is being observed risks diminishing the value of the entire observing programme, to the detriment of both the programme and the Observatory.

In these cases, not only the data but also the metadata get protected, and only authorised users can get access to those metadata and data.

The GTO policy page describes this, and provides the list of GTO contracts currently subject to data and metadata protection.

The table describing the raw data (dbo.raw) is at the moment the only table that supports authorised data searches. The table of reduced products (ivoa.ObsCore) will be the next candidate table to support this, to permit searches on products derived from metadata-protected raw files.

 

1.2.1 Which users should (not) perform authorised data searches?

Authorised data searches are...

  • ... useful only to:

    • users who have been granted specific permissions (on a run, or on an instrument), for example to members of instrument and operation teams;
    • PIs of confidential (i.e. metadata-protected) observing programmes (or their delegates);
  • ... not useful to:

    • PIs of regular programmes, even if under proprietary period;
    • normal archive users;
      These users can still run authenticated queries (provided they have an ESO account), but:
      • their queries will return exactly the same records as if unauthenticated,
      • their queries will run slower than if unauthenticated!

 

1.2.2 Confidentiality

Authorised data searches remain confidential, no matter how you execute them:

  • Synchronous searches (e.g. queries via the VO Table Access Protocol [TAP], or via the special access web query interface) are not stored, hence cannot be seen by anybody other than you;
  • TAP jobs executed with token authentication are visible only to the user who issued them (2).

 


Footnote (2): Delegation of jobs is currently not supported.

 

2. Authentication

If you are one of the users who have been granted permissions to search and/or download non-public data or metadata, you need to prove your identity via the so-called authentication process.

You are probably already aware of this if you are using one of the ESO web services, which prompts you for your ESO credentials (username and password).

But what to do if you want to access the archive programmatically?

Programmatically it works like this:

  • first, you programmatically log on to the ESO authentication service (https://www.eso.org/sso/oidc/token), providing your ESO credentials;
  • then, you receive back a token (basically a string around 1000 characters long), which you will use to prove your identity.

The received token remains valid for 8 hours. After it expires, you can get a new one and continue where you left off.
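Since a token expires after 8 hours, a long-running script may want to renew it automatically. The following is a minimal sketch of one way to do that, not part of any ESO library: the class name TokenCache, the 5-minute safety margin, and the injectable clock are all assumptions made for illustration. The function that actually obtains a token is passed in as a parameter.

```python
import time

TOKEN_LIFETIME_SECONDS = 8 * 3600   # validity period stated in the text
SAFETY_MARGIN_SECONDS = 300         # assumption: renew 5 minutes before expiry


class TokenCache:
    """Hypothetical helper: keep one token and renew it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time):
        # fetch_token: any zero-argument callable returning a fresh token string
        # clock: callable returning the current time in seconds (replaceable in tests)
        self._fetch_token = fetch_token
        self._clock = clock
        self._token = None
        self._obtained_at = None

    def get(self):
        """Return the cached token, fetching a new one if none is held or it is too old."""
        now = self._clock()
        too_old = (
            self._token is None
            or now - self._obtained_at >= TOKEN_LIFETIME_SECONDS - SAFETY_MARGIN_SECONDS
        )
        if too_old:
            self._token = self._fetch_token()
            self._obtained_at = now
        return self._token
```

With the getToken function from the Python example in this section, such a cache could be created as TokenCache(lambda: getToken(username, password)), and cache.get() called before each request.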

Examples: How to authenticate and receive back a token

 

curl: curl -L 'https://www.eso.org/sso/oidc/token?response_type=id_token%20token&grant_type=password&client_id=clientid&client_secret=clientSecret&username=yourusername&password=yourpassword'
  • replace the values of the last two parameters (username and password) with your own credentials
  • from the curl response, be sure to use only the value of the id_token field
python:
import json

import requests

TOKEN_AUTHENTICATION_URL = "https://www.eso.org/sso/oidc/token"

def getToken(username, password):
    """Token based authentication to ESO: provide username and password to receive back a JSON Web Token."""

    # Without both credentials there is nothing to authenticate with.
    if username is None or password is None:
        return None

    token = None
    try:
        response = requests.get(TOKEN_AUTHENTICATION_URL,
                        params={"response_type": "id_token token",
                                "grant_type":    "password",
                                "client_id":     "clientid",
                                "client_secret": "clientSecret",
                                "username":      username,
                                "password":      password})
        # The response body is JSON; only its id_token field is used.
        token_response = json.loads(response.content)
        token = token_response['id_token']
    except NameError as e:
        print(e)
    except Exception:
        # Any other failure (network error, non-JSON body, missing id_token) ends up here.
        print("*** AUTHENTICATION ERROR: Invalid credentials provided for username %s" %(username))

    return token
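Both examples above rely on picking the id_token field out of the JSON response. As a minimal sketch (the function name extract_id_token is hypothetical, and the exact content of an error response from the authentication service is not assumed here), the extraction can be isolated so that a malformed or unsuccessful response is reported explicitly instead of silently yielding no token:

```python
import json


def extract_id_token(response_text):
    """Return the id_token field of a token response, or raise ValueError."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError("token response is not valid JSON") from exc
    # Only the id_token value is wanted; any other field is ignored.
    if not isinstance(payload, dict) or "id_token" not in payload:
        raise ValueError("token response contains no id_token field")
    return payload["id_token"]
```

This works equally on the text printed by curl and on response.text from a Python requests call.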

 

 

3. Authorisation

Once you have your token (which remains valid for 8 hours), you can pass it to the ESO archive services you want to use, so as to exercise your own specific permissions. This is done by inserting the token in the header of your HTTP request.

The header must include the following directive: Authorization: Bearer yourtoken

Here we show how to do that either when directly accessing an ESO service, or when using the pyvo astropy Python package.

Examples: How to use your id token

curl: curl -LH 'Authorization: Bearer yourtoken' 'https://dataportal.eso.org/dataPortal/file/HARPS.2027-0J-USTan:ex:am.ple'
python: Most typically, in Python you create a requests session and add the token to its headers.

    session = requests.Session()
    session.headers['Authorization'] = "Bearer " + yourtoken

You then pass this session either to the eso_programmatic downloadURL() method or to the pyvo service encapsulation, for example:

    tap = pyvo.dal.TAPService(TAP_URL, session=session)
or
    status, filename = eso.downloadURL(url, session=session)
etc.
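For scripts that cannot depend on requests or pyvo, the same header can be attached using only the Python standard library. This is a sketch under assumptions: the function name build_download_request is hypothetical, and the URL prefix is simply the one that appears in the curl example above.

```python
import urllib.request

# URL prefix taken from the curl example; the file identifier is appended to it.
DATAPORTAL_FILE_URL = "https://dataportal.eso.org/dataPortal/file/"


def build_download_request(file_id, token=None):
    """Prepare (without sending) a download request; add the token if one is given."""
    request = urllib.request.Request(DATAPORTAL_FILE_URL + file_id)
    if token is not None:
        # Same directive as in the text: Authorization: Bearer yourtoken
        request.add_header("Authorization", "Bearer " + token)
    return request
```

The prepared request is sent with urllib.request.urlopen(request); note that urlopen raises urllib.error.HTTPError for error statuses such as 401.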

 

Note: Please do not pass your token to anybody else! If you want to give a colleague access to some archive assets, please use the above-mentioned service (Access Control at ESO) to delegate data access (provided that your colleague is already registered with the ESO User Portal(1)).

 

3.1 What happens in case of unauthorised access?

If you try to access a protected archive asset without providing an active token, or if the token belongs to a user who does not have the necessary permissions, or if the token has expired, the response you get depends on the service you invoke:

  • the dataPortal service for download or cutout returns an HTTP status 401;
  • a TAP query for a file with protected metadata returns an HTTP status 200, with the table of results not providing any information regarding the protected dataset;
  • the Datalink of a file with protected metadata returns an HTTP status 200, with the table of results informing you that the provided dataset either does not exist or you do not have access to it.
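A practical consequence of the three behaviours above is that only the dataPortal signals a refusal through the status code; for TAP and Datalink, a status 200 is no proof that you were authorised. The sketch below encodes that rule. The service labels ("dataportal", "tap", "datalink") and both function names are hypothetical, chosen only for this illustration.

```python
def is_access_denied(service, http_status):
    """True only when the status code itself signals a refusal (dataPortal, 401)."""
    return service == "dataportal" and http_status == 401


def may_hide_protected_data(service, http_status):
    """True when a successful response could still be concealing protected datasets."""
    # TAP omits protected records; Datalink answers that the dataset
    # either does not exist or is not accessible. Both do so with status 200.
    return service in ("tap", "datalink") and http_status == 200
```

When is_access_denied returns True, possible causes are a missing token, an expired token, or missing permissions; requesting a fresh token only helps in the second case.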

Services supporting authentication & authorisation

The programmatic services that support authentication are:

  • http://archive.eso.org/tap_obs
    supported authentication schemes: token
    description: write your own query; tap_obs will return all records you have access to (including public records).
  • http://archive.eso.org/datalink
    supported authentication schemes: token
    description: datalink will return the list of all the files you have access to; if you run it on a file you do not have access to, you will get a valid response (HTTP status 200) with a message that the file might not exist (so as not to disclose potentially sensitive information).
  • https://dataportal.eso.org/dataPortal
    supported authentication schemes: token, basic
    description: the ESO download service accepts token authentication to provide access to non-public files to users who have been granted the necessary permissions.
  • https://dataportal.eso.org/dataPortal/soda
    supported authentication schemes: token, basic
    description: the cutout service (using the Server-side Operations for Data Access VO protocol) accepts token authentication to provide access to non-public files to users who have been granted the necessary permissions.
  • https://archive.eso.org/calselector
    supported authentication schemes: token
    description: the service to find calibration reference files associated to one (or more) raw file(s) requires authentication only for the files that have their metadata protected.
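The list of services and schemes above can be turned into a small lookup that refuses to build a header for a scheme the service does not support. This is only a sketch: the function name authorization_header is hypothetical, and it assumes that "basic" refers to standard HTTP Basic authentication (base64-encoded username:password), which the text does not spell out.

```python
import base64

# Services and schemes as listed in the text.
SUPPORTED_SCHEMES = {
    "http://archive.eso.org/tap_obs": {"token"},
    "http://archive.eso.org/datalink": {"token"},
    "https://dataportal.eso.org/dataPortal": {"token", "basic"},
    "https://dataportal.eso.org/dataPortal/soda": {"token", "basic"},
    "https://archive.eso.org/calselector": {"token"},
}


def authorization_header(service_url, scheme, token=None, username=None, password=None):
    """Build the Authorization header for a service, checking the scheme is listed for it."""
    if scheme not in SUPPORTED_SCHEMES.get(service_url, set()):
        raise ValueError("%s does not list the '%s' scheme" % (service_url, scheme))
    if scheme == "token":
        return {"Authorization": "Bearer " + token}
    # Assumption: "basic" means standard HTTP Basic authentication.
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

Token authentication is the only scheme listed for every service, so it is the natural default for scripts that touch more than one of them.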

 

4. Let's code it

A Python Jupyter notebook with some examples is available here (both in html and ipynb formats), or can be executed live at MyBinder (click on the link, wait a couple of minutes for MyBinder to start the repository, then click on Programmatic_Authentication_and_Authorisation.ipynb to launch it; to interact with it, see e.g. this Python Jupyter notebook primer).

This notebook drives you through the process of:

  1. Authenticating to receive a token
  2. Performing authorised archive searches on raw data via TAP (using your token to exercise your permissions)
  3. Downloading science raw data with authorisation
  4. Finding and downloading the associated calibration reference files (using DataLink and calSelector)
  5. Downloading calibration reference files and the association tree

It uses a little utility module called eso_programmatic.py, downloadable here, which contains, among others, the method to get a token (getToken).