{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "charming-breast",
   "metadata": {},
   "source": [
    "# ESO Programmatic Authentication & Authorisation  \n",
    "## How to access private data and metadata \n",
    "\n",
    "\n",
    "This Jupyter notebook complements the <a href=\"http://archive.eso.org/cms/eso-data/programmatic-access/authentication-and-authorisation.html\">ESO Programmatic Authentication &amp; Authorisation</a> documentation page with some Python examples.\n",
    "\n",
    "It walks you through the process of:\n",
    "\n",
    "1. Authenticating to receive a token\n",
    "2. Performing authorised archive searches on raw data via TAP (using your token to exercise your permissions)\n",
    "3. Downloading science raw data with authorisation\n",
    "4. Finding the associated calibration reference files (via DataLink and calSelector)\n",
    "5. Downloading the calibration reference files and the association tree\n",
    "\n",
    "\n",
    "This notebook is based on a small utility module called <code>&nbsp;eso_programmatic.py&nbsp;</code> (<a href=\"eso_programmatic.py\">downloadable here</a>), which contains, among other things, the method to get a token (<strong title='def getToken(username, password):&#10;    \"\"\"Token based authentication to ESO: provide username and password to receive back a JSON Web Token.\"\"\"&#10;    if username==None or password==None:&#10;        return None&#10;    token_url = \"https://www.eso.org/sso/oidc/token\"&#10;    token = None&#10;    try:&#10;        response = requests.get(token_url,&#10;                            params={\"response_type\": \"id_token token\",&#10;                                    \"grant_type\":    \"password\",&#10;                                    \"client_id\":        \"clientid\",&#10;                                    \"username\":      username,&#10;                                    \"password\":      password})&#10;        token_response = json.loads(response.content)&#10;        token = token_response[\"id_token\"]+\"==\"&#10;    except NameError as e:&#10;        print(e)&#10;    except:&#10;        print(\"*** AUTHENTICATION ERROR: Invalid credentials provided for username %s\" %(username))&#10;&#10;    return token&#10;'>getToken</strong>). \n",
    "\n",
    "<hr>\n",
    "<b>Note:</b> a live version of this notebook can be run using <a href=\"https://mybinder.org/v2/gh/almicol/eso_authentication_and_authorisation/HEAD\">this MyBinder page</a> (allow some time for the repository to start).\n",
    "<hr>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eastern-prophet",
   "metadata": {},
   "source": [
    "##### Initialisations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "proper-mailing",
   "metadata": {},
   "outputs": [],
   "source": [
    "TAP_URL = \"http://archive.eso.org/tap_obs\"\n",
    "\n",
    "# Note: The TAP_CAT service (used to query public catalogues) does not need to support authentication\n",
    "\n",
    "# Importing useful packages\n",
    "import os \n",
    "import sys\n",
    "import requests\n",
    "#import cgi\n",
    "import json\n",
    "import time\n",
    "\n",
    "import pyvo\n",
    "from pyvo.dal import tap\n",
    "from pyvo.auth.authsession import AuthSession\n",
    "    \n",
    "# Verify the version of pyvo \n",
    "from pkg_resources import parse_version\n",
    "pyvo_version = parse_version(pyvo.__version__) \n",
    "test_pyvo_version = (pyvo_version == parse_version('1.1') or pyvo_version > parse_version('1.2.1') )\n",
    "if not test_pyvo_version:\n",
    "    print('You are using an unsupported version of pyvo (version={version}).\\nPlease use pyvo v1.1, v1.3, or higher, not v1.2* [ref. pyvo github issue #298]'.format(version=pyvo.__version__))\n",
    "    raise ImportError('The pyvo version you are using is not supported, use 1.3+ or 1.1.')\n",
    "\n",
    "print('\\npyvo version {version} \\n'.format(version=pyvo.__version__))\n",
    "\n",
    "import eso_programmatic as eso"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "simplified-prospect",
   "metadata": {},
   "source": [
    "<span id='getToken'></span>\n",
    "## 1 Authenticating\n",
    "#### Get an ESO token using your ESO credentials\n",
    "With your ESO username and password you can get an authorization token (the *id_token*) using the *getToken()* method (<a href=\"http://archive.eso.org/cms/eso-data/programmatic-access/authentication-and-authorisation.html#getToken\">see it here</a>), which is part of the *eso_programmatic.py* module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ideal-shelf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prompt for user's credentials and get a token\n",
    "import getpass\n",
    "\n",
    "username = input(\"Type your ESO username: \")\n",
    "password = getpass.getpass(prompt=\"%s's password: \" % (username), stream=None)\n",
    "\n",
    "token = eso.getToken(username, password)\n",
    "if token is not None:\n",
    "    print('token: ' + token)\n",
    "else:\n",
    "    sys.exit(-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "funny-graph",
   "metadata": {},
   "source": [
    "<span id='authorised_archive_searches'></span>\n",
    "## 2 Authorised archive searches\n",
    "\n",
    "Before performing authorised archive searches, remember what is written in the documentation page at <em><a href=\"http://archive.eso.org/cms/eso-data/programmatic-access/authentication-and-authorisation.html#which_users\">§1.2.1 Which users should (not) perform authorised data searches?</a></em> Authorised queries are slower than anonymous queries, and only a few users really need that functionality:\n",
    " - *authorised* archive searches are useful only to users with special permissions \n",
    " - a PI of a regular observing programme normally does *not* possess special permissions\n",
    " - authorised queries are slower than anonymous queries, so use them only if you really need them!\n",
    "\n",
    "### 2.1 Set up a Python requests session with an Authorization header\n",
    "Create a python requests session and add your token to its header. You will pass this session to an ESO service when you want to ensure that your own permissions are taken into consideration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "loose-hotel",
   "metadata": {},
   "outputs": [],
   "source": [
    "session = requests.Session()\n",
    "session.headers['Authorization'] = \"Bearer \" + token\n",
    "\n",
    "# Initialise a tap service for authorised queries\n",
    "# passing the created \"tokenised\" session\n",
    "# Remember: passing a non-tokenised session, or no session at all,\n",
    "# will result in tap performing anonymous queries:\n",
    "# none of your permissions will be used, hence the queries will run faster,\n",
    "# but you will not be able to find any file with protected metadata.\n",
    "\n",
    "tap = pyvo.dal.TAPService(TAP_URL, session=session)\n",
    "\n",
    "# for comparison, use: \n",
    "# tap = pyvo.dal.TAPService(TAP_URL) \n",
    "# to execute your queries anonymously"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "seven-intranet",
   "metadata": {},
   "source": [
    "### 2.2 Execute authorised queries \n",
    "Any query you send to the TAP service initialised this way will be \"authorised\", in the sense that your permissions will be taken into consideration.\n",
    "\n",
    "To achieve this, your query gets modified on-the-fly by the TAP software; the resulting SQL query ensures that you retrieve all the records you have been granted access to, including the public ones, and only those. The modified query (which you do not see) is more complex than the one you actually typed, and cannot be as fast.\n",
    "\n",
    "For this reason we suggest running authorised queries asynchronously: this gives the query more execution time and means you do not have to wait on an open connection for its results, which avoids HTTP or application timeouts and possible transient failures.\n",
    "\n",
    "How? Using a TAP job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "arbitrary-magnet",
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the query you want to run, e.g.:\n",
    "query = \"select top 2 * from dbo.raw where dp_cat='SCIENCE' and prog_id = 'your-protected-observing-run' \"\n",
    "\n",
    "# well, in this example we use a non-protected run, \n",
    "# but please pretend it is actually a protected one given the purpose of this notebook!\n",
    "\n",
    "# let's consider only 2 of its science frames:\n",
    "query = \"select top 2 * from dbo.raw where dp_cat='SCIENCE' and prog_id = '098.C-0739(C)' \"\n",
    "\n",
    "\n",
    "results = None\n",
    "\n",
    "# define a job that will run the query asynchronously \n",
    "job = tap.submit_job(query)\n",
    "\n",
    "# extending the maximum duration of the job to 300s (default 60 seconds)\n",
    "job.execution_duration = 300 # max allowed: 3600s\n",
    "\n",
    "# job initially is in phase PENDING; you need to run it and wait for completion: \n",
    "job.run()\n",
    "\n",
    "try:\n",
    "    job.wait(phases=[\"COMPLETED\", \"ERROR\", \"ABORTED\"], timeout=600.)\n",
    "except pyvo.DALServiceError:\n",
    "    print('Exception on JOB {id}: {status}'.format(id=job.job_id, status=job.phase))\n",
    "\n",
    "print(\"Job: %s %s\" %(job.job_id, job.phase))\n",
    "\n",
    "if job.phase == 'COMPLETED':\n",
    "    # When the job has completed, the results can be fetched:\n",
    "    results = job.fetch_result()\n",
    "\n",
    "# the job can be deleted (always a good practice to release the disk space on the ESO servers)\n",
    "job.delete()\n",
    "\n",
    "# Let's print the results to examine the content:\n",
    "# check out the access_url and the datalink_url\n",
    "if results:\n",
    "    print(\"query results:\")\n",
    "    eso.printTableTransposedByTheRecord(results.to_table()) \n",
    "else:\n",
    "    print(\"!\" * 42)\n",
    "    print(\"!                                        !\")\n",
    "    print(\"!       No results could be found.       !\")\n",
    "    print(\"!       ? Perhaps no permissions ?       !\")\n",
    "    print(\"!       Aborting here.                   !\")\n",
    "    print(\"!                                        !\")\n",
    "    print(\"!\" * 42)\n",
    "    quit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "auburn-laser",
   "metadata": {},
   "source": [
    "<span id='downloadURL'></span>\n",
    "## 3 Downloading the selected science files using their access_url"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "loaded-hammer",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The access_url field of the dbo.raw table\n",
    "# provides the link that can be used to download the file\n",
    "\n",
    "# Here we pass that link together with your session\n",
    "# to the downloadURL method of the eso_programmatic.py module\n",
    "# (similarly to the authorised queries, if no session is passed, \n",
    "#  downloadURL will attempt to download the file anonymously)\n",
    "\n",
    "print(\"Start downloading...\")\n",
    "for raw in results:\n",
    "    access_url = raw['access_url'] # the access_url is the link to the raw file\n",
    "    status, filepath = eso.downloadURL(access_url, session=session, dirname=\"/tmp\")\n",
    "    if status==200:\n",
    "        print(\"      RAW: %s downloaded  \"  % (filepath))\n",
    "    else:\n",
    "        print(\"ERROR RAW: %s NOT DOWNLOADED (http status:%d)\"  % (filepath, status))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "capable-malta",
   "metadata": {},
   "source": [
    "## 4 Finding and downloading the associated calibration reference files\n",
    "\n",
    "The datalink service (implementing the VO <a href=\"https://www.ivoa.net/documents/DataLink/20150617\">DataLink</a> protocol) helps you find files related to an input science file (whether raw or product; in this case, a raw file). Let's call that science file THIS. In particular, Datalink can return two lists of calibration reference files that can be used to process THIS:\n",
    " - the list of raw calibration reference files (mode: raw2raw)\n",
    " - the list of processed calibration reference files (mode: raw2master)\n",
    " \n",
    "As a side note, Datalink can also offer access to other related files, e.g.:\n",
    " - products generated out of THIS, \n",
    " - provenance files, i.e., the science files that were used to generate THIS\n",
    " - preview file, a quick look of THIS (for products only)\n",
    " - ancillary files of THIS (e.g. a weightmap of an imaging product) (for products only)\n",
    " - data documentation describing the science aim and the processing applied to THIS (for products only)\n",
    " - night log (for raw files only)\n",
    " - processed quicklook (for raw files only)\n",
    "\n",
    "### 4.1 Find the link to the associated calibration reference files (using DataLink)\n",
    "The <code>datalink_url</code> field of the dbo.raw table\n",
    "provides the link that can be used to find the files associated\n",
    "with the selected science frame.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "described-ancient",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A Python datalink object is created by calling\n",
    "# the pyvo DatalinkResults.from_result_url() method on the datalink_url.\n",
    "\n",
    "# When dealing with files whose metadata are protected, we need to be authorised:\n",
    "# for that we need to pass to the from_result_url() also the above-created python requests session.\n",
    "\n",
    "# For the sake of this example, let's just consider the first science raw frame:\n",
    "first_record = results[0]\n",
    "datalink_url = first_record['datalink_url']\n",
    "\n",
    "datalink = pyvo.dal.adhoc.DatalinkResults.from_result_url(datalink_url, session=session)\n",
    "\n",
    "# The resulting datalink object contains the table of files associated\n",
    "# to SPHER.2016-09-26T03:04:09.308\n",
    "# Note: If this input file had protected metadata (it does not, but suppose...),\n",
    "# and you had not passed your session, or you had no permission to see this file,\n",
    "# DataLink would have returned only a terse table with the message\n",
    "# that you do not have access permissions or that the file does not exist.\n",
    "\n",
    "# let's print the resulting datalink table:\n",
    "eso.printTableTransposedByTheRecord(datalink.to_table())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "outdoor-spelling",
   "metadata": {},
   "source": [
    "As shown above, the Datalink result is a table; each of its records provides a pointer (access_url) to an associated file, or to a service that returns associated files (like the calibration reference files); to distinguish among the records, the <code>semantics</code> column can be used. \n",
    "\n",
    "In this case there are 4 records:\n",
    " - semantics = <code>#this</code> :<br>\n",
    "    -  the first record in any datalink response always describes the input file (THIS) <p><br>\n",
    "    \n",
    " - semantics = <code>http://archive.eso.org/rdf/datalink/eso#calSelector_raw2raw</code> :<br>\n",
    "    -  provides a link (access_url) to the associated raw calibration files <p><br>\n",
    "    \n",
    " - semantics = <code>http://archive.eso.org/rdf/datalink/eso#calSelector_raw2master</code> :<br>\n",
    "    -  provides a link (access_url) to the associated processed calibration files <p><br>\n",
    "    \n",
    " - semantics = <code>http://archive.eso.org/rdf/datalink/eso#night_log</code> :<br>\n",
    "    -  provides a link (access_url) to the associated Night Log report <p><br>\n",
    "    \n",
    "<table>\n",
    "<tr><td style=\"background-color: lightgrey; text-align: left;\"><strong>To know more:</strong><br>\n",
    "  For the two different flavours of calibration files (raw and processed), please refer to the  <a href=\"http://archive.eso.org/cms/application_support/calselectorInfo.html\">documentation page of the calSelector service</a>.\n",
    "</td></tr>\n",
    "<tr><td style=\"background-color: lightgrey; text-align: left;\">\n",
    "   For the description of all possible semantics values, please refer to:\n",
    "   <ul>\n",
    "   <li> <a href=\"http://archive.eso.org/programmatic/rdf/datalink/eso/\">the ESO semantics</a>\n",
    "   <li> <a href=\"http://www.ivoa.net/rdf/datalink/core\">the DataLink VO standard semantics</a>\n",
    "   </ul>\n",
    "</td></tr>\n",
    "</table>\n",
    "\n",
    "\n",
    "Here we want to get the processed calibration files, hence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "congressional-simulation",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's get the link to the processed calibration files (raw2master)\n",
    "\n",
    "semantics = 'http://archive.eso.org/rdf/datalink/eso#calSelector_raw2master'\n",
    "\n",
    "raw2master_url = next(datalink.bysemantics( semantics )).access_url\n",
    "\n",
    "# which returns the calSelector (see next box) link:\n",
    "# https://archive.eso.org/calselector/v1/associations?dp_id=\\\n",
    "#SPHER.2016-09-26T03:04:09.308&mode=Raw2Master&responseformat=votable"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "recent-treasure",
   "metadata": {},
   "source": [
    "### 4.2 Getting the list of processed calibration reference files (using calSelector and DataLink)\n",
    "\n",
    "The automatic selection of calibration files (raw or processed) is performed by the above-mentioned calSelector service, exposed also programmatically.\n",
    "\n",
    "One of the calSelector interfaces (the one selected by the _responseformat=votable_ parameter, which must be present) is fully compatible with the DataLink VO protocol. This means that the same pyvo DatalinkResults.from_result_url() method can be used, e.g., to get the list of associated raw2master files.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "detected-shipping",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Don't forget to pass your session in case the science file has protected metadata!\n",
    "\n",
    "associated_calib_files = pyvo.dal.adhoc.DatalinkResults.from_result_url(raw2master_url, session=session)\n",
    "\n",
    "eso.printTableTransposedByTheRecord(associated_calib_files.to_table())\n",
    "\n",
    "# create and use a mask to get only the #calibration entries,\n",
    "# given that other entries, like #this or ...#sibiling_raw, could be present:\n",
    "calibrator_mask = associated_calib_files['semantics'] == '#calibration'\n",
    "calib_urls = associated_calib_files.to_table()[calibrator_mask]['access_url','eso_category']\n",
    "\n",
    "#eso.printTableTransposedByTheRecord(calib_urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cloudy-thing",
   "metadata": {},
   "source": [
    "#### 4.2.1 Check calibration cascade qualities\n",
    "\n",
    "Check whether the calibration cascade is complete, whether it is certified, and whether it actually refers to processed calibration files.\n",
    "\n",
    "<table>\n",
    "<tr><td style=\"background-color: lightgrey; text-align: left;\"><b>Beware:</b>\n",
    "When executing a request for processed calibrations, you might get back the raw calibrations instead!   \n",
    "This happens when no processed calibrations exist for the given raw frame, in which case the service, so as not to leave you empty-handed, returns the raw calibrations instead.\n",
    "You can check this by reading the calibration cascade description, as shown below.\n",
    "</td></tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "animated-program",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Given the above list of \"associated_calib_files\"\n",
    "# and knowing that we requested...\n",
    "mode_requested = \"raw2master\"\n",
    "\n",
    "# ... let's print out some important info and warnings on the received calibration cascade: \n",
    "# - is the cascade complete? \n",
    "# - is the cascade certified?\n",
    "# - has the cascade been generated for the mode you requested (processed calibrations) or not?\n",
    "\n",
    "# That info is embedded in the description field of the #this record.\n",
    "# We use the printCalselectorInfo function of the eso_programmatic.py module to parse it.\n",
    "\n",
    "this_description=next(associated_calib_files.bysemantics('#this')).description\n",
    "\n",
    "alert, mode_warning, certified_warning = eso.printCalselectorInfo(this_description, mode_requested)\n",
    "\n",
    "if alert!=\"\":\n",
    "    print(\"%s\" % (alert))\n",
    "if mode_warning!=\"\":\n",
    "    print(\"%s\" % (mode_warning))\n",
    "if certified_warning!=\"\":\n",
    "    print(\"%s\" % (certified_warning))\n",
    "    \n",
    "question = None\n",
    "answer = None\n",
    "if len(calib_urls):\n",
    "    print()\n",
    "    if alert or mode_warning or certified_warning:    \n",
    "        question = \"Given the above warning(s), do you still want to download these %d calib files [y/n]? \" %(len(calib_urls))\n",
    "    else:\n",
    "        question = \"No warnings reported, do you want to download these %d calib files [y/n]? \" %(len(calib_urls))\n",
    "\n",
    "    # Keep asking until a valid answer (y or n) is given\n",
    "    while answer != 'y' and answer != 'n':\n",
    "        answer = input(question)\n",
    "else:\n",
    "    # Nothing to download: skip the prompt instead of asking an empty question\n",
    "    print(\"No calibration files found: nothing to download.\")\n",
    "    answer = 'n'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "jewish-symphony",
   "metadata": {},
   "source": [
    "### 4.3 Downloading the calibration reference files\n",
    "\n",
    "To download the calibration files we use again the <code>downloadURL</code> method of the <code>eso_programmatic.py</code> module.\n",
    "\n",
    "All ESO calibration files are open to the public, hence there is no need to pass your token/session."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "strong-helen",
   "metadata": {},
   "outputs": [],
   "source": [
    "if answer == 'y':\n",
    "    print(\"Downloading the %d calibration reference files...\" % (len(calib_urls)) )\n",
    "\n",
    "    i_calib=0\n",
    "    for url,category in calib_urls:\n",
    "        i_calib+=1\n",
    "        status, filename = eso.downloadURL(url)\n",
    "        if status==200:\n",
    "            print(\"    CALIB: %4d/%d dp_id: %s (%s) downloaded\"  % (i_calib, len(calib_urls), filename, category))\n",
    "        else:\n",
    "            print(\"    CALIB: %4d/%d dp_id: %s (%s) NOT DOWNLOADED (http status:%d)\"  % (i_calib, len(calib_urls), filename, category, status))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "living-sherman",
   "metadata": {},
   "source": [
    "### 4.4 Getting the Association Tree describing the relations among the science frame and calibration files\n",
    "\n",
    "You might have spotted above that <code>associated_calib_files</code>, obtained by invoking the raw2master_url, provides not only the <code>#calibration</code> entries, but also an entry for the association tree.\n",
    "\n",
    "<code>Association Tree :== file describing the relations among the input raw frame(s)\n",
    "                           and the calibration files (in custom XML format)</code>\n",
    "\n",
    "You can use its semantics to find its access_url, as shown here below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "british-integer",
   "metadata": {},
   "outputs": [],
   "source": [
    "association_tree_semantics = 'http://archive.eso.org/rdf/datalink/eso#calSelector_raw2master'\n",
    "\n",
    "# Notice that the datalink service and the calselector service use the same semantics\n",
    "# to indicate two different things:\n",
    "# - in datalink: it points to the distinct list of calibration reference files (responseformat=votable);\n",
    "#                its eso_category is not defined\n",
    "# - in calselector: it points to the calibration cascade description (format still XML but not votable);\n",
    "#                its eso_category is set to \"ASSOCIATION_TREE\"\n",
    "\n",
    "association_tree_mask = associated_calib_files['semantics'] == association_tree_semantics\n",
    "association_tree = associated_calib_files.to_table()[association_tree_mask]['access_url','eso_category']\n",
    "\n",
    "for url, category in association_tree:\n",
    "    # the url points to the calselector service, which, for metadata protected files, needs a tokenised-session\n",
    "    status, filename = eso.downloadURL(url, session=session)\n",
    "    print(url)\n",
    "    if status == 200:\n",
    "        print(\"  Association tree: %s (%s) downloaded\"  % (filename, category))\n",
    "    else:\n",
    "        print(\"  Association tree: %s (%s) NOT DOWNLOADED (http status:%d)\"  % (filename, category, status))\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}