{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to search for ancillary files\n",
    "## Ancillary files: files such as weight maps, previews, spectra, etc., that are associated with the science data products stored in the ESO Science Archive.\n",
    "\n",
    "<p><em>This section of the <a href=\"https://archive.eso.org/programmatic/HOWTO/\">\"ESO Science Archive Programmatic: HOWTOs\"</a> shows how to use the <code>phase3v2.product_files</code> table via TAP (tap_obs) to programmatically search for, and eventually download, ancillary files. After the columns of that table are described, a few query examples are provided.</p>\n",
    "\n",
    "<p><strong>Usage</strong>: You can access this file either as a static HTML page <a href=\"https://archive.eso.org/programmatic/HOWTO/jupyter/query-ancillary-files/query-ancillary-files.html\">(download it here)</a>, or as an interactive Jupyter notebook <a href=\"https://archive.eso.org/programmatic/HOWTO/jupyter/query-ancillary-files/query-ancillary-files.ipynb\">(download it here)</a> which you can download and run on your machine <a href=\"https://jupyter.org/install\">(instructions)</a>. To interact with the Jupyter notebook (if you have downloaded it): move up and down through the cells using the arrow keys, and execute the code by pressing CTRL+ENTER; you can also modify the code and execute it at will.</em></p>\n",
    "\n",
    "\n",
    "### Introduction\n",
    "\n",
    "Users can browse and download so-called science-ready data products (products from now on) from the ESO Science Archive.\n",
    "\n",
    "Products have instrument and atmospheric signatures removed, are possibly calibrated in physical units, and have noise properties (like limiting magnitude or signal-to-noise ratio) quantified and documented.\n",
    "\n",
    "A product consists of exactly one mandatory science file and, optionally, a number (0-N) of ancillary files, typically created at the same time as the product itself.\n",
    "\n",
    "Examples of ancillary files are: weight maps accompanying images, white-light images (2D images obtained by averaging a data cube along the wavelength axis), tarballs, previews, telluric spectra, masks, etc. For the full list, please see the <a href=\"https://www.eso.org/sci/observing/phase3/p3sdpstd.pdf\">ESO Science Data Product standard</a>.\n",
    "\n",
    "It is useful at times to be able to search through the existing ancillary files. Examples of this are provided in what follows. \n",
    "\n",
    "\n",
    "Table of contents:\n",
    "   * Initialisation step (using pyvo)\n",
    "   * The phase3v2.product_files columns\n",
    "   * Example 1: How to find all the files belonging to a given product, and their metadata\n",
    "   * Example 2: The ESPRESSO ancillary files\n",
    "   * Example 3: Getting all the ANCILLARY.PREVIEW of the MUSE collection\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pyvo             version 1.4\n"
     ]
    }
   ],
   "source": [
    "import os \n",
    "import sys\n",
    "\n",
    "import pyvo as vo\n",
    "\n",
    "# Verify the version of pyvo \n",
    "from pkg_resources import parse_version\n",
    "if parse_version(vo.__version__) < parse_version('1.4'):\n",
    "    raise ImportError('pyvo version must be 1.4 or higher')\n",
    "    \n",
    "print('pyvo             version {version}'.format(version=vo.__version__))\n",
    "\n",
    "# Defining the ESO tap service to query for phase 3 products:\n",
    "tap = vo.dal.TAPService(\"https://archive.eso.org/tap_obs\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## The  `phase3v2.product_files` columns\n",
    "\n",
    "A data product contains exactly one science file and zero or more ancillary files. The `phase3v2.product_files` table lists the file components of all Phase 3 data products. For each file, its category, estimated size, file extension, original file name, and access point are provided.\n",
    "\n",
    "\n",
    "| Column            | Description                                                                             |\n",
    "|-------------------|-----------------------------------------------------------------------------------------|\n",
    "| product_id        | ESO identifier of a published phase 3 product. A product is composed of a science file, and optionally of a number of ancillary files. Same as the dp_id field in ivoa.ObsCore. |\n",
    "| archive_id        | ESO identifier of a file belonging to the product: either a science file (eso_category: \"SCIENCE.*\") or an ancillary file (eso_category: \"ANCILLARY.*\"). If a science file, its id is the same as the id of the product (product_id=archive_id). Ancillary files are not listed in the ivoa.ObsCore table. |\n",
    "| eso_category      | ESO file category; it starts with \"SCIENCE.\" or \"ANCILLARY.\" followed by one or more dot-separated and uppercased tokens describing the file at hand. For the full list of categories, please refer to the ESO Science Data Product standard available at: https://www.eso.org/sci/observing/phase3.html |\n",
    "| extension         | File name extension (upper case) of the product file, e.g.: FITS, PNG, TAR, FZ, etc. |\n",
    "| original_filename | Original name of the product file before ingestion in the ESO archive. It may provide useful hints on the file content (usually described in the release description of the product, available at: https://archive.eso.org/wdb/wdb/adp/phase3_main/query?dp_id=here_the_product_id). It is not an identifier, as in general, there could be multiple files with the same original_filename but different archive_ids. |\n",
    "| access_url        | The download link of the individual archive_id file. |\n",
    "| access_estsize    | Estimated size of the downloaded file in KBytes. It is only \"estimated\" because FITS headers can be patched at download time, so the file size can vary with time. |\n",
    "| internal_file_id  | Internal file identification number, useful in combination with the provenance table. |\n",
    "\n",
    "This table can be joined with the ivoa.ObsCore table by matching the product_id with the obscore.dp_id. It can be used, for example, to find all the ancillary files of a certain category for a given instrument.\n",
    "\n",
    "_Note: This table does not cover products that are obsolete or deprecated; the ancillary files of those can only be found using DataLink, as in this example of the obsolete VMC science image whose product_id is ADP.2013-06-20T17:15:42.690, and whose DataLink is: \n",
    "http://archive.eso.org/datalink/links?ID=ivo://eso.org/ID?ADP.2013-06-20T17:15:42.690_ Therein, the records related to ancillary files have semantics set to '#auxiliary'.\n"
   ]
  },
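  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The columns described above can be combined into a small query builder. What follows is only a sketch: `build_files_query` is a hypothetical helper name (it is not part of pyvo or of the ESO services), and only the table and column names are taken from the table above.\n",
    "\n",
    "```python\n",
    "def build_files_query(product_id, category=None):\n",
    "    # Hypothetical helper: it only assembles an ADQL string, it does not contact the archive.\n",
    "    # ADQL escapes a single quote inside a string literal by doubling it.\n",
    "    pid = product_id.replace(\"'\", \"''\")\n",
    "    query = (\n",
    "        \"SELECT archive_id, eso_category, extension, access_url, access_estsize \"\n",
    "        \"FROM phase3v2.product_files \"\n",
    "        f\"WHERE product_id = '{pid}'\"\n",
    "    )\n",
    "    if category is not None:\n",
    "        cat = category.replace(\"'\", \"''\")\n",
    "        query += f\" AND eso_category = '{cat}'\"\n",
    "    return query\n",
    "\n",
    "\n",
    "print(build_files_query('ADP.2024-03-19T05:58:04.407', 'ANCILLARY.PREVIEW'))\n",
    "```\n",
    "\n",
    "The returned string can be passed to `tap.search()` exactly like the hand-written queries in the examples that follow."
   ]
  },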
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1: How to find all the files belonging to a given product"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=6>\n",
      "         archive_id         extension        eso_category       \n",
      "           object             object            object          \n",
      "--------------------------- --------- --------------------------\n",
      "ADP.2024-03-19T05:58:04.407      FITS           SCIENCE.CUBE.IFS\n",
      "ADP.2024-03-19T05:58:04.408      FITS ANCILLARY.IMAGE.WHITELIGHT\n",
      "ADP.2024-03-19T05:58:04.409       LOG           ANCILLARY.README\n",
      "ADP.2024-03-19T05:58:04.410       PNG          ANCILLARY.PREVIEW\n",
      "ADP.2024-03-19T05:58:04.411       PNG          ANCILLARY.PREVIEW\n",
      "ADP.2024-03-19T05:58:04.412       PNG          ANCILLARY.PREVIEW\n"
     ]
    }
   ],
   "source": [
    "## Given a certain product, how do I find all the files belonging to it?\n",
    "# Suppose that the ESO identifier of a product is `ADP.2024-03-19T05:58:04.407`\n",
    "# To find all the files that belong to it, issue the following query:\n",
    "\n",
    "query=\"\"\"SELECT archive_id, extension, eso_category\n",
    "FROM phase3v2.product_files \n",
    "where product_id='ADP.2024-03-19T05:58:04.407'\n",
    "\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=6>\n",
      "         archive_id                                       access_url                               access_estsize\n",
      "                                                                                                       kbyte     \n",
      "           object                                           object                                     int64     \n",
      "--------------------------- ---------------------------------------------------------------------- --------------\n",
      "ADP.2024-03-19T05:58:04.407 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.407        2968672\n",
      "ADP.2024-03-19T05:58:04.408 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.408            529\n",
      "ADP.2024-03-19T05:58:04.409 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.409            241\n",
      "ADP.2024-03-19T05:58:04.410 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.410            340\n",
      "ADP.2024-03-19T05:58:04.411 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.411            341\n",
      "ADP.2024-03-19T05:58:04.412 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.412            335\n"
     ]
    }
   ],
   "source": [
    "# Use the `access_url` and the `access_estsize` columns\n",
    "# to get the estimated size in kbytes and the download link of each file.\n",
    "\n",
    "query=\"\"\"SELECT archive_id, access_url, access_estsize\n",
    "FROM phase3v2.product_files \n",
    "where product_id='ADP.2024-03-19T05:58:04.407'\n",
    "\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=6>\n",
      "         archive_id                              original_filename                     \n",
      "           object                                      object                          \n",
      "--------------------------- -----------------------------------------------------------\n",
      "ADP.2024-03-19T05:58:04.407 MU_SCBY_3442284_2024-03-07T08:09:46.977_WFM-NOAO-N_SKY.fits\n",
      "ADP.2024-03-19T05:58:04.408 MU_SIMY_3442284_2024-03-07T08:09:46.977_WFM-NOAO-N_SKY.fits\n",
      "ADP.2024-03-19T05:58:04.409                      r.MUSE.2024-03-07T08:09:46.977_tpl.log\n",
      "ADP.2024-03-19T05:58:04.410                      r.MUSE.2024-03-07T08:09:46.977_tpl.png\n",
      "ADP.2024-03-19T05:58:04.411                      r.MUSE.2024-03-07T08:09:46.977_pst.png\n",
      "ADP.2024-03-19T05:58:04.412                      r.MUSE.2024-03-07T08:36:46.807_pst.png\n"
     ]
    }
   ],
   "source": [
    "# Use the `original_filename` column\n",
    "# to get the original name the file had before ingestion in the ESO Science Archive\n",
    "\n",
    "query=\"\"\"SELECT archive_id, original_filename\n",
    "FROM phase3v2.product_files \n",
    "where product_id='ADP.2024-03-19T05:58:04.407'\n",
    "\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: you could get all those columns at once in a single query,\n",
    "# while here above the query was split in 3 parts, as large tables \n",
    "# do not display well in a jupyter notebook. \n",
    "\n",
    "query = \"\"\"SELECT * \n",
    "FROM phase3v2.product_files \n",
    "where product_id='ADP.2024-03-19T05:58:04.407'\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## Example 2: The ESPRESSO ancillary files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=18>\n",
      "         archive_id                           original_filename                          eso_category     \n",
      "           object                                   object                                  object        \n",
      "--------------------------- ------------------------------------------------------ -----------------------\n",
      "ADP.2021-04-12T12:27:33.887 ES_SOBF_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits        SCIENCE.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.888 ES_SFLA_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.889 ES_SFLA_2174806_2018-09-03T09:22:49.937_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.890 ES_SFLA_2174806_2018-09-03T09:33:56.370_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.891 ES_SFLA_2174806_2018-09-03T09:45:02.953_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.892 ES_SFLA_2174806_2018-09-03T09:56:07.370_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.893 ES_SFLB_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.894 ES_SFLB_2174806_2018-09-03T09:22:49.937_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.895 ES_SFLB_2174806_2018-09-03T09:33:56.370_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.896 ES_SFLB_2174806_2018-09-03T09:45:02.953_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.897 ES_SFLB_2174806_2018-09-03T09:56:07.370_HR_2x1_U3.fits      ANCILLARY.SPECTRUM\n",
      "ADP.2021-04-12T12:27:33.898 ES_SCCA_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits           ANCILLARY.CCF\n",
      "ADP.2021-04-12T12:27:33.899 ES_SCCA_2174806_2018-09-03T09:22:49.937_HR_2x1_U3.fits           ANCILLARY.CCF\n",
      "ADP.2021-04-12T12:27:33.900 ES_SCCA_2174806_2018-09-03T09:33:56.370_HR_2x1_U3.fits           ANCILLARY.CCF\n",
      "ADP.2021-04-12T12:27:33.901 ES_SCCA_2174806_2018-09-03T09:45:02.953_HR_2x1_U3.fits           ANCILLARY.CCF\n",
      "ADP.2021-04-12T12:27:33.902 ES_SCCA_2174806_2018-09-03T09:56:07.370_HR_2x1_U3.fits           ANCILLARY.CCF\n",
      "ADP.2021-04-12T12:27:33.903  ES_SOBF_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.tar ANCILLARY.2DECHELLE.TAR\n",
      "ADP.2021-04-12T12:27:33.904           r.ESPRE.2018-09-03T09:11:43.369_com_0000.png       ANCILLARY.PREVIEW\n"
     ]
    }
   ],
   "source": [
    "# The `original_filename` column can be useful to isolate specific ancillary files in a given collection.\n",
    "# Let's take as an example an ESPRESSO stacked 1d spectrum: `ADP.2021-04-12T12:27:33.887`\n",
    "\n",
    "query=\"\"\"SELECT archive_id, original_filename, eso_category\n",
    "from phase3v2.product_files\n",
    "where product_id='ADP.2021-04-12T12:27:33.887' \n",
    "\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=5>\n",
      "         archive_id                           original_filename                   \n",
      "           object                                   object                        \n",
      "--------------------------- ------------------------------------------------------\n",
      "ADP.2021-04-12T12:27:33.888 ES_SFLA_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits\n",
      "ADP.2021-04-12T12:27:33.889 ES_SFLA_2174806_2018-09-03T09:22:49.937_HR_2x1_U3.fits\n",
      "ADP.2021-04-12T12:27:33.890 ES_SFLA_2174806_2018-09-03T09:33:56.370_HR_2x1_U3.fits\n",
      "ADP.2021-04-12T12:27:33.891 ES_SFLA_2174806_2018-09-03T09:45:02.953_HR_2x1_U3.fits\n",
      "ADP.2021-04-12T12:27:33.892 ES_SFLA_2174806_2018-09-03T09:56:07.370_HR_2x1_U3.fits\n"
     ]
    }
   ],
   "source": [
    "# As of 2024, the ESPRESSO pipeline stores measured radial velocities in the header of the individual\n",
    "# non-stacked 1D spectra. It does not report any estimator of the radial velocity in the stacked spectra.\n",
    "# A stacked product stores, along with the science file, many ancillary files; among them, the object and the \n",
    "# sky (or Fabry-Perot [FP]) spectra that contributed to the product are stored in the ANCILLARY.SPECTRUM files. \n",
    "# Therefore, to get to the radial velocity measurements of a stacked spectral product, one has to get to\n",
    "# the headers of the ANCILLARY.SPECTRUM files taken on source (fiber A), excluding the ones on sky/FP (fiber B).\n",
    "# The fiber A spectra can be recognised by their original_filename, which starts with the string\n",
    "# ES_SFLA (as opposed to ES_SFLB for the fiber B spectra).\n",
    "# Hence, the query to perform is the following:\n",
    "\n",
    "query=\"\"\"SELECT archive_id, original_filename\n",
    "from phase3v2.product_files\n",
    "where product_id='ADP.2021-04-12T12:27:33.887'\n",
    "and eso_category='ANCILLARY.SPECTRUM'\n",
    "and original_filename like 'ES_SFLA%'\n",
    "\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)\n",
    "\n",
    "# How would you know all the above?\n",
    "# The nomenclature and description of those original filenames\n",
    "# can be found in the data documentation of the ESPRESSO collection. A link to it\n",
    "# is available, as described above (see the original_filename description), at the address:\n",
    "# https://archive.eso.org/wdb/wdb/adp/phase3_main/query?dp_id=ADP.2021-04-12T12:27:33.887\n",
    "# where the link to the release description is provided,\n",
    "# i.e., https://www.eso.org/rm/api/v1/public/releaseDescriptions/176"
   ]
  },
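  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `LIKE 'ES_SFLA%'` condition used above can also be reproduced client-side, for instance when all the files of a product were already retrieved with a single query. In ADQL, `%` matches any sequence of characters and `_` matches any single character, so the pattern is slightly more permissive than a literal prefix. A minimal sketch, using a few of the original file names listed in the output above; the helper name `is_fiber_a` is hypothetical.\n",
    "\n",
    "```python\n",
    "from fnmatch import fnmatchcase\n",
    "\n",
    "# A few original_filename values taken from the ESPRESSO product listed above\n",
    "filenames = [\n",
    "    'ES_SOBF_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits',\n",
    "    'ES_SFLA_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits',\n",
    "    'ES_SFLB_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits',\n",
    "    'ES_SCCA_2174806_2018-09-03T09:11:43.369_HR_2x1_U3.fits',\n",
    "]\n",
    "\n",
    "\n",
    "def is_fiber_a(name):\n",
    "    # Shell-style equivalent of the ADQL pattern 'ES_SFLA%':\n",
    "    # '?' plays the role of '_' and '*' the role of '%'.\n",
    "    return fnmatchcase(name, 'ES?SFLA*')\n",
    "\n",
    "\n",
    "fiber_a = [name for name in filenames if is_fiber_a(name)]\n",
    "print(fiber_a)\n",
    "```\n",
    "\n",
    "Only the fiber A spectrum is kept; the science file, the fiber B spectrum and the CCF file are discarded."
   ]
  },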
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 3: Getting all the ANCILLARY.PREVIEW of the MUSE collection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=3>\n",
      "         product_id                  archive_id                                       access_url                              \n",
      "           object                      object                                           object                                \n",
      "--------------------------- --------------------------- ----------------------------------------------------------------------\n",
      "ADP.2016-06-01T14:07:30.134 ADP.2016-06-01T14:07:30.137 https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.137\n",
      "ADP.2016-06-01T14:07:30.134 ADP.2016-06-01T14:07:30.138 https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.138\n",
      "ADP.2016-06-01T14:07:30.134 ADP.2016-06-01T14:07:30.139 https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.139\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/amicol/miniconda3/lib/python3.8/site-packages/pyvo/dal/query.py:324: DALOverflowWarning: Partial result set. Potential causes MAXREC, async storage space, etc.\n",
      "  warn(\"Partial result set. Potential causes MAXREC, async storage space, etc.\",\n"
     ]
    }
   ],
   "source": [
    "# A user is interested in getting the access_url of all preview files of the MUSE collection.\n",
    "# The query below returns that information; a detailed explanation follows in the next cells.\n",
    "\n",
    "query = \"\"\"SELECT product_id,archive_id, product_files.access_url\n",
    "           from phase3v2.product_files product_files\n",
    "           inner join ivoa.ObsCore obscore on obscore.dp_id = product_files.product_id\n",
    "           where obscore.obs_collection = 'MUSE'\n",
    "           and eso_category='ANCILLARY.PREVIEW'\"\"\"\n",
    "\n",
    "# Showing here only the first 3 previews as an example of what the query returns.\n",
    "# Because maxrec truncates the result, pyvo issues a DALOverflowWarning, which is expected here:\n",
    "res = tap.search(query, maxrec=3)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=1>\n",
      "           dp_id           \n",
      "           object          \n",
      "---------------------------\n",
      "ADP.2024-11-19T12:54:21.960\n"
     ]
    }
   ],
   "source": [
    "# Explanation of the above MUSE query\n",
    "\n",
    "# The ivoa.ObsCore table stores the metadata of all the Phase 3 data products (+ ALMA products).\n",
    "# As an example, the following query returns the id of one product from the MUSE collection:\n",
    "query = \"\"\"SELECT top 1 dp_id from ivoa.ObsCore where obs_collection='MUSE' order by dp_id desc\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=3>\n",
      "         product_id                  archive_id                                       access_url                              \n",
      "           object                      object                                           object                                \n",
      "--------------------------- --------------------------- ----------------------------------------------------------------------\n",
      "ADP.2024-03-19T05:58:04.407 ADP.2024-03-19T05:58:04.410 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.410\n",
      "ADP.2024-03-19T05:58:04.407 ADP.2024-03-19T05:58:04.411 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.411\n",
      "ADP.2024-03-19T05:58:04.407 ADP.2024-03-19T05:58:04.412 https://dataportal.eso.org/dataPortal/file/ADP.2024-03-19T05:58:04.412\n"
     ]
    }
   ],
   "source": [
    "# You already know how to query the `phase3v2.product_files` table to find all the files\n",
    "# that belong to a given product (here the MUSE product used in Example 1),\n",
    "# and how to restrict the result to just the ANCILLARY.PREVIEW files:\n",
    "\n",
    "query = \"\"\" SELECT product_id,archive_id, product_files.access_url \n",
    "            from phase3v2.product_files product_files\n",
    "            where product_id='ADP.2024-03-19T05:58:04.407' \n",
    "            and eso_category='ANCILLARY.PREVIEW'\n",
    "            \"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=1>\n",
      "      from_table       from_column target_table target_column\n",
      "        object            object      object        object   \n",
      "---------------------- ----------- ------------ -------------\n",
      "phase3v2.product_files  product_id ivoa.ObsCore         dp_id\n"
     ]
    }
   ],
   "source": [
    "# You could combine the above two queries into a single one,\n",
    "# joining the two tables ivoa.ObsCore and phase3v2.product_files.\n",
    "\n",
    "# To find out the correct joining condition, we query the standard TAP_SCHEMA keys and key_columns tables:\n",
    "query = \"\"\"SELECT from_table, from_column, target_table, target_column\n",
    "        from keys, key_columns\n",
    "        where keys.key_id=key_columns.key_id\n",
    "        and target_table='ivoa.ObsCore' and from_table='phase3v2.product_files'\"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The join is therefore to be performed via the:\n",
    "#     ivoa.ObsCore dp_id primary key,\n",
    "# and the \n",
    "#     phase3v2.product_files product_id foreign key,\n",
    "# using the syntax:\n",
    "#\n",
    "#     inner join ivoa.ObsCore obscore on obscore.dp_id = product_files.product_id\n",
    "#\n",
    "# and so obtaining the query:\n",
    "\n",
    "query = \"\"\"SELECT product_id,archive_id, original_filename \n",
    "           from phase3v2.product_files product_files\n",
    "           inner join ivoa.ObsCore obscore on obscore.dp_id = product_files.product_id\n",
    "           where obscore.obs_collection = 'MUSE'\n",
    "           and eso_category='ANCILLARY.PREVIEW'\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Disk space required to download all the MUSE previews"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Table length=1>\n",
      "num_files  size_gb \n",
      "  int32    float64 \n",
      "--------- ---------\n",
      "    64782 16.777803\n"
     ]
    }
   ],
   "source": [
    "# If we want to download all the MUSE previews, we need some disk space available...\n",
    "# How much disk space is needed? Let's make a query to discover that:\n",
    "\n",
    "query = \"\"\"SELECT count(*) num_files, sum(product_files.access_estsize)/1000./1000. size_GB\n",
    "           from phase3v2.product_files product_files\n",
    "           inner join ivoa.ObsCore obscore on obscore.dp_id = product_files.product_id\n",
    "           where obscore.obs_collection = 'MUSE'\n",
    "           and eso_category='ANCILLARY.PREVIEW'\n",
    "           \"\"\"\n",
    "res = tap.search(query)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "#\n",
    "# The estimate is about 16.8 GB: be sure to have at least 17 GB of disk space available before starting the download.\n",
    "#"
   ]
  },
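  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The size estimate can be compared with the free space on the local disk before a bulk download is started. A minimal sketch using only the Python standard library: the sizes are the `access_estsize` values (in kbytes) of the six files listed in Example 1, and the helper name `kb_to_gb` is hypothetical.\n",
    "\n",
    "```python\n",
    "import shutil\n",
    "\n",
    "# access_estsize values (kbytes) of the six files of the product shown in Example 1\n",
    "sizes_kb = [2968672, 529, 241, 340, 341, 335]\n",
    "\n",
    "\n",
    "def kb_to_gb(size_kb):\n",
    "    # Same decimal convention as the query above: 1 GB = 1000 * 1000 kbytes\n",
    "    return size_kb / 1000.0 / 1000.0\n",
    "\n",
    "\n",
    "needed_gb = kb_to_gb(sum(sizes_kb))\n",
    "# Free space (in bytes) on the disk hosting the current directory, converted to GB\n",
    "free_gb = shutil.disk_usage('.').free / 1e9\n",
    "\n",
    "print(f'needed: {needed_gb:.3f} GB, free: {free_gb:.1f} GB')\n",
    "if needed_gb > free_gb:\n",
    "    print('Not enough disk space for this download')\n",
    "```\n",
    "\n",
    "For a whole collection, `needed_gb` can be replaced with the `size_gb` value returned by the query above."
   ]
  },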
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.137'\n",
      " 'https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.138'\n",
      " 'https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.139'\n",
      " 'https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.143'\n",
      " 'https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.144'\n",
      " 'https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.145']\n"
     ]
    }
   ],
   "source": [
    "# The query below shows how to get the links to the ancillary files.\n",
    "# It is left to the user to actually perform the download using those links.\n",
    "\n",
    "# As an example, the query returns only the first 6 links:\n",
    "query = \"\"\"SELECT top 6 product_files.access_url as url\n",
    "           from phase3v2.product_files product_files\n",
    "           inner join ivoa.ObsCore obscore on obscore.dp_id = product_files.product_id\n",
    "           where obscore.obs_collection = 'MUSE'\n",
    "           and eso_category='ANCILLARY.PREVIEW'\"\"\"\n",
    "\n",
    "res = tap.search(query)\n",
    "\n",
    "print(res['url'])"
   ]
  },
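  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The links returned above can be fetched with any HTTP client. What follows is a minimal sketch using only the Python standard library, under the assumption that the files are publicly accessible without authentication. The helper names `local_name` and `download` are hypothetical, and the local file name is simply derived from the archive_id at the end of each link (with the colons replaced, since they are not allowed in file names on some systems).\n",
    "\n",
    "```python\n",
    "import os\n",
    "import shutil\n",
    "import urllib.request\n",
    "\n",
    "\n",
    "def local_name(url):\n",
    "    # The last path component of the access_url is the archive_id of the file\n",
    "    archive_id = url.rstrip('/').rsplit('/', 1)[-1]\n",
    "    return archive_id.replace(':', '_')\n",
    "\n",
    "\n",
    "def download(url, dirname='.'):\n",
    "    # Stream the response to disk without loading the whole file in memory\n",
    "    path = os.path.join(dirname, local_name(url))\n",
    "    with urllib.request.urlopen(url) as response, open(path, 'wb') as out:\n",
    "        shutil.copyfileobj(response, out)\n",
    "    return path\n",
    "\n",
    "\n",
    "url = 'https://dataportal.eso.org/dataPortal/file/ADP.2016-06-01T14:07:30.137'\n",
    "print(local_name(url))\n",
    "# download(url)  # uncomment to actually fetch the file\n",
    "```\n",
    "\n",
    "The file saved this way has no extension; the `extension` column of `phase3v2.product_files` can be added to the query and appended to the local name."
   ]
  },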
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A complete download workflow is beyond the scope of this notebook; please refer to the other Jupyter notebooks of this HOWTO series."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}