astroquery: ESO retrieve data - Files keep failing

Hi,

I am attempting to download some ESPRESSO data from ESO’s archive via the command line using astroquery. To do so, I did the following (as per ESO Support’s suggestion):

  1. Install astroquery.
  2. Find its location in the Python path (look for the directory site-packages/astroquery/eso).
  3. Create a parallel directory astroquery/esocas.
  4. Copy the content of astroquery/eso/ into astroquery/esocas.
  5. Modify astroquery/esocas by editing both core.py and __init__.py, replacing every occurrence of wdb/wdb/eso with wdb/wdb/cas.
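The copy-and-edit steps above can also be scripted, which makes them easier to repeat after an astroquery upgrade. The following is only a minimal sketch: the helper name `clone_with_replacement` is hypothetical, and it assumes the two files to edit are `core.py` and `__init__.py` as described in step 5.

```python
import shutil
from pathlib import Path


def clone_with_replacement(src_dir, dst_dir, old, new,
                           filenames=("core.py", "__init__.py")):
    """Copy src_dir to dst_dir, then replace `old` with `new` in the given files.

    Hypothetical helper (not part of astroquery). Returns a dict mapping each
    edited filename to the number of occurrences that were replaced.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # copytree raises FileExistsError if dst_dir already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(src_dir, dst_dir)
    replaced = {}
    for name in filenames:
        path = dst_dir / name
        text = path.read_text(encoding="utf-8")
        replaced[name] = text.count(old)
        path.write_text(text.replace(old, new), encoding="utf-8")
    return replaced


# Possible usage (assumes astroquery is installed):
#   import astroquery.eso
#   src = Path(astroquery.eso.__file__).parent
#   clone_with_replacement(src, src.parent / "esocas",
#                          "wdb/wdb/eso", "wdb/wdb/cas")
```

The returned counts give a quick check that the replacement actually matched something in each file.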

To query the ESO server, I use:

from astroquery.esocas import Eso

eso = Eso()

eso.ROW_LIMIT = -1

eso.USERNAME = username

eso.login(username, store_password=True)

query_results = eso.query_instrument('ESPRESSO', column_filters={'night': NIGHT, 'dp_cat': 'SCIENCE'}, cache=False)

data_files = eso.retrieve_data(query_results['DP.ID'], with_calib='raw', destination=DATA_ROOT, request_all_objects=True)

The problem I am facing is that sometimes the download fails on specific files when doing the above, although I am able to download them when I use the web interface.

For example, when selecting NIGHT = '2018-09-01', the download consistently failed at the file “ESPRE.2018-09-05T18:26:46.039.fits.Z”, across multiple attempts spread over several hours of the same night.

Here is the error:

Traceback (most recent call last):

  File "dl_from_eso_archive.py", line 176, in <module>

    main()

  File "dl_from_eso_archive.py", line 121, in main

    data_files = eso.retrieve_data(query_results['DP.ID'], with_calib='raw', destination=DATA_ROOT, request_all_objects=True)

  File "/opt/anaconda3/envs/esoPy/lib/python3.7/site-packages/astroquery/esocas/core.py", line 720, in retrieve_data

    state = root.select('span[id=requestState]')[0].text

IndexError: list index out of range
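Since the failure above aborts the whole batch, one possible workaround on the user side is to request the files one at a time and record which ones fail. This is only a sketch: `retrieve_each` and the `fetch` callable are hypothetical names, and the choice of exceptions to catch is an assumption based on the traceback shown here.

```python
def retrieve_each(dataset_ids, fetch, errors=(IndexError, OSError)):
    """Call fetch(dataset_id) for each ID, collecting successes and failures.

    Hypothetical helper: `fetch` is any callable that downloads one dataset
    and returns something (e.g. a file path). Exceptions listed in `errors`
    are recorded instead of stopping the loop.
    """
    succeeded, failed = {}, {}
    for dataset_id in dataset_ids:
        try:
            succeeded[dataset_id] = fetch(dataset_id)
        except errors as exc:
            # Keep a readable description so the failures can be reported later.
            failed[dataset_id] = repr(exc)
    return succeeded, failed


# Possible usage with the objects from the question (assumption: retrieve_data
# accepts a one-element list of dataset IDs):
#   ok, bad = retrieve_each(
#       query_results['DP.ID'],
#       lambda i: eso.retrieve_data([i], destination=DATA_ROOT),
#   )
#   print("failed:", bad)
```

Requesting one file per call may be slower and may behave differently with `with_calib='raw'`, so this is better suited to isolating the problematic files than to routine bulk downloads.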

I also had cases where the downloaded file was not a fits.Z file, but a bank login webpage, so I guess the connection somehow timed out?
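To spot the case where a web page was saved in place of the data, the first bytes of each downloaded file can be inspected. The sketch below relies on well-known signatures (Unix `compress` files start with bytes `1F 9D`, gzip files with `1F 8B`, and an uncompressed FITS file with `SIMPLE  =`); the function name `classify_download` is hypothetical.

```python
def classify_download(path):
    """Guess what a downloaded file contains from its first bytes.

    Hypothetical helper. Returns one of:
    'compress', 'gzip', 'fits', 'html', 'unknown'.
    """
    with open(path, "rb") as fh:
        head = fh.read(512)
    if head[:2] == b"\x1f\x9d":
        return "compress"  # Unix compress (.Z)
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    if head.startswith(b"SIMPLE  ="):
        return "fits"  # uncompressed FITS primary header
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype", b"<html")):
        return "html"  # likely a login or error page
    return "unknown"
```

Any file classified as `'html'` is a candidate for re-download after logging in again.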

Any idea what is happening?

PS: I am not sure whether it helps or causes issues, but I normally run the code inside “screen”.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 25 (11 by maintainers)

Most upvoted comments

Hi @bsipocz, I am sorry, I thought I had answered this already. Upgrading with pip solved my problem, thanks!

Cheers, Jorge

@keflavich

Yes, I agree. I only have a couple of suggestions that I believe might help: a) setting up a way to verify periodically, e.g. every 15/30 min, that the connection has not timed out (more difficult to implement); b) adding an option to the ‘retrieve_data’ function so that it only retrieves the list of files and the request number instead of the files themselves. This would make it possible to download the datasets at a later time without making a new dataset request (useful in the case of timeouts).

Thank you for all the help! Cheers, Jorge
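Until something like suggestion (b) exists in the library, part of it can be approximated on the user side by saving the list of requested dataset IDs to disk and later computing which ones are still missing from the destination directory. This is a rough sketch under stated assumptions: the helper names are hypothetical, it does not preserve the ESO request number, and it assumes each downloaded file name starts with its dataset ID followed by a dot (as in “ESPRE.2018-09-05T18:26:46.039.fits.Z”).

```python
import json
from pathlib import Path


def save_manifest(dataset_ids, manifest_path):
    """Write the unique dataset IDs to a JSON file (hypothetical helper)."""
    ids = sorted({str(i) for i in dataset_ids})
    Path(manifest_path).write_text(json.dumps(ids, indent=2), encoding="utf-8")
    return ids


def pending_ids(manifest_path, data_root):
    """Return the IDs from the manifest with no matching file in data_root.

    Assumption: a downloaded file is named '<dataset_id>.<extension>'.
    """
    wanted = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    present = [p.name for p in Path(data_root).iterdir() if p.is_file()]
    return [i for i in wanted
            if not any(name.startswith(i + ".") for name in present)]


# Possible usage with the objects from the question:
#   save_manifest(query_results['DP.ID'], "night_manifest.json")
#   ... later, after a failed or interrupted run ...
#   todo = pending_ids("night_manifest.json", DATA_ROOT)
```

The remaining IDs can then be passed to a new `retrieve_data` call without repeating the original query.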