astroquery: (403) Forbidden Returned from astroquery.mast.Observations

Using the AWS tutorial on https://mast-labs.stsci.io/, but applied to the entire WFC3/IR image database, I ran into hundreds of “(403) Forbidden” errors when calling

s3_urls = Observations.get_hst_s3_uris(filtered), which is just a loop over s3_urls = Observations.get_hst_s3_uri(filtered)

When I looked through the code, I found that the error was generated in

$HOME/anaconda3/lib/python3.6/site-packages/astroquery-0.3.9.dev4981-py3.6.egg/astroquery/mast/core.py

then, inside get_hst_s3_uri(..), line 1372 reads s3_client.head_object(Bucket=self._hst_bucket, Key=path, RequestPayer='requester')

This call is where the “(403) Forbidden” error is raised.

The error was raised for only 2213 of the 60945 image requests, but I am not sure whether this is a legitimate error or a typo somewhere along the chain.
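For anyone reproducing this, a quick way to see what S3 is actually returning is to probe one of the failing keys directly and inspect the error code. This is a minimal sketch and not part of the pipeline below; the key is made up, and my understanding (worth verifying) is that S3 can answer a HEAD request for a missing key with 403 rather than 404 when the caller lacks ListBucket permission on the bucket, which would match the proprietary-data explanation in the comments.

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

# Hypothetical key, only to illustrate the probe pattern; astroquery builds
# the real key from the product's dataURI.
key = 'hst/public/ibxl/ibxl50020/ibxl50020_flt.fits'

try:
    # The same call that astroquery's get_hst_s3_uri makes on line 1372
    s3_client.head_object(Bucket='stpubdata', Key=key, RequestPayer='requester')
    print('object exists:', key)
except ClientError as err:
    # '404' -> key not in the bucket; '403' -> access denied (or a missing key
    # reported as Forbidden, per the note above)
    print(err.response['Error']['Code'], key)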

The code that I used is below (it takes about 2 hours to process):


from astroquery.mast import Observations
from glob import glob
from pyql.file_system.make_fits_file_dict import make_fits_file_dict
from pyql.ingest.make_jpeg import make_jpeg
from tqdm import tqdm

import boto3
import numpy as np
import os

WFC3IR_Filters = ['F105W', 'F110W', 'F125W', 'F140W', 'F160W', 'F098M', 'F127M', 'F139M', 'F153M', 
                    'F126N', 'F128N', 'F130N', 'F132N', 'F164N', 'F167N']

# Enable 'S3 mode' for the module, which will return S3 URLs for the FITS files
Observations.enable_s3_hst_dataset()

s3 = boto3.resource('s3')

# Create an authenticated S3 session. Note, download within US-East is free
# e.g. to a node on EC2.
s3_client = boto3.client('s3',
                         aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
                         aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'])

bucket = s3.Bucket('stpubdata')

obsTable_dict = {ir_filt:Observations.query_criteria(obs_collection='HST', 
                                                        instrument_name='WFC3/IR', 
                                                        filters=ir_filt) 
                                                        for ir_filt in tqdm(WFC3IR_Filters)}

# Need to chunk the input data because `Observations.get_product_list` crashes
#   with an error related to either json or DataFrames; neither is easily diagnosable.
chunk_size = 1000
product_dict_full = {}
for ir_filt in tqdm(WFC3IR_Filters):
    n_obs = len(obsTable_dict[ir_filt])
    n_chunks = n_obs // chunk_size + 1
    product_dict_full[ir_filt] = []
    for k in tqdm(range(n_chunks)):
        try:
            product_dict_full[ir_filt].append(Observations.get_product_list(obsTable_dict[ir_filt][k*chunk_size:(k+1)*chunk_size]))
        except Exception as e:
            print(str(e))

# Select only FLT files
# mrp = minimum recommended products
filtered_dict = {}
for ir_filt in tqdm(WFC3IR_Filters):
    filtered_dict[ir_filt] = []
    for product_tbl in product_dict_full[ir_filt]:
        filtered_dict[ir_filt].append(Observations.filter_products(product_tbl, 
                                        mrp_only=False, productSubGroupDescription='FLT', 
                                        dataproduct_type='image'))

# Grab the S3 URLs for each of the observations
s3_urls_dict = {}
for ir_filt in tqdm(WFC3IR_Filters):
    s3_urls_dict[ir_filt] = []
    for kf, filtered_tbl in tqdm(enumerate(filtered_dict[ir_filt]), total=len(filtered_dict[ir_filt])):
        try:
            s3_urls_dict[ir_filt].append(Observations.get_hst_s3_uris(filtered_tbl))
        except Exception as e:
            s3_urls_dict[ir_filt].append([kf, str(e)])
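
For completeness, the next step from the tutorial (downloading the files that did resolve) uses the same requester-pays pattern. This is a minimal sketch, assuming get_hst_s3_uris returns plain lists of s3://stpubdata/... URI strings and reusing the bucket object defined above; the output directory name is arbitrary.

out_dir = 'wfc3ir_flt'  # arbitrary local output directory
os.makedirs(out_dir, exist_ok=True)

for ir_filt in WFC3IR_Filters:
    for entry in s3_urls_dict[ir_filt]:
        # Skip the [index, error-message] pairs recorded in the except branch above
        if len(entry) == 2 and isinstance(entry[0], int):
            continue
        for uri in entry:
            key = uri.replace('s3://stpubdata/', '')
            filename = os.path.join(out_dir, os.path.basename(key))
            # Requester-pays download; free within us-east-1, e.g. from an EC2 node
            bucket.download_file(key, filename, ExtraArgs={'RequestPayer': 'requester'})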

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

This is correct. To my knowledge, the answer to the original question is that astroquery is requesting proprietary root names that do not exist in the AWS HST public data bucket (thanks to @ivastar). So, to me, the issue is solved.
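If the failures really are proprietary data, one way to avoid them up front would be to restrict the original query to public observations. A minimal sketch, assuming the CAOM dataRights field is accepted as a criterion by query_criteria (worth checking against the astroquery MAST docs):

from astroquery.mast import Observations

# Hypothetical variant of the original query, limited to public data so that
# no proprietary root names end up in the S3 lookups.
obsTable_public = Observations.query_criteria(obs_collection='HST',
                                              instrument_name='WFC3/IR',
                                              filters='F160W',
                                              dataRights='PUBLIC')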

I am not looking at any bucket permission issues related to this ticket. As far as I can tell from this ticket, @exowanderer discovered that the failed requests are for proprietary data. If that is not the case, then please list some products that are failing that you believe should not be. Thanks.

Cc-ing @cam72cam as this looks like it has to do with the AWS connection and he is more familiar with that part.