pysradb: Error using python API for batch SRAweb search

pysradb version: 0.10.4
Python version: 3.8.3
Operating System: macOS Catalina 10.15.5, using an Anaconda environment with pysradb installed via pip

Description

I came across pysradb while looking for a way to extract the metadata for a batch of SRA runs (~9K). I tried two different approaches, but each gave a different error. The cause is likely a missing value on SRAweb, but I am not sure how such an error can be ignored so that the run moves on to the next record.

1st Method

I tried to convert the 9K SRA run accessions to SRA study IDs using srr_to_srp and then search the resulting approx. 500 accession IDs against SRAweb.

from pysradb.sraweb import SRAweb

db = SRAweb()
# file.txt contains SRA run accession IDs, one per line.
with open("file.txt") as handle:
    lineList = [line.rstrip('\n') for line in handle]
srp = db.srr_to_srp(lineList)
unique_srp = srp.study_accession.unique()
studies_list = unique_srp.tolist()
Metadata = db.sra_metadata(studies_list, detailed=True)
Metadata.to_csv('Metadata.tsv', sep='\t', index=False)

Error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-32-d1fb481fd5e3> in <module>
----> 1 Metadata=db.sra_metadata(studies_list, detailed= True)

~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/pysradb/sraweb.py in sra_metadata(self, srp, sample_attribute, detailed, expand_sample_attributes, output_read_lengths, **kwargs)
    457                     # detailed_record[key] = value
    458 
--> 459                 pool_record = record["Pool"]["Member"]
    460                 detailed_record["run_accession"] = run_set["@accession"]
    461                 detailed_record["run_alias"] = run_set["@alias"]

KeyError: 'Pool'
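
One way to keep a single failing accession from aborting the whole run is to query accessions one at a time and record the failures instead of raising. The following is only a minimal sketch: `fetch_skipping_failures` and its `fetch` argument are hypothetical names that are not part of pysradb, and the commented usage assumes `db.sra_metadata` accepts a one-element list, as the list calls in this issue suggest. One request per accession is slower than a single batched call, so this trades speed for robustness.

```python
def fetch_skipping_failures(accessions, fetch):
    """Call fetch([accession]) for each accession, skipping any that raise.

    Returns (results, failures): results maps accession -> fetched value,
    failures maps accession -> a short description of the error.
    """
    results = {}
    failures = {}
    for accession in accessions:
        try:
            results[accession] = fetch([accession])
        except Exception as error:  # deliberately broad: log it and move on
            failures[accession] = repr(error)
    return results, failures


# Hypothetical usage with the objects from the snippet above:
# results, failures = fetch_skipping_failures(
#     studies_list, lambda ids: db.sra_metadata(ids, detailed=True)
# )
# print(failures)  # accessions that need a closer look
```

The `failures` mapping gives a list of accessions to retry or report, while everything that did succeed is kept.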

2nd Method

In this case I tried to run all 9K SRA run accessions directly against SRAweb.

from pysradb.sraweb import SRAweb

db = SRAweb()
# file.txt contains SRA run accession IDs, one per line.
with open("file.txt") as handle:
    lineList = [line.rstrip('\n') for line in handle]
Metadata = db.sra_metadata(lineList, detailed=True)
Metadata.to_csv('Metadata.tsv', sep='\t', index=False)

Error

Traceback (most recent call last):
  File "/Users/Zohaib/PycharmProjects/SRA-Metadata/fetchSRAmetadata.py", line 10, in <module>
    Metadata = db.sra_metadata(lineList, detailed=True)
  File "~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/pysradb/sraweb.py", line 425, in sra_metadata
    efetch_result = self.get_efetch_response("sra", srp)
  File "~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/pysradb/sraweb.py", line 250, in get_efetch_response
    esearch_response = request.json()
  File "~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "~/opt/anaconda3/envs/pysradb/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "~/opt/anaconda3/envs/pysradb/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "~/opt/anaconda3/envs/pysradb/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Thanks in advance, looking forward to hearing from you. Zohaib

About this issue

State: closed
Created 4 years ago
Comments: 19 (11 by maintainers)

Commits related to this issue

Remove pool member since it was unused. See #46 — committed to saketkc/pysradb by saketkc 4 years ago
Remove pool member since it was unused. See #46 — committed to bscrow/pysradb by saketkc 4 years ago
Handle missing organism names. Fixes #46 — committed to bscrow/pysradb by saketkc 4 years ago

Most upvoted comments

Also, SRP040281 has 120k+ records, so it takes approximately 7 minutes to fetch on Colab, which I think is reasonable.

saketkc on Jul 12, 2020

Sorry about the delay in responding. I am able to obtain results for the first two of these ids:

SRP040281
SRP046387

https://colab.research.google.com/drive/1UQpJG32BbjHOf0cV6rxmljf8vhqw22R-?usp=sharing

The problem with the third ID, ERP000171, is a missing organism tag (which ideally should have been Yersinia). I will have a fix for this soon, but this is not really a bug on the pysradb end.

saketkc on Jul 12, 2020

Thanks for reporting @anwarMZ, I will be taking a look at it later tomorrow.

Thanks! Saket

saketkc on Jul 8, 2020

The last fix works. Here is an example with your SRP list: https://colab.research.google.com/drive/1pNeuZJjjHliYFk582kGNRpGJ1Fa2h9cn?usp=sharing

Let me know if you still face any errors. I prefer giving it a few seconds of sleep time to make sure it doesn’t hit NCBI’s API limits.

saketkc on Jun 30, 2020
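
The sleep-between-requests idea from the comment above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked notebook: `chunked` and `fetch_in_batches` are hypothetical helpers, and the default `batch_size` and `pause_seconds` are arbitrary values rather than documented NCBI thresholds. Sending smaller batches may also help with very large accession lists, such as the 9K runs in the second method, although the thread does not confirm the cause of that error.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(accessions, fetch, batch_size=100, pause_seconds=3.0):
    """Call fetch(batch) on each batch, sleeping between consecutive calls."""
    batches = list(chunked(accessions, batch_size))
    results = []
    for index, batch in enumerate(batches):
        results.append(fetch(batch))
        if index < len(batches) - 1:  # no need to sleep after the last batch
            time.sleep(pause_seconds)
    return results


# Hypothetical usage, assuming each call returns a pandas DataFrame:
# import pandas as pd
# from pysradb.sraweb import SRAweb
# db = SRAweb()
# frames = fetch_in_batches(
#     studies_list, lambda ids: db.sra_metadata(ids, detailed=True)
# )
# metadata = pd.concat(frames, ignore_index=True)
```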