pysradb: Error using python API for batch SRAweb search
- pysradb version: 0.10.4
- Python version: 3.8.3
- Operating System: macOS Catalina 10.15.5, using an Anaconda environment and a pip installation of pysradb
Description
I came across pysradb while trying to extract the metadata for a batch of SRA runs (~9K). I tried two different approaches, but each gave a different error. The cause is likely a missing value on SRAweb, but I am not sure how such an error can be ignored so that processing moves on to the remaining records.
1st Method
I tried to convert the 9K SRA run accessions to SRA study IDs using srr_to_srp and then search the resulting approx. 500 study accessions against SRAweb.
from pysradb.sraweb import SRAweb
db = SRAweb()
# file.txt holds the SRA run accession IDs, one ID per line.
with open("file.txt") as handle:
    lineList = [line.strip() for line in handle if line.strip()]
srp = db.srr_to_srp(lineList)
unique_srp = srp.study_accession.unique()
studies_list = unique_srp.tolist()
Metadata = db.sra_metadata(studies_list, detailed=True,)
Metadata.to_csv('Metadata.tsv', sep='\t', index=False)
Error
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-32-d1fb481fd5e3> in <module>
----> 1 Metadata=db.sra_metadata(studies_list, detailed= True)
~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/pysradb/sraweb.py in sra_metadata(self, srp, sample_attribute, detailed, expand_sample_attributes, output_read_lengths, **kwargs)
457 # detailed_record[key] = value
458
--> 459 pool_record = record["Pool"]["Member"]
460 detailed_record["run_accession"] = run_set["@accession"]
461 detailed_record["run_alias"] = run_set["@alias"]
KeyError: 'Pool'
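The KeyError above is raised because the indexed record has no "Pool" key. As a minimal sketch of how such a lookup can tolerate missing keys, the helper below walks a nested dictionary and returns a default instead of raising. The helper name `get_nested`, the two example records, and the accession `SRS000001` are hypothetical placeholders for illustration; this is not the change made in pysradb, where the related commit removed the unused pool lookup.

```python
def get_nested(record, *keys, default=None):
    """Walk nested dicts and return `default` as soon as a key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical records, loosely shaped like the parsed XML in the traceback.
with_pool = {"Pool": {"Member": {"@accession": "SRS000001"}}}
without_pool = {"RUN_SET": {}}

print(get_nested(with_pool, "Pool", "Member"))     # the nested member dict
print(get_nested(without_pool, "Pool", "Member"))  # falls back to the default
```

With a helper like this, a record without a pool yields the default value and the loop over records can continue instead of aborting the whole batch.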
2nd Method
In this case I tried to run all 9K SRA run accessions directly against SRAweb
from pysradb.sraweb import SRAweb
db = SRAweb()
# file.txt holds the SRA run accession IDs, one ID per line.
with open("file.txt") as handle:
    lineList = [line.strip() for line in handle if line.strip()]
Metadata = db.sra_metadata(lineList, detailed=True,)
Metadata.to_csv('Metadata.tsv', sep='\t', index=False)
Error
Traceback (most recent call last):
File "/Users/Zohaib/PycharmProjects/SRA-Metadata/fetchSRAmetadata.py", line 10, in <module>
Metadata = db.sra_metadata(lineList, detailed=True)
File "~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/pysradb/sraweb.py", line 425, in sra_metadata
efetch_result = self.get_efetch_response("sra", srp)
File "~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/pysradb/sraweb.py", line 250, in get_efetch_response
esearch_response = request.json()
File "~/opt/anaconda3/envs/pysradb/lib/python3.8/site-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "~/opt/anaconda3/envs/pysradb/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "~/opt/anaconda3/envs/pysradb/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "~/opt/anaconda3/envs/pysradb/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Thanks in advance, looking forward to hearing from you. Zohaib
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (11 by maintainers)
Commits related to this issue
- Remove pool member since it was unused. See #46 — committed to saketkc/pysradb by saketkc 4 years ago
- Remove pool member since it was unused. See #46 — committed to bscrow/pysradb by saketkc 4 years ago
- Handle missing organism names. Fixes #46 — committed to bscrow/pysradb by saketkc 4 years ago
Also, SRP040281 has 120k+ records, so it takes approximately 7 minutes on Colab to fetch, which I think is reasonable.
Sorry about the delay in responding. I am able to obtain results for the first two of these ids:
The problem with the third ID is a missing organism tag for ERP000171 (which ideally should have been Yersinia). I will have a fix for this soon, but this is not really a bug on the pysradb end.
Thanks for reporting @anwarMZ, I will be taking a look at it later tomorrow.
Thanks! Saket
The last fix works. Here is an example with your SRP list: https://colab.research.google.com/drive/1pNeuZJjjHliYFk582kGNRpGJ1Fa2h9cn?usp=sharing
Let me know if you still face any errors. I prefer giving it a few seconds of sleep time to make sure it doesn’t hit NCBI’s API limits.
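The sleep-between-requests advice, together with the original question of how to skip a failing record and move on, can be sketched roughly as follows. This is an illustrative outline rather than the code in the linked notebook: the helper names `chunked` and `fetch_in_batches`, the batch size of 50, and the 3-second pause are assumptions chosen for the example, not values recommended by pysradb or NCBI.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(ids, fetch, batch_size=50, pause=3.0):
    """Call `fetch` on each batch, pausing between calls and recording failures."""
    results, failed = [], []
    for batch in chunked(ids, batch_size):
        try:
            results.append(fetch(batch))
        except Exception as exc:  # keep going; the failing batch is reported later
            failed.append((batch, repr(exc)))
        time.sleep(pause)  # wait between requests to stay under API limits
    return results, failed


# Possible usage with pysradb (not run here; it needs network access):
#   import pandas as pd
#   from pysradb.sraweb import SRAweb
#   db = SRAweb()
#   frames, failed = fetch_in_batches(
#       studies_list, lambda batch: db.sra_metadata(batch, detailed=True)
#   )
#   Metadata = pd.concat(frames, ignore_index=True)  # requires at least one frame
```

Batches that raise (for example with the KeyError or JSONDecodeError shown earlier in the thread) end up in `failed` together with the error text, so they can be retried or inspected separately while the rest of the metadata is still collected.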