kraken2: kraken2-build --download-library error for bacteria: ftp_path na

Hi,

When downloading the “bacteria” library over FTP, I got the following error: rsync_from_ncbi.pl: unexpected FTP path (new server?) for na.

It appears to be caused by na (missing) values in the ftp_path column of assembly_summary.txt, which the rsync_from_ncbi.pl script reads.

I imagine these appeared in the NCBI files only recently, as I could not find an existing issue about this. I am using Kraken version 2.0.8-beta, installed with conda.

Maybe the script needs a fail-safe for such cases?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20

Most upvoted comments

You can modify the rsync_from_ncbi.pl file (miniconda3/envs/kraken2/libexec/rsync_from_ncbi.pl) to solve this problem by adding the following line at the location shown in the picture:

if ( $full_path =~/^na/){ next }

Hi @SolayMane @jenniferlu717 @SepOrion, I have tried the suggestions but am still getting the same error. Here is what the rsync file looks like:

[image]

A temporary workaround for now is to manually remove the na rows from assembly_summary.txt:

awk -v FS='\t' '$20 != "na" {print $0}' assembly_summary.txt > new_assembly_summary.txt
cp new_assembly_summary.txt assembly_summary.txt

and to comment out the rm -f assembly_summary.txt line in the download_genomic_library.sh script.
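The commenting-out step can also be scripted. The following is a minimal sketch, not the maintainers' fix: the function name comment_out_rm is hypothetical, and it assumes the command appears at the start of a line (after optional indentation) exactly as quoted above. Check your own copy of download_genomic_library.sh before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: comment out the "rm -f assembly_summary.txt" line so that a
# hand-edited assembly_summary.txt is not deleted on the next run.
# Assumption: the command starts a line, preceded only by whitespace.

comment_out_rm() {
    local script="$1"
    # -i.bak edits in place and keeps the original as "$script.bak".
    # \1 preserves the indentation, \2 is the command itself.
    sed -i.bak \
        's|^\([[:space:]]*\)\(rm -f assembly_summary\.txt\)|\1# \2|' \
        "$script"
}

# Usage (the path is a placeholder; locate your own copy first):
#   comment_out_rm /path/to/libexec/download_genomic_library.sh
```

Running it twice is harmless, since an already commented line no longer matches the pattern.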

To resolve the error “rsync_from_ncbi.pl: unexpected FTP path (new server?)”, just replace "^ftp://" with "^https://" on line 46 of the file libexec/rsync_from_ncbi.pl.

You saved my day, that’s a quick fix. Thanks a lot.

Look for the file rsync_from_ncbi.pl (find /home/ -type f -name "rsync_from_ncbi.pl"), then open it and modify line 46 as explained.
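For anyone who prefers not to edit the file by hand, the same replacement can be sketched with sed. This is an illustration under assumptions: the function name patch_ftp_prefix is hypothetical, and instead of hard-coding line 46 (which may differ between Kraken 2 versions) it rewrites any line containing the literal text ^ftp://. A backup of the original is kept.

```shell
#!/usr/bin/env bash
# Sketch: replace the literal regex prefix "^ftp://" with "^https://"
# in rsync_from_ncbi.pl. Assumption: the pattern occurs only where the
# script strips the server prefix from the path; review the .bak diff.

patch_ftp_prefix() {
    local script="$1"
    # \^ matches a literal caret; -i.bak keeps "$script.bak" as a backup.
    sed -i.bak 's|\^ftp://|^https://|' "$script"
}

# Usage (the path is a placeholder; use the one that find reports):
#   patch_ftp_prefix /path/to/libexec/rsync_from_ncbi.pl
#   diff /path/to/libexec/rsync_from_ncbi.pl.bak /path/to/libexec/rsync_from_ncbi.pl
```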

I noticed this just recently as well. I’ll try to add a workaround to the script to fix the error.

Hello, right now I am having the same error with archaea (but not with bacteria). @alxsimon: what do you mean by “and to comment the rm -f assembly_summary.txt line in the download_genomic_library.sh script”? Should I add # to this line? (Sorry if this is a silly question.)

Yes, you could add a # at the start of the line, so that the file you just edited to remove the na rows is not deleted.

But @SepOrion’s solution is better.

The file is located at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt and is downloaded to $DB/library/bacteria/.

You can see the na values with, for example, cut -f 20 assembly_summary.txt | grep "^na$".
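To get a count rather than a list, a small awk wrapper can help decide whether a given assembly_summary.txt is affected at all. This is only a sketch: the function name count_na_paths is hypothetical, and it assumes, as the commands in this thread do, that ftp_path is the 20th tab-separated column.

```shell
#!/usr/bin/env bash
# Sketch: count rows whose 20th tab-separated field is exactly "na".
# Assumption: column 20 is ftp_path, as used elsewhere in this thread.

count_na_paths() {
    # n + 0 forces a numeric 0 when no row matches.
    awk -F '\t' '$20 == "na" { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_na_paths "$DB/library/bacteria/assembly_summary.txt"
```

A result of 0 suggests the na workaround is not needed for that library.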