ncov: [BUG] Unable to reproduce results in running.md

Current Behavior
I downloaded the SARS-CoV-2 sequencing data from GISAID and am trying to use the metadata from this repo to reproduce the results (I was able to reproduce the results of the Zika tutorial). I made a directory called data containing the FASTA file and the metadata file, then tried running

snakemake -p -s Snakefile --cores 2 auspice/ncov.json

but I’m getting the error

Error: Snakefile "Snakefile" not found.

Expected behavior
Reproduce the results described here https://github.com/nextstrain/ncov/blob/master/docs/running.md

How to reproduce
Steps to reproduce the current behavior:

  1. Download fasta sequences from GISAID
  2. Run command from above

Your environment: if browsing Nextstrain online

  • Operating system: MacOS
  • Browser: Chrome

Your environment: if running Nextstrain locally

  • Operating system:
  • Browser:
  • Version (e.g. auspice 2.7.0):
➜  nextstrain_local_ncov nextstrain check-setup                
nextstrain-cli is up to date!

Testing your setup…

# docker is supported
✔ yes: docker is installed
✔ yes: docker run works
⚑ warning: containers have access to >2 GiB of memory

  Containers appear to be limited to 2.0 GiB of memory. This
  may not be enough for some Nextstrain builds.  On Windows or
  a Mac, you can increase the memory available to containers
  in the Docker preferences.                        
✔ yes: image is new enough for this CLI version

# native is not supported
✔ yes: snakemake is installed
✔ yes: augur is installed
✘ no: auspice is installed

# aws-batch is not supported
✘ no: job description "nextstrain-job" exists
✘ no: job queue "nextstrain-job-queue" exists
✘ no: S3 bucket "nextstrain-jobs" exists

Supported Nextstrain environments: docker

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

Is mafft installed/accessible by the user running snakemake? I had a few problems with old versions, but it is working for me now with:

% mafft --version
v7.453 (2019/Nov/8)

I don’t recall that truncated-looking error message, though. You may also need to confirm which version of nextstrain/augur you are running, just in case.

% augur --version
augur 7.0.2

Error messages don’t seem to be reaching your screen, so you may also have to dig through the logs generated in ./.snakemake/log/ to find the root causes of some of these failures.
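As a rough sketch of that log check, the snippet below picks the most recently modified file in the log directory. The helper name latest_snakemake_log is made up for illustration, and the default path simply mirrors the ./.snakemake/log/ location mentioned above.

```shell
# Sketch: print the path of the newest file in snakemake's log directory.
# latest_snakemake_log is a hypothetical helper, not part of snakemake.
latest_snakemake_log() {
    log_dir="${1:-.snakemake/log}"
    # ls -t sorts by modification time, newest first; keep only the first entry.
    ls -t "$log_dir"/* 2>/dev/null | head -n 1
}

# Example usage: show the end of the latest log, where errors usually appear.
# tail -n 50 "$(latest_snakemake_log)"
```

If the directory is empty or missing, the helper prints nothing, so check the output before passing it to tail.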

Hi @cornhundred, it’s not clear from your last post what errors occurred, as none seem to appear in the log. As for your prior post, which reported a duplicate key “hCoV-19/Hong”: it looks like you are using the GISAID sequences download file and have not normalized the sequence names from the format provided by GISAID to the format expected by the nextstrain/ncov pipeline. Specifically, the GISAID strain names contain embedded spaces, which are stripped out of the strain names in metadata.tsv.
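One way to see how a key like “hCoV-19/Hong” can become a duplicate is to cut every FASTA header at its first whitespace and look for collisions. This is only a sketch: it assumes, as is common for FASTA parsers, that the sequence ID ends at the first space, and find_truncated_duplicates is a made-up helper name rather than anything shipped with the pipeline.

```shell
# Sketch: list sequence IDs that collide once each FASTA header is cut at its
# first whitespace (an assumption about how the duplicate key arises).
# find_truncated_duplicates is a hypothetical helper, not part of nextstrain/ncov.
find_truncated_duplicates() {
    fasta="$1"
    # Keep header lines, drop the leading ">", keep only the first
    # whitespace-delimited token, then print each token seen more than once.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Example usage (the path is a placeholder for your GISAID download):
# find_truncated_duplicates path-to-GISAID-download-file
```

An empty result would suggest the headers are already free of colliding names.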

Running

scripts/normalize_gisaid_fasta.sh path-to-GISAID-download-file data/sequences.fasta

should make all the necessary adjustments to the strain names in the GISAID FASTA file and place the results in data/sequences.fasta.

You may need to run snakemake clean to clean up your working directory from these errored-out attempts.

Regarding the ‘download’ Snakefile rule: that rule is only run if no sequences.fasta file exists, so once you have produced sequences.fasta, either by hand or with the normalize_gisaid_fasta.sh script, the download rule will not be run.
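A quick way to check which case applies is to test for the file before running the build. The sketch below relies only on the behavior described above. download_rule_needed is a hypothetical helper, and data/sequences.fasta is assumed to be the path the rule looks for.

```shell
# Sketch: report whether the 'download' rule is likely to run, based only on
# the behavior described above (it runs only when sequences.fasta is absent).
# download_rule_needed is a hypothetical helper, not part of nextstrain/ncov.
download_rule_needed() {
    sequences="${1:-data/sequences.fasta}"
    # Succeeds (exit 0) when the file is missing, i.e. the rule would run.
    [ ! -e "$sequences" ]
}

if download_rule_needed; then
    echo "data/sequences.fasta is missing: the download rule would run"
else
    echo "data/sequences.fasta exists: the download rule should be skipped"
fi

# A dry run lists the rules snakemake would execute without running them:
# snakemake -n -p auspice/ncov.json
```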