seq2science: BUG: Workflow not running

Hi,

I have been trying to run the chip-seq workflow of seq2science. It starts but stops when 7% of the jobs are done.

seq2science --version
seq2science: v0.5.1

To Reproduce Please include your config.yaml, your samples.tsv, and the complete/relevant output.

Both config.yaml and samples.tsv were generated from seq2science init chip-seq

  • config.yaml:
# tab-separated file of the samples
samples: samples.tsv

# pipeline file locations
result_dir: ./results  # where to store results
genome_dir: ./genomes  # where to look for or download the genomes
# fastq_dir: ./results/fastq  # where to look for or download the fastqs


# contact info for multiqc report and trackhub
email: yourmail@here.com

# produce a UCSC trackhub?
create_trackhub: true

# how to handle replicates
biological_replicates: fisher  # change to "keep" to not combine them
technical_replicates: merge    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which aligner to use
aligner: bwa-mem2

# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true

# peak caller
peak_caller:
  macs2:
      --keep-dup 1 --buffer-size 10000

## differential gene expression analysis
#contrasts:
#  - 'descriptive_name_all_HEL'

  • samples.tsv:
# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample  assembly        descriptive_name
GSM4404624      hg38    HEL

I get several error messages; the complete log file is attached: seq2science.2021-04-13T103059.065792.log

Contents of the log file seq2science/results/log/bwa-mem2_index/hg38.log:

Looking to launch executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx", simd = .avx
Launching executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx"
[bwa_index] Pack FASTA... 18.78 sec
* Entering FMI_search
init ticks = 204386466299
ref seq len = 6199501436
binary seq ticks = 136647971146

These are the files I have in the genome folder:

(seq2science) jchouaref@res-hpc-exe028:/exports/humgen/jihed/seq2science/genomes/hg38$ tree
.
├── hg38.annotation.bed.gz
├── hg38.annotation.gtf.gz
├── hg38.fa
├── hg38.fa.fai
├── hg38.fa.sizes
├── hg38.gaps.bed
├── index
├── README.txt
└── tmpevip0jtt

Do you think the problem comes from there?
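
One way to answer that question is to check whether the index step left a complete set of files behind. The sketch below is not part of seq2science: `missing_index_files` is a hypothetical helper, the suffix lists are assumptions based on what `bwa index` and `bwa-mem2 index` usually write, and the `genomes/<assembly>/index/<aligner>/` layout is assumed from the paths shown in this thread.

```python
from pathlib import Path

# Suffixes each indexer is expected to write next to the prefix.
# Assumption: these reflect the tools' usual output, not anything
# seq2science itself guarantees.
EXPECTED_SUFFIXES = {
    "bwa-mem": [".amb", ".ann", ".bwt", ".pac", ".sa"],
    "bwa-mem2": [".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac"],
}


def missing_index_files(index_dir, prefix, aligner):
    """Return the names of expected index files that are absent or empty."""
    index_dir = Path(index_dir)
    missing = []
    for suffix in EXPECTED_SUFFIXES[aligner]:
        path = index_dir / f"{prefix}{suffix}"
        # A zero-byte file usually means the indexer was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    # Hypothetical layout: genomes/<assembly>/index/<aligner>/
    print(missing_index_files("genomes/hg38/index/bwa-mem2", "hg38", "bwa-mem2"))
```

An empty list would suggest the index is complete; any names printed point to files the indexing job never finished writing.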

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

Thank you so much for these files! I have added them to my genomes/mm10 folder.

Unfortunately, it still does not work. Here are the log and the Slurm output:

seq2science.2021-04-21T100027.917233.log slurm-2426180.txt

The run finishes so quickly that I doubt it is doing anything. Here are the contents of the bwa index folder:

jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/genomes/mm10/index/bwa-mem$ ls -ltr
total 5616152
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730871864 Apr 20 12:41 mm10.bwt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf  682717945 Apr 20 12:41 mm10.pac
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       2857 Apr 20 12:41 mm10.ann
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf      11032 Apr 20 12:41 mm10.amb
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 1365435936 Apr 20 12:56 mm10.sa

So the index was created, but the results folder is almost empty:

jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/results$ ls -l
total 76
drwxr-sr-x  9 jchouaref 5-A-SHARK_hg_bioinf 203 Apr 20 13:03 benchmark
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf  59 Apr 20 13:01 fastq
drwxr-sr-x  2 jchouaref 5-A-SHARK_hg_bioinf   0 Apr 20 17:10 fastq_trimmed
drwxr-sr-x 28 jchouaref 5-A-SHARK_hg_bioinf 921 Apr 20 17:13 log
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf 168 Apr 20 13:03 qc
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf  28 Apr 19 15:00 sra

Do you think it is because I am using an sbatch command to distribute the jobs on the cluster? Would Snakemake then not know which jobs are done?

I honestly have no clue what is going on here… Sorry, I don’t think I can help you 😭

It should be in /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/scripts/genome_support.py

Add the following at the bottom of that file:

import time
time.sleep(60)
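
The thread does not say why the pause helps. Assuming it is meant to give a shared cluster filesystem time to make freshly written files visible, a bounded poll is a possible alternative to a fixed 60-second sleep. This is only a sketch under that assumption: `wait_for_file` is a hypothetical helper, not a seq2science or Snakemake function.

```python
import os
import time


def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists and is non-empty, or until `timeout` seconds pass.

    Returns True if the file showed up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Compared with an unconditional sleep, this returns as soon as the file is visible and reports a timeout instead of continuing silently.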

The cluster was a bit busy today. I hope it will run during the night.