deepvariant: ValueError: Data loss: Expected mtid >= 0 as mate is supposedly mapped

I’ve successfully run DeepVariant with the test data, but I keep getting the following error when extracting pileup images from my own BAM file. What could be the problem?

I1003 20:27:32.183320 140083390310144 make_examples.py:825] Found 0 candidates in chr1:1-1000 [1000 bp] [1.62s elapsed]
I1003 20:27:32.185085 140083390310144 make_examples.py:825] Found 0 candidates in chr1:1001-2000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.186733 140083390310144 make_examples.py:825] Found 0 candidates in chr1:2001-3000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.188343 140083390310144 make_examples.py:825] Found 0 candidates in chr1:3001-4000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.189908 140083390310144 make_examples.py:825] Found 0 candidates in chr1:4001-5000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.191494 140083390310144 make_examples.py:825] Found 0 candidates in chr1:5001-6000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.193065 140083390310144 make_examples.py:825] Found 0 candidates in chr1:6001-7000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.194626 140083390310144 make_examples.py:825] Found 0 candidates in chr1:7001-8000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.196187 140083390310144 make_examples.py:825] Found 0 candidates in chr1:8001-9000 [1000 bp] [0.00s elapsed]
I1003 20:27:32.197738 140083390310144 make_examples.py:825] Found 0 candidates in chr1:9001-10000 [1000 bp] [0.00s elapsed]
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1188, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1178, in main
    make_examples_runner(options)
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1090, in make_examples_runner
    candidates, examples, gvcfs = region_processor.process(region)
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 808, in process
    self.in_memory_sam_reader.replace_reads(self.region_reads(region))
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 844, in region_reads
    reads, self.options.max_reads_per_partition, self.random)
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/third_party/nucleus/util/utils.py", line 92, in reservoir_sample
    for i, item in enumerate(iterable):
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/six_archive/six.py", line 558, in next
    return type(self).__next__(self)
  File "/tmp/Bazel.runfiles_8StCi1/runfiles/com_google_deepvariant/third_party/nucleus/io/clif_postproc.py", line 67, in __next__
    not_done, record = self._cc_iterable.Next()
ValueError: Data loss: Expected mtid >= 0 as mate is supposedly mapped: fragment_name: "XXX00-XX000_000:0:0000:0000:000000/0" read_number: 1 number_reads: 2 alignment { position { reference_name: "chr1" position: 10540 reverse_strand: true } mapping_quality: 60 cigar { operation: ALIGNMENT_MATCH operation_length: 50 } } aligned_sequence: "ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGAT" aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 35 aligned_quality: 36 aligned_quality: 36 aligned_quality: 36 aligned_quality: 36 aligned_quality: 35 aligned_quality: 35 aligned_quality: 37 aligned_quality: 37 aligned_quality: 37 aligned_quality: 39 aligned_quality: 39 aligned_quality: 39 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 41 aligned_quality: 39 aligned_quality: 39 aligned_quality: 39 aligned_quality: 39 aligned_quality: 39 aligned_quality: 37 aligned_quality: 37 aligned_quality: 37 aligned_quality: 37 aligned_quality: 37 aligned_quality: 34 aligned_quality: 34 aligned_quality: 34 next_mate_position { }

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 38

Most upvoted comments

Super-awesome to hear!!! It was a fun team-effort 😃

@zyxue Regarding how it knows about each region, I’m not sure how much you want to know, since we can go into great detail here. The big picture of the control flow of make_examples.py is this:

 main() -> 
   make_examples_runner() -> 
      processing_regions_from_options() -> (build_calling_regions, regions_to_process)

build_calling_regions() first works out which regions to include or exclude by calling a set of Nucleus helper functions that generate the ranges, defined in this file:

https://github.com/google/deepvariant/blob/r0.7/third_party/nucleus/util/ranges.py

Then regions_to_process() is called, and it contains a key line that takes each partitioned region’s index modulo the number of shards and keeps the region only when the remainder equals task_id:

return (r for i, r in enumerate(partitioned) if i % num_shards == task_id)

As you know, the remainder of a modulo by n is always between 0 and n-1, so each task_id matches every n-th region and the distribution of regions across tasks is ideally uniform. Feel free to ask more questions; we’re getting into computer science concepts now, and I know I can easily lose people in the details.
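
To make the modulo line concrete, here is a minimal sketch in plain Python. It is not the DeepVariant code itself: regions_for_task is a hypothetical wrapper around the line quoted above, and the region strings and the shard count of 4 are made-up values that mirror the 1000 bp partitions in the log.

```python
def regions_for_task(partitioned, num_shards, task_id):
    """Hypothetical helper: keep region i only when i % num_shards == task_id."""
    return (r for i, r in enumerate(partitioned) if i % num_shards == task_id)


# Made-up 1000 bp partitions of chr1:1-10000, similar to the log output above.
partitioned = ["chr1:%d-%d" % (start, start + 999) for start in range(1, 10000, 1000)]
num_shards = 4  # assumed number of parallel make_examples tasks

# Each task_id in 0..num_shards-1 gets its own round-robin slice of the regions.
assignment = {
    task_id: list(regions_for_task(partitioned, num_shards, task_id))
    for task_id in range(num_shards)
}

for task_id, regions in assignment.items():
    print(task_id, regions)
```

With 10 regions and 4 shards, tasks 0 and 1 each get three regions and tasks 2 and 3 each get two, and every region is assigned to exactly one task. That is the sense in which the split is "ideally uniform": shard sizes differ by at most one region, although the actual work per region can still vary with read depth.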

~[p]

Hi @zyxue, my teammate @cmclean pointed out that I might have confused you in my earlier comment (https://github.com/google/deepvariant/issues/99#issuecomment-428622073) because I typed “GPU” parallel instead of “GNU” parallel. Sorry about that. 😕 I corrected it.