picard: RevertSam, when REMOVE_ALIGNMENT_INFORMATION=true, should not fail on SAM validation errors
E.g. "Read CIGAR M operator maps off end of reference". After all, the point of RevertSam is to strip out subpar alignment information so that the reads can eventually be aligned afresh.
Bug Report
Affected tool(s)
RevertSam, when REMOVE_ALIGNMENT_INFORMATION=true
Affected version(s)
- Latest public release version [2.9.4]
Description
- Original user post: http://gatkforums.broadinstitute.org/gatk/discussion/comment/39949#Comment_39949
- Recapitulate user error with test data: https://github.com/broadinstitute/dsde-docs/issues/2231
Steps to reproduce
- Test data:
/humgen/gsa-scr1/pub/incoming/jfiksel_revertsam_bug.zip
- Test command:
java -jar $PICARD RevertSam \
I=PGDX8157T_Ex_snippet.bam \
O=sandbox/PGDX8157T_Ex_u.bam
Expected behavior
The tool reverts the reads to an unaligned BAM.
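To make the expected behavior concrete, here is a simplified Java sketch of what removing alignment information from a single read involves. The Read class, removeAlignment, and the exact field handling are hypothetical illustrations, not Picard's or htsjdk's code. The tag list is the default ATTRIBUTE_TO_CLEAR value from the command line logged under "Actual behavior". Mate fields, OQ quality restoration, and sanitization are deliberately omitted. Note that none of these steps needs to interpret the CIGAR, which is why a malformed CIGAR arguably should not block the revert.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class RevertSketch {
    static final int FLAG_UNMAPPED = 0x4;
    static final int FLAG_REVERSE = 0x10;

    // Default ATTRIBUTE_TO_CLEAR list, copied from the RevertSam command line in the log.
    static final List<String> ATTRIBUTES_TO_CLEAR =
            Arrays.asList("NM", "UQ", "PG", "MD", "MQ", "SA", "MC", "AS");

    /** Hypothetical minimal stand-in for a SAM record; not htsjdk's SAMRecord. */
    static final class Read {
        int flag;
        String refName;   // "*" when unaligned
        int pos;          // 1-based; 0 when unaligned
        int mapq;
        String cigar;     // "*" when unaligned
        String bases;
        String quals;
        final Map<String, String> tags = new LinkedHashMap<>();

        Read(int flag, String refName, int pos, int mapq, String cigar, String bases, String quals) {
            this.flag = flag;
            this.refName = refName;
            this.pos = pos;
            this.mapq = mapq;
            this.cigar = cigar;
            this.bases = bases;
            this.quals = quals;
        }
    }

    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append('N'); break;
            }
        }
        return sb.toString();
    }

    /** Drops alignment fields without ever parsing the CIGAR string. */
    static void removeAlignment(Read r) {
        if ((r.flag & FLAG_REVERSE) != 0) {
            // Restore the original sequencing orientation before forgetting the strand.
            r.bases = reverseComplement(r.bases);
            r.quals = new StringBuilder(r.quals).reverse().toString();
            r.flag &= ~FLAG_REVERSE;
        }
        r.flag |= FLAG_UNMAPPED;
        r.refName = "*";
        r.pos = 0;
        r.mapq = 0;
        r.cigar = "*";  // the old CIGAR is discarded, never inspected
        r.tags.keySet().removeAll(ATTRIBUTES_TO_CLEAR);
    }

    public static void main(String[] args) {
        Read r = new Read(FLAG_REVERSE, "chrTest", 995, 60, "10M", "ACGTTTGCAA", "ABCDEFGHIJ");
        r.tags.put("NM", "0");
        r.tags.put("RG", "rg1");
        removeAlignment(r);
        System.out.println(r.flag + " " + r.refName + " " + r.pos + " " + r.cigar
                + " " + r.bases + " " + r.quals + " " + r.tags);
    }
}
```

Read groups (RG) survive because they are not in the clear list, which keeps per-read-group metadata available for the realignment step.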
Actual behavior
Error message:
WMCF9-CB5:jfiksel_revertsam_error shlee$ java -jar $PICARD RevertSam I=PGDX8157T_Ex_snippet.bam O=sandbox/PGDX8157T_Ex_u.bam
[Fri Jun 30 11:17:58 EDT 2017] picard.sam.RevertSam INPUT=PGDX8157T_Ex_snippet.bam OUTPUT=sandbox/PGDX8157T_Ex_u.bam OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS] SANITIZE=false MAX_DISCARD_FRACTION=0.01 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Jun 30 11:17:58 EDT 2017] Executing as shlee@WMCF9-CB5 on Mac OS X 10.11.6 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14; Picard version: 2.9.4-SNAPSHOT
INFO 2017-06-30 11:18:04 RevertSam Reverted 1,000,000 records. Elapsed time: 00:00:05s. Time for last 1,000,000: 5s. Last read position: chr17:61,684,897
INFO 2017-06-30 11:18:10 RevertSam Reverted 2,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 6s. Last read position: chr17:72,477,335
INFO 2017-06-30 11:18:16 RevertSam Reverted 3,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 5s. Last read position: chr17:76,120,913
INFO 2017-06-30 11:18:22 RevertSam Reverted 4,000,000 records. Elapsed time: 00:00:24s. Time for last 1,000,000: 6s. Last read position: chr17:80,046,790
[Fri Jun 30 11:18:24 EDT 2017] picard.sam.RevertSam done. Elapsed time: 0.44 minutes.
Runtime.totalMemory()=1670381568
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name HWI-D00743_115_5_2112_18001_6554_0:0:0:0:0, Read CIGAR M operator maps off end of reference
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:451)
at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:253)
at htsjdk.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:603)
at htsjdk.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1547)
at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:2054)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:811)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
at picard.sam.RevertSam.doWork(RevertSam.java:246)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
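The stack trace shows where the failure happens: the BAM iterator validates each incoming record (SAMRecord.isValid calls computeIndexingBin, which calls getAlignmentEnd and then BAMRecord.getCigar), so the exception is thrown while reading the input, before RevertSam.doWork can discard the alignment. The condition being flagged is, roughly, that the alignment end implied by the read's start position and CIGAR lies past the end of its reference sequence. Below is a stdlib-only Java sketch of that condition. CigarBoundsCheck and its methods are hypothetical names, and this is not htsjdk's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "CIGAR maps off end of reference" condition; not htsjdk code. */
public final class CigarBoundsCheck {
    // One CIGAR element: a length followed by an operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** 1-based inclusive alignment end: start + reference-consuming lengths - 1. */
    public static int alignmentEnd(int alignmentStart, String cigar) {
        Matcher m = ELEMENT.matcher(cigar);
        int refLength = 0;
        int consumed = 0;
        while (m.find()) {
            // Elements must be contiguous; any gap means the CIGAR string is malformed.
            if (m.start() != consumed) {
                throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            }
            consumed = m.end();
            int len = Integer.parseInt(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'M': case 'D': case 'N': case '=': case 'X':
                    refLength += len;  // these operators consume reference bases
                    break;
                default:
                    break;             // I, S, H, P do not consume reference bases
            }
        }
        if (consumed == 0 || consumed != cigar.length()) {
            throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        }
        return alignmentStart + refLength - 1;
    }

    /** True when the aligned span extends beyond the reference sequence length. */
    public static boolean mapsOffReference(int alignmentStart, String cigar, int referenceLength) {
        return alignmentEnd(alignmentStart, cigar) > referenceLength;
    }

    public static void main(String[] args) {
        // Hypothetical 1,000 bp contig.
        System.out.println(mapsOffReference(950, "101M", 1000));   // ends at 1050
        System.out.println(mapsOffReference(900, "101M", 1000));   // ends at 1000
        System.out.println(mapsOffReference(950, "50S51M", 1000)); // soft clips do not count
    }
}
```

Under these assumptions, a 101M read starting at position 950 on a 1,000 bp contig ends at 1,050 and is flagged, while the same read soft-clipped to 50S51M ends exactly at 1,000 and passes.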
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (18 by maintainers)
@jfiksel The fix has been merged to master.