rust-htslib: Passing `bam::record::Record` between threads causes a segfault

Thank you for the very nice library!

I’m working on a tool that passes reads between threads. Unfortunately, the tool is producing non-deterministic segmentation faults and other memory errors. I’ve traced some of these issues back to rust-htslib, which will crash somewhat randomly when bam::record::Records are passes between threads. I am not too familiar with this library, so I am wondering if this is the expected behavior?

Here is a simplified example that can reproduce the crash:

use std::error::Error;
use std::str;
use std::sync::mpsc::{self, Receiver};
use std::thread;

use rust_htslib::bam::{Read, Reader};
use rust_htslib::bam::record::Record;

fn sum_mapqs(rx: Receiver<Option<Record>>) -> Result<(), Box<dyn Error>> {
    let mut total_mapq = 0u64;
    loop {
        match rx.recv() {
            Ok(x) => {
                match x {
                    Some(read) => {
                        let mapq = read.mapq();
                        total_mapq = total_mapq.saturating_add(mapq as u64);
                    },
                    None => {  // No more data
                        println!("Total MapQ: {}", total_mapq);
                        return Ok(());
                    },
                }
            },
            Err(e) => {
                eprintln!("Error reciving data: {}", e);
            }
        }
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut bam = match Reader::from_stdin() {
        Ok(bam) => { bam },
        Err(e) => { return Err(Box::new(e)); },
    };

    // Initialize the writer thread
    let (tx, rx) = mpsc::channel();
    let writer = thread::spawn(move || {
        if let Err(e) = sum_mapqs(rx) {
            eprintln!("Error writing output - {}", e);
        }
    });

    for read in bam.records() {
        match read {
            Ok(read) => {
                eprintln!("Parsed read: {}", str::from_utf8(read.qname()).unwrap());
                if let Err(e) = tx.send(Some(read)) {
                    eprintln!("Error sending data to writer thread");
                    return Err(Box::new(e));
                }
            },
            Err(e) => {
                return Err(Box::new(e));
            },
        }
    }

    // Close the spawned thread
    let _ = tx.send(None);
    let _ = writer.join().unwrap();
    Ok(())
}

Compiling with RUSTFLAGS="-g" cargo build and then running with a SAM passed through stdin produces the following:

$ cat test.sam | target/debug/rust-htslib-crash
...
Parsed read: H203:185:D2990ACXX:4:1101:11561:5493
23559 Broken pipe             cat test.sam
23560 Segmentation fault      (core dumped) | target/debug/rust-htslib-crash

Re-running the same command will produce the crash in different parts of the program. I’ve attached a backtrace from one of the crashes: crash backtrace.txt

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 28 (22 by maintainers)

Commits related to this issue

Most upvoted comments

Reviving this issue again.

Personally, my vote is for 3. I would also think that 1 is a no-go as being unable to pass bam::Records between threads would limit the usefulness of the library. I’m not sure what the reasoning is for implementing bio_types::genome::AbstractInterval for bam::Records, so it’s hard for me to weigh the merits of 2 vs. 3.

@veldsla I agree. As it stands the Send/Sync are not valid. My primary vote is for 3 as well, then 1. There may be performance benefits from having a smaller bam::Record struct. HeaderView may be needed to get reference sequence name from tid so we will need to document how to get that, (c.f. https://github.com/rust-bio/rust-htslib/issues/288 ) 2 is a no go.

I think it’s a bit painful to see a segfaulting issue go stale, so let me try to revive this. It seems clear that the current situation is invalid. I see three possible solutions also mentioned before:

  1. Remove the Send/Sync implementations and make bam::Record not shareable.
  2. Replace the Rc<HeaderView> with Arc<HeaderView> and always pay the performance penalty.
  3. Remove Rc<HeaderView> from Record. Breaking change, but introduced fairly recent.

Anyone see another way out of this? Do we take a vote? I’m on 1 or 3, but mostly 3.

What’s the consensus here? It seems that Arc (option 3) requires the least code-change and @veldsla has shown that the performance hit is tolerable. Is there some concurrency setup where this is not an issue? @sitag recommends using rayon parallel iterator. Is there an exampl here?

I made a draft PR (#345) trying to work around this particular soundness issue. Please let me know what you think.

Having seen these results, I still don’t change my preference. I think the header has no useful place in the Record and the sacrificing of the AbstractInterval trait would be a good thing. Also when doing my naive s/Rc/Arc/ I saw: 17 occurrences of unsafe impl (Send|Sync) for structs that often contain Rc. So a decent unsafe audit is in order.