tantivy: Case of corrupted segment

Probably related to #897

Describe the bug

This continues the discussion at https://gitter.im/tantivy-search/tantivy?at=5ff589c503529b296bd23728. I first noticed memory bloat in an application using Tantivy. After debugging, I found that the merging thread fails every time it tries to merge a broken store in one particular segment. I have sent you an example of the whole segment personally on Gitter.

Which version of tantivy are you using?

https://github.com/tantivy-search/tantivy/commit/a4f33d3823f1bad3ff7a59877f1608615acabe6e

What happened

I used poor man's debugging and launched Tantivy with a patched function (the only change is an added print):

    fn write_storable_fields(&self, store_writer: &mut StoreWriter) -> crate::Result<()> {
        for reader in &self.readers {
            let store_reader = reader.get_store_reader()?;
            if reader.num_deleted_docs() > 0 {
                for doc_id in reader.doc_ids_alive() {
                    let doc = store_reader.get(doc_id);
                    if let Err(ref err) = doc {
                        println!("Error: {:?}\nSegment ID: {:?}\nDocID: {}", err, reader.segment_id(), doc_id);
                    }
                    store_writer.store(&doc?)?;
                }
            } else {
                store_writer.stack(&store_reader)?;
            }
        }
        Ok(())
    }
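
The same log-then-propagate pattern can be sketched outside of Tantivy. The snippet below is only an illustration: `log_err`, `fetch_doc`, and `copy_docs` are hypothetical names, and `fetch_doc` merely simulates a store lookup that fails for document 53. It is not Tantivy's API.

```rust
use std::fmt::Debug;

/// Prints the error together with caller-supplied context, then returns the
/// `Result` unchanged so that `?` still propagates the failure.
fn log_err<T, E: Debug>(result: Result<T, E>, context: &str) -> Result<T, E> {
    if let Err(err) = &result {
        eprintln!("Error: {:?}\n{}", err, context);
    }
    result
}

/// Hypothetical stand-in for a store lookup: document 53 is treated as corrupted.
fn fetch_doc(doc_id: u32) -> Result<String, String> {
    if doc_id == 53 {
        Err("Reach end of buffer while reading VInt".to_string())
    } else {
        Ok(format!("doc-{}", doc_id))
    }
}

/// Copies documents in order and stops at the first failure,
/// logging which document could not be read.
fn copy_docs(doc_ids: &[u32]) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for &doc_id in doc_ids {
        let doc = log_err(fetch_doc(doc_id), &format!("DocID: {}", doc_id))?;
        out.push(doc);
    }
    Ok(out)
}
```

Because the helper returns the `Result` untouched, the surrounding control flow is the same as before the print was added: the first unreadable document still aborts the whole copy.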

Stdout after the failed merge:

Error: IOError(Custom { kind: InvalidData, error: "Reach end of buffer while reading VInt" })
Segment ID: Seg("e6ece22e")
DocID: 53
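
For context, an error of this shape typically appears when a variable-length integer is cut off before its terminating byte. The sketch below uses a generic LEB128-style encoding (high bit set means "more bytes follow") to show the failure mode. It is an assumption-laden illustration, not Tantivy's actual `VInt` code, whose byte layout may differ; `read_vint` is a hypothetical name.

```rust
use std::io;

/// Decodes one LEB128-style variable-length integer from the front of `buf`.
/// Returns the decoded value and the number of bytes consumed.
/// Assumption: the high bit of each byte is a continuation flag.
fn read_vint(buf: &[u8]) -> io::Result<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if shift >= 64 {
            // More continuation bytes than a u64 can hold.
            return Err(io::Error::new(io::ErrorKind::InvalidData, "VInt is too long"));
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            // No continuation flag: this was the last byte.
            return Ok((value, i + 1));
        }
        shift += 7;
    }
    // The buffer ended while the continuation flag was still set
    // (or the buffer was empty).
    Err(io::Error::new(
        io::ErrorKind::InvalidData,
        "Reach end of buffer while reading VInt",
    ))
}
```

With this encoding, the bytes `[0xac, 0x02]` decode to 300, while the truncated buffer `[0xac]` yields an `InvalidData` error, which is the kind of result a truncated or corrupted store block would produce.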

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 33 (16 by maintainers)

Most upvoted comments

@ppodolsky Thank you very much for your patience and all of your help reporting the bug!

Can’t wait to try it!