badger: Data corruption when not closing on Windows

https://github.com/dgraph-io/badger/pull/470 describes and fixes a lock file issue on Windows. The author also describes a data corruption problem on Windows. I can reproduce this problem as well, quite easily.

On my 64-bit Windows 10 machine, to simulate a crash (opening Badger and not closing it), I ran the following program three times:

package main

import (
	"fmt"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "crashtest"
	opts.ValueDir = "crashtest"
	// The DB handle is discarded and never closed on purpose,
	// to simulate a crash.
	_, err := badger.Open(opts)
	if err != nil {
		fmt.Println(err)
	}
}

The first time I ran it, it terminated gracefully. On the second run, Badger reported a leftover lock file, so I removed it and ran the program again. The third time, Badger told me the database is corrupted:

Unable to replay value log: "crashtest\\000000.vlog": Data corruption detected.
Value log truncate required to run DB. This would result in data loss.

This shouldn’t happen. I’m using the default options, and Badger’s writes should be crash-safe.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 15 (12 by maintainers)

Most upvoted comments

Closing the issue. Feel free to reopen if there’s an actual data loss.

We’re already doing all that for Windows.

https://github.com/dgraph-io/badger/blob/fa3538847194ebe5ab662141c3faf05359a29112/y/mmap_windows.go#L28

I remember that on Windows, we need to expand the file to its max size upfront. https://github.com/dgraph-io/badger/blob/fa3538847194ebe5ab662141c3faf05359a29112/y/mmap_windows.go#L42

So, I think what’s happening here is that this file, which has been expanded beyond its written data, gets left behind when Badger crashes on Windows. When replaying the value log, Badger determines that it needs to truncate the file to bring it back to its valid written data. That truncation was recently changed to no longer happen automatically, because of this issue: https://github.com/dgraph-io/badger/issues/434#issuecomment-379152221

This is what is confusing users. On Linux, a truncation error means there might be data loss. But on Windows, that’s just how it works. You need to truncate because we have to overallocate upfront due to the way file mmap works.

So, it’s not really corruption. On Windows, you must pass the Truncate option. In fact, this could be tested by writing a few key-value pairs, then crashing the instance, and checking whether any of them ever get lost. I bet they wouldn’t.

I’m inclined to close this issue, unless someone can prove actual data loss.

Summary: Set Truncate option to true on Windows. It is needed to make Windows work.
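For reference, a minimal sketch of what that looks like when applied to the reproduction program from the issue. It assumes the Badger v1.x API of that era, where `badger.DefaultOptions` is a struct value and `Options` has a `Truncate` field; later releases changed the options API, so check the version you are using.

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "crashtest"
	opts.ValueDir = "crashtest"
	// Allow Badger to truncate the value log back to its last valid
	// entry on open (assumed v1.x field name).
	opts.Truncate = true

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	// Closing cleanly also avoids the leftover lock file.
	defer db.Close()
}
```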