arrow: [Java] The offset buffer of empty BaseVariableWidthVector should not be empty when exposed through C Data Interface

Describe the bug, including details regarding any error messages, version, and platform.

We encountered an error when exchanging a string array from Java to Rust through the Arrow C Data Interface. On the Rust side, it complains that the buffer at position 1 (the offset buffer) is null. After some tracing and debugging, it looks like the issue is that the Java Arrow BaseVariableWidthVector class assigns an empty offset buffer if the array is empty (value count 0).

According to the Arrow spec for the variable-size binary layout:

The offsets buffer contains length + 1 signed integers …

So for an empty string array, the offset buffer should still contain one element (generally 0).
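
To make the "length + 1" rule concrete, here is a minimal sketch in plain Java. It does not use the Arrow Java API; the class `OffsetsSketch` and the method `buildOffsets` are hypothetical names made up for illustration. It only shows how the offsets would be laid out for a UTF-8 string array, including the empty case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Arrow: models the offsets of a
// variable-size binary/UTF-8 layout as a plain int array.
public class OffsetsSketch {

    // Returns values.length + 1 offsets. offsets[0] is 0 and
    // offsets[i + 1] - offsets[i] is the byte length of values[i].
    public static int[] buildOffsets(String[] values) {
        int[] offsets = new int[values.length + 1];
        offsets[0] = 0;
        for (int i = 0; i < values.length; i++) {
            int byteLength = values[i].getBytes(StandardCharsets.UTF_8).length;
            offsets[i + 1] = offsets[i] + byteLength;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three values: four offsets.
        System.out.println(Arrays.toString(buildOffsets(new String[] {"ab", "", "cde"})));
        // prints [0, 2, 2, 5]

        // Zero values: still one offset, not an empty buffer.
        System.out.println(Arrays.toString(buildOffsets(new String[0])));
        // prints [0]
    }
}
```

With this layout, a consumer can always read offsets[length] to find the end of the data buffer, even when length is 0, which is presumably why the Rust side rejects a null offset buffer.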

Component(s)

Java

About this issue

  • State: closed
  • Created 5 months ago
  • Reactions: 1
  • Comments: 18 (15 by maintainers)

Most upvoted comments

Per further discussion in the PR, we probably need to fix the C Data Interface of Java Arrow so that it properly exports the offset buffer for empty var-size arrays.

@vibhatha Yes, I'm working on a fix locally, but it currently causes a few tests to fail. I'm still looking into fixing them.