arrow: [Java] The offset buffer of empty BaseVariableWidthVector should not be empty when exposed through C Data Interface
Describe the bug, including details regarding any error messages, version, and platform.
We encountered an error when exchanging a string array from Java to Rust through the Arrow C Data Interface. On the Rust side, it complains that the buffer at position 1 (the offset buffer) is null. After tracing and some debugging, it looks like the issue is that the Java Arrow BaseVariableWidthVector class assigns an empty offset buffer when the array is empty (value count 0).
According to the Arrow spec for the variable-size binary layout:
The offsets buffer contains length + 1 signed integers …
So for an empty string array, the offset buffer should be a buffer with one element (generally 0).
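To illustrate the "length + 1" rule, here is a minimal, self-contained Java sketch that computes the 32-bit offsets for a list of strings. It does not use the Arrow Java API; the class OffsetsSketch and the method buildOffsets are hypothetical names for illustration only. The key point is that the result always has one more entry than there are values, so an empty array still yields a single 0 offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class OffsetsSketch {
    // Hypothetical helper: computes the offsets of a variable-size string array
    // following the Arrow layout rule. The result always has values.size() + 1
    // entries, so an empty input yields {0} rather than an empty array.
    static int[] buildOffsets(List<String> values) {
        int[] offsets = new int[values.size() + 1];
        offsets[0] = 0; // the first offset is conventionally 0
        for (int i = 0; i < values.size(); i++) {
            // Offsets count bytes of UTF-8 data, not characters.
            int byteLen = values.get(i).getBytes(StandardCharsets.UTF_8).length;
            offsets[i + 1] = offsets[i] + byteLen;
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildOffsets(List.of())));               // [0]
        System.out.println(Arrays.toString(buildOffsets(List.of("ab", "", "c")))); // [0, 2, 2, 3]
    }
}
```

A consumer such as the Rust implementation can therefore expect buffer 1 to hold at least one offset even when the value count is 0.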
Component(s)
Java
About this issue
- State: closed
- Created 5 months ago
- Reactions: 1
- Comments: 18 (15 by maintainers)
Commits related to this issue
- GH-40038: [Java] The offset buffer of empty array with variable-size layout should not be empty — committed to viirya/arrow by viirya 5 months ago
- GH-40038: [Java] The offset buffer of empty vector with variable-size layout should not be empty — committed to viirya/arrow by viirya 5 months ago
- Revert "GH-40038: [Java] The offset buffer of empty vector with variable-size layout should not be empty" This reverts commit 5eb34e17defa5608d974c0ac3909d74a1071231a. — committed to viirya/arrow by viirya 5 months ago
- GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface (#40043) ### Rationale for this change We encountered an error when exchanging string array from J... — committed to apache/arrow by viirya 3 months ago
- GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface (#40043) ### Rationale for this change We encountered an error when exchanging string array from J... — committed to tmct/arrow by viirya 3 months ago
Following further discussion in the PR, we probably need to fix the C Data Interface of Java Arrow so that it exports a properly sized (non-empty) offset buffer for empty variable-size arrays.
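One way to picture an export-side fix is sketched below. This is a hypothetical illustration, not the code from the actual commit: the buffer is modeled with java.nio.ByteBuffer instead of Arrow's ArrowBuf, 32-bit offsets and little-endian byte order are assumed, and ExportOffsetsSketch and offsetBufferForExport are made-up names. The idea is to substitute a single zero offset when an empty vector would otherwise export an empty offset buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ExportOffsetsSketch {
    // Assumption: 32-bit offsets (as for Utf8/Binary); large variants would use 8 bytes.
    static final int OFFSET_WIDTH = 4;

    // Hypothetical helper: ensures the offsets buffer handed to a C Data Interface
    // consumer has valueCount + 1 entries before it is exported.
    static ByteBuffer offsetBufferForExport(ByteBuffer offsets, int valueCount) {
        int required = (valueCount + 1) * OFFSET_WIDTH;
        if (offsets != null && offsets.capacity() >= required) {
            return offsets; // already well-formed; export unchanged
        }
        if (valueCount == 0) {
            // Empty vector with an empty (or missing) offsets buffer:
            // export one zero offset instead of a null/zero-length buffer.
            ByteBuffer single = ByteBuffer.allocateDirect(OFFSET_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
            single.putInt(0, 0);
            return single;
        }
        throw new IllegalStateException("offsets buffer too small for " + valueCount + " values");
    }

    public static void main(String[] args) {
        ByteBuffer exported = offsetBufferForExport(ByteBuffer.allocate(0), 0);
        System.out.println(exported.capacity() + " " + exported.getInt(0)); // 4 0
    }
}
```

Handling this only at export time leaves the in-memory representation of empty vectors unchanged, which may help avoid the kind of test breakage mentioned in the follow-up comment.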
@vibhatha Yeah, I'm working on a fix locally, but it currently causes a few tests to fail. I'm still looking into fixing them.