opendal: Dotnet binding fails on windows

The .NET binding fails on Windows but passes on Linux with net8:

→ dotnet test  .\DotOpenDAL.Tests\
  Determining projects to restore...
  All projects are up-to-date for restore.
  DotOpenDAL -> \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL\bin\Debug\net8.0\DotOpenDAL.dll
  DotOpenDAL.Tests -> \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\bin\Debug\net8.0\DotOpenDAL.Tests.dll
Test run for \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\bin\Debug\net8.0\DotOpenDAL.Tests.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.8.0 (x64)
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.32]     DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite [FAIL]
  Failed DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite [72 ms]
  Error Message:
   Assert.NotEqual() Failure
Expected: Not 0
Actual:   0
  Stack Trace:
     at DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite() in \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\BlockingOperatorTest.cs:line 29
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

Failed!  - Failed:     1, Passed:     0, Skipped:     0, Total:     1, Duration: < 1 ms - DotOpenDAL.Tests.dll (net8.0)

When passing a C# string as c_char, it is UTF-16 internally on Windows, and the Rust side then fails to parse the bytes it receives.
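As a hedged illustration of that failure mode (assuming the UTF-16 bytes reach the native side unconverted; this is a standalone reproduction, not DotOpenDAL's actual call path), the interior 0x00 bytes that UTF-16 uses for ASCII characters make a NUL-terminated C-string read truncate after the first character:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

fn main() {
    // "s3" encoded as UTF-16LE with a terminator, approximating what the
    // native side would see if the UTF-16 bytes crossed the FFI boundary
    // unconverted.
    let utf16le: [u8; 6] = [0x73, 0x00, 0x33, 0x00, 0x00, 0x00];
    let s = unsafe { CStr::from_ptr(utf16le.as_ptr() as *const c_char) };
    // CStr stops at the first 0x00 byte, so Rust sees only "s",
    // which is not a valid scheme.
    assert_eq!(s.to_str(), Ok("s"));

    // The same string as NUL-terminated UTF-8 round-trips intact.
    let utf8 = b"s3\0";
    let s = unsafe { CStr::from_ptr(utf8.as_ptr() as *const c_char) };
    assert_eq!(s.to_str(), Ok("s3"));
}
```

This also explains why the test sees an empty read rather than a clean error: the scheme and path Rust receives are silently truncated, not rejected.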

About this issue

  • State: open
  • Created 6 months ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

scheme in blocking_operator_construct must be ASCII, like s3 or memory; no other values are possible. path in blocking_operator_write must be a valid UTF-8 string.
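A minimal sketch of that contract on the Rust side, assuming the arguments arrive as NUL-terminated UTF-8; the function name and error shape are illustrative, not opendal's actual C API:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical validation for the two string arguments: scheme must be
// ASCII, path only needs to be valid UTF-8.
unsafe fn check_scheme_and_path(
    scheme: *const c_char,
    path: *const c_char,
) -> Result<(String, String), String> {
    let scheme = CStr::from_ptr(scheme)
        .to_str()
        .map_err(|e| format!("scheme is not valid UTF-8: {e}"))?;
    if !scheme.is_ascii() {
        return Err("scheme must be ASCII, e.g. \"s3\" or \"memory\"".to_string());
    }
    let path = CStr::from_ptr(path)
        .to_str()
        .map_err(|e| format!("path is not valid UTF-8: {e}"))?;
    Ok((scheme.to_string(), path.to_string()))
}

fn main() {
    let scheme = CString::new("memory").unwrap();
    let path = CString::new("hello.txt").unwrap();
    let parsed = unsafe { check_scheme_and_path(scheme.as_ptr(), path.as_ptr()) };
    assert_eq!(
        parsed,
        Ok(("memory".to_string(), "hello.txt".to_string()))
    );
}
```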

If we want the same behavior on both Windows and Linux, we need to use Marshal.StringToCoTaskMemUTF8 manually; that is the officially recommended approach.

Maybe we should use byte[] in our .NET API instead of string? I don’t think it’s correct to use a UTF-8 conversion here.

I think UTF-8 is enough for the .NET binding. Here’s the reason:

The core problem is that we don’t have a .NET binding SDK here, so we have to handle the DLL calling convention ourselves. We can take the JNI binding SDK as an example:

impl<'local, 'other_local: 'obj_ref, 'obj_ref> JavaStr<'local, 'other_local, 'obj_ref> {
    /// Get a pointer to the character array beneath a [JString]
    ///
    /// The string will be `NULL` terminated and encoded as
    /// [Modified UTF-8](https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8) /
    /// [CESU-8](https://en.wikipedia.org/wiki/CESU-8).
    ///
    /// The implementation may either create a copy of the character array for
    /// the given `String` or it may pin it to avoid it being collected by the
    /// garbage collector.
    ///
    /// Returns a tuple with the pointer and the status of whether the implementation
    /// created a copy of the underlying character array.
    ///
    /// # Warning
    ///
    /// The caller must release the array when they are done with it via
    /// [Self::release_string_utf_chars]
    ///
    /// # Safety
    ///
    /// The caller must guarantee that the Object passed in is an instance of `java.lang.String`,
    /// passing in anything else will lead to undefined behaviour (The JNI implementation
    /// is likely to crash or abort the process).
    unsafe fn get_string_utf_chars(
        env: &JNIEnv<'_>,
        obj: &JString<'_>,
    ) -> Result<(*const c_char, bool)> {
        non_null!(obj, "get_string_utf_chars obj argument");
        let mut is_copy: jboolean = 0;
        let ptr: *const c_char = jni_non_null_call!(
            env.get_raw(),
            GetStringUTFChars,
            obj.as_raw(),
            &mut is_copy as *mut _
        );

        let is_copy = is_copy == JNI_TRUE;
        Ok((ptr, is_copy))
    }
}

So for me, UTF-8 LGTM.

But it works for ANSI strings only; for non-ANSI input (e.g. Simplified Chinese), the original version also fails on Linux.

Oh, that’s unexpected. Our public API should accept any bytes ([u8]) instead of just UTF-8 or ASCII.
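A sketch of what a bytes-based signature could look like under that idea: passing an explicit (pointer, length) pair sidesteps both the encoding question and the interior-NUL truncation of NUL-terminated strings. The name copy_path_bytes and the shape of the function are assumptions for illustration, not an existing opendal API:

```rust
use std::slice;

// Illustrative only: accept (ptr, len) so the caller can pass arbitrary
// bytes, including interior NULs or non-UTF-8 sequences, across the FFI
// boundary without any encoding assumptions.
unsafe fn copy_path_bytes(path_ptr: *const u8, path_len: usize) -> Vec<u8> {
    slice::from_raw_parts(path_ptr, path_len).to_vec()
}

fn main() {
    // Bytes containing an interior NUL, which a CStr-based API would truncate.
    let path = b"dir\0file";
    let got = unsafe { copy_path_bytes(path.as_ptr(), path.len()) };
    assert_eq!(got, b"dir\0file".to_vec());
}
```

On the C# side this would pair naturally with a byte[] parameter, since the runtime can pin and pass the array without any string conversion.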

Apologies, the Chinese characters do work; I had passed them to the wrong place.