opendal: Dotnet binding fails on windows

The .NET binding fails on Windows but passes on Linux with net8:

→ dotnet test  .\DotOpenDAL.Tests\
  Determining projects to restore...
  All projects are up-to-date for restore.
  DotOpenDAL -> \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL\bin\Debug\net8.0\DotOpenDAL.dll
  DotOpenDAL.Tests -> \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\bin\Debug\net8.0\DotOpenDAL.Tests.dll
Test run for \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\bin\Debug\net8.0\DotOpenDAL.Tests.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.8.0 (x64)
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.32]     DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite [FAIL]
  Failed DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite [72 ms]
  Error Message:
   Assert.NotEqual() Failure
Expected: Not 0
Actual:   0
  Stack Trace:
     at DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite() in \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\BlockingOperatorTest.cs:line 29
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

Failed!  - Failed:     1, Passed:     0, Skipped:     0, Total:     1, Duration: < 1 ms - DotOpenDAL.Tests.dll (net8.0)

When passing a C# string as c_char, it is UTF-16 internally on Windows, and the Rust side then fails to parse the bytes it receives.
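As a hedged illustration of that failure mode (assuming the UTF-16 bytes reach the native side unconverted; this is a standalone reproduction, not DotOpenDAL's actual call path), the interior 0x00 bytes that UTF-16 uses for ASCII characters make a NUL-terminated C-string read truncate after the first character:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

fn main() {
    // "s3" encoded as UTF-16LE with a terminator, approximating what the
    // native side would see if the UTF-16 bytes crossed the FFI boundary
    // unconverted.
    let utf16le: [u8; 6] = [0x73, 0x00, 0x33, 0x00, 0x00, 0x00];
    let s = unsafe { CStr::from_ptr(utf16le.as_ptr() as *const c_char) };
    // CStr stops at the first 0x00 byte, so Rust sees only "s",
    // which is not a valid scheme.
    assert_eq!(s.to_str(), Ok("s"));

    // The same string as NUL-terminated UTF-8 round-trips intact.
    let utf8 = b"s3\0";
    let s = unsafe { CStr::from_ptr(utf8.as_ptr() as *const c_char) };
    assert_eq!(s.to_str(), Ok("s3"));
}
```

This also explains why the test sees an empty read rather than a clean error: the scheme and path Rust receives are silently truncated, not rejected.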

About this issue

  • State: open
  • Created 6 months ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

scheme in blocking_operator_construct must be ASCII, like s3 or memory; no other values are possible. path in blocking_operator_write must be a valid UTF-8 string.
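A minimal sketch of that contract on the Rust side, assuming the arguments arrive as NUL-terminated UTF-8; the function name and error shape are illustrative, not opendal's actual C API:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical validation for the two string arguments: scheme must be
// ASCII, path only needs to be valid UTF-8.
unsafe fn check_scheme_and_path(
    scheme: *const c_char,
    path: *const c_char,
) -> Result<(String, String), String> {
    let scheme = CStr::from_ptr(scheme)
        .to_str()
        .map_err(|e| format!("scheme is not valid UTF-8: {e}"))?;
    if !scheme.is_ascii() {
        return Err("scheme must be ASCII, e.g. \"s3\" or \"memory\"".to_string());
    }
    let path = CStr::from_ptr(path)
        .to_str()
        .map_err(|e| format!("path is not valid UTF-8: {e}"))?;
    Ok((scheme.to_string(), path.to_string()))
}

fn main() {
    let scheme = CString::new("memory").unwrap();
    let path = CString::new("hello.txt").unwrap();
    let parsed = unsafe { check_scheme_and_path(scheme.as_ptr(), path.as_ptr()) };
    assert_eq!(
        parsed,
        Ok(("memory".to_string(), "hello.txt".to_string()))
    );
}
```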

If we want the same behavior on both Windows and Linux, we need to use Marshal.StringToCoTaskMemUTF8 manually; that is the officially recommended approach.

Maybe we should use byte[] in our .NET API instead of string? I don’t think it’s correct to use a UTF-8 conversion here.

I think UTF-8 is enough for the .NET binding. Here’s the reason:

The core problem is that we don’t have a .NET binding SDK here, so we have to handle the DLL calling convention ourselves. We can take the JNI binding SDK as an example:

impl<'local, 'other_local: 'obj_ref, 'obj_ref> JavaStr<'local, 'other_local, 'obj_ref> {
    /// Get a pointer to the character array beneath a [JString]
    ///
    /// The string will be `NULL` terminated and encoded as
    /// [Modified UTF-8](https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8) /
    /// [CESU-8](https://en.wikipedia.org/wiki/CESU-8).
    ///
    /// The implementation may either create a copy of the character array for
    /// the given `String` or it may pin it to avoid it being collected by the
    /// garbage collector.
    ///
    /// Returns a tuple with the pointer and the status of whether the implementation
    /// created a copy of the underlying character array.
    ///
    /// # Warning
    ///
    /// The caller must release the array when they are done with it via
    /// [Self::release_string_utf_chars]
    ///
    /// # Safety
    ///
    /// The caller must guarantee that the Object passed in is an instance of `java.lang.String`,
    /// passing in anything else will lead to undefined behaviour (The JNI implementation
    /// is likely to crash or abort the process).
    unsafe fn get_string_utf_chars(
        env: &JNIEnv<'_>,
        obj: &JString<'_>,
    ) -> Result<(*const c_char, bool)> {
        non_null!(obj, "get_string_utf_chars obj argument");
        let mut is_copy: jboolean = 0;
        let ptr: *const c_char = jni_non_null_call!(
            env.get_raw(),
            GetStringUTFChars,
            obj.as_raw(),
            &mut is_copy as *mut _
        );

        let is_copy = is_copy == JNI_TRUE;
        Ok((ptr, is_copy))
    }
}

So for me, UTF-8 LGTM.

But it works for ANSI strings only; for non-ANSI input (e.g. Simplified Chinese), the original version also fails on Linux.

Oh, that’s unexpected. Our public API should accept any bytes ([u8]) instead of just UTF-8 or ASCII.
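A sketch of what a bytes-based signature could look like under that idea: passing an explicit (pointer, length) pair sidesteps both the encoding question and the interior-NUL truncation of NUL-terminated strings. The name copy_path_bytes and the shape of the function are assumptions for illustration, not an existing opendal API:

```rust
use std::slice;

// Illustrative only: accept (ptr, len) so the caller can pass arbitrary
// bytes, including interior NULs or non-UTF-8 sequences, across the FFI
// boundary without any encoding assumptions.
unsafe fn copy_path_bytes(path_ptr: *const u8, path_len: usize) -> Vec<u8> {
    slice::from_raw_parts(path_ptr, path_len).to_vec()
}

fn main() {
    // Bytes containing an interior NUL, which a CStr-based API would truncate.
    let path = b"dir\0file";
    let got = unsafe { copy_path_bytes(path.as_ptr(), path.len()) };
    assert_eq!(got, b"dir\0file".to_vec());
}
```

On the C# side this would pair naturally with a byte[] parameter, since the runtime can pin and pass the array without any string conversion.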

Apologies, the Chinese characters do work; I had passed them to the wrong place.