runtime: API Proposal: Span from null-terminated char*

Background and Motivation

PInvoke scenarios (mostly those on Windows) interact a lot with null-terminated wide strings.

String has always had a ctor(char*), which in .NET Core now uses a highly optimized wcslen that is internal to the framework.

I am proposing the equivalent functionality, minus the allocation+copy, using Span<>. Given a null-terminated wide string as input (in the shape of unsafe char*), return a Span<char> whose length is the count of characters before the null.

I originally proposed Span<> rather than ReadOnlySpan<>, to let the caller decide what to do with the result according to their specific scenario and the nature of their char pointer. As Jan mentioned, however, the vast majority of use cases are const, so the API should return ReadOnlySpan.

I originally did not propose the equivalent narrow ctor(sbyte*) version for span, as that one cannot be implemented without allocating a new buffer. The UTF8 (byte*) overload returns ReadOnlySpan<byte>.

Implementing the same functionality with just public API is possible, but it is very awkward and relies on undocumented internal behavior (this is more or less what wcslen does, with error-checking omitted):

int len = MemoryExtensions.IndexOf(new ReadOnlySpan<char>(value, int.MaxValue), '\0');
return new Span<char>(value, len);

Proposed API

namespace System.Runtime.InteropServices
{
    public static class MemoryMarshal
    {
+       public static unsafe ReadOnlySpan<char> CreateFromNullTerminated(char* value);
+       public static unsafe ReadOnlySpan<byte> CreateFromNullTerminated(byte* value);
    }
}

The implementation of this proposal is also trivial with the existing tools in the framework (a null pointer does not throw, which matches the behavior of the string ctor(char*)):

public static unsafe ReadOnlySpan<char> CreateFromNullTerminated(char* value)
{
    if (value == null)
        return default;
 
    int count = string.wcslen(value);
    if (count == 0)
        return default;

    return new ReadOnlySpan<char>(value, count);
}

Usage Examples

['ptr' is a char* that points to null-terminated string "abc\0"]

var span = MemoryMarshal.CreateFromNullTerminated(ptr);

['span' is now a ReadOnlySpan<char> of length 3.]
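
A slightly fuller sketch of the same usage, assuming the API is available under the name CreateReadOnlySpanFromNullTerminated (the name used in a later revision of this proposal). The pinned managed string only stands in for a char* handed back by native code, and the class and method names are illustrative. It needs unsafe code to be enabled.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NullTerminatedUsage
{
    // Illustrative helper: returns the characters before the first '\0' as a new string.
    public static unsafe string ReadUpToNull(string raw)
    {
        // Pinning a managed string stands in for a char* returned by a native call.
        fixed (char* ptr = raw)
        {
            // No allocation or copy happens here; the span points at the pinned memory.
            ReadOnlySpan<char> span = MemoryMarshal.CreateReadOnlySpanFromNullTerminated(ptr);

            // ToString() copies, so the result stays valid after the pin is released.
            return span.ToString();
        }
    }
}
```

The span itself must not outlive the memory it points at, which is why the sketch copies the contents out before leaving the fixed block.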

Alternative Designs

Risks

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 47 (42 by maintainers)

Most upvoted comments

A missing null terminator in a null-terminated string would be a breach of a very fundamental contract that is present in 99% of c-style APIs, including big parts of the Win32 API. Null-terminated strings of unknown length are everywhere.

A null-terminated string without a null is a bug in the code that generated it, and I don’t think the framework should patch around that in what is unsafe code in the first place (because of the use of pointers).

When no one loves it, you know it’s a good compromise 😄

namespace System.Runtime.InteropServices
{
    public static class MemoryMarshal
    {
        public static unsafe ReadOnlySpan<char> CreateReadOnlySpanFromNullTerminated(char* value);
        public static unsafe ReadOnlySpan<byte> CreateReadOnlySpanFromNullTerminated(byte* value);
        ...
    }
}

Going… going…

Video

  • Let’s make it clear in the name what this method does.
  • Let’s have an overload that deals with UTF8
  • Let’s hold back UTF32 until we need it.
  • We should also provide strlen-like APIs
namespace System.Runtime.InteropServices
{
    public partial class MemoryMarshal
    {
        public static Span<char> CreateFromNullTerminated(char* value);
        public static Span<byte> CreateFromNullTerminated(byte* value);
    }
}
namespace System
{
    public partial class Buffer
    {
        public unsafe static nuint GetStringLength(char* source);
        public unsafe static nuint GetStringLength(char* source, nuint maxLength);
        public unsafe static nuint GetStringLength(byte* source);
        public unsafe static nuint GetStringLength(byte* source, nuint maxLength);
    }
}
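
As a rough illustration of what the maxLength overloads could look like, here is a minimal sketch. It assumes the bounded form behaves like C's strnlen, returning maxLength when no terminator is found within range; the notes above do not state that behavior, and BoundedLength is a hypothetical name, not a framework API.

```csharp
using System;

public static class StringLengthSketch
{
    // Hypothetical helper, not a framework API: counts the chars before the first '\0',
    // reading at most maxLength chars. Assumed strnlen-like behavior: if no terminator
    // is found within maxLength chars, maxLength is returned.
    public static unsafe nuint BoundedLength(char* source, nuint maxLength)
    {
        if (source == null)
            return 0;

        nuint i = 0;
        while (i < maxLength && source[i] != '\0')
            i++;

        return i;
    }
}
```

A simple loop like this is only meant to show the contract; a real implementation would presumably reuse the vectorized search the framework already has.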

Are there real world code examples that take advantage of this today?

Not that I’m aware of, and I don’t see anything obvious popping out in the list of APIs (in Windows, Vulkan, PulseAudio, Xlib, or any of the several other libraries I’ve created bindings for) where it would be more than a micro-optimization.

I’m sure the code exists, but it might be, as you said, exceptionally rare. So if there is a concern that returning Span to also cover the 1% case would lead users to accidentally mutate where they didn’t intend to, then just exposing ROSpan first and waiting for users to request the mutable version seems reasonable.

The API should return Span<T>, we can add a ReadOnly version later

This feels backwards. The 99+% case for zero-terminated strings is getting a read-only string and parsing it.

We should throw InvalidOperationException when the size would exceed int.MaxValue.

All existing APIs that take zero-terminated strings throw ArgumentException(SR.Arg_MustBeNullTerminatedString) for this case. Is the inconsistency intentional?

(I am sorry that I was not able to participate in the review discussion.)

Video

  • The API should return Span<T>, we can add a ReadOnly version later
  • We should throw InvalidOperationException when the size would exceed int.MaxValue.
namespace System.Runtime.InteropServices
{
    public static class MemoryMarshal
    {
        public static unsafe Span<char> CreateSpanFromNullTerminated(char* value);
        public static unsafe Span<byte> CreateSpanFromNullTerminated(byte* value);
    }
}

Your sample copies characters into an already zero-filled array, meaning the null characters it puts at the end make no difference: those positions already held null characters 😃

In any case, even if it did not, it could still randomly work according to what happens to be in the memory around it.
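
To illustrate the point about zero-filled arrays, here is a minimal sketch (not the sample being discussed, which is not reproduced here; the names are illustrative). It copies text into a freshly allocated array without ever writing a terminator, yet the null-terminated string constructor still stops in the right place, because new arrays are zero-initialized.

```csharp
using System;

public static class ZeroFilledBufferDemo
{
    // Copies text into a larger, freshly allocated array WITHOUT writing a terminator,
    // then reads it back through the null-terminated string constructor.
    public static unsafe string RoundTrip(string text, int capacity)
    {
        // Guard so the scan below can never run past the end of the array.
        if (capacity <= text.Length)
            throw new ArgumentException("capacity must exceed text.Length", nameof(capacity));

        char[] buffer = new char[capacity];      // new arrays are zero-initialized
        text.CopyTo(0, buffer, 0, text.Length);  // no explicit '\0' is written

        fixed (char* p = buffer)
        {
            // Stops at buffer[text.Length], which is '\0' only because of zero-initialization.
            return new string(p);
        }
    }
}
```

With memory that is not zero-initialized (a reused or native buffer, for instance), the same code without an explicit terminator would read whatever happens to follow the copied characters.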