runtime: API Proposal: Span from null-terminated char*
Background and Motivation
PInvoke scenarios (mostly those on Windows) interact a lot with null-terminated wide strings.
String has always had a ctor(char*), which in .NET Core now uses a highly optimized wcslen, internal to the framework.
I am proposing the equivalent functionality, minus the allocation+copy, using Span<>. Given a null-terminated wide string as input (in the shape of unsafe char*), return a Span<char> whose length is the count of characters before the null.
I originally proposed Span<> and not ReadOnlySpan<>, to let the caller decide what to do with the result, according to their specific scenario and the nature of their char-pointer.
However, as Jan mentioned, the vast majority of use cases are const, so the API should return ReadOnlySpan.
I am not proposing to also add the equivalent narrow ctor(sbyte*) version for span, as this one cannot be implemented without allocating a new buffer.
UTF8 (byte*) returns ReadOnlySpan<byte>.
Implementing the same functionality with just the public API is possible, but it is very awkward and relies on undocumented internal behavior (this is more or less what wcslen does, with error-checking omitted):
int len = MemoryExtensions.IndexOf(new ReadOnlySpan<char>(value, int.MaxValue), '\0');
return new Span<char>(value, len);
Proposed API
namespace System.Runtime.InteropServices
{
    public static class MemoryMarshal
    {
+        public static unsafe ReadOnlySpan<char> CreateFromNullTerminated(char* value);
+        public static unsafe ReadOnlySpan<byte> CreateFromNullTerminated(byte* value);
    }
}
The implementation of this proposal is also trivial with the existing tools in the framework (a null pointer does not throw, matching the behavior of the string ctor(char*)):
public static unsafe ReadOnlySpan<char> CreateFromNullTerminated(char* value)
{
    if (value == null)
        return default;
    int count = string.wcslen(value);
    if (count == 0)
        return default;
    return new ReadOnlySpan<char>(value, count);
}
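The byte* overload listed in the proposed API could follow the same pattern. The following is only a minimal sketch under stated assumptions: a naive byte-by-byte scan stands in for an optimized strlen, and the class name `Utf8NullTerminatedSketch` and the helper `DecodePinned` are hypothetical names used for illustration. It needs to be compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Text;

public static class Utf8NullTerminatedSketch
{
    // Hypothetical stand-in for the proposed byte* overload.
    // The caller must guarantee that a zero byte terminates the buffer.
    public static unsafe ReadOnlySpan<byte> CreateFromNullTerminated(byte* value)
    {
        if (value == null)
            return default;

        // Naive scan (assumption: not the optimized routine the framework would use).
        int length = 0;
        while (value[length] != 0)
            length++;

        return new ReadOnlySpan<byte>(value, length);
    }

    // Pins a managed buffer, wraps the bytes before the first zero, and decodes them as UTF-8.
    public static unsafe string DecodePinned(byte[] buffer)
    {
        fixed (byte* ptr = buffer)
        {
            return Encoding.UTF8.GetString(CreateFromNullTerminated(ptr));
        }
    }
}
```

The span itself involves no allocation or copy; only the optional decoding step in `DecodePinned` allocates a new string.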
Usage Examples
['ptr' is a char* that points to null-terminated string "abc\0"]
var span = MemoryMarshal.CreateFromNullTerminated(ptr);
['span' is now a ReadOnlySpan<char> of length 3.]
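For readers who want to try the usage pattern end to end, here is a hedged, self-contained sketch. It does not call the proposed API: `NullTerminatedCharSketch.CreateFromNullTerminated` is a hypothetical local stand-in that uses a plain loop instead of the framework's internal wcslen, and `ReadPinned` is a hypothetical helper that obtains a null-terminated char* by pinning a managed string (the C# `fixed` statement guarantees a null character after the last character). It needs to be compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;

public static class NullTerminatedCharSketch
{
    // Hypothetical stand-in for the proposed API; a plain loop replaces
    // the framework's internal, optimized wcslen.
    public static unsafe ReadOnlySpan<char> CreateFromNullTerminated(char* value)
    {
        if (value == null)
            return default;

        int length = 0;
        while (value[length] != '\0')
            length++;

        return new ReadOnlySpan<char>(value, length);
    }

    // Pins a managed string and reads it back through the helper.
    // The span is copied to a string before the pointer is unpinned.
    public static unsafe string ReadPinned(string text)
    {
        fixed (char* ptr = text)
        {
            return CreateFromNullTerminated(ptr).ToString();
        }
    }
}
```

With this sketch, `ReadPinned("abc")` gives back "abc" (a span of length 3), while `ReadPinned("ab\0cd")` gives back "ab", because the scan stops at the first null character.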
Alternative Designs
Risks
About this issue
- State: closed
- Created 4 years ago
- Reactions: 7
- Comments: 47 (42 by maintainers)
A missing null terminator in a null-terminated string would be a breach of a very fundamental contract that is present in 99% of C-style APIs, including big parts of the Win32 API. Null-terminated strings of unknown length are everywhere.
A null-terminated string without a null is a bug in the code that generated it, and I don’t think the framework should patch around that in what is unsafe code in the first place (because of the use of pointers).
When no one loves it, you know it’s a good compromise 😄
Going… going…
Video
strlen-like APIs
Not that I’m aware of and I don’t see anything obvious popping out in the list of APIs (in Windows, Vulkan, PulseAudio, Xlib, or of the several other libraries I’ve created bindings for) where it would be more than a micro-optimization.
I’m sure the code exists, but it might be as you said and exceptionally rare. So if there is a concern around having it return Span to also cover the 1% case due to users accidentally mutating where they didn’t intend, then just exposing ROSpan first and waiting for users to request the other, mutable one seems reasonable.
This feels backwards. The 99+% case for zero-terminated strings is getting a read-only string and parsing it.
All existing APIs that take zero-terminated strings throw ArgumentException(SR.Arg_MustBeNullTerminatedString) for this case. Is the inconsistency intentional?
(I am sorry that I was not able to participate in the review discussion.)
Video
Span<T>, we can add a ReadOnly version later
InvalidOperationException when the size would exceed int.MaxValue.
Your sample copies characters into an already zero-filled array, meaning the null characters it puts at the end make no difference; they were already null characters 😃
In any case, even if it did not, it could still randomly work according to what happens to be in the memory around it.