roc: Incorrect C calling convention in Wasm for Zig builtins

Wasm calling conventions

While working on Roc’s development backend for Wasm, I have run into some tricky issues around calling conventions for non-native value types. Zig does not correctly implement the C ABI for this target and is not consistent between versions, and we need to figure out what to do about it. The issues arise with structs and with numbers wider than 64 bits. I thought I had solved this, but then discovered that there is more to it, and wanted to open up the discussion.

64-bit structs

Let’s compare the generated Wasm from C and Zig for some examples. I compiled our Zig tests for RocStr and also created a Point2 struct that contains two i32s. Both structs are 8 bytes in Wasm: it is a 32-bit target, so the pointer and the usize in RocStr are 32 bits each.

// Zig structs
pub const RocStr = extern struct {
    str_bytes: ?[*]u8,
    str_len: usize,
    // ... associated methods and constants
};
const Point2 = struct {
    x: i32,
    y: i32,
};
// equivalent C structs
typedef struct {
    unsigned char *str_bytes;
    unsigned int str_len;
} RocStr;
typedef struct {
    int x;
    int y;
} Point2;

To analyse the calling conventions, we can use the fact that Wasm functions have type signatures. A signature can only include the native types i32, i64, f32, and f64.

All the i32s in the table below are pointers. strTrimRight takes one argument and add_point2 takes two.

| lang/compiler | callconv | strTrimRight: Str -> Str | add_point2: (Point2, Point2) -> Point2 |
| --- | --- | --- | --- |
| C (clang-13) | - | (param i32 i32) | (param i32 i32 i32) |
| Zig 0.8 | default | (param i32 i32) | (param i32 i32 i32) |
| Zig 0.9 | default | (param i32 i32) | (param i32 i32 i32) |
| Zig 0.8 | callconv(.C) | (param i32 i64) | (param i32 i64 i64) |
| Zig 0.9 | callconv(.C) | (param i64) (result i64) | (param i64 i64) (result i64) |

In the first three rows (C and both Zig defaults), the struct is allocated in stack memory and passed to functions as a pointer. The Wasm function signatures have no result. Instead the function takes a pointer as its first argument, and writes its result to that address.

Note: In the source code, the structs are all passed by value, but the compiled code uses pointers.
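To make the pointer-based lowering concrete, here is a rough C sketch of the same function written both ways. It is only an illustration: `add_point2_lowered` is a hypothetical name, and the real compiler output is Wasm rather than C.

```c
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} Point2;

/* Source-level form: structs are passed and returned by value. */
Point2 add_point2(Point2 a, Point2 b) {
    Point2 sum = { a.x + b.x, a.y + b.y };
    return sum;
}

/* Hypothetical hand-written equivalent of the (param i32 i32 i32) shape:
 * no return value, and the first pointer is where the result is written. */
void add_point2_lowered(Point2 *result, const Point2 *a, const Point2 *b) {
    result->x = a->x + b->x;
    result->y = a->y + b->y;
}
```

In the lowered form the caller owns the stack slot for the result, which is why the Wasm signature has three i32 params and no result.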

Confusingly, Zig uses the C calling convention by default, and using extern and callconv(.C) actually turns off C compatibility and changes to a different convention, which is the opposite of what you’d expect. With callconv(.C), Zig packs the two i32s into a single i64 and passes it by value (like the source code) rather than by pointer as in the compiled C code.
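As a sketch of what packing two i32s into one i64 could look like, the C helpers below do it with explicit shifts. The helper names are hypothetical, and the bit order (x in the low 32 bits, y in the high 32 bits) is an assumption based on Wasm being little-endian; it was not verified against the generated code.

```c
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} Point2;

/* Assumption: x occupies the low 32 bits and y the high 32 bits,
 * matching the struct's little-endian layout in memory. */
uint64_t pack_point2(Point2 p) {
    return (uint64_t)(uint32_t)p.x | ((uint64_t)(uint32_t)p.y << 32);
}

/* Inverse of pack_point2 under the same assumption. */
Point2 unpack_point2(uint64_t bits) {
    Point2 p;
    p.x = (int32_t)(uint32_t)(bits & 0xFFFFFFFFu);
    p.y = (int32_t)(uint32_t)(bits >> 32);
    return p;
}
```

A caller targeting this convention would pass `pack_point2(p)` as a single i64 argument instead of a pointer to `p`.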

Since we are using Zig for all our builtins, we need to handle this calling convention.

I discussed this with Luuk, who does a lot of work with Wasm in Zig. He says that Zig has some bugs in this area, particularly in the “stage 1” compiler that we are using. He thinks it is using LLVM’s “fast calling convention” instead of the C calling convention. He confirms that the output from Clang is as it should be.

LLVM’s fast calling convention favours passing things in CPU registers rather than stack memory. (Of course, Wasm is abstracted from machine code, but the translation is straightforward. And for this Wasm code, the most obvious translation is to use registers.)

I searched the LLVM codebase and found the enum for calling conventions, but their Wasm backend doesn’t seem to have any logic that depends on the Fast variant. However, I don’t know LLVM well, so I could be missing any number of things!

Larger structs

If we’re going to support this convention then we will need to know how it generalises to any struct. I created a Point3 struct with three i32’s.

| lang/compiler | callconv | add_point2 | add_point3 |
| --- | --- | --- | --- |
| C (clang-13) | - | (param i32 i32 i32) | (param i32 i32 i32) |
| Zig 0.8 | default | (param i32 i32 i32) | (param i32 i32 i32) |
| Zig 0.9 | default | (param i32 i32 i32) | (param i32 i32 i32) |
| Zig 0.8 | callconv(.C) | (param i32 i64 i64) | (param i32 i64 i64 i64 i64) |
| Zig 0.9 | callconv(.C) | (param i64 i64) (result i64) | (param i32 i64 i32 i64 i32) |

Here, Zig is again passing by value. But the struct doesn’t fit into the largest available integer, i64. Instead, Zig adds extra arguments to the function to pass in the extra bits! Zig 0.8 passes each Point3 as two i64’s, but Zig 0.9 is using an i64 and an i32.
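The two ways of splitting a 12-byte Point3 can be sketched in C as follows. The function names are hypothetical and the field-to-slot mapping (x and y in the first i64, z in the second slot) is an assumption. For the Zig 0.8 shape the unused upper 32 bits of the second i64 are zeroed here, which may not match what Zig actually emits.

```c
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
    int32_t z;
} Point3;

/* Zig 0.9 shape: one i64 plus one i32 per Point3 argument.
 * Assumption: x is the low half of the i64, y the high half. */
void split_point3_zig09(Point3 p, uint64_t *first, uint32_t *second) {
    *first = (uint64_t)(uint32_t)p.x | ((uint64_t)(uint32_t)p.y << 32);
    *second = (uint32_t)p.z;
}

/* Zig 0.8 shape: two i64s per Point3 argument.
 * Assumption: the upper 32 bits of the second i64 are padding (zeroed here). */
void split_point3_zig08(Point3 p, uint64_t *first, uint64_t *second) {
    *first = (uint64_t)(uint32_t)p.x | ((uint64_t)(uint32_t)p.y << 32);
    *second = (uint64_t)(uint32_t)p.z;
}
```

Either way the first slot is identical; only the width of the second slot changes between versions, which is why the upgrade should be a contained change.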

The difference between Zig versions is unfortunate but not a disaster. If I implement calls to Zig using the current (0.8) calling convention, it will break as soon as we upgrade to Zig 0.9. But at least it’s a once-off change and it’s not huge.

I also experimented with Point4 and Point5. The Point4 version has a signature of (param i32 i64 i64 i64 i64) in both Zig 0.8 and 0.9, so each Point4 is passed as two i64s. Point5 goes back to the C calling convention, using (param i32 i32 i32) where each i32 is a pointer.
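The observations so far can be collected into a small lookup, sketched here in C. It records only the cases measured above for structs made of i32 fields under callconv(.C); all names are hypothetical, it is not a general rule, and unmeasured sizes return 0. The Point5 case is assumed to behave the same in both Zig versions.

```c
#include <stddef.h>

typedef enum { ZIG_0_8, ZIG_0_9 } ZigVersion;
typedef enum { SLOT_I32_POINTER, SLOT_I32, SLOT_I64 } WasmSlot;

/* Fills `slots` with the Wasm params observed for ONE struct argument
 * that has `n_i32_fields` i32 fields, and returns how many slots it uses.
 * Returns 0 for field counts that were not measured. */
size_t observed_arg_slots(int n_i32_fields, ZigVersion version, WasmSlot slots[2]) {
    switch (n_i32_fields) {
    case 2: /* Point2: packed into one i64 */
        slots[0] = SLOT_I64;
        return 1;
    case 3: /* Point3: the second slot differs between versions */
        slots[0] = SLOT_I64;
        slots[1] = (version == ZIG_0_8) ? SLOT_I64 : SLOT_I32;
        return 2;
    case 4: /* Point4: two i64s in both versions */
        slots[0] = SLOT_I64;
        slots[1] = SLOT_I64;
        return 2;
    case 5: /* Point5: back to passing a pointer */
        slots[0] = SLOT_I32_POINTER;
        return 1;
    default:
        return 0; /* not measured */
    }
}
```

A table like this keeps the version-specific case in one place, so an upgrade from 0.8 to 0.9 would touch a single branch.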

Wide numbers

Wasm does not have 128-bit numbers. We are storing them as 16 bytes of memory in the stack frame. So let’s look again at the calling conventions for passing them to functions.

// Zig code
pub const RocDec = extern struct {
    num: i128,
    // ... associated methods and constants
};
fn add_i128(x: i128, y: i128) i128 {
    return x + y;
}
fn add_f128(x: f128, y: f128) f128 {
    return x + y;
}
// C code
typedef struct {
    __int128 num;
} RocDec;
__int128 add_i128(__int128 x, __int128 y) {
    return x + y;
}
// (I couldn't get __float128 to work in C!)

| lang | function | Wasm signature |
| --- | --- | --- |
| Zig | RocDec.add | (param i32 i32 i32) |
| C | RocDec.add | (param i32 i64 i64 i64 i64) |
| Zig | add_i128 | (param i32 i64 i64 i64 i64) |
| C | add_i128 | (param i32 i64 i64 i64 i64) |
| Zig | add_f128 | (param i32 i64 i64 i64 i64) |
| C | add_f128 | ?? |

For RocDec, we again have a difference between C and Zig: Zig passes each RocDec as a pointer, while C splits it into two i64s. Zig versions 0.8 and 0.9 are consistent with each other.

For i128, everyone agrees that a single i128 argument is transformed to two i64 arguments. There is no mismatch between Zig versions.
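A C sketch of a function with the (param i32 i64 i64 i64 i64) shape is shown below: a result pointer followed by the two halves of each operand. The name `add_i128_lowered` is hypothetical, and the order of the halves (low first, then high) is an assumption.

```c
#include <stdint.h>

/* Mirrors (param i32 i64 i64 i64 i64): a pointer to the 16-byte result,
 * then each 128-bit operand as two 64-bit halves.
 * Assumption: the low half comes before the high half. */
void add_i128_lowered(uint64_t result[2],
                      uint64_t x_lo, uint64_t x_hi,
                      uint64_t y_lo, uint64_t y_hi) {
    uint64_t lo = x_lo + y_lo;
    /* Unsigned overflow wraps, so a wrapped sum is smaller than an operand. */
    uint64_t carry = (lo < x_lo) ? 1 : 0;
    result[0] = lo;
    result[1] = x_hi + y_hi + carry;
}
```

The two's-complement bit pattern is the same for signed and unsigned addition, so this also covers signed i128 values as long as overflow is allowed to wrap.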

Zig also passes f128 as two i64s, since the binary representation for f128 is not the same as for two f64s. I couldn’t get a C example to work, but I’m not too concerned. It’s hard to see how it would do anything other than what Zig is doing.

Summary

  • For the Wasm target, Zig’s controls that should enable the C calling convention are inverted. They actually disable the convention and use something else.
  • The resulting calling convention is not consistent between Zig versions.
  • Selfishly, I wish I could remove extern and callconv(.C) for all the builtin structs. But I assume we need to keep these directives in the Zig code for other targets.

Current status

  • In gen_wasm, the general code path uses the C calling convention. There is a special-case path that converts to Zig’s convention when calling Zig builtins. When we switch to 0.9 this will have to be modified, but it’s not a huge change.
  • A Zig host will not work correctly on Wasm, since we only use the Zig convention for builtins.
  • For i128 and f128 I have not implemented anything yet. That will require some changes to the general code path. This transformation is similar to what we’re doing for Zig structs, but it adds to the bookkeeping described in the next point.
  • We’re going to need to track more information in the code generator to mark which values need which calling convention in which situation. There would be less logic and complexity if we had the same calling convention everywhere.

I would love to find a way to get Zig to generate the correct C calling convention. But I’m not sure how to do that without breaking other targets!

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (15 by maintainers)

Most upvoted comments

Thanks @Luukdegram for all your help on this! I learned a lot of useful things from your input, and was able to get things unblocked because of it. And thanks for investigating it on the Zig side. Yes I suspected it might be proving difficult. Waiting for self-hosted does seem like the way to go. Delighted to hear you’re planning to write tests for this!

For functions you can do this

const builtin = @import("builtin");

const cc = if (builtin.target.isWasm()) .Unspecified else .C;
pub fn bar() callconv(cc) void {}

For structs there’s no comptime trick at the moment AFAIK.