datafusion: DataFusion does not support wasm32-unknown-unknown target
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11615
The Arrow crate successfully compiles to WebAssembly (e.g. https://github.com/domoritz/arrow-wasm), but the DataFusion crate currently does not support the `wasm32-unknown-unknown` target.
Try out the repository at https://github.com/domoritz/datafusion-wasm/tree/73105fd1b2e3ca6c32ec4652c271fb741bda419a.
{code}
error[E0433]: failed to resolve: could not find unix in os
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18
|
41 | use std::os::unix::ffi::OsStringExt;
| ^^^^ could not find unix in os
error[E0432]: unresolved import unix
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5
|
6 | use unix;
| ^^^^ no unix in the root
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:98:9
|
98 | sys::duplicate(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:101:9
|
101 | sys::allocated_size(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:104:9
|
104 | sys::allocate(self, len)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:107:9
|
107 | sys::lock_shared(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:110:9
|
110 | sys::lock_exclusive(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:113:9
|
113 | sys::try_lock_shared(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:116:9
|
116 | sys::try_lock_exclusive(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:119:9
|
119 | sys::unlock(self)
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:126:5
|
126 | sys::lock_error()
| ^^^ use of undeclared crate or module sys
error[E0433]: failed to resolve: use of undeclared crate or module sys
--> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:169:5
|
169 | sys::statvfs(path.as_ref())
| ^^^ use of undeclared crate or module sys
Compiling num-rational v0.3.2
error: aborting due to 10 previous errors
{code}
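All of the errors above come from two transitive dependencies (`dirs` 1.0.5 and `fs2` 0.4.3) rather than from DataFusion's own code. A plausible reading is that each crate picks a platform module through `cfg` attributes and has no branch that matches `wasm32-unknown-unknown`, so names such as `unix` and `sys` are never declared on that target. The sketch below illustrates that pattern together with a portable fallback branch. It is not code from either crate; the function name `os_string_from_bytes` is made up for illustration.

```rust
use std::ffi::OsString;

// Unix targets: raw bytes map directly onto an OsString.
#[cfg(unix)]
mod sys {
    use std::ffi::OsString;
    use std::os::unix::ffi::OsStringExt;

    pub fn os_string_from_bytes(bytes: Vec<u8>) -> OsString {
        OsString::from_vec(bytes)
    }
}

// Every other target, including wasm32-unknown-unknown: a portable fallback.
// Without a branch like this one, `sys` would simply not exist on such
// targets and every `sys::...` call would fail to resolve.
#[cfg(not(unix))]
mod sys {
    use std::ffi::OsString;

    pub fn os_string_from_bytes(bytes: Vec<u8>) -> OsString {
        // Lossy UTF-8 conversion; invalid bytes become U+FFFD.
        OsString::from(String::from_utf8_lossy(&bytes).into_owned())
    }
}

/// Public entry point that compiles on every target because `sys`
/// is always defined by exactly one of the two modules above.
pub fn os_string_from_bytes(bytes: Vec<u8>) -> OsString {
    sys::os_string_from_bytes(bytes)
}
```

Calling `os_string_from_bytes(b"data".to_vec())` yields the same `OsString` on either branch for ASCII input; the branches only differ for bytes that are not valid UTF-8.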
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 15 (8 by maintainers)
Good news, fellow WebAssembly enthusiasts! It looks like the stars are finally aligning: with relatively minimal patching, I successfully compiled the code from the gist (create, insert into, and query a `MemTable`) to `wasm32-wasi` and `wasm32-unknown-unknown`, and ran it in `wasmedge` and the browser (via `wasmpack`).
I pushed the proof-of-concept to a public repository at `splitgraph/experimental-datafusion-webassembly`. There are two branches:
- `wasm32-wasi`
- `wasm32-unknown-unknown`, which builds on `wasm32-wasi`; the diff of `wasm32-wasi..wasm32-unknown-unknown` shows the changes
In the near future, I intend to clean up these changes and submit a PR to DataFusion feature-flagging WebAssembly support.
In general, the summary of requirements for `wasm32-wasi`:
- Patching `arrow` to use its upstream Git repository fixed some initial compilation issues, but otherwise no changes were required. It seems like the latest release on `master` of `arrow-rs` (`v26.0.0`) can compile to both WebAssembly targets, so DataFusion just needs to upgrade to that. (There was also one minor change to `datafusion/physical-expr` required to support the upgraded Arrow package.)
- `RUSTFLAGS="--cfg tokio_unstable"` is necessary, and will benefit from recently stabilized wasm support in Tokio
- […] `object_store`, as they are not compatible with the `wasm32-unknown-unknown` target (they may be with `wasm32-wasi` in some runtimes, but I disabled them)
- […] `bzip2`
- […] `reqwest` to add some wasm compatibility (note: I’m not sure how much of this was/is necessary, and/or if it forces tokio to resolve to a version that other packages incidentally benefit from)
- Replacing `spawn_blocking` in `sort` with `spawn`, making the compiler happy but possibly causing runtime/logic errors
For `wasm32-unknown-unknown`, in addition to all those requirements, it was also necessary to:
- replace `std::time` with `Instant`, in both `datafusion` and `arrow`
- […] `getrandom` is also passing it the `js` feature flag, which I did by just patching `getrandom` and making that the default
To get it to run (without a runtime error related to `std::time` being unreachable), a few more changes were made:
- […] `main` runtime, even with `flavor = current-thread`. Instead, use wasm-bindgen-futures to await a future that performs the asynchronous task that calls datafusion
This is all very messy. I will clean it up and submit a PR to DataFusion once I have a better sense of the most minimal changes required and the proper way to feature flag them. Also, general disclaimer that I’m new to Rust and YMMV, especially on the `wasm32-unknown-unknown` patch. After all, I barely got it to run. But it does compile, and it creates and queries a small in-memory table, which is pretty good!
Hello, folks. I’m trying to add WASM support to DataFusion’s dependencies. Started with bzip2-rs: https://github.com/alexcrichton/bzip2-rs/pull/93
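For the `std::time` problem on `wasm32-unknown-unknown` mentioned above, one possible shape for a fix is to route all timing through a small trait, so the `std::time::Instant`-backed implementation is only compiled for targets where it works. This is a minimal sketch and not the code from the proof-of-concept branches: `Clock`, `StdClock`, `ManualClock`, and `elapsed_ms` are hypothetical names, and `ManualClock` only stands in for whatever time source the host would provide (in a browser that might be `performance.now()`, which is an assumption here).

```rust
use std::cell::Cell;

/// Source of monotonic timestamps, in milliseconds.
pub trait Clock {
    fn now_ms(&self) -> f64;
}

/// Clock backed by `std::time::Instant`. Compiled out on
/// wasm32-unknown-unknown, where the standard clock is unavailable.
#[cfg(not(all(target_arch = "wasm32", target_os = "unknown")))]
pub struct StdClock {
    origin: std::time::Instant,
}

#[cfg(not(all(target_arch = "wasm32", target_os = "unknown")))]
impl StdClock {
    pub fn new() -> Self {
        Self { origin: std::time::Instant::now() }
    }
}

#[cfg(not(all(target_arch = "wasm32", target_os = "unknown")))]
impl Clock for StdClock {
    fn now_ms(&self) -> f64 {
        self.origin.elapsed().as_secs_f64() * 1000.0
    }
}

/// Hypothetical stand-in for a host-provided clock: the time only
/// moves when `advance` is called.
pub struct ManualClock {
    now: Cell<f64>,
}

impl ManualClock {
    pub fn new() -> Self {
        Self { now: Cell::new(0.0) }
    }

    pub fn advance(&self, ms: f64) {
        self.now.set(self.now.get() + ms);
    }
}

impl Clock for ManualClock {
    fn now_ms(&self) -> f64 {
        self.now.get()
    }
}

/// Runs `work` and reports how long it took according to `clock`.
pub fn elapsed_ms<T>(clock: &impl Clock, work: impl FnOnce() -> T) -> (T, f64) {
    let start = clock.now_ms();
    let out = work();
    (out, clock.now_ms() - start)
}
```

Code that records metrics would take `&impl Clock` instead of calling `Instant::now()` directly, which keeps the target-specific choice in one place.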
This sounds very cool @milesrichardson - DataFusion should be upgraded to arrow 26.0.0 shortly: https://github.com/apache/arrow-datafusion/pull/4039. I think @Jimexist is in the process of making bzip support optional https://github.com/apache/arrow-datafusion/pull/3993
In terms of being messy / submitting a PR – if it is possible I suggest trying to do it incrementally – like for example we can probably sort out the calls to `spawn_blocking` in a separate PR.
But all in all this is pretty exciting.
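As a rough illustration of what sorting out `spawn_blocking` separately could look like, the sketch below hides the "run this off-thread" decision behind one function with a `cfg`-selected body. It deliberately uses `std::thread` as a stand-in for Tokio's blocking pool so that it stays self-contained, and it is synchronous rather than async. `run_blocking` and `sort_batch` are hypothetical names, not DataFusion functions.

```rust
/// Native targets: run the work on a separate thread and wait for it.
/// (`std::thread` stands in for a real blocking-task pool here.)
#[cfg(not(target_arch = "wasm32"))]
pub fn run_blocking<T, F>(work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    std::thread::spawn(work)
        .join()
        .expect("blocking task panicked")
}

/// wasm32 targets: no threads are assumed, so run the work inline.
#[cfg(target_arch = "wasm32")]
pub fn run_blocking<T, F>(work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    work()
}

/// Example caller: a CPU-heavy sort that does not care which
/// strategy was compiled in.
pub fn sort_batch(mut values: Vec<i64>) -> Vec<i64> {
    run_blocking(move || {
        values.sort_unstable();
        values
    })
}
```

Both bodies share one signature, so call sites such as `sort_batch` need no `cfg` attributes of their own. The inline fallback blocks the caller, which is the trade-off to weigh against swapping in `spawn`.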
Thanks @alamb . I will do some experiments but seems like a good solution.
@REASY In my experiment (the one linked above), I put bzip behind a configuration flag and disabled it for the wasm targets. Datafusion still compiled. I don’t know enough about DF to say how important bzip is, or which parts of DF would be broken without it, however. It seemed limited in scope, since it should only affect files that are encoded with bzip.
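To make the "limited in scope" point concrete, here is a small sketch of gating one codec per target while everything else keeps working. It assumes the codec is chosen from the file extension and that bzip2 is simply unavailable on wasm32; `Compression`, `compression_for`, `is_supported`, and `check_readable` are hypothetical names rather than DataFusion APIs, and a real change would more likely key off a Cargo feature than off `target_arch`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Uncompressed,
    Bzip2,
}

/// Guesses the codec from the file extension.
pub fn compression_for(path: &str) -> Compression {
    if path.ends_with(".bz2") {
        Compression::Bzip2
    } else {
        Compression::Uncompressed
    }
}

/// Reports whether a codec was compiled in for the current target.
/// Assumption for this sketch: bzip2 is disabled on every wasm32 target.
pub fn is_supported(codec: Compression) -> bool {
    match codec {
        Compression::Uncompressed => true,
        Compression::Bzip2 => cfg!(not(target_arch = "wasm32")),
    }
}

/// Fails early, with a readable message, only for files that actually
/// need the missing codec; all other files are unaffected.
pub fn check_readable(path: &str) -> Result<Compression, String> {
    let codec = compression_for(path);
    if is_supported(codec) {
        Ok(codec)
    } else {
        Err(format!(
            "{path}: {codec:?} support is not compiled in for this target"
        ))
    }
}
```

On a wasm32 build, `check_readable("trips.csv.bz2")` returns an error while `check_readable("trips.csv")` still succeeds.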
@seddonm1 compile and run?
I experimented with that yesterday. I tried wasm32-wasi first, and a simple sample works in single-threaded mode after disabling some parquet features. See this gist for the example: https://gist.github.com/roee88/91f2b67c3e180fa0dfb688ba8d923dae
For wasm32-unknown-unknown, adding getrandom with js as a dependency of the sample makes it compile IIRC, but actually running it is a different story. I tried to get a sample working with wasm-pack and it stops execution on the datafusion context creation. I suspect that it uses some sync primitives that are unsupported in wasm32-unknown-unknown, but I didn’t investigate further.
I didn’t try wasm32-unknown-emscripten yet since my local rust version is incompatible with my installed emcc version (both latest at the time of this writing).
Edit: re tokio, the sample above worked on wasm32-wasi with other executors in single-threaded mode, including futures 0.3, https://github.com/richardanaya/executor, and async-global-executor. As long as you don’t hit code paths that use things like tokio::spawn (used in hash aggregate), it might be fine to use another executor. I’m not sure what the best approach is for library code that needs to spawn tasks. I have seen opinions for 1) a library should never spawn, 2) futures should be universally supported, 3) a library should accept an executor trait (as implemented in https://github.com/najamelan/async_executors). I didn’t check the state of futures and WebAssembly recently. I didn’t try wasm-bindgen-futures because it’s officially no longer compatible with wasi and emscripten and, as I said, I couldn’t get anything running with wasm32-unknown-unknown.
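Option 3 (the library accepts an executor trait) can be sketched with the standard library alone. This is only an illustration of the idea, not the `async_executors` API: `SpawnLocal`, `QueueExecutor`, and `sum_partitions` are hypothetical names, and the executor is a deliberately naive single-threaded queue that re-polls pending tasks in a loop.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

type LocalTask = Pin<Box<dyn Future<Output = ()>>>;

/// What the library asks of its caller: somewhere to put tasks.
pub trait SpawnLocal {
    fn spawn_local(&self, task: LocalTask);
}

/// Waker that does nothing; the queue below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive single-threaded executor: a FIFO queue of tasks.
#[derive(Default)]
pub struct QueueExecutor {
    queue: RefCell<VecDeque<LocalTask>>,
}

impl SpawnLocal for QueueExecutor {
    fn spawn_local(&self, task: LocalTask) {
        self.queue.borrow_mut().push_back(task);
    }
}

impl QueueExecutor {
    /// Polls tasks until the queue is empty. Pending tasks are
    /// re-queued, so this busy-loops if a task never completes.
    pub fn run(&self) {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        loop {
            // The borrow ends with this statement, so tasks may spawn more tasks.
            let next = self.queue.borrow_mut().pop_front();
            let Some(mut task) = next else { break };
            if task.as_mut().poll(&mut cx).is_pending() {
                self.queue.borrow_mut().push_back(task);
            }
        }
    }
}

/// Library-side code: spawns one task per partition through the trait,
/// without naming any concrete runtime.
pub fn sum_partitions(exec: &impl SpawnLocal, partitions: Vec<Vec<i64>>, total: Rc<Cell<i64>>) {
    for partition in partitions {
        let total = Rc::clone(&total);
        exec.spawn_local(Box::pin(async move {
            total.set(total.get() + partition.iter().sum::<i64>());
        }));
    }
}
```

A caller creates a `QueueExecutor`, passes it to `sum_partitions`, and then calls `run()`. On another platform the same library function could be handed a different `SpawnLocal` implementation, which is the appeal of this option.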
Note that https://github.com/apache/arrow-rs/pull/656 from @PsiACE has removed the `pretty-table` dependency in arrow-rs upstream. This will be included in the 6.0 arrow release (in 2ish months); I am not sure if/how this affects your decision.
I think lz4 is an optional dependency of parquet: https://github.com/apache/arrow-rs/blob/master/parquet/Cargo.toml#L40 thus perhaps we could just have a lz4 feature flag for datafusion?
Polars proof of concept (shows that arrow-rs and datafusion like API can work): https://github.com/ritchie46/polars/blob/master/js-polars/app.js