candle: Explicit panic on Falcon
Hi,
I have gotten most models working; thanks again for your guidance. But I also hit one other issue, with Falcon. Hopefully someone knows about this.
codespace@codespaces-f226cf:/workspaces/rust-candle-demos/candle$ RUST_BACKTRACE=1 && cargo run --example falcon --release -- --prompt "which 100m sprinter won the 1984 olympics"?
warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
Finished release [optimized] target(s) in 0.35s
Running `target/release/examples/falcon --prompt 'which 100m sprinter won the 1984 olympics?'`
Running on CPU, to run on GPU, build this example with `--features cuda`
retrieved the files in 226.6µs
loaded the model in 8.0565019s
starting the inference loop
thread 'main' panicked at 'explicit panic', /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/candle-gemm-0.15.6/src/gemm.rs:143:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
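(Side note on the log above: `RUST_BACKTRACE=1 && cargo run ...` sets an unexported shell variable in a separate command, so the `cargo run` child process never sees it; that is why the panic output still suggests setting `RUST_BACKTRACE=1`. Prefixing the assignment to the command itself, as in `RUST_BACKTRACE=1 cargo run --example falcon --release -- --prompt "..."`, puts it in that command's environment and produces the backtrace.)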
This was it! Thanks again for all of the help; Candle is pretty cool! I started with the p4, then switched the machine to a G5, which has a better cost profile and better performance for development, and is actually supported by Candle! I simply "switched machine type", which I should have thought more about, since it could have created a subtle issue with the compilation.
Here is the workflow I used to verify that Falcon works on a g5.16xlarge. Later this week I will open another ticket on a related but different topic, machine compatibility for development, and put some of my notes in there.
You probably already tried it, but building with `--features cuda` might work out of the box if CUDA is installed in one of the default locations. Or you can try `--use-f32` if you have lots of memory 😃. That's because the model uses `bf16`, for which there is no CPU implementation. I'll make the error message more explicit; if you have a GPU, maybe you want to use `--features cuda`?
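For reference, here is a minimal sketch of what a `--use-f32` style fallback looks like; the flag parsing and variable names are illustrative assumptions, not candle's actual example code:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Hypothetical stand-in for the example's --use-f32 flag.
    let use_f32 = std::env::args().any(|arg| arg == "--use-f32");

    // Falcon's weights are stored in bf16. In the version reported
    // above there was no bf16 CPU kernel, so on CPU the weights have
    // to be widened to f32 (roughly doubling their memory footprint).
    let dtype = if use_f32 { DType::F32 } else { DType::BF16 };
    let device = Device::Cpu;

    // A matmul like this is the kind of operation that panicked in
    // candle-gemm when run with BF16 on CPU.
    let a = Tensor::zeros((4, 8), dtype, &device)?;
    let b = Tensor::zeros((8, 2), dtype, &device)?;
    let c = a.matmul(&b)?;
    println!("matmul ok, output dtype: {:?}", c.dtype());
    Ok(())
}
```

Selecting the dtype once up front like this keeps every downstream tensor consistent, so the f32 fallback is a single branch rather than casts scattered through the model code.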