mistral.rs: Quantized Mistral: Batching is slower than non-batched generation
I added some code that prints the queue state: https://github.com/EricLBuehler/mistral.rs/pull/138
I ran it on a single generation:
2024-04-14T17:34:50.601969Z INFO mistralrs_core::engine: Prompt[] Completion[210] - 21ms
And on batches:
2024-04-14T17:34:49.138483Z INFO mistralrs_core::engine: Prompt[] Completion[269, 154] - 278ms
...
2024-04-14T17:34:34.634905Z INFO mistralrs_core::engine: Prompt[] Completion[269, 217, 102] - 354ms
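To put the three log lines on a comparable footing, here is a minimal sketch that divides each step time by the number of sequences in the batch. It assumes that each entry inside `Completion[...]` corresponds to one sequence in the batch and that the trailing time is the duration of a single engine step. The `Step` struct and `ms_per_sequence` helper are hypothetical names used only for this illustration and are not part of mistral.rs.

```rust
/// One observation copied from the log lines above.
/// Assumption: `batch_size` is the number of entries in `Completion[...]`
/// and `step_ms` is the time printed at the end of the line.
struct Step {
    batch_size: u32,
    step_ms: u32,
}

/// Step time divided by the number of sequences in the batch, in milliseconds.
fn ms_per_sequence(step: &Step) -> f64 {
    f64::from(step.step_ms) / f64::from(step.batch_size)
}

fn main() {
    let steps = [
        Step { batch_size: 1, step_ms: 21 },  // Completion[210] - 21ms
        Step { batch_size: 2, step_ms: 278 }, // Completion[269, 154] - 278ms
        Step { batch_size: 3, step_ms: 354 }, // Completion[269, 217, 102] - 354ms
    ];

    // The single-sequence step serves as the baseline for the comparison.
    let baseline = ms_per_sequence(&steps[0]);

    for step in &steps {
        let per_seq = ms_per_sequence(step);
        println!(
            "batch={} step={}ms per_sequence={:.1}ms ratio_to_single={:.1}x",
            step.batch_size,
            step.step_ms,
            per_seq,
            per_seq / baseline
        );
    }
}
```

Under those assumptions, the batched steps cost about 139 ms and 118 ms per sequence, against 21 ms for the single sequence, so batching here is slower per sequence as well as per step.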
I’m trying to figure out why the batched steps take so much longer than the single-sequence step…
About this issue
- State: closed
- Created 3 months ago
- Comments: 20 (20 by maintainers)
After https://github.com/EricLBuehler/mistral.rs/pull/198, this issue can be closed, IMO.