mistral.rs: Quantized Mistral: Batching is slower than non-batched generation

I added some code that prints the queue state: https://github.com/EricLBuehler/mistral.rs/pull/138

I ran it on a single generation:

2024-04-14T17:34:50.601969Z  INFO mistralrs_core::engine: Prompt[] Completion[210] - 21ms

And on batches:

2024-04-14T17:34:49.138483Z  INFO mistralrs_core::engine: Prompt[] Completion[269, 154] - 278ms

...

2024-04-14T17:34:34.634905Z  INFO mistralrs_core::engine: Prompt[] Completion[269, 217, 102] - 354ms

I’m trying to figure out why…
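To put the three log lines on a common footing, here is a minimal sketch that parses them and divides the reported time by the number of entries in `Completion[...]`. It assumes each bracket entry corresponds to one sequence in the running batch and that the trailing `ms` value is the time for one engine step; the issue does not state either, and `parse_step` is a hypothetical helper, not part of mistral.rs. Under those assumptions the figures work out to 21 ms per sequence unbatched, 139 ms with two sequences, and 118 ms with three.

```rust
/// Parse one queue-state log line into (entries in Completion[...], reported time in ms).
/// Hypothetical helper: assumes the format `... Completion[a, b, ...] - Nms` seen in the logs.
fn parse_step(line: &str) -> Option<(usize, f64)> {
    let marker = "Completion[";
    let start = line.find(marker)? + marker.len();
    let end = start + line[start..].find(']')?;
    // Count the comma-separated entries inside the brackets (assumed: one per sequence).
    let batch = line[start..end]
        .split(',')
        .filter(|s| !s.trim().is_empty())
        .count();
    // The last whitespace-separated token after the bracket is expected to look like "278ms".
    let ms_text = line[end..].trim_end().rsplit(' ').next()?.strip_suffix("ms")?;
    let ms: f64 = ms_text.parse().ok()?;
    Some((batch, ms))
}

fn main() {
    let lines = [
        "2024-04-14T17:34:50.601969Z  INFO mistralrs_core::engine: Prompt[] Completion[210] - 21ms",
        "2024-04-14T17:34:49.138483Z  INFO mistralrs_core::engine: Prompt[] Completion[269, 154] - 278ms",
        "2024-04-14T17:34:34.634905Z  INFO mistralrs_core::engine: Prompt[] Completion[269, 217, 102] - 354ms",
    ];
    for line in lines {
        if let Some((batch, ms)) = parse_step(line) {
            // Time per sequence: if batching helped, this should fall as the batch grows.
            println!("batch={batch} step={ms}ms per_seq={:.1}ms", ms / batch as f64);
        }
    }
}
```

If batching were paying off, the per-sequence figure would drop below the unbatched 21 ms; here it is several times higher, which is the regression this issue is about.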

About this issue

  • State: closed
  • Created 3 months ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments