dali_backend: Segfault when max_batch_size > 1
Hi everybody
I am facing issues when enabling the dynamic scheduler with a max_batch_size bigger than 1: submitting requests gives me a segfault. The main README says that DALI requires homogeneous batch sizes. How would I achieve that when using the Triton C API directly? In the tests introduced with the PR that enabled dynamic batching, I can’t find anything enforcing homogeneous batch sizes. Am I missing something?
We are using the C API of the Triton r21.06 release with a DALI pipeline created with a batch size of 64, and we set max_batch_size in the Triton config.pbtxt to 32 for all elements of the ensemble model.
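For reference, the relevant fragment of the config.pbtxt we use for each composing model looks roughly like this (a sketch only; the remaining fields are omitted):
```
# Fragment of config.pbtxt for each model composing the ensemble;
# inputs, outputs and backend settings are left out here.
max_batch_size: 32
dynamic_batching { }
```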
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 15 (6 by maintainers)
@MaxHuerlimann,
we’ve narrowed down the issue and fixed it. Here’s the PR: https://github.com/NVIDIA/DALI/pull/4043
The change will be released in Triton 22.08.
@MaxHuerlimann,
that’s actually quite a challenging thing to debug, but I’m working on it right now. Hopefully I’ll have some conclusion in a day or two 😃
I have used the `perf_analyzer` tool with this data (repro_data.zip), sending a batch size of 1 with each request and testing different concurrency values; it doesn’t really matter which one, as the crash happens every time. I can check whether I can reproduce the issue with your repro client and will get back to you.
Hello again!
I have come back to this issue now, as we are experimenting with the Docker deployment of Triton (22.05) and we are still facing it. I have managed to pinpoint it to the `crop` operator. If I try to feed it a batch of crop windows (we are detecting objects in an image and want to crop them on a per-image basis), the Triton process crashes. Is there a recommended way to feed a batch of cropping windows with which to crop a batch of images?
A minimal example for reproduction should be:
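Roughly, a pipeline that decodes the images and crops each one with per-sample crop parameters delivered as external inputs (a sketch under assumed names; `INPUT_IMAGES`, `CROP_POS_X`, `CROP_POS_Y`, `CROP_W`, and `CROP_H` are placeholders, and the exact pipeline may differ):
```python
# Hypothetical reproduction sketch; the external_source names are placeholders
# and must match the inputs declared in config.pbtxt.
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def


@pipeline_def(batch_size=64, num_threads=4, device_id=0)
def crop_pipe():
    # Encoded images handed over by Triton through the DALI backend.
    encoded = fn.external_source(device="cpu", name="INPUT_IMAGES")
    # Per-sample crop parameters; argument inputs to fn.crop must stay on CPU.
    pos_x = fn.external_source(device="cpu", name="CROP_POS_X")
    pos_y = fn.external_source(device="cpu", name="CROP_POS_Y")
    crop_w = fn.external_source(device="cpu", name="CROP_W")
    crop_h = fn.external_source(device="cpu", name="CROP_H")
    images = fn.decoders.image(encoded, device="mixed")
    # One crop window per image in the batch.
    return fn.crop(images,
                   crop_pos_x=pos_x, crop_pos_y=pos_y,
                   crop_w=crop_w, crop_h=crop_h)


# Serialize the pipeline so the DALI backend can load it, e.g.:
# crop_pipe().serialize(filename="model_repository/dali_crop/1/model.dali")
```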
and with the following configuration.
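A sketch of what such a config.pbtxt could look like (the model name, input/output names, data types, and dims here are assumptions rather than the original configuration):
```
name: "dali_crop"
backend: "dali"
max_batch_size: 32
input [
  { name: "INPUT_IMAGES", data_type: TYPE_UINT8, dims: [ -1 ] },
  { name: "CROP_POS_X",   data_type: TYPE_FP32,  dims: [ 1 ] },
  { name: "CROP_POS_Y",   data_type: TYPE_FP32,  dims: [ 1 ] },
  { name: "CROP_W",       data_type: TYPE_FP32,  dims: [ 1 ] },
  { name: "CROP_H",       data_type: TYPE_FP32,  dims: [ 1 ] }
]
output [
  { name: "CROPPED", data_type: TYPE_UINT8, dims: [ -1, -1, 3 ] }
]
dynamic_batching { }
```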
I will close this for now, as I don’t have the capacity to put together extra reproduction code (as the long inactivity shows), and the inference latency does not seem to be drastically impacted. I will reopen this once I can tackle the issue again.