finetune: Very slow inference in 0.5.11
After training a default classifier, then saving and loading it, `model.predict("lorem ipsum")` and `model.predict_proba` take on average 14 seconds per call, even on a hefty server such as an AWS p3.16xlarge.
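For reference, a minimal timing sketch of the scenario above (assuming finetune's `Classifier.load` / `predict`; the model path is a placeholder):

```python
import time

from finetune import Classifier

# Load a previously trained and saved classifier ("model.bin" is a placeholder path).
model = Classifier.load("model.bin")

# Time a single standalone prediction; in 0.5.11 each call like this
# rebuilds the TensorFlow graph, which is where most of the ~14 s goes.
start = time.time()
print(model.predict(["lorem ipsum"]))
print("predict took %.1f seconds" % (time.time() - start))
```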
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (17 by maintainers)
Thanks! For my use case (serving a model as an API), a `contextmanager` doesn't fit, since I need to call predict after an external event (e.g. an HTTP request), so I'm just calling `_cached_inference` directly. Anyhow, I think we can finally close this issue. Thanks a lot for your great work!
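A rough sketch of that request-driven setup (not the `_cached_inference` call mentioned above — this assumes the public `cached_predict()` context manager from the docs linked below, held open with `contextlib.ExitStack`; the Flask app, route, and model path are illustrative):

```python
from contextlib import ExitStack

from flask import Flask, jsonify, request
from finetune import Classifier

app = Flask(__name__)

# Load the trained model once at startup ("model.bin" is a placeholder path).
model = Classifier.load("model.bin")

# Enter the cached-prediction context once and keep it open for the lifetime
# of the process, so each request reuses the already-built graph instead of
# rebuilding it. Adjust if your finetune version exposes a different hook.
stack = ExitStack()
stack.enter_context(model.cached_predict())

@app.route("/predict", methods=["POST"])
def predict():
    text = request.json["text"]
    label = model.predict([text])[0]
    return jsonify({"label": str(label)})
```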
Hi @dimidd,
Thanks for checking back in! Although I was hoping to end up with a solution where we could have our metaphorical cake and eat it too, we ran into some limitations with how TensorFlow handles cleaning up memory, which meant we had to opt for a more explicit interface for prediction if you want to avoid rebuilding the graph: https://finetune.indico.io/#prediction
Let me know if this solution works for you!
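For anyone landing here later, that explicit interface looks roughly like this (a sketch assuming the `cached_predict()` context manager described at the link above; method names may differ slightly by version, and the model path is a placeholder):

```python
from finetune import Classifier

model = Classifier.load("model.bin")  # placeholder path

# The graph is built once on entering the block and reused for every
# predict call inside it, instead of being rebuilt per call.
with model.cached_predict():
    labels = model.predict(["lorem ipsum", "dolor sit amet"])
    probas = model.predict_proba(["lorem ipsum"])
```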
Hi Guillermo,
I’m getting sub-second end-to-end times (as measured from the web interface) using Flask. See here for details: https://github.com/IndicoDataSolutions/finetune/issues/153
I think we’ve found a way to have our cake and eat it too without complicating the user interface. Just padding out the final batches should give us the batch speedup without having to recompile the predict function. PR in progress at #193
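Roughly, the batch-padding idea looks like this (an illustrative sketch only, not the PR's code; `model`, `batch_size`, and `pad_text` are placeholders):

```python
def predict_padded(model, texts, batch_size=32, pad_text=""):
    """Pad the input so the final batch is full, then drop the padded results.

    Illustrative sketch of the batch-padding idea only; the real change
    belongs inside finetune's input pipeline (see the PR referenced above).
    """
    texts = list(texts)
    n = len(texts)
    remainder = n % batch_size
    if remainder:
        # Pad with dummy examples so the graph always sees full batches
        # and never needs to be recompiled for a smaller final batch.
        texts += [pad_text] * (batch_size - remainder)
    predictions = model.predict(texts)
    return predictions[:n]
```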
This is imperfect, but here is some WIP code that might be helpful for you to use as a starting point.