finetune: Very slow inference in 0.5.11
After training a default classifier, then saving and loading it, `model.predict("lorem ipsum")` and `model.predict_proba` take on average 14 seconds per call, even on a hefty server such as an AWS p3.16xlarge.
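For reference, a minimal timing sketch of the scenario above (assuming finetune's `Classifier.load` / `predict`; the model path is a placeholder):

```python
import time

from finetune import Classifier

# Load a previously trained and saved classifier ("model.bin" is a placeholder path).
model = Classifier.load("model.bin")

# Time a single standalone prediction; in 0.5.11 each call like this
# rebuilds the TensorFlow graph, which is where most of the ~14 s goes.
start = time.time()
print(model.predict(["lorem ipsum"]))
print("predict took %.1f seconds" % (time.time() - start))
```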
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (17 by maintainers)
Thanks! For my use case (serving a model as an API), a `contextmanager` doesn't fit, since I need to call predict after an external event (e.g. an HTTP request), so I'm just calling `_cached_inference` directly. Anyhow, I think we can finally close this issue. Thanks a lot for your great work!
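A rough sketch of that request-driven setup (not the `_cached_inference` call mentioned above — this assumes the public `cached_predict()` context manager from the docs linked below, held open with `contextlib.ExitStack`; the Flask app, route, and model path are illustrative):

```python
from contextlib import ExitStack

from flask import Flask, jsonify, request
from finetune import Classifier

app = Flask(__name__)

# Load the trained model once at startup ("model.bin" is a placeholder path).
model = Classifier.load("model.bin")

# Enter the cached-prediction context once and keep it open for the lifetime
# of the process, so each request reuses the already-built graph instead of
# rebuilding it. Adjust if your finetune version exposes a different hook.
stack = ExitStack()
stack.enter_context(model.cached_predict())

@app.route("/predict", methods=["POST"])
def predict():
    text = request.json["text"]
    label = model.predict([text])[0]
    return jsonify({"label": str(label)})
```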
Hi @dimidd,
Thanks for checking back in! Although I was hoping to end up with a solution where we could have our metaphorical cake and eat it too, we ran into some limitations with how TensorFlow handles cleaning up memory, which meant we had to opt for a more explicit interface for prediction if you want to avoid rebuilding the graph: https://finetune.indico.io/#prediction
Let me know if this solution works for you!
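For anyone landing here later, that explicit interface looks roughly like this (a sketch assuming the `cached_predict()` context manager described at the link above; method names may differ slightly by version, and the model path is a placeholder):

```python
from finetune import Classifier

model = Classifier.load("model.bin")  # placeholder path

# The graph is built once on entering the block and reused for every
# predict call inside it, instead of being rebuilt per call.
with model.cached_predict():
    labels = model.predict(["lorem ipsum", "dolor sit amet"])
    probas = model.predict_proba(["lorem ipsum"])
```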
Hi Guillermo,
I’m getting sub-second end-to-end times (as measured from the web interface) using Flask. See here for details: https://github.com/IndicoDataSolutions/finetune/issues/153
I think we’ve found a way to have our cake and eat it too without complicating the user interface. Just padding out the final batches should give us the batch speedup without having to recompile the predict function. PR in progress at #193
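Roughly, the batch-padding idea looks like this (an illustrative sketch only, not the PR's code; `model`, `batch_size`, and `pad_text` are placeholders):

```python
def predict_padded(model, texts, batch_size=32, pad_text=""):
    """Pad the input so the final batch is full, then drop the padded results.

    Illustrative sketch of the batch-padding idea only; the real change
    belongs inside finetune's input pipeline (see the PR referenced above).
    """
    texts = list(texts)
    n = len(texts)
    remainder = n % batch_size
    if remainder:
        # Pad with dummy examples so the graph always sees full batches
        # and never needs to be recompiled for a smaller final batch.
        texts += [pad_text] * (batch_size - remainder)
    predictions = model.predict(texts)
    return predictions[:n]
```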
This is imperfect, but here is some WIP code that might be helpful for you to use as a starting point.