bergamot-translator: Loading time is really slow with large thread count once again
This was identified as a bergamot-translator issue in https://github.com/XapaJIaMnu/translateLocally/issues/76.
A nice solution may involve sharing model memory across worker threads (avoiding the intgemm/shortlist preprocessing placing data in the graph [needs verification]). This memory would be owned by TranslationModel. Everything transient would remain in the workspace, with the workspace attached to the worker. This issue is closely related to #257.
A temporary workaround provided by @jelmervdl is:
```diff
diff --git a/src/translator/translation_model.cpp b/src/translator/translation_model.cpp
index 9d2eb0cdb73526584d53e5cc2e32facfffc9650e..753b500fea4629fde1452b67f76d5862185a1df8 100644
--- a/src/translator/translation_model.cpp
+++ b/src/translator/translation_model.cpp
@@ -45,8 +45,15 @@ TranslationModel::TranslationModel(const Config &options, MemoryBundle &&memory
     }
   }
 
+  std::vector<std::future<void>> loadCalls;
+  loadCalls.resize(replicas);
+
   for (size_t idx = 0; idx < replicas; idx++) {
-    loadBackend(idx);
+    loadCalls[idx] = std::async(&TranslationModel::loadBackend, this, idx);
+  }
+
+  for (auto &&loadCall : loadCalls) {
+    loadCall.wait();
   }
 }
 
```
I'm unsure about putting std::thread or std::async within TranslationModel; the threading and delegation should ideally live within Service. As part of resolving this, we should ideally check in something on var through BRT which checks that model-loading speeds remain unaffected hereafter.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 17 (12 by maintainers)
Commits related to this issue
- Proposed quick fix for #293 parallel model loading — committed to browsermt/bergamot-translator by jelmervdl 2 years ago
We should put energy into solving the underlying problem by loading the model once and sharing the memory across threads, rather than kludges on top.
On January 3, 2022 2:06:30 PM UTC, Nikolay Bogoychev @.***> wrote:
It’s a bit of a misnomer in this case, maybe. The problem right now is that all workers are initialised sequentially on the main thread. Delaying that initialisation until it's necessary is just an easy way to move the initialisation into the worker threads. The worker threads can then all individually do their own initialisation, so the initialisation can be done in parallel.