deeplearning4j: DL4J is super slow on GoogleNews-vectors file

I tried to run the following example on DL4J (loading the pre-trained Google News vectors file):

File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);

BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
for (;;) {
    System.out.print("Word: ");
    String word = br.readLine();
    if ("EXIT".equals(word)) break;

    Collection<String> lst = vec.wordsNearest(word, 20);
    System.out.println(word + " -> " + lst);
}

But it is extremely slow: a single wordsNearest call takes ~10 minutes, although the results it returns are correct.
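For context, a nearest-words query over this model is essentially a brute-force cosine-similarity scan across the whole vocabulary (roughly 3M words × 300 dimensions for GoogleNews), so a non-vectorized implementation has a lot of scalar work to do per query. A minimal sketch of that computation in plain Java (class name and toy data are made up for illustration; this is not DL4J's actual implementation):

```java
import java.util.*;

public class BruteForceNearest {
    // Cosine similarity between two dense vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Scan every row of `vectors` and keep the k rows most similar to `query`.
    static List<Integer> nearest(float[][] vectors, float[] query, int k) {
        Integer[] idx = new Integer[vectors.length];
        double[] sims = new double[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            idx[i] = i;
            sims[i] = cosine(vectors[i], query);
        }
        Arrays.sort(idx, (x, y) -> Double.compare(sims[y], sims[x]));
        return Arrays.asList(idx).subList(0, k);
    }

    public static void main(String[] args) {
        // Toy 4-word vocabulary in 3 dimensions (made-up data).
        float[][] vocab = {
            {1f, 0f, 0f},
            {0.9f, 0.1f, 0f},
            {0f, 1f, 0f},
            {0f, 0f, 1f}
        };
        // Row 0 is the query itself; row 1 is the next most similar.
        System.out.println(nearest(vocab, new float[]{1f, 0f, 0f}, 2));  // prints [0, 1]
    }
}
```

The scan itself is linear in vocabulary size, so the ~10-minute latency points at per-element overhead in the similarity computation rather than at the algorithm's asymptotics.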

There is enough memory (-Xms20g -Xmx20g).
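To double-check that those flags actually reached the JVM (and rule out a misconfigured launcher), a quick sanity check in plain Java is:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Max heap the JVM will attempt to use (governed by -Xmx), in GiB.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GiB%n", maxBytes / (1024.0 * 1024 * 1024));
    }
}
```

With -Xmx20g this should report close to 20 GiB; a much smaller number would mean the flags were not applied to the process actually running the model.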

When I run the same query with the original word2vec tool from https://code.google.com/p/word2vec/, it gives the nearest words very quickly.
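One reason the C tool is fast: its distance program normalizes every vector to unit length once at load time, so each query reduces to a single pass of plain dot products, with no per-pair norm computation. A sketch of the same idea in Java (class name and toy data are illustrative, not part of either library):

```java
public class NormalizedLookup {
    // Normalize each row to unit length once, at load time.
    static void normalizeRows(float[][] m) {
        for (float[] row : m) {
            double norm = 0;
            for (float v : row) norm += v * v;
            norm = Math.sqrt(norm);
            if (norm == 0) continue;
            for (int i = 0; i < row.length; i++) row[i] /= norm;
        }
    }

    // With unit vectors, cosine similarity is just a dot product.
    static double dot(float[] a, float[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += a[i] * b[i];
        return d;
    }

    static int nearestIndex(float[][] unitVectors, float[] unitQuery) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < unitVectors.length; i++) {
            double s = dot(unitVectors[i], unitQuery);
            if (s > bestSim) { bestSim = s; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] vocab = {{3f, 4f}, {0f, 2f}};  // made-up 2-D vectors
        normalizeRows(vocab);
        System.out.println(nearestIndex(vocab, new float[]{0f, 1f}));  // prints 1
    }
}
```

Paying the normalization cost once up front is what keeps the per-query cost down to one multiply-add per vector component.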

DL4J uses ND4J, which claims to be twice as fast as NumPy: http://nd4j.org/benchmarking

Is anything wrong with my code?

It is based on https://github.com/deeplearning4j/dl4j-0.4-examples.git (I didn't touch any dependencies; I just tried to read the Google pre-trained vectors file). Word2VecRawTextExample works just fine (though its data size is relatively small); I only replaced its main method with the code above.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 26 (13 by maintainers)

Most upvoted comments

OK, that's not putScalar. The issue will be investigated.

Thanks for flagging this once again.