nagisa: Heroku deployment of NLP model Nagisa Tokenizer showing error

Hi, I deployed my Flask app (an NLP model) on Heroku. It is basically a price prediction model: some columns contain Japanese text, to which I applied NLP with the nagisa library for tokenization, and the remaining columns are numerical data. I pickled the vectorizers and the model, and finally added them to my Flask API. But after deployment, when I enter values in the frontend and click the Predict button, the result is not displayed. This is the exact error I am facing (screenshot attached). The exact code of tokenize_jp is:

def tokenize_jp(doc):
    doc = nagisa.tagging(doc)
    return doc.words

I am not able to figure out how to fix this. Does nagisa work on Heroku? PS: I am not really sure whether the problem is with Heroku or with nagisa; please help me with this.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 22 (11 by maintainers)

Most upvoted comments

Hi @Pranjal-bisht. OK. I will try to come up with a solution using this site as a reference.

Please write tokenize_jp to a file (e.g., utils_tokenizer.py):

from janome.tokenizer import Tokenizer

def tokenize_jp(doc):
    words = list(Tokenizer().tokenize(doc, wakati=True))
    return words

Load it in the Python script where the pickle is saved and in the API script, respectively.

from utils_tokenizer import tokenize_jp

...
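The reason the tokenizer has to live in an importable module is that pickle stores a function by its module-qualified name only, not its code, so the same name must be importable both when the vectorizer is pickled and when the Flask API unpickles it. A minimal stdlib-only sketch (the tokenize_jp body here is a hypothetical stand-in for the real tokenizer):

```python
import pickle

def tokenize_jp(doc):
    # Stand-in for the real tokenizer; pickle records only the
    # reference "module.tokenize_jp", not this function body.
    return doc.split()

data = pickle.dumps(tokenize_jp)      # succeeds: module-level function
restored = pickle.loads(data)          # looked up again by name on load
print(restored("hello world"))         # → ['hello', 'world']
```

This is also why defining the tokenizer inline (or as a lambda) in the training script breaks unpickling in the API process: the name cannot be resolved there.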

Hi @Pranjal-bisht. I understand your situation. Once again, I think that if you adjust the memory management in Heroku properly, the program will work without any problems. I hope it works well. If you encounter any other issues, please let me know. I think I can help you. Thanks!
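One stdlib-only way to check memory usage at runtime on the dyno itself, without adding memory_profiler as a dependency, is the resource module. This is a sketch assuming a Linux dyno, where ru_maxrss is reported in KiB (on macOS it is bytes):

```python
import resource

def peak_rss_mib():
    """Peak resident set size of this process, in MiB (Linux units)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

print(f"peak RSS so far: {peak_rss_mib():.1f} MiB")
```

Logging this value after the imports and after model loading shows how close the process is to the 512 MB dyno limit.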

Hi @Pranjal-bisht. Thank you for the Python libraries’ information. I used the following code to check memory usage. As a result, the Python libraries in your configuration use at least 345.8 MiB of memory. In addition, loading xgb models and pickle files will use additional memory.

from memory_profiler import profile


@profile
def main():
    import os
    import pickle

    import nagisa
    import pandas as pd
    import xgboost as xgb

    from random import shuffle
    from flask import Flask, render_template, request
    from scipy.sparse import csr_matrix, hstack

    text = "これはサンプルの文です。"
    words = nagisa.tagging(text)
    print(words)


if __name__ == '__main__':
    main()
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4     39.1 MiB     39.1 MiB           1   @profile
     5                                         def main():
     6     39.1 MiB      0.0 MiB           1       import os
     7     39.1 MiB      0.0 MiB           1       import pickle
     8                                         
     9    270.2 MiB    231.1 MiB           1       import nagisa
    10    293.5 MiB     23.2 MiB           1       import pandas as pd
    11    339.4 MiB     45.9 MiB           1       import xgboost as xgb
    12                                         
    13    339.4 MiB      0.0 MiB           1       from random import shuffle
    14    345.3 MiB      5.9 MiB           1       from flask import Flask, render_template, request
    15    345.3 MiB      0.0 MiB           1       from scipy.sparse import csr_matrix, hstack
    16                                         
    17    345.3 MiB      0.0 MiB           1       text = "これはサンプルの文です。"
    18    345.3 MiB      0.0 MiB           1       words = nagisa.tagging(text)
    19    345.8 MiB      0.5 MiB           1       print(words)

I think the 512 MB of memory in the free plan of Heroku will not be enough. Your local environment has enough memory, so it works fine there.

This is not a problem with the nagisa library itself; it is a problem of how memory is used on Heroku. As a solution, the Heroku free plan allows you to run two processes, so how about separating the tokenizing API from the xgb-model API to split the memory usage?
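The split described above can be sketched with stdlib-only stand-ins: one process serves tokenization over HTTP, and the prediction app calls it instead of importing nagisa. Everything here (the port, the JSON shape, the tokenize_jp body) is a hypothetical illustration; in the real setup the tokenizer service would import nagisa and the main app would hold the xgb model and pickled vectorizers:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def tokenize_jp(doc):
    # Stand-in for nagisa.tagging(doc).words
    return doc.split()

class TokenizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        words = tokenize_jp(json.loads(body)["text"])
        payload = json.dumps({"words": words}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging

# Tokenizer process (here just a background thread for the demo)
server = HTTPServer(("127.0.0.1", 8001), TokenizeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Prediction process: call the tokenizer service instead of loading nagisa
req = urllib.request.Request(
    "http://127.0.0.1:8001",
    data=json.dumps({"text": "hello world"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["words"])

server.shutdown()
```

The design point is that each process then pays only for its own imports: nagisa's ~270 MiB stays in the tokenizer process, while the xgb model and vectorizers live in the other.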

Hi, @Pranjal-bisht. Thank you for using nagisa! First of all, I have confirmed that nagisa works on Heroku.

As far as the error is concerned, you are getting a Heroku memory overflow error. nagisa uses about 270 MiB of memory, and on the free Heroku plan only 512 MB is available. So, to avoid the error, you need to conserve the memory used by your other libraries.

Do you use libraries other than nagisa on Heroku? Let me check your situation first. Thanks!