lark: grammar files are opened an unnecessary number of times, causing very long load times when creating a parser

It seems that, when using a FromPackageLoader object, a grammar file is opened and read again each time another grammar uses a rule imported from it. The same file is thus opened over and over, once for each occurrence of a rule it contains.

This may not be noticeable for parsers whose grammar files all live in the same directory (so no custom FromPackageLoader is needed), but it becomes highly problematic when many FromPackageLoaders are used, because the time required to construct a parser grows dramatically.
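As a possible stopgap, the repeated reads could be avoided by memoizing a loader's results. The sketch below is not the maintainers' fix; it assumes a loader is a callable taking (base_path, grammar_path) and returning a (resource, text) pair, and that both arguments are hashable. CachingLoader is a hypothetical name, not part of lark.

```python
class CachingLoader:
    """Wrap a grammar loader and remember what it returned for each request.

    Hypothetical helper: assumes `loader(base_path, grammar_path)` returns a
    (resource, text) pair and raises an exception when the grammar is missing.
    """

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}

    def __call__(self, base_path, grammar_path):
        key = (base_path, grammar_path)
        if key not in self._cache:
            # Failed lookups raise here and are therefore never cached.
            self._cache[key] = self._loader(base_path, grammar_path)
        return self._cache[key]
```

Under those assumptions, each FromPackageLoader would be wrapped once (for example CachingLoader(FromPackageLoader(...))) and the wrapper passed wherever the original loader was used, so each grammar file is read at most once per wrapper.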

By placing a print(resource_name) in the get_data() function of Python's standard-library pkgutil.py, I was able to count how many times each of my grammar files was loaded. For example, the common.lark grammar provided by lark is opened 61 (!) times, one of my own grammars 25 times, another 16, and so on.
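The same kind of count can be collected without editing the standard library, by temporarily wrapping pkgutil.get_data. This is a minimal sketch; count_get_data_calls is a hypothetical helper, and it only sees calls made through the pkgutil module attribute (code that did `from pkgutil import get_data` beforehand would bypass it).

```python
import pkgutil
from collections import Counter
from contextlib import contextmanager


@contextmanager
def count_get_data_calls():
    """Count pkgutil.get_data calls per (package, resource) inside the block."""
    counts = Counter()
    original = pkgutil.get_data

    def counting_get_data(package, resource):
        counts[(package, resource)] += 1
        return original(package, resource)

    pkgutil.get_data = counting_get_data
    try:
        yield counts
    finally:
        # Always restore the real function, even if parser construction fails.
        pkgutil.get_data = original


# Usage: build the parser inside the block, then inspect the counts.
with count_get_data_calls() as counts:
    pass  # construct the parser here
for (package, resource), n in counts.most_common():
    print(package, resource, n)
```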

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 72 (72 by maintainers)

Most upvoted comments

@ornariece I will create a PR, probably tomorrow. Now I gotta sleep. 😃