antlr4: Go: exponentially bad/abysmal performance as of ...

  • target information: Go (go get -u github.com/antlr/antlr4/runtime/Go/antlr/v4@d5dd253; same with plain /v4)
  • smallest possible grammar and code that reproduce the behavior

Expected

Parsing 19 files should take around 0.2s

Actual

Parsing 19 files takes more than 11s, and the time grows exponentially as more files are added

Happens as of https://github.com/antlr/antlr4/commit/f19dc0ea3ede67483b6aec6128d5dd44053778f9 (~5.5s); got worse with 56df72ca6742 (~10s)

See the example in the attached zip (includes sample files and grammar): issue.zip

To test the old and new behaviour, just toggle the commented lines in go.mod and run: go mod tidy && go test -timeout 30s -run ^TestParsing$ issue -count=1

EDIT: after removing all leading conditional values (summary comments) from the parse and putting them on a hidden channel instead, performance was still a bit lower (~20%) than expected, but much, MUCH better than whatever that issue is.
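For context, that change is at the grammar level rather than in the Go code. A hypothetical lexer rule of the following shape (the real grammar and comment syntax are in issue.zip, so treat this purely as a sketch) keeps such comments out of the parser's decision-making by routing them to the hidden channel:

    // Hypothetical lexer rule: emit leading summary comments on the hidden
    // channel so the parser never has to predict over them.
    SUMMARY_COMMENT : '//' ~[\r\n]* -> channel(HIDDEN) ;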

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 62 (43 by maintainers)

Most upvoted comments

@kaby76, I’m not sure why you’re so astonished by PHP’s performance. We’ve been using it in production for two years now, and based on our benchmarks, it’s one of the fastest ANTLR runtimes when used properly. PHP has a just-in-time compiler and can run with cached opcodes, skipping all the interpreting phases.

We have a pretty complex grammar, and the ANTLR-backed PHP parser can handle 5k files in < 15 seconds with zero optimization on a regular MacBook.

I plan to invest some time in identifying bottlenecks over the next few months to make the PHP runtime as fast as the Java runtime. PHP 8.1 introduced native support for murmur hash, which allows new performance improvements.

@jimidle

I gained about 4 seconds on average for single-core parsing of 1893 files by reusing parsers and lexers and by pre-allocating and reusing buffers: from 18s down to about 13s.
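Roughly, the reuse pattern looks like this. This is only a sketch: parser.NewMyLexer, parser.NewMyParser and the Document entry rule are placeholders for whatever the real grammar generates, and SetInputStream/SetTokenStream are, as far as I can tell, the runtime methods that reset a recognizer for the next input:

    package issue

    import (
        "os"

        antlr "github.com/antlr/antlr4/runtime/Go/antlr/v4"

        "issue/parser" // hypothetical generated package for the grammar
    )

    // parseAll parses every file with a single lexer/parser pair, resetting
    // them between files instead of allocating fresh recognizers each time.
    func parseAll(files []string) error {
        var (
            lex *parser.MyLexer
            p   *parser.MyParser
        )
        for _, path := range files {
            src, err := os.ReadFile(path)
            if err != nil {
                return err
            }
            input := antlr.NewInputStream(string(src))
            if lex == nil {
                // First file: allocate the recognizers once.
                lex = parser.NewMyLexer(input)
                p = parser.NewMyParser(antlr.NewCommonTokenStream(lex, antlr.TokenDefaultChannel))
            } else {
                // Later files: reuse them; the setters reset internal state.
                lex.SetInputStream(input)
                p.SetTokenStream(antlr.NewCommonTokenStream(lex, antlr.TokenDefaultChannel))
            }
            _ = p.Document() // placeholder entry rule; walk/validate the tree here
        }
        return nil
    }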

Multicore parsing using 8 goroutines went from 5.9s to about 3.6s, also a very nice upgrade. And because I was stupid, I was parsing the files twice; in the end I'm down to 1.9s for 1893 files, including validation of their syntax.
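And the multicore side, again only as a sketch: a fixed number of worker goroutines, each of which should own its own lexer/parser pair (recognizers aren't safe to share across goroutines), pulling paths from a shared channel. The parse argument stands in for whatever per-file function wraps the reuse logic above:

    package issue

    import "sync"

    // parseParallel fans file paths out to `workers` goroutines and reports
    // the first error encountered, if any.
    func parseParallel(files []string, workers int, parse func(path string) error) error {
        if workers < 1 {
            workers = 1
        }
        paths := make(chan string)
        errs := make(chan error, 1)

        var wg sync.WaitGroup
        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for path := range paths {
                    if err := parse(path); err != nil {
                        select {
                        case errs <- err: // keep only the first error
                        default:
                        }
                    }
                }
            }()
        }

        for _, f := range files {
            paths <- f
        }
        close(paths)
        wg.Wait()

        select {
        case err := <-errs:
            return err
        default:
            return nil
        }
    }

Something like parseParallel(files, 8, parseFile), with each worker reusing its own recognizers per the earlier sketch.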

Also, with the newly generated parser I no longer have to constantly cast from parser.ISomeThing to *parser.SomeThing, because parser.ISomeThing already exposes the methods I need. Very useful 😄

But it is backwards compatible.

I just found out that the Go runtime API changed, only because I override all the reporting methods: https://github.com/antlr/antlr4/blob/ce241de812fbf3dd118987ed851f187e30fa3d3d/runtime/Go/antlr/v4/error_listener.go#L35 (config ATNConfigSet => config *ATNConfigSet). The only one that really matters, though, is SyntaxError(), and that didn't change. I tried the dev branch on the grammar and inputs for this issue, and it takes about 0.5s for 18 files with group parsing, and about twice that using individual parsing. Excellent work, Jim!!
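For what it's worth, embedding the runtime's DefaultErrorListener and overriding only SyntaxError() is one way to stay insulated from signature changes on the Report* methods, since the embedded type keeps satisfying whatever the ErrorListener interface currently declares. A sketch (the listener name and counting behaviour are mine, not from the issue):

    package issue

    import (
        "fmt"

        antlr "github.com/antlr/antlr4/runtime/Go/antlr/v4"
    )

    // syntaxOnlyListener overrides only SyntaxError; ReportAmbiguity and the
    // other diagnostic callbacks come from the embedded DefaultErrorListener,
    // so their config-set parameter type can change without breaking this code.
    type syntaxOnlyListener struct {
        *antlr.DefaultErrorListener
        errorCount int
    }

    func (l *syntaxOnlyListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{},
        line, column int, msg string, e antlr.RecognitionException) {
        l.errorCount++
        fmt.Printf("%d:%d %s\n", line, column, msg)
    }

    // Attach it to a parser p in place of the default console listener:
    //   p.RemoveErrorListeners()
    //   p.AddErrorListener(&syntaxOnlyListener{DefaultErrorListener: antlr.NewDefaultErrorListener()})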

Yes, the “dev” branch contains important changes for the Go target. The “configs” sets are now in agreement between Java, CSharp, and Go. So, that difference has been resolved.

I thought it did. However, I will still do some checking; I think I have missed one change somehow, maybe in a merge. There is also some work to be done in error recovery. Java changed a little bit on where it inserts tokens - ironically the bits that use “Jim Idle’s magic recovery…” (copied from Ter’s comments 😉) - but Go wasn’t updated. Then I can turn on the tests that are currently disabled for Go. I also know that I need to rework config set “inheritance” because it is causing way too many allocations in one area, where comparisons fail even though I fixed the comparison logic.

I recall that preliminary versions of antlr4 were extremely slow precisely because the hash function wasn’t distributing hash keys well enough. Since the # of calls to closure() is correct and a lot of time is spent in equals(), I’d certainly recommend monitoring the size of the hash table buckets. If they grow, it would show a defect in the hash function.

(In reply to Ken Domino, 20 Oct 2022: “Ah, it’s a 2-level hash table. Not as simple as I thought.”)
