Skip to content

WIP Remove many antlr issues #1588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

GillesDuvert
Copy link
Contributor

@GillesDuvert GillesDuvert commented May 3, 2023

This is a WIP.
All the open issues directly related to ANTLR should be closed by this PR.
For the sake of maintainance, the version of 27/09/2013 (just before some complicated optimization introduced by Mark Schellens) has been used as template (upon which all the subsequent small changes have been reported and the open issues are solved). This to avoid a dependency between the ANTLR code and the antlr-generated code (code based on internal values of '_tokenSet_XX' that may be changing with the number and order of tokens generated by antlr). Indeed, nobody can IMHO debug such complicated dependencies. The loss in performance induced by this pseudo-regression has to be estimated (it seems negligible) and weighted against the advantages of having more chances to maintain correctly the code, even going to ANTLR3 or 4 more easily.
⚠️ I was obliged to use the last version of gdlc.g because contary to what is stated above, the last optimized code by Marc is twice as fast as the one referred to above.
⚠️ The last remaining problem with the ANTLR code is that it will become very very slow during the Parser phase when there is even a small number of (matched) parenthesis. For example a=(3) is instantaneous, a=((((3)))) is very slow.
⚠️ At the moment the format.g issues nondeterminism warnings when compiling by antlr, this is OK as the order of branching makes the formats (issue #830) behave OK, so no further maintainance is needed.

reverted to last (optimized) version of gdlc.g made by Marc as it is 2 times faster than plain antlr code. Only worry is the dependency to TokenSet numbers in the C++ code inserted in primary_expr_deref, that may become wrong with future changes...
Added tests!
@GillesDuvert
Copy link
Contributor Author

Hmmm... The tests show incredible low speed on a few tests, so there is something fishy with the backport of Marc's optimizations (was to be expected)
---> One should separate the compilation time from the execution time in the tests by redoing every test twice: sometimes a long test is just due to a long Parser time, iself due to the long-standing problem of () vs. [] in our ANTLR.

@GillesDuvert
Copy link
Contributor Author

speed of GH continuous integration tests are very bad in all cases (typically 5 to 10 times slower than on my old HP notebook) because the code is unoptimized, the number of cores is 1 and the machine virtual. So the speed tests are not particularly useful. But in the present case, indeed there IS quite a huge penalty in compilation and execution speed if the ANTLR code is not well-written (last patch 8998987).
This suggest further speedup can be attained by modifying the ANTLR code, possibly after converting it to ANTLR4 (see #369) . It would be recommended in this case to use an ANTLR code (especially for gdlc.g) without Marc's optimizations, at least without the dedicated code using tokenSet_xx .

@GillesDuvert GillesDuvert merged commit b0bdebb into gnudatalanguage:master May 24, 2023
@GillesDuvert GillesDuvert mentioned this pull request May 25, 2023
@GillesDuvert GillesDuvert deleted the remove_many_antlr_issues branch September 8, 2023 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant