Skip to content

Speedup lexer #2027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Apr 26, 2025
Merged

Conversation

GillesDuvert
Copy link
Contributor

Here I propose changes in the ANTLR code that aims at speed up (quite a lot in some cases) the compilation of procedures.
The Parser always start parsing a procedure in STRICTARR mode, the normal, modern, way to write IDL code, using '[' and ']' instead of '(' and ')' to mark array indices. The parser is much faster at parsing the procedure if this rule is respected.
If the parser encounters an problem, at locations where an array index is acceptable, it reparses the whole procedure using a 'relaxed' mode where z=toto(a,b,c) can be in fact z=toto[a,b,c] and not the function toto called with a,b,c as arguments.
parsing speed gain is 3 to 10 (still waaay below IDL's homegrown parser 😄 )

Additionnaly this removes the need of the --fussy and --sloppy commandline swiches, that are removed.
I've added the --trace-old-syntax switch that can permit to find ALL (??) occurences of the pre-v5 (i.e;, non-strictarr) way of writing IDL code. An example:

$ gdl --trace-old-syntax -e ".compile /home/gildas/IDLAstro/coyote/cgpickcolorname.pro"
old syntax at line 523, column 26
% Compiled module: CGPICKCOLORNAME_SELECT_COLOR.

closes #2012
thanks to @brandy125 for mentioning the speed problem.

Note that all this cannot solve the 'nested parenthesis' nigthmare. See the (added) antlr4 diectory to an example of code that talkes forever with ANTLR2 and is immediate with ANTLR4.

Giloo added 12 commits April 16, 2025 17:44
…tlr-parser/ removing the realtime test { LA(4) == '='}? in the lexer must improve greatly the performance. This implies to set k=4 for the lexer but we experience a factor 3 in speed.
…stic. Was not helpful anyway and makes ANTLR code more unreadable!
…to gdl by a copy of the latest 2.7.7 version. With possibly a speed impact.
…ack, for the concerned FUNCTION or PRO, to the 'sloppy' old mode.
…mode first and retry with non-strictarr if there is a syntax problem (but only if COMPILE_OPT STRICTARR has not been defined in the procedure).

Besides, one can trace all occurences of non IDL2 syntax in routines.
interactive is 'sloppy' by default but honors COMPILE_OPT STRICTARR
…-trace-old-syntax, which list the lines and columns where a non-idl2 (non-strictarr) syntax is used in the procedure.

Useful for optimizing old code.
2) displaced the location where auto-assignment operators ( all the XXX_OP_EQ ) are treated to the (IMHO) 'right' place.
3) Yet unsolvable: the parenthesis malediction (which is not present in ANTLR4) :    t=systime(/sec) &  z=execute("a=!DPI&b=2 & x=b+(a+(b+(a+(b+(a+(a+(a+b)))))))") & print,systime(/sec)-t
But this duplicates the error message. not very clever....
@alaingdl
Copy link
Contributor

I noticed that this change may help us https://cliutils.gitlab.io/modern-cmake/chapters/packages/OpenMP.html
I am OK to move to Cmake 3.12 !

Questions :

  • How can I help you ? Since it seems to be OK on recent OSX 15.3.2
  • How can I be sure that OpenMP is ON ? (if OPENMP_EXISTS() flag reliable ?)
  • Do we have tests illustrating the change in speed with OpenMP ON ?

Copy link

codecov bot commented Apr 26, 2025

Codecov Report

Attention: Patch coverage is 57.01107% with 233 lines in your changes missing coverage. Please review.

Project coverage is 42.27%. Comparing base (b648a63) to head (d34148b).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
src/antlr/TokenStreamRewriteEngine.cpp 0.00% 66 Missing ⚠️
src/antlr/Parser.hpp 24.48% 37 Missing ⚠️
src/GDLLexer.cpp 90.14% 20 Missing ⚠️
src/antlr/CommonAST.hpp 55.26% 17 Missing ⚠️
src/antlr/TokenStreamRewriteEngine.hpp 0.00% 12 Missing ⚠️
src/antlr/TreeParser.hpp 45.45% 12 Missing ⚠️
src/antlr/BaseAST.hpp 50.00% 10 Missing ⚠️
src/antlr/LLkParser.cpp 0.00% 10 Missing ⚠️
src/antlr/TokenStreamRecognitionException.hpp 0.00% 8 Missing ⚠️
src/antlr/CommonToken.cpp 0.00% 7 Missing ⚠️
... and 11 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2027      +/-   ##
==========================================
- Coverage   42.67%   42.27%   -0.41%     
==========================================
  Files         445      451       +6     
  Lines      110976   111259     +283     
  Branches    22183    22247      +64     
==========================================
- Hits        47363    47032     -331     
- Misses      63613    64227     +614     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@GillesDuvert GillesDuvert merged commit 14ef296 into gnudatalanguage:master Apr 26, 2025
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

compilation time taks 10 times longer with version 1.1.2_testing2
2 participants