Some grammars need semantic predicates to add context-sensitive checks to what would otherwise be a context-free grammar.
For example:
- In Fortran90, lines that begin with a 'C' (or 'c') in column 1 are comments and should be placed on a token stream other than the default. If the 'C' does not begin in column 1, however, the input is invalid and should be flagged as such.

    c Hello World.
     c This is a syntax error because 'c' does not start in column 1
          program hello
            print *, 'Hello World!'
          end
- In CSharp, two greater-than signs, '>>', can mean either a right-shift operator in an expression or the end of a nested generic type argument list. Because lexers in Antlr are not parser-aware, the lexer must tokenize the two greater-than signs as two separate tokens. A semantic predicate should then be added to disallow a space between the two greater-than signs in an expression while allowing it in a type declaration.

    class Foo {
        void Func() {
            int x = 1000 > > 2;  // syntax error: no space is allowed inside the '>>' operator
        }
        Dictionary<int, List<int> > mapping;  // nested generic type declaration, valid
    }
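Returning to the Fortran example, the column check itself is simple once the predicate can see where the candidate 'C' sits. The Python sketch below is a minimal illustration, assuming the predicate receives the whole input text and the offset of the candidate character; the function names at_column_one and is_comment_start are hypothetical and do not correspond to methods in the Antlr runtime or in any published Fortran grammar.

```python
def at_column_one(text: str, index: int) -> bool:
    """Return True if text[index] is in column 1, i.e. at the start of a line."""
    return index == 0 or text[index - 1] == "\n"


def is_comment_start(text: str, index: int) -> bool:
    """Hypothetical lexer predicate: a 'C'/'c' starts a comment only in column 1."""
    return text[index] in "Cc" and at_column_one(text, index)


source = "c Hello World.\n c This is a syntax error\n"
second_c = source.index(" c") + 1  # offset of the indented 'c' on line 2

print(is_comment_start(source, 0))         # 'c' in column 1: accepted as a comment
print(is_comment_start(source, second_c))  # 'c' in column 2: rejected
```

In a real lexer, the rejected case would then be reported as an error rather than silently routed to the comment channel.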
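For the CSharp case, the predicate only needs to know whether the two '>' tokens touch. The following Python sketch assumes each token records the character offsets of its first and last characters, similar to the start and stop indices that Antlr tokens carry; the Token dataclass, tokenize_gt, and is_shift_operator are hypothetical stand-ins, not the Antlr runtime API.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Minimal stand-in for a lexer token (hypothetical, not the Antlr class)."""
    text: str
    start: int  # offset of the first character
    stop: int   # offset of the last character (inclusive)


def tokenize_gt(source: str) -> list[Token]:
    """Return one Token per '>' character, as a lexer that never emits '>>' would."""
    return [Token(">", i, i) for i, ch in enumerate(source) if ch == ">"]


def is_shift_operator(first: Token, second: Token) -> bool:
    """Hypothetical predicate: two '>' tokens form '>>' only if they are adjacent."""
    return first.text == ">" and second.text == ">" and first.stop + 1 == second.start


ok = tokenize_gt("x = 1000 >> 2")
bad = tokenize_gt("x = 1000 > > 2")
print(is_shift_operator(*ok))   # adjacent tokens: valid shift
print(is_shift_operator(*bad))  # separated by a space: rejected
```

In the type-declaration context the parser would simply not evaluate this predicate, so '> >' remains legal there.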
Antlr has no target-independent language for predicates; they must be written in the target language of the generated parser. As a result, a grammar would have to be forked for each desired target, which adds to the maintenance burden.
However, it is possible to write the grammar in a target-agnostic format so that forking is not required.
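The core idea can be sketched in plain Python: the grammar action contains only a single call such as this.openBrace(), and each target supplies the body of that method in a hand-written base class that the generated recognizer extends. In this sketch, GeneratedLexer, its *_action methods, closeBrace, and the opened counter are hypothetical simplifications standing in for generated code and base-class state; they are not taken from any real generated lexer.

```python
class Python3LexerBase:
    """Hand-written base class for one target (simplified, hypothetical)."""

    def __init__(self) -> None:
        self.opened = 0  # assumed state: current bracket nesting depth

    def openBrace(self) -> None:
        self.opened += 1

    def closeBrace(self) -> None:
        self.opened -= 1


class GeneratedLexer(Python3LexerBase):
    """Stand-in for generated code; real generated lexers look different."""

    def open_paren_action(self) -> None:
        # Grammar action {this.openBrace();} with "this." rewritten to "self.".
        # Python tolerates the trailing semicolon, so the text can be copied as-is.
        self.openBrace();

    def close_paren_action(self) -> None:
        self.closeBrace()


lexer = GeneratedLexer()
for ch in "print((1, 2))":
    if ch == "(":
        lexer.open_paren_action()
    elif ch == ")":
        lexer.close_paren_action()
print(lexer.opened)  # 0: every '(' was matched by a ')'
```

Only the base class changes from target to target; the grammar text stays the same apart from the mechanical rewrite of "this.".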
- Split your grammar into separate lexer and parser grammars. Then add
    options { tokenVocab=...; }
  to the parser grammar.
- Create target-specific source code files that define methods in a base class for the lexer or parser grammar. In these files, write the code for the semantic predicates. For example, for the Python3 grammar, the Cpp target files would be Python3LexerBase.{cpp,h} and Python3ParserBase.{cpp,h}.
- In the grammar(s), add
    options { superClass=...; }
  so that the generated recognizer extends the base class. For example:
    options { superClass=Python3ParserBase; }
- In the grammar(s), write each action or predicate as a single call to a base-class method, prefixed with "this.", e.g.,
    OPEN_PAREN : '(' {this.openBrace();};
  The action code must not reference Antlr attributes, variables, or types, and must not contain multiple statements separated by semicolons or control-flow statements of any kind.
- For some targets, such as Cpp and PHP, the generated code must include the base-class source files in order to compile. For these, add a comment such as
    // Insert here @header for lexer include.
  or
    // Insert here @header for parser include.
  to the grammar, before the first rule.
- Add a Python script called "transformGrammar.py" that rewrites the grammar(s) with target-specific syntax:
  a) For Cpp: replace "this." with "this->".
  b) For PHP: replace "this." with "$this->".
  c) For Python: replace "this." with "self.", "l.", or "p.", depending on where the action or predicate is in the grammar.
  d) For Cpp: replace "// Insert here @header for lexer include." (or the parser equivalent) with "@lexer::header {#include ...}" (or "@parser::header {#include ...}").
  e) For PHP: replace "// Insert here @header for lexer include." (or the parser equivalent) with "@lexer::header {require ...}" (or "@parser::header {require ...}").
- Run "python transformGrammar.py *.g4" before generating the parser and lexer.
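A minimal sketch of such a transformGrammar.py, covering only the Cpp and PHP rules, might look like the following. It assumes that each target keeps its own copy of the script (hence the TARGET constant), that the placeholder comments appear exactly as written above, and that the base class for a grammar file is named after the file, as in Python3Lexer.g4 -> Python3LexerBase; all of these are illustrative assumptions. The Python rule is omitted because choosing between "self.", "l.", and "p." depends on where each action sits in the grammar.

```python
import sys
from pathlib import Path

# Assumption: each target directory has its own copy of this script.
TARGET = "Cpp"

# Placeholder comments the grammar author puts before the first rule.
LEXER_MARK = "// Insert here @header for lexer include."
PARSER_MARK = "// Insert here @header for parser include."


def transform(text: str, target: str, base: str) -> str:
    """Rewrite target-agnostic action code for one target.

    base is the (assumed) name of the base class, e.g. "Python3LexerBase".
    """
    if target == "Cpp":
        text = text.replace("this.", "this->")
        include = f'#include "{base}.h"'
    elif target == "PHP":
        text = text.replace("this.", "$this->")
        include = f"require '{base}.php';"
    else:
        raise ValueError(f"unsupported target: {target}")
    # Put the include on its own line inside the named header action.
    text = text.replace(LEXER_MARK, "@lexer::header {\n" + include + "\n}")
    text = text.replace(PARSER_MARK, "@parser::header {\n" + include + "\n}")
    return text


def main(paths: list[str]) -> None:
    for name in paths:
        path = Path(name)
        base = path.stem + "Base"  # assumed naming scheme
        # Rewrites the file in place; a second run changes nothing further.
        path.write_text(transform(path.read_text(), TARGET, base))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The replacements are plain string substitutions, so they would also touch "this." inside grammar comments or string literals; a more careful script might restrict them to action and predicate blocks.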
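Because the rewrites described above are purely textual, they only work if every action obeys the single-call rule. A rough Python check for that rule might look like this. The regular expression is an assumed approximation, not an official definition of the format: it accepts "this.name(args)" with an optional trailing semicolon, and rejects "$" attribute references, braces, nested calls, and extra statements.

```python
import re

# One call of the form this.name(...) with an optional trailing semicolon.
# Arguments may not contain ';', '{', '}', '$' (Antlr attribute references),
# or parentheses, which also rules out nested calls and chained expressions.
SINGLE_CALL = re.compile(r"\s*this\.[A-Za-z_]\w*\([^;{}$()]*\)\s*;?\s*")


def is_target_agnostic(body: str) -> bool:
    """Return True if an action or predicate body is a single this.method(...) call."""
    return SINGLE_CALL.fullmatch(body) is not None


for body in ["this.openBrace();", "this.a(); this.b();", "$ctx.start"]:
    print(repr(body), is_target_agnostic(body))
```

Such a check could be run over every action body before transformGrammar.py; extracting those bodies reliably from a .g4 file, however, would need a real grammar parser rather than this sketch.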