Lab 1: Lexers and regular expressions

This lab is about lexers and regular expressions. It is intended to give you enough working knowledge and experience to be able to design and implement the C lexer for your compiler.

Changelog

  • 23-Jan-2025: Added clarification that comments inside attributes should not count towards the number of comments removed.

Specification

Write a tool using Flex that reads a stream of ASCII characters, and processes it, character by character, by applying the rules below. In what follows, a line is defined as any maximal sub-sequence of the stream whose last character is a newline and which does not contain any other newline characters. You may assume that the final character in the whole stream is a newline.

  • The // sequence indicates the beginning of a comment. If // is encountered, remove it and the rest of the line.

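  • The \ character indicates the beginning of an escaped identifier. If \ is encountered, ignore it (that is, leave it in the output unchanged) together with every character up to the next space or newline in the stream, and resume processing from that space or newline.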

  • The (* sequence indicates the beginning of an attribute. If (* is encountered, remove it, and then remove all characters up to and including the end of the attribute. The end of the attribute is the next occurrence of *) that is not part of an escaped identifier and is not part of a comment. The end of the attribute may either be on the same line or on a subsequent line. If there is no end of the attribute, then this rule does not apply.

  • If any other character is encountered, ignore it and move to the next character in the stream.

Finally, the tool should add a line to the end of its output that says Number of comments and attributes removed: n. where n is the number of comments and attributes that have been removed. [Edit 23-Jan-2025: If a comment is nested inside an attribute, then it is automatically removed when the attribute is removed, and does not need removing explicitly; therefore, it doesn't count towards the number of comments removed. The result of test 9 clarifies this behaviour.]
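Because the test script compares output with diff, the summary line has to match character for character, including the colon, the trailing full stop, and the final newline. A minimal sketch of one way to produce it in C is shown below; the helper name format_summary is hypothetical and not part of the skeleton.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the lab skeleton): writes the summary
   line into buf and returns the value reported by snprintf, i.e. the
   number of characters the full line needs, excluding the terminator. */
int format_summary(char *buf, size_t size, int removed)
{
    return snprintf(buf, size,
                    "Number of comments and attributes removed: %d.\n",
                    removed);
}
```

In a Flex-based tool the same format string would more likely be passed straight to printf once scanning has finished; the buffer version is used here only so that the result is easy to inspect.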

As an example, if the input stream looks like this:

module foo ()
  wire \hello;
  wire \//go away;
  //go away
  wire (* hello \*) world *) there;
endmodule

then the output should look like this:

module foo ()
  wire \hello;
  wire \//go away;
    wire  there;
endmodule
Number of comments and attributes removed: 2.
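The rules and the example above can be traced by hand with a small reference filter written in plain C. This is not the Flex solution the lab asks for, only a sketch for reasoning about expected behaviour. The names strip_stream, attribute_end and ends_escaped are hypothetical. Two points are assumptions rather than statements from the specification: a comment is removed together with its terminating newline (inferred from the example output, where "wire  there;" joins the previous line), and a (* that appears inside an attribute is treated as ordinary text.

```c
#include <stddef.h>
#include <string.h>

/* An escaped identifier runs up to (not including) the next space or newline. */
static int ends_escaped(char c) { return c == ' ' || c == '\n'; }

/* i points just past "(*". Returns the index just past the closing "*)",
   or 0 if the attribute never ends. A "*)" inside an escaped identifier
   or inside a comment does not close the attribute. */
static size_t attribute_end(const char *in, size_t i, size_t n)
{
    while (i < n) {
        if (in[i] == '\\') {
            while (i < n && !ends_escaped(in[i])) i++;
        } else if (in[i] == '/' && i + 1 < n && in[i + 1] == '/') {
            while (i < n && in[i] != '\n') i++;
        } else if (in[i] == '*' && i + 1 < n && in[i + 1] == ')') {
            return i + 2;
        } else {
            i++;
        }
    }
    return 0;
}

/* Copies in to out while applying the rules. out must hold at least
   strlen(in) + 1 bytes. Stores the number of removed comments and
   attributes in *removed and returns the length of the output. */
size_t strip_stream(const char *in, char *out, int *removed)
{
    size_t n = strlen(in), i = 0, o = 0;
    *removed = 0;
    while (i < n) {
        if (in[i] == '/' && i + 1 < n && in[i + 1] == '/') {
            /* Comment: drop the rest of the line, newline included (assumption). */
            while (i < n && in[i] != '\n') i++;
            if (i < n) i++;
            (*removed)++;
        } else if (in[i] == '\\') {
            /* Escaped identifier: copy it through unchanged. */
            while (i < n && !ends_escaped(in[i])) out[o++] = in[i++];
        } else if (in[i] == '(' && i + 1 < n && in[i + 1] == '*') {
            size_t end = attribute_end(in, i + 2, n);
            if (end != 0) {
                /* Attribute: drop it whole; comments nested inside are not counted. */
                i = end;
                (*removed)++;
            } else {
                /* No closing "*)": the rule does not apply, so copy '('. */
                out[o++] = in[i++];
            }
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return o;
}
```

Under these assumptions, passing the example input to strip_stream reproduces the example output and sets the count to 2. In a Flex implementation the forward search in attribute_end would more naturally be expressed through the pattern syntax or start conditions.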

The program should be built by running the command make nocomment.

A skeleton program is already set up, including:

The skeleton setup contains a number of suggestions where things need to be changed and edited, but these are not exhaustive.

There is also a test-bench included, which is a partial set of test vectors for the program together with a script for running them. Passing these tests is equivalent to achieving 50% in the final assessment, with unseen tests covering the remaining 50%.

The components of the test are:

  • test/in: A set of input test files of increasing complexity.

  • test/ref: The "golden" output for the given input files, which your program should match. There is one output for each input.

  • test_lexer.sh: A script that runs the tests. It will build your program, then apply it to each input in turn to produce a file in test/out. It will then use diff to work out whether the output matches the reference output. You can run this script via the command: ./test_lexer.sh.

You may find the Flex manual helpful, particularly the section about the syntax of patterns.