Some issue with error reporting #332

@tailhook

I've stripped my example down to the following grammar, expressed in English: the source text can contain multiple items separated by a newline or a comment (double slash, //); each item is an identifier followed by whitespace-separated numbers.

Here are three versions of the grammar:

use combine::parser::char::{digit, space, letter};
use combine::parser::repeat::{repeat_until};
use combine::{Stream, Parser, EasyParser};
use combine::{token, many1, sep_by, value};
use combine::{many, skip_many1, attempt};


fn id<I: Stream<Token=char>>() -> impl Parser<I, Output=String> {
    many(letter())
}

fn ws<I: Stream<Token=char>>() -> impl Parser<I, Output=()> {
    skip_many1(space())
}

fn num<I: Stream<Token=char>>() -> impl Parser<I, Output=String> {
    many1(digit())
}

fn comment<I: Stream<Token=char>>() -> impl Parser<I, Output=()> {
    attempt((token('/'), token('/')).silent()).with(value(()))
}

fn newline<I: Stream<Token=char>>() -> impl Parser<I, Output=()> {
    token('\n').with(value(())).expected("newline")
}

fn main() {

    let mut parser1 = many::<Vec<_>, _, _>(
        id()
        .and(many::<Vec<_>, _, _>(ws().with(num())))
        .and(comment().or(newline())),
    );

    let mut parser2 = many::<Vec<_>, _, _>(
        id()
        .and(repeat_until::<Vec<_>, _, _, _>(
            ws().with(num()),
            comment().or(newline()),
        ))
        .and(comment().or(newline())),
    );

    let mut parser3 = many::<Vec<_>, _, _>(
        id()
        .skip(ws())
        .and(sep_by::<Vec<_>, _, _, _>(num(), ws()))
        .and(comment().or(newline()))
    );

    let s = r#"a 123/2"#;
    let err1 = parser1.easy_parse(s)
         .map_err(|e| e.map_position(|p| p.translate_position(s)))
         .unwrap_err();
    let err2 = parser2.easy_parse(s)
         .map_err(|e| e.map_position(|p| p.translate_position(s)))
         .unwrap_err();
    let err3 = parser3.easy_parse(s)
         .map_err(|e| e.map_position(|p| p.translate_position(s)))
         .unwrap_err();
    println!("{}\n{}\n{}", err1, err2, err3);
}

The output is:

Parse error at 6
Unexpected `2`
Unexpected `/`
Expected `whitespace`, `digit` or `newline`

Parse error at 5
Unexpected ` `
Expected `letter`

Parse error at 6
Unexpected `2`
Unexpected `/`
Expected `whitespace` or `newline`

Notes on variant 1:

  1. There are two "unexpected" entries: / is reported at the wrong position, and 2 is not the erroneous character. Looks like a bug?
  2. The reported position is that of the character after the erroneous one.
  3. Expected digit is wrong. There needs to be whitespace in between (or a newline, or a comment, which is silenced).
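
When reading the positions in these reports, it helps to list the offsets of the input. This is a minimal sketch, assuming that translate_position yields a zero-based byte offset into s (my reading of the output, not something verified against the library). char_at_offset is a hypothetical helper and not part of combine:

```rust
/// Hypothetical helper (not part of combine): the character that starts
/// at the given zero-based byte offset, if there is one.
fn char_at_offset(s: &str, offset: usize) -> Option<char> {
    // `get` returns None when the offset is out of range or not on a
    // character boundary; `chars().next()` returns None at end of input.
    s.get(offset..)?.chars().next()
}

fn main() {
    let s = "a 123/2";
    // Print every offset together with the character found there.
    for (offset, ch) in s.char_indices() {
        println!("{}: {:?}", offset, ch);
    }
    println!("offset 5 -> {:?}", char_at_offset(s, 5));
    println!("offset 6 -> {:?}", char_at_offset(s, 6));
}
```

Under that assumption, offset 5 is the / and offset 6 is the 2.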

Notes on variant 2:

  1. The unexpected space is at a different position from the reported one.
  2. The reported position is (surprisingly) right.
  3. letter can't occur here. Even if I remove the outermost many (i.e. support only a single item, so that no letters are possible after the initial whitespace), this parser still reports letter.

Notes on variant 3:

  1. Same issues as in variant 1 with the position and the "unexpected" entries.
  2. The "expected" set is fine.
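
One way to pin down the error position these notes argue for is a hand-written check of the same grammar that does not use combine at all. This is only a sketch under stated assumptions: check is a hypothetical function, the input is ASCII, whitespace means spaces or tabs only, an identifier needs at least one letter, and every item must end with a newline or //. It says nothing about how combine merges errors internally.

```rust
/// Hypothetical hand-written check of the grammar (not using combine).
/// Returns the number of items, or the zero-based byte offset of the
/// first character that cannot continue the parse.
fn check(src: &str) -> Result<usize, usize> {
    let b = src.as_bytes();
    let mut i = 0;
    let mut items = 0;
    while i < b.len() {
        // Identifier: one or more letters (assumption; the original id()
        // uses many, which also accepts zero letters).
        let id_start = i;
        while i < b.len() && b[i].is_ascii_alphabetic() {
            i += 1;
        }
        if i == id_start {
            return Err(i);
        }
        // Zero or more numbers, each preceded by whitespace.
        loop {
            let ws_start = i;
            while i < b.len() && (b[i] == b' ' || b[i] == b'\t') {
                i += 1;
            }
            let num_start = i;
            while i < b.len() && b[i].is_ascii_digit() {
                i += 1;
            }
            let had_ws = num_start > ws_start;
            let had_num = i > num_start;
            match (had_ws, had_num) {
                (true, true) => continue,   // read one more number
                (false, false) => break,    // end of the number list
                // Whitespace without a number, or a number that is not
                // separated by whitespace.
                _ => return Err(num_start),
            }
        }
        // Terminator: a `//` comment marker or a newline.
        if b[i..].starts_with(b"//") {
            i += 2;
        } else if b.get(i) == Some(&b'\n') {
            i += 1;
        } else {
            return Err(i);
        }
        items += 1;
    }
    Ok(items)
}

fn main() {
    println!("{:?}", check("a 123/2"));
    println!("{:?}", check("a 1 2\nb 3//"));
}
```

For the input from this report, a 123/2, the sketch stops at offset 5, the single /, which is the position the notes above treat as the erroneous one.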

Are these bugs, or am I misunderstanding the parsers somehow? Also, why is there such a difference between sep_by, repeat_until and many?
