Skip to content

Replace cabal project parsing with Parsec #8889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 45 commits into
base: master
Choose a base branch
from

Conversation

jgotoh
Copy link
Member

@jgotoh jgotoh commented Apr 1, 2023

Implements #6101, #7748. Finally ready for review!

Please include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary. (I don't think it's necessary, correct me if I'm wrong)
  • Manual QA notes have been included (not needed as this change is invisible to users)
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

In this PR I implemented #6101 to replace the legacy cabal.project parser (module Distribution.Client.ProjectConfig.Legacy) with an implementation based on Parsec.
My implementation is based heavily on the existing Parsec parser of .cabal files, see module Distribution.PackageDescription.Parsec.
My goal was to recreate the exact grammar the Legacy Parser parses using the modern Parsec framework with FieldGrammars etc.
This means the Legacy Parser and Parsec Parser should always return the same ProjectConfig value for a file (at least in this PR).

About CI:
All of the validation checks of Ubuntu are running, unfortunately windows-latest ghc-9.10.1 fails, but windows ghc-9.8.2 succeeds.

The PR consists of several main parts:

Main Entrypoint Distribution.Client.ProjectConfig.readProjectFileSkeleton
The main entrypoint into the Parsec parser is in function readProjectFileSkeleton in Distribution.Client.ProjectConfig and parses a cabal.project file.
Note that the legacy parser can be executed still via readProjectFileSkeletonLegacy in the same file, I want to remove it but in another ticket because this PR is huge already.
My current implementation of readProjectFileSkeleton also calls readProjectFileSkeletonLegacy to compare the Parsec value to the legacy value, throwing errors if they are not equal.
Currently I use it to verify that the parsers' grammars match.
I will delete this functionality when the PR is ready to merge, but for now I will let the functionality stay in the PR so reviewers can try to build other projects to verify the parsers' values to not diverge.
Note that because .project files are parsed by both the Legacy parser and the Parsec parser, all warnings are currently emitted twice - once by each parser.

The function readProjectFileSkeleton currently uses an old variant of readAndParseFile from Cabal/src/Distribution/Simple/PackageDescription.hs (new variant is here) that is not based on SymbolicPaths yet but uses FilePath instead.
If we want to migrate it to use SymbolicPaths I need some help here :) Maybe this should be part of a future ticket too.

readProjectFileSkeleton calls Distribution.Client.ProjectConfig.Parsec.parseProject which leads me to the next main part:

Module Distribution.Client.ProjectConfig.Parsec

This module contains the Parsec parser of ProjectConfigs.
Function parseProject is a copy of function parseProject of the Legacy parser (Distribution.Client.ProjectConfig.Legacy.parseProject) but it now calls the Parsec version of parseProjectSkeleton.

parseProjectSkeleton
parseProjectSkeleton is a port of function parseProjectSkeleton of module Distribution.Client.ProjectConfig.Legacy.
It does not use the deprecated type Field from Distribution.Deprecated.ParseUtils anymore but Distribution.Fields.Field instead.
I tried to change as little as possible here, so we still parse the fields/sections into ProjectConfigs, import and parse other cabal.project files and process the conditional structure.

An interesting bit includes the new processing of imports (Fields with name equaling "import"):
We need to use liftPR to be able to compose ParseResults that involve IO actions (for example downloading imported cabal.project files via HTTP).
Composing two actions involves executing a ParseResult resulting in a PRState and executing another ParseResult passing in the previous PRState.
Unfortunately the PRState that is generated when executing a ParseResult does not contain the file source where the warnings/errors came from.
So we need to print any warnings/errors that came up when parsing an imported file before we return the ParseResult, otherwise we lose the source file of the warnings/errors.
I added the implementation of liftPR for Parsec ParseResults to module Distribution.Fields.ParseResult because I needed its constructor.
Please pay special attention to reviewing this implementation, as it was quite complex to develop.
It works in the tests :)

Another interesting bit is function fieldsToConfig in parseProjectSkeleton. Here we produce ProjectConfig values by parsing the current fields with a FieldGrammar. Afterwards we parse the sections (such as source-repository-package, ...) with function goSections.

FieldGrammar
The ProjectConfig FieldGrammar is defined in module Distribution.Client.ProjectConfig.FieldGrammar.
It took me some while to reverse engineer all the possible field names and find out their named field equivalents in the ProjectConfig record, but it should be complete now.
Parsing of sections such as source-repository-package is not done in here, see the next point below.
There are some fields such as projectConfigDryRun, projectConfigOnlyDeps that afaik can not be specified in a cabal.project file and need to be passed in via command line flags.
They are marked by comments in the following form: -- cli flag: projectConfigDryRun.
I also needed to add some Parsec instances that I added to the modules defining the respective type.
For example the Parsec instance of OptimisationLevel is in module Distribution.Simple.Compiler directly below the type definition.

Parsing of Sections
Parsing of sections happens in a StateT monad (SectionParser) modifying the ProjectConfig we got out of the FieldGrammar.
For the SectionParser I took a lot of inspiration again from the parser of .cabal files in module Distribution.PackageDescription.Parsec, see type SectionParser.
Currently I need to implement parsing of repository sections here as I've missed it until now but I hope it is the last type of section that is missing.

Integration Tests in cabal-testsuite/PackageTests/ProjectConfig/Parsec/cabal.test.hs
A suite of integration tests.
Parsing values and comparing them to expectations.
Note that I also run the legacy parser here to make sure my expected values do not differ from the non-Parsec implementation, this will be removed in the future.

Furthermore, I've tested the implementation by successfully building the following Haskell repositories: http2, cabal, tls, hashable, hspec, ghc-lib-parser, texmath, text, lens, megaparsec.

Future PRs
There are also other aspects that I want to address, but I want to do this in future PRs because the current one is too big already:

  • Add WarningTests/ErrorTests similar to Cabal-tests/tests/ParserTests.warningTests to test the correct output of warnings/errors of the Parsec parser
  • Use treeDiff in the integration tests to compare the parsed values with expectations for easier identification of differences
  • Remove unused code from Distribution/Client/ProjectConfig/Legacy.hs

@jgotoh
Copy link
Member Author

jgotoh commented Apr 1, 2023

@ulysses4ever as announced, here is an early PR of the changes. Still lots of work, but I have a rough outline of the ticket. I would be glad if you would take a look!

@ulysses4ever
Copy link
Collaborator

Thanks @jgotoh ! Will take a look eventually.

@gbaz you know this code better than anyone currently active, i think. Would you be able to provide guidance?

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch 5 times, most recently from 402f607 to 9016a3a Compare April 3, 2023 06:55
@ulysses4ever
Copy link
Collaborator

@grayjay I know you’re more of a solver person, but if you happen to have some time for advising and feel comfortable in this part of the code, your contribution would be priceless.

@ulysses4ever
Copy link
Collaborator

@jgotoh there's a bi-weekly cabal devs meeting, where, I am sure, people would be delighted to hear your experience so far. The closest one is this Thursday (Apr 20th), 1 PM Eastern Time (US). The link to a Jitsi video call is posted before the meeting on #hackage at libera.chat (can be browsed using Element/Matrix or an IRC client). Are you interested?

@jgotoh
Copy link
Member Author

jgotoh commented Apr 19, 2023

@ulysses4ever Thank you very much for you invitation! I've already wondered about the cabal devs main communication channel. I am definitely interested and will be glad to attend :)

@ulysses4ever
Copy link
Collaborator

@jgotoh cool! let me know if you want me to mail you the jitsi link beforehand.

@grayjay
Copy link
Collaborator

grayjay commented Apr 20, 2023

@grayjay I know you’re more of a solver person, but if you happen to have some time for advising and feel comfortable in this part of the code, your contribution would be priceless.

I'm not familiar with this part of the code, but I tried to answer a couple questions based on my understanding of the parser behavior.

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from c5f7509 to 5fd9791 Compare May 20, 2023 15:31
@andreabedini
Copy link
Collaborator

@jgotoh Is there anyway I can help you with this?

@jgotoh
Copy link
Member Author

jgotoh commented May 30, 2023

@andreabedini many thanks for your offer! I will push some changes this week adding ProjectConfig.projectConfigBuildOnly. This means around 50% of parsing of ProjectConfigs is done then.
What is missing afterwards is parsing Conditionals into the CondTree structure. I've concentrated more on ProjectConfigs for now, but if you have some spare time you could support me here! Also, I will talk about the current state of the PR in the next dev meeting this thursday, you can join if you want :) see this comment: #8889 (comment)

Also ongoing small reviews of the code I push are helpful if you spot anything.

@andreabedini
Copy link
Collaborator

Also, I will talk about the current state of the PR in the next dev meeting this thursday, you can join if you want :)

Thank you for the invite, but unfortunately that would be 3:00 am here 😄 😞 I might just go through the PR in my own time and leave some comment (if there's anything I can comment on).

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from 96ef42c to e671cb1 Compare June 3, 2023 18:44
parsec = parsecNumJobs

parsecNumJobs :: CabalParsing m => m NumJobs
parsecNumJobs = ncpus <|> numJobs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Distribution.Setup.Simple.Common.numJobsParser for original, non-parsec parser

@jgotoh jgotoh force-pushed the parse-cabal-project-parsec branch from 9e0127d to c75f6e2 Compare August 5, 2023 17:03
@mpickering
Copy link
Collaborator

I did a brief look and it is looking very good, too hard to review exactly line-by-line I think but you have done some very good testing.

My instinct is to turn this on by default and add a command line option which enables the legacy parser for 3.16 (so users can workaround any bugs), then in 3.18 the legacy parser can be removed.

@mpickering mpickering force-pushed the parse-cabal-project-parsec branch from 8b10027 to 62592df Compare June 12, 2025 11:19
@ulysses4ever
Copy link
Collaborator

My instinct is to turn this on by default and add a command line option which enables the legacy parser for 3.16 (so users can workaround any bugs), then in 3.18 the legacy parser can be removed.

I agree.

From what @jgotoh told us at some past Cabal meetings, I think he'd not mind getting a hand in pushing this over the line. With the understanding that we can always augment it in future PRs. I personally think that in the interest of getting it in 3.16, we should be most decisive.

@mpickering
Copy link
Collaborator

Thanks Arten, I rebased but would still appreciate a summary of what is TODO and what is completed before diving in from @jgotoh .

@jgotoh
Copy link
Member Author

jgotoh commented Jun 16, 2025

Hi @mpickering,
many thanks for continuing this, awesome news! :)
Here is a short summary of the open todos:

  1. There are still some open comments in the PR here (mostly style suggestions, so they should be easy to implement)
  2. See my comment Replace cabal project parsing with Parsec #8889 (comment). I've tested the parser on lots of files, but two files were still failing:
    First failure did fail with the parsec parser, but also with the default/legacy cabal parser, so it does not need to be fixed (imo).
    Another failure was because of a mix of comma/no comma in a list of values:
packages: gi-cairo-render/gi-cairo-render.cabal,          <- here a comma is used to separate the elements in the list
          gi-cairo-connector/gi-cairo-connector.cabal     <- no comma separator
          examples/sdl/gi-cairo-render-sdl.cabal
          ^ fails here, unexpected e
          examples/clock/gi-cairo-render-clock.cabal

The legacy parser accepts this. The parsec parser throws an error because you either have to always use comma separator or never use one (can be discussed whether this behavior should be changed). See the comment for links to the files. Also here is a quick and dirty python implementation I used to test the files: https://github.com/jgotoh/cabal-test-parser

  1. Make a trailing colon for stanzas a parse failure (PR Make a trailing colon for stanzas a parse failure #10525)
    The PR was implemented for the legacy parser. There is already a new ticket to do this for the Parsec parser, .cabal files: Make a trailing colon for stanzas a parse failure #10660.
    As my current PR uses the Parsec parser, .cabal files: Make a trailing colon for stanzas a parse failure #10660 would need to be implemented for the Parsec parser for .cabal files before this PR can be merged. Or it can be part of this PR.
    Otherwise the new parser can not fulfil the existing test (cabal-testsuite/PackageTests/ProjectConfig/FieldStanzaConfusion/cabal.test.hs).
  2. Add source file to project parse errors and warnings (Add source file to project parse errors and warnings #10644)
    This was a really nice change on how errors and warnings are printed. Again, only for the legacy parser. So the printing mechanisms needs to be implemented for the Parsec parser, too.

As it's some months since I've worked on the PR, I don't know if there are any other new changes to the legacy parser that need to be migrated.
But it may be a good idea to compare my reimplementation of parseProjectSkeleton (file cabal-install/src/Distribution/Client/ProjectConfig/Parsec.hs) with its original (cabal-install/src/Distribution/Client/ProjectConfig/Legacy.hs, function parseProjectSkeleton), as often changes to the legacy parser touched it.

I hope this clears some things up a bit, just ping me here if you have any questions.

Thanks again! I really appreciate that you want to continue here :)

edit:
Another addition:
The integration tests in this PR (cabal-install/parser-tests/Tests/ParserTests.hs) just use assertEqual. Unfortunately this results in huge strings when something is not equal. It would be a good idea to use tree-diff instead for better error messages. The necessary instances for ProjectConfig and its related types are already there, see https://github.com/haskell/cabal/blob/master/cabal-install/tests/UnitTests/Distribution/Client/TreeDiffInstances.hs. So instead of assertEqual some variation of ediff can be used.

I’ve tested the parser on around 3,000 .cabal files, and the only one it really failed on was the example I mentioned above with mixed comma usage (which might not even need to be fixed). Overall, the implementation is in pretty good shape and most of the remaining stuff is about style tweaks or catching up with some newer legacy parser features like error messages and handling trailing colons for stanzas.

@ulysses4ever
Copy link
Collaborator

@mpickering Julian posted an update that you requested. Do you have any plans for this? I'm wondering if we could still squeeze it into 3.16 still...

@mpickering
Copy link
Collaborator

@mpickering Julian posted an update that you requested. Do you have any plans for this? I'm wondering if we could still squeeze it into 3.16 still...

I don't think it is likely to get into 3.16.. we shouldn't rush this.

@ulysses4ever
Copy link
Collaborator

Sounds reasonable, thank you. Let's hope for 3.18 then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants