Skip to content

Convert README verification script to a Rust test #1563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jackcooper20
Copy link

This PR addresses #636 by replacing the bash script (scripts/update_example_READMEs.sh) with a Rust-based test that verifies the examples/README.md contents are in sync with the actual examples in the codebase.

The test:

  • Walks through all example categories (general, supplementary, restriction, experimental, and testing)
  • Extracts example names and descriptions from each Cargo.toml file
  • Generates a temporary README.md with the expected content
  • Verifies that the actual README.md contains all necessary example links and descriptions

This change makes the README verification process part of the standard test suite, runnable via cargo test. By keeping this verification in Rust rather than bash, we ensure better cross-platform compatibility and integration with the rest of the testing infrastructure.

@jackcooper20 jackcooper20 requested a review from smoelius as a code owner March 25, 2025 23:36
@CLAassistant
Copy link

CLAassistant commented Mar 25, 2025

CLA assistant check
All committers have signed the CLA.

@jackcooper20
Copy link
Author

hey @smoelius, here is the PR. Please check and let me know if any thing is needed. Thank You!

@smoelius
Copy link
Collaborator

Hey, thanks @jackcooper20! I will try to look at this no later than this weekend.

@smoelius
Copy link
Collaborator

@jackcooper20 Do you think you could try to get CI to pass?

@jackcooper20
Copy link
Author

@smoelius completed the CI

Copy link
Collaborator

@smoelius smoelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this, @jackcooper20.

My two main concerns right now are that the test encodes the entire README, and that the test does not simply compare the expected README with the actual README. The former is elaborated on below; the latter is more of a question, so if you could please explain. 🙏

@@ -0,0 +1,4 @@
// This file ensures that the tests directory is recognized as a test directory
// and that the test_readme.rs file is included in the test suite.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, I don't understand this comment.

But, either way, could we please move this test into https://github.com/trailofbits/dylint/blob/master/cargo-dylint/tests/ci.rs?


#[test]
fn test_readme_contents() {
let examples_dir = find_examples_dir();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a reason to prefer this approach over simply hardcoding the path to examples?

Copy link
Author

@jackcooper20 jackcooper20 Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding would be simpler and more explicit. The main reason I chose the current approach was to ensure the test works regardless of where it's run

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you move this test to https://github.com/trailofbits/dylint/blob/master/cargo-dylint/tests/ci.rs, it will always execute from within cargo-dylint's subdirectory. So there will be no ambiguity.



// Instead we use a more flexible approach that allows for formatting differences
// while ensuring all examples are properly documented
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you have a reason to not just compare expected_content and actual_content. What is that reason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically to ensures the README is functionally correct while being resilient to formatting changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to compare them outright. That should also make the code simpler.


if category == "restriction" {
// Handle restriction directory differently since it seems to have directories with slashes in names
for entry in read_dir(&category_dir).unwrap() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use read_dir for both restriction and non-restriction cases?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, read_dir could be used for both restriction and non-restriction cases. I'll try that

}
} else {
return None;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use fs::read_to_string here? https://doc.rust-lang.org/std/fs/fn.read_to_string.html

Copy link
Author

@jackcooper20 jackcooper20 Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. After looking at it again, I realized the current approach is a bit verbose than necessary. Using fs::read_to_string would make the code more concise and readable.

let mut content = String::new();

// Add the header
content.push_str("# Example Dylint libraries\n\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I like the idea of storing the entire README contents in the test. Because then the test will have to change whenever the README changes.

I think I would prefer that the test find where the lint descriptions should go and generate just that part, taking everything else from the existing README.

I may not be obvious, but the existing shell script worked this way:

if [[ "$X" =~ ^\| ]]; then

But I would be open to making things even easier for the test. For example, the README could contain markers like <!-- lint descriptions start -->/<!-- lint descriptions end --> to help the test know where the lint descriptions should go.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right that hardcoding the entire README is brittle. I'll refactor the test to generate and verify only the table sections.

@jackcooper20
Copy link
Author

jackcooper20 commented Apr 1, 2025

@smoelius thank you for the feedback. I’ve answered your queries, and it looks like a few changes are needed. Do you think anything else needs to be changed other than the mentioned ones? Also, let me know if you have any further questions.

Copy link
Collaborator

@smoelius smoelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think anything else needs to be changed other than the mentioned ones?

I honestly won't know until I see the code.

I really appreciate you working on this.


#[test]
fn test_readme_contents() {
let examples_dir = find_examples_dir();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you move this test to https://github.com/trailofbits/dylint/blob/master/cargo-dylint/tests/ci.rs, it will always execute from within cargo-dylint's subdirectory. So there will be no ambiguity.



// Instead we use a more flexible approach that allows for formatting differences
// while ensuring all examples are properly documented
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to compare them outright. That should also make the code simpler.

@jackcooper20 jackcooper20 force-pushed the convert-to-rust-test branch from b4f041b to 2cf32c5 Compare April 5, 2025 12:45
@jackcooper20
Copy link
Author

jackcooper20 commented Apr 6, 2025

Hello @smoelius,

I've tried to fix the issues you mentioned in the review. However, after pushing the changes, one of the CI tests failed, which is related to formatting. I attempted to resolve the issue, but I encountered a potential bug in the update_example_READMEs.sh script.

In particular, the problem seems to be with this line

sed 's/| A lint to check for \([^|]*\) |$/| \u\1 |/'

The intent of this line is to capitalize text (e.g., changing abc to Abc). However, the \u syntax isn't supported in GNU sed, which results in an extra u being introduced before the text (e.g., uLint description instead of Lint description). This issue is likely causing a mismatch between the expected and actual content in README.md.

Could you please check if this is the root cause of the issue, or if I might be overlooking something? If it’s not the cause, I'd appreciate your guidance in resolving the CI failure.

@jackcooper20
Copy link
Author

@smoelius ,please take a look at this at your convenience

@smoelius
Copy link
Collaborator

smoelius commented Apr 7, 2025

Hey, sorry for not replying, @jackcooper20. I will try to look at this this week.

Copy link
Collaborator

@smoelius smoelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackcooper20 This is looking pretty good. Thanks a lot for working on this!

@@ -133,6 +133,165 @@ fn cargo_dylint_and_dylint_readmes_are_equal() {
compare_lines(&cargo_dylint_readme, &dylint_readme);
}

#[test]
fn examples_readme_contents() {
let examples_dir = find_examples_dir();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let examples_dir = find_examples_dir();
let examples_dir = Path::new(env!("CARGO_MANIFEST_DIR")).join("../examples");

I think it is fine to hardcode the path to the examples directory. The likelihood of it moving relative the cargo-dylint package is small.

Please also remove the find_examples_dir function.

Comment on lines 11 to 12
<!-- lint descriptions start -->
## General
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<!-- lint descriptions start -->
## General
<!-- lint descriptions start -->
## General

The lack of an intervening blank line is why I think the current CI is complaining.

Comment on lines 68 to 69
| [`straggler`](./testing/straggler) | A lint that uses an old toolchain for testing purposes |
<!-- lint descriptions end -->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| [`straggler`](./testing/straggler) | A lint that uses an old toolchain for testing purposes |
<!-- lint descriptions end -->
| [`straggler`](./testing/straggler) | A lint that uses an old toolchain for testing purposes |
<!-- lint descriptions end -->

Same comment as above.

Comment on lines 12 to 13
regex = { workspace = true }
tempfile = { workspace = true }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
regex = { workspace = true }
tempfile = { workspace = true }

Since no code in the examples directory is modified, these additions should not be necessary.

// Extract the description using regex
let re = Regex::new(r#"description\s*=\s*"([^"]*)""#).unwrap();
let description = if let Some(caps) = re.captures(&content) {
if let Some(desc) = caps.get(1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Some(desc) = caps.get(1) {
let desc = caps.get(1).unwrap();

If the captures call succeeds, then this call should succeed as well.

A failure of this get would likely indicate a bug in the definition of re, which I think is worth a test panic.

Note that removing this if will require other changes.

content
}

fn extract_between_markers(content: &str, start_marker: &str, end_marker: &str) -> Option<String> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn extract_between_markers(content: &str, start_marker: &str, end_marker: &str) -> Option<String> {
fn extract_between_markers(content: &str) -> Option<String> {
const START_MARKER: &str = "<!-- lint descriptions start -->";
const END_MARKER: &str = "<!-- lint descriptions end -->";

Since extract_between_markers is called only once, could we make start_marker and end_marker constants defined within this function?

let readme_content = read_to_string(&readme_path).unwrap();

// Generate just the lint description tables
let tables = generate_lint_tables(&examples_dir, &categories);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let tables = generate_lint_tables(&examples_dir, &categories);
let expected_tables = generate_lint_tables(&examples_dir, &categories);

Nit.

Comment on lines 165 to 166
expected_content,
actual_content,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected_content,
actual_content,
expected_tables,
actual_tables,

I would prefer this more stringent check. Please also remove the extract_meaningful_content function.

I understand why you wrote the extract_meaningful_content function: to avoid having to worry about spacing.

But generating a table with correct spacing will make it easier to replace an existing, incorrect table.

let tables = generate_lint_tables(&examples_dir, &categories);

// Extract the current tables section from README using markers
let current_tables = extract_between_markers(&readme_content,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let current_tables = extract_between_markers(&readme_content,
let actual_tables = extract_between_markers(&readme_content,

Nit to follow the convention that "expected" is often paired with "actual".

result
}

fn generate_lint_tables(examples_dir: &Path, categories: &[&str]) -> String {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the comment above regarding assert_eq!, this function will need to be adjusted to get the spacing right.

I recommend computing all of the formatted "Example" and "Description/check" entries, and then determining the columns' widths from that.

Something similar to this:

dylint/dylint/src/lib.rs

Lines 182 to 193 in 67eddaf

let name_width = name_toolchain_map
.keys()
.map(String::len)
.max()
.unwrap_or_default();
let toolchain_width = name_toolchain_map
.values()
.flat_map(LazyToolchainMap::keys)
.map(String::len)
.max()
.unwrap_or_default();

@jackcooper20
Copy link
Author

@smoelius I've fixed the things you mentioned in comments, but looks like the CI is failing again for the reason I mentioned previously. I am writing it here again. Please have a look

There might be a potential bug in the update_example_READMEs.sh script.

In particular, the problem seems to be with this line

sed 's/| A lint to check for \([^|]*\) |$/| \u\1 |/'

The intent of this line is to capitalize text (e.g., changing abc to Abc). However, the \u syntax isn't supported in GNU sed, which results in an extra u being introduced before the text (e.g., uLint description instead of Lint description).

This issue is likely causing a mismatch between the expected and actual content in README.md.

@jackcooper20
Copy link
Author

Hey @smoelius, I've managed to pass the CI. Apparently there were some new lines that was causing the problem. Please check and let me know if you have any questions.

Thank You!

Copy link
Collaborator

@smoelius smoelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please forgive me for saying this, but there are changes all over the place. It is difficult to review a PR when seemingly random changes are made.

For example. there is now code that calls preserves_cleanliness, but it's conditioned on "bless_mode". This doesn't really make sense because preserves_cleanliness is for checking that things don't change, while "blessing" typically means intentionally updating something.

When I commented on 3589ae5, I thought we were close to a solution. Is there any chance you could roll back to that commit and make smaller, more focused changes?

Comment on lines 83 to 85
- name: Format example READMEs
run: ./scripts/update_example_READMEs.sh && git diff --exit-code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that this will eventually have to go away. But, for now, could you please put it back in? I have found its success/failure useful to compare to.

@jackcooper20
Copy link
Author

Hello @smoelius, thanks for the feedback.

Apologies for making a mess of it. To be honest, after all the CI failures, I started getting a bit nervous and ended up trying all kinds of approaches just to get things working. That led to a few missteps, and I agree the recent changes added more confusion than value.

I'll revert back to 3589ae5 and work from there with a more simplified approach.

Thanks again for your patience — I'll follow up soon with a cleaner version of the PR.

@jackcooper20 jackcooper20 force-pushed the convert-to-rust-test branch from 6d658ed to 3589ae5 Compare April 19, 2025 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants