Skip to content

Normalize labels before addition or removal #2128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 1, 2025

Conversation

kailan
Copy link
Contributor

@kailan kailan commented Jul 29, 2025

The crates.io repository has some cute emoji in the label names, but this interferes with usage of label-related triagebot commands. Example:

Screenshot 2025-07-29 at 12 19 18

It seems unreasonable to expect users to exactly match the label names in these cases, and it would be a shame to remove the emoji from the label names, so I feel that it makes sense to make triagebot a bit more lax when it comes to finding labels.

This PR adds a normalize_and_match_labels function, which takes a set of "requested" label names and attempts to match them to the labels available in the repository. This will always return an exact match, if it exists, but otherwise will strip emoji, ignore casing, and ignore leading and trailing whitespace in an attempt to find the relevant label.

This function is used in the lower level add_labels and remove_label functions so that it can apply across any triagebot commands that handle labels. This means that the existing configuration for autolabelling in crates.io will start working:

[autolabel."A-backend"]
trigger_files = [
    "src/",
]

This PR is somewhat related to #1372.

@Urgau
Copy link
Member

Urgau commented Jul 29, 2025

It seems unreasonable to expect users to exactly match the label names in these cases

I feel like it's unreasonable for GitHub to allow emoji in label names but here we are.

This PR adds a normalize_and_match_labels function

Unless I'm mistaken, your function would allow the same prefix to be match to multiple emoji. So "label 🤷", "label 👍" and "label 💯" would all be matched by "label", right?

I don't think we should allow any of those to match "label", I feel like we should require perfect match in those cases.


I didn't look too closely at the code but I saw that there wasn't any caching done, we'll need to add some, at minimum per webhook request, maybe in Issue to start and in the long run per repository.

@kailan
Copy link
Contributor Author

kailan commented Jul 29, 2025

I feel like it's unreasonable for GitHub to allow emoji in label names but here we are.

Can't argue with that, lol.

Unless I'm mistaken, your function would allow the same prefix to be match to multiple emoji. So "label 🤷", "label 👍" and "label 💯" would all be matched by "label", right?

I don't think we should allow any of those to match "label", I feel like we should require perfect match in those cases.

This is a great point, so I have added an error for this case:

AmbiguousLabelMatch – "Unsure which label to use for label - could be one of: label 🤷, label 👍, label 💯"

I didn't look too closely at the code but I saw that there wasn't any caching done

I didn't implement any caching, however this implementation will send far fewer requests to GitHub – previously a GET /labels/label_name would have been made for every requested label, this is now replaced by a single GET /labels which provides the data needed to check all labels.

@kailan kailan marked this pull request as draft July 31, 2025 15:09
@kailan kailan marked this pull request as ready for review August 1, 2025 16:09
@kailan kailan requested a review from Urgau August 1, 2025 16:09
@kailan kailan requested a review from Urgau August 1, 2025 17:50
@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

Thanks for the PR. Let's try it.

@Urgau Urgau added this pull request to the merge queue Aug 1, 2025
Merged via the queue into rust-lang:master with commit 72dbff7 Aug 1, 2025
3 checks passed
@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

Let's do some quick sanity checks.

@rustbot label A-test

@rustbot rustbot added the A-test label Aug 1, 2025
@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

@rustbot label A-test-emoji

@rustbot
Copy link
Collaborator

rustbot commented Aug 1, 2025

Unknown labels: A-test-emoji

@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

@rustbot label "A-test-emoji 😄"

@rustbot
Copy link
Collaborator

rustbot commented Aug 1, 2025

Unknown labels: A-test-emoji 😄

@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

@rustbot label "A-test-emoji 😄"

@rustbot rustbot added the A-test-emoji 😄 For testing label Aug 1, 2025
@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

@rustbot label "-A-test-emoji 😄" "-A-test"

@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

@rustbot label A-test-emoji-2

@rustbot rustbot added the A-test-emoji-2 😄 For testing label Aug 1, 2025
@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

Hum, this doesn't work 100% as I would have expected. GitHub allows both Unicode emoji characters and the Markdown named ones (:smile:, ...), and returns the raw/unexpected version in it's API.

It works fine for the Unicode emoji, but not for the second one. Fortunately the crates.io repository uses the Unicode ones, so it's fine but still a bit unexpected.

@tgross35
Copy link
Contributor

tgross35 commented Aug 1, 2025

I think this may have broken standard ascii labels in some way too; libc's autolabel has started claiming everything is unknown rust-lang/libc#4607 (comment) rust-lang/libc#4608 (comment)

@kailan
Copy link
Contributor Author

kailan commented Aug 1, 2025

Hum, this doesn't work 100% as I would have expected. GitHub allows both Unicode emoji characters and the Markdown named ones (:smile:, ...), and returns the raw/unexpected version in it's API.

Oh, that's definitely an oversight that should be addressed.

I think this may have broken standard ascii labels in some way too; libc's autolabel has started claiming everything is unknown

even worse! Okay, I will revisit this with some new test cases based on these issues.

Probably worth rolling this back if you haven't already. Apologies for the inconvenience.

Urgau added a commit to Urgau/triagebot that referenced this pull request Aug 1, 2025
This reverts commit 72dbff7, reversing
changes made to 4c371f1.

It doesn't account for the labels endpoint page limit.
@kailan
Copy link
Contributor Author

kailan commented Aug 1, 2025

That's probably why the original implementation made the GET request for each label...

@Urgau
Copy link
Member

Urgau commented Aug 1, 2025

Revert on going at #2133, apologies for the inconvenience.

github-merge-queue bot pushed a commit that referenced this pull request Aug 1, 2025
Revert "Merge pull request #2128 from kailan/normalize-labels"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants