Lemmas with '%' #1125

Merged
merged 1 commit into from
Oct 21, 2024

jmccrae (Member) commented Oct 21, 2024

Lemmas containing '%' are potentially ambiguous, as discussed in #1123, because the sense key then contains two percent signs (%).

This PR fixes our tools to handle such lemmas as follows.

The lemma and the lex_sense are split at the last percent sign in the sense key, which avoids the ambiguity.
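The split rule described above can be sketched in a few lines of Python. This is only an illustration, not the code from this PR: the function name `split_sense_key` and the example key are assumptions made for the sketch.

```python
def split_sense_key(sense_key: str) -> tuple[str, str]:
    """Split a sense key into (lemma, lex_sense) at the LAST '%'.

    Hypothetical helper for illustration; it is not taken from the PR.
    """
    # rpartition searches from the right, so any '%' inside the lemma
    # stays in the lemma part.
    lemma, sep, lex_sense = sense_key.rpartition("%")
    if not sep:
        raise ValueError(f"no '%' separator in sense key: {sense_key!r}")
    return lemma, lex_sense


# Hypothetical sense key for the lemma "100% correct".
print(split_sense_key("100%_correct%3:00:00::"))
# ('100%_correct', '3:00:00::')
```

Splitting at the first '%' instead would give the lemma `100`, which is the ambiguity this rule is meant to avoid.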

This even works with the Princeton WordNet tools:

-> % wordnet "100% correct" -over

Overview of adj 100%_correct

The adj 100% correct has 1 sense (no senses from tagged texts)
                                   
1. accurate, 100% correct -- (conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale")

jmccrae merged commit a513594 into main on Oct 21, 2024 (2 checks passed)
jmccrae deleted the percent_test branch on October 21, 2024 11:05
1313ou (Contributor) commented Oct 22, 2024

This ignores the possible presence of a head word in a sense key. If lemmas can contain an unescaped %, so can heads, which could make a sense key unparsable. This does not happen with the current data set and is unlikely, but unlikely things sometimes happen.
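To make the concern concrete, here is a small Python sketch. The sense key is invented for the example (no such key is claimed to exist in the data), and both function names are hypothetical. It first shows the last-'%' split failing when the head word contains '%', then one possible mitigation: anchoring the split on the numeric fields that follow the separator in the Princeton sense key layout (`lemma%ss_type:lex_filenum:lex_id:head_word:head_id`). The mitigation is an assumption of this sketch, not something proposed in the thread, and it could still be fooled by a lemma that itself contains a `%d:dd:dd:` pattern.

```python
import re

# Pattern anchored on the numeric lex_sense fields (an assumption of this sketch).
SENSE_KEY = re.compile(
    r"^(?P<lemma>.+)%(?P<ss_type>[1-5]):(?P<lex_filenum>\d+):(?P<lex_id>\d+):"
    r"(?P<head_word>.*):(?P<head_id>\d*)$"
)


def split_last_percent(sense_key: str) -> tuple[str, str]:
    """Split at the last '%', as described in the PR text."""
    lemma, _, lex_sense = sense_key.rpartition("%")
    return lemma, lex_sense


def parse_sense_key(sense_key: str) -> dict:
    """Parse by field structure rather than by '%' position (hypothetical)."""
    match = SENSE_KEY.match(sense_key)
    if match is None:
        raise ValueError(f"unparsable sense key: {sense_key!r}")
    return match.groupdict()


# Invented satellite-adjective key whose head word contains '%'.
key = "spot-on%5:00:00:100%_correct:00"

print(split_last_percent(key))            # the split lands inside the head word
print(parse_sense_key(key)["lemma"])      # spot-on
print(parse_sense_key(key)["head_word"])  # 100%_correct
```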
