Lemmas with '%' #1125

jmccrae · 2024-10-21T09:54:29Z

Lemmas containing '%' are potentially ambiguous, as discussed in #1123, because they put two percent signs (%) in the sense key.

This PR fixes our tools to work with them as follows.

The lemma and the lex_sense are split at the last percent sign in the sense key, which avoids the ambiguity.
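As a rough illustration of that rule (not the code changed in this PR), splitting at the last '%' could look like the sketch below. The function name `split_sense_key` and the example sense key are assumptions made up for the illustration.

```python
def split_sense_key(sense_key: str) -> tuple[str, str]:
    """Split a sense key into (lemma, lex_sense) at the LAST '%'.

    Hypothetical helper for illustration only; the real scripts may differ.
    """
    lemma, sep, lex_sense = sense_key.rpartition("%")
    if not sep:
        raise ValueError(f"no '%' found in sense key: {sense_key!r}")
    return lemma, lex_sense


# Illustrative key (assumed, not copied from the data): the lemma itself
# contains a '%', so splitting at the FIRST '%' would cut the lemma in half.
lemma, lex_sense = split_sense_key("100%_correct%3:00:00::")
print(lemma)      # 100%_correct
print(lex_sense)  # 3:00:00::
```

Using `str.rpartition` rather than `str.split("%")` means a lemma with any number of '%' characters still yields exactly two parts.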

This even works with the Princeton WordNet tools:

-> % wordnet "100% correct" -over

Overview of adj 100%_correct

The adj 100% correct has 1 sense (no senses from tagged texts)
                                   
1. accurate, 100% correct -- (conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale")

1313ou · 2024-10-22T09:14:43Z

This ignores the possible presence of a head in a sense key. If lemmas can contain an unescaped %, so can heads, which could make a sense key unparsable. This does not happen with the current data set and is unlikely, but unlikely things sometimes happen.
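A small sketch of the failure mode described here, assuming the usual WordNet lex_sense layout `ss_type:lex_filenum:lex_id:head_word:head_id`. The sense key below is invented for the illustration and does not occur in the current data; `split_at_last_percent` is likewise a hypothetical stand-in for the tools' behaviour.

```python
def split_at_last_percent(sense_key: str) -> tuple[str, str]:
    """Hypothetical stand-in for the 'split at the last %' rule."""
    lemma, _, lex_sense = sense_key.rpartition("%")
    return lemma, lex_sense


# Invented satellite-adjective key whose head word contains an unescaped '%'.
key = "exact%5:00:00:100%_correct:00"

lemma, lex_sense = split_at_last_percent(key)
fields = lex_sense.split(":")

# The last '%' sits inside the head word, so the split lands in the wrong place:
# the "lemma" swallows part of the lex_sense, and the remainder no longer has
# the five colon-separated fields a lex_sense should have.
print(lemma)        # exact%5:00:00:100
print(lex_sense)    # _correct:00
print(len(fields))  # 2
```

Checking that the right-hand part has five colon-separated fields would at least detect such a key, though it would not by itself recover the intended split.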

Fix scripts to work with '%'

e122bba

jmccrae mentioned this pull request Oct 21, 2024

An entry in the index.sense file has 6 fields instead of 5 #1123

Closed

jmccrae merged commit a513594 into main Oct 21, 2024
2 checks passed

jmccrae deleted the percent_test branch October 21, 2024 11:05
