Skip to content

Fix bad-string-format-character when a placeholder occurs within a format spec part #6773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

zanieb
Copy link
Member

@zanieb zanieb commented Aug 22, 2023

Closes #6767

The presented false positive "{0:.{prec}g}".format(1.23, prec=15) occurs because we treated the entire format specification of .{prec}g as the format type (which is typically at the end of a format specification). The nested replacement was not detected because a leading { was not seen. Instead, . remained the leading character — it's the leading character for a precision clause but was not consumed since the replacement was not a valid number

One way to resolve this is to update parsing of every clause in a format specification to be replacement aware e.g. precision parsing would accept a replacement as well as a number. I determined that this is too complicated and provides little value for our current rules around format specifications.

Instead, after we've parsed most of the format specification parts, we will check for any remaining replacements (i.e. the presence of { anywhere in the remaining string) and consume them and any intermediate characters. This matches our existing intention of basically ignoring any replacements while retaining as much of the normal format specification linting as we can.

@zanieb zanieb added the bug Something isn't working label Aug 22, 2023
/// ```
/// "x{foo}y{bar}z"
/// ```
fn consume_all_placeholders<'a>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following implementation is a little clunky. I'm guessing there's a more idiomatic way to do this?

@github-actions
Copy link
Contributor

github-actions bot commented Aug 22, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01      3.4±0.01ms    11.9 MB/sec    1.00      3.4±0.02ms    12.0 MB/sec
formatter/numpy/ctypeslib.py               1.00    709.0±2.82µs    23.5 MB/sec    1.00    712.1±2.61µs    23.4 MB/sec
formatter/numpy/globals.py                 1.00     74.0±0.34µs    39.9 MB/sec    1.00     74.2±0.37µs    39.8 MB/sec
formatter/pydantic/types.py                1.00  1384.4±15.60µs    18.4 MB/sec    1.02  1411.7±21.69µs    18.1 MB/sec
linter/all-rules/large/dataset.py          1.00     10.2±0.06ms     4.0 MB/sec    1.00     10.2±0.06ms     4.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      2.8±0.05ms     6.0 MB/sec    1.00      2.8±0.04ms     6.1 MB/sec
linter/all-rules/numpy/globals.py          1.01    391.1±0.65µs     7.5 MB/sec    1.00    387.4±0.87µs     7.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      5.3±0.04ms     4.8 MB/sec    1.00      5.3±0.01ms     4.8 MB/sec
linter/default-rules/large/dataset.py      1.00      5.4±0.01ms     7.5 MB/sec    1.01      5.4±0.01ms     7.5 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1211.5±7.77µs    13.7 MB/sec    1.00   1208.9±1.38µs    13.8 MB/sec
linter/default-rules/numpy/globals.py      1.01    139.3±3.56µs    21.2 MB/sec    1.00    138.5±0.57µs    21.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      2.5±0.02ms    10.3 MB/sec    1.00      2.5±0.01ms    10.3 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      3.7±0.05ms    11.1 MB/sec    1.03      3.8±0.19ms    10.8 MB/sec
formatter/numpy/ctypeslib.py               1.00   734.6±19.75µs    22.7 MB/sec    1.00   734.6±11.14µs    22.7 MB/sec
formatter/numpy/globals.py                 1.00     75.7±1.11µs    39.0 MB/sec    1.01     76.4±3.34µs    38.6 MB/sec
formatter/pydantic/types.py                1.00  1523.7±39.36µs    16.7 MB/sec    1.00  1521.9±23.39µs    16.8 MB/sec
linter/all-rules/large/dataset.py          1.00     13.1±0.64ms     3.1 MB/sec    1.02     13.4±0.37ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.7±0.25ms     4.5 MB/sec    1.00      3.7±0.13ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.02   398.5±36.73µs     7.4 MB/sec    1.00   389.5±11.80µs     7.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.27ms     3.7 MB/sec    1.03      7.1±0.29ms     3.6 MB/sec
linter/default-rules/large/dataset.py      1.02      7.4±0.18ms     5.5 MB/sec    1.00      7.2±0.19ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02  1530.8±45.17µs    10.9 MB/sec    1.00  1508.1±34.97µs    11.0 MB/sec
linter/default-rules/numpy/globals.py      1.00    152.2±2.96µs    19.4 MB/sec    1.04    157.9±4.91µs    18.7 MB/sec
linter/default-rules/pydantic/types.py     1.01      3.3±0.15ms     7.6 MB/sec    1.00      3.3±0.11ms     7.7 MB/sec

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm playing around with this a bit, and I'm starting to wonder if attempting to parse these format strings (when nested replacements are present) is sound.

Consider FormatSpec::parse("{a}30"). In this case, 30 could either be the width or the precision depending on whether a is > or .. So assigning it to the width (as we do now) is incorrect.

Similarly, for FormatSpec::parse("{a}s"), this could either resolve to !s or :s depending on the value of a, and so treating s as either the format type or the conversion would be wrong.

In general, I just don't know that we can soundly analyze these when the contents are dynamic, which leads me to think we should use a different representation for them and avoid raising errors for them... What do you think?

@MichaReiser
Copy link
Member

@charliermarsh are you saying that the format string gets evaluated at runtime? Like, a can contain a partial format spec and the rest is a static string?

@charliermarsh
Copy link
Member

@MichaReiser - Yes that's correct.

@zanieb
Copy link
Member Author

zanieb commented Aug 23, 2023

I just don't know that we can soundly analyze these when the contents are dynamic, which leads me to think we should use a different representation for them and avoid raising errors for them

I do think that we have a possibility of false diagnostics with the current approach, but it seems better than giving up entirely. I don't think replacements are used to replace large parts of the format specification in practice and are typically used for values rather than clause identifiers.

I could be convinced to just go to my initial approach which was an entirely different data structure. If we're not going to bother writing any diagnostics though we might as well just add a FormatSpecError type and just give up when we see a replacement.

@tdulcet
Copy link

tdulcet commented Aug 24, 2023

(I am the original reporter of this bug in: #6442 (comment).)

Similarly, for FormatSpec::parse("{a}s"), this could either resolve to !s or :s depending on the value of a, and so treating s as either the format type or the conversion would be wrong.

Maybe I am missing something here, but it seems the colon or exclamation mark must be included statically in the format string, as otherwise it produces a ValueError:

>>> "{{a}s}".format("test", a=":")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '}' encountered in format string
>>> "{{a}s}".format("test", a="!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '}' encountered in format string

Adding the implicit zero does not make it work either:

>>> "{0{a}s}".format("test", a=":")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unexpected '{' in field name
>>> "{0{a}s}".format("test", a="!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unexpected '{' in field name

@charliermarsh
Copy link
Member

@tdulcet - Sorry, you're right, it seems that it doesn't behave that way in that case. The first example does hold though (of width vs. precision), in my retesting.

@zanieb zanieb closed this Aug 24, 2023
zanieb added a commit that referenced this pull request Aug 25, 2023
…re present (#6858)

Closes #6767
Replaces #6773 (this cherry-picks
some parts from there)
Alternative to the approach introduced in #6616 which added support for
placeholders in format specifications while retaining parsing of other
format specification parts.

The idea is that if there are placeholders in a format specification we
will not attempt to glean semantic meaning from the other parts of the
format specification we'll just extract all of the placeholders ignoring
other characters. The dynamic content of placeholders can drastically
change the meaning of the format specification in ways unknowable by
static analysis. This change prevents false analysis and will ensure
safety if we build other rules on top of this at the cost of missing
detection of some bad specifications.

Minor note: I've use "replacements" and "placeholders" interchangeably
but am trying to go with "placeholder" as I think it's a better term for
the static analysis concept here
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spurious PLE1300 error with 0.0.285
4 participants