Fix `bad-string-format-character` when a placeholder occurs within a format spec part #6773

zanieb · 2023-08-22T15:15:20Z

The presented false positive "{0:.{prec}g}".format(1.23, prec=15) occurs because we treated the entire format specification of .{prec}g as the format type (which is typically at the end of a format specification). The nested replacement was not detected because a leading { was not seen. Instead, . remained the leading character — it's the leading character for a precision clause but was not consumed since the replacement was not a valid number

One way to resolve this is to update parsing of every clause in a format specification to be replacement aware e.g. precision parsing would accept a replacement as well as a number. I determined that this is too complicated and provides little value for our current rules around format specifications.

Instead, after we've parsed most of the format specification parts, we will check for any remaining replacements (i.e. the presence of { anywhere in the remaining string) and consume them and any intermediate characters. This matches our existing intention of basically ignoring any replacements while retaining as much of the normal format specification linting as we can.

zanieb · 2023-08-22T15:24:42Z

crates/ruff_python_literal/src/format.rs

+/// ```
+/// "x{foo}y{bar}z"
+/// ```
+fn consume_all_placeholders<'a>(


The following implementation is a little clunky. I'm guessing there's a more idiomatic way to do this?

github-actions · 2023-08-22T15:37:23Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01      3.4±0.01ms    11.9 MB/sec    1.00      3.4±0.02ms    12.0 MB/sec
formatter/numpy/ctypeslib.py               1.00    709.0±2.82µs    23.5 MB/sec    1.00    712.1±2.61µs    23.4 MB/sec
formatter/numpy/globals.py                 1.00     74.0±0.34µs    39.9 MB/sec    1.00     74.2±0.37µs    39.8 MB/sec
formatter/pydantic/types.py                1.00  1384.4±15.60µs    18.4 MB/sec    1.02  1411.7±21.69µs    18.1 MB/sec
linter/all-rules/large/dataset.py          1.00     10.2±0.06ms     4.0 MB/sec    1.00     10.2±0.06ms     4.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      2.8±0.05ms     6.0 MB/sec    1.00      2.8±0.04ms     6.1 MB/sec
linter/all-rules/numpy/globals.py          1.01    391.1±0.65µs     7.5 MB/sec    1.00    387.4±0.87µs     7.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      5.3±0.04ms     4.8 MB/sec    1.00      5.3±0.01ms     4.8 MB/sec
linter/default-rules/large/dataset.py      1.00      5.4±0.01ms     7.5 MB/sec    1.01      5.4±0.01ms     7.5 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1211.5±7.77µs    13.7 MB/sec    1.00   1208.9±1.38µs    13.8 MB/sec
linter/default-rules/numpy/globals.py      1.01    139.3±3.56µs    21.2 MB/sec    1.00    138.5±0.57µs    21.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      2.5±0.02ms    10.3 MB/sec    1.00      2.5±0.01ms    10.3 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      3.7±0.05ms    11.1 MB/sec    1.03      3.8±0.19ms    10.8 MB/sec
formatter/numpy/ctypeslib.py               1.00   734.6±19.75µs    22.7 MB/sec    1.00   734.6±11.14µs    22.7 MB/sec
formatter/numpy/globals.py                 1.00     75.7±1.11µs    39.0 MB/sec    1.01     76.4±3.34µs    38.6 MB/sec
formatter/pydantic/types.py                1.00  1523.7±39.36µs    16.7 MB/sec    1.00  1521.9±23.39µs    16.8 MB/sec
linter/all-rules/large/dataset.py          1.00     13.1±0.64ms     3.1 MB/sec    1.02     13.4±0.37ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.7±0.25ms     4.5 MB/sec    1.00      3.7±0.13ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.02   398.5±36.73µs     7.4 MB/sec    1.00   389.5±11.80µs     7.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.27ms     3.7 MB/sec    1.03      7.1±0.29ms     3.6 MB/sec
linter/default-rules/large/dataset.py      1.02      7.4±0.18ms     5.5 MB/sec    1.00      7.2±0.19ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02  1530.8±45.17µs    10.9 MB/sec    1.00  1508.1±34.97µs    11.0 MB/sec
linter/default-rules/numpy/globals.py      1.00    152.2±2.96µs    19.4 MB/sec    1.04    157.9±4.91µs    18.7 MB/sec
linter/default-rules/pydantic/types.py     1.01      3.3±0.15ms     7.6 MB/sec    1.00      3.3±0.11ms     7.7 MB/sec

charliermarsh

So I'm playing around with this a bit, and I'm starting to wonder if attempting to parse these format strings (when nested replacements are present) is sound.

Consider FormatSpec::parse("{a}30"). In this case, 30 could either be the width or the precision depending on whether a is > or .. So assigning it to the width (as we do now) is incorrect.

Similarly, for FormatSpec::parse("{a}s"), this could either resolve to !s or :s depending on the value of a, and so treating s as either the format type or the conversion would be wrong.

In general, I just don't know that we can soundly analyze these when the contents are dynamic, which leads me to think we should use a different representation for them and avoid raising errors for them... What do you think?

MichaReiser · 2023-08-23T06:34:17Z

@charliermarsh are you saying that the format string gets evaluated at runtime? Like, a can contain a partial format spec and the rest is a static string?

charliermarsh · 2023-08-23T13:27:52Z

@MichaReiser - Yes that's correct.

zanieb · 2023-08-23T13:45:25Z

I just don't know that we can soundly analyze these when the contents are dynamic, which leads me to think we should use a different representation for them and avoid raising errors for them

I do think that we have a possibility of false diagnostics with the current approach, but it seems better than giving up entirely. I don't think replacements are used to replace large parts of the format specification in practice and are typically used for values rather than clause identifiers.

I could be convinced to just go to my initial approach which was an entirely different data structure. If we're not going to bother writing any diagnostics though we might as well just add a FormatSpecError type and just give up when we see a replacement.

tdulcet · 2023-08-24T09:07:16Z

(I am the original reporter of this bug in: #6442 (comment).)

Similarly, for FormatSpec::parse("{a}s"), this could either resolve to !s or :s depending on the value of a, and so treating s as either the format type or the conversion would be wrong.

Maybe I am missing something here, but it seems the colon or exclamation mark must be included statically in the format string, as otherwise it produces a ValueError:

>>> "{{a}s}".format("test", a=":")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '}' encountered in format string
>>> "{{a}s}".format("test", a="!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '}' encountered in format string

Adding the implicit zero does not make it work either:

>>> "{0{a}s}".format("test", a=":")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unexpected '{' in field name
>>> "{0{a}s}".format("test", a="!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unexpected '{' in field name

charliermarsh · 2023-08-24T13:29:29Z

@tdulcet - Sorry, you're right, it seems that it doesn't behave that way in that case. The first example does hold though (of width vs. precision), in my retesting.

…re present (#6858) Closes #6767 Replaces #6773 (this cherry-picks some parts from there) Alternative to the approach introduced in #6616 which added support for placeholders in format specifications while retaining parsing of other format specification parts. The idea is that if there are placeholders in a format specification we will not attempt to glean semantic meaning from the other parts of the format specification we'll just extract all of the placeholders ignoring other characters. The dynamic content of placeholders can drastically change the meaning of the format specification in ways unknowable by static analysis. This change prevents false analysis and will ensure safety if we build other rules on top of this at the cost of missing detection of some bad specifications. Minor note: I've use "replacements" and "placeholders" interchangeably but am trying to go with "placeholder" as I think it's a better term for the static analysis concept here

zanieb added 4 commits August 22, 2023 09:36

Clean up implementation

d272874

Add failing test case for bad-string-format-character

0e32842

Consume all remaining placeholders at end of format spec parsing

5df9ba7

Add more test cases; check format types that occur after placeholders

268be0c

zanieb added the bug Something isn't working label Aug 22, 2023

zanieb added 2 commits August 22, 2023 10:22

Restore unrelated snapshot

6d88ac4

Simplify comment

d2b0b1c

zanieb commented Aug 22, 2023

View reviewed changes

charliermarsh reviewed Aug 23, 2023

View reviewed changes

zanieb mentioned this pull request Aug 24, 2023

Avoid parsing other parts of a format specification if replacements are present #6858

Merged

zanieb closed this Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix `bad-string-format-character` when a placeholder occurs within a format spec part #6773

Fix `bad-string-format-character` when a placeholder occurs within a format spec part #6773

Uh oh!

zanieb commented Aug 22, 2023 •

edited

Loading

Uh oh!

zanieb Aug 22, 2023

Uh oh!

github-actions bot commented Aug 22, 2023 •

edited

Loading

Uh oh!

charliermarsh left a comment

Uh oh!

MichaReiser commented Aug 23, 2023

Uh oh!

charliermarsh commented Aug 23, 2023

Uh oh!

zanieb commented Aug 23, 2023

Uh oh!

tdulcet commented Aug 24, 2023

Uh oh!

charliermarsh commented Aug 24, 2023

Uh oh!

Uh oh!

Fix bad-string-format-character when a placeholder occurs within a format spec part #6773

Fix bad-string-format-character when a placeholder occurs within a format spec part #6773

Uh oh!

Conversation

zanieb commented Aug 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zanieb Aug 22, 2023

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Aug 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Check Results

Ecosystem

Benchmark

Linux

Windows

Uh oh!

charliermarsh left a comment

Choose a reason for hiding this comment

Uh oh!

MichaReiser commented Aug 23, 2023

Uh oh!

charliermarsh commented Aug 23, 2023

Uh oh!

zanieb commented Aug 23, 2023

Uh oh!

tdulcet commented Aug 24, 2023

Uh oh!

charliermarsh commented Aug 24, 2023

Uh oh!

Uh oh!

Fix `bad-string-format-character` when a placeholder occurs within a format spec part #6773

Fix `bad-string-format-character` when a placeholder occurs within a format spec part #6773

zanieb commented Aug 22, 2023 •

edited

Loading

github-actions bot commented Aug 22, 2023 •

edited

Loading