Description
Dear Steering Council,
to introduce myself, I have been involved in the Psycopg project (the de facto standard PostgreSQL driver for Python) since 2005, I have been the main maintainer of psycopg2 since 2010, and, in 2020, I designed and implemented Psycopg 3, in order to use new Python features (typing, async), making new choices based on the experience gathered in the previous 15 years.
As you can imagine, one of the main preoccupations of our project is safety: how to enable end users to craft any statement they need to execute on a database while guarding them from unsafe input. Therefore, we have always stressed using the best safety practices when dealing with untrusted user input.
I have recently come across the efforts of PEP 750 to provide a template string object in Python, and I found it extremely fitting for the project. A few days ago I have implemented a first version of template strings execution, and I must say that it is the most important and refreshing change I have seen in the language, positively affecting our project.
However, during a discussion about whether to include the implementation of a Template.join()
method (which would be very desirable in my opinion, but this is a different matter), it dawned on me that explicit string concatenation is a very dangerous feature, and it pretty much defeats entirely the safety that the PEP declares as being one of the design goals.
Because templates and strings can be concatenated without any safety check, it is very easy to include insecure input in a template. Taking the same example from the PEP's Motivation section:
evil = "<script>alert('evil')</script>"
template = t"<p>"+ evil + t"</p>"
assert html(template) == "<p><script>alert('evil')</script></p>" # Will fail
The hypothetical html()
function will receive a Template
object on which it cannot put any trust. The design of the Template object makes any insecure input string instantly secure.
Please note that this design goes pretty much in the opposite direction of the LiteralString
defined in PEP 675: concatenating a safe LiteralString
and an unsafe str
produces an unsafe str
. This is bizarre:
- with a
LiteralString
, anything safe, when it comes in contact with something unsafe, becomes unsafe. - with template strings, anything unsafe, when it comes in contact with something safe, becomes safe!
This, I am afraid, is not a well-thought-out design from the safety point of view.
LiteralString
, in our project, is so relevant that it is actually the only string accepted by the execute()
function: we define our Query
type as:
Query: TypeAlias = Union[LiteralString, bytes, sql.SQL, sql.Composed]
What we require is for a statement to be either a literal string or to have been produced by composition of the psycopg.sql
family of objects, which are designed to compose the different parts of a SQL statement employing the correct escaping method (and whose use would become largely marginal with a good template string solution). While there isn't widespread support for this feature yet in type checkers, this is our formal requirement for a query, and even if, every now and then, some user is confused by linter errors, it is no problem to explain our design.
I have looked for information about the rationale of the current design and I have been kindly provided some references. There were some discussions mixing explicit and implicit concatenation, with a final resolution stating:
Added full support for both explicit and implicit concatenation.
template+template
,template+str
, andstr+template
are all supported. Concatenation always results in aTemplate
. In the end, we decided the arguments in favor of allowing concatenation outweighed the potential disadvantages. We’ve updated the “rejected ideas” section of the PEP to describe this.
There are indeed explanations in the PEP stating:
In the end, we decided that the surprise to developers of a new string type not supporting concatenation was likely to be greater than the theoretical harm caused by supporting it. (Developers concatenate f-strings all the time, after all, and while we are sure there are cases where this introduces bugs, it’s not clear that those bugs outweigh the benefits of supporting concatenation.)
This statement is misguided. People can concatenate f-strings and normal strings without a problem because 1) they are the same type and 2) there is no safety semantics behind str
. The type of bug that can be caused by disallowing str + Template
is an immediate TypeError
; the type of bug that can be caused by allowing it is a safety bug.
Using the current design, accepting a template string in a query cannot be considered safe. We are back to the point of people being able to compose queries such as:
name = input()
cur.execute(t"INSERT INTO names VALUES ('" + name + "')")
and no runtime or static checker should have any problem with it.
This is less safe than LiteralString
or sql.SQL
objects, which require an active action from the user to allow a str
to be part of a statement, signifying that the author has taken their measures to prevent problems:
snip: str
cur.execute(sql.SQL("SELECT * FROM table WHERE ") + sql.SQL(snip)) # I know what I am doing
cur.execute("SELECT * FROM table WHERE " + cast(LiteralString, snip)) # I know what I am doing
cur.execute(t"SELECT * FROM table WHERE " + Template(snip)) # I know what I am doing
cur.execute(t"SELECT * FROM table WHERE " + snip) # This might be an error
This goes in the opposite direction of where we want to go, in terms of safety. Therefore, we cannot, in our conscience, allow the use of template strings as query input, and, despite the initial enthusiasm, we will prefer to not merge the feature.
I understand that the template string branch was merged to the Python 3.14 branch only yesterday; version 3.14a7 didn't include the feature, and 3.14b1 is due to be released in a few days, after which no change would be accepted. I believe we are still in time to fix this design.
Thank you very much.
-- Daniele