-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Escape LIKE metacharacters #3013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
$replacePairs[$metaCharacter] = $escapeChar.$metaChar; | ||
} | ||
|
||
return strtr($untrusted, $replacePairs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$untrustedString
?
} | ||
|
||
abstract protected function getLikeEscapeChar(); | ||
abstract protected function getLikeMetaCharacters(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: string
and : iterable
?
dd1b35a
to
8ed7349
Compare
Please tell me if you're fine with the feature, the naming, and I should add implementations, docs and tests. |
@greg0ire I have a feeling that in most supported DB platforms, escaping wildcard characters works almost the same. Instead of introducing new |
*/ | ||
public function escapeStringForLike(string $untrustedString): string | ||
{ | ||
$replacePairs = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The escape character itself should also be always escaped. Does this approach account for getLikeMetaCharacters()
always returning getLikeEscapeChar()
or escapeStringForLike()
should handle that itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best is to let escapeStringForLike
handle it, and I could rename "meta" to "wildcard", what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on both.
* Escapes metacharacters in a string intended to be used with a LIKE | ||
* operator. | ||
*/ | ||
public function escapeStringForLike(string $untrustedString): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about trust. The input is an arbitrary string with no meaning while the output is a LIKE expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could indeed come from a trusted source and still need to be escaped because you genuinely want to search for that 👍
"Funny" thing: In MS SQL, the escape character has to be picked by the user and has no default: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql |
In my personal experience, it defaults to
Why is then the |
Probably because you don't always need to escape characters |
For PostgreSQL and MySQL, you can specify an escape char too, but it defaults to backslash |
Also IIRC, in IBM DB2 there's also a different default escape character. Probably, we need to gather escaping examples for all DBs and then see if the |
MySQL sounds like a lot of fun:
|
Well the only one which seems problematic to me is MS SQL, because it seems you have to append |
8ed7349
to
ad8435c
Compare
Ok so let's start the gathering
|
ad8435c
to
e3f6433
Compare
That's right, I think an optional |
It should be either mandatory or not configurable at all and accessible in a read-only mode (e.g. If mandatory, the caller will always manually pass the character of choice to the escaping function and add it to Otherwise, the caller doesn't know if it's needed for a given string and platform and what to pass to |
I think the mandatory parameter might be best: it offers the most control, and it acts as a better reminder since people not knowing what to specify will read the phpdoc, where I can document that they need to also add |
539896f
to
c9cf880
Compare
final public function escapeStringForLike(string $inputString, string $escapeChar) : string | ||
{ | ||
/* Some RDBMS' can deal with multibyte characters, but let us assume | ||
* that kind of will not actually be needed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tell me if I should allow them instead, or even remove that check and let the RDBMS deal with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the DB should decide what's valid and what's not. Even if it's known that it should be one character for all supported platforms, there may exist other platform-specific limitations which we'll not be able to cover. Let's not even try.
d6a3807
to
47b0a67
Compare
I'd also add a functional test to make sure valid SQL expressions are built by means of the new API. |
public function testItEscapesStringsForLikeProperly() : void | ||
{ | ||
self::assertSame( | ||
'25\% off \\\\o/', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the underscore here as well?
@@ -1468,4 +1468,12 @@ public function getGeneratesFloatDeclarationSQL() | |||
array(array('precision' => 8, 'scale' => 2), 'DOUBLE PRECISION'), | |||
); | |||
} | |||
|
|||
public function testItEscapesStringsForLikeProperly() : void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "properly" part seems redundant. It's implied that the test will only pass if the API is implemented properly.
* operator. | ||
* | ||
* @param string $inputString a literal, unquoted string | ||
* @param string $escapeChar should be exactly one character long and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not enforced by the API, I'd omit the "should be exactly one character long" part. It's still a DB abstraction layer where all DB limitations are implied but don't necessarily have to be called out. The rest is valid.
47b0a67
to
85815f6
Compare
Not sure if it's outside of the scope but maybe the $pattern = $platform->escapeStringForLike($input, '!') . '%';
$builder->like($pattern, '!'); The builder, in turn, may or may not append the Looks a bit complicated but I can't see a simpler approach. |
f8bba7f
to
a9e5e5a
Compare
I don't really care which way it is, but
And there will be lots of dragons. 🐲 That's why I proposed using simple string instead of array. |
Agree, let’s revert to the string. It makes more sense with the regexp approach. |
Agreement! Great, let's do this! |
public function testNotLikeWithEscape() | ||
{ | ||
self::assertEquals( | ||
"p.description NOT LIKE '20💩%' ESCAPE '💩'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classy 💩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IKR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Not needed: |
493310f
to
ddda207
Compare
Oh well, I see it's done anyway. Waiting for CI, then 🚢 |
ddda207
to
fddd873
Compare
A string is made of characters, an array might contain anything.
fddd873
to
f397fe0
Compare
Ready to 🚢 |
👍 |
…mySelectSQL() For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
…mySelectSQL() For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
…mySelectSQL() For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
There seems to be a lack of tools to fight against wildcard attacks, or to simply provide a way to let your users search for strings that include metacharacters
todo
like
andnotLike
methods.getWildcardCharacters
for RDBMSs that have more than the default wildcard characters.