Skip to content

Escape LIKE metacharacters #3013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Feb 17, 2018
Merged

Conversation

greg0ire
Copy link
Member

@greg0ire greg0ire commented Feb 5, 2018

There seems to be a lack of tools to fight against wildcard attacks, or to simply provide a way to let your users search for strings that include metacharacters

todo

  • add a functional test to make sure valid SQL expressions are built by means of the new API.
  • allow specifying an escape character in the expression builder like and notLike methods.
  • implement getWildcardCharacters for RDBMSs that have more than the default wildcard characters.

@greg0ire greg0ire changed the title Escape LIKE metacharacters WIP: Escape LIKE metacharacters Feb 5, 2018
$replacePairs[$metaCharacter] = $escapeChar.$metaChar;
}

return strtr($untrusted, $replacePairs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$untrustedString ?

}

abstract protected function getLikeEscapeChar();
abstract protected function getLikeMetaCharacters();
Copy link
Contributor

@Taluu Taluu Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: string and : iterable ?

@greg0ire greg0ire force-pushed the escape_like_literals branch from dd1b35a to 8ed7349 Compare February 5, 2018 21:00
@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

Please tell me if you're fine with the feature, the naming, and I should add implementations, docs and tests.

@morozov
Copy link
Member

morozov commented Feb 5, 2018

@greg0ire I have a feeling that in most supported DB platforms, escaping wildcard characters works almost the same.

Instead of introducing new abstract methods which is a breaking change, can you introduce default implementations for them?

*/
public function escapeStringForLike(string $untrustedString): string
{
$replacePairs = [];
Copy link
Member

@morozov morozov Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The escape character itself should also be always escaped. Does this approach account for getLikeMetaCharacters() always returning getLikeEscapeChar() or escapeStringForLike() should handle that itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best is to let escapeStringForLike handle it, and I could rename "meta" to "wildcard", what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on both.

* Escapes metacharacters in a string intended to be used with a LIKE
* operator.
*/
public function escapeStringForLike(string $untrustedString): string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about trust. The input is an arbitrary string with no meaning while the output is a LIKE expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could indeed come from a trusted source and still need to be escaped because you genuinely want to search for that 👍

@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

"Funny" thing: In MS SQL, the escape character has to be picked by the user and has no default: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql
I think this means any char would do.

@morozov
Copy link
Member

morozov commented Feb 5, 2018

Funny thing: In MS SQL, the escape character has to be picked by the user and has no default: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql

In my personal experience, it defaults to ! same as in other databases.

-- Syntax for SQL Server and Azure SQL Database

match_expression [ NOT ] LIKE pattern [ ESCAPE escape_character ]

Why is then the ESCAPE part optional?

@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

Why is then the ESCAPE part optional?

Probably because you don't always need to escape characters

@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

For PostgreSQL and MySQL, you can specify an escape char too, but it defaults to backslash

@morozov
Copy link
Member

morozov commented Feb 5, 2018

Also IIRC, in IBM DB2 there's also a different default escape character. Probably, we need to gather escaping examples for all DBs and then see if the ESCAPE part can be avoided by using the default escape character for the current platform. Otherwise, it may turn out that just escaping the string may be insufficient and the API will need to look somehow differently.

@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

MySQL sounds like a lot of fun:

 Because MySQL uses C escape syntax in strings (for example, \n to represent a newline character), you must double any \ that you use in LIKE strings. For example, to search for \n, specify it as \\n. To search for \, specify it as \\\\; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.

Exception: At the end of the pattern string, backslash can be specified as \\. At the end of the string, backslash stands for itself because there is nothing following to escape. Suppose that a table contains the following values: 

@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

Also IIRC, in IBM DB2 there's also a different default escape character. Probably, we need to gather escaping examples for all DBs and then see if the ESCAPE part can be avoided by using the default escape character for the current platform. Otherwise, it may turn out that just escaping the string may be insufficient and the API will need to look somehow differently.

Well the only one which seems problematic to me is MS SQL, because it seems you have to append ESCAPE somechar and escape with that char to get it to work. A solution could be to pick some random char and always use ESCAPE no matter what but that would imply all systems have that escape char specification capability.

@greg0ire greg0ire force-pushed the escape_like_literals branch from 8ed7349 to ad8435c Compare February 5, 2018 22:12
@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

Ok so let's start the gathering

Platform Escape sequence
PostgreSQL \
MySQL \
SQLite \
MS SQL none
DB2 none
Oracle none
SQL Anywhere none

@greg0ire greg0ire force-pushed the escape_like_literals branch from ad8435c to e3f6433 Compare February 5, 2018 23:12
@greg0ire
Copy link
Member Author

greg0ire commented Feb 5, 2018

the API will need to look somehow differently.

That's right, I think an optional $escapeChar argument could be added to escapeStringForLike, and let the user be responsible for specifying it again when provided whatever they want to the like() method of the expression builder.

@morozov
Copy link
Member

morozov commented Feb 6, 2018

That's right, I think an optional $escapeChar argument could be added to escapeStringForLike

It should be either mandatory or not configurable at all and accessible in a read-only mode (e.g. const ESCAPE_CHAR = '!').

If mandatory, the caller will always manually pass the character of choice to the escaping function and add it to ESCAPE. If read-only, the caller will always take its value and also add it to ESCAPE.

Otherwise, the caller doesn't know if it's needed for a given string and platform and what to pass to ESCAPE and if it's needed at all.

@greg0ire
Copy link
Member Author

greg0ire commented Feb 6, 2018

I think the mandatory parameter might be best: it offers the most control, and it acts as a better reminder since people not knowing what to specify will read the phpdoc, where I can document that they need to also add ESCAPE

@greg0ire greg0ire force-pushed the escape_like_literals branch from 539896f to c9cf880 Compare February 6, 2018 19:20
final public function escapeStringForLike(string $inputString, string $escapeChar) : string
{
/* Some RDBMS' can deal with multibyte characters, but let us assume
* that kind of will not actually be needed */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell me if I should allow them instead, or even remove that check and let the RDBMS deal with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the DB should decide what's valid and what's not. Even if it's known that it should be one character for all supported platforms, there may exist other platform-specific limitations which we'll not be able to cover. Let's not even try.

@greg0ire greg0ire force-pushed the escape_like_literals branch 3 times, most recently from d6a3807 to 47b0a67 Compare February 6, 2018 19:52
@morozov
Copy link
Member

morozov commented Feb 6, 2018

I'd also add a functional test to make sure valid SQL expressions are built by means of the new API.

public function testItEscapesStringsForLikeProperly() : void
{
self::assertSame(
'25\% off \\\\o/',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the underscore here as well?

@@ -1468,4 +1468,12 @@ public function getGeneratesFloatDeclarationSQL()
array(array('precision' => 8, 'scale' => 2), 'DOUBLE PRECISION'),
);
}

public function testItEscapesStringsForLikeProperly() : void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "properly" part seems redundant. It's implied that the test will only pass if the API is implemented properly.

* operator.
*
* @param string $inputString a literal, unquoted string
* @param string $escapeChar should be exactly one character long and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not enforced by the API, I'd omit the "should be exactly one character long" part. It's still a DB abstraction layer where all DB limitations are implied but don't necessarily have to be called out. The rest is valid.

@greg0ire greg0ire force-pushed the escape_like_literals branch from 47b0a67 to 85815f6 Compare February 6, 2018 21:27
@morozov
Copy link
Member

morozov commented Feb 6, 2018

Not sure if it's outside of the scope but maybe the QueryBuilderExpressionBuilder should be able to build the [NOT] LIKE ? ESCAPE ? expression to encapsulate the SQL syntax. In that case, the client-side code might look like:

$pattern = $platform->escapeStringForLike($input, '!') . '%';
$builder->like($pattern, '!');

The builder, in turn, may or may not append the ESCAPE part the expression depending on whether the escape character or wildcard characters are found in the pattern (for sake of compactness).

Looks a bit complicated but I can't see a simpler approach.

@greg0ire greg0ire force-pushed the escape_like_literals branch from f8bba7f to a9e5e5a Compare February 15, 2018 23:19
@greg0ire
Copy link
Member Author

@Majkl578 @morozov I switched to preg + implode, please review again, and agree on something :P

@Majkl578
Copy link
Contributor

Majkl578 commented Feb 17, 2018

I don't really care which way it is, but array doesn't give any guarantees about chars or even string[] rather than arbitrary mixed[]. I can technically do:

protected function getLikeWildcardCharacters() : array
{
    return [$this];
}

And there will be lots of dragons. 🐲

That's why I proposed using simple string instead of array.

@morozov
Copy link
Member

morozov commented Feb 17, 2018

Agree, let’s revert to the string. It makes more sense with the regexp approach.

@greg0ire
Copy link
Member Author

Agreement! Great, let's do this!

public function testNotLikeWithEscape()
{
self::assertEquals(
"p.description NOT LIKE '20💩%' ESCAPE '💩'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classy 💩

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IKR?

Copy link
Member

@Ocramius Ocramius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@Ocramius
Copy link
Member

Agree, let’s revert to the string. It makes more sense with the regexp approach.

Not needed: string[] is a sufficient documentation for subclasses/implementors. In addition to that, there is no guarantee that the wildcard is a single char (although I do really hope that's the case everywhere :-P)

@greg0ire greg0ire force-pushed the escape_like_literals branch from 493310f to ddda207 Compare February 17, 2018 21:13
@Ocramius
Copy link
Member

Oh well, I see it's done anyway. Waiting for CI, then 🚢

@greg0ire greg0ire force-pushed the escape_like_literals branch from ddda207 to fddd873 Compare February 17, 2018 21:23
A string is made of characters, an array might contain anything.
@greg0ire greg0ire force-pushed the escape_like_literals branch from fddd873 to f397fe0 Compare February 17, 2018 21:39
@greg0ire
Copy link
Member Author

Ready to 🚢

@Ocramius Ocramius merged commit ea7657f into doctrine:master Feb 17, 2018
@Ocramius
Copy link
Member

👍

@greg0ire greg0ire deleted the escape_like_literals branch February 17, 2018 23:43
morozov added a commit to morozov/dbal that referenced this pull request Apr 14, 2018
…mySelectSQL()

For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
morozov added a commit to morozov/dbal that referenced this pull request Apr 29, 2018
…mySelectSQL()

For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
morozov added a commit to Majkl578/doctrine-dbal that referenced this pull request May 6, 2018
…mySelectSQL()

For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants