SQLite + upper() or lower() does not work as expected #151

socketpair · 2015-10-19T18:22:44Z

Since SQLite does not support Unicode operations, upper() and lower() SQL functions does not work for national symbols.

i.e. select(upper(r.field) for r in table) will FAIL if SQLite driver is used and field contains non-ASCII symbols (only ASCII symbols will be converted to upper case)

https://www.sqlite.org/lang_corefunc.html#upper :
upper(X) The upper(X) function returns a copy of input string X in which all lower-case ASCII characters are converted to their upper-case equivalent.

http://www.sqlite.org/faq.html#q18 case (18) about collation and sorting (!)

So, my decision is:

Disallow Unicode operations in queries (on SQLite driver) by default. i.e. raise NotImplementedError exception.
Add SLOW work-around as described in point 18 of faq, and enable it only by special PONY flag. There are two implementation options: use native python unicode functions, or libicu functions directly (https://pypi.python.org/pypi/PyICU ?)

Next. Suppose we want to select somefun(unicode_string) order by 1. So, SQLite engine should call collation function many times to implement sorting. Each call mean full UTF-8 decode (since SQLite store strings in UTF-8 format). Yes, we can (should?) implement partial decoding (this is slightly complex).

Also, we should discover if SQL index by string field will be (1) correct and (2) effective if custom collation will be used.

The text was updated successfully, but these errors were encountered:

kozlovsky · 2016-01-15T11:34:40Z

Now Pony registers two additional unicode-aware functions in SQLite: py_upper(s) and py_lower(s), and uses these functions instead of the standard upper and lower funcitons:

>>> select(p.id for p in Person if p.name.upper() == 'John')[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
WHERE py_upper("p"."name") = 'John'

For other databases Pony still uses standard upper and lower functions.

If you want to sort query by a column in a case-insensitive way, you can manually call .upper():

>>> select(p.id for p in Person).order_by(lambda: p.name.upper())[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
ORDER BY py_upper("p"."name")

socketpair · 2016-01-15T11:46:24Z

Есть для SQLite ещё способ - Cython + реализация на Си, а также регистрация функции из Си.

Вообще, думаю, стоит рассмотреть включение ускорялок на Си для поней. Опциональных. как в aiohttp или tornado.

… also convert values to unicode if necessary

kozlovsky · 2016-01-15T13:42:21Z

I definitely think that we need to add C speedup module, but probably not right now, because it may slow down the development - we have many important tasks regarding migration support, integration with JavaScript frameworks, and implementing GraphQL backend. I think that some time later we can hire a dedicated developer who'll write such a C speedup module.

socketpair · 2016-01-15T14:42:14Z

Добавляйте issue - может я и законтрибучу что-нибудь

kozlovsky closed this as completed in 3025f6e Jan 15, 2016

kozlovsky added a commit that referenced this issue Jan 15, 2016

Fixes #151: py_upper() & py_lower() should process None correctly and…

6b9409d

… also convert values to unicode if necessary

kozlovsky added the bug label Feb 4, 2016

kozlovsky added this to the 0.6.3 milestone Feb 4, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SQLite + upper() or lower() does not work as expected #151

SQLite + upper() or lower() does not work as expected #151

socketpair commented Oct 19, 2015

kozlovsky commented Jan 15, 2016

Uh oh!

socketpair commented Jan 15, 2016

Uh oh!

kozlovsky commented Jan 15, 2016

Uh oh!

socketpair commented Jan 15, 2016

Uh oh!

SQLite + upper() or lower() does not work as expected #151

SQLite + upper() or lower() does not work as expected #151

Comments

socketpair commented Oct 19, 2015

kozlovsky commented Jan 15, 2016

Uh oh!

socketpair commented Jan 15, 2016

Uh oh!

kozlovsky commented Jan 15, 2016

Uh oh!

socketpair commented Jan 15, 2016

Uh oh!