Skip to content

SQLite + upper() or lower() does not work as expected #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
socketpair opened this issue Oct 19, 2015 · 4 comments
Closed

SQLite + upper() or lower() does not work as expected #151

socketpair opened this issue Oct 19, 2015 · 4 comments
Labels
Milestone

Comments

@socketpair
Copy link

Since SQLite does not support Unicode operations, upper() and lower() SQL functions does not work for national symbols.

i.e. select(upper(r.field) for r in table) will FAIL if SQLite driver is used and field contains non-ASCII symbols (only ASCII symbols will be converted to upper case)

https://www.sqlite.org/lang_corefunc.html#upper :
upper(X) The upper(X) function returns a copy of input string X in which all lower-case ASCII characters are converted to their upper-case equivalent.

http://www.sqlite.org/faq.html#q18 case (18) about collation and sorting (!)

So, my decision is:

  1. Disallow Unicode operations in queries (on SQLite driver) by default. i.e. raise NotImplementedError exception.
  2. Add SLOW work-around as described in point 18 of faq, and enable it only by special PONY flag. There are two implementation options: use native python unicode functions, or libicu functions directly (https://pypi.python.org/pypi/PyICU ?)

Next. Suppose we want to select somefun(unicode_string) order by 1. So, SQLite engine should call collation function many times to implement sorting. Each call mean full UTF-8 decode (since SQLite store strings in UTF-8 format). Yes, we can (should?) implement partial decoding (this is slightly complex).

Also, we should discover if SQL index by string field will be (1) correct and (2) effective if custom collation will be used.

@kozlovsky
Copy link
Member

Now Pony registers two additional unicode-aware functions in SQLite: py_upper(s) and py_lower(s), and uses these functions instead of the standard upper and lower funcitons:

>>> select(p.id for p in Person if p.name.upper() == 'John')[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
WHERE py_upper("p"."name") = 'John'

For other databases Pony still uses standard upper and lower functions.

If you want to sort query by a column in a case-insensitive way, you can manually call .upper():

>>> select(p.id for p in Person).order_by(lambda: p.name.upper())[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
ORDER BY py_upper("p"."name")

@socketpair
Copy link
Author

Есть для SQLite ещё способ - Cython + реализация на Си, а также регистрация функции из Си.

Вообще, думаю, стоит рассмотреть включение ускорялок на Си для поней. Опциональных. как в aiohttp или tornado.

kozlovsky added a commit that referenced this issue Jan 15, 2016
… also convert values to unicode if necessary
@kozlovsky
Copy link
Member

I definitely think that we need to add C speedup module, but probably not right now, because it may slow down the development - we have many important tasks regarding migration support, integration with JavaScript frameworks, and implementing GraphQL backend. I think that some time later we can hire a dedicated developer who'll write such a C speedup module.

@socketpair
Copy link
Author

Добавляйте issue - может я и законтрибучу что-нибудь

@kozlovsky kozlovsky added the bug label Feb 4, 2016
@kozlovsky kozlovsky added this to the 0.6.3 milestone Feb 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants