-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Unicode path for projects work again #3229
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I really don't like here is the change to the thumbnail requests.Response
handling, I left a code comment explaining.
The rest, well, all those ensure_ascii=False
arguments to json.dumps()
make me concerned about the possible consequences, but maybe it is the right way to go. I'd have to see the effects of the change — and get an idea of the issues that the current code is causing — first. (Which I realize I can get at #3227, I just haven't gotten around to reading it yet.)
src/windows/models/files_model.py
Outdated
thumb_path = r.text | ||
thumb_path = r.content.decode("utf-8", "ignore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I like this change, I think it'll bite us.
requests.Response.text
is documented as "Content of the response, in unicode". Which is exactly what we want.
If there's an issue with the encoding of the Unicode, we have two choices:
- We can have the server end explicitly specify UTF-8 in a
content-type
header (which the client will receive and respect) - Or, on the client end, we can (as the Response documentation suggests) "set
r.encoding
appropriately before accessing this property."
Either option is much safer than accessing the raw bytes of the request and blindly force-decoding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want to modify API again...
I think that the p1. is here, (something like charset=utf-8
can be specified):
openshot-qt/src/classes/thumbnail.py
Line 156 in e247a9b
self.send_header('Content-type', 'text/html') |
As for the p2. - I see no much difference in this two:
response.encoding = 'utf-8'
response.text
and the decode()
code - it surely receives utf-8
:
openshot-qt/src/classes/thumbnail.py
Line 184 in e247a9b
self.wfile.write(bytes(thumb_path, "utf-8")) |
no need to see the header's charset.
And it can't be safer. It cannot use other technique. I just take example from the https://docs.python.org/3/howto/unicode.html and know the result, but I don't know how the response.text
works when ...the input string can’t be converted according to the encoding’s rules... . Edit: Of course, I cannot imaging how it can happen if it already was send in utf-8
... Broken message?
Fixes an issue when application fails to load files from the saved project with non-English characters in the path to the clips. Also removing json.loads strict=False key as obsolete in 3.x Python.
Fixes an issue that list view miniatures of the imported clips not rendered if path contains non-English letters. By defult the ".text" field in response uses ISO-8859-1 charset (RFC 2616 specs) while server sends utf-8 encoded path string just few strings below in the code.
Updated. API fix was implemented (instead of programming solution). |
It seems that fix from the main developer is available now: #3247 |
Any fix that works! 👍 Now the only question is, what about all of the mis-encoded project files our 2.5.0 users are sitting with? Will they be readable with OpenShot after this change? If (assuming) they aren't, is it worth putting together a sort of "project file fix-up" tool to correct their formatting? (...Hopefully it's feasible to do that programmatically?) I'd be willing to slap together a standalone Python command-line tool to correct the contents of mis-encoded files, something we could distribute to users (as well as include in the 2.5.1 release that's presumably imminent) — it's not as convenient as having OpenShot directly support reading the corrupted files, but it's simpler and probably safer to make it a separate, explicit thing. This week has me under uncharacteristically high real-world time pressures, unfortunately, so my availability is limited, but assuming the logic is relatively straightforward and well-understood, the few hours I have free tonight should be enough time to get that together... or at least get it into a good enough state where I can submit a branch and hand it off for completion. (Because if it's not done in the next 5-6 hours, then it'll be another 12-18 before I can return to it.) |
@ferdnyc Based on what I know, this only affects project files with certain unicode characters, and only version 2.5.0. I'm not sure how many people are really affected at this point, with broken projects. It could be dozens, or hundreds, but I don't imagine it's a huge amount. Now, with that being said, I would be happy to create a page on openshot.org, which could take an *.osp upload file, and pops out a "fixed" encoding version. If anyone wants to create that "Python", I can adopt it into the official website, and make a tool for people. 👍 |
Or... if want to integrate some "charset checking" code into openshot-qt, which could detetct a JSON parse failure, and then try and fix the JSON encoding at runtime. That would also be a good solution... especially if a large # of people are affected. |
Oh, no, it's gonna affect anyone with any non-ASCII characters in their pathnames — I just created this file hierarchy:
Both of the MP4 files were imported into the project. Here's what came out: $ egrep '\/u[[:xdigit:]]{4}' /var/tmp/Jöerç/Charsetᛀᛊᛇᛅ.osp
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/0U3BZFTQMS.png"
"path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
"image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/K6QJISME3F.png"
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/0U3BZFTQMS.png"
"path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
"image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/K6QJISME3F.png"
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
"path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4", ...However, as my |
(Hrm. And I just noticed, from the above, that we appear to be storing paths relative to the parent directory of the project file location, for some reason.)
Edit: Correction: Relative to the project file location, but rooted in its parent dir. |
I can only add that some users has usernames that can cause problems too. Because the settings file of OpenShot uses same routines to write/read the The other way the issue may appear is the default names templates for localization (mentioned here: #3252 ) |
Fixes: #3227
export
andupdates
also usesjson.dumps()
, but it is slightly different thing. It needs more time to investigate if changes are required. In my tests it doesn't affect this issue itself.Let's see if this fix is woks for you too.