Skip to content

Prevent Assets Folder Name Collision #4159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 27, 2021
Merged

Conversation

JacksonRG
Copy link
Collaborator

@JacksonRG JacksonRG commented May 24, 2021

Issue #4158 pointed out that if project names were longer than 30 characters, they were truncated to 30 characters, and _assets was appended.

Solution

Aside from MacOS, 255 characters the typical limit for file or folder names. So to avoid, neighboring projects running into each other as much as possible, we now truncate only if the addition of '_assets' would push the folder name over 255.

Reverse Backwards Compatability.

As Frank pointed out, moving the folder to the new naming convention would cause trouble for anyone with long project files moving back to the stable build.

To solve this, we search for and copy the truncated folder if if a full length one doesn't exist. I believe this is the most seamless user experience, as renaming the folder, would mean the first project opened would take the folder that had been shared.

Testing

  1. Created two long named projects on develop
  2. Created titles in both with different titles
  3. Switched branches
  4. Opened each project, finding both had their own assets

@JacksonRG JacksonRG requested a review from jonoomph May 24, 2021 22:55
@jonoomph
Copy link
Member

Based on a quick google:

Folder name length limits per OS:

  • Windows 8/10: 255 to 260
  • Mac OS: 1024
  • Linux: 255

So, perhaps the 30 char truncate is not the solution, but I do think we need to truncate it at some reasonable length... such that Project_Name + "_assets" length < 255. But as long as we are truncating the folder name, it seems possible that we can still have a conflict. But perhaps less often. 👍

@jonoomph
Copy link
Member

Oh, and if we change this truncation (if that is a word), will this prevent different versions of OpenShot from finding the _assets folder?

asset_path = os.path.join(os.path.dirname(file_path), asset_folder_name)

# Previous Assets File Name Convention.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also tackle this with the same code:

for asset_folder_name in [asset_filename[:30], asset_filename[:250], ""...]:
  ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks very clean!

Copy link
Contributor

@ferdnyc ferdnyc May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a quick google:

Folder name length limits per OS:

  • Windows 8/10: 255 to 260
  • Mac OS: 1024
  • Linux: 255

One sticking point: That Windows size was for the total length of the entire path, all the way from C:\ to the extension.

That limitation was finally lifted in (an update to!) Windows 10, and I believe that change has been automatically pushed to all systems if they've been running Windows Update. But for a long time, Windows path strings were very restricted when you were using legacy APIs (meaning non-UNC, the //host/share/path/ etc. names.) IIRC that was the reason for the short-ish folder naming. If the user tried to name their project file something over the 260-character limit, the system would stop them, but the worry was that their path might be, say, 258 characters, so adding _assets would be the thing that pushed it over.

That being said, if someone is naming their project file something insane and long, and it's accepted as a legal path, these days our extra 7 characters won't often push it over the limit. (Sorry, no... actually I just remembered that it's only three extra characters: the cost of adding _assets is offset by the fact that we've dropped the .osp extension.)

Except, of course, when _assets is the straw that breaks 260. There is still a slight chance of causing some headaches for any remaining Win7 users, by allowing significantly longer name lengths. (Which isn't to say that I'm opposed to it, just a factor to consider.)

But as long as we are truncating the folder name, it seems possible that we can still have a conflict. But perhaps less often.

That's kind of the thing, I mean... if people are going to really push their luck, eventually they're going to break something.

The biggest danger is people creating long-ass project names, then duplicating the files and letting Windows do its My_long_ass_project_name (1).osp, My_long_ass_project_name (Copy).osp, ... type disambiguation. Because those names only differ at the very far end of the string, any truncation potentially results in a collision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I forgot the .osp extension. I was going to go with
asset_folder_name = asset_filename[:248] + "_assets" 255 - 7.

@JacksonRG
Copy link
Collaborator Author

Based on a quick google:

Folder name length limits per OS:

  • Windows 8/10: 255 to 260
  • Mac OS: 1024
  • Linux: 255

So, perhaps the 30 char truncate is not the solution, but I do think we need to truncate it at some reasonable length... such that Project_Name + "_assets" length < 255. But as long as we are truncating the folder name, it seems possible that we can still have a conflict. But perhaps less often.

Right. I feel 255 - 7 is enough of a name (nearly a paragraph) for all reasonable cases.

@JacksonRG
Copy link
Collaborator Author

Oh, and if we change this truncation (if that is a word), will this prevent different versions of OpenShot from finding the _assets folder?

That was my thinking, in checking for a truncated folder first, and moving it to the new naming system

@ferdnyc
Copy link
Contributor

ferdnyc commented May 25, 2021

Oh, and if we change this truncation (if that is a word), will this prevent different versions of OpenShot from finding the _assets folder?

That was my thinking, in checking for a truncated folder first, and moving it to the new naming system

The new code checking for legacy names is half of it, but that still doesn't solve the problem of backwards migration. Let's say OpenShot 2.5.2 saves project assets in a folder named f"{project_filename[:250]}_assets". That project is no longer fully compatible with OpenShot 2.5.1 or lower, since the assets won't be found.

That's not so much a concern for released code as it is for development builds. Sometimes a user will open their project in a Daily Build version just to test something out. (We basically tell them to, in fact, before reporting bugs.) But it's not uncommon to return to the release version afterwards, for normal use. If opening a project in the latest Daily Build instantly upgraded away its compatibility with the release version, that'd be kind of a bummer.

@JacksonRG
Copy link
Collaborator Author

That's not so much a concern for released code as it is for development builds. Sometimes a user will open their project in a Daily Build version just to test something out. (We basically tell them to, in fact, before reporting bugs.) But it's not uncommon to return to the release version afterwards, for normal use. If opening a project in the latest Daily Build instantly upgraded away its compatibility with the release version, that'd be kind of a bummer.

So should we hold off merging this until the next full release?

@JacksonRG JacksonRG requested a review from ferdnyc May 27, 2021 22:02
@jonoomph
Copy link
Member

LG Otherwise... 👍

@JacksonRG JacksonRG merged commit 11e045d into develop May 27, 2021
@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 10, 2021

@jonoomph, @JacksonRG :

Er, crap. Turns out, the 30-character prefix length is also hardcoded into libopenshot's support for opening .osp files.

@Timeline.cpp:133-139:

	// Determine asset path
	QString asset_name = filePath.baseName().left(30) + "_assets";
	QDir asset_folder(filePath.dir().filePath(asset_name));
	if (!asset_folder.exists()) {
		// Create directory if needed
		asset_folder.mkpath(".");
	}

@ferdnyc ferdnyc deleted the assets-folder-name-collision branch June 10, 2021 13:41
@JacksonRG
Copy link
Collaborator Author

@ferdnyc Yikes! Thanks for pointing that out. We should probably make project wide variable or function for the filePath, but for now, I'll fix the timeline and do a search for any other path code.

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 10, 2021

@JacksonRG That should be it, the .osp support in libopenshot is a relatively recent addition.

I was actually toying with the idea of storing the asset folder name -- NOT the path, just the name of the folder -- in the project file itself.

Pros

  1. We know unambiguously where the assets for this file were stored.
  2. We keep track of them even if the user renames the project file, but not the folder (which seems more likely than not).
  3. We won't have this problem again even if we change the formula for generating the name, because locating it won't be dependent on the formula.
  4. If there ARE name collisions, and we come up with some way of detecting that, we could use an alternate name for the folder.

Cons

  1. We lose the folder if the user DOES rename the project file and the folder. (Though, we could still fall back to checking the standard generated names, to mitigate that. And also, honestly, this will never ever happen.)
  2. We'd have to always reference all asset folder contents with a standard placeholder -- ok, well, we do that already.
  3. We won't have the folder name until we've parsed the project data.

Con 3 is the big one. As I recall, right now we do string replacement on the JSON form of the project data first, to correct relative paths and placeholder paths. Then, we parse the corrected JSON into project data. But if we were going to get the folder name from the project data, we'd have to parse the JSON first, with the relative and placeholder paths intact. Then, we could correct the paths when parsing the other data.

There's no reason that couldn't work. Both (for sure) Python's os.path() and (I believe) Qt's QFileInfo() should accept paths like @assets/thumbnails/blah.png without complaint -- they shouldn't validate the path against disk until asked to. Until then, it's just a relative path with a funny directory name. (Although that funny directory name may also be an invalid directory name on Windows, and we'd have to check whether that causes issues for QFileInfo().)

Handling the paths this way would also be much safer than regexp-editing the JSON string before parsing it (something that's caused us problems in the past).

But, because it would change the form of path strings as they come in via SetJson() in libopenshot, we'd probably have to change how those parts of the data are read.

Like, any JSON that might involve a placeholder path, we'd have to queue processing until after we've read all the keys in the root object, just in case it contains an "assetFolderName". If it does, the relevant SetJsonValue() methods would all have to be taught how to correct placeholder-relative paths, when provided with the path of the asset folder (already checked for validity and made absolute by the Timeline SetJsonValue()).

As far as OpenShot goes, because the JSON is read into the get_app().project._data Python dict en masse, before anything else is done with that data, we could actually just recursively examine its contents and update the path strings before they're parsed.

Right now the path placeholders are subbed in during save right after json.dumps()-ing the data to string, by classes.json. Which makes sense, as we don't want to modify the in-memory data before we dump it.

And so they're swapped out again before json.loads(), in the same class. But there's no reason that process has to be symmetric.

The data dict created by json.loads() could still contain "@assets/" style paths initially, just as long as it's immediately walked to correct them. Unlike in libopenshot, nothing is done with those path strings until after the entire data object has been loaded. And if we defer the path replacements until after we've parsed the JSON, then there's no reason we can't take the folder name from there.

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 11, 2021

@JacksonRG @jonoomph

I may also be overcomplicating this, on the libopenshot front, saying we have to "defer" parsing. Like, we might be able to start off Timeline::SetJsonValue() like this:

QDir assetDir;
QString assetPath;
QString baseName;

if (!root["path"].isNull())
    path = root["path"].asString();

// member var 'path' can also be set in the constructor
if (path) {
    QFileInfo filePath(QString::fromStdString(path));
    baseName = filePath.baseName();
    assetDir = QDir(filePath.dir()).makeAbsolute();
}

if (!root["assetFolderName"].isNull()) {
    QString assetFolder(root["assetFolderName"].asCString());
    if (assetDir.cd(assetFolder))
        assetPath = assetDir.canonicalPath();
}

// If we didn't find the asset folder (or none was specified)
// but we have a project file, try some generated paths
if (baseName && !assetPath) {
    if (   !assetDir.cd(baseName.left(250) + "_assets")
        && !assetDir.cd(baseName.left(30) + "_assets"))
    {
        // As a last resort, create a directory
        assetDir.setPath(baseName.left(250) + "_assets");
        assetDir.mkPath(".");
    }
    assetPath = assetDir.canonicalPath();
}

...And then just call each Clip::SetJsonValue and Effect::SetJsonValue with an optional argument assetPath (default ".") that they in turn pass to the Reader SetJsonValue()s, so they can do:

if (!root["path"].isNull()) {
    QString json_path(root["path"].asCString());
    if (json_path.startsWith("@assets/"))
         json_path = assetPath + json_path.mid(7);
    path = QFileInfo(json_path).canonicalFilePath().toStdString();
}

I guess we'd need a second optional arg for the @transitions/ mapping, too.

It adds a bunch of code, but also replaces a lot of much more expensive code:

	// Determine asset path
	QString asset_name = filePath.baseName().left(30) + "_assets";
	QDir asset_folder(filePath.dir().filePath(asset_name));
	if (!asset_folder.exists()) {
		// Create directory if needed
		asset_folder.mkpath(".");
	}


	// Load UTF-8 project file into QString
	QFile projectFile(QString::fromStdString(path));
	projectFile.open(QFile::ReadOnly);
	QString projectContents = QString::fromUtf8(projectFile.readAll());


	// Convert all relative paths into absolute paths (if requested)
	if (convert_absolute_paths) {


		// Find all "image" or "path" references in JSON (using regex). Must loop through match results
		// due to our path matching needs, which are not possible with the QString::replace() function.
		QRegularExpression allPathsRegex(QStringLiteral("\"(image|path)\":.*?\"(.*?)\""));
		std::vector<QRegularExpressionMatch> matchedPositions;
		QRegularExpressionMatchIterator i = allPathsRegex.globalMatch(projectContents);
		while (i.hasNext()) {
			QRegularExpressionMatch match = i.next();
			if (match.hasMatch()) {
				// Push all match objects into a vector (so we can reverse them later)
				matchedPositions.push_back(match);
			}
		}


		// Reverse the matches (bottom of file to top, so our replacements don't break our match positions)
		std::vector<QRegularExpressionMatch>::reverse_iterator itr;
		for (itr = matchedPositions.rbegin(); itr != matchedPositions.rend(); itr++) {
			QRegularExpressionMatch match = *itr;
			QString relativeKey = match.captured(1); // image or path
			QString relativePath = match.captured(2); // relative file path
			QString absolutePath = "";


			// Find absolute path of all path, image (including special replacements of @assets and @transitions)
			if (relativePath.startsWith("@assets")) {
				absolutePath = QFileInfo(asset_folder.absoluteFilePath(relativePath.replace("@assets", "."))).canonicalFilePath();
			} else if (relativePath.startsWith("@transitions")) {
				absolutePath = QFileInfo(openshotTransPath.absoluteFilePath(relativePath.replace("@transitions", "."))).canonicalFilePath();
			} else {
				absolutePath = QFileInfo(filePath.absoluteDir().absoluteFilePath(relativePath)).canonicalFilePath();
			}


			// Replace path in JSON content, if an absolute path was successfully found
			if (!absolutePath.isEmpty()) {
				projectContents.replace(match.capturedStart(0), match.capturedLength(0), "\"" + relativeKey + "\": \"" + absolutePath + "\"");
			}
		}
		// Clear matches
		matchedPositions.clear();
	}

@JacksonRG
Copy link
Collaborator Author

JacksonRG commented Jun 11, 2021

@ferdnyc Still reading through this, but I just thought: If we store the asset folder name in the project data, would/should we give the option for the user to set it? Then there's an optional path to having a shared asset folder between projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants