Linux file system with square brackets in the directory/filename are failing to be indexed #3503

Open
streetpaws opened this issue May 17, 2025 · 2 comments
Labels: bug (Something isn't working), help-wanted (Extra attention is needed)

Comments

@streetpaws

streetpaws commented May 17, 2025

https://github.com/Shopify/ruby-lsp/blame/6acf78b6f5e94c8fd13be5ad7a4c82d083233853/lib/ruby_indexer/lib/ruby_indexer/uri.rb#L21

The unsafe_regexp includes square brackets in its set of safe characters, so brackets are left unescaped in the generated URI. I'm finding that this causes ruby-lsp indexing to fail when it encounters a file path like "/home/streetp/index[0]/example.rb".

When I modify the ruby_indexer code to remove the square brackets from unsafe_regexp, changing it to:

unsafe_regex = %r{[^\-_.!~*'()a-zA-Z\d;/?@&=+$,]}

indexing works. I haven't tested this on Windows or Mac, and I don't know whether this change might cause other issues, but I know these file paths exist on my work system (they are generated, so I can't control or rename them).
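A minimal Ruby sketch of the mechanism, using the standard library's URI::RFC2396_Parser#escape, which accepts an optional "unsafe" pattern. The original pattern is approximated here as the reporter's pattern with the brackets added back (an assumption; the exact pattern in uri.rb may differ), and file_uri and parses? are hypothetical helpers, not ruby-lsp's API. With brackets treated as safe they pass through unescaped and Ruby's RFC 3986 URI.parse rejects the result; with them removed from the safe set they become %5B and %5D and the URI parses.

```ruby
require "uri"

# Example path from the report.
PATH = "/home/streetp/index[0]/example.rb"

# Assumption: the original pattern, approximated as the reporter's pattern
# with \[ and \] added back, so brackets count as safe and stay unescaped.
ORIGINAL_UNSAFE = %r{[^\-_.!~*'()a-zA-Z\d;/?@&=+$,\[\]]}

# The reporter's modified pattern: brackets are no longer in the safe set,
# so each one is percent-encoded.
PATCHED_UNSAFE = %r{[^\-_.!~*'()a-zA-Z\d;/?@&=+$,]}

ESCAPER = URI::RFC2396_Parser.new

# Hypothetical helper: build a file URI string from an absolute path.
def file_uri(path, unsafe)
  "file://#{ESCAPER.escape(path, unsafe)}"
end

# Hypothetical helper: true if Ruby's RFC 3986 parser accepts the string.
def parses?(uri_string)
  URI.parse(uri_string)
  true
rescue URI::InvalidURIError
  false
end

[ORIGINAL_UNSAFE, PATCHED_UNSAFE].each do |unsafe|
  uri = file_uri(PATH, unsafe)
  puts "#{uri} parses=#{parses?(uri)}"
end
```

The first line printed shows the unescaped brackets with parses=false; the second shows index%5B0%5D with parses=true.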

For context, I am using VSCode on a Mac with the Microsoft Remote Extension to work on a Linux host, and the Ruby LSP extension is installed remotely on that host (so indexing runs on the host and fails there). The extension pulls in the ruby-lsp gem. The ruby-lsp version is:

$ ruby-lsp --version
0.23.20
vinistock added the bug and help-wanted labels on May 20, 2025
@vinistock
Member

Thanks for the report. To fix this, we also need to ensure that the resulting URI is escaped exactly the same way the editor would escape it; otherwise, we would end up with duplicate entries.

During initial indexing, the URIs used to store entries come from our own logic and escaping. After you open a document in the editor and modify it, we receive the URI from the editor. Any mismatch between the two will result in the same declarations being inserted under two different URIs.
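To illustrate the duplicate-entry concern, here is a hypothetical Ruby sketch (not ruby-lsp's actual index) in which the same file arrives under two URI spellings that RFC 3986 treats as equivalent, differing only in the case of the percent-encoded hex digits. The lowercase spelling, the declaration name "Example", and the helpers canonical_key and build_index are all assumptions for illustration; nothing here claims which spelling any particular editor sends.

```ruby
require "uri"

UNESCAPER = URI::RFC2396_Parser.new

# Hypothetical URIs for the same file. RFC 3986 treats %5B and %5b as
# equivalent, but as raw strings they differ.
INDEXER_URI = "file:///home/streetp/index%5B0%5D/example.rb"
EDITOR_URI = "file:///home/streetp/index%5b0%5d/example.rb"

# Hypothetical canonical key: the file path with all escapes decoded.
def canonical_key(uri_string)
  UNESCAPER.unescape(URI.parse(uri_string).path)
end

# Toy index: maps a key to the declarations found in that file.
def build_index(uris, &key_for)
  index = Hash.new { |hash, key| hash[key] = [] }
  uris.each { |uri| index[key_for.call(uri)] << "Example" }
  index
end

NAIVE = build_index([INDEXER_URI, EDITOR_URI]) { |uri| uri }
CANONICAL = build_index([INDEXER_URI, EDITOR_URI]) { |uri| canonical_key(uri) }

puts NAIVE.size      # prints 2: "Example" is stored under two keys
puts CANONICAL.size  # prints 1
```

Normalizing keys is only one option, and a full decode can conflate URIs that should stay distinct (for example, an escaped %2F inside a file name). Escaping exactly as the editor does avoids the mismatch at the source.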

@pstreet

pstreet commented May 22, 2025

This might also be helpful to consider for folks on Linux: https://www.cyberciti.biz/faq/linuxunix-rules-for-naming-file-and-directory-names/

Essentially, a Linux file or directory name can contain any character except / (the path separator) and the null byte, and is limited to 255 bytes. The linked article recommends avoiding some characters because of how the shell interprets them, but these characters are not forbidden and can be escaped so the shell does not parse them.
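The naming rule above can be sketched as a small Ruby check. valid_linux_name? is a hypothetical helper, and the 255-byte limit is an assumption matching NAME_MAX on ext4 and other common Linux file systems (some file systems differ). Note that a name like index[0] passes, which is why the indexer has to cope with it.

```ruby
# Assumption: NAME_MAX is 255 bytes, as on ext4 and most common Linux
# file systems; other file systems may use a different limit.
NAME_MAX_BYTES = 255

# Hypothetical helper: a single path component may contain any byte except
# "/" and NUL, up to NAME_MAX_BYTES bytes; "." and ".." are reserved.
def valid_linux_name?(name)
  return false if name.empty? || name == "." || name == ".."
  return false if name.include?("/") || name.include?("\0")
  name.bytesize <= NAME_MAX_BYTES
end

puts valid_linux_name?("index[0]")  # prints true
puts valid_linux_name?("a" * 256)   # prints false
```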

I also noticed this comment on Hacker News: https://news.ycombinator.com/item?id=19245120

File and folder names can't be longer than 255 UTF-8 code units in Linux, which means they can contain 255 US-ASCII characters, but only 127 Cyrillic characters, 85 Chinese characters, or 63 emoji. In Windows it's different, because file names can contain up to 255 UTF-16 code units. This is 255 characters in almost every language (but only 127 emoji.) So, if you create a file name with 100 Chinese characters in Windows, you can't transfer it to Linux (or upload it to a Linux web server, for example.)

Windows, on the other hand, has problems when the full path to the file (e.g. C:\folder\file) is more than 259 UTF-16 code units, but it's getting better at this, and newer Windows apps normally handle longer paths just fine.
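The arithmetic in the quoted comment can be checked with a short Ruby sketch. It takes both limits as 255 units, as the quote states (UTF-8 bytes on Linux, UTF-16 code units on Windows), and divides by each character's encoded size; the sample characters and helper names are arbitrary choices for illustration.

```ruby
# Limit taken from the quoted comment: 255 units on both systems.
NAME_LIMIT = 255

# UTF-8 code units are bytes.
def utf8_units(str)
  str.bytesize
end

# UTF-16 code units are 2 bytes each; emoji need a surrogate pair.
def utf16_units(str)
  str.encode(Encoding::UTF_16LE).bytesize / 2
end

SAMPLES = {
  "US-ASCII" => "a",
  "Cyrillic" => "\u0416",  # Ж
  "Chinese" => "\u4E2D",   # 中
  "emoji" => "\u{1F600}"   # 😀
}.freeze

SAMPLES.each do |label, char|
  linux_max = NAME_LIMIT / utf8_units(char)
  windows_max = NAME_LIMIT / utf16_units(char)
  puts "#{label}: #{linux_max} on Linux, #{windows_max} on Windows"
end

# A 100-character Chinese name fits the Windows limit but not the Linux one.
long_name = "\u4E2D" * 100
puts utf16_units(long_name) <= NAME_LIMIT  # prints true
puts utf8_units(long_name) <= NAME_LIMIT   # prints false
```

The loop prints 255/255, 127/255, 85/255, and 63/127 for the four samples, matching the figures in the quote.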
