Linux file system with square brackets in the directory/filename are failing to be indexed #3503

streetpaws · 2025-05-17T02:00:29Z

https://github.com/Shopify/ruby-lsp/blame/6acf78b6f5e94c8fd13be5ad7a4c82d083233853/lib/ruby_indexer/lib/ruby_indexer/uri.rb#L21

The unsafe_regexp includes square brackets, and this causes ruby-lsp indexing to fail when it encounters a file path such as "/home/streetp/index[0]/example.rb".

When I modify the ruby_indexer code and remove the square brackets from the pattern, so that it becomes unsafe_regex = %r{[^\-_.!~*'()a-zA-Z\d;/?@&=+$,]}, indexing works.

I haven't tested this on Windows or Mac, and I'm not aware of whether the change may create other issues. I do know that I have these work filepaths on my system, and these generated filepaths that break ruby-lsp indexing are not something I can control or change.
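To illustrate the difference, here is a minimal sketch. It assumes the pattern is a negated character class whose listed characters are left untouched while everything else is percent-encoded. The `escape_path` helper is hypothetical and only mimics that behaviour; it is not the ruby-lsp code. The `WITH_BRACKETS` pattern is my assumption about the shape of the current pattern, reconstructed by adding the brackets back to the pattern above.

```ruby
# Hypothetical helper: percent-encode every character matched by `unsafe`.
def escape_path(path, unsafe)
  path.gsub(unsafe) do |match|
    match.each_byte.map { |byte| format("%%%02X", byte) }.join
  end
end

# Assumed shape of the current pattern: brackets are listed as safe characters.
WITH_BRACKETS = %r{[^\-_.!~*'()a-zA-Z\d;/?@&=+$,\[\]]}
# Pattern from this report: brackets removed from the safe list.
WITHOUT_BRACKETS = %r{[^\-_.!~*'()a-zA-Z\d;/?@&=+$,]}

path = "/home/streetp/index[0]/example.rb"

puts escape_path(path, WITH_BRACKETS)    # brackets are left as literal characters
puts escape_path(path, WITHOUT_BRACKETS) # brackets become %5B and %5D
```

With the brackets removed from the safe list, the path is emitted with `%5B0%5D` in place of `[0]`.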

For context, I am using VSCode on a Mac with the Microsoft Remote Extension to work on a Linux host. The Ruby LSP extension is installed remotely on the Linux host, so it is trying to index on that host and failing. This extension pulls in the ruby-lsp gem. The ruby-lsp version is:

$ ruby-lsp --version
0.23.20

vinistock · 2025-05-20T21:11:46Z

Thanks for the report. To fix this, we also need to ensure that the resulting URI is escaped exactly the same way the editor would escape it; otherwise, we would end up with duplicate entries.

During initial indexing, the URIs used to store entries come from our own logic and escaping. After you open a document in the editor and modify it, we receive the URI from the editor instead. Any mismatch between the two results in the same declarations being inserted under two different URIs.
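A rough sketch of how such a mismatch could produce duplicates. The hash below is a hypothetical stand-in for the index, not the actual ruby-lsp data structure, and both URI strings are assumed examples of two different escapings of the same path.

```ruby
# Hypothetical stand-in for the index: URI string => declaration names.
index = Hash.new { |hash, key| hash[key] = [] }

# URI built during initial indexing (assumed: brackets left unescaped).
indexer_uri = "file:///home/streetp/index[0]/example.rb"
# URI sent by the editor for the same file (assumed: brackets percent-encoded).
editor_uri = "file:///home/streetp/index%5B0%5D/example.rb"

# The same declaration is stored once per URI string.
index[indexer_uri] << "Example"
index[editor_uri] << "Example"

puts index.keys.length # the one file on disk now has two keys
```

Because the two strings differ, a lookup keyed on the URI treats them as separate documents, so both sides need to agree on one escaped form.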

pstreet · 2025-05-22T15:53:40Z

This might also be helpful to consider for folks on Linux: https://www.cyberciti.biz/faq/linuxunix-rules-for-naming-file-and-directory-names/

Essentially, a Linux file or directory name can contain any character other than / (which is the separator) and the NUL byte, and is limited to no more than 255 bytes. The linked article recommends avoiding some characters because of how the shell interprets them, but these are not restricted and can be escaped so that the shell does not parse them.

Also noted this online comment from Hacker News https://news.ycombinator.com/item?id=19245120

File and folder names can't be longer than 255 UTF-8 code units in Linux, which means they can contain 255 US-ASCII characters, but only 127 Cyrillic characters, 85 Chinese characters, or 63 emoji. In Windows it's different, because file names can contain up to 255 UTF-16 code units. This is 255 characters in almost every language (but only 127 emoji.) So, if you create a file name with 100 Chinese characters in Windows, you can't transfer it to Linux (or upload it to a Linux web server, for example.)

Windows, on the other hand, has problems when the full path to the file (eg. C:\folder\file) is more than 259 UTF-16 code units, but it's getting better at this, and newer Windows apps normally handle longer paths just fine.
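The arithmetic in that comment can be checked with a short sketch. The `name_lengths` helper is hypothetical; it only counts UTF-8 bytes and UTF-16 code units for a string and does not touch the file system.

```ruby
# Hypothetical helper: returns [UTF-8 bytes, UTF-16 code units] for a name.
def name_lengths(name)
  utf8_bytes = name.bytesize
  # Each UTF-16 code unit is two bytes, so halve the encoded byte size.
  utf16_units = name.encode("UTF-16LE").bytesize / 2
  [utf8_bytes, utf16_units]
end

p name_lengths("a" * 255)          # ASCII: 1 byte, 1 code unit each
p name_lengths("\u044F" * 127)     # Cyrillic: 2 bytes, 1 code unit each
p name_lengths("\u4E2D" * 85)      # Chinese: 3 bytes, 1 code unit each
p name_lengths("\u{1F600}" * 63)   # Emoji: 4 bytes, 2 code units each
p name_lengths("\u4E2D" * 100)     # 100 code units, but more than 255 bytes
```

The last line is the case from the quote: a name of 100 Chinese characters fits within 255 UTF-16 code units but needs 300 bytes in UTF-8.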

vinistock added bug Something isn't working help-wanted Extra attention is needed labels May 20, 2025
