Commit 0051e03

gjabell authored and the-mikedavis committed
Add glob file type support (helix-editor#8006)
* Replace FileType::Suffix with FileType::Glob

Suffix is rather limited and cannot be used to match files which have
semantic meaning based on location + file type (for example, Github
Action workflow files). This patch adds support for a Glob FileType to
replace Suffix, which encompasses the existing behavior & adds
additional file matching functionality.

Globs are standard Unix-style path globs, which are matched against the
absolute path of the file. If the configured glob for a language is a
relative glob (that is, it isn't an absolute path or already starts with
a glob pattern), a glob pattern will be prepended to allow matching
relative paths from any directory.

The order of file type matching is also updated to first match on globs
and then on extension. This is necessary as most cases where
glob-matching is useful will have already been matched by an extension
if glob matching is done last.

* Convert file-types suffixes to globs

* Use globs for filename matching

Trying to match the file-type raw strings against both filename and
extension leads to files with the same name as the extension having the
incorrect syntax.

* Match dockerfiles with suffixes

It's common practice to add a suffix to dockerfiles based on their
context, e.g. `Dockerfile.dev`, `Dockerfile.prod`, etc.

* Make env filetype matching more generic

Match on `.env` or any `.env.*` files.

* Update docs

* Use GlobSet to match all file type globs at once

* Update todo.txt glob patterns

* Consolidate language Configuration and Loader creation

This is a refactor that improves the error handling for creating the
`helix_core::syntax::Loader` from the default and user language
configuration.

* Fix integration tests

* Add additional starlark file-type glob

---------

Co-authored-by: Michael Davis <[email protected]>
1 parent 04d2919 commit 0051e03

12 files changed, +262 -187 lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

book/src/languages.md

Lines changed: 15 additions & 13 deletions
@@ -78,24 +78,26 @@ from the above section. `file-types` is a list of strings or tables, for
 example:

 ```toml
-file-types = ["Makefile", "toml", { suffix = ".git/config" }]
+file-types = ["toml", { glob = "Makefile" }, { glob = ".git/config" }, { glob = ".github/workflows/*.yaml" } ]
 ```

 When determining a language configuration to use, Helix searches the file-types
 with the following priorities:

-1. Exact match: if the filename of a file is an exact match of a string in a
-   `file-types` list, that language wins. In the example above, `"Makefile"`
-   will match against `Makefile` files.
-2. Extension: if there are no exact matches, any `file-types` string that
-   matches the file extension of a given file wins. In the example above, the
-   `"toml"` matches files like `Cargo.toml` or `languages.toml`.
-3. Suffix: if there are still no matches, any values in `suffix` tables
-   are checked against the full path of the given file. In the example above,
-   the `{ suffix = ".git/config" }` would match against any `config` files
-   in `.git` directories. Note: `/` is used as the directory separator but is
-   replaced at runtime with the appropriate path separator for the operating
-   system, so this rule would match against `.git\config` files on Windows.
+1. Glob: values in `glob` tables are checked against the full path of the given
+   file. Globs are standard Unix-style path globs (e.g. the kind you use in Shell)
+   and can be used to match paths for a specific prefix, suffix, directory, etc.
+   In the above example, the `{ glob = "Makefile" }` config would match files
+   with the name `Makefile`, the `{ glob = ".git/config" }` config would match
+   `config` files in `.git` directories, and the `{ glob = ".github/workflows/*.yaml" }`
+   config would match any `yaml` files in `.github/workflow` directories. Note
+   that globs should always use the Unix path separator `/` even on Windows systems;
+   the matcher will automatically take the machine-specific separators into account.
+   If the glob isn't an absolute path or doesn't already start with a glob prefix,
+   `*/` will automatically be added to ensure it matches for any subdirectory.
+2. Extension: if there are no glob matches, any `file-types` string that matches
+   the file extension of a given file wins. In the example above, the `"toml"`
+   config matches files like `Cargo.toml` or `languages.toml`.

 ## Language Server configuration

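The lookup order described in the updated documentation (globs first, extension as fallback) can be sketched with the standard library alone. This is an illustration under stated assumptions, not Helix's implementation: `wildcard_match` is a stand-in that only understands `*`, the `Lookup` type, `example_lookup`, and the language names are hypothetical, the globs are written with the `*/` prefix already applied, and the first matching glob wins here for brevity.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Stand-in wildcard matcher (assumption): `*` matches any run of characters,
/// including `/`. Real glob syntax supports far more than this.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    // Pattern index of the last `*` seen and the text index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            backtrack = Some((star_pi, star_ti + 1));
            pi = star_pi + 1;
            ti = star_ti + 1;
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&b| b == b'*')
}

/// Hypothetical lookup table; the names are illustrative, not Helix's types.
struct Lookup {
    globs: Vec<(&'static str, &'static str)>, // (prefixed glob, language)
    extensions: HashMap<&'static str, &'static str>,
}

impl Lookup {
    fn language_for(&self, path: &Path) -> Option<&'static str> {
        // 1. Globs are tried first, against the full path.
        path.to_str()
            .and_then(|text| {
                self.globs
                    .iter()
                    .find(|(glob, _)| wildcard_match(glob, text))
                    .map(|(_, lang)| *lang)
            })
            // 2. Only if no glob matched, fall back to the file extension.
            .or_else(|| {
                let ext = path.extension()?.to_str()?;
                self.extensions.get(ext).copied()
            })
    }
}

/// Example data loosely modelled on the `file-types` line in the docs.
fn example_lookup() -> Lookup {
    Lookup {
        globs: vec![
            ("*/Makefile", "make"),
            ("*/.git/config", "git-config"),
            ("*/.github/workflows/*.yaml", "workflow"),
        ],
        extensions: HashMap::from([("toml", "toml"), ("yaml", "yaml")]),
    }
}
```

With this data, `/repo/.github/workflows/ci.yaml` resolves through the glob table, while `/repo/docker-compose.yaml` falls through to the `yaml` extension entry. If extensions were checked first, the workflow file would never reach its glob, which is the motivation the commit message gives for the new order.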
helix-core/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ chrono = { version = "0.4", default-features = false, features = ["alloc", "std"

 etcetera = "0.8"
 textwrap = "0.16.0"
+globset = "0.4.14"

 nucleo.workspace = true
 parking_lot = "0.12"

helix-core/src/config.rs

Lines changed: 40 additions & 5 deletions
@@ -1,10 +1,45 @@
-/// Syntax configuration loader based on built-in languages.toml.
-pub fn default_syntax_loader() -> crate::syntax::Configuration {
+use crate::syntax::{Configuration, Loader, LoaderError};
+
+/// Language configuration based on built-in languages.toml.
+pub fn default_lang_config() -> Configuration {
     helix_loader::config::default_lang_config()
         .try_into()
-        .expect("Could not serialize built-in languages.toml")
+        .expect("Could not deserialize built-in languages.toml")
 }
-/// Syntax configuration loader based on user configured languages.toml.
-pub fn user_syntax_loader() -> Result<crate::syntax::Configuration, toml::de::Error> {
+
+/// Language configuration loader based on built-in languages.toml.
+pub fn default_lang_loader() -> Loader {
+    Loader::new(default_lang_config()).expect("Could not compile loader for default config")
+}
+
+#[derive(Debug)]
+pub enum LanguageLoaderError {
+    DeserializeError(toml::de::Error),
+    LoaderError(LoaderError),
+}
+
+impl std::fmt::Display for LanguageLoaderError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Self::DeserializeError(err) => write!(f, "Failed to parse language config: {err}"),
+            Self::LoaderError(err) => write!(f, "Failed to compile language config: {err}"),
+        }
+    }
+}
+
+impl std::error::Error for LanguageLoaderError {}
+
+/// Language configuration based on user configured languages.toml.
+pub fn user_lang_config() -> Result<Configuration, toml::de::Error> {
     helix_loader::config::user_lang_config()?.try_into()
 }
+
+/// Language configuration loader based on user configured languages.toml.
+pub fn user_lang_loader() -> Result<Loader, LanguageLoaderError> {
+    let config: Configuration = helix_loader::config::user_lang_config()
+        .map_err(LanguageLoaderError::DeserializeError)?
+        .try_into()
+        .map_err(LanguageLoaderError::DeserializeError)?;
+
+    Loader::new(config).map_err(LanguageLoaderError::LoaderError)
+}
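The two-variant error in `user_lang_loader` separates a parse failure from a glob-compilation failure. The sketch below reproduces that shape with the standard library only, so it can be run in isolation. `ParseError`, `CompileError`, `parse`, `compile`, and `load` are hypothetical stand-ins for `toml::de::Error`, `globset::Error`, and the real loading steps; the validation rules inside them are invented purely to trigger each branch.

```rust
use std::fmt;

// Stand-ins (assumptions) for `toml::de::Error` and `globset::Error`.
#[derive(Debug)]
struct ParseError(String);
#[derive(Debug)]
struct CompileError(String);

#[derive(Debug)]
enum LanguageLoaderError {
    DeserializeError(ParseError),
    LoaderError(CompileError),
}

impl fmt::Display for LanguageLoaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::DeserializeError(err) => {
                write!(f, "Failed to parse language config: {}", err.0)
            }
            Self::LoaderError(err) => {
                write!(f, "Failed to compile language config: {}", err.0)
            }
        }
    }
}

impl std::error::Error for LanguageLoaderError {}

/// Hypothetical first stage: split a comma-separated list of globs.
fn parse(input: &str) -> Result<Vec<String>, ParseError> {
    if input.trim().is_empty() {
        return Err(ParseError("empty config".into()));
    }
    Ok(input.split(',').map(|s| s.trim().to_owned()).collect())
}

/// Hypothetical second stage: pretend an unclosed `[` is the only way a
/// glob can be invalid, and return the number of compiled globs.
fn compile(globs: Vec<String>) -> Result<usize, CompileError> {
    match globs.iter().find(|g| g.contains('[') && !g.contains(']')) {
        Some(bad) => Err(CompileError(format!("unclosed character class in `{bad}`"))),
        None => Ok(globs.len()),
    }
}

/// Each stage maps its own error into the matching variant, as in the diff.
fn load(input: &str) -> Result<usize, LanguageLoaderError> {
    let globs = parse(input).map_err(LanguageLoaderError::DeserializeError)?;
    compile(globs).map_err(LanguageLoaderError::LoaderError)
}
```

Because the enum implements `std::error::Error`, a caller can propagate it with `?` into a broader error type, which appears to be what `refresh_language_config` in `helix-term/src/application.rs` relies on after this commit.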

helix-core/src/syntax.rs

Lines changed: 96 additions & 54 deletions
@@ -82,12 +82,6 @@ pub struct Configuration {
     pub language_server: HashMap<String, LanguageServerConfiguration>,
 }

-impl Default for Configuration {
-    fn default() -> Self {
-        crate::config::default_syntax_loader()
-    }
-}
-
 // largely based on tree-sitter/cli/src/loader.rs
 #[derive(Debug, Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case", deny_unknown_fields)]
@@ -164,9 +158,11 @@ pub enum FileType {
     /// The extension of the file, either the `Path::extension` or the full
     /// filename if the file does not have an extension.
     Extension(String),
-    /// The suffix of a file. This is compared to a given file's absolute
-    /// path, so it can be used to detect files based on their directories.
-    Suffix(String),
+    /// A Unix-style path glob. This is compared to the file's absolute path, so
+    /// it can be used to detect files based on their directories. If the glob
+    /// is not an absolute path and does not already start with a glob pattern,
+    /// a glob pattern will be prepended to it.
+    Glob(globset::Glob),
 }

 impl Serialize for FileType {
@@ -178,9 +174,9 @@ impl Serialize for FileType {

         match self {
             FileType::Extension(extension) => serializer.serialize_str(extension),
-            FileType::Suffix(suffix) => {
+            FileType::Glob(glob) => {
                 let mut map = serializer.serialize_map(Some(1))?;
-                map.serialize_entry("suffix", &suffix.replace(std::path::MAIN_SEPARATOR, "/"))?;
+                map.serialize_entry("glob", glob.glob())?;
                 map.end()
             }
         }
@@ -213,9 +209,20 @@ impl<'de> Deserialize<'de> for FileType {
                 M: serde::de::MapAccess<'de>,
             {
                 match map.next_entry::<String, String>()? {
-                    Some((key, suffix)) if key == "suffix" => Ok(FileType::Suffix({
-                        suffix.replace('/', std::path::MAIN_SEPARATOR_STR)
-                    })),
+                    Some((key, mut glob)) if key == "glob" => {
+                        // If the glob isn't an absolute path or already starts
+                        // with a glob pattern, add a leading glob so we
+                        // properly match relative paths.
+                        if !glob.starts_with('/') && !glob.starts_with("*/") {
+                            glob.insert_str(0, "*/");
+                        }
+
+                        globset::Glob::new(glob.as_str())
+                            .map(FileType::Glob)
+                            .map_err(|err| {
+                                serde::de::Error::custom(format!("invalid `glob` pattern: {}", err))
+                            })
+                    }
                     Some((key, _value)) => Err(serde::de::Error::custom(format!(
                         "unknown key in `file-types` list: {}",
                         key
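The prefixing rule in the `glob` branch of this deserializer can be isolated as a small pure function to make its edge cases easier to see. `normalize_glob` is a hypothetical helper name; the condition is copied from the hunk, applied to a plain string instead of being followed by `globset::Glob::new`.

```rust
/// Hypothetical helper mirroring the prefixing rule in the `glob` branch:
/// patterns that start with `/` or `*/` are kept, everything else gets a
/// leading `*/` so it can match below any directory.
fn normalize_glob(glob: &str) -> String {
    if glob.starts_with('/') || glob.starts_with("*/") {
        glob.to_owned()
    } else {
        format!("*/{glob}")
    }
}
```

As the condition is written, only a literal leading `*/` counts as "already starts with a glob pattern". A pattern such as `**/foo` or `*.env` therefore also receives the prefix and becomes `*/**/foo` or `*/*.env`.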
@@ -752,81 +759,113 @@ pub struct SoftWrap {
     pub wrap_at_text_width: Option<bool>,
 }

+#[derive(Debug)]
+struct FileTypeGlob {
+    glob: globset::Glob,
+    language_id: usize,
+}
+
+impl FileTypeGlob {
+    fn new(glob: globset::Glob, language_id: usize) -> Self {
+        Self { glob, language_id }
+    }
+}
+
+#[derive(Debug)]
+struct FileTypeGlobMatcher {
+    matcher: globset::GlobSet,
+    file_types: Vec<FileTypeGlob>,
+}
+
+impl FileTypeGlobMatcher {
+    fn new(file_types: Vec<FileTypeGlob>) -> Result<Self, globset::Error> {
+        let mut builder = globset::GlobSetBuilder::new();
+        for file_type in &file_types {
+            builder.add(file_type.glob.clone());
+        }
+
+        Ok(Self {
+            matcher: builder.build()?,
+            file_types,
+        })
+    }
+
+    fn language_id_for_path(&self, path: &Path) -> Option<&usize> {
+        self.matcher
+            .matches(path)
+            .iter()
+            .filter_map(|idx| self.file_types.get(*idx))
+            .max_by_key(|file_type| file_type.glob.glob().len())
+            .map(|file_type| &file_type.language_id)
+    }
+}
+
 // Expose loader as Lazy<> global since it's always static?

 #[derive(Debug)]
 pub struct Loader {
     // highlight_names ?
     language_configs: Vec<Arc<LanguageConfiguration>>,
     language_config_ids_by_extension: HashMap<String, usize>, // Vec<usize>
-    language_config_ids_by_suffix: HashMap<String, usize>,
+    language_config_ids_glob_matcher: FileTypeGlobMatcher,
     language_config_ids_by_shebang: HashMap<String, usize>,

     language_server_configs: HashMap<String, LanguageServerConfiguration>,

     scopes: ArcSwap<Vec<String>>,
 }

+pub type LoaderError = globset::Error;
+
 impl Loader {
-    pub fn new(config: Configuration) -> Self {
-        let mut loader = Self {
-            language_configs: Vec::new(),
-            language_server_configs: config.language_server,
-            language_config_ids_by_extension: HashMap::new(),
-            language_config_ids_by_suffix: HashMap::new(),
-            language_config_ids_by_shebang: HashMap::new(),
-            scopes: ArcSwap::from_pointee(Vec::new()),
-        };
+    pub fn new(config: Configuration) -> Result<Self, LoaderError> {
+        let mut language_configs = Vec::new();
+        let mut language_config_ids_by_extension = HashMap::new();
+        let mut language_config_ids_by_shebang = HashMap::new();
+        let mut file_type_globs = Vec::new();

         for config in config.language {
             // get the next id
-            let language_id = loader.language_configs.len();
+            let language_id = language_configs.len();

             for file_type in &config.file_types {
                 // entry().or_insert(Vec::new).push(language_id);
                 match file_type {
-                    FileType::Extension(extension) => loader
-                        .language_config_ids_by_extension
-                        .insert(extension.clone(), language_id),
-                    FileType::Suffix(suffix) => loader
-                        .language_config_ids_by_suffix
-                        .insert(suffix.clone(), language_id),
+                    FileType::Extension(extension) => {
+                        language_config_ids_by_extension.insert(extension.clone(), language_id);
+                    }
+                    FileType::Glob(glob) => {
+                        file_type_globs.push(FileTypeGlob::new(glob.to_owned(), language_id));
+                    }
                 };
             }
             for shebang in &config.shebangs {
-                loader
-                    .language_config_ids_by_shebang
-                    .insert(shebang.clone(), language_id);
+                language_config_ids_by_shebang.insert(shebang.clone(), language_id);
             }

-            loader.language_configs.push(Arc::new(config));
+            language_configs.push(Arc::new(config));
         }

-        loader
+        Ok(Self {
+            language_configs,
+            language_config_ids_by_extension,
+            language_config_ids_glob_matcher: FileTypeGlobMatcher::new(file_type_globs)?,
+            language_config_ids_by_shebang,
+            language_server_configs: config.language_server,
+            scopes: ArcSwap::from_pointee(Vec::new()),
+        })
     }

     pub fn language_config_for_file_name(&self, path: &Path) -> Option<Arc<LanguageConfiguration>> {
         // Find all the language configurations that match this file name
         // or a suffix of the file name.
-        let configuration_id = path
-            .file_name()
-            .and_then(|n| n.to_str())
-            .and_then(|file_name| self.language_config_ids_by_extension.get(file_name))
+        let configuration_id = self
+            .language_config_ids_glob_matcher
+            .language_id_for_path(path)
             .or_else(|| {
                 path.extension()
                     .and_then(|extension| extension.to_str())
                     .and_then(|extension| self.language_config_ids_by_extension.get(extension))
-            })
-            .or_else(|| {
-                self.language_config_ids_by_suffix
-                    .iter()
-                    .find_map(|(file_type, id)| {
-                        if path.to_str()?.ends_with(file_type) {
-                            Some(id)
-                        } else {
-                            None
-                        }
-                    })
             });

         configuration_id.and_then(|&id| self.language_configs.get(id).cloned())
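When several globs match the same path, `language_id_for_path` keeps the one whose pattern text is longest, using length as a rough proxy for specificity. The selection step can be sketched without `globset` by taking the list of matching indices as an input. `Candidate` and `pick_language` are hypothetical names, and the `matched` slice stands in for whatever the glob set reports.

```rust
/// Hypothetical stand-in for `FileTypeGlob`: pattern text plus a language id.
struct Candidate {
    pattern: &'static str,
    language_id: usize,
}

/// Given the indices of the patterns that matched a path, return the
/// language id attached to the longest pattern. Indices that are out of
/// range are skipped, as with `filter_map` + `get` in the diff.
fn pick_language(candidates: &[Candidate], matched: &[usize]) -> Option<usize> {
    matched
        .iter()
        .filter_map(|&idx| candidates.get(idx))
        // `str::len` counts bytes; for ties `max_by_key` keeps the last one.
        .max_by_key(|candidate| candidate.pattern.len())
        .map(|candidate| candidate.language_id)
}
```

One consequence of `Iterator::max_by_key` is that two matching patterns of equal length resolve to whichever comes last in iteration order, so equal-length globs that overlap are worth avoiding in a configuration.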
@@ -2592,7 +2631,8 @@ mod test {
         let loader = Loader::new(Configuration {
             language: vec![],
             language_server: HashMap::new(),
-        });
+        })
+        .unwrap();
         let language = get_language("rust").unwrap();

         let query = Query::new(language, query_str).unwrap();
@@ -2654,7 +2694,8 @@ mod test {
         let loader = Loader::new(Configuration {
             language: vec![],
             language_server: HashMap::new(),
-        });
+        })
+        .unwrap();

         let language = get_language("rust").unwrap();
         let config = HighlightConfiguration::new(
@@ -2760,7 +2801,8 @@ mod test {
         let loader = Loader::new(Configuration {
             language: vec![],
             language_server: HashMap::new(),
-        });
+        })
+        .unwrap();
         let language = get_language(language_name).unwrap();

         let config = HighlightConfiguration::new(language, "", "", "").unwrap();

helix-core/tests/indent.rs

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@ fn test_treesitter_indent(
     lang_scope: &str,
     ignored_lines: Vec<std::ops::Range<usize>>,
 ) {
-    let loader = Loader::new(indent_tests_config());
+    let loader = Loader::new(indent_tests_config()).unwrap();

     // set runtime path so we can find the queries
     let mut runtime = std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR"));

helix-term/src/application.rs

Lines changed: 4 additions & 10 deletions
@@ -96,11 +96,7 @@ fn setup_integration_logging() {
 }

 impl Application {
-    pub fn new(
-        args: Args,
-        config: Config,
-        syn_loader_conf: syntax::Configuration,
-    ) -> Result<Self, Error> {
+    pub fn new(args: Args, config: Config, lang_loader: syntax::Loader) -> Result<Self, Error> {
         #[cfg(feature = "integration")]
         setup_integration_logging();

@@ -126,7 +122,7 @@ impl Application {
             })
             .unwrap_or_else(|| theme_loader.default_theme(true_color));

-        let syn_loader = std::sync::Arc::new(syntax::Loader::new(syn_loader_conf));
+        let syn_loader = std::sync::Arc::new(lang_loader);

         #[cfg(not(feature = "integration"))]
         let backend = CrosstermBackend::new(stdout(), &config.editor);
@@ -394,10 +390,8 @@ impl Application {

     /// refresh language config after config change
     fn refresh_language_config(&mut self) -> Result<(), Error> {
-        let syntax_config = helix_core::config::user_syntax_loader()
-            .map_err(|err| anyhow::anyhow!("Failed to load language config: {}", err))?;
-
-        self.syn_loader = std::sync::Arc::new(syntax::Loader::new(syntax_config));
+        let lang_loader = helix_core::config::user_lang_loader()?;
+        self.syn_loader = std::sync::Arc::new(lang_loader);
         self.editor.syn_loader = self.syn_loader.clone();
         for document in self.editor.documents.values_mut() {
             document.detect_language(self.syn_loader.clone());
