Description
What are stack-graphs
Stack-graphs were developed by GitHub to provide code navigation features, such as "go to definition" and "go to references", for any of the ~500 programming languages hosted in GitHub repositories (although so far rules have only been defined for Python). They build on the information obtained from tree-sitter grammars to create incrementally updated lookup graphs that allow navigation of named symbols in under 100 ms.
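To make the idea concrete, here is a heavily simplified toy model of stack-graph name resolution in Rust. It is written from the general description of the technique, not from the `stack-graphs` crate, and does not reproduce that crate's API. References are modeled as nodes that push a symbol, definitions as nodes that pop one, and a reference resolves to a definition when a path reaches it with every pushed symbol matched. The `Node`, `StackGraph`, and `resolve` names are hypothetical; the real crate's node kinds, partial-path stitching, and incremental machinery are considerably richer.

```rust
use std::collections::HashSet;

/// Node kinds in a toy stack graph (the real crate has more kinds than this).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    /// Pushes a symbol onto the stack; a reference to a name starts here.
    Push(String),
    /// Pops a matching symbol; `is_definition` marks where a name is declared.
    Pop { symbol: String, is_definition: bool },
    /// Plain scope node that only connects other nodes.
    Scope,
}

pub struct StackGraph {
    pub nodes: Vec<Node>,
    /// Outgoing edges, indexed by node id.
    pub edges: Vec<Vec<usize>>,
}

/// Guards against paths that keep pushing symbols forever around a cycle.
const MAX_STACK_DEPTH: usize = 64;

impl StackGraph {
    pub fn new() -> Self {
        StackGraph { nodes: Vec::new(), edges: Vec::new() }
    }

    pub fn add_node(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.edges.push(Vec::new());
        self.nodes.len() - 1
    }

    pub fn add_edge(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }

    /// Returns every definition node reachable from `reference` such that each
    /// pushed symbol has been matched by a pop when the definition is reached.
    pub fn resolve(&self, reference: usize) -> Vec<usize> {
        let mut found = Vec::new();
        let mut seen = HashSet::new();
        let mut work = vec![(reference, Vec::<String>::new())];
        while let Some((node, mut stack)) = work.pop() {
            // Skip states we have already explored, and runaway stacks.
            if stack.len() > MAX_STACK_DEPTH || !seen.insert((node, stack.clone())) {
                continue;
            }
            match &self.nodes[node] {
                Node::Push(symbol) => stack.push(symbol.clone()),
                Node::Pop { symbol, is_definition } => {
                    // A pop only succeeds if it matches the top of the stack.
                    if stack.last() != Some(symbol) {
                        continue;
                    }
                    stack.pop();
                    if *is_definition && stack.is_empty() {
                        found.push(node);
                        continue;
                    }
                }
                Node::Scope => {}
            }
            for &next in &self.edges[node] {
                work.push((next, stack.clone()));
            }
        }
        found.sort();
        found.dedup();
        found
    }
}
```

For a qualified reference such as `util.helper()`, the reference pushes `helper` and then `util`. The `util` definition pops only its own symbol and carries `helper` on into the module's scope, which is how qualified names resolve without any language-specific logic in the resolver itself; the language rules only decide which nodes and edges to create.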
Why would we want them in Helix?
Helix currently relies exclusively on tree-sitter to provide basic highlighting and other, more advanced features (such as selection by scope, which is still being worked into the default keymap). The premise is that sourcing or writing a tree-sitter grammar for a language should be the preferred way of implementing language support in Helix. If we are to keep this preference, we should endeavour to reap the maximum benefit from having a tree-sitter syntax tree available to editor functions. This seems to be part of what sets Helix apart from other editors.
Helix already supports LSP for these features; why not rely on that?
There are a number of reasons:
Resource constraints
Many LSP servers are notoriously memory- and CPU-intensive. Depending on the kind of development machine you are running on, that may or may not be an issue even for a single running LSP server. However, in heterogeneous code environments (consider a website that uses TypeScript on the frontend, PHP on the backend, and Ruby for its build system or to provide certain endpoints), or when you work on a number of projects throughout the day, whether or not they are all in the same language, relying on LSP in every case can quickly become limiting. Many users will likely make a conscious decision to run only one LSP server, for their most-used language, simply to free up resources.
Not all languages have functional LSPs
Fairly recently, there was a lot of effort to use Julia's LSP server in Helix's default configuration, but it turned out that not only was the server excruciatingly slow to start up, it also wasn't entirely conformant or complete. There are certainly other examples of LSP servers that only partially implement the specification. In those cases, it would be simpler to write a stack-graphs ruleset that provides at least code navigation than to write or improve an entire LSP implementation.
Not all languages should have an LSP
Especially in the web sphere, there are many small domain-specific languages that are less a complete language and more a configuration format, but which nonetheless support definitions of and references to discrete portions of the configuration, and/or drop-in files to organize configuration across a number of files. Apache, nginx, and Varnish configuration files in particular come to mind. These will likely never get LSP servers, as the benefit is too small, but it is certainly possible to write a tree-sitter grammar and a stack-graphs ruleset to make editing those files in Helix more ergonomic.
What else could be derived?
- Besides go to definition (`gd`) and go to references (`gr`), it seems conceivable that a complete stack-graph could be used to produce a full list of symbols in a file, and even workspace symbols.
- Go to type definition (`gy`) and go to implementation (`gi`) are notably not discussed in the stack-graphs intro talk, but seem like they could be added down the road if not already available.
- Having all of the information on where symbols are defined and used means we could theoretically implement an LSP- and language-agnostic version of rename-symbol.
- "Show docs under cursor" (`<space>k`) could theoretically use definition information to populate its popup, similar to what GitHub does with stack-graphs (try it on a Python repo like flask to see what I mean).
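Assuming a stack-graph query could return the byte ranges of a symbol's definition and all of its references within a file (an assumption; the spans and the `rename_symbol` helper below are hypothetical), the editing half of a language-agnostic rename is small. This sketch applies the replacements back to front so that earlier offsets stay valid, and refuses to edit if a span no longer contains the old name, which would indicate a stale graph.

```rust
use std::cmp::Reverse;
use std::ops::Range;

/// Replace every `span` of `source` (hypothetical output of a stack-graph
/// definition/reference query) with `new_name`.
pub fn rename_symbol(
    source: &str,
    old_name: &str,
    new_name: &str,
    mut spans: Vec<Range<usize>>,
) -> Result<String, String> {
    // Sort back to front so replacing one span never shifts the ones still
    // to be processed, then drop exact duplicates (now adjacent).
    spans.sort_by_key(|span| Reverse((span.start, span.end)));
    spans.dedup();
    let mut out = source.to_string();
    for span in spans {
        // Refuse to edit if the buffer and the graph disagree (stale graph).
        if out.get(span.clone()) != Some(old_name) {
            return Err(format!("span {:?} does not contain `{}`", span, old_name));
        }
        out.replace_range(span, new_name);
    }
    Ok(out)
}
```

A multi-file rename would run the same routine once per file, with the spans grouped by path.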
What's involved in bringing it to Helix
First of all, the code behind the GitHub feature is already open-sourced as a Rust crate! Perusing its documentation on docs.rs, it seems only to provide the core functionality of incrementally creating the graph and querying it. This means we will need to take care of crawling the project for all files of each supported type (GitHub uses the git trees to discover these) and generating the initial partial graphs. These should be cached somewhere (perhaps in a `.helix` directory in the project root, or in something like `$XDG_RUNTIME_DIR/helix/path`). As files are edited, the partial graphs for that file would be updated in real time; then, when functionality is invoked, we could query the full set of graphs to get the answer (this is the part that always completes within 100 ms in GitHub's testing).
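The pipeline described above (crawl the project, build a partial graph per file, rebuild only what changed, then query across all files) might be organized roughly as follows. This is a sketch under heavy assumptions: `PartialGraph` and `build_partial_graph` are hypothetical stand-ins that merely collect `def` names instead of running a stack-graphs ruleset over a tree-sitter parse, `find_definition` does a plain name match rather than real path stitching, and persisting the cache to disk is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a per-file partial graph; here it only records
/// names introduced with `def`.
#[derive(Debug, Clone, PartialEq)]
pub struct PartialGraph {
    pub definitions: Vec<String>,
}

/// Hypothetical placeholder for "run the language's ruleset over the parse".
fn build_partial_graph(source: &str) -> PartialGraph {
    let definitions = source
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("def "))
        .filter_map(|rest| rest.split('(').next())
        .map(|name| name.trim().to_string())
        .collect();
    PartialGraph { definitions }
}

/// Recursively collect files with `extension`, skipping hidden directories
/// such as `.git` or a `.helix` cache directory.
pub fn crawl(root: &Path, extension: &str, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        // `file_type` does not follow symlinks, so symlink cycles cannot recurse forever.
        let file_type = entry.file_type()?;
        let hidden = entry.file_name().to_string_lossy().starts_with('.');
        if file_type.is_dir() && !hidden {
            crawl(&path, extension, out)?;
        } else if file_type.is_file()
            && path.extension().and_then(|ext| ext.to_str()) == Some(extension)
        {
            out.push(path);
        }
    }
    Ok(())
}

/// `DefaultHasher` output is not guaranteed stable across Rust releases, so an
/// on-disk cache would need a different hash.
fn content_hash(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

/// In-memory cache of partial graphs keyed by file path.
#[derive(Default)]
pub struct GraphCache {
    entries: HashMap<PathBuf, (u64, PartialGraph)>,
    /// Number of times a partial graph was (re)built.
    pub rebuilds: usize,
}

impl GraphCache {
    /// Rebuild the partial graph for `path` only when its contents changed.
    pub fn update(&mut self, path: &Path, source: &str) -> &PartialGraph {
        let hash = content_hash(source);
        let stale = self.entries.get(path).map_or(true, |(old, _)| *old != hash);
        if stale {
            self.rebuilds += 1;
            self.entries
                .insert(path.to_path_buf(), (hash, build_partial_graph(source)));
        }
        &self.entries[path].1
    }

    /// Answer a query by consulting every file's partial graph.
    pub fn find_definition(&self, name: &str) -> Vec<&Path> {
        let mut hits: Vec<&Path> = self
            .entries
            .iter()
            .filter(|(_, (_, graph))| graph.definitions.iter().any(|d| d.as_str() == name))
            .map(|(path, _)| path.as_path())
            .collect();
        hits.sort();
        hits
    }
}
```

Keying each entry on a content hash means re-saving or re-opening an unchanged file costs only a hash, not a rebuild, and an edit invalidates only that one file's graph.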
Once the core functionality is in place, we would only need to find or implement a ruleset for each language. I'm not sure where GitHub stores the ruleset for Python, or whether they intend to share what they develop for each language. I would assume they will, as they specifically called out taking advantage of the tree-sitter parser ecosystem. These rulesets would live in the runtime directory alongside the tree-sitter queries.
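If rulesets do end up in the runtime directory, locating one could mirror how query files such as `highlights.scm` sit under `queries/<language>/`. The sketch below assumes a hypothetical `stack-graphs.tsg` file name and a list of runtime directories searched in priority order (for example, a user directory before the bundled one); neither is something Helix defines today.

```rust
use std::path::PathBuf;

/// Hypothetical file name; nothing in Helix defines this today.
const RULESET_FILE: &str = "stack-graphs.tsg";

/// Look for a language's stack-graph ruleset next to its tree-sitter queries,
/// returning the first match from `runtime_dirs` in order.
pub fn find_ruleset(runtime_dirs: &[PathBuf], language: &str) -> Option<PathBuf> {
    runtime_dirs
        .iter()
        .map(|dir| dir.join("queries").join(language).join(RULESET_FILE))
        .find(|path| path.is_file())
}
```

A language with no ruleset simply returns `None`, so the editor could fall back to LSP (or to nothing) without any special casing.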