Description
This is a description of how incremental injections can be implemented, and I plan to work on the PR.
I created this issue in case anyone has suggestions or can find potential flaws with this implementation.
See the PR.
Theory
The central idea is that most patterns in a query have a "root" node, i.e. such a node, for which the content and structure (types of nodes) outside its range do not affect whether the pattern matches on it or not. I.e. if we know that, after an edit, no change intersects the node range, then its matches/doesn't match
status remains the same.
We can store the ranges of those "root" nodes for each match, and skip running query cursor on those ranges if they haven't changed. The same logic applies to all other nodes (that don't match). If their contents and types haven't changed, they will also continue to not match.
We can use ts_tree_get_changed_ranges()
to get all regions where the syntax tree was changed, and we can record content changes from on_bytes()
. Both are already used in the LanguageTree
.
The last part is how to get the root nodes for each match. For now, we can do something like #set! nvim.injection-root @_root
. And if all patterns in a query have that predicate, the optimization applies.
How it would work
After we have all the changes, we can remove all matches that intersect them, run the matches iterator on these ranges, add the new matches, and create new regions for the affected trees. (The actual implementation would need a smarter invalidating strategy).
Since most of the edits don't affect the injections or the syntax tree in a significant way, for them, the query cursor will recheck only a small range, and 0 injection matches will be added/removed.
Implementation
We would need to store all the matches that were found by the iterator, as well as ranges that aren't yet parsed. For each tree-sitter tree.
On a tree edit
For each tree, do the following:
- Adjust root range of each match (the same way tree-sitter would've adjusted the node). If the change affects the content inside the match, remove the match.
- Adjust "not parsed" ranges so they have up-to-date positions.
- Add the new range to "not parsed" ranges (if needed).
- Edit the tree.
When a parse is complete
After a parse on all trees is complete, each tree has a list of ranges in which the syntax tree was changed.
For each tree:
- Remove all matches that intersect these ranges.
- Add these ranges to "not parsed" ranges.
- Go through all "not parsed" ranges, run the iterator on them, and add all new matches to the list.
Notes:
- Theoretically, the "root" node can depend on the
type
of its ancestors, since if they change, their range is invalidated, and the match will be as well. This means#has-ancestor?
should work with this optimization.
This is different from #26827 in that with this implementation, all injections, even the ones currently not visible, are preserved, and it also works with combined injection. Having both would be ideal, since if there are a lot of changes, all of them have to be reparsed. But usually, we only need a small range.