Span merge review #3

pascalkuthe · 2022-09-25T11:39:50Z

I did a pretty comprehensive review of the SpanIter from helix-editor#3695.
I found a couple of bugs, and some opportunities for performance improvement.
The changes are complex and quite a lot of the so I did not want to suggest them trough the GitHub web UI.
See my comments in helix-editor#3695 for more details

This bug occurs if: * A span needs to be subliced at the end of an in-progress span * Another shorter span exists at the same poisitiotion * That span ends before the in-progress span does An exammple would look roughly like this: D |--| C |-------| B |----------| A |---------| Because spans at the same position are sorted in the descending order, the largest span (B) is processed first and subslicing is started. The current code then moves the start of all subslices at the same position to the end of the of the in-progress span (A). This is correctt in many cases (like C) where the spans at the same poistion extend past the end of the in-progress span (A). However if any spans (C) are shorted than the in-progress span (A), the start of these slices will be moved past their end. To fix this, the code now first splits all spans that end before the end of the in-progress span (A) (in this case B and C) into subslices. This is identical to the previous behaviour except that the subslicing stops as soon as the first span whose end starts before the end of the in-progress span (A) is encountered. Afterwards the span list is resorted (unchanged) to ensure the new subslices are at the correct posistion. Finally the spans that are shorter than the in-progress span (A) are processed normally (no subslicing).

By sorting regends in descending order, the ranges that end before the next span can simply be popped off the end. This allows turning two inner loops (final loop and retain) into simpel if conditions that return early. This presents a nice peformance win in two ways: * Do no iterate the entire `range_ends` vector during retain * No need to move all elements in range_ends just to remove front of elements * Many events can be returned directly instead of pushing to the VecDeque

the-mikedavis · 2022-09-25T14:27:29Z

Sweet, thanks for taking a look at this in such depth! Having another pair of eyes on this should hopefully help iron out those last remaining bugs. It might take me a while to look into the changes properly but a few notes off the top of my head:

property testing: I thought this might be useful after writing a few unit tests manually; thanks for setting it up 😀
sorting range ends descending: I had a commit with this change for a while but I dropped it because it exposed more edge cases than sorting ascending. It looks like your changes may have covered the edge cases I found. I will mostly be looking at this while reviewing.

pascalkuthe · 2022-09-25T15:19:37Z

Sorry about the failing tests somehow I made a small mistake while factoring out a function and forgot to save before running cargo test (still used to autosaving from nvim) :D
It should be fixed now

sorting range ends descending: I had a commit with this change for a while but I dropped it because it exposed more edge cases than sorting ascending. It looks like your changes may have covered the edge cases I found. I will mostly be looking at this while reviewing.

Ah that makes sense so it was intentional. I wasn't quite sure because the documentation of the span_iter function still claimed that spans should be ordered in descending order while elsewhere (and one testcase) used ascending order.

I guess you probably found similar edge cases to me. The edge cases I fixed here actually also occur for ascending order but instead of completely nonsensical spans (end past the start) this leads to subtlety broken highlighting which none of the previous tests would have picked up.
I updated my comment to reflect that (also with some more details). I had that mixed up because I mostly looked at descending order most of the time.
My fix wouldn't actually work for ascending order (and an alternative fix for ascending order would be more complex I think).

Note that I have removed the empty_span_at_sublice_start testcase. This was the minimal reproducible testcase the proptest gave me but its actually a poor example and relied on empty spans to introduce the bug (and therefore doesn't actually reach the edgecase anymore, because empty spans are now ignored)

pascalkuthe · 2022-09-25T16:43:42Z

It seems that dropping empty spans can cause a mismatched number of HiglightStart/HighlightEnd events. I am not too sure why that is. I don't have time too dig into that right now so I just dropped that one commit since it was pretty trivial

the-mikedavis

I was only able to look at this superficially so far but I'm liking what I see 😀

helix-core/src/syntax/span.rs

the-mikedavis · 2022-09-26T00:39:41Z

helix-core/src/syntax/span.rs

+        while let Some(span) = self.spans.get(i) {
+            if span <= &intersect_pos {
+                after = Some(i);
+                i += 1;
+            } else {
+                break;
+            }
+        }


I opted for the loop { match { Some(range) if .. { .. }, _ => break, } } pattern since I thought the while let version was less clear that we're only looking for exactly one pattern. What do you think, is the while let version more idiomatic?

I personally like the while loop better, because it looks less cluttered.
I personally always prefer if to match if there is just one case that I actually care about.
Its also closer to my ideal solution which would be a let_chain whenever that becomes available on a rust version we can use: while let Some(span) = self.spans.get(i) && span <= &intersect_pos

The while loop is just a standard vec iterator so when scanning the code its immediately) obvious what it does and I can focus on the content of the loop (where anything interesting is happening).

However thinking about this more this can actually be solved with an iterator which makes it much clearer.
This also inspired me to rename some variables to make the behaviour of partition_spans_at much clearer.
These changes are in 30563c4. Let me know what you think

helix-core/src/syntax/span.rs

the-mikedavis · 2022-09-26T00:57:13Z

helix-core/src/syntax/span.rs

+        // the runtime is even better since we only scan from `self.index` to
+        // the first element of the Vec with a `range.start` after this range.
+        let mut after = None;
+        let intersect_pos = Span {


I think this would read clearer as intersect_span

fixed in 2030231.

This made me notice that the term range is kind of used randomly instead of span. I think it might be nice to consistently use the term span instead. I did that in 7de6f2a and also expanded on the documentation of partition_spans_at. Do you think the renaming is better or should I revert that commit (except for the documentation)?

helix-core/src/syntax/span.rs

the-mikedavis · 2022-09-26T01:03:10Z

From the upstream PR review:

I haven't looked at the merge implementation yet, I think that will also be a good fit for proptesting

I'm much more confident about that implementation: the procedure is much simpler - basically the same problem as merging two lists which are already sorted - I haven't found any bugs in it yet (compared to span_iter which has been quite a headache), and it's essentially the same as the original implementation of Merge.

That being said, I agree it's a good place for more proptesting 👍

pascalkuthe · 2022-09-26T21:35:27Z

It seems that dropping empty spans can cause a mismatched number of HiglightStart/HighlightEnd events. I am not too sure why that is. I don't have time too dig into that right now so I just dropped that one commit since it was pretty trivial

I have fixed this now in 8f995d6. I simply dropped the empty spans too late where they already had some influence on the state of the iterator and ignoring them left the iterator in an invalid state.

I now drop the right at the start of the iterator before any state has been modified.

Protesting is really invaluable for this specific task. Whenever I make an oversight it is always caught (even tough none of the existing testcases found it). While I am not 100% sure about this I am reasonable sure that the proptest can reach 100% coverage on all permutations of the algorithm. I will look into improving the prop testing even further (and the applying it to merge)

pascalkuthe · 2022-09-27T21:31:19Z

Another update from me (sorry for the spam).

I have significantly improved the proptesting infrastructure.
I have created a datastructure that allows me to represent the highlight of each character as a bitset.
This datastructure can be easily constructed from both highlight events and spans directly.
This allows me to compare the highlighting produced by the span_iter to that of the original evenets.

This has turned up two minor bugs (which are just extensions of what I already worked on):

empty spans must always be removed because they can actually mess up the highlight of other spans, in particular span partioning can create new empty spans by moving the start of a span to its end
previously the spans vector was resorted using rotate_left this was a very nice optimization but sadly only works if the spans were only sorted by their start. I already noticed during my initial review that this is not correct because span ends must also stay in descending order. However my suggested fix was wrong: I simply also rotate the spans that had a larger end to the front. However the span ends are not actually sorted (or rather there are two distincs sorted lists) and therefore if that special case occurs, a (very small) part of the spans vec needs a full call to sort_unstable

I have pushed the fixes. Right now I am letting my PC run a billion proptestcases to ensure there are no edgecases left but I am already at a couple hundred million so I doubt any more will turn up. The proptests now cover the entire output of the algorithm so we have realistically 100% coverage once that tests concludes.

I will start looking at the merge implementation now.

pascalkuthe · 2022-10-01T00:12:36Z

(closed in favor of helix-editor#4042)

pascalkuthe added 5 commits September 25, 2022 11:49

add property testing

924e92c

fix: use descending order for span ends

447270a

fix: mainting descending span end oder during subslice

276e58b

pascalkuthe mentioned this pull request Sep 25, 2022

Rewrite Merge to take two HighlightEvent Iterators helix-editor/helix#3695

Closed

refactor: move parts of the SpanIter into seperate functions

8f92c62

pascalkuthe force-pushed the span-merge-review branch from 36d44f5 to 352894e Compare September 25, 2022 14:46

remove useless unit test

cead56c

pascalkuthe force-pushed the span-merge-review branch from 57ffd84 to cead56c Compare September 25, 2022 16:40

the-mikedavis reviewed Sep 26, 2022

View reviewed changes

pascalkuthe added 2 commits September 26, 2022 21:39

minor consmetic changes

10c4323

update documentation and replace all mentions of range with span

7de6f2a

pascalkuthe force-pushed the span-merge-review branch from 54291fd to 7de6f2a Compare September 26, 2022 19:40

make partition_spans_at function more readable

30563c4

pascalkuthe force-pushed the span-merge-review branch from b173991 to 30563c4 Compare September 26, 2022 21:03

skip empty spans

8f995d6

pascalkuthe added 2 commits September 27, 2022 21:42

test all events emitted by the span-iter during proptests

8c4fba9

always skip empty spans to avoid incorrect highlights

07cf301

pascalkuthe force-pushed the span-merge-review branch 2 times, most recently from dbc5c28 to 31c8931 Compare September 27, 2022 21:23

retain descending span end order during partioning

b123aa1

pascalkuthe force-pushed the span-merge-review branch from 31c8931 to b123aa1 Compare September 27, 2022 21:24

pascalkuthe closed this Oct 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Span merge review #3

Span merge review #3

Uh oh!

pascalkuthe commented Sep 25, 2022

Uh oh!

the-mikedavis commented Sep 25, 2022

Uh oh!

pascalkuthe commented Sep 25, 2022 •

edited

Loading

Uh oh!

pascalkuthe commented Sep 25, 2022

Uh oh!

the-mikedavis left a comment

Uh oh!

Uh oh!

Uh oh!

the-mikedavis Sep 26, 2022

Uh oh!

pascalkuthe Sep 26, 2022

Uh oh!

Uh oh!

Uh oh!

the-mikedavis Sep 26, 2022

Uh oh!

pascalkuthe Sep 26, 2022

Uh oh!

Uh oh!

the-mikedavis commented Sep 26, 2022

Uh oh!

pascalkuthe commented Sep 26, 2022 •

edited

Loading

Uh oh!

pascalkuthe commented Sep 27, 2022 •

edited

Loading

Uh oh!

pascalkuthe commented Oct 1, 2022

Uh oh!

Uh oh!

Span merge review #3

Span merge review #3

Uh oh!

Conversation

pascalkuthe commented Sep 25, 2022

Uh oh!

the-mikedavis commented Sep 25, 2022

Uh oh!

pascalkuthe commented Sep 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pascalkuthe commented Sep 25, 2022

Uh oh!

the-mikedavis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

the-mikedavis Sep 26, 2022

Choose a reason for hiding this comment

Uh oh!

pascalkuthe Sep 26, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

the-mikedavis Sep 26, 2022

Choose a reason for hiding this comment

Uh oh!

pascalkuthe Sep 26, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

the-mikedavis commented Sep 26, 2022

Uh oh!

pascalkuthe commented Sep 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pascalkuthe commented Sep 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pascalkuthe commented Oct 1, 2022

Uh oh!

Uh oh!

pascalkuthe commented Sep 25, 2022 •

edited

Loading

pascalkuthe commented Sep 26, 2022 •

edited

Loading

pascalkuthe commented Sep 27, 2022 •

edited

Loading