-
Notifications
You must be signed in to change notification settings - Fork 3
Span merge review #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Span merge review #3
Conversation
This bug occurs if: * A span needs to be subliced at the end of an in-progress span * Another shorter span exists at the same poisitiotion * That span ends before the in-progress span does An exammple would look roughly like this: D |--| C |-------| B |----------| A |---------| Because spans at the same position are sorted in the descending order, the largest span (B) is processed first and subslicing is started. The current code then moves the start of all subslices at the same position to the end of the of the in-progress span (A). This is correctt in many cases (like C) where the spans at the same poistion extend past the end of the in-progress span (A). However if any spans (C) are shorted than the in-progress span (A), the start of these slices will be moved past their end. To fix this, the code now first splits all spans that end before the end of the in-progress span (A) (in this case B and C) into subslices. This is identical to the previous behaviour except that the subslicing stops as soon as the first span whose end starts before the end of the in-progress span (A) is encountered. Afterwards the span list is resorted (unchanged) to ensure the new subslices are at the correct posistion. Finally the spans that are shorter than the in-progress span (A) are processed normally (no subslicing).
By sorting regends in descending order, the ranges that end before the next span can simply be popped off the end. This allows turning two inner loops (final loop and retain) into simpel if conditions that return early. This presents a nice peformance win in two ways: * Do no iterate the entire `range_ends` vector during retain * No need to move all elements in range_ends just to remove front of elements * Many events can be returned directly instead of pushing to the VecDeque
Sweet, thanks for taking a look at this in such depth! Having another pair of eyes on this should hopefully help iron out those last remaining bugs. It might take me a while to look into the changes properly but a few notes off the top of my head:
|
36d44f5
to
352894e
Compare
Sorry about the failing tests somehow I made a small mistake while factoring out a function and forgot to save before running cargo test (still used to autosaving from nvim) :D
Ah that makes sense so it was intentional. I wasn't quite sure because the documentation of the I guess you probably found similar edge cases to me. The edge cases I fixed here actually also occur for ascending order but instead of completely nonsensical spans (end past the start) this leads to subtlety broken highlighting which none of the previous tests would have picked up. Note that I have removed the |
57ffd84
to
cead56c
Compare
It seems that dropping empty spans can cause a mismatched number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was only able to look at this superficially so far but I'm liking what I see 😀
helix-core/src/syntax/span.rs
Outdated
while let Some(span) = self.spans.get(i) { | ||
if span <= &intersect_pos { | ||
after = Some(i); | ||
i += 1; | ||
} else { | ||
break; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted for the loop { match { Some(range) if .. { .. }, _ => break, } }
pattern since I thought the while let
version was less clear that we're only looking for exactly one pattern. What do you think, is the while let
version more idiomatic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally like the while loop better, because it looks less cluttered.
I personally always prefer if
to match
if there is just one case that I actually care about.
Its also closer to my ideal solution which would be a let_chain whenever that becomes available on a rust version we can use: while let Some(span) = self.spans.get(i) && span <= &intersect_pos
The while
loop is just a standard vec iterator so when scanning the code its immediately) obvious what it does and I can focus on the content of the loop (where anything interesting is happening).
However thinking about this more this can actually be solved with an iterator which makes it much clearer.
This also inspired me to rename some variables to make the behaviour of partition_spans_at
much clearer.
These changes are in 30563c4. Let me know what you think
helix-core/src/syntax/span.rs
Outdated
// the runtime is even better since we only scan from `self.index` to | ||
// the first element of the Vec with a `range.start` after this range. | ||
let mut after = None; | ||
let intersect_pos = Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would read clearer as intersect_span
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 2030231.
This made me notice that the term range
is kind of used randomly instead of span
. I think it might be nice to consistently use the term span
instead. I did that in 7de6f2a and also expanded on the documentation of partition_spans_at
. Do you think the renaming is better or should I revert that commit (except for the documentation)?
From the upstream PR review:
I'm much more confident about that implementation: the procedure is much simpler - basically the same problem as merging two lists which are already sorted - I haven't found any bugs in it yet (compared to That being said, I agree it's a good place for more proptesting 👍 |
54291fd
to
7de6f2a
Compare
b173991
to
30563c4
Compare
I have fixed this now in 8f995d6. I simply dropped the empty spans too late where they already had some influence on the state of the iterator and ignoring them left the iterator in an invalid state. I now drop the right at the start of the iterator before any state has been modified. Protesting is really invaluable for this specific task. Whenever I make an oversight it is always caught (even tough none of the existing testcases found it). While I am not 100% sure about this I am reasonable sure that the proptest can reach 100% coverage on all permutations of the algorithm. I will look into improving the prop testing even further (and the applying it to merge) |
dbc5c28
to
31c8931
Compare
31c8931
to
b123aa1
Compare
Another update from me (sorry for the spam). I have significantly improved the proptesting infrastructure. This has turned up two minor bugs (which are just extensions of what I already worked on):
I have pushed the fixes. Right now I am letting my PC run a billion proptestcases to ensure there are no edgecases left but I am already at a couple hundred million so I doubt any more will turn up. The proptests now cover the entire output of the algorithm so we have realistically 100% coverage once that tests concludes. I will start looking at the merge implementation now. |
(closed in favor of helix-editor#4042) |
I did a pretty comprehensive review of the SpanIter from helix-editor#3695.
I found a couple of bugs, and some opportunities for performance improvement.
The changes are complex and quite a lot of the so I did not want to suggest them trough the GitHub web UI.
See my comments in helix-editor#3695 for more details