-
Notifications
You must be signed in to change notification settings - Fork 8.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorize TextColor::GetColor #10779
Conversation
316a285
to
69adabb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱
69adabb
to
4f96a19
Compare
FYI, one of things I've been working on is a refactoring of the color handling, which would include removing that looping color match altogether. Essentially there would be a separate color table entry for default bold which would get set automatically whenever the color table is updated. This is necessary to support a user-chosen bold color (issue #5682), as well as some of the DEC VT52x color sequences that can set the bold color. Technically we might still want to do a looping color match if the bold color is set to "auto", but that would happen only when the color table is changed, rather than on every color lookup, so performance is less of an issue. |
This PR doesn't noticeably improve performance. I just did it because I was trying to get back into vectorized programming. The value I see in this is that it sets a precedent of being less afraid of low-level operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this.
I do like that you're dipping our toes into being able to do these sorts of things. We leveraged stuff like this with libpopcnt, but it's cool to see us do it ourselves too.
If you didn't "just write it for a sunny day"... I probably wouldn't advise writing it. But it exists. So let's check it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I'm, not really sure if I'm qualified to sign off on this, but the comments are excellent so I trust that it works.
const auto needle = _mm256_set1_epi32(__builtin_bit_cast(int, defaultColor)); // 2. | ||
const auto result = _mm256_cmpeq_epi32(haystack, needle); // 3. | ||
const auto mask = _mm256_movemask_ps(_mm256_castsi256_ps(result)); // 4. | ||
unsigned long index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to get initialized to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_BitScanForward
compiles to the assembly instruction bsf
, whose resulting value (the one stored in index
) is undefined if the given value has no set bits. Since I only use index
when _BitScanForward
return true (meaning at least 1 bit was set) I'm never reading any uninitialized values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okeydokey good enough for me!
Hello @lhecker! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
@lhecker You might want to also add run-time dispatch to one of the version, because you won't ship AVX2 built terminal, do you? So this code is basically dead coz no one will build/use it. |
@kasper93 The difference between the SSE2 and AVX2 variant are only 1.5ns/call vs. 1.6ns/call. The only reason the AVX2 variant exists is because it made explaining the code easier (and because I build my own terminal with But I'm adding more important & impactful vectorized code in the future, which I've already planned to use dynamic dispatch. Currently I've got my own dispatch logic but I might use Google's highway or the SIMD-everywhere lib when I finalize the code. |
I was watching a video about vectorized instructions and I wanted to
try out some new things, as I had never written AVX code before.
This commit is the result of this tiny Thursday morning detour into
AVX land. It improves performance of
TextColor::GetColor
by about 3x.Validation Steps Performed