Vectorize TextColor::GetColor #10779

lhecker · 2021-07-24T00:09:58Z

I was watching a video about vectorized instructions and I wanted to
try out some new things, as I had never written AVX code before.
This commit is the result of this tiny Thursday morning detour into
AVX land. It improves performance of TextColor::GetColor by about 3x.

Validation Steps Performed

Default colors are still properly shifted +8 ✔️

zadjii

😱

j4james · 2021-07-24T10:11:38Z

FYI, one of the things I've been working on is a refactoring of the color handling, which would include removing that looping color match altogether. Essentially there would be a separate color table entry for default bold which would get set automatically whenever the color table is updated. This is necessary to support a user-chosen bold color (issue #5682), as well as some of the DEC VT52x color sequences that can set the bold color.

Technically we might still want to do a looping color match if the bold color is set to "auto", but that would happen only when the color table is changed, rather than on every color lookup, so performance is less of an issue.
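To illustrate the idea described above, here is a minimal sketch of moving the looping color match from every lookup to the moment the table changes. The class name `ColorTableSketch`, its members, the 16-entry table layout, and the fallback to the default foreground are all assumptions made for illustration, not the Terminal's actual types or the refactoring as it was eventually written.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: caches the "default bold" color whenever the table changes,
// so that reading it never needs a loop.
class ColorTableSketch
{
public:
    void setColor(std::size_t index, std::uint32_t color)
    {
        assert(index < table_.size());
        table_[index] = color;
        refreshDefaultBold();
    }

    void setDefaultForeground(std::uint32_t color)
    {
        defaultFg_ = color;
        refreshDefaultBold();
    }

    // An explicit, user-chosen bold color turns "auto" mode off.
    void setBoldColor(std::uint32_t color)
    {
        boldIsAuto_ = false;
        defaultBold_ = color;
    }

    // Hot path: a plain read, no search.
    std::uint32_t defaultBold() const { return defaultBold_; }

private:
    // Cold path: runs only when the table or the default foreground changes.
    void refreshDefaultBold()
    {
        if (!boldIsAuto_)
        {
            return;
        }
        defaultBold_ = defaultFg_; // assumed fallback when nothing matches
        for (std::size_t i = 0; i < 8; ++i)
        {
            if (table_[i] == defaultFg_)
            {
                defaultBold_ = table_[i + 8]; // bright counterpart, shifted +8
                break;
            }
        }
    }

    std::array<std::uint32_t, 16> table_{};
    std::uint32_t defaultFg_ = 0;
    std::uint32_t defaultBold_ = 0;
    bool boldIsAuto_ = true;
};
```

With this shape, the cost of the search is paid once per table update, which is why the speed of the match itself matters much less.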

lhecker · 2021-07-24T13:20:15Z

This PR doesn't noticeably improve performance. I just did it because I was trying to get back into vectorized programming. The value I see in this is that it sets a precedent of being less afraid of low-level operations.

miniksa

I'm fine with this.

I do like that you're dipping our toes into being able to do these sorts of things. We leveraged stuff like this with libpopcnt, but it's cool to see us do it ourselves too.

If you didn't "just write it for a sunny day"... I probably wouldn't advise writing it. But it exists. So let's check it in.

zadjii-msft

Honestly, I'm not really sure if I'm qualified to sign off on this, but the comments are excellent so I trust that it works.

zadjii-msft · 2021-08-02T13:22:35Z

src/buffer/out/TextColor.cpp

+            const auto needle = _mm256_set1_epi32(__builtin_bit_cast(int, defaultColor)); // 2.
+            const auto result = _mm256_cmpeq_epi32(haystack, needle); // 3.
+            const auto mask = _mm256_movemask_ps(_mm256_castsi256_ps(result)); // 4.
+            unsigned long index;


Does this need to get initialized to 0?

lhecker · 2021-08-02

_BitScanForward compiles to the assembly instruction bsf, whose result (the value stored in index) is undefined if the given value has no set bits. Since I only use index when _BitScanForward returns true (meaning at least one bit was set), I never read an uninitialized value.

zadjii-msft · 2021-08-02

okeydokey good enough for me!
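The pattern discussed in this thread (compare all lanes, collapse the result into a bitmask, then scan for the first set bit and read `index` only on success) can be sketched roughly as follows. This is not the code from the PR: it uses two 128-bit SSE2 compares instead of the 256-bit AVX2 intrinsics in the diff, falls back to a scalar loop where SSE2 is unavailable, and replaces the MSVC-specific `_BitScanForward` with a portable stand-in. The names `matchMask8`, `bitScanForward`, and `brightenIfDefault`, as well as the 16-entry table layout, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Builds an 8-bit mask in which bit i is set if table[i] == needle.
inline unsigned matchMask8(const std::uint32_t* table, std::uint32_t needle)
{
#if SKETCH_HAS_SSE2
    const __m128i n = _mm_set1_epi32(static_cast<int>(needle));
    const __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(table));
    const __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(table + 4));
    // Each compare yields all-ones lanes on equality; movemask keeps one bit per lane.
    const int loMask = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(lo, n)));
    const int hiMask = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(hi, n)));
    return static_cast<unsigned>(loMask) | (static_cast<unsigned>(hiMask) << 4);
#else
    unsigned mask = 0;
    for (unsigned i = 0; i < 8; ++i)
    {
        if (table[i] == needle)
        {
            mask |= 1u << i;
        }
    }
    return mask;
#endif
}

// Portable stand-in for _BitScanForward: returns false and leaves `index`
// untouched when no bit is set, otherwise stores the lowest set bit's position.
inline bool bitScanForward(unsigned long& index, unsigned mask)
{
    if (mask == 0)
    {
        return false;
    }
    unsigned long i = 0;
    while ((mask & 1u) == 0)
    {
        mask >>= 1;
        ++i;
    }
    index = i;
    return true;
}

// `table` is assumed to hold 16 colors: 8 regular ones followed by 8 bright ones.
inline std::uint32_t brightenIfDefault(const std::uint32_t* table, std::uint32_t defaultColor)
{
    assert(table != nullptr);
    unsigned long index; // deliberately uninitialized: only read after a successful scan
    if (bitScanForward(index, matchMask8(table, defaultColor)))
    {
        return table[index + 8];
    }
    return defaultColor;
}
```

Because `index` is read only inside the branch guarded by the scan's return value, leaving it uninitialized is safe here, which mirrors the reasoning given in the reply above.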

ghost · 2021-08-02T19:01:43Z

Hello @lhecker!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.

kasper93 · 2021-08-07T17:16:47Z

@lhecker You might also want to add run-time dispatch to one of the versions, because you won't ship an AVX2-built terminal, will you? Otherwise this code is basically dead, because no one will build or use it.

lhecker · 2021-08-07T19:38:35Z

@kasper93 The difference between the SSE2 and AVX2 variants is only 1.5ns/call vs. 1.6ns/call. The only reason the AVX2 variant exists is because it made explaining the code easier (and because I build my own terminal with /arch:AVX2). This code in particular isn't all that important to be honest.

But I'll be adding more important and impactful vectorized code in the future, for which I've already planned to use dynamic dispatch. Currently I've got my own dispatch logic, but I might use Google's highway or the SIMD-everywhere lib when I finalize the code.
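For readers unfamiliar with the term, run-time (dynamic) dispatch usually means picking an implementation once, based on what the CPU reports, and calling it through a function pointer afterwards. The sketch below shows only that structure and is not the dispatch logic mentioned above: every name in it is hypothetical, the "AVX2" variant is a scalar placeholder marking where vectorized code would go, and the feature check relies on the GCC/Clang builtin `__builtin_cpu_supports`, conservatively reporting `false` on other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using FindFn = int (*)(const std::uint32_t*, std::size_t, std::uint32_t);

// Baseline variant: returns the index of the first match, or -1.
inline int findScalar(const std::uint32_t* data, std::size_t count, std::uint32_t needle)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        if (data[i] == needle)
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Placeholder for a vectorized variant. A real build would compile this
// function with AVX2 enabled; here it simply reuses the scalar loop.
inline int findAvx2Placeholder(const std::uint32_t* data, std::size_t count, std::uint32_t needle)
{
    return findScalar(data, count, needle);
}

// Assumption: only GCC/Clang on x86 are probed; everything else reports "no AVX2".
inline bool cpuHasAvx2()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

inline FindFn selectFind()
{
    return cpuHasAvx2() ? &findAvx2Placeholder : &findScalar;
}

inline int findColor(const std::uint32_t* data, std::size_t count, std::uint32_t needle)
{
    assert(data != nullptr || count == 0);
    static const FindFn impl = selectFind(); // resolved once, on the first call
    return impl(data, count, needle);
}
```

The indirect call has a small cost of its own, so for a function measured at roughly 1.5ns per call, dispatch is more attractive for larger kernels than for this one.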

lhecker force-pushed the dev/lhecker/vectorize-text-color branch from 316a285 to 69adabb Compare July 24, 2021 01:27

zadjii reviewed Jul 24, 2021

Vectorize TextColor::GetColor

4f96a19

lhecker force-pushed the dev/lhecker/vectorize-text-color branch from 69adabb to 4f96a19 Compare July 24, 2021 02:51

miniksa reviewed Jul 27, 2021

miniksa approved these changes Jul 27, 2021

lhecker marked this pull request as ready for review July 27, 2021 21:08

zadjii-msft approved these changes Aug 2, 2021

lhecker added labels Aug 2, 2021: Area-Output (Related to output processing: inserting text into buffer, retrieving buffer text, etc.), Area-Performance (Performance-related issue), AutoMerge (Marked for automatic merge by the bot when requirements are met)

ghost merged commit fc64ff3 into main Aug 2, 2021

ghost deleted the dev/lhecker/vectorize-text-color branch August 2, 2021 19:03

j4james mentioned this pull request Aug 4, 2021

Default bold color is no longer working as expected #10866

Closed

This pull request was closed.
