Skip to content

Add SpanStringTokenizer and avoid many string allocations in FontCollectionBase #17745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Washi1337
Copy link
Contributor

@Washi1337 Washi1337 commented Dec 10, 2024

What does the pull request do?

This PR creates a ref struct variant of StringTokenizer called SpanStringTokenizer, which holds a ReadOnlySpan<char> instead of a System.String. Additionally, this also adds EnumHelpers::TryParse shims accepting spans as input parameter. This drastically reduces the number of allocations of unnecessary System.String objects in FontCollectionBase and friends.

What is the current behavior?

When creating a TextRunProperties (e.g., a GenericTextRunProperties), somewhere down the line Avalonia.Skia accesses Typeface.GlyphTypeface. This in turn eventually bubbles down all the way to FontCollectionBase::GetImplicitTypeface, which tries to normalize the typeface by carving out the font style, weight and stretch out of the full family name. It does so with the help of StringTokenizer.

For example, below is the logic for carving out the font weight out of the full family name:

internal static bool TryGetWeight(ref string familyName, out FontWeight weight)
{
weight = FontWeight.Normal;
var tokenizer = new StringTokenizer(familyName, ' ');
tokenizer.ReadSpan();
while (tokenizer.TryReadString(out var weightString))
{
if (new StringTokenizer(weightString).TryReadInt32(out _))
{
continue;
}
if (!Enum.TryParse(weightString, true, out weight))
{
continue;
}
familyName = familyName.Replace(" " + weightString, "").TrimEnd();
return true;
}
return false;
}

StringTokenizer takes in a System.String object, meaning that all parsing needs to have a fully concretized string. This requires every single token returned by the tokenizer to be fully concretized to a System.String, or else we cannot determine whether it was an integer, nor can we pass it into Enum.TryParse to match it with font modifiers. Additionally, even more new string allocations are made by concatenating strings and calling Replace and TrimEnd. Finally, since there are three types of modifiers, there are three very similar parsing functions that all do extensive string object allocations. This all to simply parse a font family name.

image

While the glyph typeface is cached in the internal CachedGlyphTypeface property, only slightly adjusting the visual appearance of text requires a new TextRunProperties, and thus results in the creation of a new cached glyph typeface. This happens in projects such as AvaloniaEdit and AvaloniaHex, which need to create many TextRunProperties instances to implement syntax highlighting. Additionally, every TextBlock control creates a new instance as well. As far as I know, this property is not exposed nor can a TextRunProperties be cloned efficiently with this cached glyph typeface either. Even though, out of scope for this PR, this potentially could also be an interesting avenue for further optimizations.

What is the updated/expected behavior with this PR?

Drastically reduced observed System.String allocations when creating many TextRunProperties:

image2

How was the solution implemented (if it's not obvious)?

  • A new internal ref struct SpanStringTokenizer which is an exact copy of StringTokenizer but stores a ReadOnlySpan<char> as opposed to a string.
  • References to StringTokenizer were updated to SpanStringTokenizer throughout the entire Avalonia solution.
  • StringTokenizer is now marked obsolete.
  • EnumHelpers now include TryParse shims.
  • Carving out the modifiers is now done via a lazy initialized StringBuilder.

Checklist

  • Added unit tests (if possible)?

Added a copy of StringTokenizerTests but specifically for SpanStringTokenizer.

Breaking changes

The only technically publicly visible breaking change is turning GridLength::ParseLenghts into eagerly materializing a List<GridLength>, as ref structs are not allowed in conjunction with yield return. However I don't think this should be much of an issue:

         public static IEnumerable<GridLength> ParseLengths(string s)
         {
-            using (var tokenizer = new StringTokenizer(s, CultureInfo.InvariantCulture))
+            var result = new List<GridLength>();
+
+            using (var tokenizer = new SpanStringTokenizer(s, CultureInfo.InvariantCulture))
             {
                 while (tokenizer.TryReadString(out var item))
                 {
-                    yield return Parse(item);
+                    result.Add(Parse(item));
                 }
             }
+
+            return result;

Issues

@Washi1337
Copy link
Contributor Author

Tests seem only to be failing because of the removal of record and IDisposable of the internal struct.

@Washi1337 Washi1337 force-pushed the feature/ref-struct-string-tokenizer branch from 495fcf5 to b1f773d Compare December 11, 2024 12:51
@Washi1337 Washi1337 changed the title Mark StringTokenizer as a ref struct and avoid many string allocations in FontCollectionBase Add SpanStringTokenizer avoid many string allocations in FontCollectionBase Dec 11, 2024
@Washi1337 Washi1337 changed the title Add SpanStringTokenizer avoid many string allocations in FontCollectionBase Add SpanStringTokenizer and avoid many string allocations in FontCollectionBase Dec 11, 2024
@avaloniaui-bot
Copy link

You can test this PR using the following package version. 11.3.999-cibuild0053832-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@Washi1337
Copy link
Contributor Author

I reverted StringTokenizer and added SpanStringTokenizer. Not sure what happened to the macOS runner, but all other tests seem to pass now.

@Washi1337 Washi1337 force-pushed the feature/ref-struct-string-tokenizer branch from b1f773d to 4ee848e Compare December 13, 2024 22:45
@avaloniaui-bot
Copy link

You can test this PR using the following package version. 11.3.999-cibuild0053883-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

Copy link
Member

@MrJul MrJul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A remark aside, LGTM, thank you!
It's always good to have fewer allocations :)

@MrJul
Copy link
Member

MrJul commented Dec 15, 2024

I'm sorry to hijack this PR for a moment.

@Alexizx00789999 could you please stop approving all pending pull requests? It won't help them get team approval any faster. This just adds unnecessary noise.

Copy link
Member

@MrJul MrJul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@MrJul MrJul enabled auto-merge December 15, 2024 12:14
@avaloniaui-bot
Copy link

You can test this PR using the following package version. 11.3.999-cibuild0053901-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@avaloniaui-bot
Copy link

You can test this PR using the following package version. 11.3.999-cibuild0053903-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@MrJul MrJul added this pull request to the merge queue Dec 15, 2024
Merged via the queue into AvaloniaUI:master with commit e991d63 Dec 15, 2024
10 checks passed
@Washi1337 Washi1337 deleted the feature/ref-struct-string-tokenizer branch December 15, 2024 14:33
@AvaloniaUI AvaloniaUI deleted a comment Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants