Use interpreted dynamic delegates when they're invoked just once, instead of compiling #29814

Closed
roji opened this issue Dec 9, 2022 · 1 comment · Fixed by #29815
Comments

@roji
Member

roji commented Dec 9, 2022

In various places in the code, we compile an expression tree to a delegate, and then invoke it just once. For example, in ParameterExtractingExpressionVisitor, fragments of the expression tree which can be client-evaluated are evaluated by compiling those fragments into a delegate and then invoking that delegate.
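To make the pattern concrete, here is a minimal sketch of evaluating a client-side fragment exactly once using the interpreter instead of compiling it. `EvaluateOnce` is a hypothetical helper written for illustration, not the actual code in `ParameterExtractingExpressionVisitor`. It relies on the same `Compile(preferInterpretation: ...)` overload that the benchmark below uses.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper (illustration only, not EF Core's implementation):
// evaluates an expression fragment a single time and returns its value boxed.
static object EvaluateOnce(Expression fragment)
{
    // Wrap the fragment in a parameterless lambda; Convert boxes value types
    // so any fragment type fits Func<object>.
    var lambda = Expression.Lambda<Func<object>>(Expression.Convert(fragment, typeof(object)));

    // The delegate is discarded after one call, so request the interpreter
    // rather than paying for IL generation and JIT compilation.
    var evaluator = lambda.Compile(preferInterpretation: true);
    return evaluator();
}

// Example fragment: the constant expression 40 + 2, standing in for a
// client-evaluable subtree of a larger query.
var fragment = Expression.Add(Expression.Constant(40), Expression.Constant(2));
var value = EvaluateOnce(fragment);
Console.WriteLine(value); // 42
```

Boxing to `object` keeps the helper usable for fragments of any type, at the cost of one allocation per evaluation for value types.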

The expression compilation API (`LambdaExpression.Compile(bool preferInterpretation)`) allows controlling whether the resulting delegate is compiled to IL (to be JITted later) or interpreted. Doing the work of compilation (and JITting!) just to run the code once is quite inefficient: in the benchmark below, compiling and invoking a single Add node takes about 37 µs, versus about 1.6 µs when interpreted.
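A minimal sketch of the two modes side by side. The tree `x => x + 1` is an arbitrary example, not taken from the EF Core code. Note that `preferInterpretation: true` is a preference rather than a guarantee: the documentation describes it as using the interpreted form if it is available.

```csharp
using System;
using System.Linq.Expressions;

// Build x => x + 1 by hand.
var x = Expression.Parameter(typeof(int), "x");
var lambda = Expression.Lambda<Func<int, int>>(Expression.Add(x, Expression.Constant(1)), x);

// Default: emit IL for the tree; the JIT compiles it on first invocation.
// This pays off when the delegate is cached and invoked many times.
Func<int, int> compiled = lambda.Compile();

// Ask for the interpreter instead: each call is slower, but creating the
// delegate is much cheaper, which wins when it runs once and is discarded.
Func<int, int> interpreted = lambda.Compile(preferInterpretation: true);

Console.WriteLine(compiled(8));    // 9
Console.WriteLine(interpreted(8)); // 9
```

Both delegates compute the same result; the only difference is the trade-off between creation cost and per-call cost, which is what the benchmark measures.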

In addition, it's possible that continuously compiling delegates causes code bloat (I'm not sure whether the JITted code is ever reclaimed). Interpretation should mitigate that too.

| Method | TreeDepth | NodeType | DelegateType | Mean | Error | StdDev |
|---|---:|---|---|---:|---:|---:|
| Once | 1 | Add | Compiled | 0.5192 ns | 0.0402 ns | 0.0356 ns |
| Once | 1 | Add | Interpreted | 91.9521 ns | 0.5005 ns | 0.4437 ns |
| Once | 1 | Call | Compiled | 0.5168 ns | 0.0041 ns | 0.0032 ns |
| Once | 1 | Call | Interpreted | 94.2390 ns | 0.4193 ns | 0.3502 ns |
| EveryTime | 1 | Add | Compiled | 37,249.1127 ns | 736.1745 ns | 1,167.6508 ns |
| EveryTime | 1 | Add | Interpreted | 1,636.6028 ns | 19.6829 ns | 18.4114 ns |
| EveryTime | 1 | Call | Compiled | 45,410.4435 ns | 899.1323 ns | 841.0489 ns |
| EveryTime | 1 | Call | Interpreted | 1,680.2431 ns | 12.5205 ns | 11.0991 ns |
| Once | 10 | Add | Compiled | 0.5199 ns | 0.0132 ns | 0.0117 ns |
| Once | 10 | Add | Interpreted | 265.8891 ns | 2.2637 ns | 2.1175 ns |
| Once | 10 | Call | Compiled | 0.0618 ns | 0.0026 ns | 0.0020 ns |
| Once | 10 | Call | Interpreted | 259.1131 ns | 0.6866 ns | 0.6423 ns |
| EveryTime | 10 | Add | Compiled | 45,024.5658 ns | 887.9222 ns | 2,057.8948 ns |
| EveryTime | 10 | Add | Interpreted | 2,802.1430 ns | 7.6220 ns | 6.7567 ns |
| EveryTime | 10 | Call | Compiled | 75,013.6080 ns | 1,449.9254 ns | 2,170.1801 ns |
| EveryTime | 10 | Call | Interpreted | 3,402.9787 ns | 3.6309 ns | 2.8347 ns |
| Once | 100 | Add | Compiled | 0.5194 ns | 0.0188 ns | 0.0166 ns |
| Once | 100 | Add | Interpreted | 1,885.6130 ns | 3.0590 ns | 2.5544 ns |
| Once | 100 | Call | Compiled | 0.5262 ns | 0.0039 ns | 0.0033 ns |
| Once | 100 | Call | Interpreted | 1,927.5032 ns | 3.5941 ns | 3.0012 ns |
| EveryTime | 100 | Add | Compiled | 77,491.0032 ns | 1,016.8876 ns | 951.1973 ns |
| EveryTime | 100 | Add | Interpreted | 13,941.0736 ns | 119.1034 ns | 105.5821 ns |
| EveryTime | 100 | Call | Compiled | 352,433.7320 ns | 3,664.6569 ns | 3,248.6229 ns |
| EveryTime | 100 | Call | Interpreted | 20,881.0007 ns | 175.9068 ns | 155.9368 ns |
| Once | 1000 | Add | Compiled | 0.5217 ns | 0.0046 ns | 0.0036 ns |
| Once | 1000 | Add | Interpreted | 17,504.4514 ns | 78.3508 ns | 65.4264 ns |
| Once | 1000 | Call | Compiled | 539.6946 ns | 4.2658 ns | 3.7815 ns |
| Once | 1000 | Call | Interpreted | 17,988.3271 ns | 32.0835 ns | 26.7912 ns |
| EveryTime | 1000 | Add | Compiled | 486,345.0278 ns | 1,983.4168 ns | 1,758.2474 ns |
| EveryTime | 1000 | Add | Interpreted | 130,073.6554 ns | 1,509.4981 ns | 1,411.9855 ns |
| EveryTime | 1000 | Call | Compiled | 7,619,107.3270 ns | 20,203.8816 ns | 17,910.2148 ns |
| EveryTime | 1000 | Call | Interpreted | 175,068.3599 ns | 2,508.8744 ns | 2,346.8026 ns |

Notes:

  • The Once scenarios (delegate created once during setup, so only the invocation is measured) are included only for reference; it's obviously better to compile when the same delegate is executed many times.
  • The benchmarked expression tree is composed of a variable number of Add or Call nodes, to see the effect of different node types and tree depths. In the EveryTime (one-off) scenarios, interpreting beats compiling in every configuration measured, by roughly 4x (depth 1000, Add) to over 40x (depth 1000, Call).
Benchmark code

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmark>();

public class Benchmark
{
    [Params(1, 10, 100, 1000)]
    public int TreeDepth { get; set; }

    [Params(NodeType.Call, NodeType.Add)]
    public NodeType NodeType { get; set; }

    [Params(DelegateType.Compiled, DelegateType.Interpreted)]
    public DelegateType DelegateType { get; set; }

    private Expression<Func<int, int>> _expressionTree;

    // Built once in setup (compiled to IL or interpreted, depending on DelegateType); used by Once.
    private Func<int, int> _compiledDelegate;

    [GlobalSetup]
    public void Setup()
    {
        _expressionTree = GenerateExpressionTree(TreeDepth);
        _compiledDelegate = _expressionTree.Compile(preferInterpretation: DelegateType == DelegateType.Interpreted);
    }

    // Builds p => (((p op 1) op 1) ...) with `depth` nodes, where op is either
    // an Add node or a Call to MyAdd, depending on NodeType.
    private Expression<Func<int, int>> GenerateExpressionTree(int depth)
    {
        var p = Expression.Parameter(typeof(int));
        var node = CreateNode(p, Expression.Constant(1));

        for (var i = 1; i < depth; i++)
            node = CreateNode(node, Expression.Constant(1));

        return Expression.Lambda<Func<int, int>>(node, p);

        Expression CreateNode(Expression a, Expression b)
            => NodeType == NodeType.Add
                ? Expression.Add(a, b)
                : Expression.Call(MyAddMethod, a, b);
    }

    // Invokes a delegate that was created ahead of time (reference only).
    [Benchmark]
    public int Once()
        => _compiledDelegate(8);

    // Creates the delegate and invokes it a single time: the pattern this issue targets.
    [Benchmark]
    public int EveryTime()
    {
        var compiledDelegate = _expressionTree.Compile(preferInterpretation: DelegateType == DelegateType.Interpreted);
        return compiledDelegate(8);
    }

    private static readonly MethodInfo MyAddMethod
        = typeof(Benchmark).GetMethod(nameof(MyAdd), new[] { typeof(int), typeof(int) })!;

    public static int MyAdd(int x, int y) => x + y;
}

public enum DelegateType
{
    Compiled,
    Interpreted
}

public enum NodeType
{
    Add,
    Call
}
```
@roji
Member Author

roji commented Dec 10, 2022

Note: this may increase app size when trimming on CoreCLR (but not on NativeAOT), since on CoreCLR the interpreter would otherwise be trimmed away unless explicitly opted into. However, I think this change still makes a lot of sense.


2 participants