Skip to content

Restructure AggregationBinder and ComputeBinder #1378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gathogojr
Copy link
Contributor

@gathogojr gathogojr commented Dec 19, 2024

This pull request restructures the AggregationBinder and ComputeBinder used in OData's LINQ translation process to improve extensibility.

Restructuring AggregationBinder for Extensibility

The primary goal is to modularize the three transformation steps:

  1. Flattening referenced properties – Ensures Entity Framework (EF) generates efficient SQL by avoiding nested queries.
  2. Constructing GroupBy expressions – Converts $apply grouping into LINQ GroupBy.
  3. Constructing Select expressions – Maps aggregation results into structured output.

This restructuring introduces well-defined interfaces (IGroupByWrapper, IAggregationPropertyContainer, IFlattentingWrapper, etc.) to enforce a type-safe contract, ensuring extensibility while preventing unintended mix-and-match of unrelated implementations.

After an OData $apply expression is transformed into a parse tree represented by a TransformationNode, it undergoes a three-step process to translate it into a LINQ expression tree:

  1. Pre-flatten referenced properties in aggregate clauses to prevent the generation of nested queries by Entity Framework (EF). This restructuring results in more efficient SQL generation.
  2. Translate the grouping aspects of an OData $apply clause into the corresponding LINQ GroupBy expression.
  3. Translate the projection aspects of an OData $apply clause into the corresponding LINQ Select expression.

The following sections will explain each step in detail, using illustrations while also describing the APIs introduced to support extensibility..

Example OData Query

Consider the following OData query containing a $apply expression:

/Sales?$apply=groupby((Year,Quarter),aggregate(Amount with average as AverageAmount,Amount with sum as SumAmount))

Class diagram for the data model:
image

It's easier to cover the introduced interfaces by going over the second step - constructing GroupBy expression, before the first step - pre-flattening referenced properties.

Constructing the GroupBy Expression

To support the construction of the GroupBy expression, we expose the BindGroupBy method with the following signature:

Expression BindGroupBy(TransformationNode transformationNode, QueryBinderContext context)

The BindGroupBy method takes the parse tree represented by the TransformationNode and constructs a lambda expression representing the body of the GroupBy expression. For the OData $apply expression introduced earlier, the expression will look as follows:

$it => new GroupByWrapper()
{
    GroupByContainer = new AggregationPropertyContainer()
    {
        Name = "Quarter",
        Value = $it.Source.Quarter,
        Next = new LastInChain()
        {
            Name = "Year",
            Value = Convert($it.Source.Year, Object)
        }
    }
}

The process involves:

  1. Creating a wrapper that encapsulates grouping properties.
  2. Formulating a lambda expression to define the grouping key.
    The wrapper used in constructing the GroupBy expression must inherit from DynamicTypeWrapper and implement the IGroupByWrapper<TContainer, TWrapper> interface:
public interface IGroupByWrapper<TContainer, TWrapper>
    where TContainer : IAggregationPropertyContainer<TWrapper, TContainer>
    where TWrapper : IGroupByWrapper<TContainer, TWrapper>
{
    TContainer GroupByContainer { get; set; }
    TContainer Container { get; set; }
}

TContainer represents the aggregation property container, while TWrapper represents the groupby wrapper. This establishes a type-safe pairing.

Similarly, IAggregationPropertyContainer<TWrapper, TContainer> is defined as:

public interface IAggregationPropertyContainer<TWrapper, TContainer>
    where TWrapper : IGroupByWrapper<TContainer, TWrapper>
    where TContainer : IAggregationPropertyContainer<TWrapper, TContainer>
{
    string Name { get; set; }
    object Value { get; set; }
    TWrapper NestedValue { get; set; }
    IAggregationPropertyContainer<TWrapper, TContainer> Next { get; set; }
    void ToDictionaryCore(Dictionary<string, object> dictionary, IPropertyMapper propertyMapper, bool includeAutoSelected);
}

This bidirectional dependency ensures that each wrapper and container type can only interact with valid implementations.

Here's how implementations of the two interfaces would look:

internal class GroupByWrapper : IGroupByWrapper<AggregationPropertyContainer, GroupByWrapper>
{
    public AggregationPropertyContainer GroupByContainer { get; set; }
    public AggregationPropertyContainer Container { get; set; }
}

internal class AggregationPropertyContainer : IAggregationPropertyContainer<GroupByWrapper, AggregationPropertyContainer>
{
    public string Name { get; set; }
    public object Value { get; set; }
    public GroupByWrapper NestedValue { get; set; }
    public IAggregationPropertyContainer<GroupByWrapper, AggregationPropertyContainer> Next { get; set; }
    public void ToDictionaryCore(Dictionary<string, object> dictionary, IPropertyMapper propertyMapper, bool includeAutoSelected) => throw new NotImplementedException();
}

Pre-flattening Referenced Properties

To support flattening of referenced properties, we expose the FlattenReferencedProperties method:

AggregationFlatteningResult FlattenReferencedProperties(TransformationNode transformationNode, IQueryable query, QueryBinderContext context)

The AggregationFlatteningResult type encapsulates the outcome of the flattening process.

For the OData $apply expression above, the result of flattening the original query source would be:

$it => new FlatteningWrapper<Sale>()
{
    Source = $it,
    GroupByContainer = new AggregationPropertyContainer()
    {
        Name = "Property1",
        Value = Convert($it.Amount, Object),
        Next = new LastInChain()
        {
            Name = "Property0",
            Value = Convert($it.Amount, Object)
        }
    }
}

The wrapper used in flattening must inherit from DynamicTypeWrapper and implement the IGroupByWrapper and IFlatteningWrapper<T> interfaces:

public interface IFlatteningWrapper<T>
{
    T Source { get; set; }
}

internal class FlatteningWrapper<T> : GroupByWrapper, IGroupByWrapper<AggregationPropertyContainer, GroupByWrapper>, IFlatteningWrapper<T>
{
    public T Source { get; set; }
}

Constructing the Select Expression

To support the construction of the Select expression, we expose the BindSelect method:

Expression BindSelect(TransformationNode transformationNode, QueryBinderContext context)

The BindSelect method constructs a lambda expression representing the body of the Select expression. For the OData $apply expression introduced earlier, the expression will look as follows:

$it => new AggregationWrapper()
{
    GroupByContainer = $it.Key.GroupByContainer,
    Container = new AggregationPropertyContainer()
    {
        Name = "SumAmount",
        Value = Convert(
            Convert($it, IEnumerable<SomeType>)
                .Sum($it => Convert(
                    Convert($it.GroupByContainer.Next, AggregationPropertyContainer).Value, 
                    Decimal)
                ), 
            Object
        ),
        Next = new LastInChain()
        {
            Name = "AverageAmount",
            Value = Convert(
                Convert($it, IEnumerable<SomeType>)
                    .Average($it => Convert(
                        Convert($it.GroupByContainer, AggregationPropertyContainer).Value, 
                        Decimal)
                    ), 
                Object
            )
        }
    }
}

Bringing Everything Together

The ApplyBind method in BinderExtensions combines everything to construct the final modified query source. The FlattenReferenceProperties method is invoked conditionally, followed by BindGroupBy and BindSelect.

For the OData $apply expression introduced earlier, the final query source will look as follows:

[Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression]
    .Select($it => new FlatteningWrapper<T>()
    {
        Source = $it,
        GroupByContainer = new AggregationPropertyContainer()
        {
            Name = "Property1",
            Value = Convert($it.Amount, Object),
            Next = new LastInChain()
            {
                Name = "Property0",
                Value = Convert($it.Amount, Object)
            }
        }
    })
    .GroupBy($it => new GroupByWrapper()
    {
        GroupByContainer = new AggregationPropertyContainer()
        {
            Name = "Quarter",
            Value = $it.Source.Quarter,
            Next = new LastInChain()
            {
                Name = "Year",
                Value = Convert($it.Source.Year, Object)
            }
        }
    })
    .Select($it => new AggregationWrapper()
    {
        GroupByContainer = $it.Key.GroupByContainer,
        Container = new AggregationPropertyContainer()
        {
            Name = "SumAmount",
            Value = Convert(
                Convert($it, IEnumerable<T>)
                    .Sum($it => Convert(
                        Convert($it.GroupByContainer.Next, AggregationPropertyContainer).Value, 
                        Decimal)
                    ), 
                Object
            ),
            Next = new LastInChain()
            {
                Name = "AverageAmount",
                Value = Convert(
                    Convert($it, IEnumerable<T>)
                        .Average($it => Convert(
                            Convert($it.GroupByContainer, AggregationPropertyContainer).Value, 
                            Decimal)
                        ), 
                    Object
                )
            }
        }
    });

Why is Flattening Referenced Properties Necessary?

Flattening referenced properties is crucial when navigation properties are referenced in the aggregate transformation. For example, consider the following OData query referencing a navigation property in the aggregate clause:

Sales?$apply=groupby((Year,Quarter),aggregate(Product/TaxRate with average as AverageProductTaxRate))

Comparing the LINQ expressions and SQL queries generated with and without flattening demonstrates the efficiency gained by flattening.

Flattened - LINQ Expression:

[Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression]
    .Select($it => new FlatteningWrapper`1() 
    {
        Source = $it, 
        GroupByContainer = new LastInChain() 
        {
            Name = "Property0", 
            Value = Convert($it.Product.TaxRate, Object)
        }
    })
    .GroupBy($it => new GroupByWrapper() 
    {
        GroupByContainer = new AggregationPropertyContainer() 
        {
            Name = "Quarter", 
            Value = $it.Source.Quarter, 
            Next = new LastInChain() 
            {
                Name = "Year", 
                Value = Convert($it.Source.Year, Object)
            }
        }
    })
    .Select($it => new AggregationWrapper() 
    {
        GroupByContainer = $it.Key.GroupByContainer, 
        Container = new LastInChain() 
        {
            Name = "AverageProductTaxRate", 
            Value = Convert(
                Convert($it, IEnumerable`1)
                    .Average($it => Convert(
                        Convert($it.GroupByContainer, AggregationPropertyContainer).Value, Decimal
                    )), 
                Object
            )
        }
    });

Non-flattened - LINQ Expression:

[Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression]
    .GroupBy($it => new GroupByWrapper() 
    {
        GroupByContainer = new AggregationPropertyContainer() 
        {
            Name = "Quarter", 
            Value = $it.Quarter, 
            Next = new LastInChain() 
            {
                Name = "Year", 
                Value = Convert($it.Year, Object)
            }
        }
    })
    .Select($it => new AggregationWrapper() 
    {
        GroupByContainer = $it.Key.GroupByContainer, 
        Container = new LastInChain() 
        {
            Name = "AverageProductTaxRate", 
            Value = Convert(
                Convert($it, IEnumerable`1)
                    .Average($it => $it.Product.TaxRate), 
                Object
            )
        }
    });

Flattened - SQL Query:

SELECT 
    [GroupBy1].[K1] AS [Year], 
    [GroupBy1].[K3] AS [C1], 
    [GroupBy1].[K4] AS [C2], 
    [GroupBy1].[K2] AS [Quarter], 
    [GroupBy1].[K5] AS [C3], 
    [GroupBy1].[K6] AS [C4], 
    N'AverageProductTaxRate' AS [C5], 
    [GroupBy1].[A1] AS [A1]
    FROM ( SELECT 
        [Join1].[K1] AS [K1], 
        [Join1].[K2] AS [K2], 
        [Join1].[K3] AS [K3], 
        [Join1].[K4] AS [K4], 
        [Join1].[K5] AS [K5], 
        [Join1].[K6] AS [K6], 
        AVG([Join1].[A1_0]) AS [A1]
        FROM ( SELECT 
            [Extent1].[Year] AS [K1], 
            [Extent1].[Quarter] AS [K2], 
            1 AS [K3], 
            N'Quarter' AS [K4], 
            1 AS [K5], 
            N'Year' AS [K6], 
            [Extent2].[TaxRate] AS [A1_0]
            FROM  [dbo].[Sales] AS [Extent1]
            INNER JOIN [dbo].[Products] AS [Extent2] ON [Extent1].[ProductId] = [Extent2].[Id]
        )  AS [Join1]
        GROUP BY [K1], [K2], [K3], [K4], [K5], [K6]
    )  AS [GroupBy1]

Non-flattened - SQL Query:

SELECT 
    [Project2].[Year] AS [Year], 
    [Project2].[C1] AS [C1], 
    [Project2].[C2] AS [C2], 
    [Project2].[Quarter] AS [Quarter], 
    [Project2].[C3] AS [C3], 
    N'AverageProductTaxRate' AS [C4], 
    [Project2].[C4] AS [C5]
    FROM ( SELECT 
        [Distinct1].[Year] AS [Year], 
        [Distinct1].[Quarter] AS [Quarter], 
        [Distinct1].[C1] AS [C1], 
        [Distinct1].[C2] AS [C2], 
        [Distinct1].[C3] AS [C3], 
        (SELECT 
            AVG([Extent3].[TaxRate]) AS [A1]
            FROM  [dbo].[Sales] AS [Extent2]
            INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[ProductId] = [Extent3].[Id]
            WHERE ([Distinct1].[C2] = N'Quarter') AND (([Distinct1].[Quarter] = [Extent2].[Quarter]) OR (([Distinct1].[Quarter] IS NULL) AND ([Extent2].[Quarter] IS NULL))) AND ([Distinct1].[C3] = N'Year') AND ([Distinct1].[Year] = [Extent2].[Year])) AS [C4]
        FROM ( SELECT DISTINCT 
            [Extent1].[Year] AS [Year], 
            [Extent1].[Quarter] AS [Quarter], 
            1 AS [C1], 
            N'Quarter' AS [C2], 
            N'Year' AS [C3]
            FROM [dbo].[Sales] AS [Extent1]
        )  AS [Distinct1]
    )  AS [Project2]

Notice how we end up with nested subqueries when we don't flatten the referenced navigation properties. The subqueries become more pronounced with more navigation properties in the $apply expression.
Consider the following $apply expression referencing two navigation properties:

/Sales?$apply=groupby((Year,Quarter),aggregate(Product/TaxRate with average as AverageProductTaxRate,Product/TaxRate with sum as SumProductTaxRate))

Here's how the generated SQL query would look like when we don't take the navigation properties through the flattening process:

SELECT 
    [Project3].[Year] AS [Year], 
    [Project3].[C1] AS [C1], 
    [Project3].[C2] AS [C2], 
    [Project3].[Quarter] AS [Quarter], 
    [Project3].[C3] AS [C3], 
    N'SumProductTaxRate' AS [C4], 
    [Project3].[C4] AS [C5], 
    N'AverageProductTaxRate' AS [C6], 
    [Project3].[C5] AS [C7]
    FROM ( SELECT 
        [Project2].[Year] AS [Year], 
        [Project2].[Quarter] AS [Quarter], 
        [Project2].[C1] AS [C1], 
        [Project2].[C2] AS [C2], 
        [Project2].[C3] AS [C3], 
        [Project2].[C4] AS [C4], 
        (SELECT 
            AVG([Extent5].[TaxRate]) AS [A1]
            FROM  [dbo].[Sales] AS [Extent4]
            INNER JOIN [dbo].[Products] AS [Extent5] ON [Extent4].[ProductId] = [Extent5].[Id]
            WHERE ([Project2].[C2] = N'Quarter') AND (([Project2].[Quarter] = [Extent4].[Quarter]) OR (([Project2].[Quarter] IS NULL) AND ([Extent4].[Quarter] IS NULL))) AND ([Project2].[C3] = N'Year') AND ([Project2].[Year] = [Extent4].[Year])) AS [C5]
        FROM ( SELECT 
            [Distinct1].[Year] AS [Year], 
            [Distinct1].[Quarter] AS [Quarter], 
            [Distinct1].[C1] AS [C1], 
            [Distinct1].[C2] AS [C2], 
            [Distinct1].[C3] AS [C3], 
            (SELECT 
                SUM([Extent3].[TaxRate]) AS [A1]
                FROM  [dbo].[Sales] AS [Extent2]
                INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[ProductId] = [Extent3].[Id]
                WHERE ([Distinct1].[C2] = N'Quarter') AND (([Distinct1].[Quarter] = [Extent2].[Quarter]) OR (([Distinct1].[Quarter] IS NULL) AND ([Extent2].[Quarter] IS NULL))) AND ([Distinct1].[C3] = N'Year') AND ([Distinct1].[Year] = [Extent2].[Year])) AS [C4]
            FROM ( SELECT DISTINCT 
                [Extent1].[Year] AS [Year], 
                [Extent1].[Quarter] AS [Quarter], 
                1 AS [C1], 
                N'Quarter' AS [C2], 
                N'Year' AS [C3]
                FROM [dbo].[Sales] AS [Extent1]
            )  AS [Distinct1]
        )  AS [Project2]
    )  AS [Project3]

These nested subqueries are inefficient and would gradually lead to performance degradation.
Note: The nested subqueries behavior is currently reproducible with EF6. The EF Core team might have figured out how to generate efficient queries even when properties are not flattened. I'll create a ticket to scrutinize this further to determine whether to conditionally perform the flattening step.

Restructuring ComputeBinder for Extensibility

Overview of compute transformation

The compute transformation feature enables the dynamic addition of computed or derived properties to an OData query result. This allows users to define expressions that generate new virtual properties based on existing data within the query pipeline. The compute transformation supports:

  • Arithmetic Operations – Addition (+), subtraction (-), multiplication (*), division (/), etc.
  • String Manipulations – Transformation and concatenation of string values.
  • Integration with $apply – Works in conjunction with groupby and aggregate for advanced computations.

The ComputeBinder class is responsible for binding an OData compute transformation (represented as a ComputeTransformationNode) into a LINQ expression that gets applied dynamically to the query.

Introducing IComputeBinder for extensibility

To improve flexibility and maintainability, this restructuring introduces the IComputeBinder interface, which provides a contract for binding a compute transformation. This allows custom implementations of the binder in case different computation strategies are needed.

IComputeBinder interface definition:

public interface IComputeBinder
{
    Expression BindCompute(ComputeTransformationNode computeTransformationNode, QueryBinderContext context);
}

The introduction of this interface decouples the ComputeBinder logic from a specific implementation, making it easier to extend or replace in future updates.

Steps in compute transformation binding

The process of transforming an OData compute expression into a LINQ query projection involves the following steps:

  1. Create a wrapper for computed properties
  • A wrapper object is created to encapsulate the computed properties.
  • This wrapper’s Instance property is initialized to the original source object, allowing access to existing properties referenced in the compute expression.
  • Each computed property is represented as a NamedPropertyExpression, which pairs the computed property alias with its corresponding expression.
  1. Generate a lambda expression for initialization
  • A lambda expression is dynamically constructed to initialize the wrapper with the appropriate property assignments.
  • This ensures that the computed properties are properly mapped when the query executes.

Wrapper implementation for compute transformation

To support the compute transformation, the wrapper class used for computed properties must:

  • Inherit from DynamicTypeWrapper to support dynamic property additions.
  • Implement IGroupByWrapper to maintain compatibility with $apply transformations.
  • Implement IComputeWrapper<T> to provide structured access to the source object and EDM model:
    public interface IComputeWrapper<T>
    {
        public T Instance { get; set; }
        public IEdmModel Model { get; set; }
    }
    This interface ensures that the computed property container retains a reference to the original entity while also being compatible with EDM-based queries.

The default implementation of the compute wrapper, ComputeWrapper<T>, extends GroupByWrapper and indirectly implements IGroupByWrapper by subclassing GroupByWrapper:

internal class ComputeWrapper<T> : GroupByWrapper, 
    IGroupByWrapper<AggregationPropertyContainer, GroupByWrapper>, 
    IComputeWrapper<T>, 
    IEdmEntityObject
{
    public T Instance { get; set; }
    public IEdmModel Model { get; set; }
}

This ensures that:

  • The computed properties integrate smoothly with OData's aggregation pipeline.
  • The wrapper behaves like an EDM entity where needed, though implementing IEdmEntityObject is not strictly required, as demonstrated in the TestComputeWrapper<T> used in tests.

Testing

Unit tests and E2E tests were added to verify the changes. In addition, custom implementations of IAggregationBinder - TestAggregationBinder and TestComputeBinder - were added to demonstrate how to custom implementations of IAggregationBinder and IComputeBinder can be created and injected into the DI container. In addition, E2E tests have been added to confirm that the default and custom implementations of IAggregationBinder and IComputeBinder both achieve the same result. The E2E tests use both an EF Core in-memory and SqlServer database.

Next Steps

  • Port to 9.x - Should be straightforward
  • Expose built-in wrapper and container types that are currently internal to make extensibility via inheritance possible - separate pull request

Update 1

Following this investigation, which revealed that EF Core 6.x and above can successfully translate queries involving navigation properties, deep projections, and includes within aggregation clauses, the IAggregationBinder interface has been refactored. Specifically, the API responsible for flattening referenced properties has been moved into a new IFlatteningBinder interface, which can now be optionally implemented:

public interface IFlatteningBinder
{
    AggregationFlatteningResult FlattenReferencedProperties(
        TransformationNode transformationNode,
        IQueryable query,
        QueryBinderContext context);
}

The default aggregation binder currently implements this new interface. Moving forward, we plan to gather more information about the providers our customers are using to determine whether query flattening should remain opt-in or be enabled by default. In the meantime, customers who wish to avoid the overhead of flattening can implement a custom aggregation binder that omits the IFlatteningBinder functionality.

Here is an example:

public class CustomAggregationBinder : QueryBinder, IAggregationBinder
{
    private readonly IAggregationBinder aggregationBinder = new AggregationBinder();

    public Expression BindGroupBy(TransformationNode transformationNode, QueryBinderContext context)
    {
        return aggregationBinder.BindGroupBy(transformationNode, context);
    }

    public Expression BindSelect(TransformationNode transformationNode, QueryBinderContext context)
    {
        return aggregationBinder.BindSelect(transformationNode, context);
    }
}

Alternatively, consumers may implement IAggregationBinder from scratch, without implementing IFlatteningBinder, for full control over the binding behavior.

Additional Details

Fixes OData/WebApi#2108
Fixes #1416

@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch 2 times, most recently from d738dd3 to 1e16908 Compare December 20, 2024 06:10
@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch 3 times, most recently from ddb1c83 to 5f1336c Compare February 5, 2025 07:06
@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch 2 times, most recently from cb8049f to 272220e Compare February 13, 2025 11:30
@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch 2 times, most recently from dbd2e61 to cb6895f Compare March 18, 2025 05:50
marabooy
marabooy previously approved these changes Mar 18, 2025
@habbes
Copy link
Contributor

habbes commented Mar 20, 2025

@gathogojr based on the description, I'm trying to understand the high-level flow of data, expressions and transformations and what/whether dependencies exist between them. From looking at the signature, it appears that the different steps (flattening, BingGroupBy, BindSelect) can be executed independently so long as their respective results are combined at the end in the correct order, is this correct?

Does the following diagram correctly capture this flow?

graph TD
    Input[Input: TransformationNode, Context]
    InputQueryable[Input: IQueryable data source]
    BindGroupBy(BindGroupBy)
    BindSelect(BindSelect)
    FlattenProperties(FlattenProperties)
    GroupByResult[GroupByExpression]
    SelectResult[SelectExpression]
    FlatteingResult[FlatteningExpression]
    CombineQuery("Compose(FlatteningExpression, GroupByExpression, SelectExpression)")
    FinalQuery[FinalQueryExpression]
    FinalResult[FinalResult]
    ApplyQuery("Apply(Expression, IQueryable)")
    Input --> BindGroupBy
    Input --> BindSelect
    Input --> FlattenProperties
    InputQueryable --> FlattenProperties
    BindGroupBy --> GroupByResult
    BindSelect --> SelectResult
    FlattenProperties --> FlatteingResult
    GroupByResult --> CombineQuery
    SelectResult --> CombineQuery
    FlatteingResult --> CombineQuery
    CombineQuery --> FinalQuery
    FinalQuery --> ApplyQuery
    InputQueryable --> ApplyQuery
    ApplyQuery --> FinalResult
Loading

@gathogojr
Copy link
Contributor Author

@gathogojr based on the description, I'm trying to understand the high-level flow of data, expressions and transformations and in what dependencies exist between them. From looking at the signature, it appears that the different steps (flattening, BingGroupBy, BindSelect) can be executed independently so long as their respective results are combined at the end in the correct order, is this correct?

Does the following diagram correctly capture this flow?

graph TD
    Input[Input: TransformationNode, Context]
    InputQueryable[Input: IQueryable data source]
    BindGroupBy(BindGroupBy)
    BindSelect(BindSelect)
    FlattenProperties(FlattenProperties)
    GroupByResult[GroupByExpression]
    SelectResult[SelectExpression]
    FlatteingResult[FlatteningExpression]
    CombineQuery("Compose(FlatteningExpression, GroupByExpression, SelectExpression)")
    FinalQuery[FinalQueryExpression]
    FinalResult[FinalResult]
    ApplyQuery("Apply(Expression, IQueryable)")
    Input --> BindGroupBy
    Input --> BindSelect
    Input --> FlattenProperties
    InputQueryable --> FlattenProperties
    BindGroupBy --> GroupByResult
    BindSelect --> SelectResult
    FlattenProperties --> FlatteingResult
    GroupByResult --> CombineQuery
    SelectResult --> CombineQuery
    FlatteingResult --> CombineQuery
    CombineQuery --> FinalQuery
    FinalQuery --> ApplyQuery
    InputQueryable --> ApplyQuery
    ApplyQuery --> FinalResult
Loading

@habbes The diagram doesn't quite capture it correctly, though it almost does. BindGroupBy and BindSelect are dependent on FlattenedReferencedProperties (if the method is called). After we call FlattenedReferencedProperties, the query source changes, and so does the current parameter (lambda parameter). If $it was of type Sale originally, it changes to type FlatteningWrapper<Sale>. In that case BindGroupBy needs to reference this redefined parameter

@habbes
Copy link
Contributor

habbes commented Mar 20, 2025

@gathogojr

@habbes The diagram doesn't quite capture it correctly, though it almost does. BindGroupBy and BindSelect are dependent on FlattenedReferencedProperties (if the method is called). After we call FlattenedReferencedProperties, the query source changes, and so does the current parameter (lambda parameter). If $it was of type Sale originally, it changes to type FlatteningWrapper<Sale>. In that case BindGroupBy needs to reference this redefined parameter

Thanks for the clarification. I tried to capture the dependency if BindGroupBy and BindSelect on FlattedeReferenceProperties in the Compose(FlattenedExpression, GroupByExpression, SelectExpression) node which aims to express the order in which the expressions are combined, i.e. that translates to SelectExpression(GroupByExpression(FlattenedExpression) and the resulting expression will be applied to the original IQueryable.

But are you saying that the expression returned by BindGroupBy and BindSelect will be different depending on the flattening method's result? This is not obvious from the method signatures since BindGroupBy and BindSelect signatures in the description only accept a TransformationNode (and context), they do not take an IQueryable like Flattened..., nor do they take an Expression, so it's not clear from the description that the result of Flattened... is passed to or influences Bind* methods. This is why in the diagram I only compose their resulting expressions after each method has been independently evaluated. Is this other information (from flattening process) propagated in the context?

@gathogojr
Copy link
Contributor Author

@gathogojr

@habbes The diagram doesn't quite capture it correctly, though it almost does. BindGroupBy and BindSelect are dependent on FlattenedReferencedProperties (if the method is called). After we call FlattenedReferencedProperties, the query source changes, and so does the current parameter (lambda parameter). If $it was of type Sale originally, it changes to type FlatteningWrapper<Sale>. In that case BindGroupBy needs to reference this redefined parameter

Thanks for the clarification. I tried to capture the dependency if BindGroupBy and BindSelect on FlattedeReferenceProperties in the Compose(FlattenedExpression, GroupByExpression, SelectExpression) node which aims to express the order in which the expressions are combined, i.e. that translates to SelectExpression(GroupByExpression(FlattenedExpression) and the resulting expression will be applied to the original IQueryable.

But are you saying that the expression returned by BindGroupBy and BindSelect will be different depending on the flattening method's result? This is not obvious from the method signatures since BindGroupBy and BindSelect signatures in the description only accept a TransformationNode (and context), they do not take an IQueryable like Flattened..., nor do they take an Expression, so it's not clear from the description that the result of Flattened... is passed to or influences Bind* methods. This is why in the diagram I only compose their resulting expressions after each method has been independently evaluated. Is this other information (from flattening process) propagated in the context?

Is this other information (from flattening process) propagated in the context?: Correct. The context parameter in particular

/// 3). Nested property with Next
/// 4). Nested property without Next.
/// However, Entity Framework doesn't allow to set different properties for the same type in two places in a lambda expression.
/// Using new type with just new name to workaround that issue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like:

s => new { "name" : s.Value } // where name is "some new name"

Copy link
Contributor

@habbes habbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have with the PR is whether we're exposing a rigid public interface that's designed with one primary implementation in mind (EF 6) and forces every implement of the interfaces to fit their customizations to match this paradigm. For example, it seems like we require custom implementations of IAggregationBinder to also flatten references properties. Is flattening properties a core part of aggregation support or is it mainly a workaround for inefficient queries produces by classic EF? Is this something that every implementer might contend with? Can they skip flattening if they wanted to? or do something post-flattening?

How customizable are the interfaces? How composable are they? Do we expect any customization beyond overriding a few methods the built-in default implementations?

I was wondering whether we could have a model where the aggregation expression translation is broken down into a pipeline of phases. But I'm not sure if this is feasible or if the way aggregations work lend itself naturally to a pipeline model. You could think about and see if it makes sense.

In the pipeline model, each phase or component would have a method that accepts TransformationNode, QueryBinderContext (and possibly IQueryable) and transform the context or return a result that the next phase can consume. The BindGroupBy would represent one phase, BindSelect another, and even flattening would be another phase. We could create a common interface to represent these phases, and the aggregation binder would execute the phases in the order they appear in the pipeline. So the default pipeline would [Flatten, BindGroupBy, BindSelect]. These built-in implementations could be public (and possibly overridable). The customer could customize the aggregation by introducing providing their own pipeline, they could derive a new pipeline from the default pipeline by inserting new phases, removing some, etc. Might require some additional work to validate the result to ensure it's still a valid aggregation expression (or assume the user knows what they're doing).

Again, not entirely sure if this works exactly here. It may also be that I misunderstand the current flow. But I'm just thinking of whether we're currently coupling and projecting internal workarounds into public APIs and making customizations rigid and clunky.

TContainer GroupByContainer { get; set; }

/// <summary>Gets or sets the property container that contains the aggregation properties.</summary>
TContainer Container { get; set; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this would do with a better name. E.g AggregationContainer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KenitoInc This is a change I’d prefer to avoid for two key reasons:

  1. Scope & Impact – Renaming Container would be a sweeping change affecting many areas, making it difficult to justify and review.
  2. Conceptual Clarity – The Container property is not limited to aggregation; it is also used in other contexts, such as ComputeBinder, where it stores computed properties. The same name is used in SelectExpand feature. Naming it AggregationContainer would introduce unnecessary ambiguity, making its broader purpose less clear.

For these reasons, keeping Container as-is maintains both consistency and semantic correctness.

/// </remakrs>
internal class AggregationPropertyContainer : NamedProperty<object>
/// </remarks>
internal class AggregationPropertyContainer : NamedProperty<object>, IAggregationPropertyContainer<GroupByWrapper, AggregationPropertyContainer>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this class internal? I think we should allow this class to be extensible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KenitoInc cc. @habbes I have explained in QueryBinder.ConvertToAggregationPropertyContainerIfNeeded method why the coupling between "Aggregation" and "SelectExpand" should be broken first to make it easier to publicly expose the default wrapper (GroupByWrapper, FlatteningWrapper<T>, ComputeWrapper, EntitySetAggregationWrapper, etc.) and container (AggregationPropertyContainer) types - to support extending via inheritance, like you're suggesting here: https://github.com/OData/AspNetCoreOData/pull/1378/files#diff-b13ea755c3b30ddde5b8c435f2f908c3328849ce8febbfda51375405df9e1d8eR1725

The coupling between the two features it through the AggregationPropertyContainer inheriting from the NamedProperty<T> used in the SelectExpand feature.
To rehash my points here...

  • The NamedProperty<T> type, which inherits from PropertyContainer, was designed primarily for SelectExpand scenarios.
  • It introduces a significant amount of logic that is not required for aggregation.
  • Aggregation functionality only depends on the Name and Value properties, as well as the ToDictionaryCore method.
  • The ToDictionaryCore method requires a boolean includeAutoSelected parameter, which is irrelevant in aggregation scenarios and is simply ignored.
  • The inheritance structure complicates making essential wrapper types (e.g., GroupByWrapper, FlatteningWrapper<T>) and container types (e.g., AggregationPropertyContainer) public.
  • Exposing these types as part of the public API is necessary to enable subclassing AggregationBinder, instead of forcing developers to implement IAggregationBinder from scratch.
  • Due to the coupling via NamedProperty<T> inheritance, making the AggregationPropertyContainer public would require exposing approximately 128 private types from PropertyContainer, which is impractical.
  • The current coupling doesn't give us much. You might notice that the TestAggregationPropertyContainer in the E2E tests project avoids the coupling and works perfectly fine.

To manage the change and mitigate risks, I proposed to expose the wrapper and container types public in a separate pull request. What this pull request achieves for now is to allow people to implement their own wrapper and container types to use with their custom implementations of aggregation binder.

}

private List<MemberAssignment> CreateSelectMemberAssigments(Type type, MemberExpression propertyAccessor,
IEnumerable<GroupByPropertyNode> properties)
private static Expression WrapDynamicCastIfNeeded(Expression propertyAccessExpression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this returns an Expression, it a good candidate for public virtual method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can look into the methods in AggregationBinder that can be extensible public virtual

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KenitoInc I’d prefer to defer making these methods public until we are ready to expose the default wrappers (e.g., GroupByWrapper, FlatteningWrapper<T>, ComputeWrapper<T>) and container implementations (e.g., AggregationPropertyContainer).

Reasons for Deferral:

  1. Dependency on Internal Types – Many of these methods rely on types that are not yet public. For example, while CreateEntitySetAggregateExpression could technically be exposed, it references other methods and types—such as EntitySetAggregationWrapper—that are currently internal.
  2. Incomplete Public API – Exposing these methods prematurely would create an inconsistent API, where some related types remain internal, making it difficult to extend or override the implementation cleanly.
  3. Planned Design Review – I plan to document and outline the necessary changes in a design proposal to ensure a well-thought-out approach before exposing the default wrapper and container types publicly. I mentioned this in the Next Steps section of the pull request description

For these reasons, making these methods public at this stage would be premature and introduce unnecessary complexity.

@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch from 5ae15b3 to b84a2aa Compare April 3, 2025 08:54
@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch from b84a2aa to 69cd5da Compare April 3, 2025 12:01
@gathogojr
Copy link
Contributor Author

One concern I have with the PR is whether we're exposing a rigid public interface that's designed with one primary implementation in mind (EF 6) and forces every implement of the interfaces to fit their customizations to match this paradigm. For example, it seems like we require custom implementations of IAggregationBinder to also flatten references properties. Is flattening properties a core part of aggregation support or is it mainly a workaround for inefficient queries produces by classic EF? Is this something that every implementer might contend with? Can they skip flattening if they wanted to? or do something post-flattening?

How customizable are the interfaces? How composable are they? Do we expect any customization beyond overriding a few methods the built-in default implementations?

I was wondering whether we could have a model where the aggregation expression translation is broken down into a pipeline of phases. But I'm not sure if this is feasible or if the way aggregations work lend itself naturally to a pipeline model. You could think about and see if it makes sense.

In the pipeline model, each phase or component would have a method that accepts TransformationNode, QueryBinderContext (and possibly IQueryable) and transform the context or return a result that the next phase can consume. The BindGroupBy would represent one phase, BindSelect another, and even flattening would be another phase. We could create a common interface to represent these phases, and the aggregation binder would execute the phases in the order they appear in the pipeline. So the default pipeline would [Flatten, BindGroupBy, BindSelect]. These built-in implementations could be public (and possibly overridable). The customer could customize the aggregation by introducing providing their own pipeline, they could derive a new pipeline from the default pipeline by inserting new phases, removing some, etc. Might require some additional work to validate the result to ensure it's still a valid aggregation expression (or assume the user knows what they're doing).

Again, not entirely sure if this works exactly here. It may also be that I misunderstand the current flow. But I'm just thinking of whether we're currently coupling and projecting internal workarounds into public APIs and making customizations rigid and clunky.

Here's my take...

we're exposing a rigid public interface that's designed with one primary implementation in mind (EF 6) and forces every implement of the interfaces to fit their customizations to match this paradigm.

I don't believe this is the case. Remember this is the implementation that we currently have out there and it's being applied against EF 6 and EF Core alike. We're not introducing anything new. Before this, we didn't even know that EF Core is able to generate efficient SQL queries even when the flattening process is skipped. But it's still the code we're running out there. I also mentioned that I don't consider to have done exhaustive investigation to determine conclusively that it's an unnecessary exercise when using EF Core. I don't particularly agree with the idea that the interface is rigid. If the flattening part is a big concern, we could remove it from the public API and retain BindGroupBy and BindSelect. The flattening could be an implementation detail. I wouldn't be strongly opposed to that. The FlattenReferenceProperties method could be a protected virtual method of the default AggregationBinder class that we can call from BindGroupBy method implementation. Would this placate your feeling about the API being rigid? cc @KenitoInc

Is flattening properties a core part of aggregation support or is it mainly a workaround for inefficient queries produces by classic EF?

Before this refactor to support extensibility, I don't believe a lot of the contributors knew that the mitigation against inefficient queries was only required for classic EF. We haven't even exhaustively investigated that.

Is this something that every implementer might contend with? Can they skip flattening if they wanted to?

If someone returns null from the FlattenReferenceProperties, it's perfectly fine. They don't really have to write the flattening logic

or do something post-flattening?

I personally don't know anything that can be done post-flattening. Are you suggesting we should have a post-flattening "hook" that the implementer can ride on to do something we don't know?

How customizable are the interfaces? How composable are they? Do we expect any customization beyond overriding a few methods the built-in default implementations?

Expound what you mean by interfaces being "customizable"? One can implement the interfaces from scratch and I have demonstrated that in the E2E tests. The E2E tests run against the default implementations of the interfaces alongside the custom implementations. The customizations I expect more of is not the interfaces being implemented from scratch but rather users wanting to override a few methods from the default implementations. I have mentioned in "Next Steps" that we'll expose the default wrapper and container types to support extensibility via inheritance.

I was wondering whether we could have a model where the aggregation expression translation is broken down into a pipeline of phases. But I'm not sure if this is feasible or if the way aggregations work lend itself naturally to a pipeline model. You could think about and see if it makes sense.

Isn't this what we have more or less? The ApplyBind method orchestrates the invocation of the different methods that are involved in the translation process. Other than for FlattenReferenceProperties that we can debate on, the other two steps of the translation are the phases involved in the translation - an aggregation involves grouping and projection which is what the BindGroupBy and BindSelect do. I'm however open to any concrete suggestions you might have on an alternative approach we could use to break down the different phases of the translation process, or if you have suggestions on how we could refine the current steps into smaller steps

In the pipeline model, each phase or component would have a method that accepts TransformationNode, QueryBinderContext (and possibly IQueryable) and transform the context or return a result that the next phase can consume. The BindGroupBy would represent one phase, BindSelect another, and even flattening would be another phase. We could create a common interface to represent these phases, and the aggregation binder would execute the phases in the order they appear in the pipeline. So the default pipeline would [Flatten, BindGroupBy, BindSelect]. These built-in implementations could be public (and possibly overridable). The customer could customize the aggregation by introducing providing their own pipeline, they could derive a new pipeline from the default pipeline by inserting new phases, removing some, etc. Might require some additional work to validate the result to ensure it's still a valid aggregation expression (or assume the user knows what they're doing).

I don't know how significantly different this proposal is to the current approach. Do you have something like the following in mind?

Core Interface

public interface IAggregationTranslator
{
    int Order { get; }
    Expression Process(IQueryable source, TransformationNode transformationNode, QueryBinderContext context);
}

Implement the interface in components

public class BindGroupBy : QueryBinder, IAggregationTranslator
{
    public int Order { get => return 20; }
    public IQueryable Process(IQueryable source, TransformationNode node, QueryBinderContext context)
    {
        // Implementation
    }
}

public class BindSelect : QueryBinder, IAggregationTranslator
{
    public int Order { get => return 30; }
    public IQueryable Process(IQueryable source, TransformationNode node, QueryBinderContext context)
    {
        // Implementation
    }
}

Optional flattening component

public int Order { get => return 10; }
public class FlattenReferencedProperties : QueryBinder, IAggregationTranslator
{
    public IQueryable Process(IQueryable source, TransformationNode node, QueryBinderContext context)
    {
        // Implementation
    }
}

Pipeline interface

public interface IAggregationPipeline
{
    IQueryable Execute(IQueryable source, TransformationNode transformationNode, QueryBinderContext context);
}

Pipeline implementation

public class AggregationPipeline
{
    private readonly List<IAggregationTranslator> components;

    public AggregationPipeline(IEnumerable<IAggregationTranslator> components)
    {
        this.components = components.ToList();
    }

    public IQueryable Execute(IQueryable source, TransformationNode node, QueryBinderContext context)
    {
        IQueryable current = source;

        foreach (var component in this.components.OrderBy(c => c.Order))
        {
            current = component.Process(current, node, context);
        }

        return current;
    }
}

Configure components as services
services.

services.AddTransient<IAggregationTranslator, FlattenReferencedProperties>();
services.AddTransient<IAggregationTranslator, BindGroupBy>();
services.AddTransient<IAggregationTranslator, BindSelect>();

services.AddTransient<IAggregationPipeline, AggregationPipeline>();
// .NET will inject the translator components into the constructor of the AggregationPipeline

Aggregation binder

public class AggregationBinder
{
    private readonly IAggregationPipeline pipeline;

    public AggregationBinder(IAggregationPipeline pipeline)
    {
        this.pipeline = pipeline;
    }

    public IQueryable Bind(IQueryable source, TransformationNode transformationNode, QueryBinderContext context)
    {
        return this.pipeline.Execute(source, transformationNode, context);
    }
}

ApplyBind

public static IQueryable ApplyBind(this AggregationBinder binder, IQueryable source, TransformationNode transformationNode, QueryBinderContext context, out Type resultClrType)
{
    context.EnsureFlattenedProperties();
    binder.Bind(source, transformationNode, context);
}

Or did you have something different in mind? Can you illustrate? Do you consider the pipeline approach as something we should strive for in the first iteration?

Again, not entirely sure if this works exactly here. It may also be that I misunderstand the current flow. But I'm just thinking of whether we're currently coupling and projecting internal workarounds into public APIs and making customizations rigid and clunky.

This is a first attempt at refactoring the aggregation feature to support extensibility. I'm sure there are many ways we can slice and dice it. We can improve iteratively based on actual customer feedback. If you noticed, the base branch is actually release-8.x. We'd have an opportunity to iterate over both minor and major releases and that iteration can be driven by actual customer feedback. Premature optimization is the root of all evil. Let's not sacrifice simplicity, clarity, flexibility, and correctness for unknown gains too early.
Also, "coupling and projecting internal workarounds" sounds like an unfair criticism of the effort. The flattening is only one single step of the process, and we can't classify it as an "internal" workaround since it's mitigating against a well-known issue affecting a publicly available EF (and possibly EF Core) implementations. I have made a proposal to not expose an API to support that in this first iteration.

@marabooy
Copy link
Member

marabooy commented Apr 4, 2025

@gathogojr One of the reasons I would avoid having the Pipelines at this stage and why we should spend more time to plan them out is some of the issues on knowing how they should be organized especially if their inputs and outputs can change the output or outcome which flattening can.

@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch from 69cd5da to 4ab4797 Compare April 4, 2025 08:37
@habbes
Copy link
Contributor

habbes commented Apr 7, 2025

@gathogojr @marabooy. I like your interpretation of what a pipeline could look like. But I'm okay with not going that route. I'm okay with having an interface that only exposes groupby and select methods, since they are fundamental parts of the aggregation, then flattening could be part of the default implementation that could be overridden or provided as some helper method.

@gathogojr gathogojr force-pushed the feature/aggregationbinder-restructure branch from 096fed7 to 85d3d3c Compare April 9, 2025 09:13
@gathogojr gathogojr merged commit e41f00b into OData:release-8.x Apr 11, 2025
2 checks passed
@gathogojr gathogojr deleted the feature/aggregationbinder-restructure branch April 11, 2025 06:54
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 11, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 11, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 14, 2025
gathogojr added a commit to gathogojr/AspNetCoreOData that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants