Skip to content

[pkg/ottl]: Add Sort converter #34283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Sep 5, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
07215d1
Add OTTL sort converter
kaisecheng Jul 27, 2024
04366f5
support mixed of numeric types to sort as double
kaisecheng Jul 27, 2024
e2b90e1
change log
kaisecheng Jul 28, 2024
73ecb89
add support to array type
kaisecheng Jul 29, 2024
dab7c86
Update pkg/ottl/ottlfuncs/func_sort.go
kaisecheng Jul 29, 2024
2e36574
- add support to pcommon.Value
kaisecheng Jul 29, 2024
d43e95f
update doc
kaisecheng Jul 29, 2024
447df3e
lint
kaisecheng Jul 30, 2024
028744e
tidy
kaisecheng Jul 30, 2024
1eb08f3
- preserve the data types in the sort result
kaisecheng Jul 30, 2024
a0d9791
update doc
kaisecheng Jul 30, 2024
d676f1a
fix unit test
kaisecheng Jul 31, 2024
4568ca0
Merge branch 'main' into ottl_sort_func
kaisecheng Jul 31, 2024
d166048
preserve the data type for boolean array
kaisecheng Jul 31, 2024
524e546
rename
kaisecheng Jul 31, 2024
8f2acd1
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
kaisecheng Aug 1, 2024
a13bdbc
fix CI codegen
kaisecheng Aug 1, 2024
8c0e371
Merge branch 'main' into ottl_sort_func
kaisecheng Aug 2, 2024
1bb3d86
Update pkg/ottl/ottlfuncs/README.md
kaisecheng Aug 2, 2024
2894981
Update pkg/ottl/ottlfuncs/func_sort.go
kaisecheng Aug 2, 2024
9c22e38
Update pkg/ottl/ottlfuncs/README.md
kaisecheng Aug 2, 2024
660f4fc
Update pkg/ottl/ottlfuncs/func_sort.go
kaisecheng Aug 2, 2024
3325267
remove multierr dependency
kaisecheng Aug 2, 2024
0a9b019
Merge branch 'main' into ottl_sort_func
kaisecheng Aug 2, 2024
b2f8062
Merge branch 'main' into ottl_sort_func
kaisecheng Aug 8, 2024
6e0e66d
Merge branch 'main' into ottl_sort_func
kaisecheng Aug 8, 2024
f2145f3
add more e2e tests
kaisecheng Aug 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions .chloggen/ottl_sort_func.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/ottl

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add `Sort` function to sort array to ascending order or descending order

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [34200]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
18 changes: 18 additions & 0 deletions pkg/ottl/e2e/e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -619,6 +619,24 @@ func Test_e2e_converters(t *testing.T) {
tCtx.GetLogRecord().Attributes().PutStr("test", "d74ff0ee8da3b9806b18c877dbf29bbde50b5bd8e4dad7a3a725000feb82e8f1")
},
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for a mixed-type slice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty complex function, I'd like to see more e2e tests that cover the different type scenarios it supports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more e2e test covering unit types and mixed types

statement: `set(attributes["test"], Sort(Split(attributes["flags"], "|"), "desc"))`,
want: func(tCtx ottllog.TransformContext) {
s := tCtx.GetLogRecord().Attributes().PutEmptySlice("test")
s.AppendEmpty().SetStr("C")
s.AppendEmpty().SetStr("B")
s.AppendEmpty().SetStr("A")
},
},
{
statement: `set(attributes["test"], Sort([Int(11), Double(2.2), Double(-1)]))`,
want: func(tCtx ottllog.TransformContext) {
s := tCtx.GetLogRecord().Attributes().PutEmptySlice("test")
s.AppendEmpty().SetDouble(-1)
s.AppendEmpty().SetDouble(2.2)
s.AppendEmpty().SetDouble(11)
},
},
{
statement: `set(span_id, SpanID(0x0000000000000000))`,
want: func(tCtx ottllog.TransformContext) {
Expand Down
25 changes: 25 additions & 0 deletions pkg/ottl/ottlfuncs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -446,6 +446,7 @@ Available Converters:
- [Seconds](#seconds)
- [SHA1](#sha1)
- [SHA256](#sha256)
- [Sort](#sort)
- [SpanID](#spanid)
- [Split](#split)
- [String](#string)
Expand Down Expand Up @@ -1208,6 +1209,30 @@ Examples:

**Note:** According to the National Institute of Standards and Technology (NIST), SHA256 is no longer a recommended hash function. It should be avoided except when required for compatibility. New uses should prefer FNV whenever possible.

### Sort

`Sort(target, Optional[order])`

The `Sort` Converter sorts the `target` array in ascending or descending order.

`target` is a `pcommon.Slice` type field. `order` is a string that must be one of `asc` or `desc`. The default `order` is `asc`.

If elements in `target` are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on permanently converting the types of values in heterogeneous arrays vs. just converting the types for comparison? I think it would be fairly straightforward to only convert the types within the sort function and leave the actual type in the array. I can see arguments for both sides here, but want to make sure we're considering our options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see changing the types permanently and sorting a list as two distinct intentions. Both are reasonable actions in data processing. Here, we provide better flexibility to users. If they want to change the types permanently, they can simply overwrite the attributes by using set(). Otherwise, they will still have the original array on their hands

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks. If we see changing the types and sorting as separate actions, then shouldn't we keep the original types of the array elements when sorting? Right now we do:

[true, 1, 2.3, "str"] --> ["1", "2.3", "str", "true"]

When I think it would be possible to do:

[true, 1, 2.3, "str"] --> [1, 2.3, "str", true]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Yes, preserving the types may be more intuitive. I thought keeping the sort result as a string to make it clear that the comparison is done as a string. It sounds good to maintain the original type.


- Integers: Returns a sorted array of integers.
- Doubles: Returns a sorted array of doubles.
- Integers and doubles: Converts to doubles and returns a sorted array of doubles.
- Strings: Returns a sorted array of strings.
- Booleans: Converts to strings and returns a sorted array of strings.
- Integers, doubles, booleans, and strings: Converts to strings and returns a sorted array of strings.
- Other types: Returns the `target` unchanged.

Examples:

- `Sort(attributes["device.tags"])`

- `Sort(attributes["device.tags"], "desc")`

### SpanID

`SpanID(bytes)`
Expand Down
219 changes: 219 additions & 0 deletions pkg/ottl/ottlfuncs/func_sort.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,219 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package ottlfuncs // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/ottlfuncs"

import (
"cmp"
"context"
"fmt"
"slices"
"strconv"

"go.opentelemetry.io/collector/pdata/pcommon"

"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl"
)

const (
sortAsc = "asc"
sortDesc = "desc"
)

type SortArguments[K any] struct {
Target ottl.Getter[K]
Order ottl.Optional[string]
}

func NewSortFactory[K any]() ottl.Factory[K] {
return ottl.NewFactory("Sort", &SortArguments[K]{}, createSortFunction[K])
}

func createSortFunction[K any](_ ottl.FunctionContext, oArgs ottl.Arguments) (ottl.ExprFunc[K], error) {
args, ok := oArgs.(*SortArguments[K])

if !ok {
return nil, fmt.Errorf("SortFactory args must be of type *SortArguments[K]")
}

order := sortAsc
if !args.Order.IsEmpty() {
o := args.Order.Get()
switch o {
case sortAsc, sortDesc:
order = o
default:
return nil, fmt.Errorf("invalid arguments: %s. Order should be either \"%s\" or \"%s\"", o, sortAsc, sortDesc)
}
}

return Sort(args.Target, order)
}

func Sort[K any](target ottl.Getter[K], order string) (ottl.ExprFunc[K], error) {
return func(ctx context.Context, tCtx K) (any, error) {
val, err := target.Get(ctx, tCtx)
if err != nil {
return nil, err
}

if order != sortAsc && order != sortDesc {
return nil, fmt.Errorf("invalid arguments: %s. Order should be either \"%s\" or \"%s\"", order, sortAsc, sortDesc)
}

switch v := val.(type) {
case pcommon.Slice:
return sortSlice(v, order)
case []any:
// handle Sort([1,2,3])
slice := pcommon.NewValueSlice().SetEmptySlice()
if err := slice.FromRaw(v); err != nil {
return v, nil
}
return sortSlice(slice, order)
case []string:
// handle value from Split()
dup := makeCopy(v)
return sortTypedSlice(dup, order), nil
case []int64:
dup := makeCopy(v)
return sortTypedSlice(dup, order), nil
case []float64:
dup := makeCopy(v)
return sortTypedSlice(dup, order), nil
case []bool:
var arr []string
for _, b := range v {
arr = append(arr, strconv.FormatBool(b))
}
return sortTypedSlice(arr, order), nil
default:
return v, nil
}
}, nil
}

// sortSlice sorts a pcommon.Slice based on the specified order.
// It gets the common type for all elements in the slice and converts all elements to this common type, creating a new copy
// Parameters:
// - slice: The pcommon.Slice to be sorted
// - order: The sort order. "asc" for ascending, "desc" for descending
//
// Returns:
// - A sorted slice as []any or the original pcommon.Slice
// - An error if an unsupported type is encountered
func sortSlice(slice pcommon.Slice, order string) (any, error) {
length := slice.Len()
if length == 0 {
return slice, nil
}

commonType, ok := findCommonValueType(slice)
if !ok {
return slice, nil
}

switch commonType {
case pcommon.ValueTypeInt:
arr := makeTypedCopy(length, func(idx int) int64 {
return slice.At(idx).Int()
})
return sortTypedSlice(arr, order), nil
case pcommon.ValueTypeDouble:
arr := makeTypedCopy(length, func(idx int) float64 {
s := slice.At(idx)
if s.Type() == pcommon.ValueTypeInt {
return float64(s.Int())
}

return s.Double()
})
return sortTypedSlice(arr, order), nil
case pcommon.ValueTypeStr:
arr := makeTypedCopy(length, func(idx int) string {
return slice.At(idx).AsString()
})
return sortTypedSlice(arr, order), nil
default:
return nil, fmt.Errorf("sort with unsupported type: '%T'", commonType)
}
}

type TargetType interface {
~int64 | ~float64 | ~string
}

// findCommonValueType determines the most appropriate common type for all elements in a pcommon.Slice.
// It returns two values:
// - A pcommon.ValueType representing the desired common type for all elements.
// Mixed Numeric types return ValueTypeDouble. Integer type returns ValueTypeInt. Double type returns ValueTypeDouble.
// String, Bool, Empty and mixed of the mentioned types return ValueTypeStr, as they require string conversion for comparison.
// - A boolean indicating whether a common type could be determined (true) or not (false).
// returns false for ValueTypeMap, ValueTypeSlice and ValueTypeBytes. They are unsupported types for sort.
func findCommonValueType(slice pcommon.Slice) (pcommon.ValueType, bool) {
length := slice.Len()
if length == 0 {
return pcommon.ValueTypeEmpty, false
}

wantType := slice.At(0).Type()
wantStr := false
wantDouble := false

for i := 0; i < length; i++ {
value := slice.At(i)
currType := value.Type()

switch currType {
case pcommon.ValueTypeInt:
if wantType == pcommon.ValueTypeDouble {
wantDouble = true
}
case pcommon.ValueTypeDouble:
if wantType == pcommon.ValueTypeInt {
wantDouble = true
}
case pcommon.ValueTypeStr, pcommon.ValueTypeBool, pcommon.ValueTypeEmpty:
wantStr = true
default:
return pcommon.ValueTypeEmpty, false
}
}

if wantStr {
wantType = pcommon.ValueTypeStr
} else if wantDouble {
wantType = pcommon.ValueTypeDouble
}

return wantType, true
}

func makeTypedCopy[T TargetType](length int, converter func(idx int) T) []T {
var arr []T
for i := 0; i < length; i++ {
arr = append(arr, converter(i))
}
return arr
}

func makeCopy[T TargetType](src []T) []T {
dup := make([]T, len(src))
copy(dup, src)
return dup
}

func sortTypedSlice[T TargetType](arr []T, order string) []T {
if len(arr) == 0 {
return arr
}

slices.SortFunc(arr, func(a, b T) int {
if order == sortDesc {
return cmp.Compare(b, a)
}
return cmp.Compare(a, b)
})

return arr
}
Loading