Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ES][v2] Embed tagDotReplacement in ToDBModel #6946

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Manik2708
Copy link
Contributor

Which problem is this PR solving?

Description of the changes

  • Embed tagDotReplacement in ToDBModel

How was this change tested?

  • Unit Tests

Checklist

@Manik2708 Manik2708 requested a review from a team as a code owner March 30, 2025 13:58
@Manik2708 Manik2708 requested a review from jkowall March 30, 2025 13:58
Copy link

codecov bot commented Mar 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.12%. Comparing base (bc586e3) to head (3ad31dc).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6946      +/-   ##
==========================================
+ Coverage   95.95%   96.12%   +0.17%     
==========================================
  Files         346      347       +1     
  Lines       20407    20378      -29     
==========================================
+ Hits        19581    19588       +7     
+ Misses        622      595      -27     
+ Partials      204      195       -9     
Flag Coverage Δ
badger_v1 10.53% <ø> (ø)
badger_v2 2.18% <ø> (ø)
cassandra-4.x-v1-manual 15.84% <ø> (ø)
cassandra-4.x-v2-auto 2.17% <ø> (ø)
cassandra-4.x-v2-manual 2.17% <ø> (ø)
cassandra-5.x-v1-manual 15.84% <ø> (ø)
cassandra-5.x-v2-auto 2.17% <ø> (ø)
cassandra-5.x-v2-manual 2.17% <ø> (ø)
elasticsearch-6.x-v1 20.76% <ø> (ø)
elasticsearch-7.x-v1 20.84% <ø> (ø)
elasticsearch-8.x-v1 21.02% <ø> (ø)
elasticsearch-8.x-v2 2.18% <ø> (-0.12%) ⬇️
grpc_v1 11.61% <ø> (ø)
grpc_v2 8.47% <ø> (ø)
kafka-3.x-v1 10.82% <ø> (ø)
kafka-3.x-v2 2.18% <ø> (ø)
memory_v2 2.18% <ø> (ø)
opensearch-1.x-v1 20.89% <ø> (ø)
opensearch-2.x-v1 20.89% <ø> (ø)
opensearch-2.x-v2 2.18% <ø> (ø)
tailsampling-processor 0.59% <ø> (ø)
unittests 94.91% <100.00%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

}
return dest
}

func attributeToDbTag(key string, attr pcommon.Value) dbmodel.KeyValue {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to tagAppender.go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Logs: spanEventsToDbSpanLogs(span.Events()),
Process: process,
}
}

func getDbSpanTags(span ptrace.Span, scope pcommon.InstrumentationScope) []dbmodel.KeyValue {
var spanKindTag, statusCodeTag, statusMsgTag dbmodel.KeyValue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to do this is a trial to clean the code a bit! Also to embed tagDotReplacement

}, true
}

func getTagFromStatusCode(statusCode ptrace.StatusCode) (dbmodel.KeyValue, bool) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This from L282-L347 is a part of tag appender now!

@@ -255,96 +219,6 @@ func spanEventsToDbSpanLogs(events ptrace.SpanEventSlice) []dbmodel.Log {
return logs
}

func getTagFromSpanKind(spanKind ptrace.SpanKind) (dbmodel.KeyValue, bool) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also moved tagAppender.go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of these changes are necessary or beneficial. Tag appender (bad name) should have just a single function - take a collection of already converted tags (converted in this translator) and materialize all/some of them to the top-level fields.

@Manik2708
Copy link
Contributor Author

@yurishkuro This is a try to separate tag appending from the ToDBModel and embed tagDotReplacement in the ToDBModel. Please review!

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the example where snapshot testing would be very helpful, maybe spend a day to build those tests first.

@Manik2708
Copy link
Contributor Author

Manik2708 commented Mar 31, 2025

this is the example where snapshot testing would be very helpful, maybe spend a day to build those tests first.

@yurishkuro I am able to generate or reuse the db spans fixtures but I don't have any idea to serialize-deserialize ptrace from json fixtures. Is there anything provided by OTEL? Or we need to use txt fixtures which are used in goldendataset
I have tested the already existing fixture in the following way:

func TestToDbModel_Fixtures(t *testing.T) {
	dbStr, err := os.ReadFile("../../../elasticsearch/dbmodel/fixtures/es_01.json")
	require.NoError(t, err)
	var span dbmodel.Span
	err = json.Unmarshal(dbStr, &span)
	require.NoError(t, err)
	td, err := FromDBModel([]dbmodel.Span{span})
	require.NoError(t, err)
	spans := ToDBModel(td)
	assert.Equal(t, span, spans[0])
}

Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Manik2708 <[email protected]>
@Manik2708 Manik2708 requested a review from yurishkuro April 4, 2025 08:56
expectedTd, err := unmarshaller.UnmarshalTraces(tracesData)
require.NoError(t, err)
dotReplacement := "#"
toDb := NewToDBModel(tt.allTagsAsFields, tt.tagKeysAsFields, dotReplacement)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I have skipped roundtrip because tagDotReplacement in FromDBModel will be implemented in next PR!

Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Manik2708 <[email protected]>
@Manik2708
Copy link
Contributor Author

@yurishkuro Please review this PR, after this PR, similar modification will be done in FromDBModel and we will get rid of es_01.json

Comment on lines +189 to +192
tracesData := loadTraces(t, 1)
unmarshaller := ptrace.JSONUnmarshaler{}
expectedTd, err := unmarshaller.UnmarshalTraces(tracesData)
require.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have a function called loadTraces why doesn't it unmarshal and return ptrace model directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing this but I think it is better to return []bytes rather than marshalling because we are asserting bytes not the objects. Asserting objects needs more complexity as we need to pass a copy of spans to to FromModel because it might manipulate the span (in fact it is manupilating). So loadTraces is a part of loadFixtures, for which we need byte data not the objects.

@Manik2708 Manik2708 requested a review from yurishkuro April 5, 2025 21:17
@Manik2708
Copy link
Contributor Author

@yurishkuro Can you please review this PR as it is a blocker for the next PR (embedding tagDotReplacement in FromDBModel)

Comment on lines +32 to +33
toDb := newToDBModel(false, nil, ".")
modelSpan := toDb.spanToDbSpan(span, spanScope, dbmodel.Process{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this calling private methods instead of the main entry point?

"github.com/jaegertracing/jaeger/internal/storage/elasticsearch/dbmodel"
)

// tagAppender append tags to dbmodel KeyValue slice and tagsMap by replacing dots with tagDotReplacement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first, this comment is practically useless, it tells what the actual implementation code is doing (which I can see by looking at the code) instead of telling what business function it performs and why.

second, the struct is doing way more than that, which is why I asked you to write a definition in the comment. Its primary objective is to convert tags into top-level object fields in ES span, so that they get indexed more efficiently (that is what the definition is supposed to say). This objective has nothing to do with understanding all the different flavors of the tags like span kind, status code, etc. - all of that is meant to be handled by the transformer. In the v1 code the whole concept of "tag appender" used to be like a single function (it wasn't even a struct), so why is it necessary to mix it up with so much unrelated functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your point on mixing span kind, status code etc but the difference in v1 and v2 is that there is more conversion logic present in v2. I also thought initially to create a method of toDBModel but then we had to pass map[string]any and []dbmodel.KeyValue everytime when we call the method along with key and value. This is what is happening in v1 but in v2 we had to use this method more frequently, take an example:

tag := make(map[string]any)
tags := make([]dbmodel.KeyValue)
kindStr := convertToStringSpanKind(span.Kind())
if kindStr != "" {
  t.appendTag(model.SpanKindKey, pcommon.NewStrValue(kindStr), tag, tags)
}
// Similar with other tags
return tag, tags

To reduce this redundant passing of parameter, I wrapped it inside a struct. We could move to the v1 way also, would require your suggestion here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code that serves different problems needs to be kept separately. You already have implementation of translating tags / attributes between OTLP and DB, which is part of the overall converter - logical organization, why change that? Separately there's an additional capability that can materialize nested tags into top-level tags in ES object, for improved indexing in ES. That's a separate functionality that doesn't need to know how the tags are translated between OTLP and DB, it just needs to apply after that translation and materialize some of the tags

Copy link
Contributor Author

@Manik2708 Manik2708 Apr 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can seperate the tag conversion and materialization once the confusion stated in #6946 (comment) is resolved! If we have to move by accepting any value (which might lead to "1234" instead of 1234 when spans are converted to OTEL traces) then we can do it but if not (means we have to follow the v1 constraint) then we have to differentiate that is we should be materializing along with conversion otherwise every value in tag map will be a string or we have to do an exra conversion to convert them to their original type!

@@ -0,0 +1,82 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the convention in the fixtures is that es_01 corresponds to otel_01, es_02 to otel_02, etc. Don't break this convention by introducing arbitrary naming scheme. If you have two different flavors of ES span (e.g. one with materialized fields another without) then they should still be named es_01_{suffix}

// Returns slice of translated DB Spans and error if translation failed.
func ToDBModel(td ptrace.Traces) []dbmodel.Span {
func (t toDBModel) convertToDBModel(td ptrace.Traces) []dbmodel.Span {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (t toDBModel) convertToDBModel(td ptrace.Traces) []dbmodel.Span {
func (t *toDBModel) convertToDBModel(td ptrace.Traces) []dbmodel.Span {

Comment on lines +38 to +39
func newToDBModel(allTagsAsObject bool, tagKeysAsFields map[string]bool, tagDotReplacement string) toDBModel {
return toDBModel{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func newToDBModel(allTagsAsObject bool, tagKeysAsFields map[string]bool, tagDotReplacement string) toDBModel {
return toDBModel{
func newToDBModel(allTagsAsObject bool, tagKeysAsFields map[string]bool, tagDotReplacement string) *toDBModel {
return &toDBModel{

}
return dest
}

func attributeToDbTag(key string, attr pcommon.Value) dbmodel.KeyValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

@@ -255,96 +219,6 @@ func spanEventsToDbSpanLogs(events ptrace.SpanEventSlice) []dbmodel.Log {
return logs
}

func getTagFromSpanKind(spanKind ptrace.SpanKind) (dbmodel.KeyValue, bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of these changes are necessary or beneficial. Tag appender (bad name) should have just a single function - take a collection of already converted tags (converted in this translator) and materialize all/some of them to the top-level fields.

@Manik2708
Copy link
Contributor Author

#6946 (comment) @yurishkuro There is a problem in this approach, if we will firstly convert all the tags to db tag and then materialize them to top-level fields then the value of dbmodel.KeyValue is always put as string (this is what is being done in v1) whereas in tag map it can be anything. For example when 25 is saved in dbmodel.KeyValue it is stored as "25" in tags but is stored as 25 in tag map (I did this here because we don't want to change the db level span in v2). If we will employ this approach then we will need to convert strings back to their values to put into tag map.

@yurishkuro
Copy link
Member

in the DB model we have Tags []KeyValue and

type KeyValue struct {
        Key   string    `json:"key"`
        Type  ValueType `json:"type,omitempty"`
        Value any       `json:"value"`
}

The Value here is not string, similar to Tag map[string]any. So I don't see a discrepancy - materializing Tags -> Tag should be lossless.

@Manik2708
Copy link
Contributor Author

in the DB model we have Tags []KeyValue and

type KeyValue struct {
        Key   string    `json:"key"`
        Type  ValueType `json:"type,omitempty"`
        Value any       `json:"value"`
}

The Value here is not string, similar to Tag map[string]any. So I don't see a discrepancy - materializing Tags -> Tag should be lossless.

@yurishkuro That exactly what even I thought but in v1, every value is string in snapshot tests, also please see this:


Every value which is stored in db tag is string but in tag map, it is like this:
tagsMap[strings.ReplaceAll(kv.Key, ".", fd.tagDotReplacement)] = kv.Value()

So how should we proceed in v2? As we have to think of backward-compatibilty also as when converting db spans back to OTEL traces, we have to think whether the value is string or any!

@yurishkuro
Copy link
Member

I don't know why v1 converter uses AsString(), I believe I even added a TODO asking about that specifically. But we don't have to blindly replicate v1 converter behavior, I think it should be capturing Value() regardless of where the tag is stored. It's especially important for numeric fields as storing raw value means ES queries can be made against such numeric field, e.g. computing some stats.

The only price we'd pay for using Value() is that the reverse translation (db->otlp) will have to be able to deal with the stored value being either a raw value or a string.

BTW, using Value() directly is not always correct because if the type is Binary it should be encoded somehow. There may be other limitations of using raw values, e.g. whole numbers in JS are limited to 53bits, so we already have some special handing for those when returning to UI to avoid losing precision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants