Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for 2 semconv registries #627

Merged
merged 24 commits into from
Apr 10, 2025

Conversation

lquerel
Copy link
Contributor

@lquerel lquerel commented Mar 3, 2025

This PR closes #604.

With this PR, it’s now possible to create a custom SemConv registry that depends on another registry (e.g., the official OTEL SemConv registry). Currently, only a single-level dependency is supported, meaning only two registries can be merged and resolved automatically by Weaver. This PR is an initial step toward full support for multiple registries without these limitations.

For example, let’s imagine a custom registry named acme referencing the OTEL registry. The following registry_manifest.yaml file should be defined in the directory containing the custom acme registry.

name: acme
description: This registry contains the semantic conventions for the Acme vendor.
semconv_version: 0.1.0
schema_base_url: https://acme.com/schemas/
dependencies:
  - name: otel
    registry_path: [data/multi-registry/otel_registry](https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.31.0.zip[model])

The registry_path can be any valid registry path already supported by Weaver.

The definition of the custom registry acme is defined in the following yaml file:

groups:
  - id: shared.attributes
    type: attribute_group
    brief: Attributes shared between our signals.
    attributes:
      - id: auction.id
        type: int
        brief: >
          The id of the auction.
        stability: stable
      - id: auction.name
        type: string
        brief: >
          The name of the auction
        stability: stable
        examples: ["Fish for sale"]

  - id: metric.auction.bid.count
    type: metric
    metric_name: auction.bid.count
    stability: stable
    brief: "Count of all bids we've seen"
    instrument: counter
    unit: "{bid}"
    attributes:
      - ref: auction.id
        requirement_level: required
      - ref: auction.name
        requirement_level: recommended
      - ref: error.type
        requirement_level: required

The error.type attribute is not defined locally, so Weaver will resolve it using the imported registry.

Notes for reviewers:

  • I’ve extracted the concept of VirtualDirectory from RegistryRepo in order to (1) avoid a circular reference caused by the registry dependencies feature and (2) prepare for supporting virtual directories for templates and policies in a future PR.
  • The include_unreferenced flag has been added to support the use case mentioned by Jeremy. This flag defaults to false.

Copy link

codecov bot commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 27.08333% with 35 lines in your changes missing coverage. Please review.

Project coverage is 73.6%. Comparing base (8d65bd8) to head (dcb2462).

Files with missing lines Patch % Lines
crates/weaver_resolver/src/lib.rs 19.4% 29 Missing ⚠️
crates/weaver_semconv/src/registry_path.rs 0.0% 4 Missing ⚠️
crates/weaver_semconv_gen/src/lib.rs 71.4% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #627     +/-   ##
=======================================
- Coverage   74.2%   73.6%   -0.6%     
=======================================
  Files         57      57             
  Lines       4578    4612     +34     
=======================================
- Hits        3399    3398      -1     
- Misses      1179    1214     +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lquerel lquerel changed the title [WIP] - Support for 2 semconv registries Support for 2 semconv registries Mar 26, 2025
@lquerel lquerel marked this pull request as ready for review March 26, 2025 00:25
@lquerel lquerel requested a review from a team as a code owner March 26, 2025 00:25
@jerbly
Copy link
Contributor

jerbly commented Apr 5, 2025

This is really good. I tried this on my own company registry where we purposefully want to add attributes into the otel defined namespaces. I got a few of these:

The group id `registry.aws.s3` is declared multiple times in the following locations:
  │ [Provenance { registry_id: "otel", path: "https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.31.0.zip[model]/aws/
  │ registry.yaml" }, Provenance { registry_id: "main", path: "model/registry/aws.yml" }]

Perhaps we need a merge mode or maybe just group-id naming guidance? Or perhaps a mode where all groups are prepended with the name field in the registry_manifest, making: acme.aws.s3 and otel.aws.s3. I'm not sure.

@jerbly
Copy link
Contributor

jerbly commented Apr 5, 2025

Another usability thing I ran into, not introduced in this PR though, is the discovery of the registry_manifest.yaml. Model files are discovered by searching from the path given at the command line. However, the manifest file has to be at the root of the given path. Perhaps the manifest could be discovered in the same way?

Regardless of the above, if the manifest is NOT found it would be useful to have a log line stating it has not been found to help debug. Perhaps I misspelled it or used .yml. Manifest file not found at location ...

@lquerel
Copy link
Contributor Author

lquerel commented Apr 9, 2025

This is really good. I tried this on my own company registry where we purposefully want to add attributes into the otel defined namespaces. I got a few of these:

The group id `registry.aws.s3` is declared multiple times in the following locations:
  │ [Provenance { registry_id: "otel", path: "https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.31.0.zip[model]/aws/
  │ registry.yaml" }, Provenance { registry_id: "main", path: "model/registry/aws.yml" }]

Perhaps we need a merge mode or maybe just group-id naming guidance? Or perhaps a mode where all groups are prepended with the name field in the registry_manifest, making: acme.aws.s3 and otel.aws.s3. I'm not sure.

I suggest we discuss this during our next SIG meeting.

@lquerel
Copy link
Contributor Author

lquerel commented Apr 9, 2025

Another usability thing I ran into, not introduced in this PR though, is the discovery of the registry_manifest.yaml. Model files are discovered by searching from the path given at the command line. However, the manifest file has to be at the root of the given path. Perhaps the manifest could be discovered in the same way?

Regardless of the above, if the manifest is NOT found it would be useful to have a log line stating it has not been found to help debug. Perhaps I misspelled it or used .yml. Manifest file not found at location ...

@jerbly Could you create a GitHub issue to add the warning to improve user experience? Regarding the recursive search for the manifest file, I’m less sure about it, as it raises the problem of potentially finding multiple manifests and deciding how to handle that. For now, I think it’s best to keep it simple until we have a clear need for that type of functionality.

@lquerel lquerel added enhancement New feature or request ux User experience enhancement labels Apr 9, 2025
@lquerel lquerel self-assigned this Apr 9, 2025
@lquerel lquerel moved this to Next Release in OTel Weaver Project Apr 9, 2025
@lquerel lquerel moved this from Next Release to To consider for the next release in OTel Weaver Project Apr 9, 2025
@jerbly
Copy link
Contributor

jerbly commented Apr 9, 2025

@jerbly Could you create a GitHub issue to add the warning to improve user experience?

See #691

@jsuereth jsuereth moved this from To consider for the next release to Next Release in OTel Weaver Project Apr 9, 2025
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to go now with the default change and extra config.

/// Garbage collect all the signals and attributes not defined or referenced in the
/// current registry, i.e. telemetry objects only defined in a dependency and not
/// referenced in the current registry.
fn gc_unreferenced_objects(manifest: Option<&RegistryManifest>, registry: &mut Registry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"garbage collect" :)

I like keeping this concise, easy to read/maintain.

@lquerel lquerel merged commit 17d6609 into open-telemetry:main Apr 10, 2025
24 checks passed
@github-project-automation github-project-automation bot moved this from Next Release to Done in OTel Weaver Project Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ux User experience enhancement
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Support for 2 semconv registries
3 participants