feat(azure_logs_ingestion): Initial azure_logs_ingestion sink #22912

Open
wants to merge 24 commits into
base: master

jlaundry

Summary

The current azure_monitor_logs sink uses the Data Collector API, which has been deprecated and will be removed in September 2026.

This sink uses the replacement Logs Ingestion API.

While I did consider making this a drop-in replacement for the existing sink, users need to make numerous breaking infrastructure changes, including:

  • Creating new Data Collection Endpoint and Data Collection Rule resources
  • Moving from a workspace-based secret key to an OAuth credential (App Registration, Managed Identity, etc.)
  • (optionally) Re-configuring logs to use the built-in tables, instead of _CL custom tables.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

  1. Following the Tutorial steps, create a Log Analytics workspace, App Registration, Data Collection Endpoint, and Data Collection Rule
  2. Set the AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET environment variables from the App Registration
  3. Use the following vector.yaml:
sources:
  stdin:
    type: stdin

sinks:
  azure:
    type: azure_logs_ingestion
    inputs:
      - stdin
    endpoint: https://dce-e42z.westus2-1.ingest.monitor.azure.com
    dcr_immutable_id: dcr-00000000000000000000000000000000
    stream_name: Custom-vector_CL
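
Step 2 above relies on three environment variables being set before Vector starts. As a minimal sketch (not code from this PR), a pre-flight check for them could look like the following. The helper name `missing_credential_vars` is hypothetical, and the lookup is passed in as a closure so the check can be exercised without touching the real environment.

```rust
/// The three variables named in the test steps above.
const REQUIRED_VARS: [&str; 3] = ["AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET"];

/// Hypothetical helper: returns the required variables that are unset or empty.
/// `lookup` is typically `|k| std::env::var(k).ok()`.
fn missing_credential_vars<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    REQUIRED_VARS
        .iter()
        .copied()
        // Keep a name when the lookup yields nothing or an empty string.
        .filter(|&name| lookup(name).map_or(true, |value| value.is_empty()))
        .collect()
}
```

Calling `missing_credential_vars(|k| std::env::var(k).ok())` before starting Vector returns an empty list when all three variables are present.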

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

References

@jlaundry jlaundry requested review from a team as code owners April 20, 2025 23:12
@bits-bot

bits-bot commented Apr 20, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added domain: sinks Anything related to the Vector's sinks domain: external docs Anything related to Vector's external, public documentation labels Apr 20, 2025
Signed-off-by: Jed Laundry <[email protected]>
check-spelling run (pull_request_target) for feature-azure_logs_ingestion

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>
@pront pront self-assigned this Apr 21, 2025
Member

@pront pront left a comment

Hi @jlaundry, thanks for this contribution!

It looks good. Two things:

  • We will need some documentation files. See an example here (all files under website). Note that base/ is generated by make generate-component-docs.
  • Is the intention to completely replace the azure_monitor_logs sink? If that's the case, maybe we can mark the existing one as deprecated in favor of this new sink.

Contributor

@rtrieu rtrieu left a comment

Thank you for your contribution! Approving with one very minor suggestion.

The maximum size of a batch that is processed by a sink.

This is based on the uncompressed size of the batched events, before they are
serialized/compressed.
Contributor

Suggested change
- serialized/compressed.
+ serialized or compressed.

Author

This text is imported from https://github.com/vectordotdev/vector/blob/master/src/sinks/util/batch.rs#L104, and appears in most of the sinks/base/*.cue files - should we raise a separate PR to change this?

Member

Sounds reasonable.

@jlaundry
Author

> We will need some documentation files. See an example here (all files under website). Note that base/ is generated by make generate-component-docs.

Apologies, I thought this page was auto-generated as well - added now.

> Is the intention to completely replace the azure_monitor_logs sink? If that's the case, maybe we can mark the existing one as deprecated in favor of this new sink.

Good point, added 🙂


static CONTENT_TYPE_VALUE: LazyLock<HeaderValue> =
    LazyLock::new(|| HeaderValue::from_static(CONTENT_TYPE));
// static X_MS_CLIENT_REQUEST_ID_HEADER: LazyLock<HeaderName> =
Member

Delete if not needed

Author

Done

Comment on lines +10 to +11
// #[cfg(test)]
// mod tests;
Member

Did you mean to add a tests.rs? If not you can delete this.

Member

Actually, it would be great to have a simple integration test here, but I don't want to block the PR because of this. If you look at other integration tests, it should be easy to add one (assuming we have an Azure Logs Docker image).

Author

I've spent some time this weekend looking for a nice way to do testing... and unfortunately it's not as easy as it seems.

  • Because the API expects an OAuth token, we need to mock the OAuth endpoint, which would be simple, except:
      • to mock the token endpoint, we need the azure_identity crate to be updated to 0.23.0 to use a ClientSecretCredential with a specific authority endpoint, and
      • that update is blocked, because it and azure_core have substantial refactors, and azure_storage hasn't yet been updated (the azure_blob sink uses azure_storage)
  • The only other option is to use a real Azure App Registration as part of the test suite, and ignore the OAuth token in the mock API. That would be simple to set up for the GitHub Actions tests, but would then require individual devs to have Azure CLI or their own AZURE_CLIENT_ID, which doesn't seem nice.

I've stashed a WIP: jlaundry/vector@feature-azure_logs_ingestion...jlaundry:vector:feature-azure_logs_ingestion-tests and will tinker with this for a bit, but we might be best waiting until the azure_storage crate update is unblocked.
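
One way to picture the "ignore the OAuth token in the mock API" option is to put token acquisition behind a small trait, so a test can substitute a fixed token. This is only a sketch of that general idea, not what the WIP branch does. The trait `TokenSource`, the `StaticToken` type, and the `authorization_header` helper are all hypothetical names, and no azure_identity APIs are used or implied.

```rust
/// Hypothetical abstraction: anything that can hand back a bearer token.
trait TokenSource {
    fn bearer_token(&self) -> Result<String, String>;
}

/// Test double that returns a fixed token without any network call.
struct StaticToken(&'static str);

impl TokenSource for StaticToken {
    fn bearer_token(&self) -> Result<String, String> {
        Ok(self.0.to_string())
    }
}

/// Builds the `Authorization` header value that a mock API would receive
/// (and could simply ignore).
fn authorization_header(source: &dyn TokenSource) -> Result<String, String> {
    source.bearer_token().map(|token| format!("Bearer {token}"))
}
```

A production implementation of the trait would wrap the real credential, while tests would pass `StaticToken("test-token")` and never contact a token endpoint.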

) -> crate::Result<Self> {
let mut parts = endpoint.into_parts();
parts.path_and_query = Some(
// https://my-dce-5kyl.eastus-1.ingest.monitor.azure.com/dataCollectionRules/dcr-000a00a000a00000a000000aa000a0aa/streams/Custom-MyTable?api-version=2023-01-01
Member

Delete debug comment

Author

Done
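
For readers following along, the URI shape shown in the removed debug comment can be sketched with plain string formatting. This is an illustration only and assumes nothing about the PR's actual code beyond that example URL. The function name `build_ingestion_uri` is hypothetical, and the `api-version` value is copied from the comment.

```rust
/// API version as it appears in the example URL from the review comment.
const API_VERSION: &str = "2023-01-01";

/// Hypothetical helper: joins the Data Collection Endpoint, DCR immutable ID,
/// and stream name into the request URI shape shown in the debug comment.
fn build_ingestion_uri(endpoint: &str, dcr_immutable_id: &str, stream_name: &str) -> String {
    // Tolerate a trailing slash on the configured endpoint.
    let base = endpoint.trim_end_matches('/');
    format!(
        "{}/dataCollectionRules/{}/streams/{}?api-version={}",
        base, dcr_immutable_id, stream_name, API_VERSION
    )
}
```

The three arguments correspond to the `endpoint`, `dcr_immutable_id`, and `stream_name` options in the vector.yaml used for testing.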

Labels
domain: external docs Anything related to Vector's external, public documentation domain: sinks Anything related to the Vector's sinks
Development

Successfully merging this pull request may close these issues.

Azure DCR-based custom logs -> Logs ingestion API
4 participants