[receiver/apachespark] Add application id filter #39936

Closed
wants to merge 8 commits into from

NN---

@NN--- NN--- commented May 7, 2025

Description

Add application ids filter.

Link to tracking issue

Fixes #39627

Testing

Run with application ids specified.

Documentation

Allow filtering Apache Spark receiver by application id in addition to application name.

@atoulme
Contributor

atoulme commented May 7, 2025

Please add a changelog. Please clarify in the README that the filters are a union, meaning all application names and IDs matching will be allowed. Since we already have application names, why add ids?

@atoulme atoulme marked this pull request as draft May 7, 2025 23:46
@atoulme
Contributor

atoulme commented May 7, 2025

Please fix CI and address feedback, and move back to ready for review when done.

@NN---
Author

NN--- commented May 8, 2025

Sure.
I have several features to add, because scanning the history server for every application takes a long time.

Features I need:
Filter by ID
Limit by number of applications
Limit by absolute start time
Limit by relative start time

The start and end times should also form a time span, but that is for a follow-up change.

@NN--- NN--- marked this pull request as ready for review May 8, 2025 08:22
@NN--- NN--- changed the title Add application id filter [receiver/apachespark] Add application id filter May 8, 2025
@NN--- NN--- force-pushed the apachespark_application_id branch from a1c20b5 to 7a3fc42 Compare May 10, 2025 14:59
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 25, 2025
@NN---
Author

NN--- commented May 25, 2025

Don't close

@github-actions github-actions bot removed the Stale label May 26, 2025
@dehaansa
Contributor

@NN--- I believe this PR is still waiting on the feedback above from @atoulme to be addressed:

Please clarify in the README that the filters are a union, meaning all application names and IDs matching will be allowed.

Also pinging the codeowners to see if you can get codeowner review @mrsillydog @Caleb-Hurshman

Comment on lines 82 to 93
 	for _, name := range s.config.ApplicationNames {
-		if apps, ok := appMap[name]; ok {
+		if apps, ok := nameMap[name]; ok {
 			allowedApps = append(allowedApps, apps...)
 		}
 	}

 	// Add apps matching IDs
 	for _, id := range s.config.ApplicationIDs {
 		if apps, ok := idMap[id]; ok {
 			allowedApps = append(allowedApps, apps...)
 		}
 	}
Contributor

If the user were to specify an app ID and an app name that both matched a single app, this would result in the app being added to allowedApps twice, and metrics being scraped for the same app twice. I think we should add some logic to prevent this case in order to make this a true union.
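
One possible way to get a true union is to track the application IDs already added, as in the sketch below. It mirrors the shape of the snippet under review, but the `app` type, the `unionApps` helper, and the assumption that `nameMap` and `idMap` are `map[string][]app` are all illustrative guesses, not the receiver's actual definitions.

```go
package main

import "fmt"

// app is a hypothetical stand-in for the receiver's application model.
type app struct {
	ID   string
	Name string
}

// unionApps returns every app that matches one of the names or one of the
// IDs, adding each app at most once (keyed by its ID).
func unionApps(nameMap, idMap map[string][]app, names, ids []string) []app {
	seen := make(map[string]struct{})
	var allowed []app

	// add appends only the apps whose ID has not been seen yet.
	add := func(apps []app) {
		for _, a := range apps {
			if _, dup := seen[a.ID]; dup {
				continue
			}
			seen[a.ID] = struct{}{}
			allowed = append(allowed, a)
		}
	}

	// Looking up a missing key yields a nil slice, so no "ok" check is needed.
	for _, name := range names {
		add(nameMap[name])
	}
	for _, id := range ids {
		add(idMap[id])
	}
	return allowed
}

func main() {
	etl1 := app{ID: "app-1", Name: "etl"}
	etl2 := app{ID: "app-2", Name: "etl"}
	nameMap := map[string][]app{"etl": {etl1, etl2}}
	idMap := map[string][]app{"app-1": {etl1}, "app-2": {etl2}}

	// "app-1" matches both the name filter and the ID filter,
	// but it is returned only once.
	for _, a := range unionApps(nameMap, idMap, []string{"etl"}, []string{"app-1"}) {
		fmt.Println(a.ID)
	}
}
```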

Author

I see that the union is problematic.
I will make it an intersection instead.
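
For comparison, intersection semantics could look roughly like the sketch below. It assumes that an empty filter list means "do not filter on this field"; the thread does not specify that behavior. The `app` type and the `intersectApps` and `toSet` helpers are hypothetical names used only for illustration.

```go
package main

import "fmt"

// app is a hypothetical stand-in for the receiver's application model.
type app struct {
	ID   string
	Name string
}

// toSet converts a slice of strings into a set for constant-time lookups.
func toSet(values []string) map[string]struct{} {
	set := make(map[string]struct{}, len(values))
	for _, v := range values {
		set[v] = struct{}{}
	}
	return set
}

// intersectApps keeps an app only if it passes every configured filter.
// An empty filter is treated as "not configured" and is skipped
// (an assumption made for this sketch).
func intersectApps(all []app, names, ids []string) []app {
	nameSet := toSet(names)
	idSet := toSet(ids)

	var allowed []app
	for _, a := range all {
		if len(nameSet) > 0 {
			if _, ok := nameSet[a.Name]; !ok {
				continue
			}
		}
		if len(idSet) > 0 {
			if _, ok := idSet[a.ID]; !ok {
				continue
			}
		}
		allowed = append(allowed, a)
	}
	return allowed
}

func main() {
	all := []app{
		{ID: "app-1", Name: "etl"},
		{ID: "app-2", Name: "etl"},
		{ID: "app-3", Name: "report"},
	}
	// Only apps named "etl" whose ID is also listed are kept.
	for _, a := range intersectApps(all, []string{"etl"}, []string{"app-2", "app-3"}) {
		fmt.Println(a.ID)
	}
}
```

Because each app is visited once, an intersection cannot produce duplicates, which avoids the double-scraping concern raised above.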

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 28, 2025
Development

Successfully merging this pull request may close these issues.

[receiver/apachespark] Add apachespark scraping by application id
6 participants