Skip to content

Safely pass Prometheus key deletion errors since not every setup will use it #585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 22, 2025

Conversation

tieneupin
Copy link
Contributor

@tieneupin tieneupin commented Apr 16, 2025

The remove_session_by_id() fails when deleting the last session because some of the prometheus key deletion functions weren't encased in a try-except block, which would cause the whole delete request to fail if some of the keys being deleted are already non-existent.

This PR adds a utility function that encases each individual delete request in a try-except block, and logs a warning if that call fails. Implementing that, the 'delete session' button now works as expected. If Murfey database lookup fails, that's a genuine error and the button should fail.

Additionally, an endpoint was added to look up the real-time contents of any prometheus gauges and counters that have been defined in this repo.

@tieneupin tieneupin requested a review from d-j-hatton April 16, 2025 18:09
@tieneupin tieneupin self-assigned this Apr 16, 2025
@tieneupin tieneupin added bug Something isn't working server Relates to the server component labels Apr 16, 2025
Copy link

codecov bot commented Apr 16, 2025

Codecov Report

Attention: Patch coverage is 19.35484% with 25 lines in your changes missing coverage. Please review.

Project coverage is 30.17%. Comparing base (2f8fd7d) to head (2a9ed08).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #585      +/-   ##
==========================================
- Coverage   30.17%   30.17%   -0.01%     
==========================================
  Files          80       80              
  Lines       10494    10509      +15     
  Branches     1402     1405       +3     
==========================================
+ Hits         3167     3171       +4     
- Misses       7222     7233      +11     
  Partials      105      105              
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stephen-riggs
Copy link
Contributor

I'm not sure I understand the context where this comes up. The prometheus metrics are built into the code, so don't we always start them?
This PR does seem useful though, as I think we'd still want to remove the session even if the prometheus cleanup failed.

@tieneupin
Copy link
Contributor Author

tieneupin commented Apr 17, 2025

I'm not sure I understand the context where this comes up. The prometheus metrics are built into the code, so don't we always start them? This PR does seem useful though, as I think we'd still want to remove the session even if the prometheus cleanup failed.

Ah, I might have misdiagnosed it, then. When I poked this endpoint, I got a KeyError on the following block:

for ri in rsync_instances:
    prom.seen_files.remove(ri.source, session.visit)
    prom.transferred_files.remove(ri.source, session.visit)
    prom.transferred_files_bytes.remove(ri.source, session.visit)
    prom.seen_data_files.remove(ri.source, session.visit)
    prom.transferred_data_files.remove(ri.source, session.visit)
    prom.transferred_data_files_bytes.remove(ri.source, session.visit)

Both ri.source and session.visit were present in the KeyError message printed out, so my understanding was that the entity does not exist in the Prometheus bit of the code, for some reason.

@tieneupin
Copy link
Contributor Author

Updated the PR by encasing each individual prometheus deletion call with a utility function that logs a warning but passes if the call fails. Also added an endpoint that allows you to inspect the contents of the prometheus gauges and counters that have been defined in the top level in the prometheus module.

@tieneupin tieneupin merged commit 368dd2f into main Apr 22, 2025
17 checks passed
@tieneupin tieneupin deleted the cascade-delete-session branch April 22, 2025 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working server Relates to the server component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants