
Optimise Storage of Validation Results in MongoDB #371



Open
karatugo opened this issue Jan 22, 2025 · 0 comments


@karatugo
Member

Description:

The current implementation of store_validation_results_in_db is inefficient: for every study it issues a separate database update for each of three fields (retrieved, data_valid and error_code), resulting in 3 × N queries to MongoDB for N studies. This overhead hurts performance, particularly when processing a large number of studies.
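To make the cost concrete, here is a minimal sketch of the per-field pattern. The helper signatures, the `callback_id`/`study_id` filter keys and the field names are hypothetical, assumed only for illustration: each of store_retrieved_status(), store_data_valid_status() and store_error_code() is assumed to issue one update_one() call, and a small counting stub stands in for the MongoDB collection.

```python
from typing import Any


class CountingCollection:
    """Stand-in for a pymongo Collection that only counts update_one() calls."""

    def __init__(self) -> None:
        self.calls = 0

    def update_one(self, query: dict[str, Any], update: dict[str, Any]) -> None:
        self.calls += 1


# Hypothetical signatures: the real helpers may take different arguments.
# Each one issues its own update_one(), i.e. one round trip per field.
def store_retrieved_status(coll, callback_id, study_id, status):
    coll.update_one({"callback_id": callback_id, "study_id": study_id},
                    {"$set": {"retrieved": status}})


def store_data_valid_status(coll, callback_id, study_id, status):
    coll.update_one({"callback_id": callback_id, "study_id": study_id},
                    {"$set": {"data_valid": status}})


def store_error_code(coll, callback_id, study_id, error_code):
    coll.update_one({"callback_id": callback_id, "study_id": study_id},
                    {"$set": {"error_code": error_code}})


def store_validation_results_per_field(coll, callback_id, results):
    """Per-field loop: three separate updates for every study."""
    for r in results:
        store_retrieved_status(coll, callback_id, r["study_id"], r["retrieved"])
        store_data_valid_status(coll, callback_id, r["study_id"], r["data_valid"])
        store_error_code(coll, callback_id, r["study_id"], r["error_code"])


coll = CountingCollection()
results = [{"study_id": f"GCST{i:06d}", "retrieved": True,
            "data_valid": True, "error_code": None} for i in range(1000)]
store_validation_results_per_field(coll, "abc123", results)
print(coll.calls)  # 3000
```

With 1,000 studies this pattern makes 3,000 round trips, and the count grows linearly with the number of studies.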

Proposed Solutions:

  • Batch Update Based on Callback ID:

    • Group all validation results by callbackID and perform a single bulk update in MongoDB.
    • This will reduce the number of database interactions significantly by updating all records related to the same callbackID in one query.
  • Selective Update for Failed Cases:

    • Detect studies that failed (i.e. error_code is not None) and update them separately.
    • Perform a bulk update for studies without errors, avoiding the overhead of individual field updates.
  • Consolidate Field Updates:

    • Instead of calling store_retrieved_status(), store_data_valid_status(), and store_error_code() individually for each study, combine these into a single MongoDB document update operation.
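The three proposals can be combined in a single pass. The sketch below is one possible approach, not the project's implementation. It assumes the same hypothetical `studies` documents keyed by `callback_id` and `study_id`, and a hypothetical function name, store_validation_results_grouped(). For one callbackID it groups studies by their (retrieved, data_valid, error_code) outcome and writes each group with a single pymongo update_many() whose `$set` carries all three fields. Studies without errors therefore typically collapse into one query, and failed studies into one query per distinct error code. A recording stub replaces the real collection so the example runs without a database.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_outcome(results: Iterable[dict[str, Any]]) -> dict[tuple, list[str]]:
    """Map each (retrieved, data_valid, error_code) outcome to its study IDs."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for r in results:
        groups[(r["retrieved"], r["data_valid"], r["error_code"])].append(r["study_id"])
    return dict(groups)


def store_validation_results_grouped(collection, callback_id: str,
                                     results: Iterable[dict[str, Any]]) -> int:
    """Hypothetical replacement: returns the number of update queries issued.

    `collection` is assumed to be a pymongo Collection, or anything exposing
    update_many(filter, update).
    """
    queries = 0
    for (retrieved, data_valid, error_code), study_ids in group_by_outcome(results).items():
        collection.update_many(
            # One filter per outcome group, scoped to this callback ID.
            {"callback_id": callback_id, "study_id": {"$in": study_ids}},
            # All three fields in one $set instead of three separate updates.
            {"$set": {"retrieved": retrieved,
                      "data_valid": data_valid,
                      "error_code": error_code}},
        )
        queries += 1
    return queries


class RecordingCollection:
    """Stand-in for a pymongo Collection that records update_many() calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[dict, dict]] = []

    def update_many(self, query: dict, update: dict) -> None:
        self.calls.append((query, update))


# Example: 998 valid studies plus two failures with different error codes.
results = (
    [{"study_id": f"GCST{i:06d}", "retrieved": True, "data_valid": True,
      "error_code": None} for i in range(998)]
    + [{"study_id": "GCST999998", "retrieved": True, "data_valid": False, "error_code": 2},
       {"study_id": "GCST999999", "retrieved": False, "data_valid": False, "error_code": 1}]
)
coll = RecordingCollection()
print(store_validation_results_grouped(coll, "abc123", results))  # 3
```

Grouping by outcome pays off when most studies share the same result. If outcomes are mostly distinct, an alternative is pymongo's bulk_write() with one UpdateOne per study, which sends the operations in batches rather than one round trip each. Very large `$in` lists also produce large query documents, so chunking the study IDs is a simple mitigation.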
@karatugo karatugo self-assigned this Jan 22, 2025
@karatugo karatugo changed the title Optimize Storage of Validation Results in MongoDB Optimise Storage of Validation Results in MongoDB Jan 22, 2025
@ljwh2 ljwh2 added this to the Sumstats validation issues 2025 milestone May 19, 2025