[batch] Drop dead rows from job_group_inst_coll_cancellable_resources #14623

Open
@daniel-goldstein


What happened?

See here for context on why this table exists and how it is used. Records are added or updated in this table whenever jobs are added to the database or after an attempt for a job completes. Records are currently only removed when they belong to a cancelled job group. If a job group runs to completion, we end up with many rows that no longer serve any purpose and that, summed over the token column, have 0 in every job column. This does not affect correctness, but it wastes a lot of space in the database. Two changes would together save a lot of that space (I've not quantified how much, but select count(*) on this table takes longer than I've been willing to wait):

  1. Rows in this table with the same key (batch_id, update_id, job_group_id, inst_coll) but different token values can be "compacted" into one row with token 0, i.e. key (batch_id, update_id, job_group_id, inst_coll, 0), where all the other columns are summed. This is most useful for cold rows.
  2. Rows whose n_*_jobs and *_cancellable_cores_mcpu columns are 0 can be deleted.
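The two steps above can be sketched as follows. This is a minimal illustration, not the Batch implementation: it uses Python's built-in sqlite3 in place of the real MySQL database, the table has just two value columns (n_ready_cancellable_jobs and ready_cancellable_cores_mcpu) standing in for the whole n_*_jobs and *_cancellable_cores_mcpu families, and make_table and compact_and_prune are hypothetical names. It also rewrites the whole table in one go, whereas a real migration would presumably touch only cold rows and work in small batches.

```python
import sqlite3

TABLE = "job_group_inst_coll_cancellable_resources"
KEY = "batch_id, update_id, job_group_id, inst_coll"


def make_table(conn):
    # Simplified schema (assumption): only two of the value columns are modelled.
    conn.execute(
        f"""
        CREATE TABLE {TABLE} (
            batch_id INTEGER NOT NULL,
            update_id INTEGER NOT NULL,
            job_group_id INTEGER NOT NULL,
            inst_coll TEXT NOT NULL,
            token INTEGER NOT NULL,
            n_ready_cancellable_jobs INTEGER NOT NULL DEFAULT 0,
            ready_cancellable_cores_mcpu INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (batch_id, update_id, job_group_id, inst_coll, token)
        )
        """
    )


def compact_and_prune(conn):
    # Step 1: sum the value columns across tokens for each key.
    # Step 2: the HAVING clause drops keys whose sums are all zero.
    live = conn.execute(
        f"""
        SELECT {KEY},
               SUM(n_ready_cancellable_jobs),
               SUM(ready_cancellable_cores_mcpu)
        FROM {TABLE}
        GROUP BY {KEY}
        HAVING SUM(n_ready_cancellable_jobs) != 0
            OR SUM(ready_cancellable_cores_mcpu) != 0
        """
    ).fetchall()
    # Replace the old rows with one token-0 row per surviving key.
    with conn:
        conn.execute(f"DELETE FROM {TABLE}")
        conn.executemany(
            f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, 0, ?, ?)", live
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "standard", 3, 2, 2000),    # finished group: sums to 0
            (1, 1, 1, "standard", 7, -2, -2000),
            (2, 1, 1, "standard", 1, 5, 5000),    # still has cancellable work
            (2, 1, 1, "standard", 4, -1, -1000),
        ],
    )
    compact_and_prune(conn)
    print(conn.execute(f"SELECT * FROM {TABLE}").fetchall())
```

In the demo data, the group whose columns sum to zero disappears entirely, and the other group collapses from two token rows into a single token-0 row carrying the summed values.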

We already do 1 for the aggregated billing tables: tokens provide parallelism on hot rows, and records are then compacted so that records from before the current day always end up using only 1 row.
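For readers unfamiliar with the token pattern, here is a rough in-memory sketch of why one logical counter is spread over several rows. Everything in it is illustrative: N_TOKENS, add_jobs and total_jobs are hypothetical names, a dict stands in for the table, and the real shard count and write path may differ.

```python
import random
from collections import defaultdict

N_TOKENS = 200  # hypothetical number of shards per key


def add_jobs(rows, key, delta, rng=random):
    # Each writer picks a random token, so concurrent writers to the same
    # logical key usually touch different rows and contend less.
    token = rng.randrange(N_TOKENS)
    rows[(key, token)] += delta


def total_jobs(rows, key):
    # The true value for a key is the sum over all of its token rows.
    return sum(value for (k, _token), value in rows.items() if k == key)


if __name__ == "__main__":
    rows = defaultdict(int)
    key = (1, 1, 1, "standard")  # (batch_id, update_id, job_group_id, inst_coll)
    for _ in range(10):
        add_jobs(rows, key, +1)  # jobs become cancellable
    for _ in range(10):
        add_jobs(rows, key, -1)  # jobs complete
    print(len(rows), total_jobs(rows, key))
```

Once the job group has finished, the per-token rows can hold nonzero values individually while summing to 0, which is why compaction has to come before the zero-row deletion.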

Implementing 1 should be a big win for the size of this table. Following that up with 2 would eliminate what I presume to be the vast majority of data in this table.

Version

0.2.132

Relevant log output

No response
