Gracefully skip BigBench tasks with no data & guard final aggregation #3066
Description:
Running the harness on BigBench stalls (or crashes) when it reaches bigbench_simple_arithmetic_multiple_targets_json_generate_until, because the split that task is configured to use (“train”) doesn't exist. This exposes three problems:
A ValueError raised by datasets.load_dataset(): Instruction "train" corresponds to no data!
An unguarded “Test One Doc” block in ConfigurableTask.__init__ that blindly indexes eval_docs[0], which fails when the task has no documents.
Later, an UnboundLocalError in evaluate(), because show_group_table is never assigned when every task has zero examples.
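The last failure is ordinary Python scoping: a name assigned anywhere in a function is local to that function, so reading it when the assigning branch never ran raises UnboundLocalError. A minimal reproduction follows; aggregate() is a hypothetical, simplified stand-in for the tail of evaluate(), not the harness's actual code.

```python
def aggregate(results):
    """Hypothetical stand-in for the group-aggregation step of evaluate()."""
    if results:
        # Only this branch binds show_group_table.
        show_group_table = len(results) > 1
    # If results is empty, show_group_table was never bound here.
    return {"results": results, "show_groups": show_group_table}


try:
    aggregate({})  # every task had zero examples
except UnboundLocalError as err:
    print(type(err).__name__)  # prints "UnboundLocalError"
```

Initializing the name before the conditional (show_group_table = False) makes the function safe for the empty case.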
This PR makes the harness robust to missing-split/empty tasks by:
Catching missing-split errors in ConfigurableTask.download() and substituting an empty DatasetDict({"default": …}).
Wrapping the “one-doc” sanity checks in ConfigurableTask.__init__ inside if self.eval_docs:, with an else branch that initializes a minimal set of attributes.
Filtering out zero-example tasks at the top of evaluate(), so they never build requests or invoke the LM.
Initializing show_group_table=False before the group-aggregation step, and switching from the bitwise & operator to the logical and operator in the condition that decides whether the "groups" entry is inserted into results_dict.
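The four changes above can be sketched in plain Python. This is a simplified, hypothetical model rather than the harness code: download_or_empty, first_doc_checks and this evaluate() are illustrative stand-ins, load_fn stands in for datasets.load_dataset(), and a plain dict of lists stands in for DatasetDict.

```python
from typing import Callable, Dict, List

Docs = List[dict]


def download_or_empty(load_fn: Callable[[], Dict[str, Docs]]) -> Dict[str, Docs]:
    """Hypothetical: call the loader, falling back to an empty split on a missing split."""
    try:
        return load_fn()
    except ValueError:
        # e.g. 'Instruction "train" corresponds to no data!'
        # The PR substitutes an empty DatasetDict; a plain dict stands in here.
        return {"default": []}


def first_doc_checks(eval_docs: Docs) -> dict:
    """Hypothetical: run the one-doc sanity check only if a document exists."""
    if eval_docs:
        test_doc = eval_docs[0]
        return {"features": sorted(test_doc.keys())}
    # Fallback: minimal attributes so later code can still read them.
    return {"features": []}


def evaluate(tasks: Dict[str, Docs], group_agg: Dict[str, dict]) -> dict:
    """Hypothetical, heavily simplified stand-in for the harness's evaluate()."""
    # Drop zero-example tasks before any requests are built.
    tasks = {name: docs for name, docs in tasks.items() if docs}

    show_group_table = False  # always bound, even if no task survives
    results_agg = {name: {"n": len(docs)} for name, docs in tasks.items()}
    if results_agg:
        show_group_table = bool(group_agg)

    results_dict = {"results": results_agg}
    # Logical `and` (short-circuiting) replaces the bitwise `&`.
    if bool(group_agg) and show_group_table:
        results_dict["groups"] = group_agg
    return results_dict


def missing_split():
    raise ValueError('Instruction "train" corresponds to no data!')


docs = download_or_empty(missing_split)["default"]
print(first_doc_checks(docs))  # {'features': []}
print(evaluate({"empty_task": docs, "ok_task": [{"q": "1+1", "a": "2"}]}, {}))
# {'results': {'ok_task': {'n': 1}}}
```

Filtering empty tasks up front means nothing downstream has to special-case them, while the early initialization keeps evaluate() well-defined even when every task is filtered out.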
These changes fix the crash and stop the harness from stalling when it reaches that BigBench task or any other task whose configured split is missing or empty.