
Gracefully skip BigBench tasks with no data & guard final aggregation #3066


Open
wants to merge 1 commit into main

Conversation

NourFahmy
Contributor

Description:
Running the harness on BigBench stalls (or crashes) once it reaches bigbench_simple_arithmetic_multiple_targets_json_generate_until because that task’s configured split (“train”) doesn’t exist, causing:

  • A ValueError: Instruction "train" corresponds to no data! raised by datasets.load_dataset() (reproduced in the sketch after this list).

  • An unguarded “Test One Doc” block in ConfigurableTask.__init__ that blindly indexes eval_docs[0].

  • Later, an UnboundLocalError in evaluate() when referencing show_group_table if all tasks have zero examples.
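
For context, the first failure can be reproduced directly with the `datasets` library when a requested split has no data. This is a minimal sketch, not the harness’s actual call site; the dataset path and config name below are assumptions (the harness builds them from the task’s YAML config):

```python
from datasets import load_dataset

# Hypothetical reproduction: the dataset path/name are assumptions.
# Requesting a split that the chosen config does not provide makes
# `datasets` raise:
#   ValueError: Instruction "train" corresponds to no data!
load_dataset(
    "bigbench",
    "simple_arithmetic_multiple_targets_json",
    split="train",
)
```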

This PR makes the harness robust to missing-split/empty tasks by:

  • Catching missing-split errors in ConfigurableTask.download() and substituting an empty DatasetDict({"default": …}) (see the sketch after this list).

  • Wrapping the “one-doc” sanity checks in ConfigurableTask.__init__ inside if self.eval_docs: (with an else fallback that performs minimal attribute initialization).

  • Filtering out zero-example tasks at the top of evaluate(), so they never build requests or invoke the LM.

  • Initializing show_group_table=False before the group-aggregation step and switching from bitwise & to logical and when inserting the "groups" entry in results_dict (both sketched after the summary below).
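
A minimal sketch of the download() guard from the first bullet. The attribute names (self.dataset, DATASET_PATH, DATASET_NAME) and the error-message matching are assumptions; the contents of the substituted empty DatasetDict are elided in the bullet above, so an empty Dataset is assumed here:

```python
import datasets

def download(self, dataset_kwargs=None):
    try:
        self.dataset = datasets.load_dataset(
            path=self.DATASET_PATH,
            name=self.DATASET_NAME,
            **(dataset_kwargs or {}),
        )
    except ValueError as err:
        # `datasets` raises ValueError when a split instruction matches no data.
        if "corresponds to no data" not in str(err):
            raise
        # Substitute an empty dataset so downstream code sees zero docs
        # instead of crashing; "default" mirrors the key quoted above.
        self.dataset = datasets.DatasetDict(
            {"default": datasets.Dataset.from_dict({})}
        )
```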

These changes resolve the crash and prevent the harness from lagging or stalling indefinitely when it reaches that BigBench task, or any other task with a similar misconfiguration.
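
And a sketch of the evaluate()-side guards from the last two bullets. The identifiers (task_dict, eval_docs, groups_agg, results_dict) are assumptions based on the description, not the harness’s exact names, and the per-task evaluation loop is omitted:

```python
def evaluate(lm, task_dict):
    # Drop tasks that ended up with zero documents so they never build
    # requests or invoke the LM (`eval_docs` is an assumed attribute name).
    task_dict = {
        name: task
        for name, task in task_dict.items()
        if len(getattr(task, "eval_docs", [])) > 0
    }

    results_dict = {"results": {name: {} for name in task_dict}}
    groups_agg = {}  # filled by the group-aggregation step (omitted here)

    # Initialize up front so the reference below is always bound, even when
    # aggregation is skipped because every task had zero examples.
    show_group_table = False

    # ... per-task evaluation and group aggregation would run here,
    # possibly flipping show_group_table to True ...

    # Use logical `and` (short-circuiting) rather than bitwise `&`.
    if bool(groups_agg) and show_group_table:
        results_dict["groups"] = groups_agg

    return results_dict
```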

Gracefully skip BigBench tasks with no data & guard final aggregation
@baberabb
Contributor

Sorry for the tardiness, I'll review soon. My main hesitation is that we do not want tasks with misconfigured configs to silently work, as that would make maintenance much harder: we'd lose visibility into which tasks are actually broken or misconfigured, and it would be especially problematic when aggregating group scores. I'll have to think about which workaround works better. Currently I'm thinking an exclude-tasks param might work, so users can explicitly skip some tasks.
