Closed
Description
From @nsoranzo on Gitter:
I have huge issues trying to schedule a workflow that executes a series of tools on ridiculously large collections (21,989 elements). If a job handler is restarted at some point, it seems to restart already (partially) completed steps from scratch; in fact, one particular step/tool already has 114,557 jobs in the history (in various states).
The problem is that we flush invocation step states only after the whole step has been scheduled. With a collection of that size, scheduling a single step could plausibly take an hour, so a hard restart of Galaxy partway through a step can easily lose the record of what was already scheduled and cause the step to be scheduled again from scratch.