Description
There is a bunch of complexity related to dataset collection HDAs - for instance, they may or may not also appear elsewhere in the history (not just the same dataset under a different HDA - but the very same HDA). This has terrifying consequences, such as a user deleting an HDA and thereby affecting a collection that seems unrelated to it. I implemented this, but I was just doing what the Trello card spec'ed out - the design was someone else's. I largely don't regret that the models are flexible enough to allow these combinations - but under normal UI-driven use things almost certainly should be simpler.
I am going to lay out here how I think it "should" work - i.e. how UI-driven interactions should operate. I am hoping to get some consensus on this.
- An HDA should not ever appear both in a history at the top-level and inside an HDCA.
- Copying collections between histories or inside a history should copy each HDA.
- The collection creators should result in duplicated HDAs being added to the new HDCA - there should be an option to hide or delete the original HDAs.
- Mapping over a collection should result in one collection appearing for each output in the history - not top-level HDAs that are hidden later as it currently works.
- When `hid` is used in tool actions - something else should be used - probably `<hdca_id>:(<element_id>:*):<element_id>` -> e.g. `1:sample_x:forward`. Element IDs are preserved like tags instead of growing like names - so this works more like HIDs.
- Deleting a collection via the UI should delete all the HDAs in the collection. Purging a collection should work the same way.
- Deleting a collection should result in all related jobs being cancelled.
- HDCA structure should be write-once (semi-immutable). Once `hdca.collection.populated` is True, those are the HDAs that belong to the collection forever - those HDAs may change but the contents of the HDCA will not.
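
To make two of the points above concrete (the `<hdca_id>:<element_id>:...` reference and the write-once structure), here is a rough sketch. It is not Galaxy's model code: `Dataset`, `Collection`, `parse_reference` and `resolve` are hypothetical names invented for illustration, and it assumes the HDCA id is an integer and that element identifiers never contain `:`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Dataset:
    """Hypothetical stand-in for an HDA; only a name is tracked here."""
    name: str


@dataclass
class Collection:
    """Hypothetical write-once collection keyed by element identifier."""
    elements: Dict[str, Union["Collection", Dataset]] = field(default_factory=dict)
    populated: bool = False

    def add(self, element_id: str, value: Union["Collection", Dataset]) -> None:
        # Once populated, membership is frozen; the datasets themselves may
        # still change, but no element can be added or swapped.
        if self.populated:
            raise RuntimeError("collection structure is write-once")
        self.elements[element_id] = value

    def finalize(self) -> None:
        self.populated = True


def parse_reference(ref: str) -> Tuple[int, List[str]]:
    """Split '<hdca_id>:<element_id>[:<element_id>...]' into (hdca_id, path)."""
    parts = ref.split(":")
    if len(parts) < 2 or any(part == "" for part in parts):
        raise ValueError(f"malformed collection reference: {ref!r}")
    # Assumption: the leading component is an integer id for the HDCA.
    return int(parts[0]), parts[1:]


def resolve(collections: Dict[int, Collection], ref: str) -> Union[Collection, Dataset]:
    """Walk the element identifiers in ref down from the named collection."""
    hdca_id, path = parse_reference(ref)
    node: Union[Collection, Dataset] = collections[hdca_id]
    for element_id in path:
        if not isinstance(node, Collection):
            raise KeyError(f"{element_id!r} lies below a dataset in {ref!r}")
        node = node.elements[element_id]
    return node


# Build a list:paired-style collection and look up one element by reference.
pair = Collection()
pair.add("forward", Dataset("x_1.fastq"))
pair.add("reverse", Dataset("x_2.fastq"))
pair.finalize()

samples = Collection()
samples.add("sample_x", pair)
samples.finalize()

history_collections = {1: samples}
forward = resolve(history_collections, "1:sample_x:forward")
```

Because element identifiers are stable (unlike names), a reference such as `1:sample_x:forward` keeps pointing at the same slot for the life of the collection, which is what makes the write-once rule useful for tool actions.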