Sample Sheet Import of Datasets and Collections #4733

Closed
@jmchilton

User Stories

This section describes user stories that progressively build up a new GUI component for creating collections from "sample sheet" inputs. This would be a two-step modal (avoiding the word "wizard") that allows importing sheets of tabular data into collections of arbitrary complexity. Biologists could then use information generated directly by cores, or build structured views of their data using tools such as Excel, which they are likely most comfortable with.

User Story 1

  • User is presented with an interface to upload a single file or copy/paste in a CSV.
  • User uploads a spreadsheet with one column, "path", which is just the path relative to the user's FTP directory.
  • Galaxy backend processes this into a JSON format for consumption by the GUI.
  • GUI renders the tabular data and allows "rule creation for parsing it".
  • User can select column for path to file.
  • User clicks "Build" and the backend creates the requested collection.
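
A minimal sketch of the backend step in this story, assuming the sheet arrives as CSV text with a header row. The function name `sheet_to_json` and the `columns`/`rows` JSON layout are hypothetical and only illustrate the kind of payload the GUI might consume; they are not an existing Galaxy API.

```python
import csv
import io
import json


def sheet_to_json(text):
    """Parse CSV text into a JSON string holding column names and data rows.

    Hypothetical helper: the {"columns": ..., "rows": ...} layout is an
    assumption made for illustration.
    """
    reader = csv.reader(io.StringIO(text))
    rows = [row for row in reader if row]  # skip completely blank lines
    if not rows:
        return json.dumps({"columns": [], "rows": []})
    header, data = rows[0], rows[1:]
    return json.dumps({"columns": header, "rows": data})


# A one-column sheet where "path" is relative to the FTP directory.
sheet_text = "path\nrun1/sample_a.fastq\nrun1/sample_b.fastq\n"
payload = json.loads(sheet_to_json(sheet_text))
```

The GUI would then let the user mark the "path" column in `payload["columns"]` before clicking "Build".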

User Story 2

  • Same as above, but file:// URIs can be used by admins, and http://, https://, and ftp:// URIs can be used by all users.
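
The scheme rule in this story could be checked roughly as follows. This is a sketch under stated assumptions: `is_source_allowed` is a hypothetical helper, and treating a value with no scheme as a path relative to FTP is carried over from User Story 1.

```python
from urllib.parse import urlparse

# Schemes from the story above; the set names are illustrative.
ALL_USER_SCHEMES = {"http", "https", "ftp"}
ADMIN_ONLY_SCHEMES = {"file"}


def is_source_allowed(source, is_admin):
    """Return True if this user may import from the given path or URI."""
    scheme = urlparse(source).scheme.lower()
    if scheme == "":
        return True  # bare path, interpreted relative to the FTP directory
    if scheme in ALL_USER_SCHEMES:
        return True
    if scheme in ADMIN_ONLY_SCHEMES:
        return is_admin
    return False  # anything else is rejected in this sketch
```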

User Story 3

  • Same as above, but the user can specify two columns: one for path and one for identifier.

User Story 4

  • Same as above, but the user can specify three columns, the additional one marking forward/reverse, and build "list:paired" collections in this case.

User Story 5

  • Same as above, but the user can specify any number of list identifier columns to build nested structures.
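
User Stories 3 through 5 all amount to treating an ordered list of identifier columns as the levels of a nested structure. A rough sketch, assuming rows are dictionaries keyed by column name; `build_nested` and the nested-dict representation are hypothetical stand-ins for whatever the backend would actually create.

```python
def build_nested(rows, identifier_columns, path_column):
    """Nest paths by the identifier columns, outermost level first.

    One identifier column gives a flat list, an identifier column plus a
    forward/reverse column gives something shaped like "list:paired", and
    more identifier columns give deeper nesting.
    """
    root = {}
    for row in rows:
        node = root
        for column in identifier_columns[:-1]:
            node = node.setdefault(row[column], {})
        leaf = row[identifier_columns[-1]]
        if leaf in node:
            raise ValueError(f"duplicate identifier {leaf!r}")
        node[leaf] = row[path_column]
    return root


rows = [
    {"path": "a_f.fq", "sample": "a", "direction": "forward"},
    {"path": "a_r.fq", "sample": "a", "direction": "reverse"},
    {"path": "b_f.fq", "sample": "b", "direction": "forward"},
    {"path": "b_r.fq", "sample": "b", "direction": "reverse"},
]
paired = build_nested(rows, ["sample", "direction"], "path")
```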

User Story 6

  • Same as above, but the user submits a very large sheet and only the first N rows are returned and rendered in the GUI, so this can work at any scale. This will also ensure we are describing "rules" via the GUI and not working with the data directly.
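
One way to return only the first N rows is to read the sheet lazily and stop early. A sketch, assuming CSV input with a header row; `preview_rows` is a hypothetical name.

```python
import csv
import io
from itertools import islice


def preview_rows(stream, limit):
    """Return the header and at most `limit` data rows without reading the rest."""
    reader = csv.reader(stream)
    header = next(reader, [])  # empty header if the sheet is empty
    return header, list(islice(reader, limit))


# Stand-in for a very large sheet.
sheet = io.StringIO("path\n" + "".join(f"file_{i}.fastq\n" for i in range(1000)))
header, first_rows = preview_rows(sheet, 3)
```

The rules the user defines against this preview would then be applied to the full sheet on the backend.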

User Story 7

  • Same as above, but the user can select a column that splits the collection into separate collections. This enables, for instance, separate nested collections for control and condition samples.
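
The split column can be thought of as a grouping key applied before any collection is built. A small sketch, assuming rows are dictionaries keyed by column name; `split_by_column` and the column name `group` are hypothetical.

```python
from collections import defaultdict


def split_by_column(rows, split_column):
    """Group rows by the split column; each group would become its own collection."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[split_column]].append(row)
    return dict(groups)


rows = [
    {"path": "c1.fq", "sample": "c1", "group": "control"},
    {"path": "t1.fq", "sample": "t1", "group": "condition"},
    {"path": "c2.fq", "sample": "c2", "group": "control"},
]
groups = split_by_column(rows, "group")
```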

User Story 8

  • Same as above but the user can specify a column to serve as a validator for the data - such as an md5sum or a sha1sum.
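
Validation against a checksum column might look like the following. This is only a sketch: `matches_checksum` is a hypothetical helper, and it hashes bytes already in memory, whereas a real implementation would presumably stream the fetched file.

```python
import hashlib

SUPPORTED_ALGORITHMS = {"md5", "sha1"}  # the two named in the story above


def matches_checksum(data, expected, algorithm="md5"):
    """Compare the digest of `data` (bytes) with the value from the sheet."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected.strip().lower()


content = b"ACGT\n"
# Stand-in for the value a user would supply in the checksum column.
sheet_value = hashlib.md5(content).hexdigest().upper()
ok = matches_checksum(content, sheet_value, "md5")
```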

User Story 9

  • Same as above, but the user can specify rules to apply to a column to generate a new "pseudo" column and assign a rule to that column. For instance, a regex to parse "_f" versus "_r" for forward/reverse.
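
The regex example could be sketched as a rule that derives a new column from an existing one. The helper name `add_pseudo_column`, the pattern, and the `f`/`r` mapping are all assumptions made for illustration.

```python
import re


def add_pseudo_column(rows, source, target, pattern, mapping):
    """Return copies of rows with `target` derived from `source` via a regex.

    The first capture group is looked up in `mapping`; rows that do not
    match get None so the GUI could flag them.
    """
    compiled = re.compile(pattern)
    result = []
    for row in rows:
        match = compiled.search(row[source])
        value = mapping.get(match.group(1)) if match else None
        result.append({**row, target: value})
    return result


rows = [{"path": "sample_a_f.fastq"}, {"path": "sample_a_r.fastq"}]
with_direction = add_pseudo_column(
    rows, "path", "direction", r"_([fr])\.", {"f": "forward", "r": "reverse"}
)
```

The derived `direction` column could then be assigned the forward/reverse rule from User Story 4.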

User Story 10

  • Same as above, but a column can be used to specify tags and annotations for the datasets.

Future Directions:

Record Dataset Collection Types

The way paired data is described above could be extended to be used with record collection types. I see the path forward as merging the record dataset collection commit from CWL, allowing tools to describe the collection types they consume, and allowing users to fetch these type descriptions during import here and apply rules to the columns and rows in some structured way. This would also be a way to consume certain metadata from the sheet, since the record descriptions allow non-data parameters the way they do in CWL.

xref #3834
xref common-workflow-lab#71

Metadata

We need to come up with ways to think about user-supplied metadata in the context of collections and outside of records, I think. I say we get this practical piece done first and then start working toward that if it is a priority.

EtherCalc

There would be a couple of potential uses for a Supervisor setup that always ran an EtherCalc server beside Galaxy, with some permanent bridge connecting them. This could allow users to work with sample sheet data in a more "Excel-y" way before it is even imported. The GUI described here could then follow those imports and transformations.

Other Related Issues of Interest
