Implement a subset of the Common Workflow Language #12909

Draft: wants to merge 54 commits into base: dev
Commits
72c1be1
Deal with workflow definitions without position fields.
jmchilton Nov 18, 2019
709321c
Implement subset of the Common Workflow Language tool and workflow fo…
jmchilton Nov 5, 2020
2959d82
CWL Testing and Runner Improvements.
jmchilton Nov 6, 2018
62709bd
WIP: Work toward Galaxy-flavored CWL tools.
jmchilton Apr 20, 2018
509de13
[WIP] Implement client UI for field parameter type for CWL.
jmchilton Nov 17, 2019
86afdc3
WORKAROUND TO GET TAR TO DIRECTORY WORKING AGAIN.
jmchilton Nov 7, 2020
3feb266
Fix non_data_connection workflows
mvdbeek Nov 9, 2021
c8ab035
Fix handling of uploaded_file_name
mvdbeek Nov 10, 2021
ed0d416
Fix directory location tests
mvdbeek Nov 10, 2021
537a019
start documenting state of CWL support
mr-c Nov 11, 2021
500d1fc
Add sentinel value workaround for GALAXY_SLOTS hack
mvdbeek Nov 11, 2021
b8203c5
Assert length of input connections, instead of inputs when disconnect…
mvdbeek Nov 12, 2021
e44ae56
Fix type hints
mr-c Nov 13, 2021
20a19ea
Disable cheetah in configfiles, env vars for cwl tools
mvdbeek Dec 6, 2021
8f442da
Drop test_deserialize_cwl_tool, already testing that more accurately …
mvdbeek Dec 6, 2021
42832ba
Fix wrong resolution of Any type when re-using CWL tools
mvdbeek Dec 7, 2021
0ffd8b0
Coerce discovered optional files to data
mvdbeek Dec 7, 2021
1538271
Fix complex types via record collection type
mvdbeek Dec 8, 2021
29e3755
Fix handle_known_output for nested output records
mvdbeek Dec 8, 2021
9a749cf
Skip staging inputs for outputs
mvdbeek Dec 8, 2021
5742cad
Fix packed document support if main/#main is tool instead of workflow
mvdbeek Dec 8, 2021
59e8050
Implement rough mapping between EDAM formats and datatypes
mvdbeek Dec 9, 2021
8cec0e0
Support uploading directory literals
mvdbeek Dec 10, 2021
3fc4743
Keep directory parameters in job parameters
mvdbeek Dec 11, 2021
f962202
Merge subworkflow input logic?
mvdbeek Sep 4, 2023
8e75874
Drop divergent to_cwl/from_cwl, factor out extra_step_state building
mvdbeek Sep 5, 2023
2470c0d
TreeDict fix
mvdbeek Sep 5, 2023
f33ea82
Use regular staging for CWL tests instead of allow_path_paste, which …
mvdbeek Sep 5, 2023
e9d4448
Fix directory uploads
mvdbeek Sep 6, 2023
eb37645
Record unnamed_outputs as job outputs
mvdbeek Sep 6, 2023
9b53547
Download complex outputs
mvdbeek Sep 25, 2023
061b1df
Download secondary files as well
mvdbeek Sep 25, 2023
6c41e1c
Implement downloading directory archive
mvdbeek Oct 30, 2023
23ab36e
Quickfix for moving away tool working directory
mvdbeek Oct 30, 2023
87eb125
Various fixes for stricter cwltool and cwltest
mvdbeek Oct 31, 2023
5b2b090
Fix up ontology to datatype mapping for __FETCH_DATA__
mvdbeek Oct 31, 2023
0b69ade
Shortcut param_dict building for CWL tools
mvdbeek Oct 31, 2023
332ea81
WIP: untar directory to extra_files_path
mvdbeek Nov 1, 2023
c5734a0
Add test for workflow default file overrides tool default file
mvdbeek Nov 3, 2023
47e0745
WIP:CWL default file value_from work
mvdbeek Nov 4, 2023
a4be181
Into split trans to app
mvdbeek Nov 5, 2023
faa9d83
Separate and fix value_from overriding default
mvdbeek Nov 5, 2023
ef429c1
Ensure that expression tool null values are treated as null values wh…
mvdbeek Nov 5, 2023
2b307eb
Replace file location with URL ...
mvdbeek Nov 5, 2023
5e09aae
Pack workflow
mvdbeek Nov 5, 2023
68907a0
Update list of new failing 1.2 tests
mvdbeek Nov 6, 2023
d07585b
Drop now passing red tests
mvdbeek Nov 6, 2023
0bce492
Exclude red and required 1.0 tests from github matrix
mvdbeek Nov 6, 2023
bdc15c0
Fix output addition to history if input name is same as output name
mvdbeek Nov 7, 2023
18ae455
Avoid unnecessarily recreating ToolProxy for CWL workflow tools
nsoranzo Feb 3, 2025
ad4b52e
Avoid unnecessarily recreating ToolProxy when loading CWL tool in too…
nsoranzo Feb 3, 2025
b7229d7
Allow dev versions of CWL
nsoranzo Feb 3, 2025
f835153
Log exception if converting toolbox to dict fails
nsoranzo Jun 24, 2025
b13efde
Get tool versions instead of the whole tool panel to check tool presence
nsoranzo Jun 24, 2025
6 changes: 4 additions & 2 deletions .github/workflows/cwl_conformance.yaml
@@ -20,15 +20,17 @@ concurrency:
jobs:
test:
name: Test
if: ${{ false }}
runs-on: ubuntu-latest
continue-on-error: ${{ startsWith(matrix.marker, 'red') }}
strategy:
fail-fast: false
matrix:
python-version: ['3.9']
marker: ['green', 'red and required', 'red and not required']
conformance-version: ['cwl_conformance_v1_0'] #, 'cwl_conformance_v1_1', 'cwl_conformance_v1_2']
conformance-version: ['cwl_conformance_v1_0', 'cwl_conformance_v1_1', 'cwl_conformance_v1_2']
exclude:
- marker: red and required
conformance-version: cwl_conformance_v1_0
services:
postgres:
image: postgres:17
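As a rough illustration of what the matrix above expands to, here is a minimal Python sketch that builds the cross product and applies the single `exclude` rule. The helper name `expand_matrix` is hypothetical, and the sketch assumes the three-version `conformance-version` list is the one in effect; it only mimics how GitHub Actions combines matrix axes.

```python
from itertools import product

# Axis values copied from the workflow hunk above.
python_versions = ["3.9"]
markers = ["green", "red and required", "red and not required"]
conformance_versions = [
    "cwl_conformance_v1_0",
    "cwl_conformance_v1_1",
    "cwl_conformance_v1_2",
]
excluded = [
    {"marker": "red and required", "conformance-version": "cwl_conformance_v1_0"},
]


def expand_matrix():
    """Hypothetical helper: cross product of the axes minus excluded combinations."""
    combos = []
    for py, marker, version in product(python_versions, markers, conformance_versions):
        combo = {"python-version": py, "marker": marker, "conformance-version": version}
        # An exclude rule drops every combination that matches all of its keys.
        if any(all(combo[key] == value for key, value in rule.items()) for rule in excluded):
            continue
        combos.append(combo)
    return combos


jobs = expand_matrix()
# Jobs whose marker starts with "red" are the ones allowed to fail (continue-on-error).
allowed_to_fail = [job for job in jobs if job["marker"].startswith("red")]
print(len(jobs), len(allowed_to_fail))
```

Under these assumptions the matrix yields 3 x 3 - 1 = 8 jobs, 5 of which carry a `red` marker.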
24 changes: 24 additions & 0 deletions doc/source/dev/cwl.md
@@ -0,0 +1,24 @@
CWL import in Galaxy
====================

What is supported
-----------------

What is not supported
---------------------

Some CWL Expressions / Parameter references that do math on `$(resources.cores)`
or similar will likely not work.

How to enable it?
-----------------

1. List paths to CWL tools in `tool_conf.xml` .
2. Set the following in `galaxy.yml`:

```yaml
enable_beta_tool_formats: true
enable_beta_workflow_modules: true
check_upload_content: false
strict_cwl_validation: false
```
3 changes: 3 additions & 0 deletions lib/galaxy/config/__init__.py
@@ -937,6 +937,9 @@ def _process_config(self, kwargs: Dict[str, Any]) -> None:
else None
)

# TODO: migrate to schema.
# Should CWL artifacts be loaded with strict validation enabled.
self.strict_cwl_validation = string_as_bool(kwargs.get("strict_cwl_validation", "True"))
# These are not even beta - just experiments - don't use them unless
# you want yours tools to be broken in the future.
self.enable_beta_tool_formats = string_as_bool(kwargs.get("enable_beta_tool_formats", "False"))
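To make the defaults in this hunk concrete, here is a small self-contained sketch. The `string_as_bool` below is a stand-in written for this example (an assumption about how Galaxy's helper behaves, not a copy of it), and `read_cwl_flags` is a hypothetical name. It shows that `strict_cwl_validation` defaults to on while `enable_beta_tool_formats` defaults to off.

```python
from typing import Any, Dict


def string_as_bool(value: Any) -> bool:
    # Stand-in for Galaxy's helper (assumed behavior): a few truthy spellings map to True.
    return str(value).lower() in ("true", "yes", "on", "1")


def read_cwl_flags(kwargs: Dict[str, Any]) -> Dict[str, bool]:
    """Hypothetical helper mirroring the two kwargs lookups in the hunk above."""
    return {
        "strict_cwl_validation": string_as_bool(kwargs.get("strict_cwl_validation", "True")),
        "enable_beta_tool_formats": string_as_bool(kwargs.get("enable_beta_tool_formats", "False")),
    }


defaults = read_cwl_flags({})
relaxed = read_cwl_flags({"strict_cwl_validation": False, "enable_beta_tool_formats": "true"})
print(defaults, relaxed)
```

Because strict validation is the default, a configuration that wants relaxed CWL loading has to set `strict_cwl_validation: false` explicitly, as the documentation added in this PR does.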
2 changes: 1 addition & 1 deletion lib/galaxy/config/sample/datatypes_conf.xml.sample
@@ -338,7 +338,7 @@
<datatype extension="tar" auto_compressed_types="gz,bz2" type="galaxy.datatypes.binary:CompressedArchive" subclass="true" display_in_upload="true">
<converter file="archive_to_directory.xml" target_datatype="directory"/>
</datatype>
<datatype extension="directory" type="galaxy.datatypes.data:Directory"/>
<datatype extension="directory" type="galaxy.datatypes.data:Directory" display_in_upload="true"/>
<datatype extension="bwa_mem2_index" display_in_upload="true" type="galaxy.datatypes.data:Directory" subclass="true"/>
<datatype extension="zarr" type="galaxy.datatypes.data:ZarrDirectory">
<infer_from suffix="zarr" />
30 changes: 29 additions & 1 deletion lib/galaxy/datatypes/registry.py
@@ -77,6 +77,7 @@
self.config = config
self.edam = edam
self.datatypes_by_extension: Dict[str, Data] = {}
self.datatypes_by_format = {}
self.datatypes_by_suffix_inferences = {}
self.mimetypes_by_extension = {}
self.datatype_converters = {}
@@ -276,13 +277,25 @@
upload_warning_template = Template(upload_warning_el.text or "")
datatype_instance = datatype_class()
self.datatypes_by_extension[extension] = datatype_instance
if not datatype_class.is_subclass:
edam_format = datatype_class.edam_format
prefixed_format = f"edam:{edam_format}"
if prefixed_format not in self.datatypes_by_format:
register_datatype_by_format = True
for super_klass in datatype_class.__mro__[1:-1]:
super_edam_format = getattr(super_klass, "edam_format", None)
if super_edam_format == edam_format:
register_datatype_by_format = False
break
if register_datatype_by_format:
self.datatypes_by_format[prefixed_format] = datatype_instance
if mimetype is None:
# Use default mimetype per datatype specification.
mimetype = self.datatypes_by_extension[extension].get_mime()
self.mimetypes_by_extension[extension] = mimetype
if datatype_class.track_type:
self.available_tracks.append(extension)
if display_in_upload and extension not in self.upload_file_formats:
if display_in_upload:
self.upload_file_formats.append(extension)
# Max file size cut off for setting optional metadata.
self.datatypes_by_extension[extension].max_optional_metadata_filesize = elem.get(
@@ -443,6 +456,7 @@
override=override,
compressed_sniffers=compressed_sniffers,
)
self.upload_file_formats = list(set(self.upload_file_formats))
self.upload_file_formats.sort()
# Load build sites
if use_build_sites:
@@ -615,7 +629,7 @@

return generic_datatype_instance

def is_extension_unsniffable_binary(self, ext):
datatype = self.get_datatype_by_extension(ext)
return datatype is not None and isinstance(datatype, binary.Binary) and not hasattr(datatype, "sniff")
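The `datatypes_by_format` registration earlier in this file's diff walks the class MRO so that a datatype which merely inherits its `edam_format` from a parent does not claim that format. A minimal sketch of the idea follows. The classes and EDAM identifiers are toy stand-ins, `register_by_format` is a hypothetical name, and the nesting is an assumption because the diff above has lost its indentation.

```python
class Data:
    edam_format = "format_1915"
    is_subclass = False


class Text(Data):
    edam_format = "format_2330"


class Tabular(Text):
    edam_format = "format_3475"


class Interval(Tabular):
    # Inherits format_3475 from Tabular without redefining it.
    pass


def register_by_format(datatypes_by_format, datatype_class):
    """Hypothetical helper: register a class under its EDAM format unless an ancestor owns it."""
    if datatype_class.is_subclass:
        return
    edam_format = datatype_class.edam_format
    prefixed_format = f"edam:{edam_format}"
    if prefixed_format in datatypes_by_format:
        return  # first registration wins
    # Skip the class itself ([0]) and `object` ([-1]); look only at real ancestors.
    for super_klass in datatype_class.__mro__[1:-1]:
        if getattr(super_klass, "edam_format", None) == edam_format:
            return  # the format belongs to an ancestor, so do not register this class
    datatypes_by_format[prefixed_format] = datatype_class()


registry = {}
for klass in (Interval, Tabular, Text):
    register_by_format(registry, klass)
print({key: type(value).__name__ for key, value in registry.items()})
```

Even though `Interval` is offered first, `edam:format_3475` ends up mapped to `Tabular`, the most general class that declares that format.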

@@ -654,6 +668,20 @@
"""Returns a datatype object based on an extension"""
return self.datatypes_by_extension.get(ext, None)

def get_datatype_by_format_ontology(self, ontology: str):
"""Returns a datatype by format ontology"""
if "edamontology.org/" in ontology:

[Check failure, Code scanning / CodeQL, severity High: Incomplete URL substring sanitization. The string `edamontology.org/` may be at an arbitrary position in the sanitized URL.]
ontology = f"edam:{ontology.split('edamontology.org/')[1]}"
return self.datatypes_by_format.get(ontology)

def get_datatype_ext_by_format_ontology(self, ontology: str, only_uploadable: bool = False) -> Optional[str]:
"""Returns a datatype by format ontology"""
datatype = self.get_datatype_by_format_ontology(ontology)
if datatype:
if not only_uploadable or datatype.file_ext in self.upload_file_formats:
return datatype.file_ext
return None

def change_datatype(self, data, ext):
if data.extension != ext:
data.extension = ext
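The CodeQL alert on `get_datatype_by_format_ontology` concerns the substring test for `edamontology.org/`. One possible stricter variant, shown here only as a sketch and not as the PR's code, parses the URL and checks the host before deriving the `edam:` key. The function name `normalize_format_ontology` is hypothetical, and unlike the original it returns `None` for strings it does not recognize.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_format_ontology(ontology: str) -> Optional[str]:
    """Hypothetical helper: map an EDAM URL or an `edam:` term to an `edam:<term>` key."""
    if ontology.startswith("edam:"):
        return ontology
    parsed = urlparse(ontology)
    host = parsed.hostname or ""
    # Check the parsed host rather than searching the whole string for a substring.
    is_edam_host = host == "edamontology.org" or host.endswith(".edamontology.org")
    if parsed.scheme in ("http", "https") and is_edam_host:
        term = parsed.path.strip("/")
        if term:
            return f"edam:{term}"
    return None


good = normalize_format_ontology("http://edamontology.org/format_1930")
spoofed = normalize_format_ontology("https://evil.example/?u=edamontology.org/format_1930")
print(good, spoofed)
```

The second call shows the case the alert describes: the host is unrelated and `edamontology.org/` only appears in the query string, so the sketch rejects it instead of deriving a key.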
2 changes: 1 addition & 1 deletion lib/galaxy/jobs/__init__.py
@@ -1172,7 +1172,7 @@ def can_split(self):

@property
def is_cwl_job(self):
return self.tool.tool_type == "cwl"
return self.tool.tool_type in ["galactic_cwl", "cwl"]

def get_job_runner_url(self):
log.warning(f"({self.job_id}) Job runner URLs are deprecated, use destinations instead.")
27 changes: 17 additions & 10 deletions lib/galaxy/jobs/command_factory.py
@@ -94,19 +94,26 @@ def build_command(
external_command_shell = container.shell
else:
external_command_shell = shell
externalized_commands = __externalize_commands(
job_wrapper, external_command_shell, commands_builder, remote_command_params, container=container
)
if container and modify_command_for_container:
# Stop now and build command before handling metadata and copying
# working directory files back. These should always happen outside
# of docker container - no security implications when generating
# metadata and means no need for Galaxy to be available to container
# and not copying workdir outputs back means on can be more restrictive
# of where container can write to in some circumstances.
run_in_container_command = container.containerize_command(externalized_commands)
if job_wrapper.tool and not job_wrapper.tool.may_use_container_entry_point:
externalized_commands = __externalize_commands(
job_wrapper, external_command_shell, commands_builder, remote_command_params, container=container
)
# Stop now and build command before handling metadata and copying
# working directory files back. These should always happen outside
# of docker container - no security implications when generating
# metadata and means no need for Galaxy to be available to container
# and not copying workdir outputs back means on can be more restrictive
# of where container can write to in some circumstances.
run_in_container_command = container.containerize_command(externalized_commands)
else:
tool_commands = commands_builder.build()
run_in_container_command = container.containerize_command(tool_commands)
commands_builder = CommandsBuilder(run_in_container_command)
else:
externalized_commands = __externalize_commands(
job_wrapper, external_command_shell, commands_builder, remote_command_params, container=container
)
commands_builder = CommandsBuilder(externalized_commands)

# Galaxy writes I/O files to outputs, Pulsar uses metadata. metadata seems like
12 changes: 11 additions & 1 deletion lib/galaxy/jobs/runners/local.py
@@ -4,6 +4,7 @@

import datetime
import logging
import math
import os
import subprocess
import tempfile
@@ -67,7 +68,16 @@ def _command_line(self, job_wrapper: "MinimalJobWrapper") -> Tuple[str, str]:
if slots:
slots_statement = f'GALAXY_SLOTS="{int(slots)}"; export GALAXY_SLOTS; GALAXY_SLOTS_CONFIGURED="1"; export GALAXY_SLOTS_CONFIGURED;'
else:
slots_statement = 'GALAXY_SLOTS="1"; export GALAXY_SLOTS;'
cores_min = 1
if job_wrapper.tool:
try:
# In CWL 1.2 it can be a float that can be rounded to the next whole number
cores_min = math.ceil(float(job_wrapper.tool.cores_min))
except ValueError:
# TODO: in CWL this can be an expression referencing runtime
# parameters, e.g. `$(inputs.special_file.size)`
pass
slots_statement = f'GALAXY_SLOTS="{cores_min}"; export GALAXY_SLOTS;'

job_id = job_wrapper.get_id_tag()
job_file = JobState.default_job_file(job_wrapper.working_directory, job_id)
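The fallback added above derives `GALAXY_SLOTS` from the tool's `cores_min` when no slot count is configured. The sketch below isolates that rounding logic in a hypothetical function `slots_statement`. One deliberate difference from the hunk: it also catches `TypeError`, on the assumption that `cores_min` might be `None`, whereas the diff catches only `ValueError`.

```python
import math
from typing import Any


def slots_statement(cores_min: Any = 1) -> str:
    """Hypothetical helper: build the GALAXY_SLOTS export for an unconfigured slot count."""
    slots = 1
    try:
        # CWL 1.2 allows fractional cores; round up to the next whole number.
        slots = math.ceil(float(cores_min))
    except (TypeError, ValueError):
        # Non-numeric values (for example a CWL expression string, or None) keep the default of 1.
        pass
    return f'GALAXY_SLOTS="{slots}"; export GALAXY_SLOTS;'


print(slots_statement("1.5"))
print(slots_statement("$(inputs.special_file.size)"))
```

A fractional request such as `1.5` becomes 2 slots, while an expression that cannot be evaluated at this point falls back to 1.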
5 changes: 5 additions & 0 deletions lib/galaxy/managers/collections.py
@@ -258,8 +258,11 @@ def _create_instance_for_collection(
name=name,
)
assert isinstance(dataset_collection_instance, model.HistoryDatasetCollectionAssociation)

if implicit_inputs:
for input_name, input_collection in implicit_inputs:
if getattr(input_collection, "ephemeral", False):
input_collection = input_collection.persistent_object
if isinstance(input_collection, model.HistoryDatasetCollectionAssociation):
# Can also get dragged DatasetCollectionElement's here.
# We only use this for extracting workflows currently,
@@ -424,6 +427,8 @@ def _append_tags(self, dataset_collection_instance, implicit_inputs=None, tags=N
tags = tags or {}
implicit_inputs = implicit_inputs or []
for _, v in implicit_inputs:
if getattr(v, "ephemeral", False):
v = v.persistent_object
for tag in v.auto_propagated_tags:
tags[tag.value] = tag
for _, tag in tags.items():
1 change: 1 addition & 0 deletions lib/galaxy/managers/datasets.py
@@ -670,6 +670,7 @@ def add_serializers(self):
"genome_build": lambda item, key, **context: str(item.dbkey) if item.dbkey is not None else None,
# derived (not mapped) attributes
"data_type": lambda item, key, **context: f"{item.datatype.__class__.__module__}.{item.datatype.__class__.__name__}",
"cwl_formats": lambda item, key, **context: item.cwl_formats,
"converted": self.serialize_converted_datasets,
# TODO: metadata/extra files
}
7 changes: 5 additions & 2 deletions lib/galaxy/managers/executables.py
@@ -29,22 +29,25 @@ def artifact_class(trans, as_dict: Dict[str, Any], allow_in_directory: Optional[
as_dict = yaml.safe_load(f)

artifact_class = as_dict.get("class", None)
target_object = None
if artifact_class is None and "$graph" in as_dict:
object_id = object_id or "main"
graph = as_dict["$graph"]
target_object = None
if isinstance(graph, dict):
target_object = graph.get(object_id)
else:
for item in graph:
found_id = item.get("id")
if found_id == object_id or found_id == f"#{object_id}":
target_object = item
break

if target_object and target_object.get("class"):
artifact_class = target_object["class"]
if artifact_class in ("CommandLineTool", "ExpressionTool"):
target_object["cwlVersion"] = as_dict["cwlVersion"]

return artifact_class, as_dict, object_id
return artifact_class, as_dict, object_id, target_object


__all__ = ("artifact_class",)
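For readers unfamiliar with packed CWL documents, the following sketch reproduces the `$graph` lookup from the hunk above in isolation. The function name `resolve_packed_target` and the sample document are hypothetical, and the nesting is an assumption since the diff has lost its indentation. It shows how `main` or `#main` is located and why the top-level `cwlVersion` is copied onto a tool target.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_packed_target(
    as_dict: Dict[str, Any], object_id: Optional[str] = None
) -> Tuple[Optional[str], Optional[Dict[str, Any]]]:
    """Hypothetical helper: find the class and target object of a packed CWL document."""
    artifact_class = as_dict.get("class")
    target_object = None
    if artifact_class is None and "$graph" in as_dict:
        object_id = object_id or "main"
        graph = as_dict["$graph"]
        if isinstance(graph, dict):
            target_object = graph.get(object_id)
        else:
            for item in graph:
                found_id = item.get("id")
                if found_id in (object_id, f"#{object_id}"):
                    target_object = item
                    break
        if target_object and target_object.get("class"):
            artifact_class = target_object["class"]
            if artifact_class in ("CommandLineTool", "ExpressionTool"):
                # A tool extracted from $graph needs the document-level version to stand alone.
                target_object["cwlVersion"] = as_dict["cwlVersion"]
    return artifact_class, target_object


packed = {
    "cwlVersion": "v1.2",
    "$graph": [
        {"id": "#helper", "class": "ExpressionTool"},
        {"id": "#main", "class": "CommandLineTool", "baseCommand": "echo"},
    ],
}
found_class, found_target = resolve_packed_target(packed)
print(found_class, found_target["cwlVersion"])
```

Returning the target object alongside the class is what lets the caller build a tool proxy from the extracted tool rather than from the whole packed file.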
1 change: 1 addition & 0 deletions lib/galaxy/managers/hdas.py
@@ -546,6 +546,7 @@ def __init__(self, app: StructuredApp):
"file_name",
"display_apps",
"display_types",
"cwl_formats",
"validated_state",
"validated_state_message",
# 'url',
33 changes: 30 additions & 3 deletions lib/galaxy/managers/tools.py
@@ -45,6 +45,7 @@

if TYPE_CHECKING:
from galaxy.managers.base import OrmFilterParsersType
from galaxy.tool_util.cwl.parser import ToolProxy


def tool_payload_to_tool(app, tool_dict: Dict[str, Any]) -> Optional[Tool]:
@@ -96,17 +97,18 @@ def get_tool_by_id(self, object_id):
stmt = select(DynamicTool).where(DynamicTool.id == object_id, DynamicTool.public == true())
return self.session().scalars(stmt).one_or_none()

def create_tool(self, tool_payload: DynamicToolPayload):
def create_tool(self, tool_payload: DynamicToolPayload) -> DynamicTool:
if not getattr(self.app.config, "enable_beta_tool_formats", False):
raise exceptions.ConfigDoesNotAllowException(
"Set 'enable_beta_tool_formats' in Galaxy config to create dynamic tools."
)

uuid = model.get_uuid()
target_object = None
tool_directory: Optional[str] = None
tool_path: Optional[str] = None
if tool_payload.src == "from_path":
tool_format, representation, _ = artifact_class(None, tool_payload.model_dump())
tool_format, representation, _, target_object = artifact_class(None, tool_payload.model_dump())
tool_directory = tool_payload.tool_directory
tool_path = tool_payload.path
else:
@@ -118,13 +120,17 @@ def create_tool(self, tool_payload: DynamicToolPayload):
if not tool_format:
raise exceptions.ObjectAttributeMissingException("Current tool representations require 'class'.")

proxy: Optional[ToolProxy] = None
if tool_format in ("GalaxyTool", "GalaxyUserTool"):
tool_id = representation.get("id")
if not tool_id:
tool_id = str(uuid)
elif tool_format in ("CommandLineTool", "ExpressionTool"):
# CWL tools
if tool_path:
if target_object is not None:
representation = {"raw_process_reference": target_object, "uuid": str(uuid), "class": tool_format}
proxy = tool_proxy(tool_object=target_object, tool_directory=tool_directory, uuid=uuid)
elif tool_path:
proxy = tool_proxy(tool_path=tool_path, uuid=uuid)
else:
# Build a tool proxy so that we can convert to the persistable
@@ -149,10 +155,31 @@ def create_tool(self, tool_payload: DynamicToolPayload):
hidden=tool_payload.hidden,
value=representation,
public=True,
proxy=proxy,
)
self.app.toolbox.load_dynamic_tool(dynamic_tool)
return dynamic_tool

def create_tool_from_proxy(self, proxy: "ToolProxy") -> DynamicTool:
if not getattr(self.app.config, "enable_beta_tool_formats", False):
raise exceptions.ConfigDoesNotAllowException(
"Set 'enable_beta_tool_formats' in Galaxy config to create dynamic tools."
)
dynamic_tool = self.get_tool_by_uuid(proxy.uuid)
if not dynamic_tool:
representation = proxy.to_persistent_representation()
dynamic_tool = self.create(
tool_format=proxy._class,
tool_id=proxy.galaxy_id(),
tool_version=representation.get("version"),
uuid=proxy.uuid,
value=representation,
public=True,
proxy=proxy,
)
self.app.toolbox.load_dynamic_tool(dynamic_tool)
return dynamic_tool

def create_unprivileged_tool(
self, user: model.User, tool_payload: DynamicUnprivilegedToolCreatePayload
) -> DynamicTool: