Commit a512e7a

Merge remote-tracking branch 'upstream/dev' into fixes
2 parents 6cbe19d + 2da7da1 commit a512e7a

14 files changed

+47
-30
lines changed

.nf-core.yml

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
 repository_type: pipeline
 nf_core_version: "2.14.1"
+lint:
+  files_exist: conf/igenomes.config

.vscode/settings.json

Lines changed: 0 additions & 4 deletions
This file was deleted.

README.md

Lines changed: 8 additions & 6 deletions
@@ -19,13 +19,13 @@

 ## Introduction

-**nf-core/reportho** is a bioinformatics pipeline that compares and assembles orthology predictions for a query protein. It fetches ortholog lists for a query (or its closest annotated homolog) from public sources, calculates pairwise and global agreement, and generates a consensus list with the desired level of confidence. Optionally, it offers common analysis on the consensus orthologs, such as MSA and phylogeny reconstruction. Additionally, it generates a clean, human-readable report of the results.
+**nf-core/reportho** is a bioinformatics pipeline that compares and summarizes orthology predictions for one or a set of query proteins. For each query (or its closest annotated homolog), it fetches ortholog lists from public databases, calculates the agreement of the obtained predictions(pairwise and global) and finally generates a consensus list of orthologs with the desired level of confidence. Optionally, it offers common analysis on the consensus orthologs, such as MSA and phylogeny reconstruction. Additionally, it generates a clean, human-readable report of the results.

 <!-- Tube map -->

 ![nf-core-reportho tube map](docs/images/reportho_tube_map.svg?raw=true "nf-core-reportho tube map")

-1. **Obtain Query Information**: (depends on provided input) identification of Uniprot ID and taxon ID for the query or its closest homolog.
+1. **Obtain Query Information**: identification of Uniprot ID and taxon ID for the query (or its closest homolog if the fasta file is used as input instead of the Uniprot ID).
 2. **Fetch Orthologs**: fetching of ortholog predictions from public databases, either through API or from local snapshot.
 3. **Compare and Assemble**: calculation of agreement statistics, creation of ortholog lists, selection of the consensus list.

@@ -47,13 +47,15 @@ First, prepare a samplesheet with your input data that looks as follows:
 ```csv title="samplesheet_fasta.csv"
 id,fasta
 BicD2,data/bicd2.fasta
+HBB,data/hbb.fasta
 ```

 or if you know the UniProt ID of the protein you can provide it directly:

 ```csv title="samplesheet.csv"
 id,query
 BicD2,Q8TD16
+HBB,P68871
 ```

 > [!NOTE]
@@ -82,13 +84,13 @@ For more details about the output files and reports, please refer to the

 ## Credits

-nf-core/reportho was originally written by Igor Trujnara (@itrujnara).
+nf-core/reportho was originally written by Igor Trujnara ([@itrujnara](https://github.com/itrujnara)).

 We thank the following people for their extensive assistance in the development of this pipeline:

-- Luisa Santus (@lsantus)
-- Alessio Vignoli (@avignoli)
-- Jose Espinosa-Carrasco (@JoseEspinosa)
+- Luisa Santus ([@luisas](https://github.com/luisas))
+- Alessio Vignoli ([@alessiovignoli](https://github.com/alessiovignoli))
+- Jose Espinosa-Carrasco ([@JoseEspinosa](https://github.com/JoseEspinosa))

 ## Contributions and Support


bin/fetch_oma_by_sequence.py

Lines changed: 8 additions & 0 deletions
@@ -9,6 +9,14 @@
 from Bio import SeqIO
 from utils import fetch_seq

+# Script overview:
+# Fetches the OMA entry for a given protein sequence
+# The sequence is passed as a FASTA file
+# If the sequence is not found, the script exits with an error
+# It outputs 3 files:
+# 1. The canonical ID of the sequence
+# 2. The taxonomy ID of the species
+# 3. A boolean indicating if the sequence was an exact match

 def main() -> None:
     if len(sys.argv) < 5:

docs/usage.md

Lines changed: 2 additions & 0 deletions
@@ -25,13 +25,15 @@ A final samplesheet file may look something like the one below:
 ```csv title="samplesheet.csv"
 id,query
 BicD2,Q8TD16
+HBB,P68871
 ```

 or the one below, if you provide the sequence of the protein in FASTA format:

 ```csv title="samplesheet.csv"
 id,fasta
 BicD2,/home/myuser/data/bicd2.fa
+HBB,/home/myuser/data/hbb.fa
 ```

 | Column | Description |
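
The two samplesheet layouts touched above (`id,query` and `id,fasta`) can be told apart by their header. The following is a minimal Python sketch of that distinction, for illustration only: the function name `read_samplesheet` and the returned tuple layout are assumptions made here, and the pipeline's own samplesheet handling is not shown in this diff.

```python
import csv
import io


def read_samplesheet(text):
    """Parse samplesheet CSV text into (id, kind, value) tuples.

    `kind` is "fasta" when the header has a `fasta` column and "query"
    when it has a `query` column, mirroring the two layouts in the docs.
    """
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    if "id" not in fields:
        raise ValueError("samplesheet needs an 'id' column")
    if "fasta" in fields:
        kind = "fasta"
    elif "query" in fields:
        kind = "query"
    else:
        raise ValueError("samplesheet needs a 'fasta' or 'query' column")
    return [(row["id"], kind, row[kind]) for row in reader]


# The two-row UniProt ID samplesheet from the updated documentation.
rows = read_samplesheet("id,query\nBicD2,Q8TD16\nHBB,P68871\n")
for sample_id, kind, value in rows:
    print(sample_id, kind, value)
```

With the second row added in this commit, each samplesheet now describes two queries, which matches the README's new wording about "one or a set of query proteins".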

main.nf

Lines changed: 2 additions & 2 deletions
@@ -46,8 +46,8 @@ workflow NFCORE_REPORTHO {
         samplesheet_fasta,
     )

-    // emit:
-    // multiqc_report = REPORTHO.out.multiqc_report // channel: /path/to/multiqc_report.html
+    emit:
+    multiqc_report = REPORTHO.out.multiqc_report // channel: /path/to/multiqc_report.html

 }
 /*

modules.json

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
                     },
                     "csvtk/join": {
                         "branch": "master",
-                        "git_sha": "5e0c5677ea33b3d4c3793244035a191bd03e6736",
+                        "git_sha": "614abbf126f287a3068dc86997b2e1b6a93abe20",
                         "installed_by": ["modules"]
                     },
                     "fastme": {

modules/local/create_tcoffeetemplate.nf

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ process CREATE_TCOFFEETEMPLATE {

     output:
     tuple val (meta), path("*_template.txt"), emit: template
+    path("versions.yml"), emit: versions

     when:
     task.ext.when == null || task.ext.when

modules/local/dump_params.nf

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ process DUMP_PARAMS {

     output:
     tuple val(meta), path("params.yml"), emit: params
+    path("versions.yml"), emit: versions

     when:
     task.ext.when == null || task.ext.when

modules/local/fetch_eggnog_group_local.nf

Lines changed: 4 additions & 4 deletions
@@ -2,10 +2,8 @@ process FETCH_EGGNOG_GROUP_LOCAL {
     tag "$meta.id"
     label 'process_single'

-    conda "conda-forge::python=3.11.0 conda-forge::biopython=1.83.0 conda-forge::requests=2.31.0"
-    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/mulled-v2-bc54124b36864a4af42a9db48b90a404b5869e7e:5258b8e5ba20587b7cbf3e942e973af5045a1e59-0' :
-        'biocontainers/mulled-v2-bc54124b36864a4af42a9db48b90a404b5869e7e:5258b8e5ba20587b7cbf3e942e973af5045a1e59-0' }"
+    conda "conda-forge::python=3.12.3 conda-forge::ripgrep=14.1.0"
+    container "community.wave.seqera.io/library/python_ripgrep:324b372792aae9ce"

     input:
     tuple val(meta), path(uniprot_id), path(taxid), path(exact)
@@ -34,6 +32,7 @@ process FETCH_EGGNOG_GROUP_LOCAL {
     cat <<- END_VERSIONS > versions.yml
     "${task.process}":
         Python: \$(python --version | cut -f2)
+        ripgrep: \$(rg --version | head -n1 | cut -d' ' -f2)
     END_VERSIONS
     """

@@ -46,6 +45,7 @@ process FETCH_EGGNOG_GROUP_LOCAL {
     cat <<- END_VERSIONS > versions.yml
     "${task.process}":
         Python: \$(python --version | cut -f2)
+        ripgrep: \$(rg --version | head -n1 | cut -d' ' -f2)
     END_VERSIONS
     """
 }
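
The added `ripgrep:` entry takes the first line of `rg --version` and keeps its second space-separated field. A small Python sketch of that `head -n1 | cut -d' ' -f2` step follows. The helper names `first_line` and `cut_field` are made up for this illustration, and the sample version text only assumes that the first line begins with `ripgrep 14.1.0`, as the pinned conda version suggests.

```python
def first_line(text: str) -> str:
    """Mimic `head -n1`: return the first line, or "" for empty input."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def cut_field(line: str, delimiter: str = "\t", field: int = 2) -> str:
    """Mimic `cut -d<delimiter> -f<field>` for a single line.

    Like cut, a line that does not contain the delimiter is returned
    unchanged, and a field past the end of the line yields "".
    """
    if delimiter not in line:
        return line
    parts = line.split(delimiter)
    return parts[field - 1] if field <= len(parts) else ""


# Illustrative version output; only the start of the first line matters.
sample = "ripgrep 14.1.0\nmore build details\n"
print(cut_field(first_line(sample), delimiter=" ", field=2))
```

The default delimiter of `cut` is a tab, so the unchanged context line `python --version | cut -f2` passes a tab-free line such as `Python 3.12.3` through whole, whereas the new ripgrep line sets `-d' '` and records only the version number.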

modules/local/fetch_oma_group_local.nf

Lines changed: 12 additions & 5 deletions
@@ -2,10 +2,8 @@ process FETCH_OMA_GROUP_LOCAL {
     tag "$meta.id"
     label 'process_single'

-    conda "conda-forge::python=3.11.0 conda-forge::biopython=1.83.0 conda-forge::requests=2.31.0"
-    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/mulled-v2-bc54124b36864a4af42a9db48b90a404b5869e7e:5258b8e5ba20587b7cbf3e942e973af5045a1e59-0' :
-        'biocontainers/mulled-v2-bc54124b36864a4af42a9db48b90a404b5869e7e:5258b8e5ba20587b7cbf3e942e973af5045a1e59-0' }"
+    conda "conda-forge::python=3.12.3 conda-forge::ripgrep=14.1.0"
+    container "community.wave.seqera.io/library/python_ripgrep:324b372792aae9ce"

     input:
     tuple val(meta), path(uniprot_id), path(taxid), path(exact)
@@ -24,15 +22,23 @@ process FETCH_OMA_GROUP_LOCAL {
     script:
     prefix = task.ext.prefix ?: meta.id
     """
+    # Obtain the OMA ID for the given Uniprot ID of the query protein
     omaid=\$(uniprot2oma_local.py $uniprot_idmap $uniprot_id)
-    zcat $db | grep \$omaid | head -1 | cut -f3- | awk '{gsub(/\\t/,"\\n"); print}' > ${prefix}_oma_group_oma.txt || test -f ${prefix}_oma_group_oma.txt
+
+    # Perform the database search for the given query in OMA
+    zcat $db | rg \$omaid | head -1 | cut -f3- | awk '{gsub(/\\t/,"\\n"); print}' > ${prefix}_oma_group_oma.txt || test -f ${prefix}_oma_group_oma.txt
+
+    # Convert the OMA ids to Uniprot, Ensembl and RefSeq ids
     oma2uniprot_local.py $uniprot_idmap ${prefix}_oma_group_oma.txt > ${prefix}_oma_group_raw.txt
     uniprotize_oma_local.py ${prefix}_oma_group_raw.txt $ensembl_idmap $refseq_idmap > ${prefix}_oma_group.txt
+
+    # Add the OMA column to the csv file
     csv_adorn.py ${prefix}_oma_group.txt OMA > ${prefix}_oma_group.csv

     cat <<- END_VERSIONS > versions.yml
     "${task.process}":
         Python: \$(python --version | cut -f2)
+        ripgrep: \$(rg --version | head -n1 | cut -d' ' -f2)
     END_VERSIONS
     """

@@ -44,6 +50,7 @@ process FETCH_OMA_GROUP_LOCAL {
     cat <<- END_VERSIONS > versions.yml
     "${task.process}":
         Python: \$(python --version | cut -f2)
+        ripgrep: \$(rg --version | head -n1 | cut -d' ' -f2)
     END_VERSIONS
     """
 }
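
The database search step (`zcat $db | rg $omaid | head -1 | cut -f3- | awk ...`) decompresses the OMA groups file, keeps the first line that mentions the query's OMA ID, drops the first two tab-separated columns and prints the remaining members one per line. A rough Python equivalent is sketched below. It assumes a gzipped, tab-separated file whose members start in the third column (which is what `cut -f3-` implies) and uses a plain substring test where `rg` would apply a regular expression. The function names and the sample IDs are invented for the illustration.

```python
import gzip
import os
import tempfile


def first_matching_group(db_path: str, oma_id: str) -> list[str]:
    """Return columns 3..n of the first line containing oma_id, else []."""
    with gzip.open(db_path, "rt") as handle:
        for line in handle:
            if oma_id in line:  # substring match; rg would treat it as a regex
                return line.rstrip("\n").split("\t")[2:]
    return []


def demo() -> list[str]:
    """Build a tiny made-up groups file and look up one member."""
    content = "1\tFPRINT1\tAAAAA00001\tBBBBB00002\n2\tFPRINT2\tCCCCC00003\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "groups.txt.gz")
        with gzip.open(path, "wt") as handle:
            handle.write(content)
        return first_matching_group(path, "BBBBB00002")


# One member per line, like the awk step that turns tabs into newlines.
print("\n".join(demo()))
```

Returning an empty list for a missing ID mirrors the shell version, where the trailing `|| test -f ...` lets the step succeed as long as the (possibly empty) output file exists.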

modules/local/fetch_panther_group_local.nf

Lines changed: 5 additions & 5 deletions
@@ -2,10 +2,8 @@ process FETCH_PANTHER_GROUP_LOCAL {
     tag "$meta.id"
     label 'process_single'

-    conda "conda-forge::python=3.11.0 conda-forge::biopython=1.83.0 conda-forge::requests=2.31.0"
-    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/mulled-v2-bc54124b36864a4af42a9db48b90a404b5869e7e:5258b8e5ba20587b7cbf3e942e973af5045a1e59-0' :
-        'biocontainers/mulled-v2-bc54124b36864a4af42a9db48b90a404b5869e7e:5258b8e5ba20587b7cbf3e942e973af5045a1e59-0' }"
+    conda "conda-forge::python=3.12.3 conda-forge::ripgrep=14.1.0"
+    container "community.wave.seqera.io/library/python_ripgrep:324b372792aae9ce"

     input:
     tuple val(meta), path(uniprot_id), path(taxid), path(exact)
@@ -22,12 +20,13 @@ process FETCH_PANTHER_GROUP_LOCAL {
     prefix = task.ext.prefix ?: meta.id
     """
     id=\$(cat ${uniprot_id})
-    grep \$id $panther_db | tr '|' ' ' | tr '\\t' ' ' | cut -d' ' -f3,6 | awk -v id="\$id" -F'UniProtKB=' '{ for(i=0;i<=NF;i++) { if(\$i !~ id) s=s ? s OFS \$i : \$i } print s; s="" }' > ${prefix}_panther_group_raw.txt || test -f ${prefix}_panther_group_raw.txt
+    rg \$id $panther_db | tr '|' ' ' | tr '\\t' ' ' | cut -d' ' -f3,6 | awk -v id="\$id" -F'UniProtKB=' '{ for(i=0;i<=NF;i++) { if(\$i !~ id) s=s ? s OFS \$i : \$i } print s; s="" }' > ${prefix}_panther_group_raw.txt || test -f ${prefix}_panther_group_raw.txt
     csv_adorn.py ${prefix}_panther_group_raw.txt PANTHER > ${prefix}_panther_group.csv

     cat <<- END_VERSIONS > versions.yml
     "${task.process}":
         Python: \$(python --version | cut -f2)
+        ripgrep: \$(rg --version | head -n1 | cut -d' ' -f2)
     END_VERSIONS
     """

@@ -39,6 +38,7 @@ process FETCH_PANTHER_GROUP_LOCAL {
     cat <<- END_VERSIONS > versions.yml
     "${task.process}":
         Python: \$(python --version | cut -f2)
+        ripgrep: \$(rg --version | head -n1 | cut -d' ' -f2)
     END_VERSIONS
     """
 }

modules/nf-core/csvtk/join/tests/main.nf.test

Lines changed: 0 additions & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

subworkflows/local/get_orthologs.nf

Lines changed: 1 addition & 2 deletions
@@ -50,8 +50,6 @@ workflow GET_ORTHOLOGS {
         .map { it -> [it[0], file(it[1])] }
         .set { ch_fasta }

-    ch_fasta.view()
-
     IDENTIFY_SEQ_ONLINE (
         ch_fasta
     )
@@ -135,6 +133,7 @@ workflow GET_ORTHOLOGS {

     ch_versions = ch_versions.mix(FETCH_INSPECTOR_GROUP_ONLINE.out.versions)

+    // EggNOG
     FETCH_EGGNOG_GROUP_LOCAL (
         ch_query,
         ch_eggnog,
