
RSeQC Infer Experiment fails with --bam-csi-index enabled #1517


Open
milos7250 opened this issue Mar 18, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@milos7250

milos7250 commented Mar 18, 2025

Description of the bug

Whilst looking at the results of a pipeline run, I noticed that the MultiQC report is missing strandedness information. In addition, all of my samples fail the strandedness check at the end of the pipeline. I investigated the error files for RSeQC infer_experiment and found an error message saying that it could not load the index file for the sample.

INFO nextflow.Nextflow - -[nf-core/rnaseq] Please check MultiQC report: 4/4 samples failed strandedness check.

I am wondering whether this is related to the --bam-csi-index parameter, which I enabled because I am working with quite a large transcriptome.

The Nextflow logs do not show any other errors: the BAM_RSEQC jobs finish with exit code 0, and the whole pipeline completes without further errors (apart from the strandedness one). The infer_experiment.txt files contain only "Unknown Data type" and nothing else.

I also noticed that the index file is not staged into the work directory of BAM_RSEQC.

I am unsure what other information I should provide; please let me know and I will add it.

Command used and terminal output

$ nextflow run nf-core/rnaseq -r 3.18.0 -profile apptainer -params-file params.json -with-dag -resume


[E::idx_find_and_load] Could not retrieve index file for 'Rep1.markdup.sorted.bam'
Reading reference gene model PanBaRT20.bed ... Done
Loading SAM/BAM file ...  Finished
Total 0 usable reads were sampled

Relevant files

execution_trace_2025-03-18_13-26-52.txt
nf_core_rnaseq_software_mqc_versions.yml.txt
params1.json (names of index/FASTA files removed)

System information

N E X T F L O W ~ version 24.10.5
Ran on a Linux HPC
Slurm executor
apptainer container engine
nf-core/rnaseq -r 3.18.0

milos7250 added the bug (Something isn't working) label on Mar 18, 2025
@milos7250 (Author)

I think I have traced this issue further. The GTF file I was using did not work well with the GTF2BED process included in the PREPARE_GENOME subworkflow. The step failed silently and the BED file ended up empty, but the pipeline was still able to proceed. This caused RSeQC infer_experiment to fail, and it also caused the junction analysis step to report 100% novel splice junctions (which was suspicious, so I traced the issue back to the empty BED file).

I am unsure what the issue with my GTF file was, but I converted it to GFF, supplied that in the parameters instead, and that solved both the infer_experiment and the splice junction analysis issues.

Perhaps a check should be added to make sure the generated BED file is not empty, but that seems like a contribution to nf-core/modules rather than here. Correct me if I'm wrong.
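A minimal sketch of such an emptiness check, written in Python for illustration. The helpers `count_bed_records` and `check_bed_nonempty` are hypothetical and are not part of nf-core/rnaseq or nf-core/modules; the sketch only assumes that BED data lines are the non-blank lines that are not comments or "track"/"browser" header lines, and it does not validate the column count that RSeQC expects.

```python
import sys


def count_bed_records(path):
    """Count data lines in a BED file, skipping blank, comment and header lines."""
    count = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            # Assumption: '#', 'track' and 'browser' lines are headers, not records.
            if not stripped or stripped.startswith(("#", "track", "browser")):
                continue
            count += 1
    return count


def check_bed_nonempty(path):
    """Hypothetical guard: raise ValueError if the BED file has no records."""
    n = count_bed_records(path)
    if n == 0:
        raise ValueError(
            f"{path} contains no BED records; check the GTF passed to GTF2BED"
        )
    return n


if __name__ == "__main__":
    # Usage: python check_bed.py genes.bed [more.bed ...]
    for bed in sys.argv[1:]:
        try:
            print(f"{bed}: {check_bed_nonempty(bed)} records")
        except ValueError as err:
            sys.exit(f"ERROR: {err}")
```

Run against the BED file produced by GTF2BED, a guard like this would stop with an error on an empty file such as the one described here, rather than letting RSeQC continue and silently report "Unknown Data type".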
