This repository was archived by the owner on Oct 18, 2024. It is now read-only.

Generated summary file missing required column #139

Open
skchronicles opened this issue Feb 9, 2023 · 0 comments

Comments

@skchronicles

Hello @a-slide,

I hope you are having a great day! I was testing out pycoQC and ran into an issue after generating a summary file with Fast5_to_seq_summary.

Describe the bug
The summary file produced by Fast5_to_seq_summary was passed to pycoQC, which failed with the following error:

Traceback (most recent call last):
  File "/usr/local/bin/pycoQC", line 8, in <module>
    sys.exit(main_pycoQC())
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/__main__.py", line 115, in main_pycoQC
    pycoQC (
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/pycoQC.py", line 120, in pycoQC
    parser = pycoQC_parse (
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/pycoQC_parse.py", line 96, in __init__
    summary_reads_df = self._parse_summary()
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/pycoQC_parse.py", line 136, in _parse_summary
    df = self._select_df_columns (
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/pycoQC_parse.py", line 397, in _select_df_columns
    raise pycoQCError("Column {} not found in the provided sequence_summary file".format(col))
pycoQC.common.pycoQCError: Column read_len not found in the provided sequence_summary file

To Reproduce
Steps to reproduce the behavior:

  1. Fast5_to_seq_summary command to generate the summary file:
$ Fast5_to_seq_summary --threads 8 -f sample/fast5 -s summary.tsv --verbose 2

Here are the first few lines of the output summary.tsv file:

read_id	run_id	channel	start_time
000a1b52-fad6-4d6f-b113-c4b24013fcf9	8d6deda632c3a7303f91016b7707e7310e0bc054	256	42618
0026ba30-0061-401d-8dc1-3cb556d71cb9	8d6deda632c3a7303f91016b7707e7310e0bc054	133	29349
000d264a-1a98-4a55-beb5-9f02dd42fce2	8d6deda632c3a7303f91016b7707e7310e0bc054	170	42809
001ddc14-ccb8-42c3-9fd3-74db3c431a75	8d6deda632c3a7303f91016b7707e7310e0bc054	110	42649
0048af85-5c18-4745-b51e-2fab957aceab	8d6deda632c3a7303f91016b7707e7310e0bc054	61	42292
00519880-3d53-4ee3-8528-7a388ad69b24	8d6deda632c3a7303f91016b7707e7310e0bc054	198	42873

As you can see, the file has no column containing sequence/read length information.

  2. pycoQC command to generate the report:
$ pycoQC -f summary.tsv -o test.html -j test.json --verbose
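Before handing a summary file to pycoQC, its header can be inspected directly. Below is a minimal standard-library sketch that reads the header of the summary.tsv excerpt from step 1 and looks for a length-like column. `summary_columns` is a hypothetical helper, not part of pycoQC, and the `"length"` substring test is an assumed heuristic, not the rule pycoQC applies.

```python
import csv
import io
from typing import List, TextIO


def summary_columns(handle: TextIO) -> List[str]:
    """Return the column names from the header line of a tab-separated summary file."""
    reader = csv.reader(handle, delimiter="\t")
    return next(reader, [])  # empty list if the file has no lines at all


# Header plus the first data row of the summary.tsv excerpt above.
EXCERPT = (
    "read_id\trun_id\tchannel\tstart_time\n"
    "000a1b52-fad6-4d6f-b113-c4b24013fcf9\t"
    "8d6deda632c3a7303f91016b7707e7310e0bc054\t256\t42618\n"
)

columns = summary_columns(io.StringIO(EXCERPT))
# Heuristic only (assumption): any column whose name mentions "length".
length_columns = [name for name in columns if "length" in name]
print(columns)         # ['read_id', 'run_id', 'channel', 'start_time']
print(length_columns)  # []
```

For the real file, open it with `open("summary.tsv", newline="")` and pass the handle to `summary_columns`.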

Expected behavior

I expected the summary file generated by Fast5_to_seq_summary to be compatible with pycoQC. I also tried re-running Fast5_to_seq_summary with the following --fields option (to include every available field):

--fields barcode_arrangement barcode_full_arrangement barcode_score calibration_strand_end calibration_strand_genome_template calibration_strand_identity calibration_strand_start called_events channel channel_digitisation channel_offset channel_range channel_sampling_rate device_id duration flow_cell_id mean_qscore_template protocol_run_id read_id read_number run_id sample_id sequence_length_template skip_prob start_mux start_time stay_prob step_prob strand_score

However, that did not help; I still get the same error message.

In your parser, I can see that you look for these columns, rename them, and then check whether they exist:

[Screenshot of the column-renaming code in the pycoQC parser]
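The rename-then-check behaviour can be sketched roughly as follows. This is a hypothetical reconstruction, not pycoQC's code: `select_columns`, `COLUMN_ALIASES`, and `SummaryColumnError` are illustrative names, and the candidate column names for `read_len` are assumptions based on the field names mentioned in this issue. Only the error message format is copied from the traceback.

```python
from typing import Dict, List, Sequence


class SummaryColumnError(Exception):
    """Hypothetical stand-in for pycoQCError in this sketch."""


# Hypothetical mapping: internal name -> candidate column names in the file.
# Only "read_len" appears in the traceback; the candidate lists are assumptions.
COLUMN_ALIASES: Dict[str, List[str]] = {
    "read_id": ["read_id"],
    "run_id": ["run_id"],
    "channel": ["channel"],
    "start_time": ["start_time"],
    "read_len": ["sequence_length_template", "sequence_length_2d", "sequence_length"],
}


def select_columns(header: Sequence[str], required: Sequence[str]) -> Dict[str, str]:
    """Map each required internal name to the first matching file column, or raise."""
    selected: Dict[str, str] = {}
    for internal in required:
        for candidate in COLUMN_ALIASES.get(internal, [internal]):
            if candidate in header:
                # Record which file column would be renamed to `internal`.
                selected[internal] = candidate
                break
        else:  # no candidate matched
            raise SummaryColumnError(
                "Column {} not found in the provided sequence_summary file".format(internal)
            )
    return selected


header = ["read_id", "run_id", "channel", "start_time"]  # columns from summary.tsv above
try:
    select_columns(header, ["read_id", "run_id", "channel", "start_time", "read_len"])
except SummaryColumnError as err:
    print(err)  # Column read_len not found in the provided sequence_summary file
```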

However, if I pass sequence_length_2d or sequence_length to the --fields option of Fast5_to_seq_summary, it errors out:

Check input data and options
Traceback (most recent call last):
  File "/usr/local/bin/Fast5_to_seq_summary", line 8, in <module>
    sys.exit(main_Fast5_to_seq_summary())
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/__main__.py", line 168, in main_Fast5_to_seq_summary
    Fast5_to_seq_summary (
  File "/usr/local/lib/python3.10/dist-packages/pycoQC/Fast5_to_seq_summary.py", line 119, in __init__
    raise pycoQCError ("Field {} is not valid, please choose among the following valid fields: {}".format(field, ",".join(self.attrs_grp_dict.keys())))
pycoQC.common.pycoQCError: Field sequence_length_2d is not valid, please choose among the following valid fields: mean_qscore_template,sequence_length_template,called_events,skip_prob,stay_prob,step_prob,strand_score,read_id,start_time,duration,start_mux,read_number,channel,channel_digitisation,channel_offset,channel_range,channel_sampling_rate,run_id,sample_id,device_id,protocol_run_id,flow_cell_id,calibration_strand_genome_template,calibration_strand_end,calibration_strand_start,calibration_strand_identity,barcode_arrangement,barcode_full_arrangement,barcode_score
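This second error looks like a whitelist check on the requested fields. A minimal sketch, assuming the tool simply compares each requested name against the valid-field list printed in the error (`validate_fields` and `FieldError` are hypothetical names), also shows which of the length column names mentioned above are accepted on the Fast5_to_seq_summary side. The parser-side name list is an assumption.

```python
from typing import Iterable, List, Sequence, Tuple


class FieldError(Exception):
    """Hypothetical stand-in for pycoQCError in this sketch."""


# Valid field names, copied in order from the error message above.
VALID_FIELDS: Tuple[str, ...] = tuple(
    (
        "mean_qscore_template,sequence_length_template,called_events,skip_prob,"
        "stay_prob,step_prob,strand_score,read_id,start_time,duration,start_mux,"
        "read_number,channel,channel_digitisation,channel_offset,channel_range,"
        "channel_sampling_rate,run_id,sample_id,device_id,protocol_run_id,"
        "flow_cell_id,calibration_strand_genome_template,calibration_strand_end,"
        "calibration_strand_start,calibration_strand_identity,barcode_arrangement,"
        "barcode_full_arrangement,barcode_score"
    ).split(",")
)


def validate_fields(fields: Iterable[str], valid: Sequence[str] = VALID_FIELDS) -> List[str]:
    """Return the requested fields unchanged, or raise on the first one not in `valid`."""
    requested = list(fields)
    for field in requested:
        if field not in valid:
            raise FieldError(
                "Field {} is not valid, please choose among the following "
                "valid fields: {}".format(field, ",".join(valid))
            )
    return requested


# Assumed read-length column names on the pycoQC parser side.
PARSER_LENGTH_NAMES = ("sequence_length_template", "sequence_length_2d", "sequence_length")
shared = [name for name in PARSER_LENGTH_NAMES if name in VALID_FIELDS]
print(shared)  # ['sequence_length_template']

try:
    validate_fields(["read_id", "sequence_length_2d"])
except FieldError as err:
    print(str(err).split(",")[0])  # Field sequence_length_2d is not valid
```

Under that assumption, `sequence_length_template` is the only name both sides accept, and it is already included in the `--fields` list above.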

Desktop:

  • OS: Ubuntu 20.04
  • pycoQC Version: v2.5.2, installed from PyPI

If you need anything else, please let me know.

Best Regards,
@skchronicles

@skchronicles changed the title from "Generate summaries file missing required column" to "Generated summary file missing required column" on Feb 9, 2023
skchronicles added a commit to OpenOmics/modr that referenced this issue Feb 9, 2023