Skip to content

feat: Add wrapper for Tandem Repeat Finder (TRF) utility #4160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 51 commits into
base: master
Choose a base branch
from

Conversation

rohan-ibn-tariq
Copy link

@rohan-ibn-tariq rohan-ibn-tariq commented Jun 2, 2025

This PR provides support of executing TRF command line utility (https://prefix.dev/channels/bioconda/packages/trf) through Snakemake using Snakemake wrappers. On top of basic TRF feature(s) this wrapper additionally makes it easy for users to write Snakemake rules by providing following things:

  • In case user doesn't provide any parameters it automatically sets to recommended defaults by the TRF author as on the website(https://tandem.bu.edu/trf/help) and allows users to right a bare minimum Snakemake rule.
  • Raises errors or warning in case the parameter value or flag with options value is set to that not allowed by the author.
  • It filters or recognizes some mistakes and raise errors or warning based on that:
    • Error in case a known flag is mistyped like -m tp *m with feedback.
    • Warning and ignoring unknown flags if tried to entered like -) -Y etc.
    • Supporting multiple options for option value flag, -l20, -l 20 and -l=20.
  • In case user provides partial parameters list, it use recommended defaults with updates to those parameters which user provided specifically.
  • Handles missing log files gracefully.
  • Support to print the directory contents on Screen once output has been ready.
  • Has logging configured for debug, info mode etc.
  • Allows fasta format only as per snakemake wrapper utils: https://github.com/snakemake/snakemake-wrapper-utils/blob/master/snakemake_wrapper_utils/snakemake.py
  • Have basic integration tests in place with various rule scenarios.
  • Updated meta for related documentation.

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile -> yes, but in test-wrappers.py
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to -> Not applicable in my case.
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

  • New Features

    • Added a Snakemake wrapper for the Tandem Repeat Finder (TRF) tool, enabling automated detection of tandem repeats in DNA sequences.
    • Provided environment configuration files for reproducible setup on Linux platforms.
    • Included metadata and documentation for wrapper usage, parameters, and outputs.
  • Tests

    • Introduced comprehensive test workflows and sample data to validate the TRF wrapper under various scenarios, including parameter variations and error handling.

rohan-ibn-tariq added 30 commits May 28, 2025 08:35
…e and logic, unit test sample, stdout stderr to log file directive
Copy link
Contributor

coderabbitai bot commented Jun 2, 2025

📝 Walkthrough

Walkthrough

This change introduces a new Snakemake wrapper for the Tandem Repeat Finder (TRF) tool, including environment specifications, metadata, wrapper implementation, test workflows, sample data, and automated test cases. The additions enable reproducible TRF runs within Snakemake, with parameter validation, logging, and comprehensive testing of wrapper behavior.

Changes

File(s) Change Summary
bio/trf/environment.linux-64.pin.txt,
bio/trf/environment.yaml
Added Conda environment specification and YAML for TRF tool installation and reproducibility.
bio/trf/meta.yaml Added metadata file defining the Snakemake wrapper interface, parameters, and usage notes for TRF.
bio/trf/wrapper.py Added new Snakemake wrapper script for TRF with parameter parsing, validation, command construction, and logging.
bio/trf/test/Snakefile Added sample Snakemake rule to demonstrate TRF wrapper usage with recommended parameters and flags.
bio/trf/test/demo_data/small_test.fasta Added small FASTA file with three test DNA sequences for TRF validation.
bio/trf/test/tests.smk Added Snakemake test workflow with three rules to validate parameter handling and flag parsing in the wrapper.
test_wrappers.py Added four new test functions for the TRF wrapper and improved formatting in existing tests.

Sequence Diagram(s)

sequenceDiagram
    participant Snakemake
    participant Wrapper (wrapper.py)
    participant TRF Tool

    Snakemake->>Wrapper (wrapper.py): Provide input, output, params, log
    Wrapper (wrapper.py)->>Wrapper (wrapper.py): Validate/parse params and flags
    Wrapper (wrapper.py)->>Wrapper (wrapper.py): Prepare command and output directory
    Wrapper (wrapper.py)->>TRF Tool: Execute TRF with constructed command
    TRF Tool-->>Wrapper (wrapper.py): Generate output files
    Wrapper (wrapper.py)-->>Snakemake: Completion and log handling
Loading

Suggested reviewers

  • johanneskoester

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8527cb3 and 38e5270.

📒 Files selected for processing (2)
  • bio/trf/test/Snakefile (1 hunks)
  • test_wrappers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • bio/trf/test/Snakefile
  • test_wrappers.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (12)
bio/trf/utils/__init__.py (1)

1-4: Add a final newline to satisfy Pylint convention
Static analysis flagged a missing final newline (C0304). Please ensure the file ends with a newline to adhere to style guidelines.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 4-4: Final newline missing

(C0304)

bio/trf/environment.yaml (1)

1-8: Standardize version specifiers in environment YAML
In Conda environment files, version pins are typically written without spaces around the equals sign (e.g., trf=4.09 rather than trf =4.09). Consider updating both entries for consistency and clarity.

bio/trf/test/Snakefile (1)

9-16: Potential output directory conflicts between rules.

All three rules use the same output directory pattern "trf_output/{sample}", which could cause conflicts if multiple rules target the same sample or if rules are run in parallel. Consider using rule-specific prefixes.

 rule run_trf_basic_with_defaults_nolog:
     input:
         sample="demo_data/{sample}.fasta",
     output:
-        directory("trf_output/{sample}"),
+        directory("trf_output/basic_defaults/{sample}"),
     wrapper:
         "master/bio/trf"
bio/trf/utils/logger_utils/logger_config.py (1)

16-21: Consider improving type annotation for log_file_path parameter.

The log_file_path parameter should be optional since file logging can be disabled, but it's currently typed as required str.

 def setup_logger(
     name: str,
-    log_file_path: str,
+    log_file_path: str | None,
     log_level: int = logging.INFO,
     enable_file: bool = True,
 ) -> logging.Logger:
bio/trf/utils/filteration/split_params_utils.py (1)

37-137: Consider refactoring this function to reduce complexity.

The split_trf_params function has high cyclomatic complexity with 20 local variables, 23 branches, and 62 statements. While the logic is correct, breaking it down into smaller helper functions would improve maintainability and testability.

Consider extracting these logical sections into separate helper functions:

  1. Parameter validation and normalization (lines 57-69)
  2. Malformed option detection (lines 78-83)
  3. Non-flag token handling (lines 84-93)
  4. Flag with value parsing (lines 95-109)
  5. Option with value parsing (lines 111-121)
  6. Flag processing (lines 123-135)

Example refactoring approach:

+def _process_numeric_params(params_lower, logger):
+    """Extract and validate numeric parameters."""
+    trf_numeric_params = {}
+    for key, value in params_lower.items():
+        if key in TRF_NUMERIC_KEYS:
+            trf_numeric_params[key] = value
+        elif key != "extra":
+            logger.warning(TRF_MESSAGES["UNKNOWN_NUMERIC_PARAM"](key, value))
+    return trf_numeric_params
+
+def _process_token(token, i, tokens, trf_flags, trf_options_with_values, logger):
+    """Process a single token and return the new index."""
+    # Token processing logic here
+    # Return (new_index, updated_flags, updated_options)
+    pass
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 37-37: Too many local variables (20/15)

(R0914)


[refactor] 37-37: Too many branches (23/12)

(R0912)


[refactor] 37-37: Too many statements (62/50)

(R0915)

bio/trf/test/tests.smk (1)

62-62: Fix typo in rule name.

The rule name has a double underscore that appears to be unintentional.

-rule run_trf_permissible__optional_flag_with_mistake:
+rule run_trf_permissible_optional_flag_with_mistake:
bio/trf/utils/filteration/constants.py (1)

15-15: Fix typo in comment.

-in other TRF parameter parsing utiliti(y/es).
+in other TRF parameter parsing utilities.
bio/trf/wrapper.py (3)

1-8: Fix typo in module docstring.

There's a typo in the docstring: "recommened" should be "recommended".

-    to recommened parameter values and flags for an ease of 
+    to recommended parameter values and flags for an ease of 

21-21: Consider making log file path configurable.

The hardcoded log file path "logs/trf_run_by_logger.log" might cause issues in different execution contexts or when multiple instances run concurrently.

Consider making this path relative to the output directory or allowing configuration:

-logger = setup_logger("trf_logger", "logs/trf_run_by_logger.log", logging.INFO)
+log_dir = Path(snakemake.output[0]).resolve() / "logs"
+logger = setup_logger("trf_logger", str(log_dir / "trf_run_by_logger.log"), logging.INFO)

63-63: Document the exit code 3 handling.

The shell command allows exit code 3 to be ignored. This should be documented to explain why this specific exit code is acceptable.

Add a comment explaining why exit code 3 is allowed:

+# TRF returns exit code 3 in some normal completion scenarios
 shell(f"( {cmd} || [ $? -eq 3 ] ) {log_redirect}")
bio/trf/meta.yaml (1)

26-26: Fix YAML formatting issues.

The file has several formatting issues that should be corrected for proper YAML standards:

  • Trailing spaces on multiple lines
  • Missing newline at end of file
-  Flags are default to `-m -f -d`, in case the user doesn't specify a flag. Flag(s) are specified using the 'extra' param (e.g., '-d -h'). 
+  Flags are default to `-m -f -d`, in case the user doesn't specify a flag. Flag(s) are specified using the 'extra' param (e.g., '-d -h').
-  If you choose `logs` directory to log in, kindly don't use this file name: **trf_run_by_logger.log**. This is reserved for logger input information. 
+  If you choose `logs` directory to log in, kindly don't use this file name: **trf_run_by_logger.log**. This is reserved for logger input information.
-  The wrapper has been made as flexible as possible but there is room for adding more flexibility like supporting advance typing catches `-l    =   20` etc., but this has been not catered 
+  The wrapper has been made as flexible as possible but there is room for adding more flexibility like supporting advance typing catches `-l    =   20` etc., but this has been not catered
-  Note: As this is a wrapper for TRF utility, it comes with it's limitations or defects if any. 
+  Note: As this is a wrapper for TRF utility, it comes with it's limitations or defects if any.
+

Also applies to: 32-32, 42-42, 46-46

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 26-26: trailing spaces

(trailing-spaces)

test_wrappers.py (1)

6980-6988: Consider breaking up long lines for better readability.

While the current format is functional, the very long file path comparisons in the compare_results_with_expected dictionary could be formatted with line breaks for better maintainability:

        compare_results_with_expected={
-            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.dat": "expected/small_test.fasta.2.7.7.80.10.50.500.dat",
+            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.dat": (
+                "expected/small_test.fasta.2.7.7.80.10.50.500.dat"
+            ),
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 6980-6980: Line too long (128/100)

(C0301)


[convention] 6981-6981: Line too long (130/100)

(C0301)


[convention] 6982-6982: Line too long (146/100)

(C0301)


[convention] 6983-6983: Line too long (140/100)

(C0301)


[convention] 6984-6984: Line too long (148/100)

(C0301)


[convention] 6985-6985: Line too long (140/100)

(C0301)


[convention] 6986-6986: Line too long (148/100)

(C0301)


[convention] 6987-6987: Line too long (140/100)

(C0301)


[convention] 6988-6988: Line too long (148/100)

(C0301)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1362ea7 and 923c50d.

⛔ Files ignored due to path filters (1)
  • bio/trf/test/expected/small_test.fasta.2.7.7.80.10.50.500.dat is excluded by !**/*.dat
📒 Files selected for processing (30)
  • bio/trf/config/__init__.py (1 hunks)
  • bio/trf/config/constants.py (1 hunks)
  • bio/trf/config/trf_config.py (1 hunks)
  • bio/trf/environment.linux-64.pin.txt (1 hunks)
  • bio/trf/environment.yaml (1 hunks)
  • bio/trf/meta.yaml (1 hunks)
  • bio/trf/test/Snakefile (1 hunks)
  • bio/trf/test/demo_data/small_test.fasta (1 hunks)
  • bio/trf/test/expected/small_test.fasta.2.7.7.80.10.50.500.mask (1 hunks)
  • bio/trf/test/expected/small_test.fasta.2.7.7.80.10.50.500.summary.html (1 hunks)
  • bio/trf/test/expected/small_test.fasta.s1.2.7.7.80.10.50.500.1.html (1 hunks)
  • bio/trf/test/expected/small_test.fasta.s1.2.7.7.80.10.50.500.1.txt.html (1 hunks)
  • bio/trf/test/expected/small_test.fasta.s2.2.7.7.80.10.50.500.1.html (1 hunks)
  • bio/trf/test/expected/small_test.fasta.s2.2.7.7.80.10.50.500.1.txt.html (1 hunks)
  • bio/trf/test/expected/small_test.fasta.s3.2.7.7.80.10.50.500.1.html (1 hunks)
  • bio/trf/test/expected/small_test.fasta.s3.2.7.7.80.10.50.500.1.txt.html (1 hunks)
  • bio/trf/test/tests.smk (1 hunks)
  • bio/trf/utils/__init__.py (1 hunks)
  • bio/trf/utils/filteration/__init__.py (1 hunks)
  • bio/trf/utils/filteration/constants.py (1 hunks)
  • bio/trf/utils/filteration/split_params_utils.py (1 hunks)
  • bio/trf/utils/logger_utils/__init__.py (1 hunks)
  • bio/trf/utils/logger_utils/constants.py (1 hunks)
  • bio/trf/utils/logger_utils/log_utils.py (1 hunks)
  • bio/trf/utils/logger_utils/logger_config.py (1 hunks)
  • bio/trf/utils/validation/__init__.py (1 hunks)
  • bio/trf/utils/validation/constants.py (1 hunks)
  • bio/trf/utils/validation/validation_utils.py (1 hunks)
  • bio/trf/wrapper.py (1 hunks)
  • test_wrappers.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/trf/utils/__init__.py
  • bio/trf/utils/logger_utils/constants.py
  • bio/trf/utils/validation/__init__.py
  • bio/trf/config/constants.py
  • bio/trf/utils/logger_utils/log_utils.py
  • bio/trf/utils/validation/constants.py
  • bio/trf/utils/filteration/__init__.py
  • bio/trf/utils/filteration/constants.py
  • bio/trf/utils/logger_utils/logger_config.py
  • bio/trf/utils/logger_utils/__init__.py
  • bio/trf/wrapper.py
  • test_wrappers.py
  • bio/trf/config/__init__.py
  • bio/trf/utils/filteration/split_params_utils.py
  • bio/trf/utils/validation/validation_utils.py
  • bio/trf/config/trf_config.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/trf/wrapper.py
🧬 Code Graph Analysis (3)
bio/trf/utils/validation/__init__.py (1)
bio/trf/utils/validation/validation_utils.py (2)
  • validate_trf_numeric_params (24-69)
  • validate_trf_option_values (72-113)
bio/trf/utils/logger_utils/__init__.py (2)
bio/trf/utils/logger_utils/log_utils.py (1)
  • log_output_dir_contents (12-42)
bio/trf/utils/logger_utils/logger_config.py (1)
  • setup_logger (16-88)
bio/trf/wrapper.py (5)
bio/trf/config/trf_config.py (5)
  • TRFConfig (29-139)
  • update_params (62-76)
  • update_flags_bool (78-94)
  • update_flags_with_value (96-112)
  • build_command (114-139)
bio/trf/utils/filteration/split_params_utils.py (1)
  • split_trf_params (37-137)
bio/trf/utils/logger_utils/log_utils.py (1)
  • log_output_dir_contents (12-42)
bio/trf/utils/logger_utils/logger_config.py (1)
  • setup_logger (16-88)
bio/trf/utils/validation/validation_utils.py (2)
  • validate_trf_numeric_params (24-69)
  • validate_trf_option_values (72-113)
🪛 Pylint (3.3.7)
bio/trf/utils/__init__.py

[convention] 4-4: Final newline missing

(C0304)

bio/trf/wrapper.py

[error] 15-15: Unable to import 'snakemake.shell'

(E0401)


[error] 16-16: Unable to import 'snakemake_wrapper_utils.snakemake'

(E0401)


[error] 23-23: Undefined variable 'snakemake'

(E0602)


[error] 24-24: Undefined variable 'snakemake'

(E0602)


[convention] 26-26: Constant name "log_redirect" doesn't conform to UPPER_CASE naming style

(C0103)


[error] 27-27: Undefined variable 'snakemake'

(E0602)


[error] 27-27: Undefined variable 'snakemake'

(E0602)


[error] 28-28: Undefined variable 'snakemake'

(E0602)


[error] 31-31: Undefined variable 'snakemake'

(E0602)


[error] 32-32: Undefined variable 'snakemake'

(E0602)


[error] 50-50: Undefined variable 'snakemake'

(E0602)

test_wrappers.py

[convention] 2067-2067: Line too long (162/100)

(C0301)


[convention] 2118-2118: Line too long (130/100)

(C0301)


[convention] 6980-6980: Line too long (128/100)

(C0301)


[convention] 6981-6981: Line too long (130/100)

(C0301)


[convention] 6982-6982: Line too long (146/100)

(C0301)


[convention] 6983-6983: Line too long (140/100)

(C0301)


[convention] 6984-6984: Line too long (148/100)

(C0301)


[convention] 6985-6985: Line too long (140/100)

(C0301)


[convention] 6986-6986: Line too long (148/100)

(C0301)


[convention] 6987-6987: Line too long (140/100)

(C0301)


[convention] 6988-6988: Line too long (148/100)

(C0301)


[convention] 7069-7069: Line too long (128/100)

(C0301)


[convention] 6967-6967: Missing function or method docstring

(C0116)


[warning] 6967-6967: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 6993-6993: Missing function or method docstring

(C0116)


[warning] 6993-6993: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7008-7008: Missing function or method docstring

(C0116)


[warning] 7008-7008: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7024-7024: Missing function or method docstring

(C0116)


[warning] 7024-7024: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7040-7040: Missing function or method docstring

(C0116)


[warning] 7040-7040: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7056-7056: Missing function or method docstring

(C0116)


[warning] 7056-7056: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7074-7074: Missing function or method docstring

(C0116)


[warning] 7074-7074: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7090-7090: Missing function or method docstring

(C0116)


[warning] 7090-7090: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7106-7106: Missing function or method docstring

(C0116)


[warning] 7106-7106: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7121-7121: Missing function or method docstring

(C0116)


[warning] 7121-7121: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7136-7136: Missing function or method docstring

(C0116)


[warning] 7136-7136: Redefining name 'run' from outer scope (line 41)

(W0621)

bio/trf/utils/filteration/split_params_utils.py

[refactor] 37-37: Too many local variables (20/15)

(R0914)


[refactor] 37-37: Too many branches (23/12)

(R0912)


[refactor] 37-37: Too many statements (62/50)

(R0915)

bio/trf/utils/validation/validation_utils.py

[warning] 51-51: Consider explicitly re-raising using 'except ValueError as exc' and 'raise ValueError(f"TRF parameter '{key}' must be an integer, got: {value}.") from exc'

(W0707)


[warning] 99-99: Consider explicitly re-raising using 'except ValueError as exc' and 'raise ValueError(f'Value for {flag} must be an integer, got: {value}') from exc'

(W0707)

🪛 YAMLlint (1.37.1)
bio/trf/meta.yaml

[error] 26-26: trailing spaces

(trailing-spaces)


[error] 32-32: trailing spaces

(trailing-spaces)


[error] 42-42: trailing spaces

(trailing-spaces)


[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)


[error] 46-46: trailing spaces

(trailing-spaces)

🪛 Ruff (0.11.9)
bio/trf/wrapper.py

23-23: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)

bio/trf/utils/validation/validation_utils.py

51-51: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


99-99: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
🔇 Additional comments (41)
bio/trf/environment.linux-64.pin.txt (1)

1-31: Auto-generated conda pin file: no review required.
This file is generated by Conda to lock exact package builds and hashes for reproducibility.

bio/trf/test/expected/small_test.fasta.2.7.7.80.10.50.500.mask (1)

1-31: Non-code file (FASTA output): no review needed.
This is an expected-mask FASTA used for test verification.

bio/trf/test/demo_data/small_test.fasta (1)

1-24: Non-code file (FASTA input): no review needed.
This is test input data for the TRF wrapper.

bio/trf/test/expected/small_test.fasta.2.7.7.80.10.50.500.summary.html (1)

1-29: Skip review for static test artifact.
This HTML summary is a golden output used for validating the TRF Snakemake wrapper and contains no executable logic.

bio/trf/test/expected/small_test.fasta.s1.2.7.7.80.10.50.500.1.txt.html (1)

1-68: Skip review for static test artifact.
This HTML report is part of the expected outputs for TRF wrapper tests and doesn’t require code-level review.

bio/trf/test/expected/small_test.fasta.s3.2.7.7.80.10.50.500.1.txt.html (1)

1-188: Skip review for static test artifact.
A large pre-generated HTML output serving as a golden file for test validation; no logic changes to assess.

bio/trf/test/expected/small_test.fasta.s3.2.7.7.80.10.50.500.1.html (1)

1-28: Skip review for static HTML output.
This is a static test artifact generated by TRF and not subject to code review.

bio/trf/utils/logger_utils/__init__.py (1)

16-19: Re-exported logging utilities are consistent.
The imported symbols (DEBUG_FORMATTER, FILE_HANDLER, NORMAL_FORMATTER, log_output_dir_contents, setup_logger) match the entries in __all__, providing a clear public API for logging configuration.

bio/trf/test/expected/small_test.fasta.s1.2.7.7.80.10.50.500.1.html (1)

1-28: LGTM! Well-structured TRF output test file.

This HTML file correctly represents the expected output format from TRF version 4.09. The structure includes proper citation information, sequence parameters, and a well-formatted results table. This test artifact will help ensure the TRF wrapper produces consistent and correct output.

bio/trf/test/expected/small_test.fasta.s2.2.7.7.80.10.50.500.1.html (1)

1-28: LGTM! Comprehensive test coverage with multiple sequences.

This HTML file provides test coverage for a second sequence (length 84) with different tandem repeat analysis results. The consistent TRF output format across multiple test files ensures robust validation of the wrapper functionality.

bio/trf/utils/logger_utils/log_utils.py (1)

12-42: Excellent implementation with robust error handling.

The log_output_dir_contents function is well-designed with several strong points:

  • Proper use of keyword-only arguments for the logger parameter
  • Robust error handling for non-existent directories
  • Cross-platform compatibility using Path objects
  • Informative logging messages including file counts and sizes
  • Appropriate logging levels (warning for errors, info for normal operation)

The implementation follows Python best practices and provides valuable debugging information for TRF wrapper execution.

bio/trf/utils/logger_utils/constants.py (1)

8-21: Well-designed logging configuration constants.

The logging constants provide excellent configurability:

  • DEBUG_FORMATTER includes comprehensive information (level, timestamp, logger name) ideal for development and debugging
  • NORMAL_FORMATTER offers a clean, production-friendly format
  • FILE_HANDLER uses sensible defaults: 5MB file size limit prevents excessive disk usage, 3 backup files provide good history retention, and UTF-8 encoding ensures proper character handling

This centralized configuration approach promotes consistency across the TRF wrapper's logging system.

bio/trf/test/expected/small_test.fasta.s2.2.7.7.80.10.50.500.1.txt.html (1)

1-77: Skip review for static expected output artifact.

This file serves as a static HTML fixture for integration tests and contains no executable code to review.

bio/trf/utils/validation/__init__.py (1)

1-23: Module exports for validation utilities look correct.

The imports from constants and validation_utils match what’s exposed in __all__, and the docstring accurately describes the public API.

bio/trf/config/constants.py (1)

1-37: Default TRF configuration constants correctly defined.

The TRF_DEFAULT_PARAMS, TRF_DEFAULT_FLAGS_WITH_VALUE, and TRF_DEFAULT_FLAGS_BOOL dictionaries align with the TRF author’s recommended defaults (as referenced in the docstring), providing a clear and maintainable base configuration.

bio/trf/utils/validation/constants.py (1)

1-48: Validation constraint dictionaries appear comprehensive and consistent.

TRF_NUMERIC_CONSTRAINTS and TRF_OPTION_VALUE_CONSTRAINTS cover all required parameters and flags with appropriate bounds, allowed sets, and warning thresholds. The docstring clearly documents their purpose and structure.

bio/trf/utils/filteration/__init__.py (1)

1-34: Public interface for parameter splitting correctly defined.

The module cleanly re-exports split_trf_params along with its associated constants (TRF_NUMERIC_KEYS, TRF_VALID_FLAGS, TRF_VALID_OPTIONS_WITH_VALUES, TRF_MESSAGES) as specified in the docstring and __all__. This aligns well with the wrapper’s parameter parsing requirements.

bio/trf/test/Snakefile (2)

20-37: Good example demonstrating full parameter specification.

This rule effectively demonstrates how to use all available TRF parameters with explicit logging configuration.


43-55: Effective demonstration of partial parameter override.

This rule clearly shows how the wrapper handles partial parameter specification while maintaining defaults for unspecified parameters.

bio/trf/utils/logger_utils/logger_config.py (1)

68-71: Good defensive programming for directory creation.

The logic properly checks for directory existence and creates it if needed using exist_ok=True to handle race conditions.

bio/trf/config/__init__.py (1)

1-31: Well-structured module interface with clear documentation.

The module provides a clean public API with proper __all__ definition and comprehensive docstring explaining the exposed components.

bio/trf/config/trf_config.py (3)

56-60: Good use of dataclass with proper default factories.

The implementation correctly uses default_factory to avoid mutable default arguments and properly deep copies the default parameters.


131-139: Command building logic is well-structured.

The method effectively combines all parameter types into a coherent command string with proper formatting and spacing.


127-129: ⚠️ Potential issue

Logic issue: Always overrides user flags with defaults.

The current condition if not self.flags_with_value and not self.flags_bool: uses AND logic, meaning defaults are only applied when BOTH dictionaries are empty. However, if a user provides only boolean flags or only value flags, the other type gets set to defaults, potentially overriding user intent.

Consider this scenario: user provides flags_bool = {"h": True} but no flags_with_value. The current logic will override flags_with_value with defaults, which may not be intended.

-        if not self.flags_with_value and not self.flags_bool:
+        # Only set defaults if no flags have been configured at all
+        if not self.flags_with_value and not self.flags_bool:
             self.flags_with_value = copy.deepcopy(TRF_DEFAULT_FLAGS_WITH_VALUE)
             self.flags_bool = copy.deepcopy(TRF_DEFAULT_FLAGS_BOOL)
+        elif not self.flags_with_value:
+            # User provided bool flags but no value flags - use empty dict for value flags
+            pass
+        elif not self.flags_bool:
+            # User provided value flags but no bool flags - use empty dict for bool flags
+            pass

Likely an incorrect or invalid review comment.

bio/trf/utils/filteration/split_params_utils.py (1)

57-137: Well-implemented parameter parsing logic!

The function correctly handles multiple input formats, case-insensitive matching, and provides clear error messages for malformed inputs. The logging approach is appropriate for warning about ignored parameters while raising exceptions for critical errors.

bio/trf/test/tests.smk (1)

1-113: Excellent test coverage!

The test suite comprehensively covers various scenarios including:

  • Invalid file type handling
  • Invalid parameter values
  • Different flag input styles
  • Error cases with malformed flags
  • Missing log directive behavior

This provides robust validation of the TRF wrapper's error handling and parameter parsing capabilities.

bio/trf/utils/filteration/constants.py (1)

20-56: Well-structured constants and message templates!

The module provides a clean separation of concerns with:

  • Efficient set-based lookups for parameter validation
  • Comprehensive regex patterns for input parsing
  • Consistent error message formatting using lambda templates

This design promotes maintainability and makes it easy to extend with new parameters or flags.

bio/trf/wrapper.py (2)

46-46: Document the necessity of changing working directory.

The os.chdir() call changes the global working directory, which can have side effects. Please ensure this is necessary for TRF execution and consider if there are alternative approaches.

Is changing the working directory necessary for TRF to function correctly? Consider if the command can be executed with explicit paths instead.


48-67: Well-structured parameter processing and execution flow.

The parameter processing, validation, and command execution flow is well-organized and follows good practices:

  • Proper separation of concerns with dedicated validation functions
  • Configuration object pattern for managing parameters
  • Comprehensive logging of the process
🧰 Tools
🪛 Ruff (0.11.9)

50-50: Undefined name snakemake

(F821)

🪛 Pylint (3.3.7)

[error] 50-50: Undefined variable 'snakemake'

(E0602)

bio/trf/meta.yaml (1)

1-46: Comprehensive and well-structured metadata.

The metadata file provides excellent documentation of the TRF wrapper interface, including:

  • Clear parameter descriptions with defaults and constraints
  • Detailed usage notes and examples
  • Appropriate warnings about limitations and reserved file names
  • Good integration guidance for developers
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 26-26: trailing spaces

(trailing-spaces)


[error] 32-32: trailing spaces

(trailing-spaces)


[error] 42-42: trailing spaces

(trailing-spaces)


[error] 46-46: no new line character at the end of file

(new-line-at-end-of-file)


[error] 46-46: trailing spaces

(trailing-spaces)

bio/trf/utils/validation/validation_utils.py (1)

24-70: Well-designed validation functions with comprehensive constraint checking.

Both validation functions demonstrate excellent design:

  • Clear separation of concerns between numeric parameters and option values
  • Comprehensive constraint validation (min, max, allowed values, recommendations)
  • Proper logging integration for warnings about unknown parameters
  • Good error messages that help users understand validation failures
  • Graceful handling of unknown parameters with warnings rather than errors

Also applies to: 72-113

🧰 Tools
🪛 Ruff (0.11.9)

51-51: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 Pylint (3.3.7)

[warning] 51-51: Consider explicitly re-raising using 'except ValueError as exc' and 'raise ValueError(f"TRF parameter '{key}' must be an integer, got: {value}.") from exc'

(W0707)

test_wrappers.py (10)

158-158: LGTM! Good formatting improvement.

The empty line addition improves code readability and follows Python formatting conventions.


1268-1281: LGTM! Good formatting improvements.

The addition of trailing commas in the function call arguments improves consistency and makes future diffs cleaner when adding new arguments.


2067-2069: LGTM! Improved code formatting and readability.

The proper line breaks and indentation for the commented code and dictionary formatting enhance readability.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 2067-2067: Line too long (162/100)

(C0301)


2114-2119: LGTM! Good formatting consistency.

The trailing comma addition and proper line formatting maintain consistency with the rest of the codebase.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 2118-2118: Line too long (130/100)

(C0301)


6962-6991: Excellent comprehensive test for TRF basic functionality.

The test properly validates the basic TRF wrapper functionality with default parameters and includes comprehensive output file validation using compare_results_with_expected. The test covers all expected output file types (.dat, .mask, .html, .txt.html) which ensures the wrapper generates complete results.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 6980-6980: Line too long (128/100)

(C0301)


[convention] 6981-6981: Line too long (130/100)

(C0301)


[convention] 6982-6982: Line too long (146/100)

(C0301)


[convention] 6983-6983: Line too long (140/100)

(C0301)


[convention] 6984-6984: Line too long (148/100)

(C0301)


[convention] 6985-6985: Line too long (140/100)

(C0301)


[convention] 6986-6986: Line too long (148/100)

(C0301)


[convention] 6987-6987: Line too long (140/100)

(C0301)


[convention] 6988-6988: Line too long (148/100)

(C0301)


[convention] 6967-6967: Missing function or method docstring

(C0116)


[warning] 6967-6967: Redefining name 'run' from outer scope (line 41)

(W0621)


6993-7006: Good coverage of custom parameter functionality.

This test ensures the wrapper correctly handles all custom parameters, which is essential for the flexibility mentioned in the PR objectives.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 6993-6993: Missing function or method docstring

(C0116)


[warning] 6993-6993: Redefining name 'run' from outer scope (line 41)

(W0621)


7008-7021: Well-designed test for partial parameter handling.

This test validates the feature mentioned in the PR objectives where partial parameters are combined with recommended defaults, ensuring robust parameter management.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7008-7008: Missing function or method docstring

(C0116)


[warning] 7008-7008: Redefining name 'run' from outer scope (line 41)

(W0621)


7024-7054: Excellent error handling validation.

These tests properly validate that the wrapper correctly handles invalid input files and parameter values by expecting subprocess.CalledProcessError. This aligns with the PR objective of validating parameter values and flagging errors for disallowed options.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7024-7024: Missing function or method docstring

(C0116)


[warning] 7024-7024: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7040-7040: Missing function or method docstring

(C0116)


[warning] 7040-7040: Redefining name 'run' from outer scope (line 41)

(W0621)


7056-7119: Comprehensive flag validation testing.

These tests thoroughly cover the flag validation functionality mentioned in the PR objectives, including:

  • Permissible flags working correctly
  • Error detection for flag mistakes
  • Handling of additional features
  • Processing of impermissible flags

This ensures robust parameter validation as specified in the requirements.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7069-7069: Line too long (128/100)

(C0301)


[convention] 7056-7056: Missing function or method docstring

(C0116)


[warning] 7056-7056: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7074-7074: Missing function or method docstring

(C0116)


[warning] 7074-7074: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7090-7090: Missing function or method docstring

(C0116)


[warning] 7090-7090: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7106-7106: Missing function or method docstring

(C0116)


[warning] 7106-7106: Redefining name 'run' from outer scope (line 41)

(W0621)


7136-7149: Good logging directive validation.

This test ensures the custom logging functionality works correctly, which supports the configurable logging modes mentioned in the PR objectives.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7136-7136: Missing function or method docstring

(C0116)


[warning] 7136-7136: Redefining name 'run' from outer scope (line 41)

(W0621)

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP, see my comments on Discord

@rohan-ibn-tariq rohan-ibn-tariq marked this pull request as draft June 2, 2025 17:19
@rohan-ibn-tariq rohan-ibn-tariq marked this pull request as ready for review June 4, 2025 05:43
@rohan-ibn-tariq rohan-ibn-tariq marked this pull request as draft June 4, 2025 05:45
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (9)
bio/trf/meta.yaml (4)

20-20: Fix typo in parameter description.

There's a spelling error in the extra parameter description.

-  extra: "Optional command-line flags to pass to Tandem Repeats Finder (TRF). |nl| These flags are appended after the 7 required numeric parameters. Supported flags (default state in parentheses): -m (generate masked sequence file), -f (include flanking sequence), -d (produce .dat file), -h (suppress HTML output), -l <n> (set max tandem repeat size, allowed: >=1, preffered: >=1 and <=29), -ngs (more compact .dat output on multisequence files), -u (usage), -v (version). Provide flags as a quoted string, e.g., '-d -h'."
+  extra: "Optional command-line flags to pass to Tandem Repeats Finder (TRF). |nl| These flags are appended after the 7 required numeric parameters. Supported flags (default state in parentheses): -m (generate masked sequence file), -f (include flanking sequence), -d (produce .dat file), -h (suppress HTML output), -l <n> (set max tandem repeat size, allowed: >= 1, preferred: >= 1 and <= 29), -ngs (more compact .dat output on multisequence files), -u (usage), -v (version). Provide flags as a quoted string, e.g., '-d -h'."

22-24: Fix formatting and typo in notes section.

There are trailing spaces and a typo in the notes section.

notes: >
-  Flag(s) are specified using the 'extra' param (e.g., '-d -h'). 
+  Flag(s) are specified using the 'extra' param (e.g., '-d -h').
  For `-l` flag, write it with space like -l 29. TRF documentation allows -l=29 **but on current version running it behaves
-  abonormally, hence avoid, as this behaves according to the utility.** |nl| |nl|
+  abnormally, hence avoid, as this behaves according to the utility.** |nl| |nl|
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 22-22: trailing spaces

(trailing-spaces)


28-28: Fix formatting inconsistency in developer notes.

There's an unnecessary period after "For Developers: |nl|".

-  For Developers: |nl|.
+  For Developers: |nl|

36-37: Fix trailing spaces and missing newline.

Static analysis identified formatting issues that should be addressed.

  |nl|
-  Note: As this is a wrapper for TRF utility, it comes with it's limitations or defects if any. Also, allowed values mentioned 
-  below are specified as deduced from the TRF's resources, so this wrapper doesn't validate on top of the TRF utility, rather depict's TRF behaviour.
+  Note: As this is a wrapper for TRF utility, it comes with its limitations or defects if any. Also, allowed values mentioned
+  below are specified as deduced from the TRF's resources, so this wrapper doesn't validate on top of the TRF utility, rather depicts TRF behaviour.
+
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 36-36: trailing spaces

(trailing-spaces)


[error] 37-37: no new line character at the end of file

(new-line-at-end-of-file)

test_wrappers.py (5)

6962-6996: Improve line formatting for better readability.

The test logic is excellent and comprehensive, validating multiple TRF output file types. However, the long lines in the compare_results_with_expected dictionary should be formatted for better readability.

Consider breaking the long dictionary entries across multiple lines:

         compare_results_with_expected={
-            "trf_output/small_test/"
-            "small_test.fasta.2.7.7.80.10.50.500.dat": 
-            "expected/small_test.fasta.2.7.7.80.10.50.500.dat",
+            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.dat": (
+                "expected/small_test.fasta.2.7.7.80.10.50.500.dat"
+            ),
-            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.mask": 
-            "expected/small_test.fasta.2.7.7.80.10.50.500.mask",
+            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.mask": (
+                "expected/small_test.fasta.2.7.7.80.10.50.500.mask"
+            ),
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 6962-6962: Missing function or method docstring

(C0116)


[warning] 6962-6962: Redefining name 'run' from outer scope (line 41)

(W0621)


6998-7010: Add docstring for better documentation.

The test logic is sound for validating parameter order flexibility in TRF. Consider adding a docstring to explain the test purpose.

 def test_trf_basic_with_change_order(run):
+    """Test TRF wrapper with different parameter order and missing extra flags."""
     run(
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 6998-6998: Missing function or method docstring

(C0116)


[warning] 6998-6998: Redefining name 'run' from outer scope (line 41)

(W0621)


7013-7027: Excellent negative testing approach.

The test correctly validates error handling for invalid parameter values using pytest.raises. This ensures the wrapper properly validates inputs and fails gracefully.

Consider adding a docstring:

 def test_trf_with_invalid_param_value(run):
+    """Test that TRF wrapper fails appropriately when required parameter values are missing."""
     with pytest.raises(subprocess.CalledProcessError):
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7013-7013: Missing function or method docstring

(C0116)


[warning] 7013-7013: Redefining name 'run' from outer scope (line 41)

(W0621)


7029-7047: Good test coverage for additional flags with formatting improvements needed.

The test appropriately validates support for additional permissible flags. The single file comparison might be sufficient for this specific test case.

Apply the same line formatting improvements as suggested for the previous test:

         compare_results_with_expected={
-            "trf_output/small_test/"
-            "small_test.fasta.2.7.7.80.10.50.500.dat": 
-            "expected/small_test.fasta.2.7.7.80.10.50.500.dat",
+            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.dat": (
+                "expected/small_test.fasta.2.7.7.80.10.50.500.dat"
+            ),
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7029-7029: Missing function or method docstring

(C0116)


[warning] 7029-7029: Redefining name 'run' from outer scope (line 41)

(W0621)


7049-7082: Comprehensive case-insensitive parameter testing with formatting needs.

Excellent test for validating case-insensitive parameter handling, ensuring the wrapper is user-friendly. The comprehensive output validation is appropriate for this functionality test.

Apply consistent line formatting improvements throughout the dictionary:

         compare_results_with_expected={
-            "trf_output/small_test/"
-            "small_test.fasta.2.7.7.80.10.50.500.dat": 
-            "expected/small_test.fasta.2.7.7.80.10.50.500.dat",
+            "trf_output/small_test/small_test.fasta.2.7.7.80.10.50.500.dat": (
+                "expected/small_test.fasta.2.7.7.80.10.50.500.dat"
+            ),

Also consider adding a docstring:

 def test_trf_basic_with_uppercase_params(run):
+    """Test TRF wrapper accepts parameters with uppercase keys and produces correct outputs."""
     run(
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7049-7049: Missing function or method docstring

(C0116)


[warning] 7049-7049: Redefining name 'run' from outer scope (line 41)

(W0621)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 923c50d and 5e04700.

📒 Files selected for processing (7)
  • bio/trf/environment.linux-64.pin.txt (1 hunks)
  • bio/trf/environment.yaml (1 hunks)
  • bio/trf/meta.yaml (1 hunks)
  • bio/trf/test/Snakefile (1 hunks)
  • bio/trf/test/tests.smk (1 hunks)
  • bio/trf/wrapper.py (1 hunks)
  • test_wrappers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • bio/trf/environment.yaml
  • bio/trf/environment.linux-64.pin.txt
  • bio/trf/test/tests.smk
  • bio/trf/test/Snakefile
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/trf/wrapper.py
  • test_wrappers.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/trf/wrapper.py
🪛 Ruff (0.11.9)
bio/trf/wrapper.py

70-70: Undefined name snakemake

(F821)


70-70: Undefined name snakemake

(F821)


71-71: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)


74-74: Undefined name snakemake

(F821)


77-77: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


90-90: Undefined name snakemake

(F821)

🪛 Pylint (3.3.7)
bio/trf/wrapper.py

[error] 12-12: Unable to import 'snakemake.shell'

(E0401)


[error] 70-70: Undefined variable 'snakemake'

(E0602)


[error] 70-70: Undefined variable 'snakemake'

(E0602)


[error] 71-71: Undefined variable 'snakemake'

(E0602)


[error] 73-73: Undefined variable 'snakemake'

(E0602)


[error] 74-74: Undefined variable 'snakemake'

(E0602)


[error] 77-77: Undefined variable 'snakemake'

(E0602)


[error] 78-78: Undefined variable 'snakemake'

(E0602)


[error] 90-90: Undefined variable 'snakemake'

(E0602)

test_wrappers.py

[convention] 2067-2067: Line too long (162/100)

(C0301)


[convention] 2118-2118: Line too long (130/100)

(C0301)


[convention] 6962-6962: Missing function or method docstring

(C0116)


[warning] 6962-6962: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 6998-6998: Missing function or method docstring

(C0116)


[warning] 6998-6998: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7013-7013: Missing function or method docstring

(C0116)


[warning] 7013-7013: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7029-7029: Missing function or method docstring

(C0116)


[warning] 7029-7029: Redefining name 'run' from outer scope (line 41)

(W0621)


[convention] 7049-7049: Missing function or method docstring

(C0116)


[warning] 7049-7049: Redefining name 'run' from outer scope (line 41)

(W0621)

🪛 YAMLlint (1.37.1)
bio/trf/meta.yaml

[error] 22-22: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 37-37: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
🔇 Additional comments (2)
bio/trf/wrapper.py (1)

12-12:

❌ Incorrect review comment

Missing imports that are referenced in past review comments.

The past review comment mentioned imports from config, utils.filteration, utils.logger_utils, and utils.validation modules, but these imports are not present in the current code. However, the AI summary indicates these modules exist in the codebase. This suggests the implementation may be incomplete.


🏁 Script executed:

#!/bin/bash
# Description: Check if the referenced modules exist in the codebase and if they should be imported

# Check for config module
fd -t f "config.py" bio/trf/
fd -t d "config" bio/trf/

# Check for utils modules  
fd -t d "utils" bio/trf/
fd -t f "*filteration*" bio/trf/
fd -t f "*logger_utils*" bio/trf/
fd -t f "*validation*" bio/trf/

# Check if TRFConfig class is used anywhere
rg "TRFConfig" bio/trf/

Length of output: 2072


#!/bin/bash
# Inspect imports and references in bio/trf/wrapper.py

echo "=== Import statements in bio/trf/wrapper.py ==="
rg -n "^(import|from)" bio/trf/wrapper.py

echo
echo "=== References to TRFConfig and utils in bio/trf/wrapper.py ==="
rg -n "TRFConfig" bio/trf/wrapper.py || true
rg -n "filteration" bio/trf/wrapper.py || true
rg -n "logger_utils" bio/trf/wrapper.py || true
rg -n "validation" bio/trf/wrapper.py || true

No missing imports in bio/trf/wrapper.py
A review of the current wrapper shows it only imports and uses snakemake.shell. There are no references to TRFConfig or any utils.* modules in this file. The past comments about importing config, utils.filteration, utils.logger_utils, or utils.validation do not apply to the current implementation.

• Imports found in wrapper.py

  • from snakemake.shell import shell

• No occurrences of

  • TRFConfig
  • filteration
  • logger_utils
  • validation

The code does not require the previously mentioned imports—please ignore that outdated feedback.

Likely an incorrect or invalid review comment.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 12-12: Unable to import 'snakemake.shell'

(E0401)

test_wrappers.py (1)

2067-2069: LGTM! Good formatting improvements.

The addition of explanatory comments for disabled gzip file comparisons and trailing commas improve code readability and maintainability.

Also applies to: 2118-2119

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 2067-2067: Line too long (162/100)

(C0301)

@rohan-ibn-tariq rohan-ibn-tariq marked this pull request as ready for review June 4, 2025 07:57
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
bio/trf/meta.yaml (1)

22-22: Fix formatting issues identified by static analysis.

The YAML file has trailing whitespace and is missing a newline at the end of the file.

Apply this diff to fix the formatting:

-notes: >
+notes: >
-  below are specified as deduced from the TRF's resources, so this wrapper doesn't validate on top of the TRF utility, rather depict's TRF behaviour.
+  below are specified as deduced from the TRF's resources, so this wrapper doesn't validate on top of the TRF utility, rather depict's TRF behaviour.
+

Also applies to: 36-36, 37-37

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 22-22: trailing spaces

(trailing-spaces)

bio/trf/wrapper.py (1)

59-59: Optimize dictionary access.

Use direct membership testing instead of accessing .keys() for better performance and readability.

Apply this diff:

-        k: collected_params[k] for k in trf_params.keys() if k in collected_params
+        k: collected_params[k] for k in trf_params if k in collected_params
🧰 Tools
🪛 Ruff (0.11.9)

59-59: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e04700 and 8527cb3.

📒 Files selected for processing (2)
  • bio/trf/meta.yaml (1 hunks)
  • bio/trf/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/trf/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/trf/wrapper.py
🧠 Learnings (1)
bio/trf/wrapper.py (1)
Learnt from: rohan-ibn-tariq
PR: snakemake/snakemake-wrappers#4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.050Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
🪛 Ruff (0.11.9)
bio/trf/wrapper.py

59-59: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


76-76: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


80-80: Undefined name snakemake

(F821)


81-81: Undefined name snakemake

(F821)


92-92: Undefined name snakemake

(F821)


93-93: Undefined name snakemake

(F821)


116-116: Undefined name snakemake

(F821)

🪛 Pylint (3.3.7)
bio/trf/wrapper.py

[error] 12-12: Unable to import 'snakemake.shell'

(E0401)


[error] 76-76: Undefined variable 'snakemake'

(E0602)


[error] 76-76: Undefined variable 'snakemake'

(E0602)


[error] 78-78: Undefined variable 'snakemake'

(E0602)


[error] 80-80: Undefined variable 'snakemake'

(E0602)


[error] 81-81: Undefined variable 'snakemake'

(E0602)


[error] 92-92: Undefined variable 'snakemake'

(E0602)


[error] 93-93: Undefined variable 'snakemake'

(E0602)


[error] 116-116: Undefined variable 'snakemake'

(E0602)

🪛 YAMLlint (1.37.1)
bio/trf/meta.yaml

[error] 22-22: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 37-37: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
🔇 Additional comments (1)
bio/trf/wrapper.py (1)

1-129: Excellent implementation quality with comprehensive error handling.

The TRF wrapper implementation demonstrates high quality with:

  • Robust parameter validation with clear error messages
  • Comprehensive error handling throughout the workflow
  • Graceful fallback for logging setup failures
  • Well-structured command construction and execution
  • Informative user feedback and warnings

The code is well-documented and follows Snakemake wrapper conventions effectively.

🧰 Tools
🪛 Ruff (0.11.9)

59-59: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


76-76: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


80-80: Undefined name snakemake

(F821)


81-81: Undefined name snakemake

(F821)


92-92: Undefined name snakemake

(F821)


93-93: Undefined name snakemake

(F821)


116-116: Undefined name snakemake

(F821)

🪛 Pylint (3.3.7)

[error] 12-12: Unable to import 'snakemake.shell'

(E0401)


[error] 76-76: Undefined variable 'snakemake'

(E0602)


[error] 76-76: Undefined variable 'snakemake'

(E0602)


[error] 78-78: Undefined variable 'snakemake'

(E0602)


[error] 80-80: Undefined variable 'snakemake'

(E0602)


[error] 81-81: Undefined variable 'snakemake'

(E0602)


[error] 92-92: Undefined variable 'snakemake'

(E0602)


[error] 93-93: Undefined variable 'snakemake'

(E0602)


[error] 116-116: Undefined variable 'snakemake'

(E0602)

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the expected results, I would not store and compare them at all. This will cause additional maintenance when trf is updated. A smoke test (just checking that trf runs through) is enough since the wrapper is reasonably simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants