Fix vcf2zarr tests #1309

Merged: tomwhite merged 3 commits into sgkit-dev:main on Apr 14, 2025
Conversation

tomwhite
Collaborator

tomwhite marked this pull request as ready for review on April 14, 2025 at 09:47
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (c4bf260) to head (3cae0f6).

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1309   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           46        46           
  Lines         2992      2992           
=========================================
  Hits          2992      2992           

@tomwhite
Collaborator Author

I got `ValueError: Zarr schema format version mismatch: 0.4 != 0.5`, so I regenerated the schema; see 3cae0f6.
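For readers unfamiliar with this kind of failure, here is a minimal sketch of a version guard that would produce the message above. It is not taken from bio2zarr: the function name `check_schema_version`, the `format_version` key, and the schema contents are all hypothetical, chosen only to show why a schema file written for 0.4 is rejected by a tool expecting 0.5 and why regenerating it resolves the error.

```python
# Hypothetical illustration only; names and schema layout are assumptions,
# not bio2zarr's actual implementation.
EXPECTED_SCHEMA_VERSION = "0.5"  # assumed: the version the installed tool expects


def check_schema_version(schema: dict, expected: str = EXPECTED_SCHEMA_VERSION) -> None:
    """Raise ValueError if the schema was written for a different format version."""
    found = schema.get("format_version")
    if found != expected:
        raise ValueError(
            f"Zarr schema format version mismatch: {found} != {expected}"
        )


# A schema file saved by an older release (hypothetical contents).
stale_schema = {"format_version": "0.4", "fields": []}

try:
    check_schema_version(stale_schema)
except ValueError as err:
    message = str(err)
    print(message)
```

A guard like this fails fast instead of trying to interpret an outdated schema, which is why the fix here was to regenerate the schema rather than edit the version string by hand.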

But now the docs build is failing with

"'alleles' is not a valid dimension or coordinate for Dataset with dimensions FrozenMappingWarningOnValuesAccess({'variants': 10879, 'samples': 250, 'FORMAT_AD_dim': 2})"

I noticed that for call_AD the dimension is now FORMAT_AD_dim, not alleles.

Any idea why that's happening @benjeffery?

@jeromekelleher
Collaborator

> I noticed that for call_AD the dimension is now FORMAT_AD_dim, not alleles.

That's a good catch @tomwhite, I think we can regard that as a bug. Probably something simple happened in the generalisation code. I'll investigate.

@jeromekelleher
Collaborator

This is weird @tomwhite, I don't see what would have changed here. The number in VCF is "." in the 1000G files I'm looking at, and I assume it's the same here, so I don't see how we'd ever have gotten "alleles" in there. There was never any special case for AD, and I don't think we should overrule the header anyway.

Did you manually edit the schema to fix this problem before by any chance?
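To make the point about the header concrete, here is a rough sketch of how a dimension name could be derived from a FORMAT header line. Everything in it is an assumption for illustration: the helper names, the example header lines, and in particular the mapping from `Number` codes to shared dimension names are hypothetical and are not claimed to match what vcf2zarr does. Only the `Number` codes themselves (`A`, `R`, `G`, `.`) come from the VCF specification, and the `FORMAT_AD_dim` naming pattern comes from the error message quoted earlier in this thread.

```python
import re

# Hypothetical mapping from VCF "Number" codes to shared dimension names.
# A, R and G are defined by the VCF specification; the names on the right
# are assumptions made for this sketch.
SHARED_DIMENSIONS = {"A": "alt_alleles", "R": "alleles", "G": "genotypes"}


def parse_header_line(line: str) -> dict:
    """Extract ID, Number and Type from a ##FORMAT or ##INFO header line."""
    match = re.match(r"##(FORMAT|INFO)=<(.*)>$", line.strip())
    if match is None:
        raise ValueError(f"Not a FORMAT/INFO header line: {line!r}")
    category, body = match.groups()
    # The quoted Description may contain commas, so drop it before splitting.
    head = body.split(",Description=")[0]
    fields = dict(item.split("=", 1) for item in head.split(","))
    fields["category"] = category
    return fields


def dimension_name(fields: dict) -> str:
    """Pick a dimension name from the header alone, with no per-field special cases."""
    number = fields["Number"]
    if number in SHARED_DIMENSIONS:
        return SHARED_DIMENSIONS[number]
    # "." (unknown length) and fixed counts get a field-specific name in this sketch.
    return f"{fields['category']}_{fields['ID']}_dim"


# Example header lines (made up for this sketch).
unknown = '##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths, ref then alt">'
per_allele = '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths, ref then alt">'

dim_unknown = dimension_name(parse_header_line(unknown))
dim_per_allele = dimension_name(parse_header_line(per_allele))
print(dim_unknown, dim_per_allele)
```

Under these assumptions, a header declaring `Number=.` can only yield a field-specific dimension, so a shared name such as `alleles` would have to come from somewhere other than the header, for example a hand-edited schema.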

@tomwhite
Collaborator Author

> This is weird @tomwhite, I don't see what would have changed here. The number in VCF is "." in the 1000G files I'm looking at, and I assume it's the same here, so I don't see how we'd ever have gotten "alleles" in there. There was never any special case for AD, and I don't think we should overrule the header anyway.
>
> Did you manually edit the schema to fix this problem before by any chance?

Yes, that was it. Sorry!

@jeromekelleher
Collaborator

Great! LGTM

tomwhite merged commit 3770969 into sgkit-dev:main on Apr 14, 2025
17 checks passed
tomwhite added a commit to tomwhite/sgkit that referenced this pull request Apr 22, 2025
tomwhite added a commit that referenced this pull request Apr 22, 2025
@tomwhite tomwhite mentioned this pull request May 28, 2025
1 task
Successfully merging this pull request may close these issues.

ImportError: cannot import name 'vcf2zarr' from 'bio2zarr'