Added functions to support IO for Parquet files. #562

Open
wants to merge 8 commits into
base: main

@ShigrafS ShigrafS commented Apr 27, 2025

Closes #307

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Summary

This PR adds Parquet file support to the movement package, enabling it to read and write pose tracking data in the tidy DataFrame format used by the [animovement](https://github.com/roaldarbol/animovement) R package. It enhances interoperability, supports efficient data storage, and simplifies integration with modern data analysis tools.


🧩 Related Issue

#307

Support tidy dataframe and Parquet I/O to facilitate data exchange with animovement


✨ What's New

✅ Load Functions (movement/io/load_poses.py)

  • Added from_tidy_df: Converts a tidy pandas DataFrame into an xarray.Dataset.
  • Added from_animovement_file: Reads a .parquet file and converts it using from_tidy_df.
  • Updated from_file to support source_software="animovement".
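The kind of reshaping that from_tidy_df performs can be pictured with the minimal sketch below. It is not the PR's implementation: the helper name `tidy_to_dataset` and the column names (`time`, `individuals`, `keypoints`, `x`, `y`, `confidence`) are assumptions chosen for illustration, and the actual animovement schema may differ.

```python
import numpy as np
import pandas as pd
import xarray as xr


def tidy_to_dataset(df: pd.DataFrame) -> xr.Dataset:
    """Hypothetical helper: reshape a tidy table into labelled arrays."""
    # One row per (time, keypoint, individual); unstack into a dense grid.
    grid = df.set_index(["time", "keypoints", "individuals"]).to_xarray()
    coords = {
        name: grid[name].values for name in ("time", "keypoints", "individuals")
    }
    # Stack x and y along a new "space" axis:
    # shape becomes (time, space, keypoints, individuals).
    position = np.stack([grid["x"].values, grid["y"].values], axis=1)
    return xr.Dataset(
        data_vars={
            "position": (
                ("time", "space", "keypoints", "individuals"),
                position,
            ),
            "confidence": (
                ("time", "keypoints", "individuals"),
                grid["confidence"].values,
            ),
        },
        coords={**coords, "space": ["x", "y"]},
    )


# Assumed column layout, for illustration only.
tidy = pd.DataFrame(
    {
        "time": [0, 0, 1, 1],
        "individuals": ["id_0"] * 4,
        "keypoints": ["nose", "tail", "nose", "tail"],
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [5.0, 6.0, 7.0, 8.0],
        "confidence": [0.9, 0.8, 0.7, 0.6],
    }
)
ds = tidy_to_dataset(tidy)
print(ds["position"].dims)
```

Combinations of time, keypoint, and individual that are absent from the table end up as NaN in the dense grid, which is one reason missing data needs explicit test coverage.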

✅ Save Functions (movement/io/save_poses.py)

  • Added to_tidy_df: Converts an xarray.Dataset to a tidy DataFrame with optional confidence values.
  • Added to_animovement_file: Saves a dataset to a .parquet file via to_tidy_df.
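The reverse direction can be sketched in the same spirit. The helper name `dataset_to_tidy`, the `include_confidence` flag, and the variable and dimension names (`position`, `confidence`, `space`) are illustrative assumptions, not the signature of to_tidy_df in this PR.

```python
import numpy as np
import pandas as pd
import xarray as xr


def dataset_to_tidy(ds: xr.Dataset, include_confidence: bool = True) -> pd.DataFrame:
    """Hypothetical helper: flatten labelled arrays to one row per observation."""
    # Split the "space" axis into separate x and y columns.
    columns = {
        axis: ds["position"].sel(space=axis, drop=True) for axis in ("x", "y")
    }
    if include_confidence:
        columns["confidence"] = ds["confidence"]
    # to_dataframe() indexes rows by the remaining dimensions;
    # reset_index() turns those index levels into ordinary columns.
    return xr.Dataset(columns).to_dataframe().reset_index()


# Small dataset with an assumed layout: 2 frames, 2 keypoints, 1 individual.
ds = xr.Dataset(
    data_vars={
        "position": (
            ("time", "space", "keypoints", "individuals"),
            np.arange(8.0).reshape(2, 2, 2, 1),
        ),
        "confidence": (
            ("time", "keypoints", "individuals"),
            np.full((2, 2, 1), 0.9),
        ),
    },
    coords={
        "time": [0, 1],
        "space": ["x", "y"],
        "keypoints": ["nose", "tail"],
        "individuals": ["id_0"],
    },
)
tidy = dataset_to_tidy(ds)
tidy_no_conf = dataset_to_tidy(ds, include_confidence=False)
print(tidy.columns.tolist())
```

Making confidence optional keeps the table usable for datasets that carry no confidence scores.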

✅ Dependency Update

  • Added pyarrow to pyproject.toml to support Parquet I/O via pandas.

✅ Tests (tests/test_parquet_io.py)

  • Added a new test suite covering:
    • Conversion between tidy DataFrames and datasets
    • Round-trip accuracy (DataFrame → dataset → DataFrame, and Parquet file round-trips)
    • Edge cases like missing data, no confidence, and invalid inputs
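A round-trip check of the kind listed above might look like the sketch below. It uses pandas' own `to_xarray` and xarray's `to_dataframe` as stand-ins for from_tidy_df and to_tidy_df, so it shows the shape of such a test rather than the tests in tests/test_parquet_io.py; the column names are assumptions.

```python
import numpy as np
import pandas as pd

KEYS = ["time", "keypoints", "individuals"]


def roundtrip(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the DataFrame -> dataset -> DataFrame conversion."""
    ds = df.set_index(KEYS).to_xarray()
    return ds.to_dataframe().reset_index()


def test_roundtrip_preserves_missing_values():
    df = pd.DataFrame(
        {
            "time": [0, 0, 1, 1],
            "keypoints": ["nose", "tail", "nose", "tail"],
            "individuals": ["id_0"] * 4,
            "x": [1.0, np.nan, 3.0, 4.0],  # one missing coordinate
            "y": [5.0, 6.0, 7.0, 8.0],
        }
    )
    out = roundtrip(df)
    # Row and column order are not guaranteed, so normalise before comparing.
    out = out[df.columns].sort_values(KEYS).reset_index(drop=True)
    expected = df.sort_values(KEYS).reset_index(drop=True)
    # NaNs in matching positions compare as equal here.
    pd.testing.assert_frame_equal(out, expected, check_dtype=False)
```

Sorting on the key columns before comparing keeps the test independent of the row order produced by the conversion.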

💡 Why This Matters

  • Interoperability: Lets movement and the animovement R package exchange data through a shared tidy format stored as Parquet.
  • Performance: Parquet provides efficient columnar storage and compression.
  • Usability: Tidy format is well suited to plotting, statistics, and tabular exploration.
  • Reliability: Tests cover conversions, round trips, and edge cases for the new functions.
  • Modernization: Brings movement closer to common data science practice (tidy data, columnar files).

How has this PR been tested?

Tested locally with pytest and in CI.

Is this a breaking change?

If this PR breaks any existing functionality, please explain how and why.

Does this PR require an update to the documentation?

If any features have changed or have been added, please explain how the
documentation has been updated.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@ShigrafS ShigrafS marked this pull request as ready for review April 27, 2025 17:40
@ShigrafS (Author) commented

@niksirbi @sfmig This PR is ready to be merged.
Kindly review it.

codecov bot commented Apr 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (f2b539d) to head (642ad7c).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #562   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           28        28           
  Lines         1571      1641   +70     
=========================================
+ Hits          1571      1641   +70     

@ShigrafS (Author) commented

@niksirbi @sfmig I've added a few tests to increase the coverage to 100%.
This should solve the Codecov issue.

sonarqubecloud bot commented May 1, 2025

@niksirbi niksirbi self-requested a review May 2, 2025 07:34

Successfully merging this pull request may close these issues.

Implement I/O for parquet files