Skip to content

Add frame level task #693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Cadene
Copy link
Collaborator

@Cadene Cadene commented Feb 7, 2025

What this does

Motivations: We want to start supporting datasets with multiple tasks per episode. We also want to iterate on a simpler API for adding new frames to a dataset, where all features are added using dataset.add_frame(frame: dict).

For instance:

  • 2D features like a sequence of x,y positions (waypoints)
  • string features like captions

TODO

  • Add annotation per frame
  • Remove task argument of save_episode . add_frame is now the only way to add the task (even if it's the same for every frame)
  • Add first test for add_frame
  • Updated examples/port_datasets/pusht_zarr.py accordingly

How it was tested

  • Ran python examples/port_datasets/pusht_zarr.py
  • Ran tests

How to checkout & try? (for the reviewer)

pytest -sx tests/test_datasets.py::test_add_frame

@aliberts aliberts force-pushed the user/aliberts/2024_11_25_compute_stats_v2 branch from 8feeede to 0c55461 Compare February 9, 2025 13:26
@aliberts aliberts mentioned this pull request Feb 10, 2025
3 tasks
@Cadene Cadene changed the base branch from user/aliberts/2024_11_25_compute_stats_v2 to user/aliberts/2025_02_10_dataset_v2.1 February 10, 2025 15:51
@Cadene Cadene changed the title LeRobotDataset v2.1 Add text_features Feb 10, 2025
@Cadene Cadene force-pushed the user/rcadene/2025_01_27_dataset_v2.1 branch from 34f201c to 8e98c79 Compare February 11, 2025 15:53
@Cadene Cadene changed the title Add text_features Add frame level task Feb 11, 2025
@Cadene Cadene requested a review from aliberts February 11, 2025 16:13
@Cadene Cadene added the dataset Issues regarding data inputs, processing, or datasets label Feb 11, 2025
@Cadene Cadene marked this pull request as ready for review February 11, 2025 16:14
@Cadene Cadene force-pushed the user/rcadene/2025_01_27_dataset_v2.1 branch from ef18a95 to 3281e66 Compare February 11, 2025 16:20
Copy link
Collaborator

@aliberts aliberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some comments

Comment on lines 222 to 223
task_index = self.get_task_index(task)
if task_index not in self.tasks:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equivalent but a bit faster and easier to read I think

Suggested change
task_index = self.get_task_index(task)
if task_index not in self.tasks:
if task not in self.tasks.values():
task_index = self.get_task_index(task)

Copy link
Collaborator Author

@Cadene Cadene Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this logic quite confusing, so I refactor it.

dataset.add_frame({"1d": torch.randn(1)})


def test_add_frame(tmp_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is quite long for what it's doing (~0.5s)
Did you check if this new add_frame is slower than before?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.13s on my side

Screenshot 2025-02-13 at 18 54 49

@Cadene
Copy link
Collaborator Author

Cadene commented Feb 13, 2025

reverting commit:

commit fd6436af9de41a8436396afc296ebb53b0d7bc2c (HEAD -> user/rcadene/2025_01_27_dataset_v2.1)
Author: Remi Cadene <[email protected]>
Date:   Thu Feb 13 19:05:20 2025 +0100

    Revert "Update tests/test_datasets.py"

    This reverts commit 16bb53f8d6e89214b83d3f706e86f40f9128c5f7.

commit 3abc897ca23329301333aec3ee085ca732e99679
Author: Remi Cadene <[email protected]>
Date:   Thu Feb 13 19:05:14 2025 +0100

    Revert "Update tests/test_datasets.py"

    This reverts commit 7170819c605a429402aed904d868b5eff21c0d2a.

commit 7170819c605a429402aed904d868b5eff21c0d2a (origin/user/rcadene/2025_01_27_dataset_v2.1)
Author: Remi <[email protected]>
Date:   Thu Feb 13 18:55:43 2025 +0100

    Update tests/test_datasets.py

    Co-authored-by: Simon Alibert <[email protected]>

commit 16bb53f8d6e89214b83d3f706e86f40f9128c5f7
Author: Remi <[email protected]>
Date:   Thu Feb 13 18:31:51 2025 +0100

    Update tests/test_datasets.py

    Co-authored-by: Simon Alibert <[email protected]>

Because getting:

__________________________________________________________________________________________________ test_add_frame_no_task ___________________________________________________________________________________________________

tmp_path = PosixPath('/private/var/folders/kj/4wjq7mtn6xdb3hz4bwd4db_c0000gn/T/pytest-of-rcadene/pytest-81/test_add_frame_no_task0')

    def test_add_frame_no_task(tmp_path):
        features = {"1d": {"dtype": "float32", "shape": (1,), "names": None}}
>       dataset = LeRobotDataset.create(repo_id=DUMMY_REPO_ID, fps=30, root=tmp_path, features=features)

tests/test_datasets.py:98:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
lerobot/common/datasets/lerobot_dataset.py:957: in create
    obj.meta = LeRobotDatasetMetadata.create(
lerobot/common/datasets/lerobot_dataset.py:296: in create
    obj.root.mkdir(parents=True, exist_ok=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = PosixPath('/private/var/folders/kj/4wjq7mtn6xdb3hz4bwd4db_c0000gn/T/pytest-of-rcadene/pytest-81/test_add_frame_no_task0'), mode = 511, parents = True, exist_ok = False

    def mkdir(self, mode=0o777, parents=False, exist_ok=False):
        """
        Create a new directory at this given path.
        """
        try:
>           self._accessor.mkdir(self, mode)
E           FileExistsError: [Errno 17] File exists: '/private/var/folders/kj/4wjq7mtn6xdb3hz4bwd4db_c0000gn/T/pytest-of-rcadene/pytest-81/test_add_frame_no_task0'

../../miniconda3/envs/lerobot/lib/python3.10/pathlib.py:1175: FileExistsError

@Cadene Cadene merged commit 9d6886d into user/aliberts/2025_02_10_dataset_v2.1 Feb 14, 2025
7 checks passed
@Cadene Cadene deleted the user/rcadene/2025_01_27_dataset_v2.1 branch February 14, 2025 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dataset Issues regarding data inputs, processing, or datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants