tc-huang
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/examples/10_use_so100.md b/examples/10_use_so100.md
Lines changed: 1 addition & 5 deletions
diff --git a/examples/11_use_lekiwi.md b/examples/11_use_lekiwi.md
Lines changed: 1 addition & 5 deletions
diff --git a/examples/11_use_moss.md b/examples/11_use_moss.md
Lines changed: 1 addition & 5 deletions
diff --git a/examples/7_get_started_with_real_robot.md b/examples/7_get_started_with_real_robot.md
Lines changed: 1 addition & 5 deletions
diff --git a/examples/port_datasets/pusht_zarr.py b/examples/port_datasets/pusht_zarr.py
Lines changed: 24 additions & 17 deletions
diff --git a/lerobot/common/constants.py b/lerobot/common/constants.py
Lines changed: 15 additions & 0 deletions
diff --git a/lerobot/common/datasets/backward_compatibility.py b/lerobot/common/datasets/backward_compatibility.py
Lines changed: 54 additions & 0 deletions
@@ -210,7 +210,7 @@ A `LeRobotDataset` is serialised using several widespread file formats for each
 - videos are stored in mp4 format to save space
 - metadata are stored in plain json/jsonl files
 
-Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can use the `local_files_only` argument and specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.
+Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.
 
 ### Evaluate a pretrained policy
 
 
@@ -335,7 +335,7 @@ python lerobot/scripts/control_robot.py \
   --control.push_to_hub=true
 ```
 
-Note: You can resume recording by adding `--control.resume=true`. Also if you didn't push your dataset yet, add `--control.local_files_only=true`.
+Note: You can resume recording by adding `--control.resume=true`.
 
 ## H. Visualize a dataset
 
@@ -363,8 +363,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```
 
-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 ## J. Train a policy
 
 To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
@@ -378,8 +376,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```
 
-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/so100_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
 
@@ -391,7 +391,7 @@ python lerobot/scripts/control_robot.py \
   --control.push_to_hub=true
 ```
 
-Note: You can resume recording by adding `--control.resume=true`. Also if you didn't push your dataset yet, add `--control.local_files_only=true`.
+Note: You can resume recording by adding `--control.resume=true`.
 
 # H. Visualize a dataset
 
@@ -418,8 +418,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```
 
-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 ## J. Train a policy
 
 To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
@@ -433,8 +431,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```
 
-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/lekiwi_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
 
@@ -256,7 +256,7 @@ python lerobot/scripts/control_robot.py \
   --control.push_to_hub=true
 ```
 
-Note: You can resume recording by adding `--control.resume=true`. Also if you didn't push your dataset yet, add `--control.local_files_only=true`.
+Note: You can resume recording by adding `--control.resume=true`.
 
 ## Visualize a dataset
 
@@ -284,8 +284,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```
 
-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 ## Train a policy
 
 To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
@@ -299,8 +297,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```
 
-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/moss_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
 
@@ -768,7 +768,7 @@ You can use the `record` function from [`lerobot/scripts/control_robot.py`](../l
 1. Frames from cameras are saved on disk in threads, and encoded into videos at the end of each episode recording.
 2. Video streams from cameras are displayed in window so that you can verify them.
 3. Data is stored with [`LeRobotDataset`](../lerobot/common/datasets/lerobot_dataset.py) format which is pushed to your Hugging Face page (unless `--control.push_to_hub=false` is provided).
-4. Checkpoints are done during recording, so if any issue occurs, you can resume recording by re-running the same command again with `--control.resume=true`. You might need to add `--control.local_files_only=true` if your dataset was not uploaded to hugging face hub. Also you will need to manually delete the dataset directory to start recording from scratch.
+4. Checkpoints are done during recording, so if any issue occurs, you can resume recording by re-running the same command again with `--control.resume=true`. You will need to manually delete the dataset directory if you want to start recording from scratch.
 5. Set the flow of data recording using command line arguments:
    - `--control.warmup_time_s=10` defines the number of seconds before starting data collection. It allows the robot devices to warmup and synchronize (10 seconds by default).
    - `--control.episode_time_s=60` defines the number of seconds for data recording for each episode (60 seconds by default).
@@ -883,8 +883,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```
 
-Note: You might need to add `--control.local_files_only=true` if your dataset was not uploaded to hugging face hub.
-
 Your robot should replicate movements similar to those you recorded. For example, check out [this video](https://x.com/RemiCadene/status/1793654950905680090) where we use `replay` on a Aloha robot from [Trossen Robotics](https://www.trossenrobotics.com).
 
 ## 4. Train a policy on your data
@@ -902,8 +900,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```
 
-Note: You might need to add `--dataset.local_files_only=true` if your dataset was not uploaded to hugging face hub.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/koch_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
 
@@ -2,9 +2,10 @@
 from pathlib import Path
 
 import numpy as np
-import torch
+from huggingface_hub import HfApi
 
-from lerobot.common.datasets.lerobot_dataset import LEROBOT_HOME, LeRobotDataset
+from lerobot.common.constants import HF_LEROBOT_HOME
+from lerobot.common.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset
 from lerobot.common.datasets.push_dataset_to_hub._download_raw import download_raw
 
 PUSHT_TASK = "Push the T-shaped blue block onto the T-shaped green target surface."
@@ -89,9 +90,9 @@ def calculate_coverage(zarr_data):
 
     num_frames = len(block_pos)
 
-    coverage = np.zeros((num_frames,))
+    coverage = np.zeros((num_frames,), dtype=np.float32)
     # 8 keypoints with 2 coords each
-    keypoints = np.zeros((num_frames, 16))
+    keypoints = np.zeros((num_frames, 16), dtype=np.float32)
 
     # Set x, y, theta (in radians)
     goal_pos_angle = np.array([256, 256, np.pi / 4])
@@ -117,7 +118,7 @@ def calculate_coverage(zarr_data):
         intersection_area = goal_geom.intersection(block_geom).area
         goal_area = goal_geom.area
         coverage[i] = intersection_area / goal_area
-        keypoints[i] = torch.from_numpy(PushTEnv.get_keypoints(block_shapes).flatten())
+        keypoints[i] = PushTEnv.get_keypoints(block_shapes).flatten()
 
     return coverage, keypoints
 
@@ -134,8 +135,8 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
     if mode not in ["video", "image", "keypoints"]:
         raise ValueError(mode)
 
-    if (LEROBOT_HOME / repo_id).exists():
-        shutil.rmtree(LEROBOT_HOME / repo_id)
+    if (HF_LEROBOT_HOME / repo_id).exists():
+        shutil.rmtree(HF_LEROBOT_HOME / repo_id)
 
     if not raw_dir.exists():
         download_raw(raw_dir, repo_id="lerobot-raw/pusht_raw")
@@ -148,6 +149,10 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
     action = zarr_data["action"][:]
     image = zarr_data["img"]  # (b, h, w, c)
 
+    if image.dtype == np.float32 and image.max() == np.float32(255):
+        # HACK: images are loaded as float32 but they actually encode uint8 data
+        image = image.astype(np.uint8)
+
     episode_data_index = {
         "from": np.concatenate(([0], zarr_data.meta["episode_ends"][:-1])),
         "to": zarr_data.meta["episode_ends"],
@@ -175,28 +180,30 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
 
         for frame_idx in range(num_frames):
             i = from_idx + frame_idx
+            idx = i + (frame_idx < num_frames - 1)
             frame = {
-                "action": torch.from_numpy(action[i]),
+                "action": action[i],
                 # Shift reward and success by +1 until the last item of the episode
-                "next.reward": reward[i + (frame_idx < num_frames - 1)],
-                "next.success": success[i + (frame_idx < num_frames - 1)],
+                "next.reward": reward[idx : idx + 1],
+                "next.success": success[idx : idx + 1],
+                "task": PUSHT_TASK,
             }
 
-            frame["observation.state"] = torch.from_numpy(agent_pos[i])
+            frame["observation.state"] = agent_pos[i]
 
             if mode == "keypoints":
-                frame["observation.environment_state"] = torch.from_numpy(keypoints[i])
+                frame["observation.environment_state"] = keypoints[i]
             else:
-                frame["observation.image"] = torch.from_numpy(image[i])
+                frame["observation.image"] = image[i]
 
             dataset.add_frame(frame)
 
-        dataset.save_episode(task=PUSHT_TASK)
-
-    dataset.consolidate()
+        dataset.save_episode()
 
     if push_to_hub:
         dataset.push_to_hub()
+        hub_api = HfApi()
+        hub_api.create_tag(repo_id, tag=CODEBASE_VERSION, repo_type="dataset")
 
 
 if __name__ == "__main__":
@@ -218,5 +225,5 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
         main(raw_dir, repo_id=repo_id, mode=mode)
 
         # Uncomment if you want to load the local dataset and explore it
-        # dataset = LeRobotDataset(repo_id=repo_id, local_files_only=True)
+        # dataset = LeRobotDataset(repo_id=repo_id)
         # breakpoint()
@@ -1,4 +1,9 @@
 # keys
+import os
+from pathlib import Path
+
+from huggingface_hub.constants import HF_HOME
+
 OBS_ENV = "observation.environment_state"
 OBS_ROBOT = "observation.state"
 OBS_IMAGE = "observation.image"
@@ -15,3 +20,13 @@
 OPTIMIZER_STATE = "optimizer_state.safetensors"
 OPTIMIZER_PARAM_GROUPS = "optimizer_param_groups.json"
 SCHEDULER_STATE = "scheduler_state.json"
+
+# cache dir
+default_cache_path = Path(HF_HOME) / "lerobot"
+HF_LEROBOT_HOME = Path(os.getenv("HF_LEROBOT_HOME", default_cache_path)).expanduser()
+
+if "LEROBOT_HOME" in os.environ:
+    raise ValueError(
+        f"You have a 'LEROBOT_HOME' environment variable set to '{os.getenv('LEROBOT_HOME')}'.\n"
+        "'LEROBOT_HOME' is deprecated, please use 'HF_LEROBOT_HOME' instead."
+    )
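
Note on the `constants.py` hunk above: the cache directory is resolved at import time, and a leftover `LEROBOT_HOME` variable now raises instead of being silently ignored. Below is a minimal sketch of the same resolution order as a function, which may help when reasoning about it or testing it. `resolve_lerobot_home` is a hypothetical helper and not part of the PR. The `hf_home` default of `~/.cache/huggingface` is an assumption standing in for `huggingface_hub.constants.HF_HOME`, so the sketch needs no third-party import.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_lerobot_home(
    env: Optional[Mapping[str, str]] = None,
    hf_home: str = "~/.cache/huggingface",  # assumption: stand-in for HF_HOME
) -> Path:
    """Hypothetical helper mirroring the module-level logic in constants.py."""
    env = os.environ if env is None else env

    # The deprecated variable is a hard error rather than a silent fallback.
    if "LEROBOT_HOME" in env:
        raise ValueError(
            f"'LEROBOT_HOME' is set to '{env['LEROBOT_HOME']}' but is deprecated; "
            "use 'HF_LEROBOT_HOME' instead."
        )

    # An explicit HF_LEROBOT_HOME wins; otherwise fall back to <hf_home>/lerobot.
    default_cache_path = Path(hf_home) / "lerobot"
    return Path(env.get("HF_LEROBOT_HOME", default_cache_path)).expanduser()
```

Passing the environment as a mapping keeps the check testable without mutating `os.environ`. The PR itself performs the check once at import time.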
@@ -0,0 +1,54 @@
+import packaging.version
+
+V2_MESSAGE = """
+The dataset you requested ({repo_id}) is in {version} format.
+
+We introduced a new format since v2.0 which is not backward compatible with v1.x.
+Please, use our conversion script. Modify the following command with your own task description:
+```
+python lerobot/common/datasets/v2/convert_dataset_v1_to_v2.py \\
+    --repo-id {repo_id} \\
+    --single-task "TASK DESCRIPTION."  # <---- /!\\ Replace TASK DESCRIPTION /!\\
+```
+
+A few examples to replace TASK DESCRIPTION: "Pick up the blue cube and place it into the bin.", "Insert the
+peg into the socket.", "Slide open the ziploc bag.", "Take the elevator to the 1st floor.", "Open the top
+cabinet, store the pot inside it then close the cabinet.", "Push the T-shaped block onto the T-shaped
+target.", "Grab the spray paint on the shelf and place it in the bin on top of the robot dog.", "Fold the
+sweatshirt.", ...
+
+If you encounter a problem, contact LeRobot maintainers on [Discord](https://discord.com/invite/s3KuuzsPFb)
+or open an [issue on GitHub](https://github.com/huggingface/lerobot/issues/new/choose).
+"""
+
+V21_MESSAGE = """
+The dataset you requested ({repo_id}) is in {version} format.
+While current version of LeRobot is backward-compatible with it, the version of your dataset still uses global
+stats instead of per-episode stats. Update your dataset stats to the new format using this command:
+```
+python lerobot/common/datasets/v21/convert_dataset_v20_to_v21.py --repo-id={repo_id}
+```
+
+If you encounter a problem, contact LeRobot maintainers on [Discord](https://discord.com/invite/s3KuuzsPFb)
+or open an [issue on GitHub](https://github.com/huggingface/lerobot/issues/new/choose).
+"""
+
+FUTURE_MESSAGE = """
+The dataset you requested ({repo_id}) is only available in {version} format.
+As we cannot ensure forward compatibility with it, please update your current version of lerobot.
+"""
+
+
+class CompatibilityError(Exception): ...
+
+
+class BackwardCompatibilityError(CompatibilityError):
+    def __init__(self, repo_id: str, version: packaging.version.Version):
+        message = V2_MESSAGE.format(repo_id=repo_id, version=version)
+        super().__init__(message)
+
+
+class ForwardCompatibilityError(CompatibilityError):
+    def __init__(self, repo_id: str, version: packaging.version.Version):
+        message = FUTURE_MESSAGE.format(repo_id=repo_id, version=version)
+        super().__init__(message)
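
Note on `backward_compatibility.py`: the diff shown here defines the messages and exception classes but not the call site that raises them. The sketch below is one plausible way such a check could be wired, inferred only from the message texts: a newer major version raises the forward error, an older major version raises the backward error, and an older minor version only warns. Everything in it is an assumption: `check_dataset_version` and `parse_version` are hypothetical names, the exception classes are simplified stand-ins with shortened messages, and plain `(major, minor)` tuples replace `packaging.version.Version` so the block has no dependencies.

```python
import warnings
from typing import Tuple

# Stand-in for packaging.version.Version: (major, minor).
Version = Tuple[int, int]


class CompatibilityError(Exception): ...


class BackwardCompatibilityError(CompatibilityError):
    def __init__(self, repo_id: str, version: Version):
        # Shortened stand-in for V2_MESSAGE.
        super().__init__(f"{repo_id} is in v{version[0]}.{version[1]} format; convert it first.")


class ForwardCompatibilityError(CompatibilityError):
    def __init__(self, repo_id: str, version: Version):
        # Shortened stand-in for FUTURE_MESSAGE.
        super().__init__(f"{repo_id} is only available in v{version[0]}.{version[1]} format; update lerobot.")


def parse_version(tag: str) -> Version:
    """Parse a tag such as 'v2.1' into (2, 1). Assumes a 'major.minor' tag."""
    major, minor = tag.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def check_dataset_version(repo_id: str, dataset_tag: str, current_tag: str) -> None:
    """Hypothetical check: raise on a major mismatch, warn on an older minor."""
    dataset, current = parse_version(dataset_tag), parse_version(current_tag)
    if dataset[0] > current[0]:
        raise ForwardCompatibilityError(repo_id, dataset)
    if dataset[0] < current[0]:
        raise BackwardCompatibilityError(repo_id, dataset)
    if dataset[1] < current[1]:
        # Still loadable, so only nudge the user (stand-in for V21_MESSAGE).
        warnings.warn(f"{repo_id} uses an older minor format; consider converting it.", stacklevel=2)
```

Because both errors share the `CompatibilityError` base class, a caller can catch either kind of mismatch with one `except` clause. The sketch does not cover a dataset whose minor version is newer than the codebase's.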