Commit 3354d91

aliberts and Cadene authored
LeRobotDataset v2.1 (huggingface#711)
Co-authored-by: Remi <[email protected]> Co-authored-by: Remi Cadene <[email protected]>
1 parent aca464c commit 3354d91

43 files changed, +2031 -1330 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -210,7 +210,7 @@ A `LeRobotDataset` is serialised using several widespread file formats for each
 - videos are stored in mp4 format to save space
 - metadata are stored in plain json/jsonl files

-Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can use the `local_files_only` argument and specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.
+Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.

 ### Evaluate a pretrained policy

examples/10_use_so100.md

Lines changed: 1 addition & 5 deletions
@@ -335,7 +335,7 @@ python lerobot/scripts/control_robot.py \
   --control.push_to_hub=true
 ```

-Note: You can resume recording by adding `--control.resume=true`. Also if you didn't push your dataset yet, add `--control.local_files_only=true`.
+Note: You can resume recording by adding `--control.resume=true`.

 ## H. Visualize a dataset

@@ -363,8 +363,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```

-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 ## J. Train a policy

 To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
@@ -378,8 +376,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```

-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/so100_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.

examples/11_use_lekiwi.md

Lines changed: 1 addition & 5 deletions
@@ -391,7 +391,7 @@ python lerobot/scripts/control_robot.py \
   --control.push_to_hub=true
 ```

-Note: You can resume recording by adding `--control.resume=true`. Also if you didn't push your dataset yet, add `--control.local_files_only=true`.
+Note: You can resume recording by adding `--control.resume=true`.

 # H. Visualize a dataset

@@ -418,8 +418,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```

-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 ## J. Train a policy

 To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
@@ -433,8 +431,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```

-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/lekiwi_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.

examples/11_use_moss.md

Lines changed: 1 addition & 5 deletions
@@ -256,7 +256,7 @@ python lerobot/scripts/control_robot.py \
   --control.push_to_hub=true
 ```

-Note: You can resume recording by adding `--control.resume=true`. Also if you didn't push your dataset yet, add `--control.local_files_only=true`.
+Note: You can resume recording by adding `--control.resume=true`.

 ## Visualize a dataset

@@ -284,8 +284,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```

-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 ## Train a policy

 To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
@@ -299,8 +297,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```

-Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/moss_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.

examples/7_get_started_with_real_robot.md

Lines changed: 1 addition & 5 deletions
@@ -768,7 +768,7 @@ You can use the `record` function from [`lerobot/scripts/control_robot.py`](../l
 1. Frames from cameras are saved on disk in threads, and encoded into videos at the end of each episode recording.
 2. Video streams from cameras are displayed in window so that you can verify them.
 3. Data is stored with [`LeRobotDataset`](../lerobot/common/datasets/lerobot_dataset.py) format which is pushed to your Hugging Face page (unless `--control.push_to_hub=false` is provided).
-4. Checkpoints are done during recording, so if any issue occurs, you can resume recording by re-running the same command again with `--control.resume=true`. You might need to add `--control.local_files_only=true` if your dataset was not uploaded to hugging face hub. Also you will need to manually delete the dataset directory to start recording from scratch.
+4. Checkpoints are done during recording, so if any issue occurs, you can resume recording by re-running the same command again with `--control.resume=true`. You will need to manually delete the dataset directory if you want to start recording from scratch.
 5. Set the flow of data recording using command line arguments:
 - `--control.warmup_time_s=10` defines the number of seconds before starting data collection. It allows the robot devices to warmup and synchronize (10 seconds by default).
 - `--control.episode_time_s=60` defines the number of seconds for data recording for each episode (60 seconds by default).
@@ -883,8 +883,6 @@ python lerobot/scripts/control_robot.py \
   --control.episode=0
 ```

-Note: You might need to add `--control.local_files_only=true` if your dataset was not uploaded to hugging face hub.
-
 Your robot should replicate movements similar to those you recorded. For example, check out [this video](https://x.com/RemiCadene/status/1793654950905680090) where we use `replay` on a Aloha robot from [Trossen Robotics](https://www.trossenrobotics.com).

 ## 4. Train a policy on your data
@@ -902,8 +900,6 @@ python lerobot/scripts/train.py \
   --wandb.enable=true
 ```

-Note: You might need to add `--dataset.local_files_only=true` if your dataset was not uploaded to hugging face hub.
-
 Let's explain it:
 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/koch_test`.
 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor sates, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.

examples/port_datasets/pusht_zarr.py

Lines changed: 24 additions & 17 deletions
@@ -2,9 +2,10 @@
 from pathlib import Path

 import numpy as np
-import torch
+from huggingface_hub import HfApi

-from lerobot.common.datasets.lerobot_dataset import LEROBOT_HOME, LeRobotDataset
+from lerobot.common.constants import HF_LEROBOT_HOME
+from lerobot.common.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset
 from lerobot.common.datasets.push_dataset_to_hub._download_raw import download_raw

 PUSHT_TASK = "Push the T-shaped blue block onto the T-shaped green target surface."
@@ -89,9 +90,9 @@ def calculate_coverage(zarr_data):

     num_frames = len(block_pos)

-    coverage = np.zeros((num_frames,))
+    coverage = np.zeros((num_frames,), dtype=np.float32)
     # 8 keypoints with 2 coords each
-    keypoints = np.zeros((num_frames, 16))
+    keypoints = np.zeros((num_frames, 16), dtype=np.float32)

     # Set x, y, theta (in radians)
     goal_pos_angle = np.array([256, 256, np.pi / 4])
@@ -117,7 +118,7 @@ def calculate_coverage(zarr_data):
         intersection_area = goal_geom.intersection(block_geom).area
         goal_area = goal_geom.area
         coverage[i] = intersection_area / goal_area
-        keypoints[i] = torch.from_numpy(PushTEnv.get_keypoints(block_shapes).flatten())
+        keypoints[i] = PushTEnv.get_keypoints(block_shapes).flatten()

     return coverage, keypoints

@@ -134,8 +135,8 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
     if mode not in ["video", "image", "keypoints"]:
         raise ValueError(mode)

-    if (LEROBOT_HOME / repo_id).exists():
-        shutil.rmtree(LEROBOT_HOME / repo_id)
+    if (HF_LEROBOT_HOME / repo_id).exists():
+        shutil.rmtree(HF_LEROBOT_HOME / repo_id)

     if not raw_dir.exists():
         download_raw(raw_dir, repo_id="lerobot-raw/pusht_raw")
@@ -148,6 +149,10 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
     action = zarr_data["action"][:]
     image = zarr_data["img"]  # (b, h, w, c)

+    if image.dtype == np.float32 and image.max() == np.float32(255):
+        # HACK: images are loaded as float32 but they actually encode uint8 data
+        image = image.astype(np.uint8)
+
     episode_data_index = {
         "from": np.concatenate(([0], zarr_data.meta["episode_ends"][:-1])),
         "to": zarr_data.meta["episode_ends"],
@@ -175,28 +180,30 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T

         for frame_idx in range(num_frames):
             i = from_idx + frame_idx
+            idx = i + (frame_idx < num_frames - 1)
             frame = {
-                "action": torch.from_numpy(action[i]),
+                "action": action[i],
                 # Shift reward and success by +1 until the last item of the episode
-                "next.reward": reward[i + (frame_idx < num_frames - 1)],
-                "next.success": success[i + (frame_idx < num_frames - 1)],
+                "next.reward": reward[idx : idx + 1],
+                "next.success": success[idx : idx + 1],
+                "task": PUSHT_TASK,
             }

-            frame["observation.state"] = torch.from_numpy(agent_pos[i])
+            frame["observation.state"] = agent_pos[i]

             if mode == "keypoints":
-                frame["observation.environment_state"] = torch.from_numpy(keypoints[i])
+                frame["observation.environment_state"] = keypoints[i]
             else:
-                frame["observation.image"] = torch.from_numpy(image[i])
+                frame["observation.image"] = image[i]

             dataset.add_frame(frame)

-        dataset.save_episode(task=PUSHT_TASK)
-
-    dataset.consolidate()
+        dataset.save_episode()

     if push_to_hub:
         dataset.push_to_hub()
+        hub_api = HfApi()
+        hub_api.create_tag(repo_id, tag=CODEBASE_VERSION, repo_type="dataset")


 if __name__ == "__main__":
@@ -218,5 +225,5 @@ def main(raw_dir: Path, repo_id: str, mode: str = "video", push_to_hub: bool = T
         main(raw_dir, repo_id=repo_id, mode=mode)

         # Uncomment if you want to load the local dataset and explore it
-        # dataset = LeRobotDataset(repo_id=repo_id, local_files_only=True)
+        # dataset = LeRobotDataset(repo_id=repo_id)
         # breakpoint()
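
Reviewer note on the frame loop in this file: the new `idx = i + (frame_idx < num_frames - 1)` line relies on Python treating `True` as 1 and `False` as 0. The sketch below isolates that shift using plain lists instead of the script's NumPy arrays; the `shifted_indices` helper and the sample reward values are hypothetical and exist only for illustration.

```python
def shifted_indices(from_idx: int, num_frames: int) -> list[int]:
    """Return, for each frame of one episode, the index read for `next.*` values.

    Mirrors `idx = i + (frame_idx < num_frames - 1)` from the diff: every frame
    reads the value one step ahead, except the last frame, which reads its own.
    """
    indices = []
    for frame_idx in range(num_frames):
        i = from_idx + frame_idx
        # The comparison is a bool; adding it to an int adds 1 (True) or 0 (False).
        idx = i + (frame_idx < num_frames - 1)
        indices.append(idx)
    return indices


# Hypothetical rewards for a single 4-frame episode starting at global index 0.
reward = [0.0, 0.1, 0.5, 1.0]
indices = shifted_indices(from_idx=0, num_frames=4)
# Slicing (idx : idx + 1) keeps a length-1 container instead of a bare scalar.
next_reward = [reward[idx : idx + 1] for idx in indices]
print(indices)
print(next_reward)
```

With a 1-D NumPy array the same slice yields an array of shape `(1,)` rather than a scalar, which is presumably why the diff replaces scalar indexing with slicing for `next.reward` and `next.success`.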

lerobot/common/constants.py

Lines changed: 15 additions & 0 deletions
@@ -1,4 +1,9 @@
 # keys
+import os
+from pathlib import Path
+
+from huggingface_hub.constants import HF_HOME
+
 OBS_ENV = "observation.environment_state"
 OBS_ROBOT = "observation.state"
 OBS_IMAGE = "observation.image"
@@ -15,3 +20,13 @@
 OPTIMIZER_STATE = "optimizer_state.safetensors"
 OPTIMIZER_PARAM_GROUPS = "optimizer_param_groups.json"
 SCHEDULER_STATE = "scheduler_state.json"
+
+# cache dir
+default_cache_path = Path(HF_HOME) / "lerobot"
+HF_LEROBOT_HOME = Path(os.getenv("HF_LEROBOT_HOME", default_cache_path)).expanduser()
+
+if "LEROBOT_HOME" in os.environ:
+    raise ValueError(
+        f"You have a 'LEROBOT_HOME' environment variable set to '{os.getenv('LEROBOT_HOME')}'.\n"
+        "'LEROBOT_HOME' is deprecated, please use 'HF_LEROBOT_HOME' instead."
+    )
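
Reviewer note on the cache-directory logic added above: the resolution order is easier to see as a pure function. This is a minimal sketch, not the module itself. `resolve_lerobot_home` is a hypothetical name, and `env` and `hf_home` are parameters standing in for `os.environ` and `huggingface_hub`'s `HF_HOME` so that the example runs without that dependency. It also performs the deprecation check first, whereas the diff performs it after computing the path.

```python
from pathlib import Path


def resolve_lerobot_home(env: dict[str, str], hf_home: str) -> Path:
    """Resolve the dataset cache directory the way the added constants.py lines do.

    `env` stands in for os.environ and `hf_home` for huggingface_hub's HF_HOME;
    both are parameters here only so the sketch has no side effects.
    """
    if "LEROBOT_HOME" in env:
        # The old variable is rejected outright rather than silently honoured.
        raise ValueError(
            f"'LEROBOT_HOME' is set to '{env['LEROBOT_HOME']}' but is deprecated; "
            "use 'HF_LEROBOT_HOME' instead."
        )
    default_cache_path = Path(hf_home) / "lerobot"
    # An explicit HF_LEROBOT_HOME wins; otherwise fall back to <HF_HOME>/lerobot.
    return Path(env.get("HF_LEROBOT_HOME", default_cache_path)).expanduser()


print(resolve_lerobot_home({}, "~/.cache/huggingface"))
print(resolve_lerobot_home({"HF_LEROBOT_HOME": "/data/lerobot"}, "~/.cache/huggingface"))
```

Raising instead of warning means a stale `LEROBOT_HOME` fails at import time, so datasets cannot quietly end up in a different directory than the user expects.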
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+import packaging.version
+
+V2_MESSAGE = """
+The dataset you requested ({repo_id}) is in {version} format.
+
+We introduced a new format since v2.0 which is not backward compatible with v1.x.
+Please, use our conversion script. Modify the following command with your own task description:
+```
+python lerobot/common/datasets/v2/convert_dataset_v1_to_v2.py \\
+    --repo-id {repo_id} \\
+    --single-task "TASK DESCRIPTION."  # <---- /!\\ Replace TASK DESCRIPTION /!\\
+```
+
+A few examples to replace TASK DESCRIPTION: "Pick up the blue cube and place it into the bin.", "Insert the
+peg into the socket.", "Slide open the ziploc bag.", "Take the elevator to the 1st floor.", "Open the top
+cabinet, store the pot inside it then close the cabinet.", "Push the T-shaped block onto the T-shaped
+target.", "Grab the spray paint on the shelf and place it in the bin on top of the robot dog.", "Fold the
+sweatshirt.", ...
+
+If you encounter a problem, contact LeRobot maintainers on [Discord](https://discord.com/invite/s3KuuzsPFb)
+or open an [issue on GitHub](https://github.com/huggingface/lerobot/issues/new/choose).
+"""
+
+V21_MESSAGE = """
+The dataset you requested ({repo_id}) is in {version} format.
+While current version of LeRobot is backward-compatible with it, the version of your dataset still uses global
+stats instead of per-episode stats. Update your dataset stats to the new format using this command:
+```
+python lerobot/common/datasets/v21/convert_dataset_v20_to_v21.py --repo-id={repo_id}
+```
+
+If you encounter a problem, contact LeRobot maintainers on [Discord](https://discord.com/invite/s3KuuzsPFb)
+or open an [issue on GitHub](https://github.com/huggingface/lerobot/issues/new/choose).
+"""
+
+FUTURE_MESSAGE = """
+The dataset you requested ({repo_id}) is only available in {version} format.
+As we cannot ensure forward compatibility with it, please update your current version of lerobot.
+"""
+
+
+class CompatibilityError(Exception): ...
+
+
+class BackwardCompatibilityError(CompatibilityError):
+    def __init__(self, repo_id: str, version: packaging.version.Version):
+        message = V2_MESSAGE.format(repo_id=repo_id, version=version)
+        super().__init__(message)
+
+
+class ForwardCompatibilityError(CompatibilityError):
+    def __init__(self, repo_id: str, version: packaging.version.Version):
+        message = FUTURE_MESSAGE.format(repo_id=repo_id, version=version)
+        super().__init__(message)
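
Reviewer note: this file only defines the messages and exception types; the code that decides which one applies is not visible in this part of the diff. The sketch below shows one way the three cases could be dispatched. Everything in it is an assumption made for illustration: the `check_version` and `parse_version` helpers, the `"v2.1"` default, the rule that only a newer major version counts as forward-incompatible, and the use of plain tuples in place of `packaging.version` to keep it stdlib-only.

```python
class CompatibilityError(Exception): ...


class BackwardCompatibilityError(CompatibilityError): ...


class ForwardCompatibilityError(CompatibilityError): ...


def parse_version(text: str) -> tuple[int, int]:
    """Parse a 'v2.1'-style tag into (major, minor). Hypothetical helper."""
    major, minor = text.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def check_version(dataset_version: str, codebase_version: str = "v2.1") -> str:
    """Classify a dataset version against the codebase version (assumed policy)."""
    dataset = parse_version(dataset_version)
    codebase = parse_version(codebase_version)
    if dataset[0] > codebase[0]:
        # Corresponds to FUTURE_MESSAGE: the dataset is newer than this codebase.
        raise ForwardCompatibilityError(dataset_version)
    if dataset[0] < codebase[0]:
        # Corresponds to V2_MESSAGE: v1.x datasets must be converted first.
        raise BackwardCompatibilityError(dataset_version)
    if dataset < codebase:
        # Corresponds to V21_MESSAGE: loadable, but the stats format is outdated.
        return "outdated-minor"
    return "ok"


print(check_version("v2.1"))
print(check_version("v2.0"))
```

Because both concrete errors derive from `CompatibilityError`, a caller can catch the base class to handle any version mismatch in one place.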
