## Description

### Describe the bug

Isaac Lab's built-in `train.py` and `play.py` scripts for Stable-Baselines3 (SB3) handle observation and reward normalization using `VecNormalize` (`from stable_baselines3.common.vec_env import VecNormalize`) incorrectly.

In `train.py`, `VecNormalize` is applied correctly to the environment to normalize observations and rewards during training. However, the learned normalization statistics are not saved at the end of training.

In `play.py`, the environment is wrapped with a fresh `VecNormalize` instance, which starts collecting new statistics from scratch, even with `training = True`. This results in mismatched normalization distributions and degraded policy performance during evaluation.
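The mismatch can be illustrated with a toy running normalizer. This is a simplified stand-in for the `RunningMeanStd` statistics that `VecNormalize` maintains internally (illustration only, not the SB3 implementation; the observation values are made up):

```python
# Toy running normalizer, a simplified stand-in for VecNormalize's
# internal RunningMeanStd (illustration only, not the SB3 class).
class RunningNorm:
    def __init__(self):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Welford's online algorithm for the running mean/variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / (var ** 0.5 + 1e-8)

# Stats accumulated over many training observations...
trained = RunningNorm()
for obs in [10.0, 12.0, 9.0, 11.0, 10.5, 9.5]:
    trained.update(obs)

# ...versus a fresh wrapper that has only seen the first eval observation.
fresh = RunningNorm()
fresh.update(10.0)

# The same raw observation maps to different normalized values, so the
# policy sees inputs from a different distribution at evaluation time.
print(trained.normalize(10.0), fresh.normalize(10.0))
```

This is exactly what happens when `play.py` re-creates `VecNormalize` without loading the training statistics.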
### Steps to reproduce

You can spot the issue directly in the code:

- In `train.py`, the environment is wrapped in `VecNormalize(...)`, but the statistics are never saved (e.g., no call to `env.save(...)`).
- In `play.py`, a new `VecNormalize` instance is created, but no saved stats are loaded (e.g., `VecNormalize.load(...)` is missing), and training remains enabled (`training=True`), so the stats are re-learned from scratch.
- This breaks the assumption that `play.py` evaluates the trained policy under the same distribution as in training.
Optional reproduction steps (these work with any environment and SB3):

1. Enable `normalize_input` (and `normalize_value`) in the SB3 config (e.g., `path/to/IsaacLab/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/sb3_ppo_cfg.yaml`) so that `VecNormalize` is applied during training.
2. Run the built-in `train.py` script (from `path/to/IsaacLab/scripts/reinforcement_learning/sb3`) to train a policy using `VecNormalize`.
3. Note that the `VecNormalize` statistics are not saved after training.
4. Run `play.py` (from `path/to/IsaacLab/scripts/reinforcement_learning/sb3`) to evaluate the trained policy.
5. Observe that the normalization stats are uninitialized and begin re-learning from scratch (poor policy performance).
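For reference, the two config keys from step 1 look roughly like this in the agent config (a hypothetical fragment of `sb3_ppo_cfg.yaml`; the exact values are task-specific):

```yaml
# sb3_ppo_cfg.yaml (fragment, values illustrative)
normalize_input: true   # train.py wraps the env in VecNormalize with norm_obs=True
normalize_value: true   # ...and with norm_reward=True
clip_obs: 10.0
gamma: 0.99
```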
## System Info
- Commit: 9f1aa4c
- Isaac Sim Version: 4.5.0
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA RTX 3500 Ada Generation Laptop GPU (12 GB)
- CUDA: 12.1
- GPU Driver: 535.230.02
## Additional context

A simple fix would be to:

**Save the `VecNormalize` stats at the end of training in `train.py`** (beginning with line 140):
```python
if "normalize_input" in agent_cfg or "normalize_value" in agent_cfg:
    env = VecNormalize(
        env,
        training=True,
        norm_obs="normalize_input" in agent_cfg and agent_cfg.pop("normalize_input"),
        norm_reward="normalize_value" in agent_cfg and agent_cfg.pop("normalize_value"),
        clip_obs="clip_obs" in agent_cfg and agent_cfg.pop("clip_obs"),
        gamma=agent_cfg["gamma"],
        clip_reward=np.inf,
    )
# create agent from stable baselines
agent = PPO(policy_arch, env, verbose=1, **agent_cfg)
# configure the logger
new_logger = configure(log_dir, ["stdout", "tensorboard"])
agent.set_logger(new_logger)
# callbacks for agent
checkpoint_callback = CheckpointCallback(save_freq=1000, save_path=log_dir, name_prefix="model", verbose=2)
# train the agent
agent.learn(total_timesteps=n_timesteps, callback=checkpoint_callback)
# save the final model
agent.save(os.path.join(log_dir, "model"))
# save the VecNormalize stats alongside the model
# (note: use log_dir directly, not os.path.join(log_dir, "model", ...),
# since "model" is the model file prefix, not a directory; this also
# matches the load path used in play.py)
if isinstance(env, VecNormalize):
    vecnorm_path = os.path.join(log_dir, "vecnormalize.pkl")
    env.save(vecnorm_path)
# close the simulator
env.close()
```
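Under the hood, `VecNormalize.save` / `VecNormalize.load` are essentially a pickle round-trip of the wrapper and its running statistics. The intended effect of the fix can be sketched with only the standard library (the `stats` dict and `normalize` helper below are illustrative stand-ins, not SB3 internals):

```python
import math
import pickle
import tempfile

# Toy stand-in for the statistics that VecNormalize carries.
stats = {"count": 6, "mean": 10.3333, "var": 1.0556, "clip_obs": 10.0}

def normalize(x, s):
    # Same raw observation -> same normalized value, as long as the stats match.
    z = (x - s["mean"]) / math.sqrt(s["var"] + 1e-8)
    return max(-s["clip_obs"], min(s["clip_obs"], z))

# train.py side: persist the stats next to the model checkpoint.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(stats, f)
    path = f.name

# play.py side: restore the stats instead of starting from scratch.
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored wrapper normalizes identically to the training-time one.
assert normalize(12.0, restored) == normalize(12.0, stats)
```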
**Load the stats in `play.py`**

Replace the current `VecNormalize` initialization (lines 128 to 138 in `play.py`) with the snippet below. You may add an `argparse` argument for the path to the saved `VecNormalize` stats (or reuse the existing `--checkpoint` or `--use_last_checkpoint` argument):
```python
# load the VecNormalize stats
if "normalize_input" in agent_cfg or "normalize_value" in agent_cfg:
    vecnorm_path = os.path.join(log_dir, "vecnormalize.pkl")  # or use argparse instead
    if os.path.exists(vecnorm_path):
        print(f"[INFO] Loading VecNormalize stats from: {vecnorm_path}")
        env = VecNormalize.load(vecnorm_path, env)
        # freeze the stats; reward normalization is not needed at test time
        env.training = False
        env.norm_reward = False
    else:
        print("[WARNING] VecNormalize stats file not found, falling back to unnormalized environment.")
```
## Checklist
- I have checked that there is no similar issue in the repo (required)
- I have checked that the issue is not in running Isaac Sim itself and is related to the repo
## Acceptance Criteria

- `train.py` automatically saves `VecNormalize` statistics when normalization is enabled.
- `play.py` loads the saved statistics if available, using either a new argparse flag or reusing `--checkpoint` or `--use_last_checkpoint`.