
[Bug Report] SB3 train.py does not save VecNormalize stats; play.py starts with new stats #2635

Open
@JonasFano

Description


Describe the bug

Isaac Lab's built-in train.py and play.py scripts for Stable-Baselines3 (SB3) handle observation and reward normalization with VecNormalize (from stable_baselines3.common.vec_env import VecNormalize) incorrectly: the normalization statistics learned during training never carry over to evaluation.

In train.py, VecNormalize is applied correctly to the environment to normalize observations and rewards during training. However, the trained normalization statistics are not saved at the end of training.

In play.py, the environment is wrapped in a fresh VecNormalize instance that starts collecting statistics from scratch; since it is created with training=True, it keeps updating them during evaluation. This mismatch between the training and evaluation normalization distributions degrades policy performance.

Steps to reproduce

You can spot the issue directly in the code:

  • In train.py, the environment is wrapped in VecNormalize(...), but the resulting statistics are never saved (there is no env.save(...) call).
  • In play.py, a new VecNormalize instance is created without loading any saved stats (no VecNormalize.load(...) call), and training remains enabled (training=True), so the stats are re-learned from scratch.
  • This breaks the assumption that play.py evaluates the trained policy under the same distribution as in training.
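The effect of the missing stats can be illustrated without Isaac Lab or SB3: VecNormalize keeps a running mean/variance of observations, and a freshly created wrapper starts from neutral statistics. Below is a minimal NumPy sketch of the same running-statistics logic (a stand-in for SB3's RunningMeanStd, not SB3's actual class):

```python
import numpy as np


class RunningStats:
    """Minimal stand-in for VecNormalize's running observation statistics."""

    def __init__(self, shape):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 1e-4

    def update(self, batch):
        # Batched Welford-style update, as in SB3's RunningMeanStd
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count
        m2 += delta**2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)


rng = np.random.default_rng(0)
trained = RunningStats(shape=(3,))
for _ in range(100):
    trained.update(rng.normal(loc=5.0, scale=2.0, size=(64, 3)))

fresh = RunningStats(shape=(3,))  # what play.py effectively uses today
obs = np.full(3, 5.0)
print("trained stats:", trained.normalize(obs))  # ~[0, 0, 0] - what the policy saw
print("fresh stats:  ", fresh.normalize(obs))    # ~[5, 5, 5] - what play.py feeds it
```

A policy trained on inputs normalized to roughly zero mean and unit variance is instead fed raw-scale values when the stats start over, which is exactly the degradation observed in play.py.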

Optional reproduction steps (these work with any environment and SB3 agent config):

  1. Enable normalize_input (and normalize_value) in the SB3 config (e.g., path/to/IsaacLab/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/sb3_ppo_cfg.yaml) so that VecNormalize is applied in training.
  2. Run the built-in train.py script (from path/to/IsaacLab/scripts/reinforcement_learning/sb3) to train a policy using VecNormalize.
  3. Note that VecNormalize statistics are not saved after training.
  4. Run play.py (from path/to/IsaacLab/scripts/reinforcement_learning/sb3) to evaluate the trained policy.
  5. Observe that the normalization stats are uninitialized and begin re-learning from scratch, resulting in poor policy performance.

System Info

  • Commit: 9f1aa4c
  • Isaac Sim Version: 4.5.0
  • OS: Ubuntu 22.04.5 LTS
  • GPU: NVIDIA RTX 3500 Ada Generation Laptop GPU (12 GB)
  • CUDA: 12.1
  • GPU Driver: 535.230.02

Additional context

A simple fix would be to:

Save VecNormalize stats at the end of training in train.py (the existing wrapping code begins at line 140):

if "normalize_input" in agent_cfg or "normalize_value" in agent_cfg:
    env = VecNormalize(
        env,
        training=True,
        norm_obs="normalize_input" in agent_cfg and agent_cfg.pop("normalize_input"),
        norm_reward="normalize_value" in agent_cfg and agent_cfg.pop("normalize_value"),
        clip_obs="clip_obs" in agent_cfg and agent_cfg.pop("clip_obs"),
        gamma=agent_cfg["gamma"],
        clip_reward=np.inf,
    )

# create agent from stable baselines
agent = PPO(policy_arch, env, verbose=1, **agent_cfg)
# configure the logger
new_logger = configure(log_dir, ["stdout", "tensorboard"])
agent.set_logger(new_logger)

# callbacks for agent
checkpoint_callback = CheckpointCallback(save_freq=1000, save_path=log_dir, name_prefix="model", verbose=2)
# train the agent
agent.learn(total_timesteps=n_timesteps, callback=checkpoint_callback)
# save the final model
agent.save(os.path.join(log_dir, "model"))

# Save VecNormalize stats
if isinstance(env, VecNormalize):
    # save next to the final model, matching the path play.py loads from
    vecnorm_path = os.path.join(log_dir, "vecnormalize.pkl")
    env.save(vecnorm_path)

# close the simulator
env.close()

Load the stats in play.py

Replace the current VecNormalize initialization with the snippet below (lines 128 to 138 in play.py). You may add an argparse argument for the path to the saved VecNormalize stats (or reuse the existing --checkpoint or --use_last_checkpoint argument):

# Load VecNormalize stats
if "normalize_input" in agent_cfg or "normalize_value" in agent_cfg:
    vecnorm_path = os.path.join(log_dir, "vecnormalize.pkl")  # or take the path from argparse instead
    if os.path.exists(vecnorm_path):
        print(f"[INFO] Loading VecNormalize stats from: {vecnorm_path}")
        env = VecNormalize.load(vecnorm_path, env)
        env.training = False
        env.norm_reward = False  # rewards should not be normalized at evaluation time
    else:
        print("[WARNING] VecNormalize stats file not found, falling back to unnormalized environment.")
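If a dedicated flag is preferred over reusing --checkpoint, it could look like the sketch below. The --vecnorm_path flag name is a suggestion, not an existing play.py option:

```python
import argparse
import os

parser = argparse.ArgumentParser()
# Hypothetical flag; when omitted, fall back to the training log directory
parser.add_argument(
    "--vecnorm_path",
    type=str,
    default=None,
    help="Path to saved VecNormalize stats (defaults to <log_dir>/vecnormalize.pkl).",
)
args = parser.parse_args(["--vecnorm_path", "/tmp/logs/vecnormalize.pkl"])

log_dir = "/tmp/logs"  # placeholder for the directory play.py resolves
vecnorm_path = args.vecnorm_path or os.path.join(log_dir, "vecnormalize.pkl")
print(vecnorm_path)  # /tmp/logs/vecnormalize.pkl
```

Defaulting to None and resolving against log_dir keeps the current behavior for users who never pass the flag.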

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have checked that the issue is not in running Isaac Sim itself and is related to the repo

Acceptance Criteria

  • train.py automatically saves VecNormalize statistics when normalization is enabled
  • play.py loads the saved statistics when available, via either a new argparse flag or the existing --checkpoint / --use_last_checkpoint arguments
