## Description

### Describe the bug

Isaac Lab's built-in `train.py` and `play.py` scripts for Stable-Baselines3 (SB3) handle observation and reward normalization using `VecNormalize` (`from stable_baselines3.common.vec_env import VecNormalize`) incorrectly.

In `train.py`, `VecNormalize` is applied correctly to the environment to normalize observations and rewards during training. However, the learned normalization statistics are not saved at the end of training.

In `play.py`, the environment is wrapped with a fresh `VecNormalize` instance, which starts collecting new statistics from scratch, even with `training = True`. This results in mismatched normalization distributions and degraded policy performance during evaluation.
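The mismatch can be illustrated with a toy running normalizer. This is a simplified stand-in for the `RunningMeanStd` statistics that `VecNormalize` maintains internally (illustration only, not the SB3 implementation; the observation values are made up):

```python
# Toy running normalizer, a simplified stand-in for VecNormalize's
# internal RunningMeanStd (illustration only, not the SB3 class).
class RunningNorm:
    def __init__(self):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Welford's online algorithm for the running mean/variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / (var ** 0.5 + 1e-8)

# Stats accumulated over many training observations...
trained = RunningNorm()
for obs in [10.0, 12.0, 9.0, 11.0, 10.5, 9.5]:
    trained.update(obs)

# ...versus a fresh wrapper that has only seen the first eval observation.
fresh = RunningNorm()
fresh.update(10.0)

# The same raw observation maps to different normalized values, so the
# policy sees inputs from a different distribution at evaluation time.
print(trained.normalize(10.0), fresh.normalize(10.0))
```

This is exactly what happens when `play.py` re-creates `VecNormalize` without loading the training statistics.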
### Steps to reproduce

You can spot the issue directly in the code:

- In `train.py`, the environment is wrapped in `VecNormalize(...)`, but the statistics are never saved (e.g., no call to `env.save(...)`).
- In `play.py`, a new `VecNormalize` instance is created, but no saved stats are loaded (e.g., `VecNormalize.load(...)` is missing), and training remains enabled (`training=True`), so the stats are re-learned from scratch.
- This breaks the assumption that `play.py` evaluates the trained policy under the same distribution as in training.
Optional reproduction steps (these work with any environment and SB3):

1. Enable `normalize_input` (and `normalize_value`) in the SB3 config (e.g., `path/to/IsaacLab/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/sb3_ppo_cfg.yaml`) so that `VecNormalize` is applied during training.
2. Run the built-in `train.py` script (from `path/to/IsaacLab/scripts/reinforcement_learning/sb3`) to train a policy using `VecNormalize`.
3. Note that the `VecNormalize` statistics are not saved after training.
4. Run `play.py` (from `path/to/IsaacLab/scripts/reinforcement_learning/sb3`) to evaluate the trained policy.
5. Observe that the normalization stats are uninitialized and begin re-learning from scratch (poor policy performance).
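For reference, the two config keys from step 1 look roughly like this in the agent config (a hypothetical fragment of `sb3_ppo_cfg.yaml`; the exact values are task-specific):

```yaml
# sb3_ppo_cfg.yaml (fragment, values illustrative)
normalize_input: true   # train.py wraps the env in VecNormalize with norm_obs=True
normalize_value: true   # ...and with norm_reward=True
clip_obs: 10.0
gamma: 0.99
```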
## System Info
- Commit: 9f1aa4c
- Isaac Sim Version: 4.5.0
- OS: Ubuntu 22.04.5 LTS
- GPU: NVIDIA RTX 3500 Ada Generation Laptop GPU (12 GB)
- CUDA: 12.1
- GPU Driver: 535.230.02
## Additional context

A simple fix would be to:

**Save the `VecNormalize` stats at the end of training in `train.py`** (beginning with line 140):
```python
if "normalize_input" in agent_cfg or "normalize_value" in agent_cfg:
    env = VecNormalize(
        env,
        training=True,
        norm_obs="normalize_input" in agent_cfg and agent_cfg.pop("normalize_input"),
        norm_reward="normalize_value" in agent_cfg and agent_cfg.pop("normalize_value"),
        clip_obs="clip_obs" in agent_cfg and agent_cfg.pop("clip_obs"),
        gamma=agent_cfg["gamma"],
        clip_reward=np.inf,
    )
# create agent from stable baselines
agent = PPO(policy_arch, env, verbose=1, **agent_cfg)
# configure the logger
new_logger = configure(log_dir, ["stdout", "tensorboard"])
agent.set_logger(new_logger)
# callbacks for agent
checkpoint_callback = CheckpointCallback(save_freq=1000, save_path=log_dir, name_prefix="model", verbose=2)
# train the agent
agent.learn(total_timesteps=n_timesteps, callback=checkpoint_callback)
# save the final model
agent.save(os.path.join(log_dir, "model"))
# save the VecNormalize stats alongside the model
# (note: use log_dir directly, not os.path.join(log_dir, "model", ...),
# since "model" is the model file prefix, not a directory; this also
# matches the load path used in play.py)
if isinstance(env, VecNormalize):
    vecnorm_path = os.path.join(log_dir, "vecnormalize.pkl")
    env.save(vecnorm_path)
# close the simulator
env.close()
```
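Under the hood, `VecNormalize.save` / `VecNormalize.load` are essentially a pickle round-trip of the wrapper and its running statistics. The intended effect of the fix can be sketched with only the standard library (the `stats` dict and `normalize` helper below are illustrative stand-ins, not SB3 internals):

```python
import math
import pickle
import tempfile

# Toy stand-in for the statistics that VecNormalize carries.
stats = {"count": 6, "mean": 10.3333, "var": 1.0556, "clip_obs": 10.0}

def normalize(x, s):
    # Same raw observation -> same normalized value, as long as the stats match.
    z = (x - s["mean"]) / math.sqrt(s["var"] + 1e-8)
    return max(-s["clip_obs"], min(s["clip_obs"], z))

# train.py side: persist the stats next to the model checkpoint.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(stats, f)
    path = f.name

# play.py side: restore the stats instead of starting from scratch.
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored wrapper normalizes identically to the training-time one.
assert normalize(12.0, restored) == normalize(12.0, stats)
```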
**Load the stats in `play.py`**

Replace the current `VecNormalize` initialization (lines 128 to 138 in `play.py`) with the snippet below. You may add an `argparse` argument for the path to the saved `VecNormalize` stats (or reuse the existing `--checkpoint` or `--use_last_checkpoint` argument):
```python
# load the VecNormalize stats
if "normalize_input" in agent_cfg or "normalize_value" in agent_cfg:
    vecnorm_path = os.path.join(log_dir, "vecnormalize.pkl")  # or use argparse instead
    if os.path.exists(vecnorm_path):
        print(f"[INFO] Loading VecNormalize stats from: {vecnorm_path}")
        env = VecNormalize.load(vecnorm_path, env)
        # freeze the stats; reward normalization is not needed at test time
        env.training = False
        env.norm_reward = False
    else:
        print("[WARNING] VecNormalize stats file not found, falling back to unnormalized environment.")
```
## Checklist
- I have checked that there is no similar issue in the repo (required)
- I have checked that the issue is not in running Isaac Sim itself and is related to the repo
## Acceptance Criteria

- `train.py` automatically saves `VecNormalize` statistics when normalization is enabled.
- `play.py` loads the saved statistics if available, using either a new argparse flag or reusing `--checkpoint` or `--use_last_checkpoint`.