
Risk of NaN loss in Pi0's sample_beta implementation, especially when using bfloat16 #1096

YuhengZhi opened this issue May 11, 2025 · 0 comments
System Info

Not relevant to this issue.

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

In the current implementation of sample_beta in modeling_pi0.py (link):

def sample_beta(alpha, beta, bsize, device):
    # uniform_(0, 1) can return exactly 0, and .pow() can underflow to 0
    gamma1 = torch.empty((bsize,), device=device).uniform_(0, 1).pow(1 / alpha)
    gamma2 = torch.empty((bsize,), device=device).uniform_(0, 1).pow(1 / beta)
    # when both gamma1 and gamma2 are 0, this is 0 / 0 = NaN
    return gamma1 / (gamma1 + gamma2)

When gamma1 and gamma2 are both sampled as 0, the returned value is NaN, which makes the loss NaN as well. This happens very rarely in FP32, but when I attempted to train the Pi0 model fully in bf16, it became much more likely for gamma1 and gamma2 to both come out as 0.

Consequently, my loss became NaN after roughly the first 24k samples.
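
For reference, the failure mode itself is easy to reproduce in isolation (the zero tensors below are hand-constructed, not sampled):

import torch

# When both draws come out as exactly 0, the ratio is 0 / 0 = NaN.
gamma1 = torch.tensor([0.0], dtype=torch.bfloat16)
gamma2 = torch.tensor([0.0], dtype=torch.bfloat16)
print(gamma1 / (gamma1 + gamma2))  # tensor([nan], dtype=torch.bfloat16)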

This is a bug that a user may run into by either:
a) switching Pi0's precision fully to bfloat16 and training for ~24k samples, or
b) training Pi0 in the default mixed precision for a very large number of steps.

A similar issue was brought up in the C++ Stan Math Library. They moved the beta-distribution sampling into log space, which is numerically stable, as is similarly done by the official openpi code (openpi's code, jax.random.beta implementation).
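
A minimal log-space sketch of the same transformation (the name sample_beta_stable and the use of exponential_() to draw -log(U) are my own illustrative choices, not taken from either codebase):

import torch

def sample_beta_stable(alpha, beta, bsize, device):
    # -log(U) for U ~ Uniform(0, 1) is an Exponential(1) variate, so
    # exponential_() yields log-space uniform draws without ever
    # evaluating log(0).
    log_gamma1 = -torch.empty((bsize,), device=device).exponential_() / alpha
    log_gamma2 = -torch.empty((bsize,), device=device).exponential_() / beta
    # gamma1 / (gamma1 + gamma2), computed as
    # exp(log_gamma1 - log(gamma1 + gamma2)); logaddexp keeps the
    # denominator well-defined even when both gammas would underflow
    # to 0 in linear space.
    return torch.exp(log_gamma1 - torch.logaddexp(log_gamma1, log_gamma2))

Keeping these draws in FP32 and casting the result afterwards should keep the sampler stable even if the rest of the model runs in bf16.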

Expected behavior

sample_beta should never return NaN.
