[Frostbite] env doesn't return done=True on death, but goes into "Demo Play" mode #1539

Closed
artofbeinghuman opened this issue Jun 18, 2019 · 2 comments

artofbeinghuman commented Jun 18, 2019

Hello,
I have stumbled upon a peculiar thing with the FrostbiteNoFrameskip-v4 environment.
Consider the following snippet, which runs the env indefinitely and passes action 0 at every step. According to env.get_action_meanings(), action 0 is NOOP, so the agent does nothing.

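import gym

env = gym.make("FrostbiteNoFrameskip-v4")
env.reset()
while True:
    # Action 0 is NOOP according to env.get_action_meanings()
    _, _, done, _ = env.step(0)
    env.render()
    if done:  # never becomes True when the agent freezes to death
        break
env.close()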

As expected, the agent stands around doing nothing until he freezes to death, at which point one life is deducted. This repeats until he runs out of lives. At that point I would expect the final env.step(0) to return done=True so that I can break out of the loop. However, this does not happen. Instead, the environment enters a mode that I can only describe as "Demo Play", as if it were showcasing the game in a video. I will add a screenshot of this. The agent stays in this "Demo Mode" indefinitely, dying repeatedly without losing lives and without gaining any points.
[Screenshot from 2019-06-18 18-15-29]

If we change the toy example above so that the agent moves down at every step (env.step(5)), then upon dying the environment returns done=True and the script exits the while loop successfully.

What is going on?

Best,
Marvin

artofbeinghuman (Author) commented

The problem seems to be that, when the agent freezes to death, his last life is never deducted: info stays at {'ale.lives': 1}. Since info never becomes {'ale.lives': 0}, the environment never returns done = True. However, if the agent dies by drowning (if you supply the down action at every step), then upon dying on his last life, ale.lives is set to 0 and done = True is returned.
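To check this kind of observation, it can help to log every change in info['ale.lives'] while stepping with a fixed action. The following is only a sketch: trace_lives and StubEnv are hypothetical names, not part of gym, and StubEnv merely imitates the behaviour described above (lives stuck at 1) so that the snippet runs without Atari ROMs. It assumes the 4-tuple step API used in this issue.

```python
from typing import List, Optional, Tuple


class StubEnv:
    """Hypothetical stand-in for a gym env (NOT part of gym).

    Lives drop by one every `steps_per_life` steps but never below `floor`.
    done is reported only once lives reach 0, imitating the reported behaviour.
    """

    def __init__(self, lives: int = 4, steps_per_life: int = 10, floor: int = 1):
        self._initial_lives = lives
        self.steps_per_life = steps_per_life
        self.floor = floor
        self.lives = lives
        self.t = 0

    def reset(self):
        self.lives = self._initial_lives
        self.t = 0
        return None

    def step(self, action: int):
        self.t += 1
        if self.t % self.steps_per_life == 0 and self.lives > self.floor:
            self.lives -= 1
        done = self.lives == 0
        return None, 0.0, done, {"ale.lives": self.lives}


def trace_lives(env, action: int, max_steps: int) -> Tuple[List[Tuple[int, Optional[int]]], bool]:
    """Step with a fixed action and record (step, lives) whenever lives changes.

    Stops at the first done=True or after max_steps, whichever comes first,
    and returns the recorded changes together with the last done flag.
    """
    env.reset()
    changes: List[Tuple[int, Optional[int]]] = []
    last = None
    done = False
    for step in range(1, max_steps + 1):
        _, _, done, info = env.step(action)
        lives = info.get("ale.lives")
        if lives != last:
            changes.append((step, lives))
            last = lives
        if done:
            break
    return changes, done


if __name__ == "__main__":
    # floor=1 imitates "freezing": the last life is never deducted.
    print(trace_lives(StubEnv(floor=1), action=0, max_steps=100))
    # floor=0 imitates "drowning": lives reach 0 and done becomes True.
    print(trace_lives(StubEnv(floor=0), action=0, max_steps=100))
```

With the real environment, one would pass the env from gym.make("FrostbiteNoFrameskip-v4") instead of StubEnv and compare the traces for action 0 and action 5.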

Can anybody with a bit more experience say whether this has to be fixed in gym, or whether it is actually a problem in the underlying ALE?

Thanks,
Marvin

christopherhesse (Contributor) commented Jul 12, 2019

It looks like gym just calls game_over, which calls isTerminal on the environment (https://github.com/mgbellemare/Arcade-Learning-Environment/blob/f7fff8733c8cc0f54d749ddeaf29bd7f478d6f0f/src/games/supported/Frostbite.cpp#L61). This certainly looks like a bug, just not a bug in gym. Could you please file it on the ALE repo? https://github.com/mgbellemare/Arcade-Learning-Environment/issues
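Since done comes from the emulator's terminal check, a user-side guard is one possible way to keep a loop from running forever in the meantime. The sketch below is an assumption, not a fix from gym or ALE: StepCap and NeverDoneEnv are hypothetical names, the "step_cap_reached" info key is invented for illustration, and the wrapper assumes the 4-tuple step API used in this issue. gym's own gym.wrappers.TimeLimit serves a similar purpose.

```python
class NeverDoneEnv:
    """Hypothetical stub (NOT part of gym) whose step() never reports done."""

    def reset(self):
        return None

    def step(self, action):
        return None, 0.0, False, {"ale.lives": 1}


class StepCap:
    """Hypothetical wrapper: forces done=True once max_steps steps have been taken."""

    def __init__(self, env, max_steps: int):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        self.steps += 1
        if not done and self.steps >= self.max_steps:
            done = True
            # Copy info so the wrapped env's dict is not mutated;
            # the key name is made up for this sketch.
            info = dict(info, step_cap_reached=True)
        return ob, reward, done, info


if __name__ == "__main__":
    env = StepCap(NeverDoneEnv(), max_steps=1000)
    env.reset()
    steps = 0
    while True:
        _, _, done, info = env.step(0)
        steps += 1
        if done:
            break
    print(steps, info)
```

The cap only ends the loop; it does not tell a genuine game over apart from the stuck state, so the extra info key is there to let the caller distinguish the two.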
