Make ROM use more robust #193

Open
nczempin opened this issue Mar 26, 2017 · 2 comments
@nczempin
Contributor

Building on the ideas in issue #192, my primary goal is to make it easier for myself and others to use ROMs beyond those that have their own classes compiled in — hopefully in a purely data-driven way, so that no compilation is necessary.

The process of mapping a ROM name from the command line to the correct code is not very robust: if you don't use exactly the expected name, you can run into a segmentation fault in loadROM(). Since the ROMs already seem to be recognized (their names are printed on the command line), it should be possible in many if not most cases to work from that information and more or less ignore the filename.

For most ROMs, all that is really needed is to specify "where do I find the score" and "what value do I compare to what to determine the end of the game". Since those can be expressed as plain numbers, there is no need for a separate subclass each time. The common cases ("use BCD from these n locations", where n is usually 1 to 3, and "the game ends when the content of this memory address reaches m", where m is usually 0 or 0xFF; there will be others) can be enabled by default, and common variations can be handled by one more parameter (e.g. "use plain integer representation instead of BCD") that routes into a different part of the generic code.
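As a sketch of what such a data-driven description could look like (the config shape, field names, and RAM addresses below are all hypothetical, not part of ALE), here is a generic score/terminal spec together with the packed-BCD decoding it implies:

```python
# Hypothetical data-driven ROM description; none of these field names
# exist in ALE today -- this only illustrates the idea.
EXAMPLE_ROM_CONFIG = {
    "score": {
        "addresses": [0x81, 0x82],  # most significant byte first (made-up addresses)
        "encoding": "bcd",          # the common case; "int" would be a variation
    },
    "terminal": {
        "address": 0xBA,            # made-up "lives" address
        "value": 0x00,              # game over when RAM[address] == value
    },
}

def decode_bcd(ram_bytes):
    """Decode a packed-BCD score: each byte holds two decimal digits."""
    score = 0
    for b in ram_bytes:
        score = score * 100 + (b >> 4) * 10 + (b & 0x0F)
    return score

def is_terminal(ram, config):
    """Check the generic 'RAM[address] goes to value' terminal condition."""
    t = config["terminal"]
    return ram[t["address"]] == t["value"]
```

With this, bytes [0x01, 0x23] decode to a score of 123, matching the "score 23 shows up as 0x23" observation below.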

For figuring out the scores and lives, the approach I described in issue #192 has its advantages, but it is not trivial to implement. In the meantime, we can use ALE itself to help us semi-automatically find which addresses contain these values, by filtering the memory contents for addresses that mostly increase (scores) or mostly decrease (lives). A user can let an agent play, or play manually (there is a minor complication in that we don't return to Python until we re-disable manual control, but it's manageable), watch some memory dumps, and fairly quickly find e.g. the memory address that goes to 0x23 when the score reaches 23; usually one right next to it will go from 0 to 1 when the score passes 100, and so on.
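That filtering step might be sketched as follows, assuming we can collect periodic RAM dumps (e.g. via ALE's getRAM()); the function name and thresholds here are hypothetical:

```python
def find_monotonic_addresses(ram_dumps, decreasing=False, tolerance=0.9):
    """Return RAM addresses whose value mostly increases (or decreases)
    across a sequence of RAM dumps: score candidates mostly increase,
    lives candidates mostly decrease.

    ram_dumps: list of equal-length byte sequences sampled over time.
    """
    steps = len(ram_dumps) - 1
    candidates = []
    for addr in range(len(ram_dumps[0])):
        values = [dump[addr] for dump in ram_dumps]
        if len(set(values)) == 1:
            continue  # never changed during play: not interesting
        if decreasing:
            good = sum(values[t + 1] <= values[t] for t in range(steps))
        else:
            good = sum(values[t + 1] >= values[t] for t in range(steps))
        if good / steps >= tolerance:
            candidates.append(addr)
    return candidates
```

Running several episodes and intersecting the candidate sets would narrow things down further; the tolerance allows for resets and occasional mid-game drops.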

I have a bunch of other ideas; no need to describe them all in detail here. Suffice it to say that I'm working on making these changes myself, except for those where someone says they're already 90 % done :-)

@nczempin
Contributor Author

nczempin commented Mar 26, 2017

The most important part of reverse-engineering the ROMs is actually not the score, but the "lives" (or similar) metric that determines the terminal state: many "arcade-like" games have the implicit goal of surviving as long as possible, so even a simple +1 per step (or normalized to once per second, etc.) would move an agent in the right direction for many games.

Of course, this is not true for all games; in Pong it would optimize for 21-20 or 20-21 scores that take a very long time. However, to achieve that, the agent has to learn to return the ball, which is a good start: not as good as actually finding where the score is kept in memory, but better than giving out no reward at all. And in Space Invaders the agent won't go for motherships, but it still needs to learn to fight the regular aliens.
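The "+1 per step" idea amounts to a thin wrapper around any environment whose step() reports a terminal flag; the env interface below is a hypothetical sketch, not ALE's actual API:

```python
class SurvivalReward:
    """Wrap an environment so the agent earns +1 per step survived,
    regardless of the (unknown) in-game score."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        # Assumed interface: the wrapped env returns (observation, terminal).
        observation, terminal = self.env.step(action)
        return observation, 1.0, terminal  # constant survival reward
```

The total return of an episode is then exactly its length, so maximizing return means surviving as long as possible.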

nczempin added a commit to nczempin/Arcade-Learning-Environment that referenced this issue Mar 27, 2017
@nczempin
Contributor Author

Ideally we would like to be able to reverse-engineer both the score and the terminal condition.

I already described that, for many games, just being able to detect the terminal condition is enough: we can use the number of steps it takes to reach it as the reward.

It also works the other way round: if we don't know the terminal condition, we can simply terminate after an arbitrary number of steps, and agents that maximize the score under that cutoff should also do well in the actual game, where the terminal condition is detected.
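Conversely, the fixed-step cutoff can be sketched as a truncation wrapper (again a hypothetical env interface, with a made-up default horizon):

```python
class FixedHorizon:
    """Terminate after max_steps when the real terminal condition is
    unknown. An agent that maximizes score under this cutoff should
    transfer reasonably to the real game, whose true terminal state
    this wrapper never observes."""

    def __init__(self, env, max_steps=10000):
        self.env = env
        self.max_steps = max_steps
        self.t = 0

    def step(self, action):
        # Assumed interface: the wrapped env returns (observation, reward).
        observation, reward = self.env.step(action)
        self.t += 1
        return observation, reward, self.t >= self.max_steps
```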

nczempin added a commit to nczempin/Arcade-Learning-Environment that referenced this issue Apr 10, 2017
nczempin added a commit to nczempin/Arcade-Learning-Environment that referenced this issue May 3, 2017