
Score not propagating between moves #26

CivilizationalAgency opened this issue Sep 12, 2024 · 8 comments

@CivilizationalAgency

Since yesterday I've noticed that scores are no longer propagating back accurately between moves the way they used to (while still allowing for some loss/regression toward 0 to account for uncertainty). As a result, the score contradicts itself between moves and the move ranking is completely wrong, since the score of a move is no longer given by the evaluation of the final move of the best line.

@noobpwnftw
Owner

I've changed the score backup function to a more well-defined weighted averaging scheme. It is expected to propagate leaf scores back to the root more accurately; however, this change can take some time to reach every line.

@Bratish971

Does this mean that if the score at the end of the main line is 0, the score at the start of the line can be different from 0?

@CivilizationalAgency
Author

CivilizationalAgency commented Sep 13, 2024

It does not seem to be propagating even between consecutive moves. For instance, in the chess database the strongest first move at the time of writing has a score of 6, but the strongest responses from Black have a score of -1. Previously, the largest discrepancy between consecutive moves was 2 points, if I recall correctly.

@robertnurnberg

Note that the score of the best move is no longer equal to the "evaluation" of that position on cdb. The evaluation of the position is now based on the softmax function (https://en.wikipedia.org/wiki/Softmax_function). For the position after 1. d4, we get this weighted average:

> python cdbeval.py --san "1. d4"
move:  g8f6, score:   -1, weight: 1.000000
move:  d7d5, score:   -1, weight: 1.000000
move:  e7e6, score:   -3, weight: 0.818731
move:  c7c6, score:   -9, weight: 0.449329
move:  d7d6, score:  -13, weight: 0.301194
move:  g7g6, score:  -17, weight: 0.201897
move:  f7f5, score:  -21, weight: 0.135335
move:  a7a6, score:  -26, weight: 0.082085
move:  c7c5, score:  -29, weight: 0.060810
move:  b8c6, score:  -29, weight: 0.060810
move:  h7h6, score:  -71, weight: 0.000912
move:  a7a5, score:  -72, weight: 0.000825
move:  b8a6, score:  -75, weight: 0.000611
move:  b7b6, score:  -79, weight: 0.000410
move:  g8h6, score: -105, weight: 0.000030
move:  h7h5, score: -114, weight: 0.000012
move:  b7b5, score: -126, weight: 0.000004
move:  e7e5, score: -140, weight: 0.000001
move:  f7f6, score: -143, weight: 0.000001
move:  g7g5, score: -227, weight: 0.000000
Weighted eval:  -5.971027695491816

If you want to test this also for other positions, you can use this script: cdbeval.py.
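
The numbers in the output above can be reproduced with a few lines of Python. This is only a minimal sketch, not the actual cdbeval.py: the function names are hypothetical, and the temperature of 10 is inferred from the printed weights (for example, exp(-2/10) ≈ 0.818731 for the move scored 2 below the best one) rather than taken from cdb's source.

```python
import math

# Scores of Black's replies after 1. d4, copied from the output above.
SCORES = [-1, -1, -3, -9, -13, -17, -21, -26, -29, -29,
          -71, -72, -75, -79, -105, -114, -126, -140, -143, -227]


def softmax_weights(scores, temperature=10.0):
    """Unnormalized softmax weights relative to the best score.

    Assumed form: exp((score - best) / temperature); temperature=10 is
    inferred from the printed weights, not confirmed from cdb itself.
    """
    best = max(scores)
    return [math.exp((s - best) / temperature) for s in scores]


def softmax_weighted_eval(scores, temperature=10.0):
    """Weighted average of the scores using the weights above."""
    weights = softmax_weights(scores, temperature)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    for s, w in zip(SCORES, softmax_weights(SCORES)):
        print(f"score: {s:5d}, weight: {w:.6f}")
    print("Weighted eval:", softmax_weighted_eval(SCORES))
```

Under these assumptions the result is about -5.97 from Black's point of view. Negated, that is roughly +6 for White, which would be consistent with the score of 6 reported for the first move earlier in the thread, even though the best single reply still shows -1.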

@CivilizationalAgency
Author

Thank you for the response! I understand that giving more weight to suboptimal moves makes the score more robust to an incorrectly evaluated best response, so scores should be more stable; the tradeoff is that the weight of the strongest move is diluted. Intuitively, this would be most useful for moves evaluated to a shallower depth, where there is greater uncertainty; conversely, for moves evaluated to a greater depth you would just use the best response. I see the temperature parameter in the script. Where does it come from? It would make sense to me if it were inversely related to evaluation depth, but this doesn't seem to be the case, since even for the first moves (e.g. 1. d4) the weight of the best response is already being diluted. Or is that just because it hasn't been updated yet, as @noobpwnftw mentioned?
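
The dilution described here can be made concrete by varying the temperature. The sketch below is not cdb's code: it assumes weights of the form exp((score - best) / T), which matches the weights printed earlier in the thread for T = 10, and the function name and the temperature values tried are purely illustrative.

```python
import math

# Scores of Black's replies after 1. d4, as listed earlier in the thread.
SCORES = [-1, -1, -3, -9, -13, -17, -21, -26, -29, -29,
          -71, -72, -75, -79, -105, -114, -126, -140, -143, -227]


def softmax_weighted_eval(scores, temperature):
    """Softmax-weighted average of scores (assumed form, see lead-in)."""
    best = max(scores)
    # Exponents are always <= 0, so math.exp can underflow to 0.0 but
    # never overflow.
    weights = [math.exp((s - best) / temperature) for s in scores]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Illustrative temperatures only; none of these is claimed to be cdb's.
    for t in (0.01, 1.0, 10.0, 100.0):
        print(f"T = {t:6.2f}: eval = {softmax_weighted_eval(SCORES, t):.3f}")
```

As the temperature approaches 0, the weighted evaluation tends to the score of the best reply (-1 here), which is the "just use the best response" behaviour; larger temperatures pull it toward the plain average of all replies.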

@robertnurnberg

Yes, the script uses the same (global) temperature as cdb. For a more detailed discussion of the pros and cons, you could join the chessdb channel on the Stockfish Discord server: https://discord.com/channels/435943710472011776/1101022188313772083

@CivilizationalAgency
Author

Has the use of a dynamic temperature as a function of PV depth been considered to restore a more useful score for positions with a high eval depth/low uncertainty?

@noobpwnftw
Owner

I don't have a way to estimate that. I guess that, given time, the problem will resolve itself.
