-
Notifications
You must be signed in to change notification settings - Fork 2.5k
PV depth increasing and decreasing (sorting bug?) #1787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I can confirm this "bug" in Arena. I've also never seen this before. I've also noticed another issue with this latest patch - at least some test positions (such as listed below) which @vondele randomise draw eval patch "fixed", appear to be back to previous behaviour ("draw blindness"?): |
The depth behavior is not unexpected (i.e. was implemented on purpose). It is the way fail high searches are now being handled while the aspiration window is being adjusted. @ssj100 surprising to see that these positions would be influenced, but it can't be excluded. |
@vondele Thanks for confirming the depth behaviour is not unexpected. Yes I'm surprised that those positions would be influenced too, but Bryan SF is also seeing the same "issue". Could you check it out please? |
@vondele I put the three cases on fishcooking. One from K-H game, one from AZ vs. SF8 Game 3, and one from the disabled LC0 game. |
I can confirm that for the test positions reported by @ssj100 the randomize eval patch doesn't work anymore (on single thread). I created a test in the framework that might fix this: http://tests.stockfishchess.org/tests/view/5bd305a60ebc595e0ae1e13c |
More info and discussion of this topic here: |
Depth going up and down is far more serious "bug" than the one that caused recent revert (end of fail high PV) imo. |
well, it is the correct output for what the engine is right now doing at failhigh. Fail highs are resolved at increasingly lower depth (named 'adjustedDepth' in the code). So, the output definitely matches what is going on... However, 'depth' means anyway not much... we could consider this printed 'adjustedDepth' an implementation detail and continue to print 'rootDepth'. I have no strong opinion here. |
So, if I understood correctly, latest pv, even if it shows lower depth than the one before that, is to be considered "best so far"? |
tricky .... I would guess so (in the sense that we will play the move that is first in that PV if we're forced by the clock to play, and that's the best line we've found so far), but we optimize for winning games ... and that's not the same as producing the best PVs. We might have found that it is more effective (overall) to do a lousy job while resolving a failhigh, so we gain more time that we can use when it is needed more. The PV in the new scheme, once the failhigh is resolved is presumably of lower quality than the resolved PV in the old scheme (because it is at lower depth). |
That answers my question, last line shown is best SF could come up so far, with the code as it is now.
That one concerns me less, as it is in the hands of devs to find the optimum. What I was worried is we are not sure any more if the "best found so far" is what is typed last or what is typed with higher depth. That would render SF almost unusable for chess analysis.
That is confusing. If the new resolved pv is of lower quality than the old resolved pv is, shouldn't that mean lower quality of moves played overall, too? And if so, how come the patch is elo gain (at 1 core at least)? |
because of time gained in iterations involving failhigh, the overall quality of played moves might increase, e.g. we can do an iteration more, or we can resolve the failhigh. The patch is quite clearly an elo gainer. |
Ah ok, gains time. One thing to consider for us correspondence players (time not really an issue). |
nevertheless similar, consider the new scheme the most efficient way to high depth. |
Yes of course. Thank you @vondele for your input! |
As I understand from this topic, the new depth reduce patch is good for the engines games, but it degrades the quality of the analysis. Is it possible to make a check-box in parameters for the possibility old depth scheme for the correspondence players? |
Kevin, given the exact same amount of time, new SF will outperform old SF. But if you compare the quality of pv on the same depth, old SF pv will be probably better. At least that's the way I understood this. |
DragonMist, ok, but it is dizzying. I also noticed a drop in speed with a new patch on my PC (about 6%). Has anyone else noticed? |
I think this is one of those things that for us, practical users, it is better to ignore it (same as I concluded about introducing contempt per se, as well as about analysis asymmetry). Haven't tried it yet so no idea about slow down. |
slowdown is well possible, but again doesn't say much for this kind of change, search is pretty different now e.g. the fraction of TThits vs nodes will be rather different. |
I have noticed slower performance as well now that I've played more positions. But most important right now is that we clearly understand, from a correspondence player's point of view, which is the best line? The line last displayed even if lower depth than previous line at higher depth? |
The last line displayed, yes. Even if lower depth. |
I think the last line, but this would actually be testable... I might be giving it a try on fishtest. |
Well do we know if, in engine vs engine matches, the elo gain is from which line? The last line regardless of depth or the one with highest depth? |
@vondele if you can create such test, do it please, it confuses us. |
yes, looking into it. |
Test here: http://tests.stockfishchess.org/tests/view/5bd37a120ebc595e0ae1e7c3 |
a) this patch passed more tests than most other patches, |
Ivan, we're not trying to interpret the notion of depth, we're trying to ascertain if the latest displayed pv by new SF is in fact strongest, i.e. stronger than previous with higher depth, which we saw is feature of new patch (showing depth of X then going back) |
DragonMist, in my opinion when the depth goes down, you should wait for it to go up again to be sure. |
OK, test is clear now, better play the move of the last PV. |
Yeah, watched every second of it. Great help, thanks @vondele ! |
Cool...how do I know when a fail high/low has been resolved? |
Lowering the depth seems to confuse xboard somewhat (it does not print the lower depth pv in its engine output window), |
≤<Cool...how do I know when a fail high/low has been resolved? According UCI-protocol SF gives out fail Highs with the lowerbound notion and fail lows with the upperbound notion. Usually a GUI highlights fail h/l with according symbols for instance arena puts a +/- symbol after depth/seldepth. When the GUI gives out a line without these symbols then you know that the fail h/l is resolved. |
@pb00068 what about just continuing to print 'rootDepth' instead of 'adjustedDepth' in the pv info lines. It is cosmetics, but I feel like printing adjustedDepth is exposing users to an implementation detail (what if the scheme gets even more complex in the future). The issue with xboard would be and additional reason. |
I just noticed a new issue with the new patch. I analyzed some position for several hours, got depth 62 and 0.00 eval. As soon as I began to run the received variation on the game notation, SF lost its accumulated depth and evals, as if the hash had been suddenly reset. Can it be fixed? |
@vondele I feel that SF should give out the thruth and not something the users or the GUI expected for pure habit. There's no protocol that specify depths must be reported in strict ascending order. Anyhow thats a question / proposal that IMO should be answered by maintainers. |
well, depth is all about convention, little about truth.... but I'm not too opinionated about which convention we adopt. |
@Coolchessguykevin If you don't stop analysis while doing moves over the board, it's totally normal that search begins again with depth 1 (without clearing the hash). |
@pb00068 It is very strange. I did not notice this in previous versions. The search continued from high depths with the previously obtained eval, when I was making candidate moves on the board in the considered position. |
Coolchessguykevin which GUI are you using? |
I'm really sorry to say, but this patch is causing a real mess! master, go depth 24 Only showing the last 2 (or 3 ?) iterations, starting from depth 22. |
pb00068 I use Aquarium 2018 GUI. Of course, I do not mean an instantaneous continuation from the same depth, when I continue the variant manually, but a very rapid increase in it (in a matter of seconds high values are reached, close to the reached value initially). In the most recent dev-version, with the introduction of the next move on the board, the analysis began from zero, that is, the accumulated estimate and deep depth disappeared. Its increase began very slowly from low values, like when you first started the engine. |
@joster: search was at rootdepth 24 then it failed high, then search terminated with resolving at rootdepth 23, so actually the used/explored depth is something between 23 and 24, more near to 23 however. I must admit that SF now do not more strict obey the Go depth Command. I'm in vacation the next 3 days, after that i'm quite confident to find a solution/fix for this minor problem. |
@pb00068 Hi Günther, glad to hear this. I really think this patch needs some polishing. ;-) In the meantime you may want to take a look at this joergoster@b9c2496 which seems to fix the problem of truncated PVs. But I need to write a verification function for the PVs and do some tests. |
@vondele will this patch affect the functioning of your randomization patch cuz now main thread is having a different depth than the rest and filling the TT with it? Some of the positions that @ssj100 and @crossbr mentioned are not getting resolved as they were before. Is this something to be concerned about or what can be done to make sure that the original behavior of the randomization patch is not affected by this patch? |
At the moment, I don't know how this interacts with randomization patch. I assume we'll find out. I'll prepare a patch for some of the other issues mentioned in this thread, but I'm waiting for some running tests. |
My guess is that because the randomization patch adds randomness (or at
least pseudo...) the exact details of which positions it helps will change
with subsequent releases. I expect it still works, but the detail of which
positions are helped (and which are not) has changed.
|
the committed FH patch (081af90) had a number of changes beyond adjusting the depth of search on fail high, with some undesirable side effects. 1) decreasing depth on PV output, confusing GUIs and players alike as described in issue official-stockfish#1787. The depth printed is anyway a convention, let's consider adjustedDepth an implementation detail, and continue to print rootDepth. Depth, nodes, time and move quality all increase as we compute more. (fixing this output has no effect on play). 2) fixes go depth output (now based on rootDepth again, no effect on play), also reported in issue official-stockfish#1787 3) lastBestDepth is used to compute how long a move is stable, a new move found during FH is incorrectly considered stable if based on adjustedDepth instead of rootDepth. (Changes TM). Reverting this: passed STC [-3,1] LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 82982 W: 17810 L: 17808 D: 47364 http://tests.stockfishchess.org/tests/view/5bd391a80ebc595e0ae1e993 passed LTC [-3,1] LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 109083 W: 17602 L: 17619 D: 73862 http://tests.stockfishchess.org/tests/view/5bd40c820ebc595e0ae1f1fb 4) In the thread voting scheme, the rank of the FH thread is now artificially low, incorrectly since the quality of the move is much better than what adjustedDepth suggests (e.g. if it takes 10 iterations to find VALUE_KNOWN_WIN, it has very low depth). Further evidence comes from a test that showed that the move of highest depth is not better than that of the last PV (which is potentially of much lower adjustedDepth). I.e. this test http://tests.stockfishchess.org/tests/view/5bd37a120ebc595e0ae1e7c3 failed SPRT [0, 5] LLR: -2.95 (-2.94,2.94) [0.00,5.00] Total: 10609 W: 2266 L: 2345 D: 5998 In a running 5+0.05 th 8 test (more than 10000 games) a positive Elo estimate is shown (strong enough for a [-3,1], possibly not [0,4]): http://tests.stockfishchess.org/tests/view/5bd421be0ebc595e0ae1f315 LLR: -0.13 (-2.94,2.94) [0.00,4.00] Total: 13644 W: 2573 L: 2532 D: 8539 Elo 1.04 [-2.52,4.61] / LOS 71% Thus, restore old behavior as a bugfix, keeping the core of the FH resolving scheme. (This is non-functional for a bench, but changes searches via TM and in the threaded case). Bench: 3166402
…ish#1791 (report real used depth) bench: 5439821
The recently committed Fail-High patch (081af90) had a number of changes beyond adjusting the depth of search on fail high, with some undesirable side effects. 1) Decreasing depth on PV output, confusing GUIs and players alike as described in issue #1787. The depth printed is anyway a convention, let's consider adjustedDepth an implementation detail, and continue to print rootDepth. Depth, nodes, time and move quality all increase as we compute more. (fixing this output has no effect on play). 2) Fixes go depth output (now based on rootDepth again, no effect on play), also reported in issue #1787 3) The depth lastBestDepth is used to compute how long a move is stable, a new move found during fail-high is incorrectly considered stable if based on adjustedDepth instead of rootDepth (this changes time management). Reverting this passed STC and LTC: STC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 82982 W: 17810 L: 17808 D: 47364 http://tests.stockfishchess.org/tests/view/5bd391a80ebc595e0ae1e993 LTC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 109083 W: 17602 L: 17619 D: 73862 http://tests.stockfishchess.org/tests/view/5bd40c820ebc595e0ae1f1fb 4) In the thread voting scheme, the rank of the fail-high thread is now artificially low, incorrectly since the quality of the move is much better than what adjustedDepth suggests (e.g. if it takes 10 iterations to find VALUE_KNOWN_WIN, it has very low depth). Further evidence comes from a test that showed that the move of highest depth is not better than that of the last PV (which is potentially of much lower adjustedDepth). I.e. this test http://tests.stockfishchess.org/tests/view/5bd37a120ebc595e0ae1e7c3 failed SPRT[0, 5]: LLR: -2.95 (-2.94,2.94) [0.00,5.00] Total: 10609 W: 2266 L: 2345 D: 5998 In a running 5+0.05 th 8 test (more than 10000 games) a positive Elo estimate is shown (strong enough for a [-3,1], possibly not [0,4]): http://tests.stockfishchess.org/tests/view/5bd421be0ebc595e0ae1f315 LLR: -0.13 (-2.94,2.94) [0.00,4.00] Total: 13644 W: 2573 L: 2532 D: 8539 Elo 1.04 [-2.52,4.61] / LOS 71% Thus, restore old behavior as a bugfix, keeping the core of the fail-high patch idea as resolving scheme. This is non-functional for bench, but changes searches via time management and in the threaded case. Bench: 3556672
should all be back to normal with the latest merge. |
I use Aquarium 2018 too and when I start an engine in the same position where I've already searched with infinite analysis and depth/score is saved in tree then it doesn't continue from same depth. It always takes me about the same time to reach the same depth I reached previously. Are you loading/saving hash first with a SF clone like Raub? Even that doesn't work for me, at least not with > 8 GB hash. |
Closing the issue for now after 3f1eb85 has been merged. |
Uh oh!
There was an error while loading. Please reload this page.
I happen across this position in a game and I'm thinking this is either a SF bug or gui but I've been using Aquarium for a decade and never saw this. Notice the depth below goes from d=31 to d=32 then back down to d=31. Then it reaches d=33 and goes down to 32 then skips and goes to d=34. This is with the latest SF (reduce depth after fail high).
Sorry about the formatting (looks better in notepad) but I couldn't improve the view and had to settle to insert it as a quote...this is the PV directly from the infinite analysis tree window in Aquarium (best to view from bottom to top):
[FEN "r4rk1/2pnqppp/p2bpn2/1p2N3/3P1B2/1N4P1/PP2PPKP/R1QR4 b - - 0 17"]
Now I suppose that's expected behaviour with the latest patch but that just feels weird.
The text was updated successfully, but these errors were encountered: