Skip to content

Fixes and improvements to PID. #27740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 14, 2025

Conversation

feldi12
Copy link
Contributor

@feldi12 feldi12 commented Mar 12, 2025

Description

Following a bug issue #27734 I examined the PID procedure and found, that it does not follow an actual PID algorithm as it constraints I term to only positive values. I also found, that in typical configuration during handover from max heating to PID, there is a pause to heating.

The constraining of the I term to non-negative values, may cause PID to improperly function, particularly when large flow changes are experienced during the print. Normally this term may be negative to offset the differentiating term and proportional term, by limiting it to zero we effectively reduce PID to PD over target temperature.

The uneven heating during handover to PID when heating is due to starting with I term at zero. During handover we come from a temperature outside of PID work range, typically from room temperature. As the I term is limited to max heater power in every iteration of the loop, starting with max I term value allows for smooth handover, and reduces total time to stabilize significantly. It will also reduce occurrence of alarms due to too slow heating, that were sometimes causing print aborts in printers with less powerful hot end heaters. Images comparing heating process are pasted in the issue linked above.

I also moved the max_I_term determining point outside loop - this value changes only when Ki is changed and there is no need to recompute it every iteration.

Requirements

No additional requirements. Couple of extra bytes in programme.

Benefits

Improves time to reach target temperature from room temperature and general temperature stability, particularly when large flow changes are experienced.

Configurations

Change should improve all configurations with PID enabled.

Related Issues

See #27734.

@ellensp
Copy link
Contributor

ellensp commented Mar 12, 2025

@feldi12
Copy link
Contributor Author

feldi12 commented Mar 12, 2025

Please read https://marlinfw.org/docs/development/coding_standards.html

Oh, right you are... I just followed my preferred indentation style without much thought.

@feldi12
Copy link
Contributor Author

feldi12 commented Mar 12, 2025

Two things I'd appreciate your thoughts on:

  • I think emitting a message at handover to PID would be a good idea. This would help a lot of users diagnose problems. From what I saw, people generally do not know, that there is a point, when control is handed over to PID. I did not find a guide for messaging, so if you agree, what call you think would be good? echo msg?
  • The call to constrain(temp_iState + pid_error, -max_power_over_i_gain/4.0f, max_power_over_i_gain) still uses rather arbitrary limits. What do you think about creating configuration parameter to allow users tuning this to specific printers?

@feldi12 feldi12 marked this pull request as ready for review March 12, 2025 09:46
@tombrazier
Copy link
Contributor

Okay, first time contributor I think. Fantastic, welcome on board.

Just to get it out of the way, a minor comment on coding style: Marlin is strict on the use of white space. No trailing white space (lines 222 and 243 of temperature.h) and one space between if and the opening bracket (line 225).

Also, whilst improving PID is an excellent thing to do, for a stable hotend MPC is generally better. I mention this just in case you didn't know.

Getting to the actual content of this PR, I'll post a detailed follow up comment. Watch this space...

@tombrazier
Copy link
Contributor

...starting with the easy change: calculating max_power_over_i_gain outside the PID loop is a very sensible speed optimisation at the cost of RAM. I am in two minds about this. I really hate floating point divisions. But on older, smaller mainboards (I still run an 8 bit MCU!) RAM is like hen's teeth. I think a better approach would be to multiply by Ki and then constrain without the need for the division.

However I have issues with how max_power_over_i_gain is calculated anyway (more to come on this). So maybe we sort that out first.

@tombrazier
Copy link
Contributor

Before I go on, to respond to your two questions @feldi12, I think a debug message at handover, guarded by PID_DEBUG, makes sense.

I have more to say on the arbitrary limits...

@tombrazier
Copy link
Contributor

...second easy(ish) thing to comment on: the uneven heating during handover to PID.

The P term really ought to correlate well to PID_FUNCTIONAL_RANGE. What I mean is that a perfectly optimised heater would be on full power until a moment where it suddenly switches to steady state maintenance power, resulting in a quick and clean settling at target temperature. However PID just isn't a great algorithm to get this behaviour because it is badly affected by latency and noise. (This is the problem MPC solves.)

If, in reality, PID is not switching well at the boundaries of PID_FUNCTIONAL_RANGE, I'm not sure fixing it with I is correct. In fact, I think there's a set-your-printer-on-fire pathological case: if Ki is a very small number then max_power_over_i_gain could end up being a very large number and it will take a long time for consecutive pid_error values to reduce it. During this time approximately maximum heating will be being applied.

@feldi12
Copy link
Contributor Author

feldi12 commented Mar 12, 2025

In fact, I think there's a set-your-printer-on-fire pathological case: if Ki is a very small number then max_power_over_i_gain could end up being a very large number and it will take a long time for consecutive pid_error values to reduce it.

I see what you mean. But in almost all cases the P term is so small, that at handover the heater almost stops, and I term keeps building until it reaches maximum anyway. As you said before, this limit is not computed in particularly good way, so improving this guestimate might solve this.

Secondly for the fire scenario - isn't this why we have the safety timers? I actually started digging into that, because the kink at handover kept triggering the safety timer and resetting my printer.

P.S: I also run 8-bit. Atmega in anycubic i3 S. I don't think it really adds RAM usage, as PID is called so often, that you would crash sooner or later anyway if this 4 bytes were not available anyway...

@tombrazier
Copy link
Contributor

...and, finally, detailed analysis as regards constraining max_power_over_i_gain:

TLDR; The code as originally written in #14373 (i.e. commit 1db7013) is correct. However two bugs were introduced in #14746, one of which is addressed in this PR (well spotted @feldi12). However a later series of changes actually undid #14746 for hotends (but not for beds and chambers). This PR therefore addresses a potential PID bug for beds and chambers. However (I keep using this word!) I think the lower bound used is arbitrary and not necessarily correct.

In full...

When max_power_over_i_gain was introduced in #14373, it was correct for the lower bound to be zero. We can see this if we think about what the P, I and D components of PID do. The I component exists to correct fixed offset errors at steady state. When this is the case the D component will be zero (or close to zero). The P component has to be a positive number for heater power to be generated and so at steady state the temperature will settle somewhere below the target unless the I component corrects this. I.e. the I component should only ever correct in an upwards direction. If it were to become negative then the temperature is not at steady state and the I component should not be interfering.

This all changed in #14746 which introduced what it called MIN_POWER (and equivalents for bed and, later, chamber). Strictly speaking this parameter would have been better called NOMINAL_POWER. It shifts the origin of the PID output to a value close to the ideal output power. So PID now no longer calculates an absolute output power. Instead it calculates a positive or negative offset from MIN_POWER. In this case the temperature could well settle above the target (if MIN_POWER is too large). And so the I component of PID now should be able to go negative.

But what should the limits be for temp_iState? As you comment, @feldi12, the upper bound of max_power_over_i_gain seems somewhat arbitrary and a lower bound of minus a quarter of that number is certainly arbitrary.

If we dig into the history, I think there was originally a sensible logic to the calculation of max_power_over_i_gain, but a bug in #14746 messed that up. I think we can go back to this original reasoning and that will give us a sensible way to decide both upper and lower bounds now. Originally, in #14373 the value of max_power_over_i_gain prevented the I component of PID from becoming greater than PID_MAX (and equivalent for bed, etc.).

When #14746 added MIN_POWER, I think a bug was introduced. It changed the line
max_power_over_i_gain = (float)PID_MAX / PID_PARAM(Ki, ee)
to
max_power_over_i_gain = float(PID_MAX) / PID_PARAM(Ki, ee) - float(MIN_POWER).

I think the new line should have been
max_power_over_i_gain = float(PID_MAX - MIN_POWER) / PID_PARAM(Ki, ee).

So the I component would once again be constrained to prevent I by itself from demanding more than maximum power.

Finally, this brings us to what the lower bound for temp_iState should be. If MIN_POW (as it is now called for all heaters) is really nominal power and we are preventing the I component from pushing output power further than MAX_POW then I think the I component should also be constrained to not push power below zero.

So the right code (with the optimisation that elminates the division) is:

work_i = constrain(work_i + Ki * pid_error, -float(MIN_POW), float(MAX_POW - MIN_POW));

And the temp_iState and max_power_over_i_gain variables are unnecessary. And work_i gets set to zero instead temp_iState of when pid_reset is true.

In looking into all of this, I have also found another bug: commit e85eca2 removed MIN_POWER from Configuration.h. I am guessing this was an error because it does not relate to the check-in comment. Later #24389 replaced MIN_POWER in the PID code with zero, presumably because MIN_POWER was now missing from Configuration.h. Finally #25585 cleaned up MIN_POWER from Conditionals_post.h. However the concept of MIN_POWER remains for bed and chamber PID with the config values MIN_BED_POWER and MIN_CHAMBER_POWER. I think MIN_POWER should be returned to the code for the hotend (for which it was originally implemented!).

@tombrazier
Copy link
Contributor

I see what you mean. But in almost all cases the P term is so small, that at handover the heater almost stops, and I term keeps building until it reaches maximum anyway. As you said before, this limit is not computed in particularly good way, so improving this guestimate might solve this.

If P has to be really small that implies a lot of latency. I.e. larger P would introduce too much instability. Really, try MPC.

Secondly for the fire scenario - isn't this why we have the safety timers? I actually started digging into that, because the kink at handover kept triggering the safety timer and resetting my printer.

Well spotted. Yes, the protection code should prevent a fire. But that's just a failsafe. If it is ever triggered then by definition the code has a bug or something else has gone wrong.

P.S: I also run 8-bit. Atmega in anycubic i3 S. I don't think it really adds RAM usage, as PID is called so often, that you would crash sooner or later anyway if this 4 bytes were not available anyway...

Oh, good point. But if my last comment is right then this is moot anyway.

@tombrazier
Copy link
Contributor

Review comments aside, I am really glad that you're contributing @feldi12. Especially given you have obviously understood the code, which would be beyond most mortals.

@mp-pomorszcz
Copy link

mp-pomorszcz commented Mar 12, 2025

Review comments aside, I am really glad that you're contributing @feldi12. Especially given you have obviously understood the code, which would be beyond most mortals.

:) Thanks.

I run out of brains for now, I'll go through your long comment later... As you guessed this is first time I contribute and as such, i find it is surprisingly time consuming.

I'm also gonna 'thank you' right back at ya, as you obviously put a lot of effort in keeping Marlin alive. Cheers.

P.S: Oh my... i just saw I accidentally made two separate account for my two main email addresses and used them like one in this discussion...

@tombrazier
Copy link
Contributor

i find it is surprisingly time consuming

Yeah. The basic PR submitting process becomes pretty straightforward after you've done a few. And if the changes are simple then it can sometimes be merged into the bugfix branch quite quickly. Ditto once you get a bit better known.

What I have found most time consuming is testing and then responding to bug reports which result from there just being so many different configurations running Marlin. OTOH I think it is so worthwhile.

@tombrazier
Copy link
Contributor

Thinking about it, restoring the MIN_POWER logic for the hotend will make it possible to resolve the problem with power dropping when after handover to PID. All you then have to do is get an idea of the steady state power needed and set it in MIN_POWER. Note, the value reported by M105 is halved so you need to double it for the right MIN_POWER.

Further note on submitting PRs: if you're fixing a bug yourself you do not also have to submit a bug report. You can just describe the bug in the PR. That saves some work.

@feldi12
Copy link
Contributor Author

feldi12 commented Mar 12, 2025

OK, I went through your comment and I think I have an idea how this would apply... Let's start with theory:

The P component has to be a positive number for heater power to be generated and so at steady state the temperature will settle somewhere below the target unless the I component corrects this. I.e. the I component should only ever correct in an upwards direction. If it were to become negative then the temperature is not at steady state and the I component should not be interfering.

So what I was considering was actually an unsteady state. The PID will have to resolve constantly changing parameters, mostly due to changes in geometry of airflow around nozzle and changes in flow of filament. For example in case of finishing a big part of infill and moving to printing fine detail in the walls, the flow will suddenly reduce by a big amount. In this case a steady state output will cause to overshoot target and integral term will speedup recovery.

Now the suggested constrain work_i = constrain(work_i + Ki * pid_error, -float(MIN_POW), float(MAX_POW - MIN_POW)); allow for the term to become negative, but only if we have MIN_POW set to above zero. And knowing that people optimize effort spent at a cost of almost everything else (read: are lazy), we are likely to stay with 0 lower bound, as MIN_POW is set to zero by default... And I believe it may be very hard to make a sensible estimation of the MIN_POW, as heater power vary greatly between printers.

Now defaults that may lead to bad behaviour links us neatly to the other discussion we had:

Secondly for the fire scenario - isn't this why we have the safety timers? I actually started digging into that, because the kink at handover kept triggering the safety timer and resetting my printer.

Well spotted. Yes, the protection code should prevent a fire. But that's just a failsafe. If it is ever triggered then by definition the code has a bug or something else has gone wrong.

By default if someone encounters the repeating failsafe trouble, he will first look at the timer limit:

/**
* Whenever an M104, M109, or M303 increases the target temperature, the
* firmware will wait for the WATCH_TEMP_PERIOD to expire. If the temperature
* hasn't increased by WATCH_TEMP_INCREASE degrees, the machine is halted and
* requires a hard reset. This test restarts with any M104/M109/M303, but only
* if the current temperature is far enough below the target for a reliable
* test.
*
* If you get false positives for "Heating failed", increase WATCH_TEMP_PERIOD
* and/or decrease WATCH_TEMP_INCREASE. WATCH_TEMP_INCREASE should not be set
* below 2.
*/
#define WATCH_TEMP_PERIOD 40 // (seconds)
#define WATCH_TEMP_INCREASE 2 // (°C)
#endif

So we see, that the comment suggest increasing the timer in case of trouble. This is where most people will stop digging, just increasing the limit until it works. And I would argue, that loosening the protection is more dangerous than fixing the source of common trigger.

This brings us back to MIN_POW for hot end. This would solve all our troubles, but setting it to sensible value would have to be somehow enforces. For example providing no default value, making it a setting to be mandatory set when creating configuration for a given printer, with a lot of comments on how to approach this and what value is optimal.

This failsafe triggering may be a more common problem, then one would think. In my case I'm using a firmware released by @knutwurst , the https://github.com/knutwurst/Marlin-2-0-x-Anycubic-i3-MEGA-S . This is best firmware for my printer I found, and while it still uses PID instead of MPC, I found that after allowing negative I term it keep hotend stable to the extent possible by readout precision. That is plenty good enough in my books.

It is however a repo that is particularly well kept for such a niche project and we should be aware, that other might need more persuasion to pay attention and actually take advantage of this kind of improvements.

How would you approach the problem of enforcing sensible setting of MIN_POW, and do you see it as crucial as I do?

@tombrazier
Copy link
Contributor

So what I was considering was actually an unsteady state...

I think this is a recipe for oscillations. Check my maths... Power is proportional to the rate of change of temperature. The I component is the integral of temperature. So we have

$$\frac{dT}{dt} = \alpha . \int_{x=0}^t (T_s - T(x)) dx$$

where $T$ is temperature, $t$ is time and $T_s$ is the setpoint and for some $\alpha$. Taking the derivative of both sides,

$$\frac{d^2 T}{dt^2} = \alpha . (T_s - T)$$

which is a simple harmonic oscillator.

How would you approach the problem of enforcing sensible setting of MIN_POW, and do you see it as crucial as I do?

Enforcing it is not necessary, it's up the the user to use it or not as they prefer. Most users are not looking to solve this problem. For those who are, there are the comments in the config files and #support-chat on the marlin discord.

@thinkyhead
Copy link
Member

I think the new line should have been max_power_over_i_gain = float(PID_MAX - MIN_POWER) / PID_PARAM(Ki, ee).

I concur that this makes more logical sense, assuming that i is really being applied as a proportion to the permitted gain over min, and not simply as a limit to the max power.

I have also found another bug: commit e85eca2 removed MIN_POWER from Configuration.h. I am guessing this was an error because it does not relate to the check-in comment.

You are correct! This was due to the commit being merged very soon after #14746 right after MIN_POWER was added. So we should definitely restore MIN_POWER as a start, with apologies to @mikeshub for stepping on that very useful option.

thinkyhead added a commit to MarlinFirmware/Configurations that referenced this pull request Mar 12, 2025
@thinkyhead
Copy link
Member

thinkyhead commented Mar 17, 2025

Not all LCD controllers currently use set_Ki to set Ki so the code for those LCDs will need updates to ensure max_power_over_i_gain is recalculated whenever Ki is edited. Some UIs set the Ki value in-place during editing —while a PID loop is running— and those will cause max_power_over_i_gain to be out of sync with the current Ki value.

In future it is not guaranteed that vendors will be aware that they should not modify Ki in menus directly, so we will have to lock down these data members and modify how they are edited.

To that end, there is an "editable setting" concept under development as a general part of the Marlin Features class, which would specify valid value ranges, value formatting, etc. These will also deal with values being storable to EEPROM. Once that API has been developed and propagated it should standardize how values are edited on-screen and then we won't have these kinds of special cases.

@tombrazier
Copy link
Contributor

@thinkyhead Adding MIN_POWER to the example configs in and of itself is insufficient because all references to it have subsequently been scrubbed from the code. It also needs to be added back to Configuration.h and the default value added back to Conditionals_post.h (removed in #25585) and the instantiation of hotend_pid_t in temperature.h needs to reference it:

  typedef
    #if ALL(PID_EXTRUSION_SCALING, PID_FAN_SCALING)
      PIDCF_t<MIN_POWER, PID_MAX, LPQ_MAX_LEN, PID_FAN_SCALING_MIN_SPEED, PID_FAN_SCALING_LIN_FACTOR>
    #elif ENABLED(PID_EXTRUSION_SCALING)
      PIDC_t<MIN_POWER, PID_MAX, LPQ_MAX_LEN>
    #elif ENABLED(PID_FAN_SCALING)
      PIDF_t<MIN_POWER, PID_MAX, PID_FAN_SCALING_MIN_SPEED, PID_FAN_SCALING_LIN_FACTOR>
    #else
      PID_t<MIN_POWER, PID_MAX>
    #endif
  hotend_pid_t;

@thisiskeithb
Copy link
Member

It also needs to be added back to Configuration.h and the default value added back to Conditionals_post.h (removed in #25585) and the instantiation of hotend_pid_t in temperature.h needs to reference it

Should be in latest bugfix-2.1.x:

@tombrazier
Copy link
Contributor

Now that MIN_POWER is restored, I think it solves the problem which this PR attempts to solve with the logic related to pid_below. Further, I think this logic may break PID for a lot of printers because it introduces a sudden jump in temp_iState to an arbitrarily chosen value. I think this logic should be removed.

However I agree with @feldi12 that MIN_POWER needs to be better documented.

Further, I note that my comments about MIN_POWER being better named NOMINAL_POWER are incorrect (for the hotend at least) because PIDRunner::get_pid_output() constrains the output power to be at least MIN_POWER and so MIN_POWER must be lower than the nominal power output. This also means that max_power_over_i_gain never needs to go negative because PID will always settle below the setpoint without the influence of the I component.

Finally, max_power_over_i_gain is not actually needed and the division is not needed to calculate it.

I think the changes in https://github.com/tombrazier/Marlin/tree/27740 should be applied to this PR.

@mikeshub
Copy link
Contributor

A bit late to the party. Thanks for the shoutout. Yes that MIN_POWER term was there so the handoff to the PID controller would go more smoothly. It has to have some power going through it because there is a powerful fan blowing on the hot end at all times.

Definitely proceed with caution on the PID controller. "Perfect is the enemy of good" is very true with the PID controller. These seemingly little changes may improve the performance on a single printer but break things badly for lots of other people.

The reason I choose not to let the I term go negative when I did my work way back was because logically it doesn't make sense to me as the lowest a PID loop can go is 0 duty cycle PWM. Imagine if the PID loop has been over temp for long enough to make the I term go negative. Once the temp cools to the set point you want the PID loop to start holding that temperature again. There's going to be a fairly powerful fan cooling the hotend at all times so that filament doesn't melt in the heat break. If the I term is down below zero already then it will just be fighting the P term to get it back up to the correct temperature. If you are in a situation where the PID loop is staying over temp due to wound up I term the better fix is to adjust your P gain up and I term down. Not allowing negative I term should also help with oscillation when the controller ends up with too high of an I gain.

The autotuner gets gains close enough to be good enough for anyone to use. That said I haven't printed in a while so it may be improved since I last used it, but I would be surprised if it was perfect. If you want to get better performance out of the PID loop I highly recommend tweaking your gains manually instead of messing with the insides of the PID loop.

@thinkyhead thinkyhead force-pushed the bugfix-2.1.x branch 3 times, most recently from 4354891 to efa1758 Compare March 28, 2025 01:57
@thinkyhead
Copy link
Member

Thanks for the helpful input @mikeshub

I went ahead and brought over Tom's suggestions for review. Take a look and see what you think.

thinkyhead added a commit to MarlinFirmware/Configurations that referenced this pull request Apr 14, 2025
@thinkyhead
Copy link
Member

The remaining changes in this PR seem to be safe, leaving the logic essentially untouched but requiring a little less permanent RAM usage. Let me know if you encounter any anomalies with this modification. I will be putting this through its paces on a BIQU BX while also working on improving SD Card error recovery. I will tag and release 2.1.3-b3 once this testing and tweaking is completed. The final 2.1.3 release will commence next week, so if there are any annoying issues you 'd like to see solved before then, now is the time!

@thinkyhead thinkyhead merged commit 5e8a523 into MarlinFirmware:bugfix-2.1.x Apr 14, 2025
65 checks passed
EvilGremlin pushed a commit to EvilGremlin/Marlin that referenced this pull request May 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] The PID routine does not follow pid algorith, the I term cannot be negative
7 participants