Avoid reevaluating arguments in _round, _constrain, _sign. by jaguilar · Pull Request #529 · simplefoc/Arduino-FOC

jaguilar · 2026-04-25T18:11:17Z

Description

_round, _constrain and _sign currently reevaluate their arguments, which wastes CPU cycles. In torque/foc_current mode, this fix saves about 300ns/foc loop (about 2% of total FOC CPU usage). In other modes, especially estimated_current, it saves more.

Type of change

Optimization

How Has This Been Tested?

I tested this on on my STM32G474 FOC bench setup.

Test Configuration/Setup:

Hardware: STM32G474, custom 3BLDC driver
IDE: VSCode & arm-none-eabi-gcc
MCU package version (stm32duino/arduino-esp32/..)

runger1101001 · 2026-04-29T13:22:28Z

Nice.
You're saying the ternary operator isn't optimised as well as the simpler structure in the template function?

Would you have time to compare __builtin_round() with the self-implemented one? That might be another possible option?

jaguilar · 2026-04-29T14:00:54Z

I could try it.

The point of the template function is the make sure that each expression is only evaluated once. So for example if you have _constrain(complicated_expr, low, high), you don't want to do if (complicated_expr < low) ? low : (complicated_expr > high) ? high : complicated_expr, since in the non-clamped case you have to reevaluate the expression three times.

jaguilar · 2026-04-29T14:49:51Z

Okay, replacing with builtins gives another 600ns speedup on a baseline of 9500ns for loopFOC (for position control mode in foc_current torque control). These implementations should be C++11 compatible. Let me know if it is acceptable and I will amend the commit.

Note that I have no way of testing that this works across different compilers so if you have a test suite to hand please feel welcome to run it.

template<typename T>
constexpr inline int _sign(T val) {
  return __builtin_signbit(val);
}

#ifndef _round
template<typename T>
constexpr inline std::enable_if<std::is_same<T, double>::value, long>::type _round(T x) {
  return __builtin_round(x);
}
template<typename T>
constexpr inline std::enable_if<std::is_same<T, float>::value, long>::type _round(T x) {
  return __builtin_roundf(x);
}
#endif

// Use enable_if to select the fastest implementation according to the amt type.
// Using __builtin_fXf is measurably faster than using the ternary approach.
template<typename T, typename L, typename H>
constexpr inline std::enable_if<std::is_integral<T>::value, T>::type _constrain(T amt, L low, H high) {
  return (amt < low) ? low : (amt > high) ? high : amt;
}
template<typename T, typename L, typename H>
constexpr inline std::enable_if<std::is_same<T, float>::value, T>::type _constrain(T amt, L low, H high) {
  return __builtin_fmaxf(low, __builtin_fminf(high, amt));
}
template<typename T, typename L, typename H>
constexpr inline std::enable_if<std::is_same<T, double>::value, T>::type _constrain(T amt, L low, H high) {
  return __builtin_fmax(low, __builtin_fmin(high, amt));
}

runger1101001 · 2026-04-29T15:51:26Z

Wow, thank you so much for trying it - like I thought, even better :-)

Personally I’m very tempted to do it like that… what do you think?
The built ins are GCC specific. But I’m not aware of any Arduino platform that does not use gcc… so I think we’re fine for compatibility in our ecosystem and it’s only people porting to CubeIDE and Keil compilers that might have troubles…

Although very cool I think I’d leave out the template magic that selects the fastest one. But also up for discussion.

jaguilar · 2026-04-29T17:49:55Z

Although very cool I think I’d leave out the template magic that selects the fastest one. But also up for discussion.

I don't think we can, because if anyone is compiling without -ffast-math (which is probably a lot of people because it's not on by default), the __builtin_fmax/_fmin/_round calls would add double promotion and software double rounding. (Will await your agreement or further feedback on this point before continuing.)

jaguilar · 2026-04-30T14:35:24Z

Looks like the AVR compiler indeed does not have type_traits, and thus we can't use enable_if for everything. My idea right now is to keep the enable_if code but to use the non-enable_if code for AVR.

_round, _constrain and _sign currently reevaluate their arguments, which wastes CPU cycles. In torque/foc_current mode, this fix saves about 300ns/foc loop (about 2% of total CPU usage). In other modes, especially estimated_current, it saves more.

jaguilar · 2026-04-30T14:50:14Z

Not sure what the deal is with the arduino_due_x compile -- seems to be a problem with min and max which should not have changed due to my code. Let me know if anything jumps out at you.

dekutree64 · 2026-05-27T16:27:12Z

Personally I'd rather not take that first step onto the slippery slope toward the horror that is "modern C++". The builtins are a tantalizing treat for an optimization nut like myself, but SimpleFOC is getting too bloated as it is, so I'd rather push back toward minimalism and readability for people to learn how FOC works rather than pure focus on performance.

My first question is, why is the compiler not optimizing it in the first place? The only moderately-complex expressions I see on a find-in-files of _constrain are the lag estimate _constrain( -current_sp*shaft_velocity*pole_pairs*axis_inductance.q, -voltage_limit, voltage_limit) and stepper setPwm functions _constrain(abs(Ualpha)/voltage_power_supply,0.0f,1.0f). The rest have at most one arithmetic operation, so if a compiler can't see that opportunity, this would only be a drop in the bucket of micro-optimization to be done.

The setPwm functions are wasteful anyway, doing two constrains when the voltage limit should (#536) always be below voltage_power_supply anyway. So I'd refactor to something like this:

void StepperDriver4PWM::setPwm(float Ualpha, float Ubeta) {
  float duty_cycle1A(0.0f),duty_cycle1B(0.0f),duty_cycle2A(0.0f),duty_cycle2B(0.0f);
  float limit = min(voltage_limit, voltage_power_supply); // Sanity check. voltage_limit should always be less than voltage_power_supply
  float duty1 = min(abs(Ualpha), limit)/voltage_power_supply;
  float duty2 = min(abs(Ubeta), limit)/voltage_power_supply;
  if(Ualpha > 0) duty_cycle1B = duty1; else duty_cycle1A = duty1;
  if(Ubeta > 0) duty_cycle2B = duty2; else duty_cycle2A = duty2;
  _writeDutyCycle4PWM(duty_cycle1A, duty_cycle1B, duty_cycle2A, duty_cycle2B, params);
}

Still a nested ternary, but easier for a dumb compiler to see the repetition, and not that many wasted comparisons in any case.

The lag estimate is such a sprawl that it should be split onto two lines anyway, and then can do like this to avoid the issue.

      if(_isset(axis_inductance.q)) {
        voltage.d = -current_sp*shaft_velocity*pole_pairs*axis_inductance.q;
        voltage.d = _constrain( voltage.d, -voltage_limit, voltage_limit) + feed_forward_voltage.d
      } else
        voltage.d = feed_forward_voltage.d;

Or pull it out into a separate function since it's the same in estimated_current and dc_current.

The similar inductance calculations in foc_current mode could also be split onto two lines like that, but I've never understood why those should be there in the first place. The effect of phase lag is that some of the should-be q current ends up in the d axis, but since the current PIDs are already adjusting the voltage to correct for it, modifying d voltage afterward should overcorrect and cause field weakening instead.

There are some instances of two arithmetic operations in some hardware-specific driver code like this:

  int pwm_h = _constrain(val-dead_time/2,0,255);
  int pwm_l = _constrain(val+dead_time/2,0,255);

But if we want to accommodate braindead compilers then we should be doing like this anyway:

  dead_time /= 2;
  int pwm_h = _constrain(val-dead_time,0,255);
  int pwm_l = _constrain(val+dead_time,0,255);

jaguilar · 2026-05-27T17:43:34Z

You could definitely adjust all the callsites to pass variables only. IMO it violates the principle of least surprise that something that looks like a function reevaluated its arguments. It’s ugly to have to work around this by extracting things into variables. Functions implemented as macros are universally limited this way in C and C++. I also dont think that having library functions implemented properly can rightly be called “bloated.” What makes software hard to understand is the complexity of the interface. This new constrain actually decreases the interface’s complexity. Because now something that looks like a function behaves like one. (This is the osterhaut a philosophy of software design way of looking at things.)

…

On Wed, May 27, 2026 at 10:27 AM dekutree64 ***@***.***> wrote: *dekutree64* left a comment (simplefoc/Arduino-FOC#529) <#529 (comment)> Personally I'd rather not take that first step onto the slippery slope toward the horror that is "modern C++". The builtins are a tantalizing treat for an optimization nut like myself, but SimpleFOC is getting too bloated as it is, so I'd rather push back toward minimalism and readability for people to learn how FOC works rather than pure focus on performance. My first question is, why is the compiler not optimizing it in the first place? The only moderately-complex expressions I see on a find-in-files of _constrain are the lag estimate _constrain( -current_sp*shaft_velocity*pole_pairs*axis_inductance.q, -voltage_limit, voltage_limit) and stepper setPwm functions _constrain(abs(Ualpha)/voltage_power_supply,0.0f,1.0f). The rest have at most one arithmetic operation, so if a compiler can't see that opportunity, this would only be a drop in the bucket of micro-optimization to be done. The setPwm functions are wasteful anyway, doing two constrains when the voltage limit *should* (#536 <#536>) always be below voltage_power_supply anyway. So I'd refactor to something like this: void StepperDriver4PWM::setPwm(float Ualpha, float Ubeta) { float duty_cycle1A(0.0f),duty_cycle1B(0.0f),duty_cycle2A(0.0f),duty_cycle2B(0.0f); float limit = min(voltage_limit, voltage_power_supply); // Sanity check. voltage_limit should always be less than voltage_power_supply float duty1 = min(abs(Ualpha), limit)/voltage_power_supply; float duty2 = min(abs(Ubeta), limit)/voltage_power_supply; if(Ualpha > 0) duty_cycle1B = duty1; else duty_cycle1A = duty1; if(Ubeta > 0) duty_cycle2B = duty2; else duty_cycle2A = duty2; _writeDutyCycle4PWM(duty_cycle1A, duty_cycle1B, duty_cycle2A, duty_cycle2B, params); } Still a nested ternary, but easier for a dumb compiler to see the repetition, and not that many wasted comparisons in any case. The lag estimate is such a sprawl that it should be split onto two lines anyway, and then can do like this to avoid the issue. if(_isset(axis_inductance.q)) { voltage.d = -current_sp*shaft_velocity*pole_pairs*axis_inductance.q; voltage.d = _constrain( voltage.d, -voltage_limit, voltage_limit) + feed_forward_voltage.d } else voltage.d = feed_forward_voltage.d; Or pull it out into a separate function since it's the same in estimated_current and dc_current. The similar inductance calculations in foc_current mode could also be split onto two lines like that, but I've never understood why those should be there in the first place. The effect of phase lag is that some of the should-be q current ends up in the d axis, but since the current PIDs are already adjusting the voltage to correct for it, modifying d voltage afterward should overcorrect and cause field weakening instead. There are some instances of two arithmetic operations in some hardware-specific driver code like this: int pwm_h = _constrain(val-dead_time/2,0,255); int pwm_l = _constrain(val+dead_time/2,0,255); But if we want to accommodate braindead compilers then we should be doing like this anyway: dead_time /= 2; int pwm_h = _constrain(val-dead_time,0,255); int pwm_l = _constrain(val+dead_time,0,255); — Reply to this email directly, view it on GitHub <#529?email_source=notifications&email_token=AAGDGTAZIPYZE2RMBARH3VL444JPNA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJVGY2DQOJRHE4KM4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2KYZTPN52GK4S7MNWGSY3L#issuecomment-4556489198>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGDGTH33BQ5GPUUVVXZS63444JPNAVCNFSM6AAAAACYGKAYAOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DKNJWGQ4DSMJZHA> . Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS <https://github.com/notifications/mobile/ios/AAGDGTF4MEHP73LN4QHNBC3444JPNA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJVGY2DQOJRHE4KM4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2KUZTPN52GK4S7NFXXG> and Android <https://github.com/notifications/mobile/android/AAGDGTAFV5KKJPQUK4XOVH3444JPNA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJVGY2DQOJRHE4KM4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2K4ZTPN52GK4S7MFXGI4TPNFSA>. Download it today! You are receiving this because you authored the thread.Message ID: ***@***.***>

dekutree64 · 2026-05-27T19:25:35Z

Yeah, if it won't even optimize a single operation in the first macro argument then we should probably just do it your way. It is proper, but nonetheless costly in terms of comprehensibility and maintenance. I've been writing C++ for 27 years and even I don't know what typename std::enable_if<std::is_same<T, float>::value, long>::type does, so I hate to subject newbie programmers to it on a function that's used frequently throughout the code and thus very frustrating if you don't understand how it works. And if there's ever an error involving those functions it will be far more difficult to track down through the sprawl of template error messages. And since there are so few occurrences in the code where it really matters, I'd rather kick the can down the road rather than paying the price upfront and continuously from now on.

jaguilar · 2026-05-27T19:33:26Z

It is definitely unfortunate that we have to do it this way. More recent versions of C++ have a simpler way of saying this but in Arduino we can only rely on C++11. In modern C++ I’d just replace this with std::clamp. What that template does is evaluate the first argument (is_same::value — are these two things the same type) and then if so it becomes the type of the second argument. (If the first argument returns false the value inference fails and the template is dropped from overload consideration. ) That being said, I understand not everyone needs this to perform the way I need it to. If you guys don’t want to take up this change it’s fine with me. I’ll keep it in my fork. :)

…

On Wed, May 27, 2026 at 1:25 PM dekutree64 ***@***.***> wrote: *dekutree64* left a comment (simplefoc/Arduino-FOC#529) <#529 (comment)> Yeah, if it won't even optimize a single operation in the first macro argument then we should probably just do it your way. It is proper, but nonetheless costly in terms of comprehensibility and maintenance. I've been writing C++ for 27 years and even I don't know what typename std::enable_if<std::is_same<T, float>::value, long>::type does, so I hate to subject newbie programmers to it on a function that's used frequently throughout the code and thus very frustrating if you don't understand how it works. And if there's ever an error involving those functions it will be far more difficult to track down through the sprawl of template error messages. And since there are so few occurrences in the code where it really matters, I'd rather kick the can down the road rather than paying the price upfront and continuously from now on. — Reply to this email directly, view it on GitHub <#529?email_source=notifications&email_token=AAGDGTEQYDQO5GCVK365J2T4446MLA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJVG44TINZZGM4KM4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2KYZTPN52GK4S7MNWGSY3L#issuecomment-4557947938>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGDGTFT6CKZ5USSHBMHDL34446MLAVCNFSM6AAAAACYGKAYAOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DKNJXHE2DOOJTHA> . Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS <https://github.com/notifications/mobile/ios/AAGDGTHMNHURBMSWNUXZWJ34446MLA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJVG44TINZZGM4KM4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2KUZTPN52GK4S7NFXXG> and Android <https://github.com/notifications/mobile/android/AAGDGTFDITAMHMTRZ4UOPCL4446MLA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJVG44TINZZGM4KM4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2K4ZTPN52GK4S7MFXGI4TPNFSA>. Download it today! You are receiving this because you authored the thread.Message ID: ***@***.***>

jaguilar force-pushed the opt branch 2 times, most recently from 0dfd803 to 920d0ae Compare April 30, 2026 14:17

Avoid reevaluating arguments.

309c294

_round, _constrain and _sign currently reevaluate their arguments, which wastes CPU cycles. In torque/foc_current mode, this fix saves about 300ns/foc loop (about 2% of total CPU usage). In other modes, especially estimated_current, it saves more.

jaguilar force-pushed the opt branch from 920d0ae to 309c294 Compare April 30, 2026 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Avoid reevaluating arguments in _round, _constrain, _sign. - #529

Avoid reevaluating arguments in _round, _constrain, _sign.#529
jaguilar wants to merge 1 commit into
simplefoc:devfrom
jaguilar:opt

jaguilar commented Apr 25, 2026

Uh oh!

runger1101001 commented Apr 29, 2026

Uh oh!

jaguilar commented Apr 29, 2026

Uh oh!

jaguilar commented Apr 29, 2026 •

edited

Loading

Uh oh!

runger1101001 commented Apr 29, 2026 •

edited

Loading

Uh oh!

jaguilar commented Apr 29, 2026 •

edited

Loading

Uh oh!

jaguilar commented Apr 30, 2026

Uh oh!

jaguilar commented Apr 30, 2026

Uh oh!

dekutree64 commented May 27, 2026

Uh oh!

jaguilar commented May 27, 2026 via email

Uh oh!

dekutree64 commented May 27, 2026

Uh oh!

jaguilar commented May 27, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

jaguilar commented Apr 25, 2026

Description

Type of change

How Has This Been Tested?

Uh oh!

runger1101001 commented Apr 29, 2026

Uh oh!

jaguilar commented Apr 29, 2026

Uh oh!

jaguilar commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

runger1101001 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jaguilar commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jaguilar commented Apr 30, 2026

Uh oh!

jaguilar commented Apr 30, 2026

Uh oh!

dekutree64 commented May 27, 2026

Uh oh!

jaguilar commented May 27, 2026 via email

Uh oh!

dekutree64 commented May 27, 2026

Uh oh!

jaguilar commented May 27, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jaguilar commented Apr 29, 2026 •

edited

Loading

runger1101001 commented Apr 29, 2026 •

edited

Loading

jaguilar commented Apr 29, 2026 •

edited

Loading