Avoid reevaluating arguments in _round, _constrain, _sign. #529
jaguilar wants to merge 1 commit into simplefoc:dev
Conversation
_round, _constrain and _sign currently reevaluate their arguments, which wastes CPU cycles. In torque/foc_current mode, this fix saves about 300 ns per FOC loop (about 2% of total FOC CPU usage). In other modes, especially estimated_current, it saves more.
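To make the reevaluation cost concrete, here is a minimal sketch rather than the library's actual code. `CONSTRAIN_MACRO`, `constrain_fn` and `expensive_reading` are hypothetical names invented for the illustration; the macro is only assumed to resemble the style of macro this PR replaces.

```cpp
// Macro in the style the PR replaces: `amt` appears up to three times in the expansion,
// so an expression passed as `amt` can be evaluated up to three times.
#define CONSTRAIN_MACRO(amt, low, high) \
    ((amt) < (low) ? (low) : ((amt) > (high) ? (high) : (amt)))

// Function-template form: the argument expression is evaluated once, at the call site.
template <typename T>
constexpr T constrain_fn(T amt, T low, T high) {
    return (amt < low) ? low : (amt > high) ? high : amt;
}

// Hypothetical stand-in for a costly expression; it counts how often it is evaluated.
static int g_calls = 0;
inline float expensive_reading() {
    ++g_calls;
    return 0.5f;  // lies inside [-1, 1], so the macro takes its longest path
}

// Number of evaluations of expensive_reading() when it is passed to the macro.
inline int calls_with_macro() {
    g_calls = 0;
    float r = CONSTRAIN_MACRO(expensive_reading(), -1.0f, 1.0f);
    (void)r;
    return g_calls;
}

// Number of evaluations of expensive_reading() when it is passed to the function.
inline int calls_with_function() {
    g_calls = 0;
    float r = constrain_fn(expensive_reading(), -1.0f, 1.0f);
    (void)r;
    return g_calls;
}
```

With an in-range value the macro evaluates the expression once per comparison and once more for the result, while the function evaluates it a single time. Any savings on real hardware depend on how costly the argument expression is.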
Nice. Would you have time to compare
I could try it. The point of the template function is to make sure that each expression is only evaluated once. So for example if you have _constrain(complicated_expr, low, high), you don't want to do if
Okay, replacing with builtins gives another 600ns speedup on a baseline of 9500ns for loopFOC (for position control mode in foc_current torque control). These implementations should be C++11 compatible. Let me know if it is acceptable and I will amend the commit. Note that I have no way of testing that this works across different compilers, so if you have a test suite to hand please feel welcome to run it.
#include <type_traits>
template<typename T>
constexpr inline int _sign(T val) {
    // Returns -1, 0 or +1. __builtin_signbit alone only reports whether the sign bit is set.
    return (val > T(0)) - (val < T(0));
}
#ifndef _round
template<typename T>
constexpr inline typename std::enable_if<std::is_same<T, double>::value, long>::type _round(T x) {
    return __builtin_round(x);
}
template<typename T>
constexpr inline typename std::enable_if<std::is_same<T, float>::value, long>::type _round(T x) {
    return __builtin_roundf(x);
}
#endif
// Use enable_if to select the fastest implementation according to the amt type.
// Using __builtin_fXf is measurably faster than using the ternary approach.
template<typename T, typename L, typename H>
constexpr inline typename std::enable_if<std::is_integral<T>::value, T>::type _constrain(T amt, L low, H high) {
    return (amt < low) ? low : (amt > high) ? high : amt;
}
template<typename T, typename L, typename H>
constexpr inline typename std::enable_if<std::is_same<T, float>::value, T>::type _constrain(T amt, L low, H high) {
    return __builtin_fmaxf(low, __builtin_fminf(high, amt));
}
template<typename T, typename L, typename H>
constexpr inline typename std::enable_if<std::is_same<T, double>::value, T>::type _constrain(T amt, L low, H high) {
    return __builtin_fmax(low, __builtin_fmin(high, amt));
}
Wow, thank you so much for trying it - like I thought, even better :-) Personally I’m very tempted to do it like that… what do you think? Although it is very cool, I think I’d leave out the template magic that selects the fastest one. But that is also up for discussion.
I don't think we can, because if anyone is compiling without -ffast-math (which is probably a lot of people because it's not on by default), the __builtin_fmax/_fmin/_round calls would add double promotion and software double rounding. (Will await your agreement or further feedback on this point before continuing.)
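The promotion concern can be checked at compile time. The sketch below assumes GCC or Clang (the `__builtin_*` functions are compiler-specific) and uses a hypothetical helper name, `constrain_float`; it illustrates the point and is not the code proposed in the PR.

```cpp
#include <type_traits>

// The unsuffixed builtins take and return double, so float arguments are promoted
// and the comparison is carried out in double precision.
static_assert(std::is_same<decltype(__builtin_fmax(1.0f, 2.0f)), double>::value,
              "__builtin_fmax operates on double");
static_assert(std::is_same<decltype(__builtin_round(1.5f)), double>::value,
              "__builtin_round operates on double");

// The f-suffixed builtins stay in single precision.
static_assert(std::is_same<decltype(__builtin_fmaxf(1.0f, 2.0f)), float>::value,
              "__builtin_fmaxf operates on float");
static_assert(std::is_same<decltype(__builtin_roundf(1.5f)), float>::value,
              "__builtin_roundf operates on float");

// Hypothetical float-only clamp that never leaves single precision.
inline float constrain_float(float amt, float low, float high) {
    return __builtin_fmaxf(low, __builtin_fminf(high, amt));
}
```

On a core whose FPU handles only single precision, keeping float inputs on the `f`-suffixed builtins is what avoids the double-precision path described above, which is the reason given for selecting the implementation by type.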
Type of change
How Has This Been Tested?
I tested this on my STM32G474 FOC bench setup.
Test Configuration/Setup: