Right now this is the code the dispatcher runs on every call:
// xsimd/include/xsimd/config/xsimd_arch.hpp, lines 223 to 227 at 6b61b1a
template <class... Tys>
XSIMD_INLINE auto operator()(Tys&&... args) noexcept
{
    return walk_archs(ArchList {}, std::forward<Tys>(args)...);
}
There is some recursive template machinery, coupled with runtime checks on supported_arch::has().
The compiler potentially has to work through quite a few steps to prove that it can replace all of that with a direct jump (to be further investigated).
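To make the "on every call" part concrete, here is a minimal self-contained sketch of that kind of walk. It is not the xsimd code: `avx2_tag`, `sse4_2_tag`, `generic_tag`, `tag_list`, `runtime_support`, `tagged_kernel`, `walk` and `call_once` are all hypothetical names, and the CPU query is hard-coded to pretend that SSE4.2 is available and AVX2 is not. The counter only shows how many logical checks the walk performs; it says nothing about what an optimizer does with the real code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for the arch tags; not the real xsimd types.
struct avx2_tag {};
struct sse4_2_tag {};
struct generic_tag {};

template <class... Archs>
struct tag_list {};

// Hypothetical runtime CPU query that counts how often it is consulted.
// This sketch pretends the machine supports SSE4.2 but not AVX2.
struct runtime_support
{
    int checks = 0;
    bool has(avx2_tag) { ++checks; return false; }
    bool has(sse4_2_tag) { ++checks; return true; }
    bool has(generic_tag) { ++checks; return true; }
};

inline runtime_support& support()
{
    static runtime_support s;
    return s;
}

// Functor in the style of the docs: the arch tag is the first argument.
struct tagged_kernel
{
    int operator()(avx2_tag, int x) const { return 200 + x; }
    int operator()(sse4_2_tag, int x) const { return 100 + x; }
    int operator()(generic_tag, int x) const { return x; }
};

// Last candidate in the list: taken unconditionally as the fallback.
template <class F, class Arch, class... Args>
int walk(F const& f, tag_list<Arch>, Args&&... args)
{
    return f(Arch {}, std::forward<Args>(args)...);
}

// Otherwise: runtime check first, then recurse into the rest of the list.
template <class F, class Arch, class Next, class... Rest, class... Args>
int walk(F const& f, tag_list<Arch, Next, Rest...>, Args&&... args)
{
    if (support().has(Arch {}))
        return f(Arch {}, std::forward<Args>(args)...);
    return walk(f, tag_list<Next, Rest...> {}, std::forward<Args>(args)...);
}

// One dispatched call; the walk above is repeated every time this runs.
inline int call_once(int x)
{
    int r = walk(tagged_kernel {}, tag_list<avx2_tag, sse4_2_tag, generic_tag> {}, x);
    assert(r == 100 + x); // SSE4.2 is the best supported arch in this sketch
    return r;
}
```

In this sketch every `call_once` consults `has()` twice (AVX2 rejected, SSE4.2 accepted), so the number of checks grows linearly with the number of calls unless the compiler manages to fold them away.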
In Apache Arrow, we use an internal dispatch mechanism where we evaluate the available architectures once and store the resulting function pointer (during the dispatcher's static initialization).
Here, however, we do not have a single type shared by all candidate functions, because the first parameter, the arch, is a distinct struct for each of them.
// xsimd/include/xsimd/config/xsimd_arch.hpp, lines 210 to 211 at 6b61b1a
if (availables_archs.has(Arch {}))
    return functor(Arch {}, std::forward<Tys>(args)...);
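The type problem can be checked at compile time. The sketch below uses hypothetical tag structs (`sse4_2_tag`, `avx2_tag`) and hypothetical functions (`sum_tagged`, `sum_templated`, `call_both`) rather than the real xsimd types. It only illustrates that a tag in the parameter list ends up in the function pointer type, while a tag used purely as a template argument does not.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical arch tags; not the real xsimd types.
struct sse4_2_tag {};
struct avx2_tag {};

// Tag-dispatch style: the arch is the first function parameter,
// so it is part of each candidate's signature.
float sum_tagged(sse4_2_tag, float const* data, unsigned size);
float sum_tagged(avx2_tag, float const* data, unsigned size);

using tagged_sse_ptr = float (*)(sse4_2_tag, float const*, unsigned);
using tagged_avx_ptr = float (*)(avx2_tag, float const*, unsigned);

static_assert(!std::is_same<tagged_sse_ptr, tagged_avx_ptr>::value,
              "tagged candidates have distinct pointer types");

// Arch as a template parameter only: the signature no longer mentions it.
template <class Arch>
float sum_templated(float const* data, unsigned size)
{
    float acc = 0.f;
    for (unsigned i = 0; i < size; ++i)
        acc += data[i];
    return acc;
}

using common_ptr = float (*)(float const*, unsigned);

static_assert(std::is_same<decltype(&sum_templated<sse4_2_tag>), common_ptr>::value,
              "sse4_2 specialization fits the common pointer type");
static_assert(std::is_same<decltype(&sum_templated<avx2_tag>), common_ptr>::value,
              "avx2 specialization fits the common pointer type");

// A single pointer variable can be re-pointed at either specialization.
inline float call_both(float const* data, unsigned size)
{
    common_ptr p = &sum_templated<sse4_2_tag>;
    float a = p(data, size);
    p = &sum_templated<avx2_tag>;
    float b = p(data, size);
    assert(a == b); // both bodies are the same scalar loop in this sketch
    return a;
}
```

With the tagged form, storing "the selected candidate" would need some form of type erasure or a wrapper per arch; with the templated form a plain function pointer is enough.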
Example from the doc:
struct sum
{
    template <class Arch, class T>
    T operator()(Arch, T const* data, unsigned size);
};
Now the proposed breaking change would be to take a free function templated on the arch, rather than one taking the arch as a parameter.
I am not immediately sure how we'd handle T but I think we can manage.
template <class Arch>
float sum1(float const* data, unsigned size);
Or perhaps with a static method of a functor.
template <class Arch>
struct sum2
{
    template <class T>
    static T call(T const* data, unsigned size);
};
In both cases there is an underlying function pointer that can be directly stored (sum1<sse4_2>/sum1<avx2> or sum2<sse4_2>::call<float>/sum2<avx2>::call<float>).
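As a rough sketch of what storing that pointer could look like, assuming a `sum1`-style free function: everything besides `sum1` is hypothetical here (`avx2_tag`, `sse4_2_tag`, `generic_tag`, `tag_list`, `cpu_has`, `query_count`, `pick`, `dispatched_sum`), the kernel bodies are plain scalar loops, and the CPU query is hard-coded to "SSE4.2 yes, AVX2 no". It is not a proposal for the actual xsimd API, only an illustration of resolving once and calling through a pointer afterwards.

```cpp
#include <cassert>

// Hypothetical arch tags; not the real xsimd types.
struct avx2_tag {};
struct sse4_2_tag {};
struct generic_tag {};

template <class... Archs>
struct tag_list {};

// Counts how often the (hypothetical) runtime CPU query is consulted.
inline int& query_count()
{
    static int n = 0;
    return n;
}

// Hard-coded answers for the sketch: SSE4.2 available, AVX2 not.
inline bool cpu_has(avx2_tag) { ++query_count(); return false; }
inline bool cpu_has(sse4_2_tag) { ++query_count(); return true; }
inline bool cpu_has(generic_tag) { ++query_count(); return true; }

// Kernel templated on the arch, which does not appear in the signature.
// A real kernel would use arch-specific batches; this is a scalar placeholder.
template <class Arch>
float sum1(float const* data, unsigned size)
{
    float acc = 0.f;
    for (unsigned i = 0; i < size; ++i)
        acc += data[i];
    return acc;
}

using sum_fn = float (*)(float const*, unsigned);

// Last candidate: unconditional fallback.
template <class Arch>
sum_fn pick(tag_list<Arch>)
{
    return &sum1<Arch>;
}

// Otherwise: first supported arch in list order wins.
template <class Arch, class Next, class... Rest>
sum_fn pick(tag_list<Arch, Next, Rest...>)
{
    if (cpu_has(Arch {}))
        return &sum1<Arch>;
    return pick(tag_list<Next, Rest...> {});
}

inline float dispatched_sum(float const* data, unsigned size)
{
    // Function-local static: resolved on first call only
    // (initialization is thread-safe since C++11).
    static const sum_fn fn = pick(tag_list<avx2_tag, sse4_2_tag, generic_tag> {});
    assert(fn != nullptr);
    return fn(data, size);
}
```

Here the CPU query runs twice in total (AVX2, then SSE4.2) no matter how many times `dispatched_sum` is called; every later call is a single indirect call. The price is that the indirect call itself cannot be inlined, which is part of what looking at the generated assembly would have to weigh.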
@serge-sans-paille do you think it is worth investigating? I can try to find a bit more time to look at the current generated assembly.
CC @JohanMabille