Benchmark for computing the LCM of 1000 random numbers from 1 to 1,000,000:
6.19 ms ± 0.09 ms current_Python
6.14 ms ± 0.12 ms current_C
3.46 ms ± 0.16 ms proposal_Python
Code:
```python
def current_C(xs):
    return lcm(*xs)

def current_Python(xs):
    # re-implementation of the current C implementation
    res = 1
    for x in xs:
        res = lcm(res, x)
    return res

def proposal_Python(xs):
    res = 1
    for x in xs:
        res = lcm(x, res)
    return res
```
Full benchmark code (Attempt This Online!):
```python
from math import lcm
from random import randint
from timeit import repeat
from statistics import mean, stdev

def current_C(xs):
    return lcm(*xs)

def current_Python(xs):
    res = 1
    for x in xs:
        res = lcm(res, x)
    return res

def proposal_Python(xs):
    res = 1
    for x in xs:
        res = lcm(x, res)
    return res

funcs = current_C, current_Python, proposal_Python
times = {f: [] for f in funcs}

def stats(f):
    # Drop the first 10 rounds as warmup, report mean ± stdev in ms.
    ts = [t * 1e3 for t in times[f][10:]]
    return f'{mean(ts):4.2f} ms ± {stdev(ts):4.2f} ms '

for _ in range(20):
    xs = [randint(1, 10**6) for _ in range(1000)]
    for f in funcs:
        t = min(repeat(lambda: f(xs), number=1))
        times[f].append(t)

for f in sorted(funcs, key=stats, reverse=True):
    print(stats(f), f.__name__)
```
The only difference is calling `lcm(x, res)` instead of `lcm(res, x)`. When computing the LCM of many values, `res` tends to become larger than `x`, and then `lcm(x, res)` is faster than `lcm(res, x)`.
Why? Because `lcm(a, b)` is computed as `(a // g) * b`, where `g = gcd(a, b)`. Let A, B and G be the respective bit lengths, and let's pretend there's no Karatsuba (i.e., schoolbook arithmetic). Then `a // g` takes A·G time and results in a number with A−G bits. Multiplying that by `b` takes (A−G)·B time. So overall it's A·G + (A−G)·B = A·G + A·B − B·G = A·B + (A−B)·G time, meaning it's better to have B > A.
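This cost model can be sanity-checked numerically. The sketch below (my own illustration, not CPython code) evaluates the estimated bit-operation count A·G + (A−G)·B for both argument orders on a deliberately lopsided pair:

```python
from math import gcd

def lcm_cost(a, b):
    # Rough schoolbook-arithmetic cost of lcm(a, b) = (a // g) * b:
    # the division a // g costs ~A*G, the multiplication ~(A-G)*B,
    # where A, B, G are the bit lengths of a, b and g.
    g = gcd(a, b)
    A, B, G = a.bit_length(), b.bit_length(), g.bit_length()
    return A * G + (A - G) * B

big = (1 << 1000) + 1   # stand-in for a large accumulated res
small = 12345           # stand-in for a fresh input x

print(lcm_cost(big, small))   # big operand in the a slot: expensive
print(lcm_cost(small, big))   # big operand in the b slot: cheap
```

Since G is usually small for random inputs, both orders cost close to A·B, but the (A−B)·G term consistently favors putting the larger operand second; the gap between the two orders works out to exactly 2·(A−B)·G in this model.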
The simple optimization would be to swap the arguments here, in cpython/Modules/mathmodule.c (line 786 at 81bf10e):

```c
Py_SETREF(res, long_lcm(res, x));
```
In addition to this "likely we have `res > x`" rule of thumb, the two-argument `lcm` could check which operand is longer and swap them if `b` is shorter, and then call `gcd(b, a)` instead of `gcd(a, b)` so that `gcd` doesn't have to swap them right back.
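That ordering idea can be sketched in Python (a hypothetical illustration only; the actual change would go in the C `long_lcm`):

```python
from math import gcd

def lcm_ordered(a, b):
    # Sketch of the proposed two-argument lcm: make sure the longer
    # operand ends up in the b slot of (a // g) * b, and pass the
    # shorter one to gcd first so gcd needn't swap them back.
    a, b = abs(a), abs(b)
    if a == 0 or b == 0:
        return 0
    if a.bit_length() > b.bit_length():
        a, b = b, a
    return a // gcd(a, b) * b
```

The bit-length comparison is cheap relative to the division and multiplication, so the check costs essentially nothing even when the operands are already in the favorable order.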
Linked PRs