Benchmark for computing the LCM of 1000 random numbers from 1 to 1,000,000:
6.19 ms ± 0.09 ms current_Python
6.14 ms ± 0.12 ms current_C
3.46 ms ± 0.16 ms proposal_Python
Code:
```python
def current_C(xs):
    return lcm(*xs)

def current_Python(xs):
    # re-implementation of the current C implementation
    res = 1
    for x in xs:
        res = lcm(res, x)
    return res

def proposal_Python(xs):
    res = 1
    for x in xs:
        res = lcm(x, res)
    return res
```
Full benchmark code (Attempt This Online!):
```python
from math import lcm
from random import randint
from timeit import repeat
from statistics import mean, stdev

def current_C(xs):
    return lcm(*xs)

def current_Python(xs):
    res = 1
    for x in xs:
        res = lcm(res, x)
    return res

def proposal_Python(xs):
    res = 1
    for x in xs:
        res = lcm(x, res)
    return res

funcs = current_C, current_Python, proposal_Python
times = {f: [] for f in funcs}

def stats(f):
    # Drop the first 10 rounds as warmup, report mean ± stdev in ms.
    ts = [t * 1e3 for t in times[f][10:]]
    return f'{mean(ts):4.2f} ms ± {stdev(ts):4.2f} ms '

for _ in range(20):
    xs = [randint(1, 10**6) for _ in range(1000)]
    for f in funcs:
        t = min(repeat(lambda: f(xs), number=1))
        times[f].append(t)

for f in sorted(funcs, key=stats, reverse=True):
    print(stats(f), f.__name__)
```
The only difference is calling `lcm(x, res)` instead of `lcm(res, x)`. When computing the LCM of many values, `res` tends to become larger than `x`, and then `lcm(x, res)` is faster than `lcm(res, x)`.
Why? Because `lcm(a, b)` is computed as `(a // g) * b`, where `g = gcd(a, b)`. Let A, B and G be the respective bit lengths, and let's pretend there's no Karatsuba (i.e., schoolbook arithmetic). Then `a // g` takes A·G time and results in a number with A−G bits. Multiplying that by `b` takes (A−G)·B time. So overall it's A·G + (A−G)·B = A·G + A·B − B·G = A·B + (A−B)·G time, meaning it's better to have B > A.
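This cost model can be sanity-checked numerically. The sketch below (my own illustration, not CPython code) evaluates the estimated bit-operation count A·G + (A−G)·B for both argument orders on a deliberately lopsided pair:

```python
from math import gcd

def lcm_cost(a, b):
    # Rough schoolbook-arithmetic cost of lcm(a, b) = (a // g) * b:
    # the division a // g costs ~A*G, the multiplication ~(A-G)*B,
    # where A, B, G are the bit lengths of a, b and g.
    g = gcd(a, b)
    A, B, G = a.bit_length(), b.bit_length(), g.bit_length()
    return A * G + (A - G) * B

big = (1 << 1000) + 1   # stand-in for a large accumulated res
small = 12345           # stand-in for a fresh input x

print(lcm_cost(big, small))   # big operand in the a slot: expensive
print(lcm_cost(small, big))   # big operand in the b slot: cheap
```

Since G is usually small for random inputs, both orders cost close to A·B, but the (A−B)·G term consistently favors putting the larger operand second; the gap between the two orders works out to exactly 2·(A−B)·G in this model.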
The simple optimization would be to swap the arguments here, in cpython/Modules/mathmodule.c (line 786 at 81bf10e):

```c
Py_SETREF(res, long_lcm(res, x));
```
In addition to this "likely we have `res > x`" rule of thumb, the two-argument `lcm` could check which operand is longer and swap them if `b` is shorter, and then call `gcd(b, a)` instead of `gcd(a, b)` so that `gcd` doesn't have to swap them right back.
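That ordering idea can be sketched in Python (a hypothetical illustration only; the actual change would go in the C `long_lcm`):

```python
from math import gcd

def lcm_ordered(a, b):
    # Sketch of the proposed two-argument lcm: make sure the longer
    # operand ends up in the b slot of (a // g) * b, and pass the
    # shorter one to gcd first so gcd needn't swap them back.
    a, b = abs(a), abs(b)
    if a == 0 or b == 0:
        return 0
    if a.bit_length() > b.bit_length():
        a, b = b, a
    return a // gcd(a, b) * b
```

The bit-length comparison is cheap relative to the division and multiplication, so the check costs essentially nothing even when the operands are already in the favorable order.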
Linked PRs