Speed up complex.pow when the exponent is 2.0 or 0.5 (#23237) - Nim - This repository contains the Nim compiler, Nim's stdlib, tools, and documentation. (mirror)

author	Angel Ezquerra <AngelEzquerra@users.noreply.github.com>	2024-01-20 06:39:49 +0100
committer	GitHub <noreply@github.com>	2024-01-20 06:39:49 +0100
commit	83f2708909e0c59b84f61c2721907ff896285dce (patch)
tree	b361b53fbc60223d29c984002d48ce9cede45e28 /lib/pure/algorithm.nim
parent	720021908db1f6622c1ebcdad60dff5c2740a80b (diff)

Speed up complex.pow when the exponent is 2.0 or 0.5 (#23237)

This PR speeds up the calculation of the power of a complex number when
the exponent is 2.0 or 0.5 (i.e. the square and the square root of a
complex number). These are probably two of the most common exponents, if
not the two most common. According to my measurements (using the timeit
library), the speedup when the exponent is 2.0 or 0.5 is roughly 5.6x to
7x, while there is no measurable difference for other exponents.
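
To illustrate the idea, here is a minimal sketch of special-casing these two exponents. It is not the stdlib diff: `powSketch` is a hypothetical wrapper name chosen for this example, and it simply falls back to the existing `pow` for every other exponent.

```nim
import std/complex

# Hypothetical helper for illustration only, not the actual stdlib change.
func powSketch[T: SomeFloat](x, y: Complex[T]): Complex[T] =
  if y.im == 0.0 and y.re == 2.0:
    # Squaring needs only a few multiplications and additions.
    x * x
  elif y.im == 0.0 and y.re == 0.5:
    # Use the dedicated complex square root.
    sqrt(x)
  else:
    # Every other exponent takes the general path.
    pow(x, y)

let z = complex(3.0, 4.0)
echo powSketch(z, complex(2.0))
echo powSketch(z, complex(0.5))
```

For `z = 3 + 4i` the square is `-7 + 24i` and the principal square root is `2 + i`, so the two fast paths are easy to check by hand (up to floating-point rounding).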

For the record, this is the function I used to measure the performance:

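```nim
import std/[complex, sugar]  # sugar provides `collect`
import timeit                # third-party benchmarking package

proc calculatePows(v: seq[Complex64], factor: Complex64): seq[Complex64] {.noinit, discardable.} =
  ## Raises every element of `v` to the power `factor`.
  result = newSeq[Complex64](v.len)
  for n in 0 ..< v.len:
    result[n] = pow(v[n], factor)

# Benchmark input: 1000 complex numbers with zero imaginary part.
let v: seq[Complex64] = collect:
  for n in 0 ..< 1000:
    complex(float(n))

echo timeGo(calculatePows(v, complex(1.5)))  # exponent without a fast path
echo timeGo(calculatePows(v, complex(0.5)))  # square root
echo timeGo(calculatePows(v, complex(2.0)))  # square
```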

With the original code, this printed (for exponents 1.5, 0.5 and 2.0, in that order):

> [177μs 857.03ns] ± [1μs 234.85ns] per loop (mean ± std. dev. of 7
runs, 1000 loops each)
> [128μs 217.92ns] ± [1μs 630.93ns] per loop (mean ± std. dev. of 7
runs, 1000 loops each)
> [136μs 220.16ns] ± [3μs 475.56ns] per loop (mean ± std. dev. of 7
runs, 1000 loops each)

With the improved code, it printed:

> [176μs 884.30ns] ± [1μs 307.30ns] per loop (mean ± std. dev. of 7
runs, 1000 loops each)
> [23μs 160.79ns] ± [340.18ns] per loop (mean ± std. dev. of 7 runs,
10000 loops each)
> [19μs 93.29ns] ± [1μs 128.92ns] per loop (mean ± std. dev. of 7 runs,
10000 loops each)

That is, the new optimized path is 5.6 times (23 vs 128 μs per loop) to
7.16 times (19 vs 136 μs per loop) faster, while the non-optimized path
takes the same time as the original code.
