diff options
author | Kartik Agaram <vc@akkartik.com> | 2020-10-03 23:17:17 -0700 |
---|---|---|
committer | Kartik Agaram <vc@akkartik.com> | 2020-10-03 23:17:17 -0700 |
commit | faeeef34f549ad00ede2136ffabdee9368f02dbe (patch) | |
tree | b349bbf5a832ead2b6252a29ba771115fe1f897f | |
parent | 2f5a8496e64c063796ea5daf6381701d7cd873f1 (diff) | |
download | mu-faeeef34f549ad00ede2136ffabdee9368f02dbe.tar.gz |
6941 - detective story for the day
-rw-r--r-- | mu.md | 5 | ||||
-rw-r--r-- | x86_approx.md | 114 |
2 files changed, 119 insertions, 0 deletions
diff --git a/mu.md b/mu.md index 6892df9d..c3b456bf 100644 --- a/mu.md +++ b/mu.md @@ -263,6 +263,11 @@ var/xreg <- max *var2/reg2 Remember, when these instructions use indirect mode, they still use an integer register. Floating-point registers can't hold addresses. +Two instructions in the above list are approximate. According to the Intel +Manual, `reciprocal` and `inverse-square-root` [go off the rails around the +fourth decimal place](x86_approx.md). If you need more precision, use `divide` +separately. + Most instructions operate exclusively on integer or floating-point operands. The only exceptions are the instructions for converting between integers and floating-point numbers. diff --git a/x86_approx.md b/x86_approx.md new file mode 100644 index 00000000..d26f1bdd --- /dev/null +++ b/x86_approx.md @@ -0,0 +1,114 @@ +# How approximate is Intel's floating-point reciprocal instruction? + +2020/10/03 + +Here's a test Mu program that prints out the bits for 0.5: + + ``` + fn main -> r/ebx: int { + var two/eax: int <- copy 2 + var half/xmm0: float <- convert two + half <- reciprocal half + var mem: float + copy-to mem, half + var out/eax: int <- reinterpret mem + print-int32-hex 0, out + print-string 0, "\n" + r <- copy 0 + } + ``` + +It gives different results when emulated and run natively: + + ``` + $ ./translate_mu_debug x.mu # debug mode = error checking + $ ./bootstrap run a.elf + 0x3f000000 + $ ./a.elf + 0x3efff000 + ``` + +I spent some time digging into this before I realized it wasn't a bug in Mu, +just an artifact of the emulator not actually using the `reciprocal` instruction. +Here's a procedure you can follow along with to convince yourself. + +Start with this program (good.c): + + ```c + #include<stdio.h> + int main(void) { + int n = 2; + float f = 1.0/n; + printf("%f\n", f); + return 0; + } + ``` + +It works as you'd expect (compiling unoptimized to actually compute the +division): + + ``` + $ gcc good.c + $ ./a.out + 0.5 + ``` + +Let's look at its Assembly: + + ``` + $ gcc -S good.c + ``` + +The generated `good.s` has a lot of stuff that doesn't interest us, surrounding +these lines: + + ```asm + ; destination + movl $2, -8(%rbp) + cvtsi2sd -8(%rbp), %xmm0 + movsd .LC0(%rip), %xmm1 + divsd %xmm0, %xmm1 + movapd %xmm1, %xmm0 + ``` + +This fragment converts `2` into floating-point and then divides 1.0 (the +constant `.LC0`) by it, leaving the result in register `xmm0`. + +There's a way to get gcc to emit the `rcpss` instruction using intrinsics, but +I don't know how to do it, so I'll modify the generated Assembly directly: + + ```diff + movl $2, -8(%rbp) + < cvtsi2sd -8(%rbp), %xmm0 + < movsd .LC0(%rip), %xmm1 + < divsd %xmm0, %xmm1 + < movapd %xmm1, %xmm0 + --- + > cvtsi2ss -8(%rbp), %xmm0 + > rcpss %xmm0, %xmm0 + > movss %xmm0, -4(%rbp) + ``` + +Let's compare the result of both versions: + + ``` + $ gcc good.s + $ ./a.out + 0.5 + $ gcc good.modified.s + $ ./a.out + 0.499878 + ``` + +Whoa! + +Reading the Intel manual more closely, it guarantees that the relative error +of `rcpss` is less than `1.5*2^-12`, and indeed 12 bits puts us squarely in +the fourth decimal place. + +Among the x86 instructions Mu supports, two are described in the Intel manual +as "approximate": `reciprocal` (`rcpss`) and `inverse-square-root` (`rsqrtss`). +Intel introduced these instructions as part of its SSE expansion in 1999. When +it upgraded SSE to SSE2 (in 2000), most of its single-precision floating-point +instructions got upgraded to double-precision — but not these two. So +they seem to be an evolutionary dead-end. |