Commit 939f7535, authored by Sam Gross, committed by Naoki Shibata

Fix FMA4 detection (#262)

FMA4 support is reported in bit 16 of register ECX, not EDX, of the
"extended processor info" CPUID leaf (0x80000001).

The mapping of registers to reg is:

  reg[0] = eax
  reg[1] = ebx
  reg[2] = ecx <---
  reg[3] = edx

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely
supported. Intel CPUs do not set this bit. Checking EDX instead of ECX
therefore causes "Illegal instruction" errors on AMD CPUs that do not
support FMA4.
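
As an illustration only, the difference between the two checks can be
sketched in C as below. It assumes reg[] already holds the result of
CPUID leaf 0x80000001 in the order given above. The names cpuid_bit,
has_fma4 and has_fma4_buggy are hypothetical and are not taken from the
SLEEF sources.

```c
#include <stdint.h>

/* Indices into the CPUID result array, matching the mapping above. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

/* Hypothetical helper: returns 1 if the given bit of reg[index] is set. */
static int cpuid_bit(const uint32_t reg[4], int index, int bit) {
    return (int)((reg[index] >> bit) & 1u);
}

/* Intended check: FMA4 is bit 16 of ECX for leaf 0x80000001. */
static int has_fma4(const uint32_t reg[4]) {
    return cpuid_bit(reg, REG_ECX, 16);
}

/* Faulty check: bit 16 of EDX is PAT on AMD CPUs, not FMA4. */
static int has_fma4_buggy(const uint32_t reg[4]) {
    return cpuid_bit(reg, REG_EDX, 16);
}
```

With a register set where only bit 16 of EDX is set (PAT present, FMA4
absent), has_fma4_buggy reports FMA4 while has_fma4 does not.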

See https://github.com/pytorch/pytorch/issues/12112
See https://github.com/shibatch/sleef/issues/261

http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)
parent 1be36545