go: cmd/asm: ARM64 instructions FLDPQ and FSTPQ not supported

What version of Go are you using (`go version`)?

$ go version
go version go1.15 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build668753232=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Assemble the following ARM64 assembly file:

TEXT foo(SB),0,$0-0
        FLDPQ (R0), (F0, F1)
        FSTPQ (F0, F1), (R0)

What did you expect to see?

An object file whose instructions disassemble to the following GNU-style instructions:

   0:	ad400400 	ldp	q0, q1, [x0]
   4:	ad000400 	stp	q0, q1, [x0]

What did you see instead?

$ go tool asm test.s
test.s:2: unrecognized instruction "FLDPQ"
test.s:3: unrecognized instruction "FSTPQ"
asm: assembly of test.s failed

It appears that only FLDPS and FLDPD are supported. Going through the instruction lists, no FP instructions with a 128-bit operand size seem to be supported at all. It would be great to have them added, as I need them for a project.
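A possible stopgap while the mnemonics are missing is to emit the raw instruction words with WORD directives. The Go sketch below computes the encoding of the signed-offset form of LDP/STP for 128-bit SIMD&FP registers and prints matching WORD lines. The helper name `pairQ` and its signature are made up for illustration and are not part of cmd/asm; the bit layout is my reading of the A64 "LDP (SIMD&FP)" and "STP (SIMD&FP)" encodings, so treat it as a sketch rather than a reference.

```go
package main

import (
	"errors"
	"fmt"
)

// pairQ returns the A64 encoding of LDP/STP Qt, Qt2, [Xn, #offset]
// (SIMD&FP register pair, signed-offset form, 128-bit registers).
// Hypothetical helper for illustration; not a cmd/asm API.
func pairQ(load bool, rt, rt2, rn uint32, offset int32) (uint32, error) {
	if rt > 31 || rt2 > 31 || rn > 31 {
		return 0, errors.New("register number out of range")
	}
	// The 7-bit signed immediate is scaled by the 16-byte register size.
	if offset%16 != 0 || offset < -1024 || offset > 1008 {
		return 0, errors.New("offset must be a multiple of 16 in [-1024, 1008]")
	}
	imm7 := uint32(offset/16) & 0x7f
	word := uint32(0xad000000) // opc=10, V=1, signed-offset addressing
	if load {
		word |= 1 << 22 // the L bit selects LDP over STP
	}
	return word | imm7<<15 | rt2<<10 | rn<<5 | rt, nil
}

func main() {
	ldp, _ := pairQ(true, 0, 1, 0, 0)
	stp, _ := pairQ(false, 0, 1, 0, 0)
	fmt.Printf("WORD $0x%08x // ldp q0, q1, [x0]\n", ldp)
	fmt.Printf("WORD $0x%08x // stp q0, q1, [x0]\n", stp)
}
```

For the two instructions in this report, the computed words are 0xad400400 and 0xad000400, the same values shown in the expected disassembly.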

About this issue

State: closed
Created 4 years ago
Reactions: 1
Comments: 40 (16 by maintainers)

Commits related to this issue

Add ARM64 prototype for Count8. This prototype performs at 2.25 GB/s on a Raspberry Pi 3B and is written for the GNU assembler. I was unable to write it directly for the Go assembler as critical defe... (committed to clausecker/pospop by clausecker 4 years ago)
cmd/internal/obj/arm64: enable some SIMD instructions. Enable VBSL, VBIT, VCMTST, VUXTL VUXTL2 and FMOVQ SIMD instructions required by the issue #40725. And FMOVQ instruction is used to move a large co... (committed to golang/go by zhangfannie 4 years ago)
cmd/asm: add more SIMD instructions on arm64. This CL adds USHLL, USHLL2, UZP1, UZP2, and BIF instructions requested by #40725. And since UXTL* are aliases of USHLL*, this CL also merges them into one... (committed to golang/go by lijunchen 4 years ago)
cmd/asm: fix the issue of moving 128-bit integers to vector registers on arm64. The CL 249758 added `FMOVQ $vcon, Vd` instruction and assembler used 128-bit simd literal-loading to load `$vcon` from p... (committed to golang/go by zhangfannie 4 years ago)
cmd/asm: add several arm64 SIMD instructions. This patch enables VSLI, VUADDW(2), VUSRA and FMOVQ SIMD instructions required by the issue #40725. And the GNU syntax of 'FMOVQ' is 128-bit ldr/str(immed... (committed to golang/go by zhangfannie 4 years ago)
Make use of newly added instructions. With 3089ef6, the Go project finally supports VUSRA, FMOVQ, VUADDW, and VUADDW2. Make use of them. Unfortunately, not all addressing modes seem to be supported ... (committed to clausecker/pospop by clausecker 4 years ago)

Most upvoted comments

Thanks, will do so. I’ll proceed and prototype the algorithm in the GNU assembler so I can tell you which instructions exactly I’m going to need.

clausecker on Aug 14, 2020

Some more valid (I hope) instructions that don’t encode:

VLD1R.P 1(R1), [V9.B8] // displacement 8 is incorrectly accepted
FMOVD $0x8040201008040201, F10 // won't generate a literal pool load, won't encode
VCMTST V8.B8, V9.B8, V9.B8 // unknown instruction
VSUB V10.B8, V9.B8, V10.B8 // invalid rrr, VADD works

There’s probably a bunch more. Right now, almost every single SIMD instruction I want to use is broken in the assembler. This is not a sustainable state of affairs.

clausecker on Aug 12, 2020

Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start.

Please don’t take this as a criticism, but as an observer of a number of requests of this class, the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation for why requests for all XXX instructions are unsuccessful, but I encourage you to list precisely the instructions you would like to see added, as there is anecdotal evidence that requests formed in this way are resolved faster.

davecheney on Aug 14, 2020

@clausecker The patch is ready and is under internal review. I will submit it as soon as possible. Thank you.

zhangfannie on Sep 29, 2020

@clausecker We will do it. Thank you.

zhangfannie on Sep 18, 2020

@zhangfannie I’m not sure what you mean. The idea was to have the literal-loading form of FMOVQ be

    FMOVQ $foo, $bar, F0

where foo and bar will be placed in a nearby literal pool such that they are adjacent. This then compiles to something like

    FMOVQ pool(PC), F0
    ...
pool:
    DWORD $foo
    DWORD $bar

Or perhaps I misunderstood something here?

clausecker on Sep 15, 2020
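To make the proposal in the comment above concrete, here is a rough Go sketch of what an assembler would have to produce for a two-immediate FMOVQ: a 16-byte pool entry with the two halves adjacent, and an "LDR (literal, SIMD&FP)" word pointing at it. The function names are hypothetical. Putting the first immediate at the lower address is an assumption taken from the DWORD order in the comment; it is not confirmed cmd/asm behaviour.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ldrQLiteral encodes LDR Qt, label, where the literal sits byteOffset
// bytes away from the instruction. Hypothetical helper for illustration.
func ldrQLiteral(rt uint32, byteOffset int32) (uint32, error) {
	if rt > 31 {
		return 0, errors.New("register number out of range")
	}
	// imm19 counts 4-byte words, giving a reach of roughly +/-1 MiB.
	if byteOffset%4 != 0 || byteOffset < -(1<<20) || byteOffset >= 1<<20 {
		return 0, errors.New("literal must be word-aligned and within +/-1 MiB")
	}
	imm19 := uint32(byteOffset/4) & 0x7ffff
	return 0x9c000000 | imm19<<5 | rt, nil // opc=10 selects the 128-bit form
}

// poolEntry lays out the two 64-bit halves adjacently in little-endian
// order, with the first immediate at the lower address (an assumption).
func poolEntry(first, second uint64) [16]byte {
	var b [16]byte
	binary.LittleEndian.PutUint64(b[:8], first)
	binary.LittleEndian.PutUint64(b[8:], second)
	return b
}

func main() {
	word, err := ldrQLiteral(0, 8) // pool placed 8 bytes after the load
	if err != nil {
		panic(err)
	}
	fmt.Printf("instruction: %#08x\n", word)
	fmt.Printf("pool bytes:  % x\n", poolEntry(0x0102030405060708, 0x1112131415161718))
}
```

The offset check also shows why the pool has to be "nearby": the literal must lie within about 1 MiB of the load.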

@cherrymui @clausecker Thank you for the solution. I will post a patch to write 128 bit with two immediates.

zhangfannie on Sep 15, 2020

@fuzxxl You are welcome. We will enable the assembler’s support for these instructions ASAP. 🙂

zhangfannie on Aug 26, 2020

The instruction I want is listed in the section “LDR (literal, SIMD&FP)” of the ARM reference:

LDR <St>, label
LDR <Dt>, label
LDR <Qt>, label

here, label has a PC-relative addressing mode referring to a literal in a nearby literal pool. These instructions should therefore be translated to Go syntax as

FMOVS $imm32, Fr
FMOVD $imm64, Fr
FMOVQ $imm128, Fr

The assembler needs to place the immediate in a nearby literal pool and substitute its address.

Of course, the assembler is free to use the “FMOV (scalar, immediate)” variant if the immediate fits. Similarly, it should translate these to movi or mvni if the immediate is of the appropriate form.

The desired usage is to load a bit mask into a SIMD register:

FMOVD $0x8040201008040201, F0

Perhaps the mnemonics MOVS, MOVD, and MOVQ might indeed be better here as the instructions do not directly deal with floating point numbers.

clausecker on Aug 20, 2020
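The choice described in the comment above (use an immediate form when the constant fits, otherwise fall back to a literal pool) can be sketched in Go roughly as follows. The function names are hypothetical, and only two of the possible shortcuts are checked: the 8-bit "FMOV (scalar, immediate)" form for doubles and the 64-bit byte-mask form of MOVI. MOVI and MVNI have further forms that this sketch ignores.

```go
package main

import "fmt"

// fmovImm8 reports whether the 64-bit pattern v can be produced by
// FMOV Dd, #imm, and returns the 8-bit immediate field if so.
// Hypothetical helper for illustration; not a cmd/asm API.
func fmovImm8(v uint64) (uint8, bool) {
	if v&(1<<48-1) != 0 {
		return 0, false // the low 48 fraction bits must be zero
	}
	// Bits 62..54 must be NOT(b) followed by eight copies of b.
	var b uint64
	switch (v >> 54) & 0x1ff {
	case 0x0ff:
		b = 1
	case 0x100:
		b = 0
	default:
		return 0, false
	}
	sign := v >> 63
	return uint8(sign<<7 | b<<6 | (v>>48)&0x3f), true
}

// moviByteMask reports whether every byte of v is 0x00 or 0xff, which
// is the shape accepted by the 64-bit byte-mask variant of MOVI.
func moviByteMask(v uint64) bool {
	for i := 0; i < 8; i++ {
		if b := byte(v >> (8 * i)); b != 0x00 && b != 0xff {
			return false
		}
	}
	return true
}

func main() {
	const mask = 0x8040201008040201
	_, fits := fmovImm8(mask)
	fmt.Println("FMOV immediate:", fits, "MOVI byte mask:", moviByteMask(mask))
}
```

The bit mask from the comment passes neither check, which is consistent with the point that it can only be loaded from a literal pool.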

@fuzxxl We will enable the above instructions ASAP. 🙂

zhangfannie on Aug 19, 2020

So, I’ve now sketched out a prototype using the UNIX assembler. The following new instructions are required:

VEOR (possibly already implemented)
VBIT
VBSL
VLD1 (possibly already implemented)
VLD1.P (might already be implemented)
FMOVD $imm64, Fr (translates to a load from a literal pool; having FMOVQ too would be nice)
VMOVI (seems like that one is already implemented)
VLD1R.P (implemented, but incorrectly; cf. example above)
VCMTST
VSUB (already implemented but defective)
VAND (possibly already implemented)
VUSHR (possibly already implemented)
VSHL (possibly already implemented)
UADDLV (possibly already implemented)
VMOV Vx.y[z], Vx.y[z] (possibly already implemented)
VUXTL
VUXTL2
VADD (possibly already implemented)
VST1 (possibly already implemented)
PRFM (possibly already implemented; nice to have)

Especially important is the literal-pool FMOVD $imm64 as I can’t replace that one with a WORD directive or similar due to the impossibility of manipulating literal pools directly.

clausecker on Aug 19, 2020

@zhangfannie Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start. I have not written the code yet, so I can’t tell for sure what exactly it is going to involve. If you like, I can try to implement the algorithm using the GNU assembler and then tell you what instructions I used. How does that sound?

clausecker on Aug 13, 2020

I found a bunch more issues with the assembler. Is it okay if I make a large issue just gathering all of them?