go: cmd/asm: ARM64 instructions FLDPQ and FSTPQ not supported
What version of Go are you using (go version)?
$ go version
go version go1.15 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build668753232=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Assemble the following ARM64 assembly file:
TEXT foo(SB),0,$0-0
FLDPQ (R0), (F0, F1)
FSTPQ (F0, F1), (R0)
What did you expect to see?
An object file with instructions that disassemble to the GNU-style instructions
0: ad400400 ldp q0, q1, [x0]
4: ad000400 stp q0, q1, [x0]
What did you see instead?
$ go tool asm test.s
test.s:2: unrecognized instruction "FLDPQ"
test.s:3: unrecognized instruction "FSTPQ"
asm: assembly of test.s failed
It appears that only FLDPS and FLDPD are supported. Going through the instruction lists, no FP instructions with a 128-bit operand size seem to be supported at all. It would be great to have them added as I need them for a project.
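Until the mnemonics exist, the two instructions can in principle be emitted as raw machine words. The following is a minimal sketch in Go, not part of the original report, that computes the A64 "LDP/STP (SIMD&FP), signed offset" encoding for Q registers and prints it as a WORD directive. The helper name encodePairQ is hypothetical; the bit layout is taken from the ARM reference and reproduces the two encodings expected above.

```go
package main

import "fmt"

// encodePairQ is a hypothetical helper. It returns the A64 encoding of
// LDP (load=true) or STP (load=false) for a pair of 128-bit Q registers
// using the signed-offset addressing mode [Xn, #offset].
func encodePairQ(load bool, rt, rt2, rn uint32, offset int32) (uint32, error) {
	if rt > 31 || rt2 > 31 || rn > 31 {
		return 0, fmt.Errorf("register number out of range")
	}
	// imm7 is a signed 7-bit field scaled by 16 bytes for Q registers.
	if offset%16 != 0 || offset < -1024 || offset > 1008 {
		return 0, fmt.Errorf("offset %d is not encodable", offset)
	}
	imm7 := uint32(offset/16) & 0x7f

	enc := uint32(0xAD000000) // opc=10 (128-bit), V=1, signed-offset form
	if load {
		enc |= 1 << 22 // L bit selects LDP over STP
	}
	enc |= imm7<<15 | rt2<<10 | rn<<5 | rt
	return enc, nil
}

func main() {
	ld, _ := encodePairQ(true, 0, 1, 0, 0)
	st, _ := encodePairQ(false, 0, 1, 0, 0)
	fmt.Printf("WORD $0x%08x // ldp q0, q1, [x0]\n", ld)
	fmt.Printf("WORD $0x%08x // stp q0, q1, [x0]\n", st)
}
```

The printed lines can be pasted into the assembly file in place of the unsupported mnemonics. This only works for instructions whose operands are all encoded in the instruction word itself.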
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 40 (16 by maintainers)
Commits related to this issue
- Add ARM64 prototype for Count8 This prototype performs at 2.25 GB/s on a Raspberry Pi 3B and is written for the GNU assembler. I was unable to write it directly for the Go assembler as critical defe... — committed to clausecker/pospop by clausecker 4 years ago
- cmd/internal/obj/arm64: enable some SIMD instructions Enable VBSL, VBIT, VCMTST, VUXTL VUXTL2 and FMOVQ SIMD instructions required by the issue #40725. And FMOVQ instrucion is used to move a large co... — committed to golang/go by zhangfannie 4 years ago
- cmd/asm: add more SIMD instructions on arm64 This CL adds USHLL, USHLL2, UZP1, UZP2, and BIF instructions requested by #40725. And since UXTL* are aliases of USHLL*, this CL also merges them into one... — committed to golang/go by lijunchen 4 years ago
- cmd/asm: fix the issue of moving 128-bit integers to vector registers on arm64 The CL 249758 added `FMOVQ $vcon, Vd` instruction and assembler used 128-bit simd literal-loading to load `$vcon` from p... — committed to golang/go by zhangfannie 4 years ago
- cmd/asm: add several arm64 SIMD instructions This patch enables VSLI, VUADDW(2), VUSRA and FMOVQ SIMD instructions required by the issue #40725. And the GNU syntax of 'FMOVQ' is 128-bit ldr/str(immed... — committed to golang/go by zhangfannie 4 years ago
- Make use of newly added instructions With 3089ef6, the Go project finally supports VUSRA, FMOVQ, VUADDW, and VUADDW2. Make use of them. Unfortunately, not all addressing modes seem to be supported ... — committed to clausecker/pospop by clausecker 4 years ago
Thanks, will do so. I’ll proceed and prototype the algorithm in the GNU assembler so I can tell you which instructions exactly I’m going to need.
Some more valid (I hope) instructions that don’t encode:
There’s probably a bunch more. Right now, almost every single SIMD instruction I want to use is broken in the assembler. This is not a sustainable state of affairs.
Please don’t take this as a criticism, but as an observer of a number of requests of this kind, the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation why requests for all XXX instructions are unsuccessful, but encourage you to list precisely the instructions you would like to see added as there is anecdotal evidence that requests formed in this way are resolved faster.
@clausecker The patch is ready and is under internal review. I will submit it as soon as possible. Thank you.
@clausecker We will do it. Thank you.
@zhangfannie I’m not sure what you mean. The idea was to have the literal-loading form of FMOVQ be
where foo and bar will be placed in a nearby literal pool such that they are adjacent. This then compiles to something like
Or perhaps I misunderstood something here?
@cherrymui @clausecker Thank you for the solution. I will post a patch to write 128 bit with two immediates.
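To illustrate what "two immediates" would mean for the literal pool, here is a minimal Go sketch that lays out a 128-bit constant from two 64-bit halves. The function name literal128 and the choice that the first argument is the low half are assumptions for illustration; the thread does not specify the operand order. ARM64 under Go is little-endian, so the low half occupies the lower addresses.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// literal128 is a hypothetical helper. It returns the 16 bytes that a
// little-endian literal pool would hold for a 128-bit constant given
// as a low and a high 64-bit half (operand order is an assumption).
func literal128(lo, hi uint64) [16]byte {
	var b [16]byte
	binary.LittleEndian.PutUint64(b[0:8], lo)
	binary.LittleEndian.PutUint64(b[8:16], hi)
	return b
}

func main() {
	b := literal128(0x0706050403020100, 0x0f0e0d0c0b0a0908)
	// Print the pool bytes in address order.
	fmt.Printf("% x\n", b[:])
}
```

A single 128-bit load from the start of these 16 bytes then fills the whole vector register, which is why the two halves have to be adjacent in the pool.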
@fuzxxl You are welcome. We will enable the assembler’s support for these instructions ASAP. 🙂
The instruction I want is listed in the section “LDR (literal, SIMD&FP)” of the ARM reference:
here, label has a PC-relative addressing mode referring to a literal in a nearby literal pool. These instructions should therefore be translated to Go syntax as
The assembler needs to place the immediate in a nearby literal pool and substitute its address.
Of course, the assembler is free to use the “FMOV (scalar, immediate)” variant if the immediate fits. Similarly, it should translate these to movi or mvni if the immediate is of the appropriate form.
The desired usage is to load a bit mask into a SIMD register:
Perhaps the mnemonics MOVS, MOVD, and MOVQ might indeed be better here as the instructions do not directly deal with floating point numbers.
@fuzxxl We will enable the above instructions ASAP. 🙂
So, I’ve now sketched out a prototype using the UNIX assembler. The following new instructions are required:
- VEOR (possibly already implemented)
- VBIT
- VBSL
- VLD1 (possibly already implemented)
- VLD1.P (might already be implemented)
- FMOVD $imm64, Fr (translates to a load from a literal pool; having FMOVQ too would be nice)
- VMOVI (seems like that one is already implemented)
- VLD1R.P (implemented, but incorrectly; cf. example above)
- VCMTST
- VSUB (already implemented but defective)
- VAND (possibly already implemented)
- VUSHR (possibly already implemented)
- VSHL (possibly already implemented)
- UADDLV (possibly already implemented)
- VMOV Vx.y[z], Vx.y[z] (possibly already implemented)
- VUXTL
- VUXTL2
- VADD (possibly already implemented)
- VST1 (possibly already implemented)
- PRFM (possibly already implemented; nice to have)
Especially important is the literal-pool FMOVD $imm64 as I can’t replace that one with a WORD directive or similar due to the impossibility of manipulating literal pools directly.
@zhangfannie Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start. I have not written the code yet, so I can’t tell for sure what exactly it is going to involve. If you like, I can try to implement the algorithm using the GNU assembler and then tell you what instructions I used. How does that sound?
I found a bunch more issues with the assembler. Is it okay if I make a large issue just gathering all of them?