go: cmd/asm: ARM64 instructions FLDPQ and FSTPQ not supported

What version of Go are you using (go version)?

$ go version
go version go1.15 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build668753232=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Assemble the following ARM64 assembly file:

TEXT foo(SB),0,$0-0
        FLDPQ (R0), (F0, F1)
        FSTPQ (F0, F1), (R0)

What did you expect to see?

An object file with instructions that disassemble to the GNU-style instructions

   0:	ad400400 	ldp	q0, q1, [x0]
   4:	ad000400 	stp	q0, q1, [x0]
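As a cross-check of the expected output, both words can be derived from the A64 layout of LDP/STP (SIMD&FP) with the signed-offset addressing mode. The Go sketch below is illustrative only: `encodeLdpStpQ` is a hypothetical helper, not part of cmd/asm, and it covers only the 128-bit signed-offset form (no pre- or post-index).

```go
package main

import (
	"errors"
	"fmt"
)

// encodeLdpStpQ (hypothetical helper) returns the A64 encoding of
//   LDP Qt, Qt2, [Xn{, #offset}]   (load == true)
//   STP Qt, Qt2, [Xn{, #offset}]   (load == false)
// in the signed-offset form. The byte offset is stored as a 7-bit signed
// immediate scaled by the 16-byte register size, so it must be a multiple
// of 16 in the range [-1024, 1008].
func encodeLdpStpQ(load bool, rt, rt2, rn uint32, offset int) (uint32, error) {
	if rt > 31 || rt2 > 31 || rn > 31 {
		return 0, errors.New("register number out of range")
	}
	if offset%16 != 0 || offset < -1024 || offset > 1008 {
		return 0, fmt.Errorf("offset %d not encodable", offset)
	}
	imm7 := uint32(offset/16) & 0x7f // two's-complement 7-bit field
	enc := uint32(0b10)<<30 | // opc = 10: 128-bit Q registers
		uint32(0b101)<<27 |
		1<<26 | // V = 1: SIMD&FP registers
		uint32(0b010)<<23 // signed-offset addressing mode
	if load {
		enc |= 1 << 22 // L = 1: load
	}
	enc |= imm7<<15 | rt2<<10 | rn<<5 | rt
	return enc, nil
}

func main() {
	ldp, _ := encodeLdpStpQ(true, 0, 1, 0, 0)
	stp, _ := encodeLdpStpQ(false, 0, 1, 0, 0)
	fmt.Printf("%08x ldp q0, q1, [x0]\n", ldp) // ad400400
	fmt.Printf("%08x stp q0, q1, [x0]\n", stp) // ad000400
}
```

The only difference between the two words is the L bit (bit 22), which is why the pair differs by 0x00400000.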

What did you see instead?

$ go tool asm test.s
test.s:2: unrecognized instruction "FLDPQ"
test.s:3: unrecognized instruction "FSTPQ"
asm: assembly of test.s failed

It appears that only FLDPS and FLDPD are supported. Going through the instruction lists, no FP instructions operating on 128-bit (Q) registers seem to be supported at all. It would be great to have them added, as I need them for a project.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 40 (16 by maintainers)

Most upvoted comments

Thanks, will do so. I’ll proceed and prototype the algorithm in the GNU assembler so I can tell you which instructions exactly I’m going to need.

Some more valid (I hope) instructions that don’t encode:

VLD1R.P 1(R1), [V9.B8] // rejected, while the invalid displacement 8 is accepted
FMOVD $0x8040201008040201, F10 // won't generate a literal pool load, won't encode
VCMTST V8.B8, V9.B8, V9.B8 // unknown instruction
VSUB V10.B8, V9.B8, V10.B8 // invalid rrr, VADD works

There’s probably a bunch more. Right now, almost every single SIMD instruction I want to use is broken in the assembler. This is not a sustainable state of affairs.
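For the VLD1R.P case, the rule the assembler would have to enforce is that the immediate post-index of LD1R equals the size of the single element that is loaded and replicated, regardless of whether the 8-byte or 16-byte arrangement is used. A minimal Go sketch of that check, assuming Go's arrangement suffixes (B8, H4, S2, D1, ...); `checkLD1RPostIndex` is a hypothetical helper, and the register post-index form (`(R1)(R2)`-style) is not covered.

```go
package main

import "fmt"

// ld1rElemBytes maps Go arm64 arrangement suffixes to element sizes in bytes.
var ld1rElemBytes = map[string]int{
	"B8": 1, "B16": 1,
	"H4": 2, "H8": 2,
	"S2": 4, "S4": 4,
	"D1": 8, "D2": 8,
}

// checkLD1RPostIndex (hypothetical) reports whether imm is the post-increment
// allowed for the immediate post-indexed "LD1R {Vt.<T>}, [Xn], #imm":
// one element is read, so imm must equal the element size.
func checkLD1RPostIndex(arrangement string, imm int) error {
	size, ok := ld1rElemBytes[arrangement]
	if !ok {
		return fmt.Errorf("unknown arrangement %q", arrangement)
	}
	if imm != size {
		return fmt.Errorf("LD1R.%s: post-index immediate must be %d, not %d", arrangement, size, imm)
	}
	return nil
}

func main() {
	fmt.Println(checkLD1RPostIndex("B8", 1)) // <nil>, as in VLD1R.P 1(R1), [V9.B8]
	fmt.Println(checkLD1RPostIndex("B8", 8)) // error: 8 is the vector size, not the element size
}
```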

Please don’t take this as a criticism, but as an observer of a number of requests of this kind, I can say that the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation for why requests for “all XXX instructions” are unsuccessful, but I encourage you to list precisely the instructions you would like to see added, as there is anecdotal evidence that requests formed this way are resolved faster.

@clausecker The patch is ready and is under internal review. I will submit it as soon as possible. Thank you.

@clausecker We will do it. Thank you.

@zhangfannie I’m not sure what you mean. The idea was to have the literal-loading form of FMOVQ be

    FMOVQ $foo, $bar, F0

where foo and bar will be placed in a nearby literal pool such that they are adjacent. This then compiles to something like

    FMOVQ pool(PC), F0
    ...
pool:
    DWORD $foo
    DWORD $bar

Or perhaps I misunderstood something here?
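Whatever the Go syntax, the literal-pool form ends up as an LDR (literal, SIMD&FP) instruction whose signed 19-bit word offset points at the pool entry, which limits the pool to roughly ±1 MiB from the load. A minimal Go sketch of that encoding follows; `encodeLdrLiteral` and the layout in `main` (instruction at offset 0, 16-byte pool entry aligned at offset 16) are hypothetical, and which of `$foo`/`$bar` ends up at the lower address is a choice the thread does not settle.

```go
package main

import (
	"errors"
	"fmt"
)

// opc values for LDR (literal, SIMD&FP).
const (
	sizeS = 0b00 // LDR St, label (32-bit)
	sizeD = 0b01 // LDR Dt, label (64-bit)
	sizeQ = 0b10 // LDR Qt, label (128-bit)
)

// encodeLdrLiteral (hypothetical) encodes "LDR <St|Dt|Qt>, label", where
// pcrel is the byte distance from the LDR to the literal. It is stored as
// a signed 19-bit word count, so it must be a multiple of 4 in
// [-1 MiB, 1 MiB - 4].
func encodeLdrLiteral(opc, rt uint32, pcrel int64) (uint32, error) {
	if opc > sizeQ || rt > 31 {
		return 0, errors.New("bad size or register")
	}
	if pcrel%4 != 0 || pcrel < -(1<<20) || pcrel > (1<<20)-4 {
		return 0, fmt.Errorf("literal at %d is out of range", pcrel)
	}
	imm19 := uint32(pcrel/4) & 0x7ffff
	return opc<<30 | 0b011<<27 | 1<<26 | imm19<<5 | rt, nil
}

func main() {
	// Hypothetical layout: FMOVQ at offset 0, RET at offset 4, and the
	// pool entry (DWORD $foo; DWORD $bar) aligned to offset 16.
	enc, err := encodeLdrLiteral(sizeQ, 0, 16)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%08x\n", enc) // 9c000080: ldr q0, <pc+16>
}
```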

@cherrymui @clausecker Thank you for the solution. I will post a patch that supports writing a 128-bit immediate as two immediates.

@fuzxxl You are welcome. We will enable the assembler’s support for these instructions ASAP. 🙂

The instruction I want is listed in the section “LDR (literal, SIMD&FP)” of the ARM reference:

LDR <St>, label
LDR <Dt>, label
LDR <Qt>, label

Here, label is a PC-relative address referring to a literal in a nearby literal pool. These instructions should therefore be translated to Go syntax as

FMOVS $imm32, Fr
FMOVD $imm64, Fr
FMOVQ $imm128, Fr

The assembler needs to place the immediate in a nearby literal pool and substitute its address.

Of course, the assembler is free to use the “FMOV (scalar, immediate)” variant if the immediate fits. Similarly, it should translate these to movi or mvni if the immediate is of the appropriate form.

The desired usage is to load a bit mask into a SIMD register:

FMOVD $0x8040201008040201, F0

Perhaps the mnemonics MOVS, MOVD, and MOVQ would indeed be better here, as the instructions do not directly deal with floating-point numbers.
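The fallback order described in this comment (FMOV immediate if the value fits, then MOVI, else a literal pool) can be sketched in Go as below. The helper names are hypothetical, the constant is treated as a raw 64-bit pattern, and only the 64-bit byte-mask form of MOVI is checked; the shifted MOVI/MVNI forms are ignored. It illustrates why 0x8040201008040201 needs the literal pool: its low 48 bits are non-zero, so it is not an FMOV immediate, and its bytes are neither 0x00 nor 0xff, so it is not a MOVI byte mask.

```go
package main

import "fmt"

// fmovImm8 reports whether the float64 bit pattern can be produced by
// "FMOV Dd, #imm" (the values generated by VFPExpandImm): the low 48
// fraction bits are zero and the exponent is NOT(b):bbbbbbbb:cd.
func fmovImm8(bits uint64) (uint8, bool) {
	if bits&(1<<48-1) != 0 {
		return 0, false
	}
	b := (bits >> 54) & 1
	rep := (bits >> 54) & 0xff // exponent bits 61..54 must all equal b
	if (b == 1 && rep != 0xff) || (b == 0 && rep != 0) {
		return 0, false
	}
	if (bits>>62)&1 == b { // exponent bit 62 must be NOT(b)
		return 0, false
	}
	imm8 := (bits>>63)<<7 | b<<6 | ((bits>>52)&3)<<4 | (bits>>48)&0xf
	return uint8(imm8), true
}

// moviByteMask reports whether bits fits "MOVI Dd, #imm", where every byte
// is 0x00 or 0xff; the 8-bit immediate has one bit per byte.
func moviByteMask(bits uint64) (uint8, bool) {
	var imm8 uint8
	for i := 0; i < 8; i++ {
		switch byte(bits >> (8 * i)) {
		case 0x00:
		case 0xff:
			imm8 |= 1 << i
		default:
			return 0, false
		}
	}
	return imm8, true
}

// choose picks a strategy for loading bits into a D register.
func choose(bits uint64) string {
	if _, ok := fmovImm8(bits); ok {
		return "fmov"
	}
	if _, ok := moviByteMask(bits); ok {
		return "movi"
	}
	return "literal pool"
}

func main() {
	for _, v := range []uint64{
		0x3ff0000000000000, // 1.0
		0x00ff00ff00ff00ff, // byte mask
		0x8040201008040201, // bit mask from this thread
	} {
		fmt.Printf("0x%016x -> %s\n", v, choose(v))
	}
}
```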

@fuzxxl We will enable the above instructions ASAP. 🙂

So, I’ve now sketched out a prototype using the UNIX assembler. The following new instructions are required:

  • VEOR (possibly already implemented)
  • VBIT
  • VBSL
  • VLD1 (possibly already implemented)
  • VLD1.P (might already be implemented)
  • FMOVD $imm64, Fr (translates to a load from a literal pool; having FMOVQ too would be nice)
  • VMOVI (seems like that one is already implemented)
  • VLD1R.P (implemented, but incorrectly; cf. example above)
  • VCMTST
  • VSUB (already implemented but defective)
  • VAND (possibly already implemented)
  • VUSHR (possibly already implemented)
  • VSHL (possibly already implemented)
  • UADDLV (possibly already implemented)
  • VMOV Vx.y[z], Vx.y[z] (possibly already implemented)
  • VUXTL
  • VUXTL2
  • VADD (possibly already implemented)
  • VST1 (possibly already implemented)
  • PRFM (possibly already implemented; nice to have)
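For reference when testing an implementation, the per-bit semantics of three of the new instructions in this list (VBSL, VBIT, VCMTST) can be modelled in plain Go on an 8-byte (B8) vector held in a uint64. This is a sketch for checking expected results, not assembler code; the function names are hypothetical, and operands follow the ARM manual's d/n/m naming rather than Go's reversed operand order.

```go
package main

import "fmt"

// bsl models BSL: bitwise select. Where a bit of d is 1 take n, else take m.
func bsl(d, n, m uint64) uint64 { return (d & n) | (^d & m) }

// bit models BIT: bitwise insert if true. Where a bit of m is 1 take n,
// else keep d.
func bit(d, n, m uint64) uint64 { return (d &^ m) | (n & m) }

// cmtst8b models CMTST on the 8B arrangement: each byte lane becomes 0xff
// if n & m has any bit set in that lane, else 0x00.
func cmtst8b(n, m uint64) uint64 {
	var r uint64
	for i := uint(0); i < 64; i += 8 {
		if (n>>i)&(m>>i)&0xff != 0 {
			r |= 0xff << i
		}
	}
	return r
}

func main() {
	fmt.Printf("%016x\n", bsl(0xff00ff00ff00ff00, 0x1111111111111111, 0x2222222222222222)) // 1122112211221122
	fmt.Printf("%016x\n", bit(0xaaaaaaaaaaaaaaaa, 0x5555555555555555, 0x00000000ffffffff)) // aaaaaaaa55555555
	fmt.Printf("%016x\n", cmtst8b(0x0100000000000080, 0x01000000000000ff))                 // ff000000000000ff
}
```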

Especially important is the literal-pool FMOVD $imm64, as I can’t replace it with a WORD directive or similar: there is no way to manipulate literal pools directly.

@zhangfannie Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start. I have not written the code yet, so I can’t tell for sure what exactly it is going to involve. If you like, I can try to implement the algorithm using the GNU assembler and then tell you what instructions I used. How does that sound?

I found a bunch more issues with the assembler. Is it okay if I make a large issue just gathering all of them?