Question:

What is the simplest way to work with big-endian values in RISC-V at the assembly language level? That is, how to load a big-endian value from memory into a register, work with the register value in native-endian (little-endian), then store it back into memory in big-endian. 16, 32 and 64 bit values are used in many network protocols and file formats.

I couldn't find a byte-swap instruction (equivalent to BSWAP on x86 or REV on ARM) in the manual, nor anything about big-endian loads and stores.

Answer 1:

There is no mention of a byte-swap instruction in the latest RISC-V User-Level ISA Manual (version 2.1). However, the manual has a placeholder for “B” Standard Extension for Bit Manipulation. Some draft materials from that extension's working group are collected on GitHub. In particular, the draft specification talks about a grev instruction (generalized reverse) that can do 16, 32 and 64-bit byte-swaps:

This instruction provides a single hardware instruction that can implement all of byte-order swap, bitwise reversal, short-order-swap, word-order-swap (RV64), nibble-order swap, bitwise reversal in a byte, etc, all from a single hardware instruction. It takes in a single register value and an immediate that controls which function occurs, through controlling the levels in the recursive tree at which reversals occur.
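To make the "levels in the recursive tree" idea concrete, here is a rough C model of a 32-bit generalized reverse. It is only a sketch of the behaviour described in the draft, not hardware-accurate code, and the function name grev32 is mine. Each bit of the control value enables one swap stage: adjacent bits, bit pairs, nibbles, bytes, half-words.

```c
#include <stdint.h>

/* Sketch of a 32-bit generalized reverse (software model only).
   Each set bit in the low 5 bits of ctl enables one swap stage. */
static uint32_t grev32(uint32_t x, uint32_t ctl)
{
    unsigned k = ctl & 31;
    if (k &  1) x = ((x & 0x55555555u) <<  1) | ((x & 0xAAAAAAAAu) >>  1); /* adjacent bits */
    if (k &  2) x = ((x & 0x33333333u) <<  2) | ((x & 0xCCCCCCCCu) >>  2); /* bit pairs     */
    if (k &  4) x = ((x & 0x0F0F0F0Fu) <<  4) | ((x & 0xF0F0F0F0u) >>  4); /* nibbles       */
    if (k &  8) x = ((x & 0x00FF00FFu) <<  8) | ((x & 0xFF00FF00u) >>  8); /* bytes         */
    if (k & 16) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16); /* half-words    */
    return x;
}
```

With this model, a control value of 24 (byte and half-word stages) gives a 32-bit byte swap, 31 gives a full bit reversal, and 16 swaps the two half-words.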

~~The extension B working group was "dissolved for bureaucratic reasons in November 2017" before they could finalize the spec.~~

In 2020 the working group is active again, posting their work at the linked GitHub repo.

As a result, there currently doesn't seem to be anything simpler than doing the usual shift-mask-or dance. I couldn't find any assembly language bswap intrinsic in the GCC or clang riscv ports. As an example, here's a disassembly of the __bswapsi2 function (which byte-swaps a 32-bit value) emitted by the riscv64-linux-gnu-gcc compiler version 8.1.0-12:

000000000000068a <__bswapsi2>:
 68a:   0185169b                slliw   a3,a0,0x18
 68e:   0185579b                srliw   a5,a0,0x18
 692:   8fd5                    or      a5,a5,a3
 694:   66c1                    lui     a3,0x10
 696:   4085571b                sraiw   a4,a0,0x8
 69a:   f0068693                addi    a3,a3,-256 # ff00 <__global_pointer$+0xd6a8>
 69e:   8f75                    and     a4,a4,a3
 6a0:   8fd9                    or      a5,a5,a4
 6a2:   0085151b                slliw   a0,a0,0x8
 6a6:   00ff0737                lui     a4,0xff0
 6aa:   8d79                    and     a0,a0,a4
 6ac:   8d5d                    or      a0,a0,a5
 6ae:   2501                    sext.w  a0,a0
 6b0:   8082                    ret
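For readers who find C easier to follow than the disassembly, the same shift-mask-or pattern can be sketched like this. The function name bswap32_manual is just for illustration; this is not the libgcc source, only an equivalent of what the instructions above compute.

```c
#include <stdint.h>

/* Shift-mask-or byte swap of a 32-bit value (illustrative sketch). */
static uint32_t bswap32_manual(uint32_t x)
{
    return (x << 24)                  /* byte 0 -> byte 3 */
         | ((x << 8) & 0x00FF0000u)   /* byte 1 -> byte 2 */
         | ((x >> 8) & 0x0000FF00u)   /* byte 2 -> byte 1 */
         | (x >> 24);                 /* byte 3 -> byte 0 */
}
```

The two masks correspond to the constants 0xff0000 and 0xff00 that the lui/addi instructions build in the listing.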

Answer 2:

The RISC-V ISA has no explicit byte swapping instructions. Your best bet is to use a C builtin to perform this calculation, which in GCC land would be something like __builtin_bswap32(). This gives the compiler the most information possible so it can make good decisions. With the current set of defined ISAs you'll almost certainly end up calling into a routine, but if a B extension is ever defined you will transparently get better generated code. The full set of defined builtins is available online: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

If you're stuck doing this in assembly, then your best bet is to call into an existing byte swap routine. The canonical one for a 32-bit swap is __bswapsi2, which is part of libgcc -- you're probably using that anyway, so it'll be around. That's what the compiler currently does so all you're losing is eliding the function call when there's a better implementation available.

As a concrete example, here's a small C function

unsigned swapb(unsigned in) { return __builtin_bswap32(in); }

and the generated assembly

swapb:
    addi    sp,sp,-16
    sd  ra,8(sp)
    call    __bswapsi2
    ld  ra,8(sp)
    sext.w  a0,a0
    addi    sp,sp,16
    jr  ra

Answer 3:

Note that while it's nice and convenient to have an instruction to do it, the __bswapsi2 function used in other answers will run at around 400 MB/s on a 1.5 GHz HiFive Unleashed, which is quite a lot faster than the gigE interface is ever going to move data around.

Even on the HiFive1 running at the default 256 MHz it will do 60 MB/s and you've only got 16 KB of RAM and a bunch of GPIOs that you're not going to wiggle at more than a few MHz or maybe 10s of MHz.

I'm on the BitManipulation working group. The full GREV instruction needs a fair bit of hardware (something close to a multiplier) so small microcontrollers might never include it. However we're planning to use the same GREVI opcodes that give full word bit reversal and byte order reversal and implement them as simpler special-case instructions that don't need much circuitry and hopefully everyone will include them.

Answer 4:

Unlike x86, RISC-V doesn't have something like movbe (which can load and byte-swap in one instruction).

Thus, on RISC-V you load/store as usual and after/before the load/store you have to swap the bytes with extra instructions.
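In C that load-then-swap and swap-then-store pattern might look roughly like the following. The helper names load_be32 and store_be32 are made up for this sketch, and it assumes GCC or clang (for __builtin_bswap32 and the __BYTE_ORDER__ macros).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit big-endian field from memory. */
static uint32_t load_be32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);       /* ordinary native-endian load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);      /* swap after the load */
#endif
    return v;
}

/* Hypothetical helper: write a 32-bit value as a big-endian field. */
static void store_be32(void *p, uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);      /* swap before the store */
#endif
    memcpy(p, &v, sizeof v);       /* ordinary native-endian store */
}
```

Between the two calls the value sits in a register in native byte order, so normal arithmetic works on it. The memcpy calls are there to avoid alignment and aliasing problems; compilers typically turn them into a single load or store where the target allows it.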

The RISC-V "B" (Bitmanip) extension (version 0.92) contains generalized bit reverse instructions (grev, grevi) and several pseudo-instructions that you could use for byte swapping:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RISC-V    ARM      X86      Comment
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
rev       RBIT     ☐        bit reverse
rev8.h    REV16    ☐        byte-reverse half-word (lower 16 bit)
rev8.w    REV32    ☐        byte-reverse word (lower 32 bit)
rev8      REV      BSWAP    byte-reverse whole register
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

(Table based on Table 2.5, RISC-V Bitmanip Extension V0.92, page 18)

As of 2020-03, the "B" extension has draft status, thus support in hardware and emulators is limited.

Without the "B" extension you have to implement the byte swapping with several base instructions. See for example page 16 in the "B" specification or look at the disassembled code of the __builtin_bswap16, __builtin_bswap32 and __builtin_bswap64 gcc/clang intrinsics.
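If you want to inspect that disassembly yourself, a tiny file like the one below should be enough to compile and feed to objdump. The wrapper names swap16, swap32 and swap64 are just placeholders; only the three builtins come from GCC/clang.

```c
#include <stdint.h>

/* Thin wrappers so each builtin shows up as its own function
   in the disassembly (wrapper names are placeholders). */
uint16_t swap16(uint16_t x) { return __builtin_bswap16(x); }
uint32_t swap32(uint32_t x) { return __builtin_bswap32(x); }
uint64_t swap64(uint64_t x) { return __builtin_bswap64(x); }
```

Whether the compiler expands these inline or calls a library routine depends on the compiler version, the optimization level and the target ISA string, so the output may differ from what other answers show.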