What is the simplest way to work with big-endian values in RISC-V at the assembly language level? That is, how to load a big-endian value from memory into a register, work with the register value in native-endian (little-endian), then store it back into memory in big-endian. 16, 32 and 64 bit values are used in many network protocols and file formats.
I couldn't find a byte-swap instruction (equivalent to BSWAP on x86 or REV on ARM) in the manual, nor anything about big-endian loads and stores.
There is no mention of a byte-swap instruction in the latest RISC-V User-Level ISA Manual (version 2.1). However, the manual has a placeholder for “B” Standard Extension for Bit Manipulation. Some draft materials from that extension's working group are collected on GitHub. In particular, the draft specification talks about a grev
instruction (generalized reverse) that can do 16, 32 and 64-bit byte-swaps:
This instruction provides a single hardware instruction that can implement all of byte-order swap, bitwise reversal, short-order-swap, word-order-swap (RV64), nibble-order swap, bitwise reversal in a byte, etc, all from a single hardware instruction. It takes in a single register value and an immediate that controls which function occurs, through controlling the levels in the recursive tree at which reversals occur.
The extension B working group was "dissolved for bureaucratic reasons in November 2017" before they could finalize the spec.
In 2020 the working group is active again, posting their work at the linked GitHub repo.
As a result, there currently doesn't seem to be anything simpler than doing the usual shift-mask-or dance. I couldn't find any assembly language bswap intrinsic in the GCC or clang riscv ports. As an example, here's a disassembly of the bswapsi2
function (which byte-swaps a 32-bit value) emitted by the riscv64-linux-gnu-gcc
compiler version 8.1.0-12:
000000000000068a <__bswapsi2>:
68a: 0185169b slliw a3,a0,0x18
68e: 0185579b srliw a5,a0,0x18
692: 8fd5 or a5,a5,a3
694: 66c1 lui a3,0x10
696: 4085571b sraiw a4,a0,0x8
69a: f0068693 addi a3,a3,-256 # ff00 <__global_pointer$+0xd6a8>
69e: 8f75 and a4,a4,a3
6a0: 8fd9 or a5,a5,a4
6a2: 0085151b slliw a0,a0,0x8
6a6: 00ff0737 lui a4,0xff0
6aa: 8d79 and a0,a0,a4
6ac: 8d5d or a0,a0,a5
6ae: 2501 sext.w a0,a0
6b0: 8082 ret
The RISC-V ISA has no explicit byte swapping instructions. Your best bet is to use a C builtin to perform this calculation, which in GCC land would be something like __builtin_bswap32()
. This gives the compiler the most information possible so it can make good decisions. With the current set of defined ISAs you'll almost certainly end up calling into a routine, but if a B extension is ever defined you will transparently get better generated code. The full set of defined builtins is availiable online: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html .
If you're stuck doing this in assembly, then your best bet is to call into an existing byte swap routine. The canonical one for a 32-bit swap is __bswapsi2
, which is part of libgcc -- you're probably using that anyway, so it'll be around. That's what the compiler currently does so all you're losing is eliding the function call when there's a better implementation available.
As a concrete example, here's my example C function
unsigned swapb(unsigned in) { return __builtin_bswap32(in); }
and the generated assembly
swapb:
addi sp,sp,-16
sd ra,8(sp)
call __bswapsi2
ld ra,8(sp)
sext.w a0,a0
addi sp,sp,16
jr ra
Note that while it's nice a pretty and convenient to have an instruction to do it, the __bswapsi2 function used in other answers will run at around 400 MB/s on a 1.5 GHz HiFive Unleashed, which is quite a lot faster than the gigE interface is ever going to moved data around.
Even on the HiFive1 running at the default 256 MHz it will do 60 MB/s and you've only got 16 KB of RAM and a bunch of GPIOs that you're not going to wiggle at more than a few MHz or maybe 10s of MHz.
I'm on the BitManipulation working group. The full GREV instruction needs a fair bit of hardware (something close to a multiplier) so small microcontrollers might never include it. However we're planing to use the same GREVI opcodes that give full word bit reversal and byte order reversal and implement them as simpler special-case instructions that don't need much circuitry and hopefully everyone will include them.
Unlike x86, RISC-V doesn't have something like movbe
(which can load and byte-swap in one instruction).
Thus, on RISC-V you load/store as usual and after/before the load/store you have to swap the bytes with extra instructions.
The RISC-V "B" (Bitmanip) extension (version 0.92) contains generalized bit reverse instructions (grev
, grevi
) and several pseudo-instructions that you could use for byte swapping:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RISC-V ARM X86 Comment
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
rev RBIT ☐ bit reverse
rev8.h REV16 ☐ byte-reverse half-word (lower 16 bit)
rev8.w REV32 ☐ byte-reverse word (lower 32 bit)
rev8 REV BSWAP byte-reverse whole register
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
(Table based on Table 2.5, RISC-V Bitmanip Extension V0.92, page 18)
As of 2020-03, the "B" extension has draft status, thus support in hardware and emulators is limited.
Without the "B" extension you have to implement the byte swapping with several base instructions. See for example page 16 in the "B" specification or look at the disassembled code of the __builtin_bswap16
, __builtin_bswap32
and __builtin_bswap64
gcc/clang intrinsics.