What is the simplest way to work with big-endian values in RISC-V at the assembly language level? That is, how do I load a big-endian value from memory into a register, work with the register value in native-endian (little-endian) form, then store it back into memory as big-endian? 16-, 32- and 64-bit big-endian values are used in many network protocols and file formats.
I couldn't find a byte-swap instruction (equivalent to BSWAP on x86 or REV on ARM) in the manual, nor anything about big-endian loads and stores.
The RISC-V ISA has no explicit byte swapping instructions. Your best bet is to use a C builtin to perform this calculation, which in GCC land would be something like __builtin_bswap32(). This gives the compiler the most information possible so it can make good decisions. With the current set of defined ISAs you'll almost certainly end up calling into a routine, but if a B extension is ever defined you will transparently get better generated code. The full set of defined builtins is available online (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html).
If you're stuck doing this in assembly, then your best bet is to call into an existing byte swap routine. The canonical one for a 32-bit swap is __bswapsi2, which is part of libgcc -- you're probably using that anyway, so it'll be around. That's what the compiler currently does, so all you lose is the chance to elide the function call when a better implementation is available.
As a concrete example, here's my example C function
and the generated assembly
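A minimal sketch of what such a C function might look like, assuming GCC or Clang. The name swap32 is hypothetical, and this is not necessarily the listing the answer referred to:

```c
#include <stdint.h>

/* Hypothetical example function: reverse the byte order of a 32-bit value.
 * __builtin_bswap32 is a GCC/Clang builtin. On a RISC-V target without a
 * byte-swap instruction, the compiler may expand it inline or emit a call
 * to __bswapsi2 from libgcc, depending on version and options. */
uint32_t swap32(uint32_t in)
{
    return __builtin_bswap32(in);
}
```

Compiling a file like this with a RISC-V cross compiler and the -S option shows what the toolchain actually emits for the swap under a given -march setting.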
Unlike x86, RISC-V doesn't have something like movbe (which can load and byte-swap in one instruction).
Thus, on RISC-V you load/store as usual and after/before the load/store you have to swap the bytes with extra instructions.
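The load-then-swap and swap-then-store pattern described above can be sketched in C as follows. The helper names load_be32 and store_be32 are hypothetical, and the sketch assumes GCC or Clang (for __builtin_bswap32 and the __BYTE_ORDER__ macros):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not taken from the original answer. */

/* Load a 32-bit big-endian value from memory into native byte order. */
static inline uint32_t load_be32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);      /* ordinary load in native byte order */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);     /* swap after the load */
#endif
    return v;
}

/* Store a native 32-bit value to memory in big-endian byte order. */
static inline void store_be32(void *p, uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);     /* swap before the store */
#endif
    memcpy(p, &v, sizeof v);      /* ordinary store in native byte order */
}
```

Using memcpy keeps the access well-defined for unaligned buffers; compilers typically turn it into a single load or store where the target allows it.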
The RISC-V "B" (Bitmanip) extension (version 0.92) contains generalized bit reverse instructions (grev, grevi) and several pseudo-instructions that you could use for byte swapping:
(Table based on Table 2.5, RISC-V Bitmanip Extension V0.92, page 18)
As of 2020-03, the "B" extension has draft status, thus support in hardware and emulators is limited.
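To make the generalized reverse idea concrete, here is a C model of the 32-bit operation, patterned on the reference code in the draft Bitmanip specification. Treat it as an illustrative sketch rather than the normative definition; the function name grev32 is an assumption for this example. Each bit of the control value enables one butterfly stage, so 24 (8 + 16) reverses the four bytes of a word, 8 swaps the bytes inside each halfword, and 31 reverses all bits:

```c
#include <stdint.h>

/* Software model of a 32-bit generalized reverse (illustrative sketch).
 * Stage k swaps adjacent groups of 2^k bits when bit k of the control
 * value is set. */
uint32_t grev32(uint32_t x, uint32_t control)
{
    unsigned shamt = control & 31;
    if (shamt &  1) x = ((x & 0x55555555u) <<  1) | ((x & 0xAAAAAAAAu) >>  1);
    if (shamt &  2) x = ((x & 0x33333333u) <<  2) | ((x & 0xCCCCCCCCu) >>  2);
    if (shamt &  4) x = ((x & 0x0F0F0F0Fu) <<  4) | ((x & 0xF0F0F0F0u) >>  4);
    if (shamt &  8) x = ((x & 0x00FF00FFu) <<  8) | ((x & 0xFF00FF00u) >>  8);
    if (shamt & 16) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    return x;
}
```

For example, grev32(0x11223344, 24) yields 0x44332211, which is the 32-bit byte swap needed for big-endian data.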
Without the "B" extension you have to implement the byte swapping with several base instructions. See for example page 16 in the "B" specification or look at the disassembled code of the __builtin_bswap16, __builtin_bswap32 and __builtin_bswap64 gcc/clang intrinsics.
There is no mention of a byte-swap instruction in the latest RISC-V User-Level ISA Manual (version 2.1). However, the manual has a placeholder for “B” Standard Extension for Bit Manipulation. Some draft materials from that extension's working group are collected on GitHub. In particular, the draft specification talks about a grev instruction (generalized reverse) that can do 16, 32 and 64-bit byte-swaps:
The extension B working group was "dissolved for bureaucratic reasons in November 2017" before they could finalize the spec.
In 2020 the working group is active again, posting their work at the linked GitHub repo.
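Without such an instruction, a byte swap has to be composed from shifts, masks and ORs, all of which map onto base-ISA instructions. A minimal C sketch for the three common widths follows; the function names are hypothetical and this is not the libgcc source:

```c
#include <stdint.h>

/* Hypothetical shift-mask-or byte swaps (illustrative sketch). */

static inline uint16_t bswap16_sw(uint16_t x)
{
    /* Exchange the two bytes; the cast drops bits shifted above bit 15. */
    return (uint16_t)((x << 8) | (x >> 8));
}

static inline uint32_t bswap32_sw(uint32_t x)
{
    return  (x << 24)                    /* byte 0 -> byte 3 */
         | ((x <<  8) & 0x00FF0000u)     /* byte 1 -> byte 2 */
         | ((x >>  8) & 0x0000FF00u)     /* byte 2 -> byte 1 */
         |  (x >> 24);                   /* byte 3 -> byte 0 */
}

static inline uint64_t bswap64_sw(uint64_t x)
{
    /* Swap each 32-bit half, then exchange the halves. */
    return ((uint64_t)bswap32_sw((uint32_t)x) << 32)
         | bswap32_sw((uint32_t)(x >> 32));
}
```

Written this way, a 32-bit swap costs roughly a handful of shifts, two masks and three ORs, which is the kind of sequence a compiler emits when no dedicated instruction exists.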
As a result, there currently doesn't seem to be anything simpler than doing the usual shift-mask-or dance. I couldn't find any assembly language bswap intrinsic in the GCC or clang riscv ports. As an example, here's a disassembly of the bswapsi2 function (which byte-swaps a 32-bit value) emitted by the riscv64-linux-gnu-gcc compiler version 8.1.0-12:
Note that while it's nice and convenient to have an instruction to do it, the __bswapsi2 function used in other answers will run at around 400 MB/s on a 1.5 GHz HiFive Unleashed, which is quite a lot faster than the gigE interface is ever going to move data around.
Even on the HiFive1 running at the default 256 MHz it will do 60 MB/s and you've only got 16 KB of RAM and a bunch of GPIOs that you're not going to wiggle at more than a few MHz or maybe 10s of MHz.
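As a rough cross-check of those figures, the throughput numbers can be converted into clock cycles per 32-bit swap. The helper below is a hypothetical sketch; it assumes 1 MB = 10^6 bytes and 4 bytes per call, and takes the quoted rates at face value:

```c
/* Hypothetical helper: clock cycles spent per word at a given throughput.
 * clock_hz       - core clock frequency in Hz
 * bytes_per_sec  - sustained byte-swap throughput in bytes per second
 * word_bytes     - bytes handled per swap (4 for a 32-bit swap) */
double cycles_per_word(double clock_hz, double bytes_per_sec, double word_bytes)
{
    double words_per_sec = bytes_per_sec / word_bytes;
    return clock_hz / words_per_sec;
}
```

With these assumptions, cycles_per_word(1.5e9, 400e6, 4) gives 15 cycles per swap and cycles_per_word(256e6, 60e6, 4) gives about 17. For comparison, gigabit Ethernet tops out at 125 MB/s.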
I'm on the BitManipulation working group. The full GREV instruction needs a fair bit of hardware (something close to a multiplier), so small microcontrollers might never include it. However, we're planning to use the same GREVI opcodes that give full word bit reversal and byte order reversal and implement them as simpler special-case instructions that don't need much circuitry, and hopefully everyone will include them.