I've seen in some posts/videos/files that executables are zero-padded to look bigger than they are, or to match the "same file size" criteria some file-system utilities use when moving files. Mostly they are either prank programs or malware.
But I often wondered: what would happen if the file were corrupted and execution ran on into the next set of "instructions" in the big zero-padded space at the end of the file?
Would anything happen? What instruction does 0x0 decode to?
How zero bytes decode depends entirely on the CPU architecture. On many architectures, instructions are fixed-length (for example 32-bit), so the relevant thing would be 00 00 00 00 (using hexdump notation).
On most Linux distros, clang/llvm comes with support for multiple target architectures built in (clang -target and llvm-objdump), unlike gcc / gas / binutils, so I was able to use it to check some architectures I didn't have cross-gcc / binutils installed for. Use llvm-objdump --version to see the supported list. (But I didn't figure out how to get it to disassemble a raw binary like binutils objdump -b binary, and my clang won't create SPARC binaries on its own.)
On x86, 00 00 (2 bytes) decodes (http://ref.x86asm.net/coder32.html) as an 8-bit add with a memory destination (add r/m8, r8). The first byte is the opcode; the second byte is the ModR/M byte that specifies the operands.
This usually segfaults right away (if eax/rax isn't a valid pointer), or segfaults once execution falls off the end of the zero-padded part into an unmapped page. (This happens in real life because of bugs like falling off the end of _start without making an exit system call, although in those cases the following bytes aren't always all zero, e.g. data or ELF metadata.)
x86 64-bit mode: ndisasm -b64 /dev/zero | head:

address    machine code    disassembly
00000000   0000            add [rax],al

x86 32-bit mode (-b32):

00000000   0000            add [eax],al

x86 16-bit mode (-b16):

00000000   0000            add [bx+si],al
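To see why the same two bytes print differently per mode, here is a minimal Python sketch that splits the ModR/M byte into its mod/reg/rm fields and looks up the register-indirect base for each addressing size. The function and table names are hypothetical, the tables are written from the Intel ModR/M layout, and only the mod=00 forms without SIB, displacement or REX prefix are handled; it is not a general x86 decoder.

```python
# Decode the two-byte form of opcode 00 /r (ADD r/m8, r8), mod=00 only.
# ModR/M layout: mod[7:6] reg[5:3] rm[2:0].

REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]  # 8-bit regs, no REX prefix
# None marks rm values that need extra bytes (SIB, disp32, RIP-relative, disp16).
BASE32 = ["eax", "ecx", "edx", "ebx", None, None, "esi", "edi"]
BASE64 = ["rax", "rcx", "rdx", "rbx", None, None, "rsi", "rdi"]
BASE16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", None, "bx"]


def decode_add_rm8_r8(code: bytes, mode: int) -> str:
    """Return ndisasm-style text for `00 modrm` in 16-, 32- or 64-bit mode."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("only opcode 0x00 (ADD r/m8, r8) is handled")
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b00:
        raise ValueError("only mod=00 (no displacement) is handled")
    base = {16: BASE16, 32: BASE32, 64: BASE64}[mode][rm]
    if base is None:
        raise ValueError("SIB / displacement forms are not handled")
    return f"add [{base}],{REG8[reg]}"


for mode in (64, 32, 16):
    print(mode, decode_add_rm8_r8(bytes(2), mode))
```

With ModR/M = 00, mod=00, reg=0 (al) and rm=0, so the only thing that changes between modes is which register (or register pair) rm=000 names as the memory base.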
AArch32 ARM mode: cd /tmp && dd if=/dev/zero of=zero bs=16 count=1 && arm-none-eabi-objdump -z -D -b binary -marm zero. (Without -z, objdump skips over large blocks of all-zero bytes and shows ... instead.)
addr machine code disassembly
0: 00000000 andeq r0, r0, r0
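A small Python sketch of the A32 data-processing field layout shows where andeq r0, r0, r0 comes from: condition 0000 is EQ, opcode 0000 is AND, S=0, and every register field is r0. Since r0 & r0 == r0 and flags are not updated, it has no architectural effect. The names here are hypothetical, and only the plain-register operand2 (LSL #0) form of three-operand ops is decoded; registers are printed as raw rN rather than objdump's aliases.

```python
# A32 data-processing layout (register operand2):
# cond[31:28] 00[27:26] I[25] opcode[24:21] S[20] Rn[19:16] Rd[15:12] operand2[11:0]

CONDS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
         "hi", "ls", "ge", "lt", "gt", "le", ""]  # 0b1110 (AL) prints no suffix
DP_OPS = ["and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
          "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn"]


def decode_arm_dp(word: int) -> str:
    """Decode a three-operand A32 data-processing instruction (sketch)."""
    cond = word >> 28
    if cond == 0b1111:
        raise ValueError("cond=1111 is the unconditional space, not handled")
    if (word >> 26) & 0b11 != 0b00:
        raise ValueError("not in the data-processing group")
    imm = (word >> 25) & 1
    opcode = (word >> 21) & 0xF
    s = (word >> 20) & 1
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    operand2 = word & 0xFFF
    # Only the simplest operand2: a plain register Rm (shift LSL #0).
    if imm or (operand2 >> 4) != 0:
        raise ValueError("only register operand2 with LSL #0 is handled")
    # Compare/test ops (misc space when S=0) and mov/mvn take different
    # operand lists; skip them to keep the sketch short.
    if opcode in (8, 9, 10, 11, 13, 15):
        raise ValueError("only three-operand ops are handled")
    rm = operand2 & 0xF
    mnemonic = DP_OPS[opcode] + ("s" if s else "") + CONDS[cond]
    return f"{mnemonic} r{rd}, r{rn}, r{rm}"


print(decode_arm_dp(0x00000000))
```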
ARM Thumb/Thumb2: arm-none-eabi-objdump -z -D -b binary -marm --disassembler-options=force-thumb zero
0: 0000 movs r0, r0
2: 0000 movs r0, r0
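In 16-bit Thumb, the all-zero halfword falls in the shift-by-immediate group (bits 15:13 = 000, op=00 is LSL), so it is LSLS r0, r0, #0, which UAL writes as MOVS r0, r0. It does update the N and Z flags, unlike the ARM-mode encoding. A minimal Python sketch of that one group (function name hypothetical):

```python
# Thumb 16-bit shift by immediate: 000[15:13] op[12:11] imm5[10:6] Rm[5:3] Rd[2:0]

SHIFT_OPS = ["lsls", "lsrs", "asrs"]


def decode_thumb_shift_imm(half: int) -> str:
    """Decode one Thumb shift-by-immediate halfword (sketch)."""
    if half >> 13 != 0b000:
        raise ValueError("not in the shift-by-immediate group")
    op = (half >> 11) & 0b11
    if op == 0b11:
        raise ValueError("op=11 is the add/subtract group, not handled")
    imm5 = (half >> 6) & 0x1F
    rm = (half >> 3) & 0b111
    rd = half & 0b111
    if op == 0 and imm5 == 0:
        # LSLS with a zero shift is written as MOVS Rd, Rm in UAL syntax.
        return f"movs r{rd}, r{rm}"
    amount = imm5 if (op == 0 or imm5 != 0) else 32  # LSR/ASR encode #32 as 0
    return f"{SHIFT_OPS[op]} r{rd}, r{rm}, #{amount}"


print(decode_thumb_shift_imm(0x0000))
```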
AArch64: aarch64-linux-gnu-objdump -z -D -b binary -maarch64 zero
0: 00000000 .inst 0x00000000 ; undefined
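For A64, bits [28:25] of every instruction select the major encoding group, and 0000 is in the reserved/unallocated space, which is why objdump reports the word as undefined. This Python sketch classifies a word by that field using the ARMv8.0 top-level table as I recall it; newer architecture versions carve SVE/SME out of the 00xx space, so treat the group labels as approximate. The function name is hypothetical.

```python
# A64 top-level decode on op0 = bits[28:25] (ARMv8.0 table, approximate).

def classify_a64(word: int) -> str:
    """Return the major A64 encoding group for a 32-bit instruction word."""
    op0 = (word >> 25) & 0b1111
    if op0 & 0b1110 == 0b1000:
        return "data processing (immediate)"      # 100x
    if op0 & 0b1110 == 0b1010:
        return "branch / exception / system"      # 101x
    if op0 & 0b0101 == 0b0100:
        return "load/store"                       # x1x0
    if op0 & 0b0111 == 0b0101:
        return "data processing (register)"       # x101
    if op0 & 0b0111 == 0b0111:
        return "data processing (FP / Advanced SIMD)"  # x111
    return "reserved / unallocated"               # 00xx


print(classify_a64(0x00000000))
```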
MIPS32: echo .long 0 > zero.S && clang -c -target mips zero.S && llvm-objdump -d zero.o
zero.o: file format ELF32-mips
Disassembly of section .text:
0: 00 00 00 00 nop
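On MIPS the all-zero word is the R-type SPECIAL opcode (0) with funct 0, i.e. SLL, with rd, rt and the shift amount all zero: sll $zero, $zero, 0. Writing to $zero has no effect, which is why it serves as the canonical nop. A minimal Python sketch of that field split (function name hypothetical, only SLL handled):

```python
# MIPS R-type: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] sa[10:6] funct[5:0]

def decode_mips_sll(word: int) -> str:
    """Decode an R-type SLL word; prints the all-zero form as nop (sketch)."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    sa = (word >> 6) & 0x1F
    funct = word & 0x3F
    if opcode != 0 or funct != 0 or rs != 0:
        raise ValueError("only SPECIAL/SLL (rs=0) is handled")
    if rd == 0 and rt == 0 and sa == 0:
        return "nop"  # sll $zero, $zero, 0
    return f"sll ${rd}, ${rt}, {sa}"


print(decode_mips_sll(0x00000000))
```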
PowerPC 32 and 64-bit: the same zero.S with clang -c -target powerpc or -target powerpc64, then llvm-objdump -d zero.o. I don't know whether any extensions to PowerPC use the 00 00 00 00 instruction encoding for anything, or whether it's still an illegal instruction on modern IBM POWER chips.
zero.o: file format ELF32-ppc (or ELF64-ppc64)
Disassembly of section .text:
0: 00 00 00 00 <unknown>
IBM S390: clang -c -target systemz zero.S
zero.o: file format ELF64-s390
Disassembly of section .text:
0: 00 00 <unknown>
2: 00 00 <unknown>
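The S390 output is split into 2-byte chunks because z/Architecture encodes the instruction length in the top two bits of the first opcode byte: 00 means 2 bytes, 01 or 10 mean 4 bytes, and 11 means 6 bytes. A zero byte therefore always starts a 2-byte instruction. A short Python sketch of that length decoding (function names hypothetical; it only splits the stream and does not decode opcodes):

```python
# z/Architecture instruction length from bits 0-1 of the first opcode byte.

def s390_insn_length(first_byte: int) -> int:
    """Return the instruction length in bytes implied by the first byte."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[first_byte >> 6]


def split_s390(code: bytes) -> list:
    """Split a byte string into instruction-sized chunks."""
    chunks = []
    i = 0
    while i < len(code):
        n = s390_insn_length(code[i])
        chunks.append(code[i:i + n])
        i += n
    return chunks


print([c.hex() for c in split_s390(bytes(4))])
```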