How to convert Linux 32-bit gcc inline assembly to

I'm attempting to convert RR0D Rasta Ring 0 Debugger from 32-bit mode to 64-bit mode (long mode) in Linux, using gcc. I'm familiar with x86 32-bit assembly (in MS-DOS environment) but I'm a beginner in x86 64-bit assembly and in Linux assembly programming in general.

This project is for production use (I need a working non-source debugger), but I'm also trying to learn how to do the 32-bit to 64-bit conversion. If possible, I'd like to find a universal way to do the conversion that could be applied to any 32-bit program using regular expressions (so that it can be automated). I'm aware that no general solution exists (64-bit code may take more space than 32-bit code, consume more stack, etc.), but even in that case automatically converted code would serve as a starting point.

The idea would be to keep 8-bit and 16-bit operands as is, and replace 32-bit operands with 64-bit operands. This approach will naturally fail if pushw %ax; pushw %bx; popl %ecx is replaced with pushw %ax; pushw %bx; popq %rcx, but well-behaved programs usually don't push two 16-bit operands and then pop one 32-bit operand, or do they?
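
To make the operand-widening idea concrete, here is a minimal sketch in C of the register part of such a rewrite. The helper name widen_reg32 and its interface are hypothetical, made up for illustration. It only handles a single AT&T operand token and only the eight classic 32-bit general-purpose registers, so it is a starting point rather than a converter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of RR0D or any library): rewrite one AT&T
 * operand token such as "%eax" to its 64-bit counterpart "%rax". Tokens
 * that are not one of the eight 32-bit general-purpose registers (for
 * example "%ax" or "%al") are copied unchanged.
 * Returns 0 on success, -1 if the output buffer is too small. */
int widen_reg32(const char *tok, char *out, size_t outsz)
{
    static const char *const regs32[] = {
        "%eax", "%ebx", "%ecx", "%edx", "%esi", "%edi", "%ebp", "%esp"
    };
    size_t len = strlen(tok);

    if (len + 1 > outsz)
        return -1;
    memcpy(out, tok, len + 1);          /* copy including the terminator */

    for (size_t i = 0; i < sizeof regs32 / sizeof regs32[0]; i++) {
        if (strcmp(tok, regs32[i]) == 0) {
            out[1] = 'r';               /* %eax -> %rax, same length */
            break;
        }
    }
    return 0;
}
```

Calling widen_reg32("%eax", buf, sizeof buf) leaves "%rax" in buf, while "%ax" passes through untouched, which matches the rule of keeping 16-bit operands as is.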

These are the conversions so far:

Edit: Fix: pusha / pushad can be replaced with consecutive pushes, because the pusha / pushad commands push the value of sp / esp before the actual push of sp, and push sp works the same way on the 286+, but differently on the 8088/8086 (The Assembly Language database). This difference is not an issue here (for 386+ code). pusha and pushad can thus be replaced with consecutive push commands.

An alternative is something similar to what OpenSolaris' privregs.h code does.

Edit: Fix: use 64-bit memory addressing for all commands.

pusha -> push %ax; push %cx; push %dx; push %bx; push %sp; push %bp; push %si; push %di.

Edit: Fix: A valid alternative (using lea), note that x86 processors are little-endian: pusha -> movq %rax, -8(%rsp); lea -8(%rsp), %rax; mov %ax, -10(%rsp); movq -8(%rsp), %rax; movw %cx, -4(%rsp); movw %dx, -6(%rsp); movw %bx, -8(%rsp); movw %bp, -12(%rsp); movw %si, -14(%rsp); movw %di, -16(%rsp); lea -16(%rsp), %rsp.
pushad -> push %rax; push %rcx; push %rdx; push %rbx; push %rsp; push %rbp; push %rsi; push %rdi.

Edit: Fix: A valid alternative (using lea): pushad -> movq %rax, -8(%rsp); movq %rcx, -16(%rsp); movq %rdx, -24(%rsp); movq %rbx, -32(%rsp); lea -32(%rsp), %rax; movq %rax, -40(%rsp); movq -8(%rsp), %rax; movq %rbp, -48(%rsp); movq %rsi, -56(%rsp); movq %rdi, -64(%rsp); lea -64(%rsp), %rsp.
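
As a cross-check of the pushad replacement, the stack layout it leaves behind can be written down as a C struct. This is only a sketch: the struct name pushad64_frame is hypothetical, and it assumes the eight-push order listed above (rax first, rdi last), so that after the sequence %rsp points at the saved rdi.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 64 bytes left on the stack by
 * push %rax; push %rcx; push %rdx; push %rbx;
 * push %rsp; push %rbp; push %rsi; push %rdi.
 * The stack grows downwards, so the last push sits at the lowest address. */
struct pushad64_frame {
    uint64_t rdi;   /*  0(%rsp): pushed last */
    uint64_t rsi;   /*  8(%rsp) */
    uint64_t rbp;   /* 16(%rsp) */
    uint64_t rsp;   /* 24(%rsp): the slot that popad reads and discards */
    uint64_t rbx;   /* 32(%rsp) */
    uint64_t rdx;   /* 40(%rsp) */
    uint64_t rcx;   /* 48(%rsp) */
    uint64_t rax;   /* 56(%rsp): pushed first */
};

/* Eight 8-byte slots, no padding expected. */
_Static_assert(sizeof(struct pushad64_frame) == 64, "eight 8-byte slots");
```

With such a struct, code that used to index into the 32-byte pushad frame with hard-coded offsets can be checked against offsetof() values instead of doubling the offsets by hand.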

Edit: Fix: popa and popad pop the value of sp / esp but discard it (Intel instruction set - popa/popad). Let's pop it into bx / rbx.
popa -> popw %di; popw %si; popw %bp; popw %bx; popw %bx; popw %dx; popw %cx; popw %ax.
popad -> popq %rdi; popq %rsi; popq %rbp; popq %rbx; popq %rbx; popq %rdx; popq %rcx; popq %rax.
pushfd -> pushfq.
popfd -> popfq.
Edit: push of segment registers, eg. pushw %ds -> pushw %ax; pushw %ax; movw %ds, %ax; movw %ax, 2(%rsp); popw %ax.
Edit: pop of segment registers, eg. popw %ds -> pushw %ax; movw 2(%rsp), %ax; movw %ax, %ds; popw %ax.
Edit: inc %reg16 -> add $1, %reg16, eg. inc %ax -> add $1, %ax.
Edit: dec %reg16 -> sub $1, %reg16, eg. dec %ax -> sub $1, %ax.
Edit: inc %reg32 -> add $1, %reg64, eg. inc %eax -> add $1, %rax.
Edit: dec %reg32 -> sub $1, %reg64, eg. dec %eax -> sub $1, %rax.
Edit: aaa -> ?
Edit: aad -> ?
Edit: aam -> ?
Edit: aas -> ?
Edit: arpl -> ?
Edit: bound -> ?
Edit: daa -> ?
Edit: das -> ?
Edit: lahf -> ?
Edit: sahf -> ?
Edit: Fix: any command with an immediate operand that does not fit in a sign-extended 32-bit immediate in 64-bit mode, eg. pushl $0xDEADBEEF -> pushq %rax; pushq %rax; movq $0xDEADBEEF, %rax; movq %rax, 8(%rsp); popq %rax.
ret with immediate operand: I think in this case the source code must be backtraced to see the sizes of last pushed operands, and act accordingly, eg. pushl %eax; ret 4 -> pushq %rax; ret 8.
Edit: syscalls: int $0x80 -> pushq %rdi; movq %rbp, %r9; movq %rdi, %r8; movq %rbx, %rdi; xchgq %rcx, %rsi; -- replace %rax value using a substitution list --; syscall; popq %rdi; xchgq %rcx, %rsi (note: 32-bit syscalls may have more than 6 parameters, 6 in registers and the rest on the stack, 64-bit syscalls may never have more than 6 parameters).
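
The substitution list for %rax could start out like the following sketch. The function name i386_to_x86_64_nr and the table are hypothetical and cover only a handful of calls whose numbers I'm sure of; a real list would have to be generated from the kernel's syscall tables. As far as I know, the x86-64 convention passes the fourth argument in %r10 rather than %rcx, because the syscall instruction itself overwrites %rcx and %r11, so the register shuffle above would need to account for that too.

```c
#include <stddef.h>

/* Register conventions, for reference:
 *   i386   int $0x80: number in eax; args in ebx, ecx, edx, esi, edi, ebp
 *   x86-64 syscall:   number in rax; args in rdi, rsi, rdx, r10, r8,  r9
 * The syscall instruction also overwrites rcx and r11. */

struct nr_map {
    int i386;      /* number used with int $0x80 */
    int x86_64;    /* number used with syscall   */
};

/* Hypothetical, deliberately tiny substitution table. */
static const struct nr_map nr_table[] = {
    {  1, 60 },    /* exit   */
    {  3,  0 },    /* read   */
    {  4,  1 },    /* write  */
    {  5,  2 },    /* open   */
    {  6,  3 },    /* close  */
    { 20, 39 },    /* getpid */
};

/* Translate an i386 syscall number; returns -1 if it is not in the table. */
int i386_to_x86_64_nr(int nr)
{
    for (size_t i = 0; i < sizeof nr_table / sizeof nr_table[0]; i++) {
        if (nr_table[i].i386 == nr)
            return nr_table[i].x86_64;
    }
    return -1;
}
```

Returning -1 for unknown numbers makes it easy for a conversion script to flag every int $0x80 site that still needs manual attention.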

Edit: What else should be taken into account? What other conversions would be needed to convert 32-bit code to 64-bit code (to be run in long mode)?