Introduction to shellcode writing, exploiting part 4

In previous articles we used shellcodes as a set of bytes that we copied to certain memory positions and then made EIP point to. It is now time to understand what those bytes are and how we can build our own shellcodes. A shellcode usually serves to launch a shell (hence its name), although it can actually do anything we want.

A shellcode is nothing more than the sequence of opcodes (machine instructions, usually written in hexadecimal) that the processor executes to perform a specific action. Shellcodes are usually written in assembly language, since it gives us full control over the execution flow and produces smaller code.

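First of all, it should be clarified that software accesses hardware through the kernel. Programs make system calls to perform certain actions, which the kernel allows or denies depending on the privilege level of the process and the action requested. To make a system call, specific values must be loaded into specific registers; once everything is in place, software interrupt 0x80 is raised (int 0x80). We can list the system calls with the command below. Keep in mind that asm-generic/unistd.h holds the generic table, whose numbering differs from the 32-bit x86 one used in this article (where exit is 1); the 32-bit x86 numbers are defined in asm/unistd_32.h, whose exact location under /usr/include depends on the distribution:

grep 'NR' /usr/include/asm-generic/unistd.h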

The syscall number is placed in the EAX register and the parameters, in order, in EBX, ECX, EDX, ESI, and EDI.
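To make the register convention concrete, here is a minimal Python sketch that only models which value goes into which register for a 32-bit int 0x80 call. It performs no system call at all, and the helper name syscall_registers is hypothetical, introduced just for this illustration.

```python
# Illustrative model of the 32-bit Linux int 0x80 convention described above.
# It performs no system call; it only shows where each value would be placed.

ARG_REGISTERS = ["EBX", "ECX", "EDX", "ESI", "EDI"]


def syscall_registers(number, *args):
    """Return a dict mapping register names to the values a syscall expects."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("this sketch only models up to five arguments")
    registers = {"EAX": number}
    for register, value in zip(ARG_REGISTERS, args):
        registers[register] = value
    return registers


# exit(0): syscall number 1 in EAX, exit status 0 in EBX
print(syscall_registers(1, 0))
```

For exit(0) the sketch yields EAX = 1 and EBX = 0, which is exactly the state the assembly later in this article sets up before executing int 0x80.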

Someone is surely wondering: why not write the function in C and obtain the opcodes from it? The answer is simple: compilers add “garbage” to the code, and we are interested in making the shellcode as small as possible. As an example, we will write an exit(0) in both C and ASM and compare the resulting instructions.

vi salir.c

#include <stdlib.h>
int main(void) {
        exit(0); /* never returns */
}

We compile the binary statically (the code of the included libraries is copied into our program) so that we can disassemble the code of the functions defined in the libraries from gdb.

gcc -static salir.c -o salir
gdb salir
(gdb) set disassembly-flavor intel

The assembly of main, which calls the exit function:

(gdb) disassemble main

Dump of assembler code for function main:
   0x08048254 <+0>:    push   ebp
   0x08048255 <+1>:    mov    ebp,esp
   0x08048257 <+3>:    and    esp,0xfffffff0
   0x0804825a <+6>:    sub    esp,0x10
   0x0804825d <+9>:    mov    DWORD PTR [esp],0x0
   0x08048264 <+16>:    call   0x8048b30 <exit>
End of assembler dump.

The assembly of _exit, which exit ends up calling. As we can see, two system calls (int 0x80) are made, exit_group (0xfc) and exit (0x1), when we only need the last one:

(gdb) disassemble _exit

Dump of assembler code for function _exit:
   0x0804f730 <+0>:    mov    ebx,DWORD PTR [esp+0x4]
   0x0804f734 <+4>:    mov    eax,0xfc
   0x0804f739 <+9>:    int    0x80
   0x0804f73b <+11>:    mov    eax,0x1
   0x0804f740 <+16>:    int    0x80
   0x0804f742 <+18>:    hlt    
End of assembler dump.

Our shellcode could work simply with:

mov ebx,DWORD PTR [esp+0x4] --> Load the function parameter (0 in our case) into EBX
mov eax,0x1 --> Set EAX to 1, the number of the exit syscall
int 0x80 --> Execute the syscall

To program in ASM we will need an assembler (converts ASM code to machine code):

apt-get install nasm
vi salir.asm

section .text
global _start
_start:
xor eax, eax ; EAX --> 0
xor ebx, ebx ; EBX (function parameter) --> 0
mov eax, 0x01 ; EAX --> 1 (exit syscall number)
int 0x80 ; Execute the syscall

We assemble and link the code:

nasm -f elf salir.asm
ld salir.o -o salir

We check that the system call is made using strace:

kr0m@reversedbox:~$ strace ./salir

execve("./salir", ["./salir"], [/* 16 vars */]) = 0
_exit(0) = ?

We obtain the opcodes:

kr0m@reversedbox:~$ objdump -M intel-mnemonic -d salir

salir: file format elf32-i386
Disassembly of section .text:

08048060 <_start>:
 8048060:    31 c0                    xor    eax,eax
 8048062:    31 db                    xor    ebx,ebx
 8048064:    b8 01 00 00 00           mov    eax,0x1
 8048069:    cd 80                    int    0x80
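Copying the opcode column by hand is error-prone, so the extraction can be scripted. The following is a minimal Python sketch, not a robust parser: it assumes the listing has the layout shown above (address, colon, two-digit hex bytes, mnemonic), and the function names opcodes_from_objdump and as_escaped are hypothetical helpers introduced here.

```python
import re

# An instruction line starts with a hex address followed by a colon.
ADDRESS = re.compile(r"^\s*[0-9a-f]+:\s+(.*)$")
HEX_BYTE = re.compile(r"^[0-9a-f]{2}$")

# Listing copied from the objdump output above
LISTING = """
salir: file format elf32-i386
Disassembly of section .text:

08048060 <_start>:
 8048060:    31 c0                    xor    eax,eax
 8048062:    31 db                    xor    ebx,ebx
 8048064:    b8 01 00 00 00           mov    eax,0x1
 8048069:    cd 80                    int    0x80
"""


def opcodes_from_objdump(listing):
    """Collect the opcode bytes from the instruction lines of a listing."""
    collected = bytearray()
    for line in listing.splitlines():
        match = ADDRESS.match(line)
        if not match:
            continue  # header, label or blank line
        for token in match.group(1).split():
            if not HEX_BYTE.match(token):
                break  # reached the mnemonic column
            collected.append(int(token, 16))
    return bytes(collected)


def as_escaped(shellcode):
    """Format bytes as a string of \\xNN escapes."""
    return "".join("\\x{:02x}".format(byte) for byte in shellcode)


print(as_escaped(opcodes_from_objdump(LISTING)))
```

The printed string is the shellcode in the usual \xNN notation, ready to be pasted into an exploit or a test harness.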

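NOTE: A shellcode CANNOT contain NULL bytes. Shellcodes are usually injected through string-handling functions (strcpy and the like) that treat a NULL byte as the end of the string, so the copy would stop at the first NULL and the rest of the opcodes would never reach memory, leaving the shellcode truncated.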

The above shellcode would be \x31\xc0\x31\xdb\xb8\x01\x00\x00\x00\xcd\x80 and, as we can see, it contains NULL bytes!
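Spotting NULLs by eye gets harder as shellcodes grow, so the check is worth automating. This is a minimal Python sketch; null_offsets is a hypothetical helper name, and the byte string is the first version of the shellcode taken from the objdump listing.

```python
def null_offsets(shellcode):
    """Return the offset of every 0x00 byte in the shellcode."""
    return [offset for offset, byte in enumerate(shellcode) if byte == 0]


# Bytes of the first version of the shellcode
first_version = bytes.fromhex("31c031dbb801000000cd80")

print(len(first_version), "bytes, NULLs at offsets", null_offsets(first_version))
```

The three offsets reported (6, 7 and 8) are the upper bytes of the 32-bit immediate in mov eax,0x1, which is the instruction we need to rewrite.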

Some tricks to avoid NULLs in shellcodes are:

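Assign 0 to a register: xor REG,REG
Zero the entire register with xor REG,REG and then write the final value through a smaller sub-register (AL instead of EAX). The upper bytes are already 0, so the full register ends up holding the same value, but the immediate is encoded in a single byte instead of four: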

xor eax, eax

We replace:

mov eax, 0x01 --> mov al, 0x01
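The reason this replacement removes the NULLs is visible in how the two instructions are encoded. The Python sketch below builds both encodings by hand; the function names are hypothetical, and the opcode bytes 0xB8 and 0xB0 are the standard x86 encodings of mov eax, imm32 and mov al, imm8.

```python
import struct


def mov_eax_imm32(value):
    """Encode mov eax, imm32: opcode 0xB8 plus a 4-byte little-endian immediate."""
    return b"\xb8" + struct.pack("<I", value)


def mov_al_imm8(value):
    """Encode mov al, imm8: opcode 0xB0 plus a 1-byte immediate."""
    return b"\xb0" + struct.pack("<B", value)


print(mov_eax_imm32(1).hex())  # b801000000
print(mov_al_imm8(1).hex())    # b001
```

With a 32-bit immediate, the value 1 is padded with three zero bytes; with an 8-bit immediate there is nothing left to pad, which is why xor eax, eax must run first to clear the rest of the register.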

Applying these “tricks” would result in:

section .text
global _start
_start:
xor eax, eax ; EAX --> 0
xor ebx, ebx ; EBX(function parameter) --> 0
mov al, 0x01 ; EAX --> 1
int 0x80 ; Execute the syscall

We reassemble:

nasm -f elf salir.asm && ld salir.o -o salir
kr0m@reversedbox:~$ objdump -M intel-mnemonic -d salir

salir: file format elf32-i386
Disassembly of section .text:

08048060 <_start>:
 8048060:    31 c0                    xor    eax,eax
 8048062:    31 db                    xor    ebx,ebx
 8048064:    b0 01                    mov    al,0x1
 8048066:    cd 80                    int    0x80

As we can see, there are no NULLs left, the shellcode is smaller, and the result of running it is exactly the same:

kr0m@reversedbox:~$ strace ./salir

execve("./salir", ["./salir"], [/* 16 vars */]) = 0
_exit(0) = ?

It is also possible to go the other way, that is, to recover the ASM code from the shellcode:

kr0m@reversedbox:~$ echo -ne "\x31\xc0\x31\xdb\xb0\x01\xcd\x80" | ndisasm -u -

00000000  31C0              xor eax,eax
00000002  31DB              xor ebx,ebx
00000004  B001              mov al,0x1
00000006  CD80              int 0x80

In this link, I leave a table that is very useful for x86 instructions.

This is just a small introduction to how shellcodes work and to some aspects to consider when writing them. A shellcode that executes an exit(0) is not very useful; in future chapters we will move on to more elaborate shellcodes ;)
