In previous articles we treated shellcode as an opaque set of bytes that we copied to a memory location and then pointed EIP at. It is time to understand what those bytes are and how we can build our own shellcode. A shellcode usually serves to launch a shell (hence its name), although we can actually do anything with it.
A shellcode is nothing more than the machine-code bytes (opcodes and their operands, usually written in hexadecimal) that the processor will execute to perform a specific action. Shellcode is usually written in assembly language, since that gives us total control over the instructions that are generated and produces smaller code.
First of all, note that software accesses hardware through the kernel. Programs make system calls to request certain actions, which the kernel approves or denies depending on the process's privilege level and the action requested. To make a system call on 32-bit x86 Linux, specific values must be placed in specific registers and then software interrupt 0x80 is raised (int 0x80). The system calls and their numbers are listed in the kernel headers (on 32-bit x86, typically a file named unistd_32.h).
The syscall number goes in the EAX register and the arguments, in order, in EBX, ECX, EDX, ESI, and EDI.
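As a rough illustration of this convention, the Python sketch below models how a call is laid out in registers before int 0x80 is executed. It does not perform any system call; layout_syscall is a hypothetical helper, and the numbers in SYSCALLS are the 32-bit x86 Linux values for exit and exit_group (other architectures use different numbers).

```python
# Illustrative model only: maps a syscall and its arguments onto the
# registers used by the 32-bit x86 Linux int 0x80 convention.
ARG_REGISTERS = ["ebx", "ecx", "edx", "esi", "edi"]

# Syscall numbers for 32-bit x86 Linux (assumption: other targets differ).
SYSCALLS = {"exit": 1, "exit_group": 252}


def layout_syscall(name, *args):
    """Return a dict of register -> value for the given syscall."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("too many arguments for this sketch")
    registers = {"eax": SYSCALLS[name]}
    for register, value in zip(ARG_REGISTERS, args):
        registers[register] = value
    return registers


if __name__ == "__main__":
    # exit(0): EAX holds the syscall number, EBX the exit status.
    print(layout_syscall("exit", 0))
```

For exit(0) this yields EAX = 1 and EBX = 0, which is the register state the assembly in this article sets up by hand.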
Someone is surely thinking: why not write the function in C and obtain the opcodes from it? The answer is simple: compilers add “garbage” to the code (function prologues, stack alignment, library wrappers), and we are interested in making the shellcode as small as possible. As an example, we will write an exit(0) in C and in ASM and compare the resulting instructions.
#include <stdlib.h>

int main(void) {
    exit(0);
}
We compile the binary statically (for example with gcc's -static option), so that the code of the libraries it uses is copied into our program and we can disassemble the library functions from gdb.
gdb salir
(gdb) set disassembly-flavor intel
Disassembly of main, which calls the exit function:
Dump of assembler code for function main:
0x08048254 <+0>: push ebp
0x08048255 <+1>: mov ebp,esp
0x08048257 <+3>: and esp,0xfffffff0
0x0804825a <+6>: sub esp,0x10
0x0804825d <+9>: mov DWORD PTR [esp],0x0
0x08048264 <+16>: call 0x8048b30 <exit>
End of assembler dump.
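Disassembly of _exit, which is what exit ends up calling. As we can see, it makes two system calls (int 0x80): one with EAX = 0xfc (252, exit_group on 32-bit x86 Linux) and one with EAX = 0x1 (exit). We only need the last one: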
Dump of assembler code for function _exit:
0x0804f730 <+0>: mov ebx,DWORD PTR [esp+0x4]
0x0804f734 <+4>: mov eax,0xfc
0x0804f739 <+9>: int 0x80
0x0804f73b <+11>: mov eax,0x1
0x0804f740 <+16>: int 0x80
0x0804f742 <+18>: hlt
End of assembler dump.
Our shellcode only needs:
mov ebx,DWORD PTR [esp+0x4] --> Load the function argument (0) into EBX
mov eax,0x1 --> Select syscall 1 (exit)
int 0x80 --> Execute the syscall
To program in ASM we need an assembler, which converts ASM code into machine code (the syntax used below is that of NASM). We write the source file:
vi salir.asm
section .text
global _start
_start:
xor eax, eax ; EAX --> 0
xor ebx, ebx ; EBX (function parameter) --> 0
mov eax, 0x01 ; EAX --> 1
int 0x80 ; Execute SYSCALL
We assemble the source into a 32-bit ELF object file, salir.o, and link it:
ld salir.o -o salir
We check that the system call is made using strace:
execve("./salir", ["./salir"], [/* 16 vars */]) = 0
_exit(0) = ?
We obtain the opcodes with objdump (objdump -M intel-mnemonic -d salir):
salir: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: 31 db xor ebx,ebx
8048064: b8 01 00 00 00 mov eax,0x1
8048069: cd 80 int 0x80
NOTE: A shellcode must NOT contain NULL bytes (0x00). When it is injected through a string function such as strcpy, a NULL marks the end of the string, so the copy stops there and the remaining opcodes never reach memory, leaving the shellcode truncated.
The above shellcode would be: \x31\xc0\x31\xdb\xb8\x01\x00\x00\x00\xcd\x80 and, as we can see, it contains NULL bytes!
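To make the problem concrete, here is a minimal Python sketch that locates the NULL bytes in the opcodes above and shows what a C-style string copy would keep. The helper names null_offsets and c_string_copy are hypothetical, and c_string_copy only imitates the stop-at-first-NULL behaviour of a function like strcpy.

```python
# The exit(0) opcodes taken from the objdump listing above.
SHELLCODE = bytes.fromhex("31c031dbb801000000cd80")


def null_offsets(code):
    """Return the offsets of every 0x00 byte in the shellcode."""
    return [offset for offset, byte in enumerate(code) if byte == 0]


def c_string_copy(code):
    """Imitate a C string copy: keep only the bytes before the first NULL."""
    return code.split(b"\x00", 1)[0]


if __name__ == "__main__":
    print(null_offsets(SHELLCODE))         # offsets of the NULL bytes
    print(c_string_copy(SHELLCODE).hex())  # what would survive the copy
```

The three NULLs sit at offsets 6 to 8, inside the 32-bit immediate of mov eax,0x1, so the copy would end before the int 0x80 instruction is reached.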
Some tricks to avoid NULLs in shellcode are:
- To set a register to 0, use: xor REG,REG
- To load a small value, first zero the entire register with xor REG,REG and then write the value through a narrower part of the register (AL, AX). The upper bytes are already 0, so 0x00000001 in EAX is the same value as 0x01 in AL:
xor eax, eax
We replace:
mov eax, 0x01 --> mov al, 0x01
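Where the NULLs come from becomes clearer when looking at how the immediate value is encoded. The Python sketch below rebuilds both encodings by hand: 0xb8 is the opcode for mov eax with a 32-bit immediate (as seen in the disassembly above) and 0xb0 is the opcode for mov al with an 8-bit immediate. The function names are hypothetical, chosen only for this illustration.

```python
import struct


def mov_eax_imm32(value):
    """Encode `mov eax, imm32`: opcode 0xb8 + 4-byte little-endian value."""
    return b"\xb8" + struct.pack("<I", value)


def mov_al_imm8(value):
    """Encode `mov al, imm8`: opcode 0xb0 + a single byte."""
    return b"\xb0" + struct.pack("<B", value)


if __name__ == "__main__":
    print(mov_eax_imm32(1).hex())  # b801000000: the padding bytes are NULLs
    print(mov_al_imm8(1).hex())    # b001: one immediate byte, no padding
```

A small value stored in a 32-bit immediate is padded with zero bytes, whereas the 8-bit form carries only the byte we actually need.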
Applying these “tricks” would result in:
section .text
global _start
_start:
xor eax, eax ; EAX --> 0
xor ebx, ebx ; EBX(function parameter) --> 0
mov al, 0x01 ; EAX --> 1
int 0x80 ; Execute SYSCALL
We reassemble:
kr0m@reversedbox:~$ objdump -M intel-mnemonic -d salir
salir: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: 31 db xor ebx,ebx
8048064: b0 01 mov al,0x1
8048066: cd 80 int 0x80
As we can see, there are no longer any NULLs, the shellcode is smaller, and the result of running it is exactly the same:
execve("./salir", ["./salir"], [/* 16 vars */]) = 0
_exit(0) = ?
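Copying the bytes out of the objdump listing by hand is error-prone, so a small script can help. The following Python sketch extracts the opcode bytes from objdump -d output and formats them as a \x-escaped string. It assumes the usual objdump layout, in which the address, the bytes, and the mnemonic are separated by tab characters; opcodes_from_objdump and to_escaped are hypothetical helper names.

```python
def opcodes_from_objdump(listing):
    """Collect opcode bytes from `objdump -d` text (tab-separated columns)."""
    code = bytearray()
    for line in listing.splitlines():
        fields = line.split("\t")
        # Instruction lines look like: "<address>:\t<hex bytes>\t<mnemonic>"
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        try:
            code += bytes.fromhex("".join(fields[1].split()))
        except ValueError:
            continue  # second column is not hex bytes; skip the line
    return bytes(code)


def to_escaped(code):
    """Format bytes as a C-style \\x escaped string."""
    return "".join("\\x%02x" % byte for byte in code)


if __name__ == "__main__":
    LISTING = (
        "08048060 <_start>:\n"
        " 8048060:\t31 c0                \txor    eax,eax\n"
        " 8048062:\t31 db                \txor    ebx,ebx\n"
        " 8048064:\tb0 01                \tmov    al,0x1\n"
        " 8048066:\tcd 80                \tint    0x80\n"
    )
    print(to_escaped(opcodes_from_objdump(LISTING)))
```

With the listing of the NULL-free version this prints \x31\xc0\x31\xdb\xb0\x01\xcd\x80, eight bytes instead of the original eleven.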
The reverse operation is also possible, that is, recovering the ASM code from the shellcode with a disassembler such as ndisasm:
00000000 31C0 xor eax,eax
00000002 31DB xor ebx,ebx
00000004 B001 mov al,0x1
00000006 CD80 int 0x80
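A disassembler works on raw bytes, not on the \x-escaped text in which shellcode is usually passed around. As a minimal sketch of that conversion step, the hypothetical helper from_escaped below turns an escaped string back into bytes; it simply ignores anything that is not a \xNN pair.

```python
import re


def from_escaped(text):
    """Convert a "\\x31\\xc0..." style string into raw bytes."""
    pairs = re.findall(r"\\x([0-9a-fA-F]{2})", text)
    return bytes(int(pair, 16) for pair in pairs)


if __name__ == "__main__":
    raw = from_escaped(r"\x31\xc0\x31\xdb\xb0\x01\xcd\x80")
    print(raw.hex())  # 31c031dbb001cd80
```

The resulting bytes can then be written to a file opened in binary mode and handed to the disassembler.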
A reference table of x86 instructions and their opcodes is very useful when working at this level.
This is just a small introduction to how shellcodes work and to some aspects to consider when writing them. A shellcode that executes an exit(0) is not very useful; in future chapters we will move on to more elaborate shellcodes ;)