When we compile C code, the compiler and linker decide where each piece of code and data is placed in the binary. So far this has not been a problem because the kernel had only one function, but as it grows, the first instructions of the binary may no longer belong to the main() function.
Before we start, it is recommended that you read these previous articles:
- Boot Sector
- Interrupts
- Memory
- Stack
- IF-ELSE
- Functions
- Memory Segmentation
- Reading Data from Disk
- Entering Protected Mode 32bits
- Compilation, Linking, Stack Management, and Variables in C
- Pointers
- Kernel
Let’s see what the following code looks like once compiled:
void some_function () {
}

void main () {
    char* video_memory = (char*) 0xb8000; // first cell of VGA text-mode video memory
    *video_memory = 'X';                  // write 'X' in the top-left corner of the screen
    some_function ();
}
We compile:
Then we disassemble the object file:
kernel2.o: file format elf32-i386-freebsd
Disassembly of section .text:
00000000 <some_function>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 5d pop ebp
4: c3 ret
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
00000010 <main>:
10: 55 push ebp
11: 89 e5 mov ebp,esp
13: 50 push eax
14: b8 00 80 0b 00 mov eax,0xb8000
19: 89 45 fc mov DWORD PTR [ebp-0x4],eax
1c: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
1f: c6 00 58 mov BYTE PTR [eax],0x58
22: e8 d9 ff ff ff call 0 <some_function>
27: 83 c4 04 add esp,0x4
2a: 5d pop ebp
2b: c3 ret
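The call at offset 0x22 does not contain the address of some_function directly. Opcode E8 is followed by a 32-bit little-endian displacement that the CPU adds to the address of the next instruction. A minimal C sketch of that decoding (call_target is a hypothetical helper name, not part of the original code) shows how d9 ff ff ff becomes the target 0:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode the target of a 5-byte x86 "call rel32"
 * (opcode 0xE8) located at address call_addr. */
uint32_t call_target(uint32_t call_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xE8); /* only the E8 rel32 form is handled */

    /* The displacement is stored little-endian in bytes 1..4. */
    uint32_t disp = (uint32_t)insn[1]
                  | (uint32_t)insn[2] << 8
                  | (uint32_t)insn[3] << 16
                  | (uint32_t)insn[4] << 24;

    /* The CPU adds the displacement to the address of the NEXT
     * instruction (call_addr + 5). Unsigned arithmetic wraps modulo
     * 2^32, which gives the same result as a signed addition. */
    return call_addr + 5u + disp;
}
```

For example, with the bytes e8 d9 ff ff ff from the listing at address 0x22, call_target returns 0, the start of some_function. Applied to the bytes e8 a6 92 ff ff at offset 0x155 of the disk-image dump further down (executed at 0x7c00 + 0x155 = 0x7d55), it returns 0x1000, the address where the kernel is expected.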
We generate the disk image with the new kernel:
cat boot_sect.bin kernel2.bin > os-image2
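The cat command simply appends the kernel bytes after the 512-byte boot sector, which is why the kernel starts at offset 0x200 of the image. As an illustration only, here is the same step in C working on in-memory buffers; build_image and its parameters are hypothetical names, not part of the original build:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical equivalent of "cat boot_sect.bin kernel2.bin > os-image2":
 * copy the boot sector, then the kernel, into out.
 * Returns the image length, or 0 if out is too small. */
size_t build_image(const unsigned char *boot, size_t boot_len,
                   const unsigned char *kernel, size_t kernel_len,
                   unsigned char *out, size_t out_cap)
{
    /* The boot sector must fill exactly one sector; otherwise the
     * kernel would not begin at the start of the second sector,
     * which is where the bootloader reads it from. */
    assert(boot_len == SECTOR_SIZE);

    if (boot_len + kernel_len > out_cap)
        return 0;

    memcpy(out, boot, boot_len);                /* sector 1: bootloader */
    memcpy(out + boot_len, kernel, kernel_len); /* sector 2 onwards: kernel */
    return boot_len + kernel_len;
}
```

Whatever the kernel contains, its first byte always ends up at out + SECTOR_SIZE.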
We dump the contents of the disk image in hexadecimal:
00000000: 8816 5c7d bd00 9089 ecbb 5d7d e80b 00e8 ..\}......]}....
00000010: 1a00 e820 01e8 ee00 ebfe 608a 073c 0074 ... ......`..<.t
00000020: 09b4 0ecd 1083 c301 ebf1 61c3 60b4 0eb0 ..........a.`...
00000030: 0acd 10b0 0dcd 1061 c360 b900 0083 f904 .......a.`......
00000040: 741c 89d0 83e0 0f04 303c 397e 0204 07bb t.......0<9~....
00000050: 6b7c 29cb 8807 c1ca 0483 c101 ebdf bb66 k|)............f
00000060: 7ce8 b6ff 61c3 3078 3030 3030 0060 52b4 |...a.0x0000.`R.
00000070: 0288 f0b6 00b5 00b1 02cd 1372 075a 38f0 ...........r.Z8.
00000080: 7512 61c3 bb9c 7ce8 90ff e89f ff88 e6e8 u.a...|.........
00000090: a7ff eb06 bbac 7ce8 80ff ebfe 4469 736b ......|.....Disk
000000a0: 2072 6561 6420 6572 726f 7200 496e 636f read error.Inco
000000b0: 7272 6563 7420 6e75 6d62 6572 206f 6620 rrect number of
000000c0: 7365 6374 6f72 7320 7265 6164 0000 0000 sectors read....
000000d0: 0000 0000 00ff ff00 0000 9acf 00ff ff00 ................
000000e0: 0000 92cf 0017 00cd 7c00 0060 ba00 800b ........|..`....
000000f0: 008a 03b4 403c 0074 0b66 8902 83c3 0183 ....@<.t.f......
00000100: c202 ebed 61c3 fa0f 0116 e57c 0f20 c066 ....a......|. .f
00000110: 83c8 010f 22c0 ea1b 7d08 0066 b810 008e ...."...}..f....
00000120: d88e d08e c08e e08e e8bd 0000 0900 89ec ................
00000130: e816 0000 00bb 997d e8df fee8 eefe bb00 .......}........
00000140: 10b6 018a 165c 7de8 23ff c3bb 797d 0000 .....\}.#...y}..
00000150: e896 ffff ffe8 a692 ffff ebfe 0053 7461 .............Sta
00000160: 7274 6564 2069 6e20 3136 2d62 6974 2052 rted in 16-bit R
00000170: 6561 6c20 4d6f 6465 004c 616e 6465 6420 eal Mode.Landed
00000180: 696e 2033 322d 6269 7420 5072 6f74 6563 in 32-bit Protec
00000190: 7465 6420 4d6f 6465 004c 6f61 6469 6e67 ted Mode.Loading
000001a0: 206b 6572 6e65 6c20 696e 746f 206d 656d kernel into mem
000001b0: 6f72 7900 0000 0000 0000 0000 0000 0000 ory.............
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
00000200: 5589 e55d c390 9090 9090 9090 9090 9090 U..]............
00000210: 5589 e550 b800 800b 0089 45fc 8b45 fcc6 U..P......E..E..
00000220: 0058 e8d9 ffff ff83 c404 5dc3 .X........].
We can see that the second sector, which holds our kernel and starts at offset 0x200 of the image, begins with the opcodes of some_function() (55 89 e5 5d c3) instead of those of main(). Therefore, when the bootloader jumps to address 0x1000, where it loaded the kernel, it executes some_function(), whose ret instruction returns straight to the bootloader, so main() is never called.
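The check we just did by eye can be written as a small C sketch, assuming the whole image is already in a memory buffer. kernel_entry_bytes and kernel_starts_with_some_function are hypothetical helpers: the first verifies the 0x55 0xAA boot signature at the end of the first sector and returns a pointer to the first byte of the second sector (the byte that ends up at 0x1000), the second compares it with the code of some_function() from the objdump listing:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: return a pointer to the first kernel byte
 * (start of the second sector), or NULL if the image is too short
 * or the first sector lacks the 0x55 0xAA boot signature. */
const unsigned char *kernel_entry_bytes(const unsigned char *image, size_t len)
{
    assert(image != NULL);
    if (len <= SECTOR_SIZE)
        return NULL;
    if (image[510] != 0x55 || image[511] != 0xAA)
        return NULL;
    return image + SECTOR_SIZE;
}

/* Complete code of some_function() as listed by objdump, before the
 * nop padding: push ebp; mov ebp,esp; pop ebp; ret */
static const unsigned char SOME_FUNCTION_CODE[] = {0x55, 0x89, 0xe5, 0x5d, 0xc3};

/* Returns 1 if the kernel begins with some_function() instead of main(). */
int kernel_starts_with_some_function(const unsigned char *image, size_t len)
{
    const unsigned char *k = kernel_entry_bytes(image, len);
    return k != NULL
        && len - SECTOR_SIZE >= sizeof SOME_FUNCTION_CODE
        && memcmp(k, SOME_FUNCTION_CODE, sizeof SOME_FUNCTION_CODE) == 0;
}
```

On os-image2 as dumped above, the second function would report 1, which is exactly the problem.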
To solve this problem, a common trick is to place a small assembly routine at the very beginning of the kernel whose only job is to call main. Since main is not defined in that routine but in the C kernel, it must be declared as external.
; Ensures that we jump straight into the kernel's entry function.
[bits 32] ; We're in protected mode by now, so use 32-bit instructions.
[extern main] ; Declare that we will be referencing the external symbol main
call main ; invoke main() in our C kernel
jmp $ ; Hang forever when we return from the kernel
We assemble the routine. Note that we generate an ELF object file rather than a flat binary: the extern directive only works with an object format that can record unresolved symbols for the linker to fill in.
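Why does extern need ELF? A flat binary has nowhere to record "put the address of main here later"; an ELF object does, in the form of relocation entries. For a 32-bit x86 call main, the assembler typically emits e8 fc ff ff ff together with an R_386_PC32 relocation, whose value the i386 ELF ABI defines as S + A - P (symbol address, addend, address of the patched field). The following C sketch applies such a relocation by hand; the function names and the example addresses are assumptions for illustration, not the output of a real linker:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write a little-endian 32-bit value (x86 byte order). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical sketch of resolving one R_386_PC32 relocation.
 * section:      bytes of the .text section being patched
 * section_addr: address the section will have in the final binary
 * offset:       position of the 4-byte field inside the section
 * symbol_addr:  final address of the referenced symbol (e.g. main)
 * i386 uses REL relocations, so the addend A is the value already
 * stored in the field (-4 for a call, i.e. fc ff ff ff). */
void apply_r386_pc32(uint8_t *section, size_t section_len,
                     uint32_t section_addr, size_t offset,
                     uint32_t symbol_addr)
{
    assert(offset + 4 <= section_len);

    uint32_t A = read_le32(section + offset);     /* implicit addend */
    uint32_t P = section_addr + (uint32_t)offset; /* address of the field */
    uint32_t S = symbol_addr;

    write_le32(section + offset, S + A - P);      /* wraps modulo 2^32 */
}
```

If the entry routine's call main sits at 0x1000 and main ends up at 0x1010 (both hypothetical), the field becomes 0x1010 - 4 - 0x1001 = 0xb, and the CPU then jumps to 0x1005 + 0xb = 0x1010, which is main.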
We link the entry routine with the kernel. The linker replaces the reference to main with the address of main() within the final binary.
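The address main receives depends on where the linker places each input section. With GNU ld's default behavior, the .text sections of the input files are concatenated in the order the files appear on the command line, each start rounded up to the section's required alignment. The C sketch below models only that rule; the struct, the sizes and the alignments in the example are assumptions, not values read from the real object files:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one input .text section. */
struct input_section {
    const char *object; /* e.g. "kernel_entry.o" */
    uint32_t size;      /* size of its .text in bytes */
    uint32_t align;     /* required alignment, a power of two */
};

/* Model of the ordering rule: place sections one after another, in
 * command-line order, starting at base (e.g. 0x1000), rounding each
 * start address up to the section's alignment. addrs[i] receives the
 * address of secs[i]. Returns the first address after the last one. */
uint32_t layout_text(const struct input_section *secs, size_t n,
                     uint32_t base, uint32_t *addrs)
{
    uint32_t addr = base;
    for (size_t i = 0; i < n; i++) {
        uint32_t a = secs[i].align ? secs[i].align : 1;
        assert((a & (a - 1)) == 0); /* alignment must be a power of two */
        addr = (addr + a - 1) & ~(a - 1);
        addrs[i] = addr;
        addr += secs[i].size;
    }
    return addr;
}
```

With the hypothetical entries {"kernel_entry.o", 7, 1} and {"kernel2.o", 0x2c, 16} and base 0x1000, the entry routine starts at 0x1000 and kernel2.o's code at 0x1010, so main (offset 0x10 inside kernel2.o's .text) would land at 0x1020. Swapping the two entries would put some_function() at 0x1000 again.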
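This trick works because the linker places the object files in the order they are given on the command line, in this case kernel_entry.o followed by kernel2.o. The call to main therefore becomes the first instruction of the kernel binary, which is the code that runs at 0x1000. We generate the disk image: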
We load it into QEMU: