Hex <-> Dec & Video Memory to ASCII string interpreter

Several needs come up while programming an OS. The first is converting numbers between decimal and hexadecimal, and the second is interpreting the hexadecimal bytes of video memory as ASCII characters. In this article we will look at a Python script that covers both.

The script is the following:

#!/usr/bin/env python3
import os
import re
from textwrap import wrap

os.system('clear')
print('----------------------------------------------------------')
print('| Hex <-> Dec & Video Memory to ASCII string interpreter |')
print('----------------------------------------------------------')
print('')

print('-- Select one option:')
print('1- Convert number')
print('2- Interpret Hex string as ASCII')
action = int(input(''))

if action == 1:
    print('')
    print('-- Select input base:')
    print('1- Decimal')
    print('2- Hexadecimal')
    base = int(input(''))
    
    if base != 1 and base != 2:
        print('++ ERROR: Incorrect base selected')
        exit()

    while True:
        print('')
        print('-- Introduce number, Ctrl+c to exit:')
        number = input('')

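        try:
            if base == 1:
                hexadecimal = hex(int(number))
                print('Hex: {0}'.format(hexadecimal))
            else:
                # int() with base 16 accepts the number with or without the 0x prefix
                decimal = int(number, 16)
                print('Dec: {0}'.format(decimal))
        except ValueError:
            # Reject malformed input instead of crashing with a traceback
            print('++ ERROR: Invalid number')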
            
elif action == 2:
    while True:
        print('')
        print('-- Introduce Memory chars in GDB format, Ctrl+c to exit:')
        print("Enter/Paste your content. Ctrl-D to process it.")
        content = []
        while True:
            try:
                line = input()
            except EOFError:
                break
            content.append(line)
            
        print('--------------')
        finalText = []
        for hexdata in content:
            # Skip blank or malformed lines (anything without an "address:" prefix)
            if ':' not in hexdata:
                continue
            asciiString = []
            asciiStringText = []
            # Split "0xb8000:<words>" into the address and the dumped words
            memoryAddress, rawData = hexdata.split(':', 1)
            # Extract the hex digits of every word, whether GDB separated them with tabs or spaces
            for asciiChar_Attr in re.findall(r'0x([0-9a-fA-F]+)', rawData):
                isChar = False
                for byteToDecode in wrap(asciiChar_Attr, 2):
                    # Bytes alternate attribute, character, attribute, character;
                    # only the character bytes are decoded, format bytes stay as hex
                    if isChar:
                        try:
                            asciiChar = bytes.fromhex(byteToDecode).decode('ascii')
                        except UnicodeDecodeError:
                            # Bytes >= 0x80 are not ASCII: show a placeholder instead
                            print('Error decoding char')
                            asciiChar = '_'
                            
                        asciiString.append(asciiChar)
                        asciiStringText.append(asciiChar)
                    else:
                        asciiString.append(byteToDecode)
                    
                    isChar = not isChar
            
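            # GDB prints little-endian 32-bit words, so each word shows its two
            # characters in reverse order: swap every pair to restore screen order
            count = 0
            asciiStringTextSorted = []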
            for char in asciiStringText:
                if (count % 2) == 0:
                    tempVar = char
                else:
                    asciiStringTextSorted.append(char)
                    asciiStringTextSorted.append(tempVar)
                    
                count = count + 1
                    
            asciiStringTextSortedString = ''.join(asciiStringTextSorted)
            finalText.append(asciiStringTextSortedString)
            
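            # Rebuild the GDB layout, printing each decoded character in place
            # of its byte while keeping the attribute bytes in hex
            finalAsciiString = []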
            count = 0
            finalAsciiString.append('\t')
            finalAsciiString.append('0x')
            for char in asciiString:
                finalAsciiString.append(char)
                if (count % 4) == 3 and count < 15:
                    finalAsciiString.append('\t')
                    finalAsciiString.append('0x')

                count = count + 1
                
            finalAsciiStringStringified = ''.join(finalAsciiString)
            print('{0}: {1}'.format(memoryAddress, finalAsciiStringStringified))
        
        print('--------------')
        print('')
        print('finalText: {0}'.format(finalText))

else:
    print('++ ERROR: Incorrect option introduced')

It works very simply: on the one hand it converts between bases, which needs no explanation, and on the other it interprets the characters stored in video memory.
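For reference, both directions of the conversion rest on Python built-ins: `hex()` turns an integer into a `0x`-prefixed string, and `int(text, 16)` parses hexadecimal with or without the `0x` prefix. A minimal sketch (the function names are illustrative, not part of the script):

```python
def dec_to_hex(text: str) -> str:
    """Convert a decimal string such as '753664' to '0xb8000'."""
    return hex(int(text))


def hex_to_dec(text: str) -> int:
    """Convert a hex string, with or without the 0x prefix, to an int."""
    return int(text, 16)


if __name__ == "__main__":
    print(dec_to_hex("753664"))   # 0xb8000
    print(hex_to_dec("0xb8000"))  # 753664
    print(hex_to_dec("b8010"))    # 753680
```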

To interpret the characters in video memory (in VGA text mode it starts at address 0xb8000), we first have to dump them in GDB, which gives output like this:

gef➤  x/20x 0xb8000  
0xb8000: 0x0f650f57 0x0f630f6c 0x0f6d0f6f 0x0f200f65  
0xb8010: 0x0f6f0f74 0x0b530f20 0x0b650b74 0x0b6c0b6c  
0xb8020: 0x0b740b61 0x0b720b6f 0x0e530e4f 0x0f620f20  
0xb8030: 0x0f200f79 0x0f720f4b 0x0f6d0f30 0x0f760f20  
0xb8040: 0x0f2e0f30 0x0f620f32 0x0f000f00 0x0f000f00
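The reason the script alternates between format bytes and character bytes comes from the VGA text-mode layout. Assuming the standard color text mode, each screen cell takes two bytes: the character code at the even address and its attribute (colors) at the odd address. `x/20x` prints 32-bit words in the target's byte order, which is little-endian on x86, so the word `0x0f650f57` is stored as `57 0f 65 0f`: `'W'` with attribute `0x0f`, then `'e'` with attribute `0x0f`. A small sketch that unpacks one word (the name `word_to_cells` is hypothetical):

```python
def word_to_cells(word: int) -> list[tuple[str, int, int]]:
    """Split one 32-bit word from `x/x` into (char, foreground, background) cells.

    Assumes a little-endian target (x86) and VGA text mode, where the low
    nibble of the attribute is the foreground color and bits 4-6 are the
    background (bit 7 is blink or bright background, ignored here).
    """
    raw = word.to_bytes(4, "little")  # memory order: char, attr, char, attr
    cells = []
    for char_byte, attr in zip(raw[0::2], raw[1::2]):
        cells.append((chr(char_byte), attr & 0x0F, (attr >> 4) & 0x07))
    return cells


print(word_to_cells(0x0f650f57))  # [('W', 15, 0), ('e', 15, 0)]
print(word_to_cells(0x0b530f20))  # [(' ', 15, 0), ('S', 11, 0)]
```

In this dump, `0x0f` is white on black, `0x0b` light cyan and `0x0e` yellow, which is why the attribute byte changes around "StellatorOS".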

If we paste the memory contents above into the script, we get:

----------------------------------------------------------  
| Hex <-> Dec & Video Memory to ASCII string interpreter |  
----------------------------------------------------------  
  
-- Select one option:  
1- Convert number  
2- Interpret Hex string as ASCII  
2  
  
-- Introduce Memory chars in GDB format, Ctrl+c to exit:  
Enter/Paste your content. Ctrl-D to process it.  
0xb8000: 0x0f650f57 0x0f630f6c 0x0f6d0f6f 0x0f200f65  
0xb8010: 0x0f6f0f74 0x0b530f20 0x0b650b74 0x0b6c0b6c  
0xb8020: 0x0b740b61 0x0b720b6f 0x0e530e4f 0x0f620f20  
0xb8030: 0x0f200f79 0x0f720f4b 0x0f6d0f30 0x0f760f20  
0xb8040: 0x0f2e0f30 0x0f620f32 0x0f000f00 0x0f000f00  
--------------  
0xb8000:  0x0fe0fW 0x0fc0fl 0x0fm0fo 0x0f 0fe  
0xb8010:  0x0fo0ft 0x0bS0f  0x0be0bt 0x0bl0bl  
0xb8020:  0x0bt0ba 0x0br0bo 0x0eS0eO 0x0fb0f   
0xb8030:  0x0f 0fy 0x0fr0fK 0x0fm0f0 0x0fv0f   
0xb8040:  0x0f.0f0 0x0fb0f2 0x0f0f 0x0f0f  
--------------  
  
finalText: ['Welcome ', 'to Stell', 'atorOS b', 'y Kr0m v', '0.2b\x00\x00\x00\x00']

As we can see, the script has decoded the characters and left the format bytes untouched. Finally, it shows only the ASCII characters, grouped by the memory ranges entered: for example, address 0xb8000 holds the text "Welcome ", address 0xb8010 holds "to Stell", and so on. The trailing \x00 values are cells that contain a null character.
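The same final text can be obtained more compactly by letting Python handle the byte order instead of swapping characters by hand. A minimal sketch, assuming a little-endian x86 target and input in the `x/Nx` format shown above (words separated by tabs or spaces); `gdb_dump_to_text` is a hypothetical helper, not part of the original script:

```python
import re


def gdb_dump_to_text(dump: str) -> list[str]:
    """Return the text stored in each line of an `x/Nx` dump of VGA memory."""
    texts = []
    for line in dump.splitlines():
        if ":" not in line:
            continue  # skip blank lines or a pasted GDB prompt
        _, words = line.split(":", 1)
        # Re-create the bytes in memory order from each little-endian word
        raw = b"".join(int(w, 16).to_bytes(4, "little")
                       for w in re.findall(r"0x[0-9a-fA-F]+", words))
        # Even offsets hold character codes, odd offsets hold attributes;
        # non-ASCII bytes become U+FFFD instead of raising an error
        texts.append(raw[0::2].decode("ascii", errors="replace"))
    return texts


DUMP = """\
0xb8000: 0x0f650f57 0x0f630f6c 0x0f6d0f6f 0x0f200f65
0xb8010: 0x0f6f0f74 0x0b530f20 0x0b650b74 0x0b6c0b6c
0xb8020: 0x0b740b61 0x0b720b6f 0x0e530e4f 0x0f620f20
0xb8030: 0x0f200f79 0x0f720f4b 0x0f6d0f30 0x0f760f20
0xb8040: 0x0f2e0f30 0x0f620f32 0x0f000f00 0x0f000f00"""

print(gdb_dump_to_text(DUMP))
```

Converting each word with `int.to_bytes(..., "little")` makes the byte-order assumption explicit and replaces the pair swapping done in the original script; on a big-endian target only that argument would change.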

NOTE: This code could probably be integrated into GDB as a plugin.
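As a rough idea of what such a plugin might look like: GDB's embedded Python API lets you register new commands by subclassing `gdb.Command` and read target memory with `gdb.selected_inferior().read_memory()`. The sketch below is an untested assumption, not part of the original article; the command name `vgatext`, its arguments and the 80-cell default are hypothetical. Reading raw bytes directly sidesteps the word byte-order question, since memory comes back in address order. It could be loaded with `source vgatext.py` from inside GDB.

```python
"""Hypothetical GDB command: vgatext [ADDR [CELLS]] prints VGA text memory."""


def cells_to_text(raw: bytes) -> str:
    """Keep the character byte of each 2-byte cell and drop the attribute."""
    return raw[0::2].decode("ascii", errors="replace")


try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None

if gdb is not None:
    class VgaText(gdb.Command):
        """vgatext [ADDR [CELLS]]: print CELLS screen cells starting at ADDR."""

        def __init__(self):
            super().__init__("vgatext", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            argv = gdb.string_to_argv(arg)
            addr = int(gdb.parse_and_eval(argv[0])) if argv else 0xB8000
            cells = int(argv[1]) if len(argv) > 1 else 80  # one 80-column row
            raw = bytes(gdb.selected_inferior().read_memory(addr, cells * 2))
            gdb.write(repr(cells_to_text(raw)) + "\n")

    VgaText()  # instantiating the class registers the command with GDB
```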
