PSXDEV: Forensic debugging for the win!

Spent some time today trying to find a bug that would only trigger when uploading code to the PSX, and not when loading from CD. It turns out there's a name for this methodology, forensic debugging, and I'd been doing it without knowing it!

The bug

So I'm debugging this little PSX game engine I've been working on for a few months now, and right now there are two ways to load a level's data. The classic way is to load it from the CD, but that means generating an ISO every time you want to test a build, and you need some kind of ODE (optical drive emulator) if you want to test on real hardware.

The other way is to load the level directly into the PSX's memory at the correct address, then upload your executable.

With nops, that would be:

# enable debug mode
nops /debug
# load level data
nops /bin 0x800b2410 Overlay.lvl1 # <- this is the overlay's loading address
# load 'engine' executable
nops /exe main.ps-exe

Unfortunately, sometimes it won't work, and you're met with a black screen after the upload. Thankfully, you can still access the PSX registers, and there's some interesting information in there!

PSX registers

If you enable Unirom's debug kernel (nops /debug) before uploading anything, you should be able to dump them with nops /regs.
In my case, here is what I got when met with the BSOD:

     stat =0x00004000     badv =0x00000000
       r0 =0x00000000       at =0x80040000       v0 =0x00000000       v1 =0x275A0C80
       a0 =0x800425B4       a1 =0x800B37EC       a2 =0x800B3B94       a3 =0x800B3904
       t0 =0x000000DD       t1 =0x800B2574       t2 =0x80042008       t3 =0x00001000
       t4 =0x00000017       t5 =0x00000017       t6 =0xFFFFFFDF       t7 =0xFFFFFB1E
       s0 =0x800B2510       s1 =0x800425B4       s2 =0x00000000       s3 =0x8003BABC
       s4 =0x800ABDF4       s5 =0x00000000       s6 =0x80042008       s7 =0x00000008
       t8 =0xFFFFFB1E       t9 =0x00000000       k0 =0x00000000       k1 =0x00000F1C
       gp =0xA0010FF0       sp =0x801FFC04       fp =0x80042008       ra =0x8001507C
     rapc =0x800150A0
       hi =0x00000000       lo =0x00000000       sr =0x4000FF14     caus =0x1000001C

DBE - Bus Error on data load/store (0x7)

So first, this: DBE - Bus Error on data load/store (0x7). From experience (did I mention I crash the PSX a lot?), I suspected it had something to do with a null pointer.
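That DBE line matches what the cause register (caus in the dump) encodes: on the R3000A, bits 2 to 6 of cause hold the exception code, and 0x1000001C gives (0x1C >> 2) = 7, which is DBE. Bit 31 (BD) says whether the faulting instruction sat in a branch delay slot. Here's a minimal decoding sketch; the code-to-name table follows the psx-spx exception list, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Exception codes stored in bits 2..6 of the R3000A cause register,
   as listed on psx-spx. Codes 1-3 are TLB exceptions, which the PSX
   never raises since it has no TLB. */
static const char *const exc_names[] = {
    "INT",     /* 0x00 interrupt */
    "MOD",     /* 0x01 TLB modification (unused on PSX) */
    "TLBL",    /* 0x02 TLB load (unused on PSX) */
    "TLBS",    /* 0x03 TLB store (unused on PSX) */
    "AdEL",    /* 0x04 address error on load / instruction fetch */
    "AdES",    /* 0x05 address error on store */
    "IBE",     /* 0x06 bus error on instruction fetch */
    "DBE",     /* 0x07 bus error on data load/store */
    "Syscall", /* 0x08 syscall instruction */
    "BP",      /* 0x09 break instruction */
    "RI",      /* 0x0A reserved instruction */
    "CpU",     /* 0x0B coprocessor unusable */
    "Ov",      /* 0x0C arithmetic overflow */
};

/* Extract the 5-bit exception code (bits 2..6). */
unsigned exc_code(uint32_t cause) {
    return (cause >> 2) & 0x1F;
}

/* Bit 31 (BD): set when the faulting instruction was in a branch delay
   slot, in which case EPC points at the branch, not the faulting instruction. */
int in_delay_slot(uint32_t cause) {
    return (int)((cause >> 31) & 1);
}

/* Human-readable name for the exception code, or "unknown". */
const char *exc_name(uint32_t cause) {
    unsigned code = exc_code(cause);
    return code < sizeof exc_names / sizeof exc_names[0] ? exc_names[code] : "unknown";
}
```

For the dump above, exc_name(0x1000001C) returns "DBE" and in_delay_slot() returns 0, so pc points at the faulting instruction itself rather than at a branch.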

So I decided to check those registers. I mean, there must be some useful information in there, however cryptic it looks to my newbie eyes...

Turns out it's all documented here: https://psx-spx.consoledev.net/cpuspecifications/#cpu-registers

It also turns out that the two most interesting pieces of information for us right now are the ra and pc registers. ra is the return address: the address the CPU jumps back to when the current subroutine returns.

pc is the program counter (the rapc line in the dump above), and it tells you exactly where the CPU was when it crashed.

So with that information, I thought about checking the linker map file that's generated when compiling the program, and looking for something near those addresses.
Neither address appears as-is in this file, but both fall within the drawQuad() function, which starts at 0x80014ec0 and is 0x358 bytes long:

.text.drawQuad
                0x0000000080014ec0      0x358 src/graphics.o
                0x0000000080014ec0                drawQuad
 .text.drawTri  0x0000000080015218      0x314 src/graphics.o
                0x0000000080015218                drawTri
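Scanning the map by hand comes down to finding the symbol whose [start, start + size) range contains the address. A small sketch of that lookup, hard-coding just the two entries from the excerpt above (parsing a real map file is left out, and the type and function names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* One .text entry from the linker map: start address and size in bytes. */
typedef struct {
    const char *name;
    uint32_t    start;
    uint32_t    size;
} MapSymbol;

/* Entries copied from the map excerpt. */
static const MapSymbol symbols[] = {
    { "drawQuad", 0x80014ec0u, 0x358u },
    { "drawTri",  0x80015218u, 0x314u },
};

/* Returns the symbol whose range contains addr and stores addr's offset
   into it (if offset is non-NULL), or returns NULL if nothing covers addr. */
const MapSymbol *find_symbol(uint32_t addr, uint32_t *offset) {
    for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++) {
        const MapSymbol *s = &symbols[i];
        if (addr >= s->start && addr - s->start < s->size) {
            if (offset)
                *offset = addr - s->start;
            return s;
        }
    }
    return NULL;
}
```

With this table, pc (0x800150A0) resolves to drawQuad + 0x1E0 and ra (0x8001507C) to drawQuad + 0x1BC. For a full map, sorting the entries and using a binary search would scale better than this linear scan.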

And once again, PSXDEV's @Nicolas Noble came to the rescue with two words: addr2line and objdump.

addr2line

addr2line "converts addresses into file names and line numbers."

Straightforward enough (remember to use your toolchain's binaries though):

mipsel-linux-gnu-addr2line -e main.elf 0x8001507C

yielded a laconic

./src/graphics.c:352

So there you have it, just look at the file, at the line, and realize your immense doofusness.

// If the TIM is 4-bit or 8-bit (mode 0 or 1), set CLUT coordinates
if ( (mesh->tim->mode & 0x3) < 2 ) {
    setClut(poly4,
            mesh->tim->crect->x,
            mesh->tim->crect->y
    );
}

Trying to set a CLUT from non-existent TIM data (mesh->tim is NULL)... easily fixed with a simple check:

if (mesh->tim) {
    // If the TIM is 4-bit or 8-bit (mode 0 or 1), set CLUT coordinates
    if ( (mesh->tim->mode & 0x3) < 2 ) {
        setClut(poly4,
                mesh->tim->crect->x,
                mesh->tim->crect->y
        );
    }
}
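To try the fix with a desktop compiler, here's the same guard as a standalone function. It's a sketch under a few assumptions: RECT and TIM_IMAGE are modeled on libgpu's structs, MESH is a hypothetical stand-in reduced to its tim field, and the CLUT id is packed like libgpu's getClut() (x / 16 in bits 0 to 5, y from bit 6 up). It also checks crect, which goes one step further than the fix above:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins modeled on libgpu's RECT and TIM_IMAGE. */
typedef struct { int16_t x, y, w, h; } RECT;
typedef struct {
    uint32_t  mode;   /* bits 0-1: 0 = 4-bit CLUT, 1 = 8-bit CLUT, 2 = 16-bit, 3 = 24-bit */
    RECT     *crect;  /* CLUT position in VRAM */
    uint32_t *caddr;
    RECT     *prect;
    uint32_t *paddr;
} TIM_IMAGE;

/* Hypothetical mesh type: only the texture pointer matters here. */
typedef struct { TIM_IMAGE *tim; } MESH;

/* Packs CLUT coordinates like getClut(): x / 16 in bits 0-5, y from bit 6. */
static uint16_t pack_clut(int16_t x, int16_t y) {
    return (uint16_t)((y << 6) | ((x >> 4) & 0x3f));
}

/* Writes the CLUT id for mesh into *clut and returns 1, or returns 0 when
   there is nothing to set: no TIM, a direct-color TIM, or no CLUT rect. */
int mesh_clut(const MESH *mesh, uint16_t *clut) {
    if (mesh == NULL || mesh->tim == NULL)
        return 0;
    if ((mesh->tim->mode & 0x3) >= 2)   /* 16/24-bit TIMs have no CLUT */
        return 0;
    if (mesh->tim->crect == NULL)
        return 0;
    *clut = pack_clut(mesh->tim->crect->x, mesh->tim->crect->y);
    return 1;
}
```

In the engine itself, the same checks would simply wrap the setClut() call, as the fix does.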

objdump

objdump "displays information from object files."

With the -d flag, objdump disassembles the executable sections of the ELF, which can give us a clue as to what exactly went wrong:

mipsel-linux-gnu-objdump -d main.elf | grep -B 10 -A 1 800150a0

yields:

80015078:    02202025     move    a0,s1
8001507c:    8e03000c     lw    v1,12(s0)
80015080:    00000000     nop
80015084:    8c620000     lw    v0,0(v1)
80015088:    00000000     nop
8001508c:    30420002     andi    v0,v0,0x2
80015090:    1440000a     bnez    v0,800150bc <drawQuad+0x1fc>
80015094:    00000000     nop
80015098:    8c630004     lw    v1,4(v1)
8001509c:    00000000     nop
800150a0:    84620000     lh    v0,0(v1)
800150a4:    84630002     lh    v1,2(v1)

An instruction like lw v0,0(v1) can be translated to:

Load into v0 the 32-bit word stored at the address held in v1, plus an offset of 0.

lh works the same way, but loads a sign-extended 16-bit halfword.
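Reading the listing back into C makes the mapping clearer. In this sketch, each statement carries the instruction objdump shows for it (nops omitted). The layouts are inferred from the offsets: tim at offset 12 in the mesh, mode at 0 and crect at 4 in the TIM, which matches libgpu's TIM_IMAGE with the PSX's 32-bit pointers. MESH's padding is hypothetical, and s0 presumably holds mesh. Note how the compiler turned (mode & 0x3) < 2 into a single bit test, andi v0,v0,0x2: the masked value is below 2 exactly when bit 1 is clear.

```c
#include <stdint.h>

/* Stand-ins; offsets in comments assume the PSX's 32-bit pointers. */
typedef struct { int16_t x, y, w, h; } RECT;
typedef struct {
    uint32_t  mode;   /* offset 0 */
    RECT     *crect;  /* offset 4 */
    uint32_t *caddr;
    RECT     *prect;
    uint32_t *paddr;
} TIM_IMAGE;

/* Hypothetical mesh layout: 12 bytes of other fields, then tim at offset 12. */
typedef struct { uint32_t pad[3]; TIM_IMAGE *tim; } MESH;

/* Mirrors the listing from 0x8001507c to 0x800150a4 (no NULL checks, like the
   original code). Returns 1 and fills x/y when a CLUT applies, 0 otherwise. */
int read_clut_coords(const MESH *mesh, int16_t *x, int16_t *y) {
    const TIM_IMAGE *tim = mesh->tim;  /* 8001507c  lw   v1,12(s0)              */
    uint32_t mode = tim->mode;         /* 80015084  lw   v0,0(v1)               */
    if (mode & 0x2)                    /* 8001508c  andi v0,v0,0x2              */
        return 0;                      /* 80015090  bnez v0,800150bc (skip)     */
    const RECT *crect = tim->crect;    /* 80015098  lw   v1,4(v1)               */
    *x = crect->x;                     /* 800150a0  lh   v0,0(v1)  <- crash pc  */
    *y = crect->y;                     /* 800150a4  lh   v1,2(v1)               */
    return 1;
}
```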

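Looking at those instructions, nothing stands out as illegal on its own; the problem is the data they were fed. The crash pc (0x800150a0) is lh v0,0(v1), which lines up with the read of crect->x, and the register dump shows v1 = 0x275A0C80 at that point. That address isn't backed by RAM or any device in the PSX memory map, hence the bus error. It's also consistent with mesh->tim being NULL: the PSX has no memory protection and main RAM starts at address 0, so lw v0,0(v1) and lw v1,4(v1) with v1 = 0 don't fault, they just read whatever the kernel keeps there, and dereferencing that garbage is what finally crashes.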
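The register dump shows v1 = 0x275A0C80 at the moment the lh at 0x800150a0 faulted, which is nowhere near RAM. A cheap way to catch this kind of bug earlier, sketched here as a debug-build heuristic, is to check pointers against the PSX memory map before dereferencing them. The helper name is hypothetical, and it deliberately accepts only the first 2 MB of main RAM through KUSEG, KSEG0 or KSEG1, minus the first 64 KB used by the BIOS kernel; RAM mirrors, the scratchpad and I/O ports are all treated as invalid.

```c
#include <stdint.h>

/* Heuristic: does p look like a pointer into the PSX's 2 MB of main RAM?
   - The top 3 bits select the segment: 0 = KUSEG (low 512 MB),
     4 = KSEG0 (0x80000000), 5 = KSEG1 (0xA0000000). All three map main
     RAM at their base.
   - The low 29 bits are the physical address; main RAM is 0x000000-0x1FFFFF.
   - The first 64 KB belong to the BIOS kernel, so they are rejected too:
     on the PSX a NULL dereference silently reads this area instead of faulting. */
int looks_like_ram_ptr(uint32_t p) {
    uint32_t segment  = p >> 29;
    uint32_t physical = p & 0x1FFFFFFFu;
    if (segment != 0 && segment != 4 && segment != 5)
        return 0;
    return physical >= 0x10000u && physical < 0x200000u;
}
```

In a debug build you could, for example, assert(looks_like_ram_ptr((uint32_t)mesh->tim)) before touching the TIM. With the values from the dump, s0 (0x800B2510) and sp (0x801FFC04) pass, while v1 (0x275A0C80) and a near-NULL address like 0x4 are rejected.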

Anyway, in this case, addr2line was enough to find the culprit and fix the issue!

Sources

https://discord.com/channels/642647820683444236/663664210525290507/860539386689093662
https://linux.die.net/man/1/addr2line
https://linux.die.net/man/1/objdump
https://psx-spx.consoledev.net/cpuspecifications/#cpu-registers