PCEngine-FX.com

PCE-FX Homebrew Development => Localizations, Games, Apps, Docs => Topic started by: dshadoff on 12/05/2018, 12:31 AM

Title: Translation patch – POST 2 – Isolating Text & Print Function
Post by: dshadoff on 12/05/2018, 12:31 AM
I'm putting together a few articles on the subject of translation patch creation; I'm hoping that the forum-post format (thread per major subject) will allow a lively supplemental discussion to follow each post and explore a little deeper into areas in which people are interested. In the event that an initial post on a topic is too large, I'll try to break it into sections and post consecutively to the same thread.

Part of the process of selecting a game is determining how difficult the technical portion will be.

HuCard games generally use the 8x8 tiles available in hardware – but can define their own custom character sets, often making it difficult to search for text (due to the custom encoding). I'm not going to focus on this type of game today, but I will mention that the best place to start is to locate the definition of the character set (held in VRAM), so the encoding can be determined. This would be done with Mednafen in the same way as any character graphics would be found/isolated, and worked back to the source location.

Today's focus, however, is on locating the print function and script in CDROM games (or at least trying to).

Before We Start
Note that I will be using hexadecimal a lot in this post; since the 6502 convention is to prefix values with the dollar sign and capitalize the letters, I will try to ensure that this is done for addresses and values that the processor uses (i.e. '$F8'). For offsets into the ISO file, I generally use the 'C' convention of the prefix '0x', often with lowercase alphabetic characters ('0xffff'). And for a string of bytes, I hope that just using the pattern of repeating two hex digits plus a space ('83 52') is adequate. It'll make sense... (I hope).


Where to Start ?

Like any sufficiently difficult puzzle, the key is to start at the most basic/simple/already familiar part, working outwards and solving the unknown at the edge of what is already known.

In this case, the key is the kanji graphics – NEC had the foresight to put a substantial kanji character set into the System Card, so that game developers wouldn't have to create their own character set definitions for the huge set of kanji in the Japanese language (effort which is better spent on other graphics). In order to make use of it, the game needs to make a system card call with a 2-byte SJIS value, getting the graphics data back in a buffer. This in turn means that the text to be printed is either stored directly in SJIS, or in a source format from which SJIS can be created easily.
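If the text really is stored directly as SJIS, one way to hunt for candidate script data is to scan the ISO for runs of byte pairs that fall in the SJIS double-byte ranges. The following is only a rough sketch of that idea: the function names and the `min_chars` threshold are made up for illustration, and a game that stores its text in some other source format will not show up this way. It is a heuristic, so expect false positives from graphics and code.

```python
def is_sjis_lead(b: int) -> bool:
    """True if b can be the first byte of a 2-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_sjis_trail(b: int) -> bool:
    """True if b can be the second byte of a 2-byte Shift-JIS character."""
    return 0x40 <= b <= 0xFC and b != 0x7F


def find_sjis_runs(data: bytes, min_chars: int = 4):
    """Yield (offset, raw_bytes) for each run of at least min_chars
    consecutive byte pairs that look like 2-byte Shift-JIS characters."""
    i = 0
    n = len(data)
    while i < n - 1:
        start = i
        while i < n - 1 and is_sjis_lead(data[i]) and is_sjis_trail(data[i + 1]):
            i += 2
        if (i - start) // 2 >= min_chars:
            yield start, data[start:i]
        else:
            # Too short (or no match): retry one byte later in case the
            # real text starts at a different alignment.
            i = start + 1


# Small self-contained demonstration on an in-memory buffer.
text = "コンニチハ"
sample = b"\x00\x01\x02" + text.encode("shift_jis") + b"\x00\xff"
for offset, raw in find_sjis_runs(sample):
    print(f"0x{offset:x}: {raw.decode('shift_jis', errors='replace')}")
```

For a real image you would read the file (or chunks of it) into `data` and skim the printed offsets for anything that looks like dialogue from the game.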

The EX_GETFNT Call

The EX_GETFNT function is at location $E060, and the system card functions always expect parameters to be passed via the zero-page locations between $F8 and $FF (or in registers).

For EX_GETFNT, the parameters are passed as follows:
$F8/$F9 = Kanji code – note: this processor is little-endian, so $F8 holds the least-significant byte (LSB), and $F9 holds the most significant (MSB)
$FA/$FB = destination address for the graphics (32-byte buffer)
$FF = transfer mode ($00 for 16x16 size; $01 for 12x12 size)
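To make the layout concrete, here is a small sketch that pulls these three parameters out of a 256-byte copy of zero page (for example, typed in from the debugger's memory view). The `GetFntParams` type and `read_getfnt_params` function are names invented for this example, and the sample values are the ones from the Dead of the Brain screenshot discussed further down.

```python
import struct
from typing import NamedTuple


class GetFntParams(NamedTuple):
    kanji_code: int   # $F8/$F9, little-endian
    buffer_addr: int  # $FA/$FB, little-endian
    mode: int         # $FF: $00 = 16x16, $01 = 12x12


def read_getfnt_params(zero_page: bytes) -> GetFntParams:
    """zero_page holds 256 bytes; index N corresponds to zero-page offset $N."""
    # "<HH" = two little-endian 16-bit values, starting at offset $F8.
    kanji_code, buffer_addr = struct.unpack_from("<HH", zero_page, 0xF8)
    return GetFntParams(kanji_code, buffer_addr, zero_page[0xFF])


# Build a fake zero page with only the interesting bytes filled in.
zp = bytearray(256)
zp[0xF8], zp[0xF9] = 0x52, 0x83   # kanji code, LSB first
zp[0xFA], zp[0xFB] = 0x29, 0x35   # buffer address, LSB first
zp[0xFF] = 0x01                   # 12x12

params = read_getfnt_params(bytes(zp))
size = "12x12" if params.mode == 0x01 else "16x16"
print(f"code=${params.kanji_code:04X} buffer=${params.buffer_addr:04X} size={size}")
```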


Mednafen's Debugger

If you've never used Mednafen's debugger before, it's indispensable for this kind of work. You should get accustomed to the debugging functions and features.

OK, The Debugger Returned... Now What ?

So now, the debugger appeared again, and the game stopped. The disassembly list shows the jump table, just like the last time you left it, with the E060 line highlighted. You might ask yourself... "now what ?"

Get your deerstalker cap out of the closet... the game is afoot !

I've attached a screenshot of this exact moment while playing Dead of the Brain 1:

[Screenshot: Mednafen's debugger stopped at the EX_GETFNT breakpoint in Dead of the Brain 1]

If you look closely, you'll see that I've also put a few red rectangles around some key information:

The patchwork square(ish) block of coloured numbers is zero page memory, which you will frequently consult while debugging; I put boxes around each of the parameters which EX_GETFNT uses... so:
$F8/$F9 -> shows us that the SJIS character is $8352 (remember, LSB is stored first)
$FA/$FB -> shows us that the graphics buffer is at $3529
$FF -> shows us that the 12x12 version of the character is being requested
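If you want to know which character $8352 actually is, Python's standard `shift_jis` codec can tell you. This is just a convenience sketch (the helper names are made up). Note the byte order: zero page holds the LSB first (52 83), but SJIS text is written lead byte first, so if the script stores SJIS directly the same character would appear as 83 52.

```python
def zero_page_pair_to_code(lsb: int, msb: int) -> int:
    """Combine the two zero-page bytes ($F8 = LSB, $F9 = MSB) into one 16-bit code."""
    return (msb << 8) | lsb


def sjis_code_to_char(code: int) -> str:
    """Convert a 16-bit SJIS code such as 0x8352 into its character."""
    # SJIS text is stored lead byte first, i.e. big-endian for this purpose.
    return code.to_bytes(2, "big").decode("shift_jis")


code = zero_page_pair_to_code(0x52, 0x83)
print(f"${code:04X} -> {sjis_code_to_char(code)}")  # prints "$8352 -> コ"
```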

I placed another box in a list area – this is a traceback queue, which tells us where the processor has been before it came here. If you hit 'g' then put 'C778' in as the address, Mednafen will display a disassembly of the most recently-executed section of the game's print function.


Suggested Clue Gathering

A short list of things I usually do next is as follows (but other people may have a different approach):

If all of this works out, you are well on your way... but if it doesn't, here are a few possibilities:

Next Steps (Still Early Days)

Next, you could continue in either of two places – the script, or the print function.

For the script:

You may want to make a small adjustment in the script (within the ISO, where you found it):
In order to really understand the script organization, though, you'll need to understand some more about the tokens, and the overall complexity of the strings. For that, you'll need at least some of the print function to be disassembled and understood.
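For the small script adjustment mentioned above, it's safest to work on a copy of the ISO and to check that the bytes at the offset are what you expect before overwriting them. Here's a minimal sketch of that; the function name, file names and offset are all placeholders for illustration, and the replacement is kept the same length as the original so that nothing else in the image moves.

```python
import shutil
from pathlib import Path


def patch_copy(src: Path, dst: Path, offset: int,
               expected: bytes, replacement: bytes) -> None:
    """Copy src to dst, then overwrite `expected` with `replacement` at offset."""
    if len(expected) != len(replacement):
        raise ValueError("replacement must be the same length as the original bytes")
    shutil.copyfile(src, dst)
    with open(dst, "r+b") as f:
        f.seek(offset)
        found = f.read(len(expected))
        if found != expected:
            # Refuse to patch if the offset doesn't hold what we think it does.
            raise ValueError(f"unexpected bytes at 0x{offset:x}: {found.hex()}")
        f.seek(offset)
        f.write(replacement)


# Hypothetical usage: swap one SJIS character (83 52) for another (83 4a)
# at a made-up offset in a made-up file name.
# patch_copy(Path("game.iso"), Path("game-test.iso"), 0x123456,
#            bytes.fromhex("83 52"), bytes.fromhex("83 4a"))
```

If the changed character shows up on screen when you run the patched copy, you've confirmed that you found the script data the print function really reads.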

For the print function:

Use a disassembler, and read the code in order to distil meaning from it.
...I know, easier said than done - but as I mentioned at the beginning of the post, start with things that are obvious, and comment them (including the scratchpad RAM usage) until you reach the edge of what is obvious. And a 100% understanding isn't always needed in order to get what you need.

So, this will start with the part leading up to the call to EX_GETFNT; if you trace back enough, you'll find the loop where it fetches the string's characters, and checks token values. At some point, as you try to understand what the original programmer was doing, you may reach a dead end... at that point, look for other familiar things, such as accesses to the VRAM (the VDC hardware addresses are another 'fixed truth' of the machine), and look at how the code manipulates the data around them, and so on.
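While reading the fetch loop, it can help to annotate a chunk of script the same way you guess the loop treats it: two-byte SJIS characters on one side, everything else as a possible token. The sketch below assumes (and this is only an assumption, since token schemes are game-specific) that any byte which doesn't start a valid SJIS pair is a one-byte token; `split_script` is a made-up name.

```python
def split_script(data: bytes):
    """Split raw script bytes into ("char", str) and ("token", int) items."""
    items = []
    i = 0
    while i < len(data):
        b = data[i]
        is_lead = 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF
        if is_lead and i + 1 < len(data):
            try:
                items.append(("char", data[i:i + 2].decode("shift_jis")))
                i += 2
                continue
            except UnicodeDecodeError:
                pass  # not a valid pair: fall through and treat as a token
        # Assumption: anything else is a 1-byte control token.
        items.append(("token", b))
        i += 1
    return items


# Example: one character followed by two unknown bytes.
for kind, value in split_script(b"\x83\x52\x0d\x00"):
    print(kind, value if kind == "char" else f"${value:02X}")
```

The unknown bytes that keep turning up in the same places (end of line, end of message, and so on) are the ones worth looking for as compare values in the disassembly.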

It's not a trivial piece of work, so you will need patience and an inquisitive nature to accomplish this. Chances are, you will at some point find something that looks like a bug. Maybe it is a bug, but the programmer 'fixed' it with a countervailing bug elsewhere. Or the programmer had a strange way of viewing the problem and implemented the solution in a completely counter-intuitive and inefficient way. Ah, the joys of examining somebody else's code...

Reverse-engineering somebody else's program without source code is not easy (it's often difficult even with source code!), but – thinking of it as a puzzle – it can be incredibly satisfying to solve.

I'm going to repeat this, because I don't think I can stress it enough – while understanding the print function, I found the most important thing was determining what scratchpad memory was being used for, so whatever you do, don't skip documenting that.

Hopefully, you will eventually come up with something like the files I am posting here – but it will take some time. Mednafen's single step function ('S' in the debugger) is also helpful, and so is setting other breakpoints to go over the boring parts. With a debugging emulator, we now have the luxury of seeing what values are reasonable (by viewing them 'live'), where branches actually take us, and so on. Much easier than just using a paper disassembly.


Notes (follow-up on my 'clue gathering' suggestions above):
Attached are my commented disassemblies of the print function, for your perusal:

printfunc-disassembly-ramuse.asm
printfunc-disassembly.asm


To Study/Consider in Advance of the Next Post

Next post: the print function patch

Continued: Part III (https://www.pcengine-fx.com/forums/index.php?topic=23357.0)