Translation patch – POST 3 – Print Function Modification

Started by dshadoff, 12/07/2018, 05:21 PM

dshadoff

We left off with a commented disassembly of the print function, and some questions about how one would implement a modification for western languages.

Some hints from the disassembly and knowledge of the machine's hardware:
  • This print function is capable of printing both 16x16 and 12x12, as the 'kanji size' flag passed to the EX_GETFNT routine is not a constant; it is derived from the third-least-significant bit of PRINTFLAGS
  • Similarly, the horizontal spacing is set by a flag within the PRINTFLAGS token's payload byte's most-significant 4 bits (shifted right of course)
  • The 'normal' tiling/character layout on this machine would easily support 8x8 or 8x16 printing, which is ideal for latin-type letters
Tinkering around, we can modify the PRINTFLAGS (token=07) payload byte to adjust the amount of offset per character. I found that 'PRINTFLAGS=0x' (16 pix wide; 0 pix offset) worked well, and 'PRINTFLAGS=4x' (12 pix wide; 4 pix offset) also worked fine, but if I went one step further to 'PRINTFLAGS=8x' (8 pix wide), it didn't print well. Most other values also fail. So the print routine's apparent flexibility is only skin deep: the original authors evidently expected to need only those two values, so those appear to be the only values tested. After digging into the print function, we find that the sizes are hardcoded at various spots, so the design itself is deceptive. This sort of thing happens a lot in "other people's code"; take nothing at face value, and take nothing for granted.
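
To make the bit layout described above concrete, here is a minimal Python sketch of how the PRINTFLAGS payload byte splits into its two fields. The function name and the returned keys are hypothetical, and the "advance = 16 minus offset" relation is only inferred from the three values quoted above (0x, 4x, 8x); the disassembly remains the authority.

```python
def decode_printflags(payload: int) -> dict:
    """Split a PRINTFLAGS payload byte into the fields described in the post."""
    if not 0 <= payload <= 0xFF:
        raise ValueError("payload must be a single byte")
    offset = (payload >> 4) & 0x0F     # most-significant 4 bits, shifted right
    size_flag = (payload >> 2) & 0x01  # third-least-significant bit (kanji size flag)
    return {
        "offset": offset,
        # Assumption inferred from the post: 0x -> 16 wide, 4x -> 12 wide, 8x -> 8 wide.
        "advance": 16 - offset,
        # Which value selects 12x12 versus 16x16 is not stated, so the raw bit is returned.
        "size_flag": size_flag,
    }


if __name__ == "__main__":
    for value in (0x00, 0x40, 0x80):
        print(hex(value), decode_printflags(value))
```

Only the first two settings printed correctly in the original routine; the sketch shows what the third one asks for, not what the game actually does with it.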


Previous Attempt at print function modification on Dead of the Brain

First, I need to give a bunch of credit to Tomaitheous (aka Bonknuts, aka Rich), who put a lot of effort into locating the print function, some initial disassembly, and working on a print function modification with very lofty goals. His goal was to create the first variable-width font (VWF) modification on the PC Engine, with a narrow font which could fit a lot of information into the box (compensating for the fact that English - and even more so French - uses more characters than Japanese to express the same thought). He also planned to use compression to squeeze it all into the small available space.

Without his effort I probably wouldn't have proceeded on the script extract. He accomplished a lot when the tools weren't really around to do much debugging. Unfortunately, the initial text insertion revealed bugs in the print function, and while we worked together to try to identify and resolve these, ultimately we each became busy with jobs/life and progress stopped. One possible contributing factor is that the original print function has some weird quirks about tile allocation, which I noticed during my own disassembly/patch efforts this past summer (and I still don't completely understand some of the design decisions the original author made).


My Philosophy

When creating a print function modification, you can do it one of two ways:
  • Replace the entire print function with one of your own making. In this case, you bear the risk that some external function makes an unexpected jump into the print function for its own undocumented purposes, expecting a certain output... which your replacement code likely won't replicate.  Games of this generation were all still written in assembler which promotes such bad behaviour; the next generation of consoles used higher-level languages which reduced these sorts of shenanigans.
  • Or, you can perform the minimum possible change to the original function, which should preserve even obscure, hidden functionality.
...I'm not sure which approach the original print modification took, but it was ambitious. It may indeed have started as "minimum possible", but it likely ended up replacing existing code to some extent.

While I continue to have a great amount of respect for what Rich attempted (and EsperKnight later also looked into), I personally prefer to start with the "minimum possible" approach:
  • I want the original kanji to remain printable -> in case I failed to locate all of the script for some reason, play-testing should still print the text as a clue to finding/fixing it.
  • I want to work within the limits of the existing print function to extend functionality to include a simple 8x12 or 8x16 latin character print function which snaps into the same touchpoints the existing print function does.
  • Once the "minimalist" function is working, I may consider the variable-width font as a 'stretch goal' in case I still wanted to get fancy once everything was working.  But I would certainly hold on to the 'cruder', working version as a fallback.  Keep in mind that any added complexity brings added risk, and I prefer to start my project with as little risk as possible (projects are full of surprises as it is).
...So this is exactly what I did this past summer: I implemented a much simpler print function modification, touching as little of the existing print function as possible, and retaining the ability to print kanji.

I have attached all of my code here, including a repost of the print function disassembly, now including patch points for illustration purposes.


-> See printfunc_disasm.zip for files printfunc-disassembly.asm and printfunc-disassembly-ramuse.asm


What exactly do you mean by "PATCH POINT" ?

Of course, all the code resides at fixed locations in memory, and we want to add some code. As I mentioned before, we could replace the whole function... but that could have complications. So, we locate the exact step where we need to intervene and replace it with a 'jump' to our own code block. The 'jump' instruction takes up space, obliterating one or two operations which were previously there, so we also need to transplant those instructions into our routine, and make sure they run on both paths: when we execute new code, and when we 'short-circuit' back out to the original code.

Note:  those instructions might set flags which are examined later, so we will need to keep that in mind and tread carefully in order to preserve such flags.
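
As a rough illustration of the mechanics, here is a small Python sketch that installs a patch point in a block of bytes: it overwrites the chosen span with a JMP and hands back the displaced bytes so they can be replayed in the new routine. The function name, the addresses, and the NOP padding are assumptions for illustration only; the real patch is written in PCEAS assembler, and the caller has to make sure the displaced span covers whole instructions.

```python
JMP_ABS = 0x4C  # 6502-family "JMP absolute": opcode + little-endian 16-bit address
NOP = 0xEA      # 6502-family one-byte no-op, used here only as padding


def install_patch_point(code: bytes, base_addr: int, patch_addr: int,
                        hook_addr: int, displaced_len: int):
    """Overwrite displaced_len bytes at patch_addr with a JMP to hook_addr.

    Returns (patched_code, displaced_bytes). The displaced bytes must be
    replayed inside the hook routine before control returns to the original.
    """
    if displaced_len < 3:
        raise ValueError("a JMP absolute needs at least 3 bytes")
    start = patch_addr - base_addr
    end = start + displaced_len
    if start < 0 or end > len(code):
        raise ValueError("patch point lies outside the code block")
    displaced = code[start:end]
    jump = bytes([JMP_ABS, hook_addr & 0xFF, (hook_addr >> 8) & 0xFF])
    padding = bytes([NOP]) * (displaced_len - 3)
    return code[:start] + jump + padding + code[end:], displaced


if __name__ == "__main__":
    # Hypothetical original code: LDA #$01 / STA $10 / RTS, loaded at $C000.
    original = bytes([0xA9, 0x01, 0x85, 0x10, 0x60])
    patched, displaced = install_patch_point(original, 0xC000, 0xC000, 0xDA00, 4)
    print(patched.hex(), displaced.hex())
```

The sketch does nothing about processor flags; preserving those is a matter of how the hook routine itself is written.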


Steps

First, I felt I had to understand the existing print function completely. I already had some disassembly notes from long ago, but they were actually only about 40% of what I wanted. Worse, I hadn't taken the time back then to completely document the print function's RAM usage – maybe I thought I could remember it... well, I couldn't.  Document the print function – your notes will be needed later (at some point).

Once I had the disassembly, there were a few key touchpoints:
  • When the print character is read, the original function checks for special characters below $20, which clear the screen, end messages, and so on. Otherwise, it assumes that all text is going to be double-byte SJIS (first byte is a value above $80).  This is PATCH POINT #1; it needs to realize that there are valid values between $20 and $80.
  • The new print function should re-use as much as possible of the existing 'tile allocate', and 'print' functionality (primarily 'shift'-'mix'-'display' steps).
  • I needed to be clear about how to adjust the print function, since the new font was narrower; boundary adjustments (crossing the limits of an 8x8 tile) are different. Since the new font is narrower than the kanji font, it's simpler than if the new font were wider.
  • I needed a font to display. I decided on an 8x12 font for a couple of reasons: a) 12 pixels is enough to provide for adequate descenders ('g', 'j', 'p', 'q', and 'y' which dip below the line on which everything else rests), and matches the 12x12 kanji height.  b) free space to store the font was limited, but there was enough to store a 12-pixel font without needing to compress it.
  • I needed to be sure that there was room available to fit it. Based on the data on disc, there was an apparent empty area at about $DA50, until $DFFF (the end of the sector, also the end of the bank).  So I put it there, at $DA60. I also needed a couple of scratchpad values, but I couldn't rely on random areas in the $2000 segment, so I just used the adjacent memory (since it runs from RAM anyway), at $DA5C.
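
The space check in the last two bullets can be sketched as simple arithmetic. In this Python sketch the region bounds come from the post, but the glyph range ($20 to $7F), the one-byte-per-row storage, and the names are assumptions; the post also doesn't say how the region is split between code and font, so the sketch only reports what would be left over after an uncompressed font.

```python
REGION_START = 0xDA60  # from the post: start of the area used for the patch
REGION_END = 0xDFFF    # from the post: end of the sector, also end of the bank

GLYPH_ROWS = 12                     # 8x12 font
BYTES_PER_ROW = 1                   # assumed: 8 pixels at 1 bit each
FIRST_CHAR, LAST_CHAR = 0x20, 0x7F  # assumed glyph range


def region_budget():
    """Return (bytes available, bytes needed for an uncompressed font)."""
    available = REGION_END - REGION_START + 1
    needed = (LAST_CHAR - FIRST_CHAR + 1) * GLYPH_ROWS * BYTES_PER_ROW
    return available, needed


def glyph_offset(char_code: int) -> int:
    """Offset of a glyph from the start of the font table (assumed layout)."""
    if not FIRST_CHAR <= char_code <= LAST_CHAR:
        raise ValueError("character outside the font's range")
    return (char_code - FIRST_CHAR) * GLYPH_ROWS * BYTES_PER_ROW


if __name__ == "__main__":
    available, needed = region_budget()
    print(f"available={available} needed={needed} left={available - needed}")
```

Under these assumptions the font takes 1152 of the 1440 bytes, which is consistent with the remark that there was enough room without compression.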

The print function modification simply adds the following functionality:
  • Check whether the next character is between $20 and $80. If not, do whatever the function would have done if not patched. Otherwise, go to the special print function.
  • For latin characters, use the same 16x16 graphic buffer that the EX_GETFNT function uses, but put our own graphic data inside.
  • Return control back to the print function, but leave a flag set, so that the pixel-adjustment (for 12- or 16-, and now 8-pixel characters) can adjust its pointers properly to get ready for the next character's starting point.
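
The first of these steps, the range check at PATCH POINT #1, boils down to a three-way decision on the byte just read. A minimal Python sketch, with hypothetical names; the handling of $80 itself is an assumption, since the post only says "between $20 and $80" for latin and "above $80" for SJIS:

```python
def classify(byte: int) -> str:
    """Decide which path a script byte should take through the print function."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    if byte < 0x20:
        return "control"  # original path: clear screen, end message, PRINTFLAGS, ...
    if byte < 0x80:
        return "latin"    # new path: single-byte character from the 8x12 font
    return "sjis"         # original path: first byte of a double-byte kanji (assumed to include $80)


if __name__ == "__main__":
    for value in (0x07, 0x41, 0x88):
        print(hex(value), classify(value))
```

Everything except the "latin" case falls through to what the unpatched function would have done, which is what keeps the original kanji printable.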

A word about creating the font data:

You could put this together as a series of byte-definitions in assembler, but a graphic is easier to edit, review, and so on.  I have included a 1-bit BMP (careful, it's a relatively rare format; lots of editors try to bulk it out to more bit depth).  I used a free editor I found online called 'Pixelformer' which preserved bit-depth.

Based on the fact that the inserted script will be French, the font will need to include accented characters which aren't part of the standard ASCII set.  On the other hand, we aren't using all of the standard ASCII characters, so several of them have been replaced with accented characters.

Once the font is visually OK, the BMP can be converted into a binary file, which can then be inserted.  For this, I used a program called 'FEIDIAN' to do the bit conversion.
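
For readers who want to see what that bit conversion amounts to, here is a hedged Python sketch that packs one 8x12 glyph into 12 bytes. It is not a description of FEIDIAN or its parameters: the text-art input, the leftmost-pixel-in-the-top-bit ordering, and the one-byte-per-row layout are all assumptions for illustration, and the real layout is whatever the command line in GetFontBIN.bat produces.

```python
def pack_glyph(rows):
    """Pack 12 rows of 8 pixels ('#' = set, anything else = clear) into 12 bytes."""
    if len(rows) != 12:
        raise ValueError("an 8x12 glyph needs exactly 12 rows")
    packed = bytearray()
    for row in rows:
        if len(row) != 8:
            raise ValueError("each row must be exactly 8 pixels wide")
        value = 0
        for pixel in row:
            # Assumed bit order: leftmost pixel ends up in the most-significant bit.
            value = (value << 1) | (1 if pixel == "#" else 0)
        packed.append(value)
    return bytes(packed)


if __name__ == "__main__":
    letter_l = [
        "........",
        ".#......",
        ".#......",
        ".#......",
        ".#......",
        ".#......",
        ".#......",
        ".#......",
        ".######.",
        "........",
        "........",
        "........",
    ]
    print(pack_glyph(letter_l).hex())
```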


-> See font.zip for files DotBNewFont-8x12.bmp, NewFont8x12.bin, and GetFontBIN.bat which contains the FEIDIAN command-line parameters needed.  FEIDIAN is available separately on GitLab


Putting the print function modification together:

The best way to make even the simplest piece of assembler code is to use the assembler.  Of course, it can be done by hand (I've done it many times), and you may convince yourself that it's small enough... but sooner or later you'll miscalculate an offset, or you'll need to insert a line and need to adjust everything.

However, the PCEAS assembler only outputs to a new file; it doesn't directly patch an existing one.  So I had to write another program ('filepatch') to patch the base ISO file based on the PCEAS output (i.e. copy blocks of bytes from one file into the other file at a given location).  It's parameterized to allow entry of: {target file, target offset, patch file, patch offset, patch length}.
Then I wrote a shell script to sequentially patch each of the necessary blocks to be patched this way (it's a UNIX batch file, but should be trivial to switch to a Windows BAT file).
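
The attached filepatch.c is the real tool; purely to illustrate the five parameters listed above, here is a minimal Python sketch of the same kind of block copy. The function name and the error handling are my own assumptions and are not taken from filepatch.c.

```python
import os


def filepatch(target_path, target_offset, patch_path, patch_offset, patch_length):
    """Copy patch_length bytes from patch_path@patch_offset into target_path@target_offset."""
    with open(patch_path, "rb") as src:
        src.seek(patch_offset)
        block = src.read(patch_length)
    if len(block) != patch_length:
        raise ValueError("patch file is shorter than offset + length")
    with open(target_path, "r+b") as dst:
        dst.seek(0, os.SEEK_END)
        if target_offset + patch_length > dst.tell():
            # Refuse to grow the target: an ISO image should keep its size.
            raise ValueError("patch would run past the end of the target file")
        dst.seek(target_offset)
        dst.write(block)
```

A driver script then just calls this once per block, in order, which is all the shell script needs to do.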


-> See dotb_printpatch.zip for files DotB1_patch_2018.asm, DotB1_patch_2018.lst, filepatch.c, and patch_dotb (shell script)


I think this provides a fair overview of what's needed in the patch, and I really hope that the comments in the code provide enough specific detail to give a programmer not just an overview of a print function modification, but a concrete example to learn from.

Please reply to this thread with any questions, follow-ups, comments, etc.


Next post:  Extracting the script

Continued: Part IV

dshadoff

It looks like ZIP files are not allowed on this board (in fact, most file types are not).
So I'll try to upload things as best I can.
The print function disassembly files were changed to .TXT, and attached to the top post in this thread.

This post should contain the FONT-related files... but doesn't contain all of them.

In order to post, I had to change the BAT file to a TXT file, and I am unable to change the BIN file to anything usable... you'll just need to create it for yourself by running the BAT file with FEIDIAN against the BMP file.

dshadoff

And these are the patch-related files (added TXT file suffix to them to get past the board filter).