
PC-FX homebrew development.

Started by elmer, 01/26/2015, 03:40 PM



OldRover

#200
To continue on the ADPCM kick, I figure I should add in some more details.

So in my example, I give it a starting address of 0x2000. That's because I've loaded my ADPCM array data into KRAM at that point. Loading data into KRAM is easy...

eris_king_set_kram_write(0x2000, 1);
for(i = 0; i < sizeof(voxarray) / sizeof(voxarray[0]); i++)   /* element count, not byte count */
    eris_king_kram_write(voxarray[i]);

I have a u16 array called voxarray that has all the ADPCM data in it. The two parameters of the first function call are the address and the auto-increment amount (1 word, in this case). Each time you write a word to KRAM, the pointer moves up 1 word, so you don't have to keep setting the address. Nifty and convenient, methinks.

So... getting the ADPCM data into your source code... well, this is where having some coding knowledge worked out well for me. I coded a simple utility called any2arr which takes any file you give it and creates a header with the data of that file as a u16 array.

frozenutopia.com/pcfx/any2arr.7z
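For anyone who wants to roll their own, here is a minimal sketch of an any2arr-style converter. It is not OldRover's tool, just one way to do the same job. It assumes the bytes should be packed little-endian (the V810's byte order) with a zero pad byte when the file length is odd; whether the ADPCM hardware wants that byte order within each KRAM word is worth checking. The emitted header also assumes a `u16` typedef is already in scope.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical any2arr-style helper: pack raw bytes into 16-bit words,
 * low byte first, padding an odd trailing byte with zero.
 * Returns the number of words written to out. */
size_t bytes_to_u16(const unsigned char *in, size_t len, unsigned short *out)
{
    size_t words = (len + 1) / 2;
    for (size_t w = 0; w < words; w++) {
        unsigned lo = in[2 * w];
        unsigned hi = (2 * w + 1 < len) ? in[2 * w + 1] : 0;
        out[w] = (unsigned short)(lo | (hi << 8));
    }
    return words;
}

/* Write the words as a header declaring "const u16 name[count]".
 * Assumes the including program already defines u16. */
void write_u16_header(FILE *f, const char *name,
                      const unsigned short *words, size_t count)
{
    fprintf(f, "const u16 %s[%lu] = {\n", name, (unsigned long)count);
    for (size_t i = 0; i < count; i++)
        fprintf(f, "  0x%04X,\n", (unsigned)words[i]);
    fprintf(f, "};\n");
}
```

In a real tool you would fread() the .vox file into a buffer, call bytes_to_u16(), and pass stdout to write_u16_header(); the trailing comma after the last element is legal in a C initializer, which keeps the writer loop simple.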

Making ADPCM files is also easy... just snag a copy of sox. To make the samples for Asteroid Challenge FX, I used sox like so:

sox -r 16000 boom.wav boom.vox
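The -r 16000 tells it to use 16kHz, the .wav is obviously my source audio, and the .vox is the output file. sox knows to make an ADPCM file based on the .vox extension. So, just convert your file to a .vox with sox, run the .vox file through any2arr, and you've got your ADPCM data, ready to include in your program. (One caveat: sox applies format options to the file that follows them, so -r 16000 placed before boom.wav describes the input's rate. If your source isn't already 16kHz, sox boom.wav -r 16000 boom.vox is the safer form, since it asks sox to resample the output.)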

EDIT: Forgot to mention... since we're using words here, take the filesize of your .vox file and divide it in half to get the length. Take that divided value and add it to your starting address to get the ending address that you need. I think I did mention this briefly in the first post about this but it bears mentioning again, because reasons.
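The arithmetic in that EDIT, as a tiny sketch: KRAM addresses count 16-bit words, so a .vox of N bytes occupies N/2 words. This version rounds up so it matches a converter that pads odd-length files. Following the post, the end address is start + length; whether the hardware treats that end address as inclusive or exclusive is an assumption worth checking against the docs. For example, an 8000-byte .vox loaded at 0x2000 gives 0x2000 + 4000 = 0x2FA0.

```c
#include <stdint.h>

/* End address for an ADPCM sample in word-addressed KRAM:
 * start + (file size in bytes / 2), rounded up for odd sizes. */
uint32_t adpcm_end_address(uint32_t start_word, uint32_t vox_bytes)
{
    uint32_t length_words = (vox_bytes + 1) / 2;
    return start_word + length_words;
}
```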
Turbo Badass Rank: Janne (6 of 12 clears)
Conquered so far: Sinistron, Violent Soldier, Tatsujin, Super Raiden, Shape Shifter, Rayxanber II

elmer

Quote from: OldRover on 03/22/2016, 04:38 PM
So... getting the ADPCM data into your source code... well, this is where having some coding knowledge worked out well for me. I coded a simple utility called any2arr which takes any file you give it and creates a header with the data of that file as a u16 array.
Good stuff!  :)

But you're really going-out-of-your-way to avoid using "objcopy" or an assembly file with an "incbin", aren't you!  :wink:

http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967

************

On my side of things ... GCC is now correctly passing the varargs to "sprintf" ... where it all just dies.  ](*,)

OldRover

Quote from: elmer on 03/22/2016, 05:18 PM
But you're really going-out-of-your-way to avoid using "objcopy" or an assembly file with an "incbin", aren't you!  :wink:
Honestly... I had no idea about that... heh :D I just go with what I know... remember, I'm not a hardcore coder like you are, I'm just a game developer who can code a little. :)

elmer

Quote from: OldRover on 03/22/2016, 05:49 PM
Honestly... I had no idea about that... heh :D I just go with what I know... remember, I'm not a hardcore coder like you are, I'm just a game developer who can code a little. :)
The point is ... you're finding problems, and you're solving them. That's brilliant!  8)

It's just that sometimes you're hitting problems that have already been solved, and this is one of those cases.  :wink:

objcopy -I binary -O elf32-v810 -B v810 filename.bin filename.o
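On the C side, objcopy's binary input mode defines symbols named after the input file: _binary_<name>_start, _binary_<name>_end and _binary_<name>_size, with characters such as '.' turned into '_'. The sketch below assumes the command was run on OldRover's boom.vox, and that V810 GCC prefixes C identifiers with an underscore (which the "_copymem"-style labels later in this thread suggest), so the C declarations drop one leading underscore.

```c
#include <stddef.h>

/* Symbols from: objcopy -I binary -O elf32-v810 -B v810 boom.vox boom.o
 * In the object file they are _binary_boom_vox_start / _end.
 * Assumption: the compiler adds the leading '_', so C omits it. */
extern const unsigned char binary_boom_vox_start[];
extern const unsigned char binary_boom_vox_end[];

/* Number of 16-bit KRAM words in an embedded blob. */
size_t blob_words(const unsigned char *start, const unsigned char *end)
{
    return (size_t)(end - start) / 2;
}

/* On the PC-FX you might then use:
 *   blob_words(binary_boom_vox_start, binary_boom_vox_end)
 * as the length for a KRAM upload loop. */
```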

OldRover

Well, the one problem I still can't figure out is the SCSI bit. I'm experimenting with KING's SCSI ports now. I read 0x600 and 0x602 for now... 0x602 *should* contain the SCSI status, if I'm reading the daifukkat docs correctly. It's returning 1101100, which indicates Busy, Request, C/D, and I/O are all set to 1. I figured maybe I could use this to determine if the system knew when the audio track had finished playing. However, the status never changes. So I guess this is out.
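The 1101100 above is binary, i.e. 0x6C. The sketch below decodes such a read under an assumption the thread does not confirm: that KING's 0x602 bits follow the NCR 5380 "current SCSI bus status" layout (bit 7 to bit 0: RST, BSY, REQ, MSG, C/D, I/O, SEL, DBP). That layout does reproduce OldRover's reading of Busy, Request, C/D and I/O. If it holds, the MSG/C/D/I/O combination for 0x6C is the standard SCSI Status phase with REQ asserted, meaning the drive would be waiting for the host to take a status byte, which would also explain why the value never changes on its own.

```c
#include <stdint.h>

/* ASSUMPTION: NCR 5380-style bit layout for KING port 0x602. */
enum {
    SCSI_RST = 0x80, SCSI_BSY = 0x40, SCSI_REQ = 0x20, SCSI_MSG = 0x10,
    SCSI_CD  = 0x08, SCSI_IO  = 0x04, SCSI_SEL = 0x02, SCSI_DBP = 0x01
};

/* Standard SCSI information-transfer phases, encoded as MSG:C/D:I/O. */
typedef enum {
    PH_DATA_OUT = 0, PH_DATA_IN = 1, PH_COMMAND = 2, PH_STATUS = 3,
    PH_MSG_OUT = 6, PH_MSG_IN = 7, PH_RESERVED = 8, PH_NOT_BUSY = 9
} scsi_phase;

scsi_phase scsi_phase_of(uint8_t status)
{
    unsigned bits;

    if (!(status & SCSI_BSY))
        return PH_NOT_BUSY;             /* no target is driving the bus */

    bits = ((status & SCSI_MSG) ? 4u : 0u)
         | ((status & SCSI_CD)  ? 2u : 0u)
         | ((status & SCSI_IO)  ? 1u : 0u);

    if (bits == 4 || bits == 5)
        return PH_RESERVED;             /* 100b and 101b are reserved */
    return (scsi_phase)bits;
}
```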

elmer

Quote from: OldRover on 03/22/2016, 07:33 PM
It's returning 1101100, which indicates Busy, Request, C/D, and I/O are all set to 1. I figured maybe I could use this to determine if the system knew when the audio track had finished playing. However, the status never changes. So I guess this is out.
That sounds very-much-like what I was seeing in the Xanadu 2's custom CD-reading code.

I think that those bits are all just to do with controlling communication over the SCSI bus itself.

If you want to find out if the CD has finished playing a track, I suspect that you'll have to send it some kind of a "status query" command.

OldRover

Here's what the SCSI-2 docs say, but I am not able to make heads or tails of this, honestly...

Quote
PLAY AUDIO commands with the immediate bit set in the audio control
mode return status as soon as the command has been validated (which may
involve a seek to the starting address). The playback operation
continues and may complete without notification to the initiator.
Error termination of audio operations shall be reported to the
initiator by returning immediate CHECK CONDITION status to the next
command (except for REQUEST SENSE and INQUIRY.)  The deferred error
sense data (see 8.2.14.2.) is used to indicate that the error is not
due to the current command.

The status of the play operation may be determined by issuing a REQUEST
SENSE command.  The sense key is set to NO SENSE and the audio status
(see 14.2.10) is reported in the additional sense code qualifier field.

1) I have no idea how to set the audio control mode; the docs are very, very dry and hard to follow. 2) I have no idea how to receive the results from an SCSI command.

elmer

Quote from: OldRover on 03/22/2016, 07:48 PM
Here's what the SCSI-2 docs say, but I am not able to make heads or tails of this, honestly...
Hahaha ... welcome to my world!  :wink:

Quote
PLAY AUDIO commands with the immediate bit set in the audio control...
Basically, the return-code from "PLAY AUDIO" just says whether the CD drive has validated and accepted the "PLAY AUDIO" command.

You've got to send the drive a "REQUEST SENSE" command and look at the return code to see if it is still playing.
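For reference, the REQUEST SENSE command itself is just a 6-byte SCSI-2 command descriptor block: opcode 0x03, with byte 4 holding the allocation length (how many sense bytes the host is prepared to receive; 18 covers the standard fixed format). This sketch only builds the bytes. Handing them to the library's command function and reading the reply are not shown, and whether the PC-FX drive reports audio status this way is an untested assumption.

```c
#include <stdint.h>
#include <string.h>

#define SCSI_REQUEST_SENSE 0x03
#define SENSE_FIXED_LEN    18

/* Fill a 6-byte REQUEST SENSE CDB. Bytes 1-3 (LUN/reserved) and
 * byte 5 (control) stay zero. */
void build_request_sense(uint8_t cdb[6], uint8_t alloc_len)
{
    memset(cdb, 0, 6);
    cdb[0] = SCSI_REQUEST_SENSE;
    cdb[4] = alloc_len;
}
```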


Quote
I have no idea how to receive the results from an SCSI command.
That's going to be a problem.

You should see an example of reading the result of a "REQUEST SENSE" command somewhere in Alex's examples ... I can't imagine how he could have managed to read data from the CD without using it.

OldRover

Unfortunately, I see no mention of it in his examples, and of course, his source is next to impossible for someone like me to read... but I tried at least. :)

http://www.staff.uni-mainz.de/tacke/scsi/SCSI2-08.html

^^^ has the "details" of REQUEST SENSE. This is where getting a response becomes dim... I have no idea what to read to get this response. It says it returns 18 bytes... where?
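In SCSI terms, those 18 bytes come back from the drive during the Data In phase, one REQ/ACK handshake per byte; how liberis exposes that is not covered here. Once they are in a buffer, interpreting them is straightforward. This sketch follows the SCSI-2 text quoted above: byte 2's low nibble is the sense key (NO SENSE = 0), and with ASC (byte 12) = 0x00 the audio status sits in the ASCQ (byte 13). Whether the PC-FX drive implements these codes is an assumption.

```c
#include <stdint.h>

typedef enum {
    AUDIO_NOT_REPORTED,   /* not a NO SENSE / ASC 0x00 reply */
    AUDIO_PLAYING,        /* ASCQ 0x11: play operation in progress */
    AUDIO_PAUSED,         /* ASCQ 0x12: play operation paused */
    AUDIO_COMPLETED,      /* ASCQ 0x13: play completed successfully */
    AUDIO_STOPPED_ERROR,  /* ASCQ 0x14: play stopped due to error */
    AUDIO_NO_STATUS       /* ASCQ 0x15: no current audio status */
} audio_status;

/* Interpret 18 bytes of fixed-format SCSI-2 sense data. */
audio_status parse_audio_status(const uint8_t sense[18])
{
    uint8_t key  = sense[2] & 0x0F;
    uint8_t asc  = sense[12];
    uint8_t ascq = sense[13];

    if (key != 0x00 || asc != 0x00)
        return AUDIO_NOT_REPORTED;
    switch (ascq) {
    case 0x11: return AUDIO_PLAYING;
    case 0x12: return AUDIO_PAUSED;
    case 0x13: return AUDIO_COMPLETED;
    case 0x14: return AUDIO_STOPPED_ERROR;
    case 0x15: return AUDIO_NO_STATUS;
    default:   return AUDIO_NOT_REPORTED;
    }
}
```

Polling for the end of a track would then mean sending REQUEST SENSE periodically and waiting for AUDIO_COMPLETED.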

Okay, so one other thing: eris_low_scsi_command() returns an int, but the docs don't say what that return value actually is. From the limited amount I understand of our lovely assembler here, I assume that this is the entirety of the function:

_eris_low_scsi_command:
        mov lp, r15
        mov 0, r14
        mov 1, r13
        movea 0x84, r0, r12
        mov 3, r11
        mov r0, r10
        out.h r11, 0x600[r0]
        out.h r0, 0x604[r0]
        out.h r0, 0x600[r0]
        add 2, r11
        out.h r12, 0x604[r0]

        scsi_delay

        out.h r13, 0x600[r0]
        out.h r13, 0x604[r0]

        scsi_delay

        out.h r13, 0x600[r0]
        out.h r11, 0x604[r0]

        scsi_delay

1:      jal _eris_low_scsi_get_phase
        be 1b

        scsi_delay

        out.h r13, 0x600[r0]
        out.h r0, 0x604[r0]

        scsi_delay

1:      jal _eris_low_scsi_get_phase
        cmp 4, r10 # command
        bne 1b

        mov 3, r11
        mov 2, r10
        out.h r11, 0x600[r0]
        out.h r10, 0x604[r0]

1:      in.h 0x602[r0], r11
        andi 0x20, r11, r11
        be 1b

        br 2f
1:
        mov 1, r10
        mov r0, r11
        cmp r14, r7
        bnh 3f
        ld.b 0[r6], r11
        add 1, r6
3:
        add 1, r14
        movea 0x11, r0, r12
        out.h r0, 0x600[r0]
        out.h r11, 0x604[r0]

        out.h r10, 0x600[r0]
        out.h r10, 0x604[r0]

        out.h r10, 0x600[r0]
        out.h r12, 0x604[r0]

3:      in.h 0x602[r0], r11
        andi 0x20, r11, r11
        bne 3b

        out.h r10, 0x600[r0]
        out.h r10, 0x604[r0]

3:      in.h 0x602[r0], r11
        andi 0x20, r11, r11
        be 3b

2:      jal _eris_low_scsi_get_phase
        cmp 4, r10 # command
        be 1b

        mov 1, r11
        mov 3, r10
        out.h r11, 0x600[r0]
        out.h r0, 0x604[r0]
        out.h r10, 0x600[r0]
        out.h r0, 0x604[r0]

        mov r14, r10
        mov r15, lp
        jmp [lp]
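One reading of the byte loop in that listing (labels 1:/2:/3:), written as C: r6 is the command buffer, r7 its length and r14 a counter. While the drive stays in the Command phase, the code sends the next buffer byte, or 0 once the buffer runs out, and at the end r14 is copied into r10, the return register. If that reading is right, the int that eris_low_scsi_command() returns is the number of command bytes the drive actually took. The hook functions below are hypothetical stand-ins for the in.h/out.h accesses to ports 0x600-0x604, plus a fake drive so the model can be run on a PC.

```c
#include <stdint.h>

typedef struct {
    int  (*in_command_phase)(void *ctx);     /* like get_phase == 4 */
    void (*put_byte)(void *ctx, uint8_t b);  /* one REQ/ACK handshake */
    void *ctx;
} scsi_bus;

/* Send command bytes while the target wants them; return the count. */
int send_command_bytes(const scsi_bus *bus, const uint8_t *cdb, unsigned len)
{
    unsigned sent = 0;
    while (bus->in_command_phase(bus->ctx)) {
        uint8_t b = (sent < len) ? cdb[sent] : 0;  /* pad past the buffer */
        bus->put_byte(bus->ctx, b);
        sent++;
    }
    return (int)sent;
}

/* Host-side fake drive that asks for a fixed number of command bytes. */
typedef struct { unsigned wants, got; uint8_t bytes[16]; } fake_drive;

static int fake_in_command(void *c)
{
    fake_drive *d = c;
    return d->got < d->wants;
}

static void fake_put(void *c, uint8_t b)
{
    fake_drive *d = c;
    d->bytes[d->got++] = b;
}
```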

OldRover

#209
Using eris_low_scsi_status() freezes the system. Every. Single. Time. No matter what's going on... even if nothing's going on.

Sending another SCSI command freezes the system... even if I've also used eris_low_scsi_abort(). eris_low_scsi_abort() itself does not lock up the system, but using eris_low_scsi_command() a second time does.

At this point, my only guess is that something in the SCSI setup is misconfigured.

EDIT: Using eris_low_scsi_reset() allows me to use eris_low_scsi_command() a second time... but this is more than likely not the way it's supposed to work...

elmer

Quote from: OldRover on 03/22/2016, 09:28 PM
Using eris_low_scsi_status() freezes the system. Every. Single. Time. No matter what's going on... even if nothing's going on.
All that I can say is that Alex's SCSI and SCSI_DMA examples seem to work ... well, they do in Mednafen, I've not tried them on my PC-FXGA.

That suggests that he's got the basics right ... but he may not have provided all the functions that you need (i.e. a REQUEST SENSE function).


Quote from: elmer on 03/22/2016, 05:18 PM
On my side of things ... GCC is now correctly passing the varargs to "sprintf" ... where it all just dies.  ](*,)
So, it turns out that I'd fixed the varargs stack allocation, but I still had an error in the code-generation that actually saved the varargs themselves to the stack.  #-o

With that fixed, modifying Alex's "hello" example's printing code to ...

        printstr("Hello World!", 10, 0x20, 1);
        printstr("Love, NEC", 11, 0x38, 0);
        i = sprintf(str, "Eat %X!", 0xdeadbeef);
        printstr(str, ((32 - i) / 2), 0x48, 0);
... gives ...

[Screenshot of the modified "hello" example's output]


That's using "strlen" and "sprintf" ... and "sprintf" requires a "malloc", so we've got functional memory-allocations, too!  :D

It's "Miller Time" ... but with something that actually has some flavor, instead.  :wink:

OldRover

#211
Yes, they work... but unfortunately, they explain nada. He never once uses eris_low_scsi_status() in his examples, although I am seeing that the function is being called in the assembly source in several other functions.

I am not sure what eris_low_scsi_data_in() does. Perhaps this is how to get the return values from things like REQUEST SENSE. Of course, the docs offer absolutely no context... as usual. Oh well... only one way to find out, I guess.

Quote from: elmer on 03/22/2016, 10:04 PM
That's using "strlen" and "sprintf" ... and "sprintf" requires a "malloc", so we've got functional memory-allocations, too!  :D

It's "Miller Time" ... but with something that actually has some flavor, instead.  :wink:
Haha :D well that's awesome... good string functions are always nice to have. Hopefully this doesn't introduce too much overhead. And root beer, please... :lol:

EDIT: Because REQUEST_SENSE requires me to use eris_low_scsi_command(), actually using it, of course, crashes the machine once I've already sent 0x48 (PLAY AUDIO TRACK/INDEX). Something is clearly jamming up SCSI when I tell it to play an audio track.

EDIT2: ...and if I send a REQUEST_SENSE before I send the audio play command, it *also* jams up the system. What. The. Fuck.

EDIT3: I commented out the eris_low_scsi_data_in() line, which was returning 0 anyway, so I am pretty sure that this is *not* how to get return results. The system crashes when I send a second SCSI command. I'm no expert here, but I am positive at this point that this is exactly where the flaw is: something about using eris_low_scsi_command() from within the C code is the culprit. Using eris_low_scsi_reset() does, of course, reset the SCSI subsystem so additional commands can be sent... but I am 99.9999% positive that this is not the way you're supposed to have to do things. The only other thing I can think of is something to do with the SCSI phase... although it appears that eris_low_scsi_command() is already waiting for the correct phase... AAAAAAAAAAAAAA
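Some general SCSI background on the phase question, which may or may not be what is happening here: after the command bytes, a target normally moves through Data In/Out, then Status, then Message In, and only then releases the bus (Bus Free). It stays busy until the initiator has taken the status byte and the message byte, so a new selection cannot succeed before that. A code path that sends a command but never drains those two bytes would plausibly hang on the next command until a bus reset. The sketch below is just that standard phase-to-action table; the names are hypothetical and nothing here is PC-FX-specific.

```c
/* Standard SCSI phases as seen by the initiator (hypothetical names). */
typedef enum { PH_DATA_OUT, PH_DATA_IN, PH_COMMAND, PH_STATUS,
               PH_MSG_OUT, PH_MSG_IN, PH_BUS_FREE } scsi_phase;

typedef enum { ACT_SEND_DATA, ACT_READ_DATA, ACT_SEND_CDB_BYTE,
               ACT_READ_STATUS_BYTE, ACT_SEND_MESSAGE, ACT_READ_MESSAGE,
               ACT_DONE } initiator_action;

/* What the initiator is expected to do while the target is in phase p. */
initiator_action next_action(scsi_phase p)
{
    switch (p) {
    case PH_DATA_OUT: return ACT_SEND_DATA;
    case PH_DATA_IN:  return ACT_READ_DATA;
    case PH_COMMAND:  return ACT_SEND_CDB_BYTE;
    case PH_STATUS:   return ACT_READ_STATUS_BYTE;
    case PH_MSG_OUT:  return ACT_SEND_MESSAGE;
    case PH_MSG_IN:   return ACT_READ_MESSAGE;
    default:          return ACT_DONE;  /* bus free: safe to select again */
    }
}

/* A command's transaction is only over once the bus goes free. */
int transaction_finished(scsi_phase p)
{
    return p == PH_BUS_FREE;
}
```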

elmer

Quote from: OldRover on 03/22/2016, 10:43 PM
Yes, they work... but unfortunately, they explain nada. He never once uses eris_low_scsi_status() in his examples, although I am seeing that the function is being called in the assembly source in several other functions.
I just had a quick look, and the impression that I'm getting is that the "eris_low_*" functions really aren't supposed to be called by themselves from C.

It's hard to tell, because the assembly code is very hard to read: he hasn't bothered to use constants or sensible labels, or to add comments explaining what's going on.

I can see that there's going to have to be some major re-writing going on.

Arkhan Asylum

This reminds me, that I really need to make some sort of effort to setup that PCFXGA crap.

lol.


OldRover

Quote from: elmer on 03/23/2016, 12:09 AM
I can see that there's going to have to be some major re-writing going on.
I'll take the plunge and attempt to learn V810 assembly better then... looks like we've got a massive project on our hands here and it's gonna take some combined brainpower.

elmer

Quote from: guest on 03/23/2016, 05:16 AM
This reminds me, that I really need to make some sort of effort to setup that PCFXGA crap.
Bueller? Bueller? Bueller?


Quote from: OldRover on 03/23/2016, 07:26 AM
I'll take the plunge and attempt to learn V810 assembly better then... looks like we've got a massive project on our hands here and it's gonna take some combined brainpower.
To be fair to Alex ... I suspect that at-least-some of the reason for his sparse assembly-code was just working with the old GAS assembler from binutils-2.10.

The GNU folks put a lot of work into making GAS more "programmer-friendly" over the last 15 years.

If you've programmed any CPU in assembler before, then I think that you'll find it really pleasant, especially if you've tried to read other early-RISC assembly, like MIPS or SH2.

The big thing that initially seems "weird" to folks that are used to 6502/Z80/68000/x86 is that the arithmetic instructions have no memory addressing-modes. Everything is just register-to-register, and you have to load/save registers to memory explicitly.

It makes up for that by having lots of registers, so that you really don't need to load/save "temporary" stuff inside a function very often. And the "big win" is that most instructions run in 1 cycle (effective throughput, thanks to a 5-stage pipeline).

With the CPU running at 21MHz, with mostly-single-cycle instructions, it's one-heck-of-a-lot faster than the 7MHz HuC6280 with its 2-to-7-cycle instructions.

Well ... until you hit a pipeline-stall, anyway. The docs from the Nintendo Seminar that I sent you do a good job of explaining the basic theory, and the (few) pipeline-stall conditions.
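As rough back-of-the-envelope peak numbers, using only the clock and cycle figures above and ignoring pipeline stalls, memory wait states, and the fact that one V810 instruction and one HuC6280 instruction don't do the same amount of work:

$$
\frac{21\,\text{MHz}}{1\ \text{cycle/instr}} = 21\times10^{6}\ \text{instr/s},
\qquad
\frac{7\,\text{MHz}}{2\ \text{to}\ 7\ \text{cycles/instr}} = 3.5\times10^{6}\ \text{to}\ 1\times10^{6}\ \text{instr/s},
\qquad
6 = \frac{21}{3.5} \;\le\; \text{speedup} \;\le\; \frac{21}{1} = 21
$$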

I already posted an example of what my V810 assembly-code looks like, taking advantage of running the source-code through the C-preprocessor to provide "macro" capability to the GNU assembler ...

https://www.pcengine-fx.com/forums/index.php?topic=19619.msg421022#msg421022

elmer

FWIW, I'd forgotten what it was like to program in C on some of these old architectures.

GCC on the PC-FX really, really, really doesn't like working with 8-bit or 16-bit local/global/struct variables.

You're much, much, much better off declaring everything as an "int" or "unsigned", or "int32_t" and "uint32_t" if you care about the sizes.

This is one of those cases where "portable" C code ... isn't. It'll cripple your performance.  #-o

OldRover

In cycle-critical applications, such as a really busy action game, a good coder knows that to get top performance, you use the most CPU-efficient variable type and you give not one shit about how much space it takes up in memory. If using ints is the fastest, then you use ints... bottom line. :)

elmer

#218
I took a quick look at the VirtualBoy's "libgccvb" source code, and was surprised to see so many uses of "u8" and "u16" in the code.

The V810 CPU was designed to handle 32-bit variables ... and it doesn't do any arithmetic operations on 16-bit or 8-bit values.
That means that the compiler needs to do a lot of masking/sign-extending when it's asked to deal with 16-bit or 8-bit variables, just so that it keeps the results correct within the limits of 16-bit or 8-bit rounding.

You really should be using "int" and "unsigned" as much as possible, and avoid "short" and "char" variables.
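A minimal illustration of where that masking comes from: C requires a uint16_t result to wrap at 65536, so on a CPU whose ALU only does 32-bit arithmetic the compiler has to truncate the register again after the add. In the listings below that shows up as "andi 65535,rX,rX"; the exact instructions any particular GCC version emits will vary.

```c
#include <stdint.h>

/* 16-bit counter: the result must wrap at 65536, so after a 32-bit add
 * the compiler needs an extra step to cut the value back to 16 bits. */
uint16_t next_u16(uint16_t x)
{
    return (uint16_t)(x + 1);
}

/* Full-width counter: a plain 32-bit add, nothing to re-truncate. */
unsigned next_unsigned(unsigned x)
{
    return x + 1;
}
```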

I thought that it would be interesting to see how the different GCC compiler versions compile a couple of simple C functions.

In each case, the original libgccvb version is first, and then 1 or 2 versions replacing the "u16" and "u8" variables with "unsigned" instead.

It seems strange to me that GCC 4.4.2 is doing such a relatively-poor job compared to GCC 2.9.5 or GCC 4.7.4, I wonder what went wrong?

All examples are compiled with "-O2 -fomit-frame-pointer".


****************************************************************************************
****************************************************************************************

void copymem (u8* dest, const u8* src, u16 num)
{
  u16 i;
  for (i = 0; i < num; i++) {
    *dest++ = *src++;
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_copymem: andi 65535,r8,r8    _copymem: andi 65535,r8,r8    _copymem: andi 65535,r8,r8
          be .L1                        mov 0,r10                     be .L4
          addi -1,r8,r11                cmp r8,r10                    mov 0,r10
          andi 65535,r11,r11            bnl .L4             .L3:      mov r7,r11
          add 1,r11           .L6:      add 1,r10                     add r10,r11
          add r6,r11                    ld.b 0[r7],r11                ld.b 0[r11],r12
.L3:      ld.b 0[r7],r10                andi 65535,r10,r10            mov r6,r11
          add 1,r7                      add 1,r7                      add r10,r11
          st.b r10,0[r6]                st.b r11,0[r6]                add 1,r10
          add 1,r6                      add 1,r6                      st.b r12,0[r11]
          cmp r11,r6                    cmp r8,r10                    andi 65535,r10,r11
          bne .L3                       bl .L6                        cmp r11,r8
.L1:      jmp [r31]           .L4:      jmp [r31]                     bh .L3
                                                            .L4:      jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void copymem2 (u8* dest, const u8* src, unsigned num)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = *src++;
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_copymem2:mov r6,r11          _copymem2:mov 0,r11           _copymem2:cmp r0,r8
          add r8,r11                    cmp r8,r11                    be .L10
          cmp 0,r8                      bnl .L10                      mov 0,r10
          be .L7              .L12:     ld.b 0[r7],r10      .L9:      mov r7,r11
.L11:     ld.b 0[r7],r10                add 1,r11                     add r10,r11
          add 1,r7                      add 1,r7                      ld.b 0[r11],r12
          st.b r10,0[r6]                st.b r10,0[r6]                mov r6,r11
          add 1,r6                      add 1,r6                      add r10,r11
          cmp r11,r6                    cmp r8,r11                    st.b r12,0[r11]
          bne .L11                      bl .L12                       add 1,r10
.L7:      jmp [r31]           .L10:     jmp [r31]                     cmp r10,r8
                                                                      bh .L9
                                                            .L10:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem (u8* dest, const u8* src, u16 num, u8 offset)
{
  u16 i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem:  andi 65535,r8,r8    _addmem:  andi 65535,r8,r8    _addmem:  andi 65535,r8,r8
          andi 255,r9,r9                mov 0,r11                     andi 255,r9,r9
          cmp 0,r8                      andi 255,r9,r9                cmp r0,r8
          be .L13                       cmp r8,r11                    be .L20
          addi -1,r8,r11                bnl .L22                      mov 0,r10
          andi 65535,r11,r11  .L24:     mov r9,r10          .L19:     mov r7,r11
          add 1,r11                     add 1,r11                     add r10,r11
          add r6,r11                    ld.b 0[r7],r12                ld.b 0[r11],r12
.L15:     ld.b 0[r7],r10                andi 65535,r11,r11            mov r6,r11
          add 1,r7                      add r12,r10                   add r10,r11
          add r9,r10                    add 1,r7                      add r9,r12
          st.b r10,0[r6]                st.b r10,0[r6]                add 1,r10
          add 1,r6                      add 1,r6                      st.b r12,0[r11]
          cmp r11,r6                    cmp r8,r11                    andi 65535,r10,r11
          bne .L15                      bl .L24                       cmp r11,r8
.L13:     jmp [r31]           .L22:     jmp [r31]                     bh .L19
                                                            .L20:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem2 (u8* dest, const u8* src, unsigned num, u8 offset)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem2: mov r6,r11          _addmem2: mov 0,r12           _addmem2: andi 255,r9,r9
          andi 255,r9,r9                andi 255,r9,r9                cmp r0,r8
          add r8,r11                    cmp r8,r12                    be .L20
          cmp 0,r8                      bnl .L22                      mov 0,r10
          be .L18             .L24:     mov r9,r10          .L19:     mov r7,r11
.L22:     ld.b 0[r7],r10                ld.b 0[r7],r11                add r10,r11
          add 1,r7                      add 1,r12                     ld.b 0[r11],r12
          add r9,r10                    add r11,r10                   mov r6,r11
          st.b r10,0[r6]                add 1,r7                      add r10,r11
          add 1,r6                      st.b r10,0[r6]                add r9,r12
          cmp r11,r6                    add 1,r6                      st.b r12,0[r11]
          bne .L22                      cmp r8,r12                    add 1,r10
.L18:     jmp [r31]                     bl .L24                       cmp r10,r8
                              .L22:     jmp [r31]                     bh .L19
                                                            .L20:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem3 (u8* dest, const u8* src, unsigned num, unsigned offset)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem3: cmp 0,r8            _addmem3: mov 0,r12           _addmem3: cmp r0,r8
          be .L24                       cmp r8,r12                    be .L25
          andi 255,r9,r9                bnl .L28                      andi 255,r9,r9
          add r6,r8           .L30:     mov r9,r10                    mov 0,r10
.L26:     ld.b 0[r7],r10                ld.b 0[r7],r11      .L24:     mov r7,r11
          add 1,r7                      add 1,r12                     add r10,r11
          add r9,r10                    add r11,r10                   ld.b 0[r11],r12
          st.b r10,0[r6]                add 1,r7                      mov r6,r11
          add 1,r6                      st.b r10,0[r6]                add r10,r11
          cmp r8,r6                     add 1,r6                      add r9,r12
          bne .L26                      cmp r8,r12                    st.b r12,0[r11]
.L24:     jmp [r31]                     bl .L30                       add 1,r10
                              .L28:     jmp [r31]                     cmp r10,r8
                                                                      bh .L24
                                                            .L25:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************
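The GCC 4.7.4 column for addmem3 shows the loop shape to aim for: a full-width count turned into a precomputed end pointer, so the inner loop only has to compare the destination pointer. Here is a sketch that writes that shape directly in C; whether it coaxes GCC 2.9.5 or 4.4.2 into the same tight loop is untested.

```c
#include <stdint.h>

/* Add "offset" to each source byte, mirroring GCC 4.7.4's output:
 * compute the end of dest once, then loop until dest reaches it. */
void addmem_fast(uint8_t *dest, const uint8_t *src, unsigned num, unsigned offset)
{
    uint8_t *end = dest + num;
    while (dest != end)
        *dest++ = (uint8_t)(*src++ + offset);
}
```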

elmer

I think that I have figured out how to let GCC know that the "ld" instructions sign-extend the values they load into a full int.

Here are a couple of examples of how it affects the code, using newlib's "strlen" function and then some variations on it.

The variations show how the generated code changes when things get a little more complex: "strlen" is modified to change the comparison, so that the compiler can't just short-cut the check for zero.

The thing to pay particular attention to is the number of instructions in the inner loop.

It shows, again, that if you choose to use C on a processor like the V810, then there are definitely tricks to know that will improve the code-generation.

****************************************************************************************
****************************************************************************************

ORIGINAL FUNCTION FROM NEWLIB 2.2.0

size_t strlen (const char *str)
{
  const char *start = str;
  while (*str)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen:  ld.b 0[r6],r10      _strlen:  ld.b 0[r6],r10      _strlen:  ld.b 0[r6],r10
          cmp 0,r10                     mov r6,r11                    shl 24,r10
          be .L42                       cmp r0,r10                    sar 24,r10
          mov r6,r10                    be .L46                       be .L39
.L41:     add 1,r10           .L47:     add 1,r6                      mov r6,r10
          ld.b 0[r10],r11               ld.b 0[r6],r10      .L40:     add 1,r10
          cmp 0,r11                     cmp r0,r10                    ld.b 0[r10],r11
          bne .L41                      bne .L47                      shl 24,r11
          sub r6,r10          .L46:     mov r6,r10                    bne .L40
          jmp [r31]                     sub r11,r10                   sub r6,r10
.L42:     mov 0,r10                     jmp [r31]           .L39:     jmp [r31]
          jmp [r31]


****************************************************************************************
****************************************************************************************

MARK THE END-OF-STRING WITH A NON-ZERO CONSTANT

size_t strlen2 (const char *str)
{
  const char *start = str;
  while (*str != 1)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen2: ld.b 0[r6],r10      _strlen2: ld.b 0[r6],r10      _strlen2: ld.b 0[r6],r11
          cmp 1,r10                     mov r6,r11                    shl 24,r11
          be .L47                       cmp 1,r10                     sar 24,r11
          mov r6,r10                    be .L51                       cmp 1,r11
.L46:     add 1,r10           .L52:     add 1,r6                      be .L49
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp 1,r11                     cmp 1,r10           .L46:     add 1,r10
          bne .L46                      bne .L52                      ld.b 0[r10],r11
          sub r6,r10          .L51:     mov r6,r10                    shl 24,r11
          jmp [r31]                     sub r11,r10                   sar 24,r11
.L47:     mov 0,r10                     jmp [r31]                     cmp 1,r11
          jmp [r31]                                                   bne .L46
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L49:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

PASS THE END-OF-STRING MARKER IN AS A "char" PARAMETER

int strlen3 (const char *str, char eos)
{
  const char *start = str;
  while (*str != eos)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen3: shl 24,r7           _strlen3: shl 24,r7           _strlen3: ld.b 0[r6],r10
          sar 24,r7                     sar 24,r7                     shl 24,r7
          ld.b 0[r6],r10                ld.b 0[r6],r10                mov r7,r12
          cmp r7,r10                    mov r6,r11                    shl 24,r10
          be .L52                       cmp r7,r10                    sar 24,r12
          mov r6,r10                    be .L56                       cmp r7,r10
.L51:     add 1,r10           .L57:     add 1,r6                      be .L56
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp r7,r11                    cmp r7,r10          .L53:     add 1,r10
          bne .L51                      bne .L57                      ld.b 0[r10],r11
          sub r6,r10          .L56:     mov r6,r10                    shl 24,r11
          jmp [r31]                     sub r11,r10                   sar 24,r11
.L52:     mov 0,r10                     jmp [r31]                     cmp r12,r11
          jmp [r31]                                                   bne .L53
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L56:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

PASS THE END-OF-STRING MARKER IN AS AN "int" PARAMETER

int strlen4 (const char *str, int eos)
{
  const char *start = str;
  while (*str != eos)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen4: ld.b 0[r6],r10      _strlen4: ld.b 0[r6],r10      _strlen4: ld.b 0[r6],r10
          cmp r7,r10                    mov r6,r12                    shl 24,r10
          be .L57                       cmp r7,r10                    sar 24,r10
          mov r6,r10                    be .L61                       cmp r7,r10
.L56:     add 1,r10           .L62:     add 1,r6                      be .L63
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp r7,r11                    mov r10,r11         .L60:     add 1,r10
          bne .L56                      cmp r7,r11                    ld.b 0[r10],r11
          sub r6,r10                    bne .L62                      shl 24,r11
          jmp [r31]           .L61:     mov r6,r10                    sar 24,r11
.L57:     mov 0,r10                     sub r12,r10                   cmp r7,r11
          jmp [r31]                     jmp [r31]                     bne .L60
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L63:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

elmer

#220
Just a quick (technical) update on the PC-FX toolchain ...


The Good:

A new stack-frame layout is implemented, and R2 is now the permanent-frame-pointer instead of the compiler just using R29 whenever a frame-pointer is needed.

*****************************

GCC 1999-ABI V850 STACK FRAME (old PC-FX GCC 2.9.5 compiler)

CALLER
          incoming-arg4
ap->      16-bytes-reserved

CALLEE
          saved-lp
          saved-??
fp->      saved-fp
          local-variables
          outgoing-arg?
          outgoing-arg4
sp->      16-bytes-reserved

*****************************

GCC 2016-ABI V810 STACK FRAME (new PC-FX GCC 4.7.4 compiler)

CALLER
fp-> ap-> incoming-arg4

CALLEE
          saved-fp
          saved-lp
          saved-??
          local-variables
          outgoing-arg?
sp->      outgoing-arg4

*****************************


"-mprolog-function" is working, but I've stopped it from being automatically-enabled whenever any optimization is requested.

The new stack frame layout reduces the code-size of the prolog functions so that there's a good chance that they'll stay in the V810's instruction cache more often. Note: the new prolog functions always save the FP and the LP when they're used.

A stack backtrace is now possible when either "-fno-omit-frame-pointer" or "-mprolog-function" is used.
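To make the backtrace point concrete, here's a minimal host-side C sketch of an FP-chain walk. It assumes, based on elmer's my_irq2 listing further down the thread (`st.w fp,-4[r1]`, `st.w lp,-8[r1]`, `mov r1,fp`), that every frame keeps the caller's FP at -4[fp] and the return address at -8[fp], and that the outermost frame's saved FP is 0. `mem_model_t` and `backtrace_fp()` are hypothetical names, and PC-FX RAM is simulated with an array rather than read directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of PC-FX RAM: 32-bit words indexed by byte address / 4.
 * Only used to illustrate the frame walk on a host PC. */
typedef struct {
    const uint32_t *words;
    size_t          n_words;
} mem_model_t;

static uint32_t load_word(const mem_model_t *m, uint32_t addr)
{
    return (addr / 4 < m->n_words) ? m->words[addr / 4] : 0;
}

/* Walk the FP chain, ASSUMING each frame stores the caller's FP at -4[fp]
 * and the saved LP (return address) at -8[fp], with a saved FP of 0 in the
 * outermost frame. Returns how many return addresses were written to out. */
size_t backtrace_fp(const mem_model_t *m, uint32_t fp,
                    uint32_t *out, size_t max)
{
    size_t n = 0;
    while (fp >= 8 && n < max) {
        out[n++] = load_word(m, fp - 8);  /* saved LP = return address */
        fp       = load_word(m, fp - 4);  /* saved FP = caller's frame */
    }
    return n;
}
```

The `max` limit also stops the walk if a corrupted stack makes the chain loop.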

Any C "leaf" functions (i.e. functions that don't call other functions) will omit the prolog function if they don't destroy any callee-saved register, and so small-fast-utility-code will still run as-fast-as-possible.

The NEC-standard register conventions are still the same, except for R2 now being the FP.

Any assembly-language code that reads arguments off the stack will need to subtract 16 from its offsets.
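As a worked example of that adjustment, reading the two diagrams above: in the old layout the caller left 16 reserved bytes below the stack arguments, while in the new one incoming-arg4 sits right where the caller's SP was. So an argument that an old routine loaded from 16[sp] on entry would now be at 0[sp], and 20[sp] becomes 4[sp]. The helper name below is hypothetical; it just encodes that 16-byte shift.

```c
/* Hypothetical helper: convert a stack-argument offset written for the old
 * GCC 2.9.5 (1999-ABI) frame into the new GCC 4.7.4 (2016-ABI) frame.
 * The old caller reserved 16 bytes below the outgoing arguments and the
 * new one doesn't, so every stack-argument offset shrinks by 16. */
enum { OLD_ABI_RESERVED_BYTES = 16 };

static inline int new_abi_arg_offset(int old_offset)
{
    return old_offset - OLD_ABI_RESERVED_BYTES;
}
```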


The Bad:

Any C "interrupt-handler" functions are probably broken at the moment, until I get around to fixing them.

Does anyone actually write interrupt-handlers in C???

The compiler generates some pretty slow register-saving code for them, so I sort-of assume that folks just write them in assembly. Am I wrong?


The Future: (long term - i.e. not until Xanadu is finished)

I'd like to add a few compiler intrinsics for some of the V810 opcodes, particularly the string opcodes and the in/out opcodes. That would allow the compiler to easily in-line some stuff that people have to drop into assembly to do.

It would also be worth contemplating changing the standard register usage so that R26-R29 are not callee-saved registers, so that the compiler doesn't have to save them on the stack whenever someone wants to use a string opcode. But doing so would break all current assembly-language code, and I suspect that people wouldn't want that. "Yes", the change in stack-offset in the new ABI also breaks things ... but that's an easy thing to find/fix. Changing ALL the registers would be a much more complicated thing to fix.

elmer

I fixed the "interrupt_handler" to where it's working again, although I'm not using the helper-functions anymore, because I really can't see the point.

I could make the code a tiny bit smarter ... but IMHO it's already a little bit better than GCC's V850 code, so any further work on it can wait.

Now ... this is it for me for a while on the PC-FX, or else I'll be in trouble!  :wink:

************************************

volatile int __attribute__ ((zda)) zda_frame_count = 0;

__attribute__ ((interrupt_handler)) void my_irq1 (void)
{
  for (int i = 0; i < 100; i++)
    zda_frame_count++;
}

_my_irq1: add -4,sp
          st.w r1,0[sp]
          add -8,sp
          st.w r10,0[sp]
          movea 100,r0,r10
          st.w r11,4[sp]
.L7:      ld.w zdaoff(_zda_frame_count)[r0],r11
          add -1,r10
          add 1,r11
          st.w r11,zdaoff(_zda_frame_count)[r0]
          cmp 0,r10
          bne .L7
          ld.w 0[sp],r10
          ld.w 4[sp],r11
          add 8,sp
          ld.w 0[sp],r1
          add 4,sp
          reti

************************************

volatile int sda_frame_count = 0;

__attribute__ ((noinline)) void increment_sda_frame_count (void)
{
  sda_frame_count++;
}

__attribute__ ((interrupt_handler)) void my_irq2 (void)
{
  for (int i = 0; i < 100; i++)
    increment_sda_frame_count();
}

_increment_sda_frame_count:
          ld.w sdaoff(_sda_frame_count)[gp],r10
          add 1,r10
          st.w r10,sdaoff(_sda_frame_count)[gp]
          jmp [r31]

_my_irq2: add -4,sp
          st.w r1,0[sp]
          mov sp,r1
          addi -72,sp,sp
          st.w r29,-12[r1]
          st.w fp,-4[r1]
          movea 100,r0,r29
          mov r1,fp
          st.w r6,-72[r1]
          st.w r7,-68[r1]
          st.w r8,-64[r1]
          st.w r9,-60[r1]
          st.w r10,-56[r1]
          st.w r11,-52[r1]
          st.w r12,-48[r1]
          st.w r13,-44[r1]
          st.w r14,-40[r1]
          st.w r15,-36[r1]
          st.w r16,-32[r1]
          st.w r17,-28[r1]
          st.w r18,-24[r1]
          st.w r19,-20[r1]
          st.w r30,-16[r1]
          st.w lp,-8[r1]
.L3:      add -1,r29
          jal _increment_sda_frame_count
          cmp 0,r29
          bne .L3
          ld.w -4[fp],r1
          ld.w -72[fp],r6
          ld.w -68[fp],r7
          ld.w -64[fp],r8
          ld.w -60[fp],r9
          ld.w -56[fp],r10
          ld.w -52[fp],r11
          ld.w -48[fp],r12
          ld.w -44[fp],r13
          ld.w -40[fp],r14
          ld.w -36[fp],r15
          ld.w -32[fp],r16
          ld.w -28[fp],r17
          ld.w -24[fp],r18
          ld.w -20[fp],r19
          ld.w -16[fp],r30
          ld.w -12[fp],r29
          ld.w -8[fp],lp
          mov fp,sp
          mov r1,fp
          ld.w 0[sp],r1
          add 4,sp
          reti

************************************

NightWolve

#222
Quote from: OldRover on 03/22/2016, 04:38 PM
To continue on the ADPCM kick, I figure I should add in some more details.
...
So... getting the ADPCM data into your source code... well, this is where having some coding knowledge worked out well for me. I coded a simple utility called any2arr which takes any file you give it and creates a header with the data of that file as a u16 array.

frozenutopia.com/pcfx/any2arr.7z

Making ADPCM files is also easy... just snag a copy of sox. To make the samples for Asteroid Challenge FX, I used sox like so:

sox -r 16000 boom.wav boom.vox
The -r 16000 tells it to use 16kHz, the .wav is obviously my source audio, and the .vox is the output file. sox knows to make an ADPCM file based on the .vox extension. So, just convert your file to a .vox with sox, run the .vox file through any2arr, and you've got your ADPCM data, ready to include into your program.

EDIT: Forgot to mention... since we're using words here, take the filesize of your .vox file and divide it in half to get the length. Take that divided value and add it to your starting address to get the ending address that you need. I think I did mention this briefly in the first post about this but it bears mentioning again, because reasons.
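The word-size arithmetic in that EDIT is easy to get wrong, so here's a small C sketch. `pack_vox_words()` is a hypothetical stand-in for what a tool like any2arr does, and it ASSUMES little-endian packing (first byte in the low half of each u16); check any2arr's real output before relying on that. `kram_end_address()` follows the "start address + half the file size" rule, rounding up for an odd-sized file so the padded last word is covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack raw .vox bytes into 16-bit words. ASSUMPTION: little-endian, first
 * byte in the low half (the V810's byte order). An odd trailing byte is
 * padded with 0. Returns the number of words written. */
size_t pack_vox_words(const uint8_t *bytes, size_t n_bytes, uint16_t *words)
{
    size_t n_words = (n_bytes + 1) / 2;
    for (size_t i = 0; i < n_words; i++) {
        uint16_t lo = bytes[2 * i];
        uint16_t hi = (2 * i + 1 < n_bytes) ? bytes[2 * i + 1] : 0;
        words[i] = (uint16_t)(lo | (hi << 8));
    }
    return n_words;
}

/* KRAM addresses count 16-bit words, so the end address is the start
 * address plus half the .vox file size (rounded up for odd sizes). */
uint32_t kram_end_address(uint32_t start_word_addr, size_t vox_file_bytes)
{
    return start_word_addr + (uint32_t)((vox_file_bytes + 1) / 2);
}
```

For example, an 8000-byte .vox file loaded at 0x2000 ends at 0x2FA0.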
I gained some experience when I switched to SOX (http://sox.sourceforge.net/) for the Ys IV dub work, so wanted to contribute some more to your tangent should it be useful.

We found that extracting ADPCM and converting to wave sometimes introduces a crazy DC offset to the wave which looks like this when you open it with Audacity:

IMG

It *should* instead look like this, properly centered:

IMG

To fix it, you could 1) open a wave every time in Audacity, select it all (CTRL+A) and use the Normalize effect's "Remove DC offset" without the Amplitude gain option, or 2) add a switch with SOX to handle it/prevent its introduction right on the spot.
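For the curious, option 1 boils down to something like the sketch below: measure the average (DC) level and subtract it from every sample. This is a simplified illustration of the idea, not Audacity's actual code, and it is not what SoX's "highpass 10" does either (that is a filter, which also follows slow drift rather than removing one constant). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Remove a constant DC offset from 16-bit PCM by subtracting the mean. */
void remove_dc_offset(int16_t *samples, size_t n)
{
    if (n == 0)
        return;

    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    int32_t mean = (int32_t)(sum / (int64_t)n);

    for (size_t i = 0; i < n; i++) {
        int32_t v = samples[i] - mean;
        /* clamp in case removing the offset pushes a sample out of range */
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        samples[i] = (int16_t)v;
    }
}
```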

The Ys IV Dub kit which I made available way back demos SOX usage in the universal batch files.

ysutopia.net/downloads/ys4/YS4_DUB_KITv2.zip

Of all the batch files, the YS4_CONVERT_VOX_TO_WAVE.bat has the proper command line to eliminate the DC offset should you notice its appearance in the tracks of whatever game you're extracting.

@ECHO This Ys IV batch file needs 2000/XP/Vista/7++ to work.
@ECHO Won't work on old non-NT platforms: Win98/ME, etc.

FOR %%I IN (*.vox) DO sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav"

REM Append below to the above commandline to eliminate dcshift
REM highpass 10
REM E.g. : sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav" highpass 10

It's not used by default, and I recommend only using that highpass switch after you've checked the waves in Audacity and seen a crazy DC shift like the one shown. If you do, the trick is to simply append "highpass 10" to the command:

FOR %%I IN (*.vox) DO sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav" highpass 10
The above assumes you neatly extracted all the Japanese ADPCM clips to .VOX files.

For completion, a universal command to convert new English wave files to VOX would look like this (the counterpart batch command):

@ECHO This Ys IV batch file needs 2000/XP/Vista/7++ to work.
@ECHO Won't work on old non-NT platforms: Win98/ME, etc.

FOR %%I IN (*.wav) DO sox.exe -G "%%I" -r 16000 -e oki-adpcm "%%~nI.vox"

Well, that's what I wanted to share.



I also wanted to look into your problem with SCSI issues, but that might get me more involved than I want. I could help given my experience building up TurboRip, but the lack of experience in low level assembly for consoles is the issue.

I'll try though. You were asking about a struct member where status info from the CD drive is returned to determine success/failure. That's the SENSE_DATA structure you want to look at.

struct SRB_ExecSCSICmd                   // Offset
{                                        // HX/DEC
    BYTE        SRB_Cmd;                 // 00/000 ASPI command code = SC_EXEC_SCSI_CMD
    BYTE        SRB_Status;              // 01/001 ASPI command status byte
    BYTE        SRB_HaId;                // 02/002 ASPI host adapter number
    BYTE        SRB_Flags;               // 03/003 ASPI request flags
    DWORD       SRB_Hdr_Rsvd;            // 04/004 Reserved
    BYTE        SRB_Target;              // 08/008 Target's SCSI ID
    BYTE        SRB_Lun;                 // 09/009 Target's LUN number
    WORD        SRB_Rsvd1;               // 0A/010 Reserved for Alignment
    DWORD       SRB_BufLen;              // 0C/012 Data Allocation Length
    LPBYTE      SRB_BufPointer;          // 10/016 Data Buffer Pointer
    BYTE        SRB_SenseLen;            // 14/020 Sense Allocation Length
    BYTE        SRB_CDBLen;              // 15/021 CDB Length
    BYTE        SRB_HaStat;              // 16/022 Host Adapter Status
    BYTE        SRB_TargStat;            // 17/023 Target Status
    VOID        FAR *SRB_PostProc;       // 18/024 Post routine
    BYTE        SRB_Rsvd2[20];           // 1C/028 Reserved, MUST = 0
    BYTE        CDBByte[16];             // 30/048 SCSI CDB
    SENSE_DATA_FMT  SenseArea;           // 40/064 Request Sense buffer
};

I don't know how it's defined in your PCFX situation, can't help you there, but it's normally defined as "SenseArea" after the CDB and here are its members:

typedef struct _SENSE_DATA_FMT {

    BYTE    ErrorCode;          // Error Code (70H or 71H)
    BYTE    SegmentNum;         // Number of current segment descriptor
    BYTE    SenseKey;           // Sense Key(See bit definitions too)
    BYTE    InfoByte0;          // Information MSB
    BYTE    InfoByte1;          // Information MID
    BYTE    InfoByte2;          // Information MID
    BYTE    InfoByte3;          // Information LSB
    BYTE    AddSenLen;          // Additional Sense Length
    BYTE    ComSpecInf0;        // Command Specific Information MSB
    BYTE    ComSpecInf1;        // Command Specific Information MID
    BYTE    ComSpecInf2;        // Command Specific Information MID
    BYTE    ComSpecInf3;        // Command Specific Information LSB
    BYTE    AddSenseCode;       // Additional Sense Code
    BYTE    AddSenQual;         // Additional Sense Code Qualifier
    BYTE    FieldRepUCode;      // Field Replaceable Unit Code
    BYTE    SenKeySpec15;       // Sense Key Specific 15th byte
    BYTE    SenKeySpec16;       // Sense Key Specific 16th byte
    BYTE    SenKeySpec17;       // Sense Key Specific 17th byte
    BYTE    AddSenseBytes;      // Additional Sense Bytes
    BYTE    PaddByte;           // Make it an even DWORD-padded 20-byte structure

} SENSE_DATA_FMT;
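Since the byte offsets are what matter when you can't use these headers (as on the PC-FX), here's a C sketch that pulls the useful fields out of a raw fixed-format sense buffer by position, matching the SENSE_DATA_FMT layout above (bit 7 of byte 0 is the "valid" flag, so it's masked off). How you actually get those bytes back from the PC-FX drive is still the open question in this thread; `decode_fixed_sense()` and `sense_info_t` are hypothetical names.

```c
#include <stdint.h>

/* Fields of interest from SCSI fixed-format sense data. */
typedef struct {
    uint8_t response_code;  /* byte 0, bits 0-6: 0x70 current, 0x71 deferred */
    uint8_t sense_key;      /* byte 2, bits 0-3 */
    uint8_t asc;            /* byte 12: Additional Sense Code */
    uint8_t ascq;           /* byte 13: Additional Sense Code Qualifier */
} sense_info_t;

/* Returns 0 on success, -1 if the buffer is too short or isn't
 * fixed-format sense data. */
int decode_fixed_sense(const uint8_t *buf, unsigned len, sense_info_t *out)
{
    if (len < 14)
        return -1;
    out->response_code = buf[0] & 0x7F;
    if (out->response_code != 0x70 && out->response_code != 0x71)
        return -1;
    out->sense_key = buf[2] & 0x0F;
    out->asc       = buf[12];
    out->ascq      = buf[13];
    return 0;
}
```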

For getting errors out of this thing, well, my post would get much longer... I'll wait for more input if you really need help and haven't made further progress on this since your last post. I built a function to convert all SCSI error/status codes to readable strings from the MMC/SCSI-3 docs, and I don't want to paste that in here, but perhaps that's something you'd want/could be used.

elmer

#223
Quote from: NightWolve on 04/11/2016, 11:31 PM
I gained some experience when I switched to SOX (sox.sourceforge.net/) for the Ys IV dub work, so wanted to contribute some more to your tangent should it be useful.

We found that extracting ADPCM and converting to wave sometimes introduces a crazy DC offset to the wave which looks like this when you open it with Audacity:

IMG
Thanks!  :D

As pointed-out in the other thread, this is exactly the problem that I'm having with the Xanadu tracks.

My creaky old brain has finally figured out what's going on, and I don't believe that we should have the same problem on the PC-FX.

That's because the PC-FX is using a later generation of ADPCM chip that supports sample clipping/saturation.

The very-old OKI MSM5205 that the PCE uses didn't support that, and its math ends up wrapping around and causing nasty audio glitches ... there are warnings about it in the OKI manual where they recommend that you only use 80% of the dynamic range in order to avoid problems (i.e. +/-29191 instead of +/-32767).
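To illustrate the difference described here, a sketch of the two overflow behaviors on a 12-bit signed accumulator (the 12-bit width and these function names are illustrative assumptions, not taken from the OKI manual). Wrapping turns a small overshoot into a big jump to the opposite extreme, which you hear as a click or glitch, while saturation just flattens the peak.

```c
#define ACC_MIN (-2048)
#define ACC_MAX 2047

/* Wrap-around: keep only the low 12 bits, as a chip without clipping would.
 * A small overshoot past +2047 flips to a large negative value. */
int acc_add_wrap(int acc, int delta)
{
    int v = (acc + delta) & 0xFFF;           /* low 12 bits */
    return (v & 0x800) ? v - 0x1000 : v;     /* sign-extend */
}

/* Saturation: clamp to the range, as chips with clipping do. */
int acc_add_saturate(int acc, int delta)
{
    int v = acc + delta;
    if (v > ACC_MAX) v = ACC_MAX;
    if (v < ACC_MIN) v = ACC_MIN;
    return v;
}
```

For example, adding 16 to 2040 gives -2040 with wrapping but 2047 with saturation.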

AFAIK, that's almost-certainly the problem that you're hearing when you're using Dave's tools to convert your stuff to the PCE.

I expect that his stuff is working 100% correctly.

But SOX is written for newer OKI ADPCM chips which do support sample clipping/saturation (like the PC-FX), and so it gets the decoding/encoding math wrong when you're trying to convert sounds for the OKI MSM5205 in the PCE ... and those errors would look exactly like what we're seeing.

I've just checked the SOX source code ... it definitely shouldn't be used to encode/decode a .VOX file for the PCE.


Quote
I also wanted to look into your problem with SCSI issues, but that might get me more involved than I want. I could help given my experience building up TurboRip, but the lack of experience in low level assembly for consoles is the issue.
You're absolutely right about the SCSI SENSE command ... the problem that we're having on the PC-FX is that the SCSI interface is extremely low-level ... it actually looks a lot like Hudson's "fast" CD routines that I've been disassembling and trying to understand.

We're not getting any data back from the SENSE command at all ... which I believe is a liberis problem that's just because Alex never handled anything other than DATA read commands.

It's just another thing to add to the huge list of things to fix.

Mednafen

PC-FX uses a slight modification of the OKI ADPCM algorithm, so if you were to encode audio as OKI ADPCM, it'll sound noisy and have weird clipping when played back on the PC-FX.  And IIRC, the ADPCM encoder in MPCONV2 is buggy and uses a slightly different encoding algorithm(including a typo'd LUT value) than the PC-FX, so even it will tend to produce noisy ADPCM, particularly on source audio that uses full dynamic range.

Mednafen

Quick and dirty code that proooobably(:p) works right(just pack nybbles low to high, little-endian):

#include <stdint.h>
#include <stdlib.h>

static const int step_sizes[49] =
{
 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50,
 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157,
 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449,
 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
};

static const int step_index_deltas[16] =
{
 -1, -1, -1, -1, 2, 4, 6, 8,
 -1, -1, -1, -1, 2, 4, 6, 8
};

typedef struct
{
 int predictor;
 int step_size_index;
 int delta_mask;

 long long error_sum;
} encoder_ctx_t;

void encoder_reset(encoder_ctx_t* ctx)
{
 ctx->predictor = 0;
 ctx->step_size_index = 0;
}

void encoder_init(encoder_ctx_t* ctx, int linear_ip, unsigned char raw_rate)
{
 if(linear_ip)
  ctx->delta_mask = ~((1 << raw_rate) - 1);
 else
  ctx->delta_mask = ~0;

 ctx->error_sum = 0;

 encoder_reset(ctx);
}

unsigned char encoder_encode(encoder_ctx_t* ctx, int16_t samp)
{
 const int32_t delta = (samp >> 1) - ctx->predictor;
 const int32_t abs_delta = abs(delta);
 const int32_t cur_ss = step_sizes[ctx->step_size_index];
 int32_t m;
 uint8_t nyb;

 m = (abs_delta / cur_ss) - 1;

 if(m < 0)
  m = 0;
 if(m > 7)
  m = 7;

 nyb = m | ((delta < 0) ? 8 : 0);

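 // Mirror the decoder: add the (optionally LERP-masked) step delta with the
 // sign from bit 3, then clamp the predictor to the 15-bit range.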
 ctx->predictor += ((step_sizes[ctx->step_size_index] * ((nyb & 7) + 1)) & ctx->delta_mask) * ((nyb & 8) ? -1 : 1);
 if(ctx->predictor < -16384)
  ctx->predictor = -16384;

 if(ctx->predictor > 16383)
  ctx->predictor = 16383;

 ctx->step_size_index += step_index_deltas[nyb];
 if(ctx->step_size_index < 0)
  ctx->step_size_index = 0;

 if(ctx->step_size_index > 48)
  ctx->step_size_index = 48;

 ctx->error_sum += abs(samp - (int32_t)((uint32_t)ctx->predictor << 1));

#if 0
 {
  int16_t tmp = ctx->predictor << 1;
  fwrite(&tmp, 2, 1, stderr);
 }
#endif

 return nyb;
}
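A matching decoder is handy for checking what the encoder produces. The sketch below simply mirrors the predictor and step-index update in `encoder_encode()` above, so it reproduces the encoder's own predictor; it is not taken from Mednafen's soundbox.cpp, and `decoder_ctx_t`, `decoder_decode()` and `pack_nybbles()` are hypothetical names. `pack_nybbles()` is one reading of "pack nybbles low to high" (first nybble in bits 0-3 of each byte); the bytes would then go into KRAM as little-endian 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Same tables as the encoder above, repeated so this sketch stands alone. */
static const int dec_step_sizes[49] =
{
 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50,
 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157,
 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449,
 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
};

static const int dec_step_index_deltas[16] =
{
 -1, -1, -1, -1, 2, 4, 6, 8,
 -1, -1, -1, -1, 2, 4, 6, 8
};

typedef struct
{
 int predictor;        /* signed 15-bit: -16384..16383 */
 int step_size_index;  /* 0..48 */
 int delta_mask;       /* ~0, or the LERP mask that encoder_init() builds */
} decoder_ctx_t;

void decoder_init(decoder_ctx_t* ctx, int delta_mask)
{
 ctx->predictor = 0;
 ctx->step_size_index = 0;
 ctx->delta_mask = delta_mask;
}

/* Decode one nybble with the same update the encoder performs, so both
 * stay in lock-step. Returns a 16-bit sample (predictor * 2, as in the
 * encoder's error calculation). */
int16_t decoder_decode(decoder_ctx_t* ctx, uint8_t nyb)
{
 int delta = (dec_step_sizes[ctx->step_size_index] * ((nyb & 7) + 1)) & ctx->delta_mask;

 ctx->predictor += (nyb & 8) ? -delta : delta;
 if(ctx->predictor < -16384)
  ctx->predictor = -16384;
 if(ctx->predictor > 16383)
  ctx->predictor = 16383;

 ctx->step_size_index += dec_step_index_deltas[nyb & 15];
 if(ctx->step_size_index < 0)
  ctx->step_size_index = 0;
 if(ctx->step_size_index > 48)
  ctx->step_size_index = 48;

 return (int16_t)(ctx->predictor * 2);
}

/* Pack nybbles "low to high": the first nybble of each pair goes in
 * bits 0-3 of the output byte, the second in bits 4-7. */
size_t pack_nybbles(const uint8_t* nybs, size_t count, uint8_t* out)
{
 size_t n_bytes = (count + 1) / 2;
 for(size_t i = 0; i < n_bytes; i++)
 {
  uint8_t lo = nybs[2 * i] & 0x0F;
  uint8_t hi = (2 * i + 1 < count) ? (nybs[2 * i + 1] & 0x0F) : 0;
  out[i] = (uint8_t)(lo | (hi << 4));
 }
 return n_bytes;
}
```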

elmer

Quote from: Mednafen on 04/12/2016, 06:14 PM
Quick and dirty code that proooobably(:p) works right(just pack nybbles low to high, little-endian):
Thanks!  :D

That's going to need a bit of studying ... but I was already taking a look at your code in Mednafen's soundbox.cpp.

Errrmmm ... that's definitely a bit different from the standard OKI 12-bit codec implementation.

I'm going to need a little bit of time to delve into the details to understand the implications of the differences.

I guess that the 1st thing that I'm noticing is that it's dealing in 14-bit samples, and that the multiplier for the code bits is definitely different ... (n&7) + 1 instead of the (2*(n&7) + 1) / 2 that I'd expect.

But it is saturating, so I guess that I wasn't wrong on that aspect.

Well ... it's something to look at more-deeply once I've dealt with the PCE codec.

Mednafen

Should probably mention these libraries in case anyone wants to make an all-in-one encoder and isn't already familiar with them:

https://uazu.net/fidlib/
http://www.mega-nerd.com/libsndfile/
http://www.mega-nerd.com/SRC/

elmer

#228
Quote from: elmer on 04/12/2016, 07:53 PM
That's going to need a bit of studying ... but I was already taking a look at your code in Mednafen's soundbox.cpp.

Errrmmm ... that's definitely a bit different from the standard OKI 12-bit codec implementation.

I'm going to need a little bit of time to delve into the details to understand the implications of the differences.
OK, I finally understand what's going on.  :)

I checked the PC-FX manual for the HuC6230 sound chip and here's what I found ...

ADPCM Sample Rates:

  31.47KHz / 15.73KHz / 7.87KHz / 3.93KHz

  The 15.73KHz / 7.87KHz / 3.93KHz rates support optional LERP.

ADPCM Sample Range:

  0..4095.875 (12-bit output, 15-bit internal)

ADPCM Initialization:

  Sample(N) = 2048.000
  Step(N)   = 16

  Sample clamps to 0 min or 4095.875 max.

On Decompression:

  SampleDelta(N) = ((Code(N) + 1) * Step(N-1)) / 8

  (Is this integer rounded, or stored as 15-bit???)

On Compression:

  Code(N) = ((8 * SampleDelta(N)) / Step(N-1)) - 1

  (Code(N) = Min 0, Max 7)
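One way to see why these notes line up with Mednafen's encoder is to store samples in 1/8 units, as in the sketch below. In that form 0..4095.875 becomes 0..32767, Sample = 2048.000 becomes 16384 (Mednafen's predictor of 0), and both formulas lose their "/ 8" and "8 *", leaving Code = delta / Step - 1, the same expression as `m = (abs_delta / cur_ss) - 1` in the encoder. The function names are hypothetical, and the step-size adaptation (which the notes above don't list) is left out.

```c
/* Samples are 12-bit with three fraction bits (0..4095.875), so store
 * them as integers in 1/8 units (0..32767). */
#define FX_MAX  32767          /* 4095.875 * 8 */
#define FX_INIT (2048 * 8)     /* Sample(0) = 2048.000 -> 16384 */

/* Mednafen's signed 15-bit predictor (-16384..16383) is the same value
 * without the 2048.000 bias. */
int fx_from_predictor(int predictor)
{
    return predictor + FX_INIT;
}

/* Compression: Code = (8 * SampleDelta / Step) - 1, clamped to 0..7.
 * In 1/8 units, 8 * SampleDelta is just the fixed-point difference.
 * The sign goes in bit 3, as in Mednafen's encoder. */
unsigned compress_code(int target_fx, int current_fx, int step)
{
    int delta = target_fx - current_fx;
    int mag = (delta < 0 ? -delta : delta) / step - 1;
    if (mag < 0) mag = 0;
    if (mag > 7) mag = 7;
    return (unsigned)mag | (delta < 0 ? 8u : 0u);
}

/* Decompression: Sample += (Code + 1) * Step / 8, i.e. (Code + 1) * Step
 * in 1/8 units, clamped to 0..4095.875. */
int decompress_step(int current_fx, unsigned code, int step)
{
    int d = (int)((code & 7) + 1) * step;
    int v = (code & 8) ? current_fx - d : current_fx + d;
    if (v < 0) v = 0;
    if (v > FX_MAX) v = FX_MAX;
    return v;
}
```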


Now Mednafen's code makes sense ... thanks!

Yes, that's an interesting twist on the standard OKI algorithm.  :-k

I'm a bit disappointed that the PC-FX is only outputting 12-bit samples ... but at least it's raised the sample rate to nearly 32KHz.

It looks like an easy algorithm to add into my converter ... I think that the code that Rypheca posted should work nicely!


Quote from: Mednafen on 04/14/2016, 06:17 PM
Should probably mention these libraries in case anyone wants to make an all-in-one encoder and isn't already familiar with them:

https://uazu.net/fidlib/
http://www.mega-nerd.com/libsndfile/
http://www.mega-nerd.com/SRC/
Thanks, those are definitely some interesting libraries.  8)

But personally, I'm really, really, really, trying to avoid doing anything too comprehensive, because I don't see the point in attempting to compete with SOX or Audacity for the majority of the audio processing.

I just want to make sure that the final 16KHz 16-bit .wav -> 16KHz ADPCM .vox stage is actually correct for the chips that we're using.

NightWolve

#229
Quote from: OldRover on 03/22/2016, 07:48 PM
2, I have no idea how to receive the results from an SCSI command.
Err, that's what I was trying to address, using the ModeSense10 SCSI command is something trickier (and I wouldn't wanna go into that myself here).

But my best guess for the status results of a SCSI command would be to look just past the CDBByte[16] array (the SCSI command packet array). In both ASPI and SPTI programming styles, to communicate with SCSI devices, the SENSE_DATA structure has always been placed right after CDBByte[16]. So the 17th byte, counting from the start of the CDB, would be the ErrorCode, the 19th byte would be the SenseKey, etc.

In ASPI, the separate SRB_Status byte tells you where the command stands: 0x00 (SS_PENDING) means it's still pending and the drive is still working, 0x01 (SS_COMP) means success, and 0x04 (SS_ERR) means an error. On an error, the sense data's ErrorCode should read 70h or 71h, and you then need to check 3 variables, SenseKey, AddSenseCode, and AddSenQual, to convert the error to a human-readable string.

Here's a short sample on that:

switch (bySenseKey) {
    // MEDIUM ERROR: Indicates that the command terminated with a non-recovered error condition
    // that may have been caused by a flaw in the medium or an error in the recorded data.
    // This sense key may also be returned if the device server is unable to distinguish
    // between a flaw in the medium and a specific hardware failure (i.e., sense key 4h).
    case KEY_MEDIUMERR:
        switch (gAddSenseCode) {
            case 0x02:
                switch (gAddSenQual) {
                    case 0x00:
                        return "No seek complete (Unreadable sector: unburned/gap/bad disc or lens)";
                }
                break;
            case 0x06:
                switch (gAddSenQual) {
                    case 0x00:
                        return "No reference position found";
                }
                break;
            case 0x11:
                switch (gAddSenQual) {
                    case 0x00:
                        return "Unrecovered read error";
                    case 0x01:
                        return "Read retries exhausted";
                    case 0x05:
                        return "Layered-Error Correction uncorrectable error";
                    case 0x06:
                        return "CIRC unrecovered error";
                    case 0x0F:
                        return "Error reading UPC/EAN number";
                    case 0x10:
                        return "Error reading ISRC number";
                }
                break;
        }
        break;

    // ... other sense keys elided ...
}

I should think at least for PC-FX, a 1994 console, that the CD component is a fully compliant SCSI device, so it should behave similarly and that info should be available right after CDBByte[16]. Dunno about the PCE though, David Shadoff described it as a "hacked audio CD player" so it'd be a bit more primitive I gather.

elmer

Quote from: NightWolve on 04/14/2016, 10:34 PM
Quote from: OldRover on 03/22/2016, 07:48 PM
2, I have no idea how to receive the results from an SCSI command.
Err, that's what I was trying to address, using the ModeSense10 SCSI command is something trickier (and I wouldn't wanna go into that myself).
Ahhh ... what you're missing is that we're not getting anything sensible back from the SCSI device.  :wink:

The PC-FX SCSI interface is done at a very, very low level ... and the library routines that Alex wrote for reading data from the CD just don't actually handle reading back the results of the "SENSE" command, or any results, really (unless I'm missing something).

The routines are very, very "fragile", they're not like Microsoft's SCSI commands, and they "break" and crash the PC-FX when they're not used 100% as-expected.

That's not surprising (to me), it just means that we need to wrap some error-handling code around them to make them "friendly".

The point (to me) is that Alex created a wonderful set of "groundwork" that can be expanded to build a robust and full-featured library for PC-FX development.

But it's not "friendly" yet ... and The Old Rover is doing a great job in pointing out areas that need to be improved.

NightWolve

Quote from: elmer on 04/14/2016, 11:16 PM
The routines are very, very "fragile", they're not like Microsoft's SCSI commands, and they "break" and crash the PC-FX when they're not used 100% as-expected.
Right, but I think it's a SCSI thing/standard, not a Microsoft or Adaptec thing in what I was pointing out. Somewhere, you have the CDBByte[16] array in the PC-FX code which is the SCSI command packet array, and if you have that, the 17th byte would be the SCSI return ErrorCode, and the 19th byte would be SenseKey, etc. It's a hunch. So, that's where I would look.

He also wanted to know how to use the ModeSense10 command, which is how you detect the drive's speed setting, return the information on its abilities, stuff like that, but I don't wanna get into that one: it's tricky, and it (along with ModeSelect10) can easily lock up the drive when used improperly. ;)

I locked up my drives plenty of times when developing TurboRip. It's sending a bad SCSI command to the drive that did the trick. One of the safeguards though that I learned from looking at Adaptec's code was to loop through all detected CD drives, and set their timeouts to 15 seconds. By default, some manufacturers in the past set insane timeouts to like 15 minutes or so which is why you'd get stuck seemingly for good with a bad command or issue, etc...

elmer

Here's a bit of good news ... a developer on one of the VirtualBoy forums is giving my V810 GCC compiler a test drive.

Here's the better news ... his C code showed-up a huge mistake/bug that I'd made in the compiler!

Hopefully I've fixed that particular bug now, but this all goes to show how important it is to test the compiler with lots of different code so that these sort of problems show up.

It'll be interesting to see if he unearths any more problems.  8-[

OldRover

Nice. Tis good to see that there's still progress on this. :D
Turbo Badass Rank: Janne (6 of 12 clears)
Conquered so far: Sinistron, Violent Soldier, Tatsujin, Super Raiden, Shape Shifter, Rayxanber II

elmer

Quote from: OldRover on 07/13/2016, 01:29 PM
Nice. Tis good to see that there's still progress on this. :D
Yep ... slowly, it's mainly a question of priorities, and that the toolchain needs more testing.

I've fixed a 2nd bug, this time in the GNU assembler, which was causing it to assert() and crash when it automatically replaced out-of-range short branches with long jumps (nice feature).

I don't know if you found that one in testing, but it does explain some weird stuff that I was seeing while doing the Zeroigar translation. Turned out to be a stupid 1-character cut-n-paste typo!  :oops:

elmer

Quote from: elmer on 07/15/2016, 11:06 AM
Yep ... slowly, it's mainly a question of priorities, and that the toolchain needs more testing.
Well, the VirtualBoy guy (jorgeche) has got his engine/game-demo running with my patched GCC now, and I guess that means that it's pretty much had as thorough a test as it's going to get for a while.

I'll try and clean up the patches for a "release" sometime in the next few weeks/months (there's no rush, right?).

One thing that was particularly interesting about jorgeche's engine code is that he's basically implemented a lot of C++ object-oriented features in C as macros.

It made me curious enough that I tried building the GCC C++ compiler for the V810/PC-FX, and it actually compiled!

Anyone interested in C++, or rather "Embedded C++" (the saner subset for small machines) programming for the PC-FX?

FYI ... most of my game development work over the years on lots of platforms was done in C++, but basically following the same restrictions as Embedded C++.

Console manufacturers don't allow you to use exceptions, and C++'s horrible and slow RTTI is trivially replaceable with much faster type-safe dynamic casting using functions/macros if you don't use multiple inheritance. OTOH, any kind of dynamic casting would be slow on the PC-FX.
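For anyone wondering what type-safe dynamic casting with functions/macros can look like without RTTI, here's a minimal C sketch in the spirit of jorgeche's macro-based OO: each class has one static descriptor with a parent pointer, and with single inheritance a checked cast is just a walk up that chain. All the names are hypothetical. Each cast still costs a pointer load per inheritance level, so it isn't free on the PC-FX either.

```c
#include <stddef.h>

/* One static descriptor per class, pointing at its (single) parent. */
typedef struct class_info {
    const char              *name;
    const struct class_info *parent;
} class_info_t;

/* Every object starts with a pointer to its class descriptor. */
typedef struct {
    const class_info_t *klass;
} object_t;

static const class_info_t Object_class = { "Object", NULL };
static const class_info_t Actor_class  = { "Actor",  &Object_class };
static const class_info_t Ship_class   = { "Ship",   &Actor_class };

/* "is-a" check: walk up the parent chain looking for klass. */
static int is_a(const object_t *obj, const class_info_t *klass)
{
    for (const class_info_t *k = obj ? obj->klass : NULL; k; k = k->parent)
        if (k == klass)
            return 1;
    return 0;
}

/* Checked downcast: yields NULL instead of a wrongly-typed pointer.
 * Note that obj is evaluated twice. */
#define DYNAMIC_CAST(type, klass, obj) \
    ((type *)(is_a((const object_t *)(obj), &(klass)) ? (obj) : NULL))

/* Derived structs embed their base as the first member. */
typedef struct { object_t base; int hp; } actor_t;
typedef struct { actor_t base; int speed; } ship_t;
```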

OldRover

Not a fan of C++ myself but there are plenty of others who are so it'd be wise to keep support for it, imo.

Krimstah

Serious respect, Elmer, I enjoyed reading this thread, and you too, The Old Rover!

Just a question: where do I start if I want to contribute?

My coding abilities are limited, but I am willing to learn. Anywhere I should start?

I have two working PC-FXs.

Also, I have read that code can be executed from the memory card slot. Is this true?

If so, can a hacked memory card with an SD slot be used, similar to what the GameCube can do?

Looking forward to hearing from you.

elmer

jorgeche managed to trigger another "hidden" bug in the compiler.  #-o

This time it was only with large C functions and when the frame-pointer was enabled in the compiler.

Turns out that GCC was "helpfully" rearranging the order of CPU instructions so that things were put on the stack before the stack was actually adjusted ... causing havoc if an interrupt happened.
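
To make the failure mode concrete, here's a toy C model of why the order matters. It is not real V810 code and not the actual GCC fix (which isn't shown here): the stack grows downward, and an interrupt handler saves registers at whatever the stack pointer currently is, so anything already written below SP can be overwritten. `Stack`, `fake_interrupt()` and the two prologue functions are hypothetical.

```c
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp; /* index of the lowest in-use word; the stack grows toward 0 */
} Stack;

/* Simulated interrupt: saves two "registers" just below SP. On return,
 * SP is back where it was (we never moved it), but those two words stay
 * overwritten. */
static void fake_interrupt(Stack *s)
{
    s->mem[s->sp - 1] = 0xDEADBEEF;
    s->mem[s->sp - 2] = 0xDEADBEEF;
}

/* Prologue in the order the scheduler produced: store the local first,
 * adjust SP afterwards. Returns the local as read back after the IRQ. */
uint32_t buggy_prologue(Stack *s, uint32_t local)
{
    s->mem[s->sp - 1] = local; /* store into the not-yet-allocated frame */
    fake_interrupt(s);         /* interrupt arrives in between... */
    s->sp -= 1;                /* ...before SP covers the frame */
    return s->mem[s->sp];
}

/* Correct order: allocate the frame, then store into it. */
uint32_t fixed_prologue(Stack *s, uint32_t local)
{
    s->sp -= 1;
    s->mem[s->sp] = local;
    fake_interrupt(s); /* handler now pushes below the frame */
    return s->mem[s->sp];
}
```

With a fresh stack, the "buggy" order reads back 0xDEADBEEF instead of the stored value, while the fixed order returns it intact.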

That may be a bit too technical for most folks ... but any programmer here who's had to track down a bug like that knows what an absolute swine it is to find the cause when the CPU just starts randomly jumping to strange locations because the stack has been corrupted! Without Mednafen, it would have been almost impossible to find.

Anyway ... I finally figured out how to convince GCC not to be so stupid, and everything seems good now.

Simple C++ test code seems to work fine with "new" and "delete" using the regular C malloc() and free() from newlib.

Those probably shouldn't be used in practice in a game-engine on the PC-FX, but replacing them with something more sophisticated can wait until work on liberis resumes.
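
One common example of "something more sophisticated" is a fixed-size block pool with O(1) alloc/free and no fragmentation. This is a hypothetical illustration only: the `Pool` type, the function names and the sizes are made up, and nothing in the thread says liberis provides anything like it.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE 32 /* bytes per block (the union is never smaller than a pointer) */
#define POOL_BLOCKS 64

typedef union PoolBlock {
    union PoolBlock *next;               /* valid only while the block is free */
    unsigned char bytes[POOL_BLOCK_SIZE];
    long long align;                     /* keep blocks reasonably aligned */
} PoolBlock;

typedef struct Pool {
    PoolBlock blocks[POOL_BLOCKS];
    PoolBlock *free_list;
} Pool;

/* Thread every block onto the free list, block 0 first. */
void pool_init(Pool *p)
{
    p->free_list = NULL;
    for (size_t i = POOL_BLOCKS; i-- > 0;) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* Pop a block off the free list; returns NULL when the pool is exhausted. */
void *pool_alloc(Pool *p)
{
    PoolBlock *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Push a block back onto the free list (ptr must come from this pool). */
void pool_free(Pool *p, void *ptr)
{
    PoolBlock *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

In C++, a class's own `operator new`/`operator delete` could forward to `pool_alloc()`/`pool_free()`, so that `new`/`delete` for that type no longer go through newlib's malloc()/free().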

The VirtualBoy guys are wanting to release a new version of their IDE/engine/game with the latest compiler, so I'm cleaning it all up for the first official release of the new compiler patches.

*****************************

Quote from: Krimstah on 07/28/2016, 09:32 PM
Also, I have read that code can be executed from the memory card slot. Is this true?

If so, can a hacked memory card with an SD slot be used, similar to what the GameCube can do?
I've heard that the PC-FX will load/run code off of the BMP memory card, but I've not tried it, and I have little idea of how you'd sensibly get any code on there in the first place.

I suppose that you'd have to burn a program on a PC-FX CD that would then write the 2nd program onto the BMP itself.

Sounds like a PITA, and I'm not sure what the gain would be ... unless someone either hacked a BMP to add circuitry to communicate with a PC, or someone designed a PC-to-PCFX joypad communication cable.

At the moment, it's just easier to develop in Mednafen and then burn a CD to test stuff on a real console.

It's even easier if you have a PC-FXGA card that already has all the PC communications side built into it.

Unless you're an electronics engineer and want to design a BMP-to-sdcard modification, then it's never going to work that simply.


Quote from: Krimstah on 07/28/2016, 09:32 PM
Just a question: where do I start if I want to contribute?

My coding abilities are limited, but I am willing to learn. Is there anywhere I should start?
I'm not sure that we're at the point where there's much that anyone can contribute unless they have skills/experience in specific areas.

If someone is an electronics engineer, then it would be nice to have that PC-to-PCFX cable or BMP hack designed.

If someone is an expert in Japanese, then it would be nice to have more of the PC-FX hardware documentation translated.

If someone has a lot of experience at designing/creating assembly libraries for low-level hardware access, then liberis still needs a lot of work.

If someone has a lot of experience at designing/creating graphics tools or modern IDEs, then the new compiler and Mednafen could be wrapped-up in an easy-to-use package for newcomers to start with (like the VirtualBoy guys have done with VBDE).

If someone is a graphic artist, then there will eventually need to be some artwork done for demos to show off the machine, but it's still too early for that.

If you don't have any of those skills, then I'm not sure what help you can provide at this time that's going to help out.

Helping others get interested in the platform is one way ... for instance, letting people know about the Zeroigar translation so that they can see one of the more interesting games on the machine, which was pretty impressive in its storytelling.

Learning to program is always good, and you may want to get started with HuC on the PC Engine since the PC-FX uses 2 of the same VDC chips inside it, so anything that you learn should transfer over fairly easily later on.

Nazi NecroPhile

Quote from: elmer on 07/29/2016, 02:28 PM
If you don't have any of those skills, then I'm not sure what help you can provide at this time that's going to help out.
I will cheer from the sidelines and offer free corn if you ever find yourself in Nebraska.
Ultimate Forum Bully/Thief/Saboteur/Clone Warrior! BURN IN HELL NECROPHUCK!!!

OldRover

Knowledge of HuC might be useful for the functionality of liberis that trap15 and I wrote that mimics HuC's standard library, but HuC itself is painfully limited compared to your gcc build. A great deal of pure HuC code (no inline assembly) will port over with minimal changes (which I demonstrated with Asteroid Challenge FX), but it seems a better idea to just write from the ground up if you've written HuC code that has to specifically work around HuC's limitations (which Asteroid Challenge did, and with a vengeance).

Krimstah

Thanks, Elmer.

I'm afraid I don't meet the selection criteria for what needs doing at the moment, but I will continue to follow this thread for any advancements and have a go at HuC.

As for translating Japanese, have you tried contacting Nana from

https://www.patreon.com/nana_vs_nana

He is currently in the process of translating many PC98 games into English; he may be of help to you.

Thanks

elmer

Quote from: elmer on 07/19/2016, 11:54 AM
Well, the VirtualBoy guy (jorgeche) has got his engine/game-demo running with my patched GCC now, and I guess that means that it's pretty much got as thorough a testing as it's going to for a while.

I'll try and clean up the patches for a "release" sometime in the next few weeks/months (there's no rush, right?).
Wow ... has that much time passed???  :shock:

The VirtualBoy developers finally released the new version of their engine/toolchain/ide last week, including my V810-patched GCC compiler and tools ...

http://www.planetvb.com/modules/news/article.php?storyid=426


I haven't received any bug reports from them in the last 9 months, so I'm going to assume that I can say that the compiler is pretty solid and has passed all of their testing.  8)

OldRover

Quote from: elmer on 03/18/2017, 09:37 PM
I haven't received any bug reports from them in the last 9 months, so I'm going to assume that I can say that the compiler is pretty solid and has passed all of their testing.  8)
This is not a bad thing. :D Does that mean that your toolchain is end-user-ready? Or at least close to it? :D

Psycho Punch

Quote from: Krimstah on 09/28/2016, 07:49 PM
Thanks, Elmer.

I'm afraid I don't meet the selection criteria for what needs doing at the moment, but I will continue to follow this thread for any advancements and have a go at HuC.

As for translating Japanese, have you tried contacting Nana from

https://www.patreon.com/nana_vs_nana

He is currently in the process of translating many PC98 games into English; he may be of help to you.

Thanks
Whoa, that's pretty cool, and I bet the Screamer PC98 translation came from there (I can't find a list of games). Finally someone is tackling that problem consistently, and I'll probably make my first Patreon donation soon :)

I'm also glad he (she?) is translating adult games without any kind of prejudice whatsoever. I'm not interested in them (and some are frankly repulsive lol), but they are also cultural tokens of '80s/'90s Japan, so they have their importance too. I'd trade those for early PC88 adventure games any day, but beggars can't be choosers, and anything translated at all is already excellent.

Yet another Off-Topic post by Punch! :lol:
This Toxic Turbo Turd/Troll & Clone Warrior calls himself "Burning Fight!!" at Neo-Geo.com
For a good time reach out to: aleffrenan94@gmail.com or punchballmariobros@gmail.com
Like DildoKobold, dildos are provided free of charge, no need to bring your own! :lol:
He too ran scripts to steal/clone this forum which blew up the error logs! I deleted THOUSANDS of errors cause of this nutcase!

elmer

Quote from: OldRover on 03/18/2017, 10:35 PM
Does that mean that your toolchain is end-user-ready? Or at least close to it? :D
I guess that it means that I should finally release the patches, and stop worrying about how ugly the build process is for the toolchain.  :roll:

I checked in the changes to Alex's liberis and pcfxtools on GitHub a few months ago.

But as far as the end-user goes, liberis is still fundamentally broken in a number of places, and we still don't have access to the CD.

So, I'd say that it's ready for folks to start working on fixing liberis ... but I'm going to be busy for a while yet, so I won't have time to help until later on this year.

Does anyone actually want to start working on fixing liberis?  :-k

OldRover

What needs to be fixed on the CD part?

elmer

Quote from: OldRover on 03/19/2017, 06:46 PM
What needs to be fixed on the CD part?
I don't know, I haven't looked at it. The bug-report was from you ...


Quote from: OldRover on 03/22/2016, 09:28 PM
Using eris_low_scsi_status() freezes the system. Every. Single. Time. No matter what's going on... even if nothing's going on.

Sending another SCSI command freezes the system... even if I've also used eris_low_scsi_abort(). eris_low_scsi_abort() itself does not lock up the system, but using eris_low_scsi_command() a second time does.