Title: Faster fade out code?
Post by: DildoKKKobold on 09/20/2015, 11:30 AM

Here is my fade out code:


fade_out()
{
   int i, clr;
   char j;
   for (j = 0; j <8; j++)
   {
      for (i=0; i < 64; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=64; i < 128; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=128; i < 192; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=192; i < 256; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=256; i < 320; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=320; i < 384; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=384; i < 448; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
       for (i=448; i < 512; i++)
      {
         clr = get_color(i);
         if ((clr&7) > 0) clr = clr - 1;
         if ((clr&56) > 0) clr = clr - 8;
         if ((clr&448) > 0) clr = clr - 64;
         set_color(i, clr);
       }
       vsync();
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}

Unfortunately, even when dividing it up into 64-color blocks, it still causes flicker on real hardware. I'm guessing it's just too slow when written in C, but I'm not good enough with assembly. Any help would be appreciated!

Title: Re: Faster fade out code?
Post by: touko on 09/20/2015, 01:05 PM

I think the best way to optimise your routine is to do the fade in a RAM buffer for all the palettes, and then transfer the whole buffer after a vsync, in asm, with a TIA block transfer.

Like this:

/* A 256 bytes buffer is enough for 8 palettes */
int my_buffer[1024];

#asm

stz $402
stz $403

tia _my_buffer , $404 , 1023

#endasm

My fade routine is close to yours and is very fast, but it's in ASM.

Title: Re: Faster fade out code?
Post by: OldMan on 09/20/2015, 02:28 PM

int dv;
.
.
.
clr = get_color(i);
dv = 0;

/* low color bits : 0000 0111 */
if (clr & 0x0007 ) dv ++;

/* mid color bits : 0011 1000 */
if (clr & 0x0038 ) dv = dv+ 8; /* iirc, this is faster than += in huc. try both */

/* hi color bits : 1 1100 0000 */
if (clr & 0x01c0 ) dv = dv + 64;

clr = clr - dv;
set_color(i, clr);
.
.
.
...............................................................................
that's off the top of my head, so double check the hex values :) And everything else....

Assuming it works, the next step is to move that to asm; iirc, an int parameter will come in via the A and X registers, so color won't have to be loaded; the rest is a pretty straightforward conversion.
( look up the asm code in the listing to see how it's done. That's how I learned 650x asm :)

For the low sets of bits, you can just check the low byte; for the high set, shift the color right (?) for the check (then you can AND with 0xe0). But that's a bit of asm optimization you may not need (or want to do).
.................................................................................
Just out of curiosity, why are you checking for >0 anyway? clr can't be negative, and an & won't change that. And != 0 is true in C, so you don't have to compare the result to anything....
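
For reference, the single-subtraction idea above can be tried out in plain C on a PC before porting it to HuC. This is only a sketch: the name `fade_step` and the mask macros are made up for illustration, and `uint16_t` stands in for HuC's 2-byte int. The masks are the decimal 7, 56 and 448 from the first post written in hex.

```c
#include <assert.h>
#include <stdint.h>

#define LOW_MASK  0x0007u  /* bits 0-2, decimal 7   */
#define MID_MASK  0x0038u  /* bits 3-5, decimal 56  */
#define HIGH_MASK 0x01C0u  /* bits 6-8, decimal 448 */

/* Lower every non-zero 3-bit field of a 9-bit color by one level.
   The three decrements are collected in dv and applied in one subtraction. */
uint16_t fade_step(uint16_t clr)
{
    uint16_t dv = 0;

    assert(clr <= 0x01FFu);            /* only 9 bits are meaningful */
    if (clr & LOW_MASK)  dv += 0x0001u;
    if (clr & MID_MASK)  dv += 0x0008u;
    if (clr & HIGH_MASK) dv += 0x0040u;
    return (uint16_t)(clr - dv);
}
```

Because each field holds at most 7, seven calls are enough to take any color to black.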

Title: Re: Faster fade out code?
Post by: TurboXray on 09/20/2015, 04:23 PM

What touko said. Wait for vblank, read all colors (sprite and BG, if desired) into a buffer in ram. Do your alterations to those values in ram, then wait for vsync and upload the changes during vblank. Rinse, repeat. HuC is still going to be slow, but if you do the operations during normal/whole frames and only upload the changes during vblank, it should do the job.
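
The read / modify in RAM / upload flow described here can be sketched in portable C. Everything hardware-related is a stand-in: `color_ram` is an ordinary array pretending to be VCE color memory, `wait_vblank()` is an empty placeholder for HuC's `vsync()`, and `memcpy` takes the place of the TAI/TIA block transfers. None of these names come from HuC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_COLORS 512

static uint16_t color_ram[NUM_COLORS];   /* stand-in for VCE color memory */
static uint16_t fade_buffer[NUM_COLORS]; /* working copy in ordinary RAM  */

/* Placeholder: a real HuC build would call vsync() here. */
static void wait_vblank(void) { }

/* Lower each non-zero 3-bit field of a 9-bit color by one level. */
static uint16_t fade_one_level(uint16_t clr)
{
    assert(clr <= 0x01FFu);
    if (clr & 0x0007u) clr -= 0x0001u;
    if (clr & 0x0038u) clr -= 0x0008u;
    if (clr & 0x01C0u) clr -= 0x0040u;
    return clr;
}

/* Do the slow work on the RAM copy, then upload right after vblank. */
static void fade_frame(void)
{
    int i;

    for (i = 0; i < NUM_COLORS; i++)
        fade_buffer[i] = fade_one_level(fade_buffer[i]);
    wait_vblank();
    memcpy(color_ram, fade_buffer, sizeof color_ram);   /* "upload" */
}

void fade_out_buffered(void)
{
    int step;

    wait_vblank();
    memcpy(fade_buffer, color_ram, sizeof fade_buffer); /* "read back" once */
    for (step = 0; step < 7; step++)                    /* 7 levels per field */
        fade_frame();
}
```

The point of the structure is that the color memory is touched only in the two copy steps, both right after a vblank wait; the per-color arithmetic never touches it.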

Fading out is easy, fading in is a bit more complex.

On a side note; I had an idea for a RGB to YUV conversion table, for special fading type of effect. YUV is nicer to work with IMO and gives you a wider range of features. Of course, going from YUV to RGB will need a different set of tables and take a bit longer.

Title: Re: Faster fade out code?
Post by: spenoza on 09/20/2015, 06:05 PM

Quote from: TurboXray on 09/20/2015, 04:23 PM
Fading out is easy, fading in is a bit more complex.

When you fade in you know what your colors are going to be, so couldn't you pre-calculate/prepare the fade-in and then just cycle through the known palettes?

Title: Re: Faster fade out code?
Post by: TurboXray on 09/20/2015, 06:37 PM

Yeah, you need to know the values to reach rather than just checking for overflow or floor (fixed value). It requires a little more logic for testing for overflow for the addition process. There are quite a few ways to do fade in approach, but they are all more complex with more operations than a simple fade out.

Fixed point deltas are one way. That requires a very large buffer (entries * RGB * delta or 512*3*2) in ram and setup the initial distance calculation for each color (which can take some time). It's wasteful on ram, but every R/G/B is faded in equally.
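
A minimal sketch of the fixed point delta idea for a single 3-bit channel, assuming an 8.8 format (8 integer bits, 8 fraction bits). The struct and function names are invented for illustration. The delta is rounded up and the result clamped, which is one possible way to make sure the target is reached on the last step; the post does not say how rounding should be handled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t current; /* 8.8 fixed point level, starts at 0 (black) */
    uint16_t delta;   /* 8.8 amount added on every step             */
    uint8_t  target;  /* final level, 0..7                          */
} FixedFade;

void fixed_fade_init(FixedFade *f, uint8_t target, uint8_t steps)
{
    assert(target <= 7 && steps > 0);
    f->current = 0;
    f->target  = target;
    /* Ceiling division, so 'steps' additions always reach the target. */
    f->delta = (uint16_t)((((unsigned)target << 8) + steps - 1u) / steps);
}

/* Advance one step and return the integer level to write to the palette. */
uint8_t fixed_fade_step(FixedFade *f)
{
    uint8_t level;

    f->current = (uint16_t)(f->current + f->delta);
    level = (uint8_t)(f->current >> 8);
    return (level > f->target) ? f->target : level;
}
```

With 8 steps, a channel heading for 7 and a channel heading for 2 both arrive on the final step, which is the "faded in equally" property described above. The cost is one such record per channel per color, hence the large buffer.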

Rate of change delta is another way. It requires a smaller size buffer, but doesn't fade in equally. You basically take the R/G/B color of the destination and copy these as the deltas to subtract from the destination palette block (your first initial setup). On every call, you subtract 1 from each delta, then subtract from the RGB block and write to the buffer to be uploaded. The R/G/B destination values are never altered, just used as a value to subtract from. When the delta reaches 0, because on each call you subtract by one, it means that particular R/G/B is at full value. As you can see, lower values will reach their max value faster than large values. It looks decent, and it's fast.
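
The rate of change method can be sketched for one channel like this. The names are hypothetical; the logic follows the description above: the delta starts equal to the destination value, drops by one per call, and the value written out is destination minus delta.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t target; /* destination level, never altered */
    uint8_t delta;  /* remaining distance, counts down   */
} RateFade;

void rate_fade_init(RateFade *f, uint8_t target)
{
    assert(target <= 7);
    f->target = target;
    f->delta  = target;   /* first output will be target - (target - 1) = 1 */
}

/* One fade-in step: returns the level to upload for this channel. */
uint8_t rate_fade_step(RateFade *f)
{
    if (f->delta > 0)
        f->delta--;
    return (uint8_t)(f->target - f->delta);
}
```

A channel with destination 2 is finished after two calls, while a channel with destination 7 needs seven, which is the unequal fade the post mentions.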

Title: Re: Faster fade out code?
Post by: touko on 09/21/2015, 08:02 AM

@DarkKobold: And maybe you don't need to fade all the palettes.

Title: Re: Faster fade out code?
Post by: Gredler on 09/21/2015, 10:08 AM

Quote from: touko on 09/21/2015, 08:02 AM
@DarkKobold: And maybe you don't need to fade all the palettes.

This is what I was thinking: only apply the fade to specific sprites that can't be animated onto the screen (background and UI elements).

Title: Re: Faster fade out code?
Post by: touko on 09/21/2015, 10:22 AM

Yes, it's useless to do it on all 32 palettes 99% of the time.
In practice it's 4 to 6 palettes max.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 09/21/2015, 10:43 AM

Quote from: TurboXray on 09/20/2015, 06:37 PM
Yeah, you need to know the values to reach rather than just checking for overflow or floor (fixed value). It requires a little more logic for testing for overflow for the addition process. There are quite a few ways to do fade in approach, but they are all more complex with more operations than a simple fade out.

Fixed point deltas are one way. That requires a very large buffer (entries * RGB * delta or 512*3*2) in ram and setup the initial distance calculation for each color (which can take some time). It's wasteful on ram, but every R/G/B is faded in equally.

Rate of change delta is another way. It requires a smaller size buffer, but doesn't fade in equally. You basically take the R/G/B color of the destination and copy these as the deltas to subtract from the destination palette block (your first initial setup). On every call, you subtract 1 from each delta, then subtract from the RGB block and write to the buffer to be uploaded. The R/G/B destination values are never altered, just used as a value to subtract from. When the delta reaches 0, because on each call you subtract by one, it means that particular R/G/B is at full value. As you can see, lower values will reach their max value faster than large values. It looks decent, and it's fast.

This is way too complex, you just need to subtract each color from 511, and do it backwards. Excuse the pseudo code.

int palette_holder[512];

for c = 0 to 511
   palette_holder[c] = 511 - get_color(c);

for i = 1 to 7
   for j = 0 to 511
      clr = palette_holder[j];
      dv = 0;

      /* low color bits : 0000 0111 */
      if (clr & 0x0007 ) dv ++;

      /* mid color bits : 0011 1000 */
      if (clr & 0x0038 ) dv = dv+ 8; /* iirc, this is faster than += in huc. try both */

      /* hi color bits : 1 1100 0000 */
      if (clr & 0x01c0 ) dv = dv + 64;
      clr2 = get_color(j);
      clr = clr - (dv XOR 511);
      clr2 = clr2+ dv;
      set_color(j, clr2);
      palette_holder[j] = clr;

Not sure if HuC has XOR though.

Title: Re: Faster fade out code?
Post by: touko on 09/21/2015, 02:55 PM

If I remember correctly, XOR is A^B.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 10/01/2016, 01:11 AM

Quote from: touko on 09/20/2015, 01:05 PM
I think the best way for optimising your routine is doing the fade in a buffer for all palettes and transfer all the buffer after a vsync in asm with a tia bloc transfer .

like that:

/* A 256 bytes buffer is enough for 8 palettes */
int my_buffer[1024];

#asm

stz $402
stz $403

tia _my_buffer , $404 , 1023

#endasm

My fade routine is close to yours, and is very fast,but in ASM .

So, rather than create a buffer for all the palettes at once, I'd like to split it up into chunks for 64 "colors" at a time. The problem is, I have no idea what the stz $402, stz $403 lines do. Set addresses to zero... but why?

Title: Re: Faster fade out code?
Post by: cabbage on 10/01/2016, 04:00 AM

#include "huc.h"
int my_buffer[1024];
main(){
#asm
   stz $402
   stz $403
   tia _my_buffer, $404, 511
#endasm
}

compiles successfully...
a variable declared in c, e.g. my_buffer, is accessed with a leading underscore in asm, as in _my_buffer

Title: Re: Faster fade out code?
Post by: touko on 10/01/2016, 08:05 AM

Quote:
This fails to compile:

19000 2D:B710 tia _my_buffer, $404, 511
Undefined symbol in operand field!

int my_buffer[1024] must be a global variable (declared before the main procedure).
Tested and works fine for me, but beware: #asm and #endasm must not sit at the left edge; you must put 1 or 2 space characters before them.

EDIT: cabbage has answered.

@DarkKobold: You can also copy the contents of your VCE palettes into the buffer the same way:

tai $404 , _my_buffer , 511

Much faster than in C.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 10/01/2016, 12:25 PM

Quote from: touko on 10/01/2016, 08:05 AM
QuoteThis fails to compile:

19000 2D:B710 tia _my_buffer, $404, 511
Undefined symbol in operand field!
int my_buffer[1024], must be a global variable (before the main procedure) .
Tested and works fine for me, but beware the #asm and #endasm must'n be to the left hedge,you must add 1 or 2 space caracters before .

EDIT:cabbage has answered .

@DarkKobold:You can also copy the content of your VCE palettes in the buffer the same way

tai $404 , _my_buffer , 511

More faster than in C .

Oh, I didn't realize it needed to be a global. That is 2k of ram dedicated just to fadeout then...

Title: Re: Faster fade out code?
Post by: touko on 10/01/2016, 12:39 PM

Quote:
Oh, I didn't realize it needed to be a global. That is 2k of ram dedicated just to fadeout then...

Why 2k?
You really only need 1k, and in fact less, because all the palettes are rarely or never used at the same time.
I think a 6-palette buffer is enough. That's 192 bytes, and it should speed up your fade routine a lot (because there's no need to copy or fade unused palettes).

Title: Re: Faster fade out code?
Post by: TurboXray on 10/03/2016, 11:19 AM

Some PCE games just have pre-compiled fades for palettes. Sure, it takes up rom (not a whole lot though) but it doesn't need any ram.
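
As an illustration of pre-compiled fades, here is a small sketch that builds every fade level for one 16-color palette. In a shipped game such a table would more likely be generated offline and stored as constant data in ROM; building it at run time here only keeps the example self-contained. The names `build_fade_table`, `FADE_LEVELS` and `PALETTE_SIZE` are invented.

```c
#include <assert.h>
#include <stdint.h>

#define FADE_LEVELS  8   /* level 0 = full color, level 7 = black */
#define PALETTE_SIZE 16  /* colors in one palette                 */

/* Lower each non-zero 3-bit field of a 9-bit color by one level. */
static uint16_t one_level_darker(uint16_t clr)
{
    if (clr & 0x0007u) clr -= 0x0001u;
    if (clr & 0x0038u) clr -= 0x0008u;
    if (clr & 0x01C0u) clr -= 0x0040u;
    return clr;
}

/* Fill table[level][i] with palette[i] darkened 'level' times. */
void build_fade_table(const uint16_t *palette,
                      uint16_t table[FADE_LEVELS][PALETTE_SIZE])
{
    int level, i;

    assert(palette != 0);
    for (i = 0; i < PALETTE_SIZE; i++)
        table[0][i] = palette[i] & 0x01FFu;
    for (level = 1; level < FADE_LEVELS; level++)
        for (i = 0; i < PALETTE_SIZE; i++)
            table[level][i] = one_level_darker(table[level - 1][i]);
}
```

A fade out then uploads rows 0 through 7 in order, and a fade in uploads the same rows from 7 back down to 0, with no per-frame arithmetic.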

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 10/04/2016, 01:34 AM

The trick is to write one fade routine. It's not a fade in or a fade out. It's a "fade to this palette" routine.

It will work for fading to black, or for fading to a color. Just make sure you pay attention to the way PCE lays out a palette, and inc/dec accordingly. It's a little easier if you work with octal number formatting. It seems more intuitive that way.

I do this and use a work palette (the current screen palette), and a target palette, and simply fade to it, and copy when ready.
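
A sketch of what a "fade to this palette" step might look like for one color, written with octal constants as suggested (each octal digit is one 3-bit field). `fade_toward` is a made-up name and this is not Arkhan's actual routine, just one way the work/target idea can be expressed.

```c
#include <assert.h>
#include <stdint.h>

/* Move each 3-bit field of 'work' one level closer to the same field
   of 'target'. Fields that already match are left alone. */
uint16_t fade_toward(uint16_t work, uint16_t target)
{
    int shift;

    assert(work <= 0777 && target <= 0777);
    for (shift = 0; shift < 9; shift += 3) {
        uint16_t mask = (uint16_t)(07u << shift);
        uint16_t one  = (uint16_t)(01u << shift);

        if ((work & mask) < (target & mask))
            work = (uint16_t)(work + one);
        else if ((work & mask) > (target & mask))
            work = (uint16_t)(work - one);
    }
    return work;
}
```

Calling it with a target of 0000 gives a fade out, and calling it on a black work palette with the real palette as target gives a fade in; after at most seven calls the work color equals the target.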

Title: Re: Faster fade out code?
Post by: touko on 10/04/2016, 06:01 AM

Quote:
The trick is to write one fade routine. It's not a fade in or a fade out. It's a "fade to this palette" routine.

Yes, I did this too.
You can fade from/to a palette; for example, a fade out is treated as a fade to a black palette.

Title: Re: Faster fade out code?
Post by: TurboXray on 10/04/2016, 02:29 PM

Yeah, a fade "to". Not that anyone probably cares, but there are weighted and unweighted fade methods (as in R/G/B elements as they reach a target value). Obviously weighted takes more time, but looks better for fades to black or from black IMO.

And if you're ambitious enough, you can even do real time RGB->YUV, make changes, and then back again for additional effects. Though that's probably more fancy than practical - "look what I can do" type effect.

Title: Re: Faster fade out code?
Post by: touko on 10/04/2016, 03:04 PM

Quote:
Obviously weighted takes more time, but looks better for fades to black or from black IMO.

I fade each R/G/B value separately, rather than doing a single inc/dec on the whole color.

Title: Re: Faster fade out code?
Post by: TurboXray on 10/04/2016, 03:09 PM

Quote from: touko on 10/04/2016, 03:04 PM
QuoteObviously weighted takes more time, but looks better for fades to black or from black IMO.
I fade each RGB values,and not a single inc/dec for the whole color .

Yeah, but do you use a fixed point value (8bit:8bit) for each R/G/B element? That's what I mean by weighted. Otherwise, you'll get to the target value of each R/G/B faster on one element vs the others. I.e. it's not uniform across all elements within a color slot's R/G/B system.

Title: Re: Faster fade out code?
Post by: touko on 10/04/2016, 03:23 PM

Quote:
Yeah, but do you use a fixed point value (8bit:8bit) for each R/G/B element? That's what I mean by weighted.

No, mine is simpler: better than a simple inc/dec, but definitely not as advanced as what you call weighted.

Quote:
it's not uniform across all elements within a color slot's R/G/B system.

Yes, but very acceptable.
For the best result with an RGB fade, you must normalise all the values before fading.

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 10/04/2016, 08:21 PM

Weighted "looks" better, but in practice is never noticeable to anyone playing a game.

I put in a weighted one into Inferno and ended up not using it.

Title: Re: Faster fade out code?
Post by: elmer on 10/04/2016, 08:37 PM

Quote from: Psycho Arkhan on 10/04/2016, 08:21 PM
Weighted "looks" better, but in practice is never noticeable to anyone playing a game.

That's been my experience, too.

If the fade is running fast-enough that people aren't given the time to stare at and analyze each individual step, then you can actually get away with a really-simple algorithm and find that it still looks good on the screen (as in, I've never, ever, heard anyone complain about it).

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 10/05/2016, 02:41 AM

Atlantean's fades aren't weighted IIRC. They just do your basic "fade each color til it gets where it needs to be".

People claim it "makes things go grey", but, whatever.

Title: Re: Faster fade out code?
Post by: elmer on 10/05/2016, 11:39 AM

Quote from: Psycho Arkhan on 10/05/2016, 02:41 AM
People claim it "makes things go grey", but, whatever.

Yep, that's the simple way of doing it that I got away with for many years (until hardware fades took over). :wink:

If I ever wanted to be "clever" and do it properly, then I'd probably use the classic integer version of the Bresenham line-draw algorithm to decide when to change each RGB component.
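
For the curious, an integer Bresenham-style stepper for one channel could look roughly like the sketch below. The error term accumulates the distance to travel on every frame, and the level changes whenever the error crosses the number of frames. All names are hypothetical and this is only one way to arrange it.

```c
#include <assert.h>

typedef struct {
    int level;  /* current channel level, 0..7        */
    int target; /* level to end on                     */
    int dir;    /* +1 or -1                            */
    int dist;   /* number of levels to travel          */
    int steps;  /* number of frames the fade lasts     */
    int err;    /* Bresenham error accumulator         */
} BresFade;

void bres_fade_init(BresFade *f, int from, int to, int steps)
{
    assert(from >= 0 && from <= 7 && to >= 0 && to <= 7 && steps > 0);
    f->level  = from;
    f->target = to;
    f->dir    = (to >= from) ? 1 : -1;
    f->dist   = (to >= from) ? (to - from) : (from - to);
    f->steps  = steps;
    f->err    = 0;
}

/* Advance one frame and return the level to use for this channel. */
int bres_fade_step(BresFade *f)
{
    if (f->level == f->target)
        return f->level;              /* already done, extra calls are safe */
    f->err += f->dist;
    while (f->err >= f->steps) {      /* same test as a line draw's minor axis */
        f->err   -= f->steps;
        f->level += f->dir;
    }
    return f->level;
}
```

Over a 16-frame fade to black, a channel starting at 7 and one starting at 2 both reach 0 on the last frame, and halfway through they sit at 4 and 1.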

Title: Re: Faster fade out code?
Post by: TurboXray on 10/06/2016, 03:54 PM

That's basically what I was referring to with the fixed point values (a tiny LUT for a bresenham line algo).

Sometimes you just write stuff.. just because you can. I think most fade-to routines don't need to be done in realtime gameplay (i.e. end of stage, or transition into another area, etc - where the action is paused). At least from what I've needed it for, but I can see some situations where it might be needed in game during normal gameplay. I always just used recalculated subpalettes for that, but I guess if you set up a background process system or "thread", you could call such a routine ahead of time and have it do the fade steps as needed (kind of like a background time-sliced decompression process).

Arkhan, did you use the fade routines while the gameplay was in action, or did you pause the action and use the fade as a transition? Just curious.

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 10/06/2016, 04:07 PM

Atlantean fades while SOME stuff is happening on screen. I am pretty sure the fades will work mid-game, though.

Most of the fading was for transitioning, though. No point in fading mid-game in a shootything

Reflectron had a bunch of palette cycling during the entire game.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 10/06/2016, 04:46 PM

Quote from: touko on 09/20/2015, 01:05 PM
I think the best way for optimising your routine is doing the fade in a buffer for all palettes and transfer all the buffer after a vsync in asm with a tia bloc transfer .

like that:

/* A 256 bytes buffer is enough for 8 palettes */
int my_buffer[1024];

#asm

stz $402
stz $403

tia _my_buffer , $404 , 1023

#endasm

My fade routine is close to yours, and is very fast,but in ASM .

So, I'm really confused.

First, why the store zero in $402 and $403?

Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64 etc?

This is why I hate assembly. I know, I'm definitely the idiot of the thread.

Title: Re: Faster fade out code?
Post by: Gredler on 10/06/2016, 04:56 PM

I feel the need to post something to make DK feel like less of an idiot.

Can't you just store all the palettes to an array, then create a loop that lerps from black to the array color or the array to black?

#Python
#C#
#Scripting

Title: Re: Faster fade out code?
Post by: OldMan on 10/06/2016, 05:33 PM

Quote:
First, why the store zero in $402 and $403?

Select palette slot 0.

Quote:
Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64 etc

You don't use $404+anything. $404/$405 is the color.
You set $402/$403 to the slot number you want (0, 64, 128, etc.).

In general you set the starting slot # in $402/$403, and send the colors to $404/$405.
The slot increments after every write to the high byte (I think), so you can just loop through the
palette and pump them out. If you want to break it into 64 color chunks, write the first 64 colors,
then the next 64 colors, etc.

You can set the slot number to start at any slot, even in the middle of a palette, afaik.

Quote:
Can't you just store all the palettes to an array, then create a loop that lerps from black to the array color or the array to black?

Yes. But that won't update the palettes until you send it to the vce.
If you really want to, you can read the color from the vce, fade it, and write it back, without any intermediate array. It's just quicker to use an array. (Less overhead)

Title: Re: Faster fade out code?
Post by: TurboXray on 10/06/2016, 11:23 PM

Quote from: DildoKKKobold on 10/06/2016, 04:46 PM
Quote from: touko on 09/20/2015, 01:05 PM
I think the best way for optimising your routine is doing the fade in a buffer for all palettes and transfer all the buffer after a vsync in asm with a tia bloc transfer .

like that:

/* A 256 bytes buffer is enough for 8 palettes */
int my_buffer[1024];

#asm

stz $402
stz $403

tia _my_buffer , $404 , 1023

#endasm

My fade routine is close to yours, and is very fast,but in ASM .
So, I'm really confused.

First, why the store zero in $402 and $403?

Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64 etc?

This is why I hate assembly. I know, I'm definitely the idiot of the thread.

It's not assembly, but direct hardware interfacing. You could access the ports directly in C.

$402/$403 make up a single port to the VCE color # or slot. Because there are 512 color slots in the VCE, it's larger than a 8bit value for a single port to handle. So the 16bit port is spread across two 8bit ports; 0x402 is the LSByte and 0x403 is the MSByte.

VCE and VDC ports tend to have what is known as a "latch" system. This means that writing the upper address of a 16bit port triggers the transfer of the contents of the two ports to the internal place they need to go (be it VCE or VDC).

In the case of the VCE, 0x402 is the LSB, and 0x403 is the MSB and latch. Once 0x403 is written to, the contents are transferred to whatever reg is internal to the VCE. But not until that latch port is accessed, so the order of port pair access is very important.

On the VCE, here are some ports:
0x402/0x403 = is the color slot you want to update.
0x404/0x405 = the color value to update on the corresponding color slot.

One other thing to note: While you can constantly tell the VCE what specific color you want to update, it does have an "auto increment" internal mechanism that automatically advances to the next color slot after a successful write/update (i.e. latch port). Same with reading color data from the VCE.
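
To make the latch and auto-increment behaviour easier to picture, here is a toy software model of the four ports. It follows the description in this post only and has not been checked against real hardware; `ToyVce` and its functions are invented names, and reads are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t color_ram[512]; /* the 512 color slots                      */
    uint16_t index;          /* slot the next color write will land in   */
    uint8_t  index_lo;       /* low byte waiting for the $403 latch      */
    uint8_t  data_lo;        /* low byte waiting for the $405 latch      */
} ToyVce;

void toy_vce_reset(ToyVce *v)
{
    assert(v != NULL);
    memset(v, 0, sizeof *v);
}

/* Model of an 8-bit write to one of the VCE ports discussed above. */
void toy_vce_write(ToyVce *v, uint16_t port, uint8_t value)
{
    assert(v != NULL);
    switch (port) {
    case 0x0402:                      /* slot number, low byte */
        v->index_lo = value;
        break;
    case 0x0403:                      /* slot number, high byte: latches */
        v->index = (uint16_t)((((unsigned)value << 8) | v->index_lo) & 0x01FFu);
        break;
    case 0x0404:                      /* color value, low byte */
        v->data_lo = value;
        break;
    case 0x0405:                      /* color value, high byte: latches */
        v->color_ram[v->index] =
            (uint16_t)((((unsigned)value << 8) | v->data_lo) & 0x01FFu);
        v->index = (uint16_t)((v->index + 1u) & 0x01FFu); /* auto increment */
        break;
    default:
        break;
    }
}
```

In this model, writing 64 to $402 and 0 to $403 and then streaming byte pairs to $404/$405 fills slots 64, 65, 66 and so on, which is the "chunk of 64" case asked about earlier.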

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 11/02/2016, 12:55 AM

int my_buffer[64];

fade_out()
{

   int i, clr;
   char j,k;
   for (j = 0; j <8; j++)
   {

      #asm

          stz $402
          stz $403
       #endasm
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
            if (clr&448) clr = clr - 64;
            my_buffer[k] = clr;
         }
          vsync();
       #asm
          tia _my_buffer , $404 , 64
       #endasm
       }
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}

So, here's my attempt to split it into 64-color chunks. It... fails. Miserably. I'd assume it's latching fine, and should store the current state of the latches through each vsync.

Title: Re: Faster fade out code?
Post by: OldMan on 11/02/2016, 01:17 AM

Quote:
int my_buffer[64];
.
.
.
tia _my_buffer , $404 , 64

Ints are 2 bytes. You're only transferring 32 'colors'. Try using 128 as length

Also, be careful mixing ints and chars. HuC doesn't promote chars to ints.
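
The length arithmetic is easy to get wrong, so a tiny helper makes the rule explicit. This is plain C for checking the numbers on a PC; `transfer_bytes` and `BYTES_PER_COLOR` are illustrative names, and the 2-byte size is the HuC int size stated above.

```c
#include <assert.h>

#define BYTES_PER_COLOR 2u  /* one color is one 2-byte int in HuC */

/* Length operand to give TIA/TAI for a given number of colors. */
unsigned transfer_bytes(unsigned colors)
{
    assert(colors <= 512u);           /* the VCE has 512 color slots */
    return colors * BYTES_PER_COLOR;
}
```

So a 64-color chunk needs a length of 128, and a length of 64 moves only 32 colors.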

Title: Re: Faster fade out code?
Post by: touko on 11/02/2016, 05:11 AM

And be careful: there is only one pointer into the VCE's hardware color table.
You must select the color entry for writing AND for reading (unless you want to read one palette and write the next one), like this:

 #asm
 ; May not be needed here, because get_color() selects the palette entry every time.
   stz $402
   stz $403
 #endasm
   for (i=0;i<512;i+=64)
   {
      for (k=0; k<64; k++)
      {
         clr = get_color(i+k);
         if (clr&7) clr = clr - 1;
         if (clr&56) clr = clr - 8;
         if (clr&448) clr = clr - 64;
         my_buffer[k] = clr;
      }
      vsync();
    #asm
    ; Re-select the entry before writing. Zero is only right for the first chunk;
    ; later chunks need the value of i in $402/$403.
      stz $402
      stz $403
    ; 64 ints = 128 bytes
      tia _my_buffer , $404 , 128
    #endasm
   }

Otherwise you read from palette 0 onward and the write lands in a later palette, as happened in your code.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 11/02/2016, 08:20 PM

for posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.

fade_out()
{

   int i, clr;
   char j,k;
   for (j = 0; j <8; j++)
   {
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
            if (clr&448) clr = clr - 64;
            my_buffer[k] = clr;
         }
          vsync();
          clr = get_color(i-1);
       #asm
          tia _my_buffer , $404 , 128
       #endasm
       }
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}

EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.

Title: Re: Faster fade out code?
Post by: Gredler on 11/02/2016, 08:34 PM

Quote from: DildoKKKobold on 11/02/2016, 08:20 PM
for posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.

fade_out()
{

   int i, clr;
   char j,k;
   for (j = 0; j <8; j++)
   {
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
            if (clr&448) clr = clr - 64;
            my_buffer[k] = clr;
         }
          vsync();
          clr = get_color(i-1);
       #asm
          tia _my_buffer , $404 , 128
       #endasm
       }
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}

EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.

That would have been catastrophic

Title: Re: Faster fade out code?
Post by: touko on 11/03/2016, 04:18 AM

If you want your routine to be faster still, you can translate this part into assembly:

Quote:
for (j = 0; j <8; j++)
{
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}

This loop is slow as hell .

Title: Re: Faster fade out code?
Post by: TurboXray on 11/03/2016, 02:07 PM

Quote from: DildoKKKobold on 11/02/2016, 08:20 PM
for posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.

fade_out()
{

   int i, clr;
   char j,k;
   for (j = 0; j <8; j++)
   {
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
            if (clr&448) clr = clr - 64;
            my_buffer[k] = clr;
         }
          vsync();
          clr = get_color(i-1);
       #asm
          tia _my_buffer , $404 , 128
       #endasm
       }
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}

EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.

It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.

Use the Txx instruction and read the color data directly into your my_buffer during vblank. Then do your modifications on the array. When each iteration has completed (one iteration of j), wait for vsync and Txx the buffer back to the color ram port. That should prevent any snow on screen.

If you slightly modify your code like this:

fade_out()
{
   int i, clr;
   char j,k;

   vsync();

   for (j = 0; j <8; j++)
   {
      for (i=0;i<512;i+=64)
      {
         /* temp is a global int, so the asm block can reach it as _temp */
         temp = i;
       #asm
          lda _temp
          sta $402
          lda _temp+1
          sta $403
          tai $404, _my_buffer, 128
       #endasm

         for (k=0; k<64; k++)
         {
            clr = my_buffer[k];
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
            if (clr&448) clr = clr - 64;
            my_buffer[k] = clr;
         }
         vsync();
       #asm
          lda _temp
          sta $402
          lda _temp+1
          sta $403
          tia _my_buffer , $404 , 128
       #endasm
      }
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}

It should get everything done during vblank and avoid snow on screen on the real system. That also includes reading in the next 64 colors (both transfers together only take about 1.5k cpu cycles). Note: You'll need a global variable "temp" or some such name, so that you can access the function's instance variable in asm. Also, I think the read port is $404 and not $406. If not, then change it to $406.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 11/03/2016, 03:19 PM

Quote from: TurboXray on 11/03/2016, 02:07 PM
It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.

Use the Txx instruction and read the color data directly into your my_buffer during vblank. Then do your modifications on the array - when each iteration is has completed (one iteration of j), then wait for vsync and Txx the buffer back to color ram port. That should prevent any snow on screen.

So, that is why I put the vsync right before the transfer - even if the code takes two frames, the transfer will still only occur right after a vblank. I'm actually concerned that if I make the code faster, I'll have to put in delays, as the fade is already pretty fast now. It's a minimum of 64 frames already.

Granted, I need to try my new code in hardware. I'll also try yours.

As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

Title: Re: Faster fade out code?
Post by: OldMan on 11/03/2016, 03:44 PM

Quote:
I'm actually concerned that if I make the code faster, i'll have to put in delays, as the fade is already pretty fast now.

Suck up the need to add a delay. Use a down-counter so its tuneable. It's not too bad if you use a dedicated fade routine. You can use the wait time to do other things, like loading new gfx....
Just my opinion.
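
A tuneable down-counter of the kind suggested here might be sketched as follows. The names are hypothetical; the idea is to call the tick function once per frame (after vsync) and run a fade step only when it returns 1.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t reload;  /* frames to wait between fade steps (the tuning knob) */
    uint8_t counter; /* frames left before the next step                    */
} FadeTimer;

void fade_timer_init(FadeTimer *t, uint8_t frames_between_steps)
{
    assert(t != 0);
    t->reload  = frames_between_steps;
    t->counter = 0;                  /* first tick fires immediately */
}

/* Call once per frame. Returns 1 when a fade step is due, else 0. */
int fade_timer_tick(FadeTimer *t)
{
    if (t->counter > 0) {
        t->counter--;
        return 0;                    /* free frame: do other work here */
    }
    t->counter = t->reload;
    return 1;
}
```

With a reload value of 2 the fade advances on every third frame, and the frames in between are available for other work such as loading graphics.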

Quote:
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

You can do that already. It's just a pain, as they have to be accessed via the Huc Stack pointer, which is slow...

Title: Re: Faster fade out code?
Post by: TurboXray on 11/03/2016, 04:17 PM

Quote from: DildoKKKobold on 11/03/2016, 03:19 PM
Quote from: TurboXray on 11/03/2016, 02:07 PM
It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.

Use the Txx instruction and read the color data directly into your my_buffer during vblank. Then do your modifications on the array - when each iteration is has completed (one iteration of j), then wait for vsync and Txx the buffer back to color ram port. That should prevent any snow on screen.
So, that is why I put the vsync right before the transfer - even if the code takes two frames, the transfer will still only occur right after a vblank. I'm actually concerned that if I make the code faster, i'll have to put in delays, as the fade is already pretty fast now. Its a minimum of 64 frames already.

Hmm.. there's an issue you might not be aware of: every time you read or write any VCE reg (that includes $400 and $401), you cause the VCE to be unable to read from the pixel bus that the VDC is constantly outputting to. Since it can't read from the pixel bus, it outputs the last color (pixel) that it did read. You get horizontal 'stretches' of color across the screen, i.e. snow. Not just at the borders, but anywhere on a scanline. This actually happens when you turn the display "off" as well, but since the whole screen is one color you can't see the pixel "stretching". This is different from the color update interference on other systems, where updating a color while the display is active shows up as corruption on screen. The VCE doesn't do that, but reading or writing any VCE port gives the same stretching behavior regardless (read or write, color update regs or other VCE regs).

Here's an example video where I purposely do it:
http://youtu.be/-xU9uuRzLwo

So any access to the VCE does this, not just reading or writing color data. If your routine does manage to read in and modify all 64 colors within vblank, and you update on the following frame - then you'll be fine. And if that's the case, then don't worry about the code changes I made (unless you want more resources during vblank to do something else, but it doesn't look like it. You'd have to make a completely different system/function for that).
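To make the "work in RAM, only touch the VCE right after vsync" idea concrete, here is a rough host-side sketch in plain C. It is not HuC code: `vce_ram`, `vsync_calls`, the stub `vsync()` and `fade_block_out()` are all made-up names for illustration, and `memcpy` merely stands in for the TAI/TIA transfers. The per-component decrement is the same one used in the first post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64                 /* colors handled per pass (assumed) */

static uint16_t vce_ram[512];    /* hypothetical stand-in for VCE color RAM */
static int vsync_calls;          /* counts simulated vblank waits */

/* Stub: on hardware this would block until vblank. */
static void vsync(void) { vsync_calls++; }

/* One classic fade-down step on a RAM buffer; no VCE access in here. */
static void fade_step(uint16_t *buf, int count)
{
    for (int i = 0; i < count; i++) {
        uint16_t c = buf[i];
        if (c & 0x007) c -= 0x001;   /* blue,  bits 0-2 */
        if (c & 0x038) c -= 0x008;   /* red,   bits 3-5 */
        if (c & 0x1C0) c -= 0x040;   /* green, bits 6-8 */
        buf[i] = c;
    }
}

/* Fade one block of colors to black, touching the "VCE" only after vsync. */
void fade_block_out(int first)
{
    uint16_t buf[BLOCK];
    assert(first >= 0 && first + BLOCK <= 512);

    vsync();
    memcpy(buf, &vce_ram[first], sizeof buf);       /* stands in for TAI */
    for (int step = 0; step < 7; step++) {
        fade_step(buf, BLOCK);                      /* slow part, RAM only */
        vsync();
        memcpy(&vce_ram[first], buf, sizeof buf);   /* stands in for TIA */
    }
}
```

The point of the split is that `fade_step()` can take as long as it likes, because the only VCE traffic is the two short block copies that follow a `vsync()`.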

QuoteGranted, I need to try my new code in hardware. I'll also try yours.

Test yours first, and if it's good then don't worry about mine. Just keep my code in mind. I.e. the approach I took, as you might want to do more flexible fade routines in the future.

QuoteAs a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

What TheOldMan said. It's a pain, because you have to generate the .s file, look at what index represents that variable, then go back and write an indirect-index load from it. It's not just that the instance variable inside the function is a stack object, but there's no clean way to access it in asm without knowing what the index is on that stack for that specific instance variable. Indeed, it would be nice for HuC to pass this on to asm block. If the assembler had a way to make scope equates, then HuC could generate a function scope equate list for each function (the index into the stack). Globals are just easier to transfer stuff to.

Title: Re: Faster fade out code?
Post by: touko on 11/04/2016, 05:24 AM

QuoteAs a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

Actually as declared:
int i, clr;
char j,k;

Are treated as local variables. If you want locals in asm, you must use the stack ($100 -> $106 must be safe enough), or you can use the classic push/pop.
But in fact you have a bunch of temporary global variables already reserved (like __temp, <_al, <_bl, etc.).

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 11/14/2016, 09:49 PM

Quote from: TheOldMan on 11/03/2016, 03:44 PM
QuoteI'm actually concerned that if I make the code faster, i'll have to put in delays, as the fade is already pretty fast now.
Suck up the need to add a delay. Use a down-counter so its tuneable. It's not too bad if you use a dedicated fade routine. You can use the wait time to do other things, like loading new gfx....
Just my opinion.

QuoteAs a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?
You can do that already. It's just a pain, as they have to be accessed via the Huc Stack pointer, which is slow...

The entirety of Atlantean is written with global variables.

Just saying.

lol

ASM doesn't have a concept of local, really. you push/pop things to make them "local", simply by saving the state of all of the registers so you can fuck around with them again before popping the stack back to reset everything, but yeah

global = <3

you'll get faster code.

Title: Re: Faster fade out code?
Post by: elmer on 12/01/2016, 08:58 PM

I'm thinking that this sort of stuff is so commonly needed, that it really should be built into the HuC library.

Fading down is easy ... but the fun comes when you're fading back in. :wink:

There you need to know what the desired palette is, and have that stored in memory somewhere.

And you need the buffer where you're going to calculate the next set of colors that you're going to send to the VCE.

Has anyone managed to avoid needing *both* of these "target" and "current" buffers in memory at once?

The classic "cheap" fade-down looks OK, but the same thing run in reverse (increment color component if not at target) has the effect of greying everything out, and then having the stronger colors appear later on.

It's not horrible, and I've done that before, and I believe that Arkhan does it, too.

A more sophisticated fade is nearly 3 times slower ... but that's still fast enough to calculate all 512 colors in only 2/3 of a 60Hz frame, and nobody runs a fade at 60-steps-per-second.

Does anyone have any opinion about including a decent fade in HuC?

Title: Re: Faster fade out code?
Post by: TurboXray on 12/01/2016, 09:42 PM

So.. what are the dynamics at play in this design? Speed and memory size? Is the routine going to be an automatic thing? As in, it only takes a rate argument for fade in/out? If so, does it have complete control of the main code until it's finished? Or is it a lighter process, that only does up to 64 sets of colors per call and is divided into prep and update functions, allowing game code to run at the same time (or at least same frame)? How much memory are you going to require (important for hucard projects)? Is the work buffer user defined and passed along as a pointer (so it can be reused for something else in the project)? Or is it an internal static defined size, that takes away from ram regardless?

I never really liked this trying to make one size fits all thing when designing libs/stuff for HuC. It'd be nice if it was something they directly included into the main source file (different small libs), rather than trying to attach it to the existing library (in startup). Though I think doing that would require restructuring the main lib bank, and having support for a bank directive directly in HuC.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 12/01/2016, 10:14 PM

As an update, of course my code didn't work (for the reasons Bonknuts illustrated). His did. No surprise there.

I don't think this needs to be in the core of HuC. It would serve better as example code, which someone could work to their needs.

Title: Re: Faster fade out code?
Post by: elmer on 12/01/2016, 10:34 PM

Quote from: TurboXray on 12/01/2016, 09:42 PMSo.. what are the dynamics at play in this design? Speed and memory size? Is the routine going to be an automatic thing? As in, it only takes a rate argument for fade in/out? If so, does it have complete control of the main code until it's finished? Or is it a lighter process, that only does up to 64 sets of colors per call and is divided into prep and update functions, allowing game code to run at the same time (or at least same frame)? How much memory are you going to require (important for hucard projects)? Is the work buffer user defined and passed along as a pointer (so if can be reused for something else in the project)? Or is it an internal static defined size, that takes away from ram regardless?

Good questions! :-k

Taking control of the system would be impolite.

The goal would be to provide function calls that the HuC user can use to provide fast alternatives to writing their own code.

For example ...

void __fastcall get_colors( int *pbuffer<__td> );
void __fastcall get_colors( int index<color_reg>, int *pbuffer<__td>, unsigned char count<__tl> );

void __fastcall set_colors( int *pbuffer<__ts> );
void __fastcall set_colors( int index<color_reg>, int *pbuffer<__ts>, unsigned char count<__tl> );

void __fastcall fade_colors( int *psource<__si>, int *pdestination<__di>, unsigned char count<__al>, unsigned char fade<acc> );

Those make up a simple set of functions that do everything that DK wanted in Catastrophy, and ended up writing in either slow C code, or fast inline-assembly.

They do it fast, and they keep things flexible enough that you can use as-much or as-little resources as you need.

The "get" and "set" functions use TAI & TIA instructions for fast processing.

"count" is limited to a maximum of 128 for fast indexing

"fade" is a value 0-7.

I *think* that's enough basic functionality for the end-user to build pretty much whatever they want.

Can you think of a better *practical* design?

QuoteI never really liked this trying to make one size fits all thing when designing libs/stuff for HuC. It'd be nice if it was something they directly included into the main source file (different small libs), than trying to attach it exiting library (in startup). Though I think doing that would require restructuring the main lib bank, and having support for bank directive directly in HuC.

Making the libraries modular would be great ... but it's going to take a significant time-investment from whoever wants to do it.

There's no linker phase and no dead-code elimination, so from what I'm seeing, HuC is pretty much a behemoth right now.

But ... there is some argument for providing common functionality within the library itself, especially since the code that HuC generates to do the same stuff if you do it in C (like DK did for catastrophy) is going to be much larger and slower than the same code hand-written in assembly.

Quote from: DildoKKKobold on 12/01/2016, 10:14 PMI don't think this needs to be in the core of HuC. It would serve better as example code, which someone could work to their needs.

It's not like a "fade" routine is an uncommon requirement.

1) Your C code is big and slow, and generates a lot of Hu6280 code that a hand-written assembly function doesn't. That's not you ... that's just HuC.

2) Have you got a fade-up working, yet? :wink:

Title: Re: Faster fade out code?
Post by: ccovell on 12/02/2016, 01:14 AM

Quote from: elmer on 12/01/2016, 10:34 PM"fade" is a value 0-7.

I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now. For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.

I did something similar (using lookup tables) for my HuZero game.

Title: Re: Faster fade out code?
Post by: touko on 12/02/2016, 03:26 AM

I made some fade out/in in this intro some time ago:
https://youtu.be/B3pdwEza6j4

Title: Re: Faster fade out code?
Post by: elmer on 12/02/2016, 11:35 AM

Quote from: touko on 12/02/2016, 03:26 AMI made some fade out/in in this intro some time ago:
https://youtu.be/B3pdwEza6j4

That looks nice! :)

So what technique did you use for processing each step of the fade up/down?

Quote from: ccovell on 12/02/2016, 01:14 AMI have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now. For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.

I did something similar (using lookup tables) for my HuZero game.

I haven't heard of that stepping technique before, it sounds interesting.

Do you have any more details?

It's trivial to switch to a 0..15 range, even if I'm only processing 8 steps, so I've done that.

I definitely agree with using a table-based approach ... it gives you the flexibility to change the tables and get a fade-to-white, or a fade-to-sepia, or to correct for any gamma differences.

For HuC, I suspect that it's just a case of the tradeoff between quality and memory usage for the tables.

I'm also limited by trying to keep compatibility with HuCard usage rather than just using self-modifying code.

Here's an implementation that uses a single 64-byte table for a simple 8-step fade ... can anyone suggest improvements?

; fade_colors(int *psrc [__si], int *pdst [__di], char count, char level)
; ----
; fade down an array of colors
; ----
; psrc:  source buffer
; pdst:  destination buffer
; count: # of colors, (0-128)
; level: level of fading (0 = black, 15 = full)
; ----
; color: color value,   GREEN:  bit 6-8
;                       RED:    bit 3-5
;                       BLUE:   bit 0-2
; ----

_fade_colors.4: asl     a                       ; 2 fade level (0-15)
                asl     a                       ; 2
                and     #$38                    ; 2
                sta     <__ah                   ; 4

                lda     <__al                   ; 4 # of colors
                beq     .l2                     ; 2
                asl     a                       ; 2
                tay                             ; 2 Y = byte offset (count * 2)

                phx                             ; 3

                ; 129 cycle inner loop.
                ; fade GREEN

.l1:            dey                             ; 2
                lda     [__si],y                ; 7 src color hi-byte
                dey                             ; 2
                lsr     a                       ; 2
                lda     [__si],y                ; 7 src color lo-byte
                iny                             ; 2
                sta     <__al                   ; 4
                rol     a                       ; 2
                rol     a                       ; 2
                rol     a                       ; 2
                and     #7                      ; 2
                ora     <__ah                   ; 4
                tax                             ; 2
                lda     fade_table,x            ; 5
                asl     a                       ; 2
                asl     a                       ; 2
                asl     a                       ; 2
                tax                             ; 2

                ; fade RED

                lda     <__al                   ; 4 src color lo-byte
                ror     a                       ; 2
                ror     a                       ; 2
                ror     a                       ; 2
                and     #7                      ; 2
                ora     <__ah                   ; 4
                sax                             ; 3
                ora     fade_table,x            ; 5
                asl     a                       ; 2
                asl     a                       ; 2
                asl     a                       ; 2
                tax                             ; 2

                cla                             ; 2
                rol     a                       ; 2
                sta     [__di],y                ; 7 dst color hi-byte
                dey                             ; 2

                ; fade BLUE

                lda     <__al                   ; 4 src color lo-byte
                and     #7                      ; 2
                ora     <__ah                   ; 4
                sax                             ; 3
                ora     fade_table,x            ; 5
                sta     [__di],y                ; 7 dst color lo-byte
                cpy     #0                      ; 2
                bne     .l1                     ; 4

                plx                             ; 4
.l2:            rts                             ; 7

fade_table:     .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 1, 1, 1, 1
                .db     0, 0, 1, 1, 1, 1, 2, 2
                .db     0, 0, 1, 1, 2, 2, 3, 3
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 3, 4, 4, 5
                .db     0, 1, 2, 3, 3, 4, 5, 6
                .db     0, 1, 2, 3, 4, 5, 6, 7
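For anyone who wants to regenerate or tweak the table, here is a small C sketch. As far as I can tell, the 64 entries above match round(component * level / 7), so the sketch builds the table from that formula and applies it to a 9-bit color the same way the assembly does (one lookup per component). The names `fade_level`, `build_fade_table` and `fade_color` are invented for this example and are not part of HuC.

```c
#include <assert.h>
#include <stdint.h>

/* fade_level[level][component] = round(component * level / 7) */
static uint8_t fade_level[8][8];

void build_fade_table(void)
{
    for (int level = 0; level < 8; level++)
        for (int c = 0; c < 8; c++)
            /* integer form of round(c * level / 7) */
            fade_level[level][c] = (uint8_t)((2 * c * level + 7) / 14);
}

/* Scale a 9-bit color (green 6-8, red 3-5, blue 0-2).
   level: 0 = black, 7 = full brightness. */
uint16_t fade_color(uint16_t color, int level)
{
    assert(level >= 0 && level <= 7);
    unsigned g = (color >> 6) & 7;
    unsigned r = (color >> 3) & 7;
    unsigned b = color & 7;
    return (uint16_t)((fade_level[level][g] << 6) |
                      (fade_level[level][r] << 3) |
                       fade_level[level][b]);
}
```

Swapping the formula for a different curve (gamma, fade-to-white, and so on) only changes `build_fade_table()`; the per-color lookup stays the same.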

Title: Re: Faster fade out code?
Post by: touko on 12/02/2016, 12:23 PM

QuoteSo what technique did you use for processing each step of the fade up/down?

The simplest: add/sub 1 for each RGB component.

Title: Re: Faster fade out code?
Post by: elmer on 12/02/2016, 01:11 PM

Quote from: touko on 12/02/2016, 12:23 PM
QuoteSo what technique did you use for processing each step of the fade up/down?
The simplest: add/sub 1 for each RGB component.

Well, the fade-down is easy ... but what did you do for the fade-up? :-k

Are you using the simple "add 1 if not at target" for each component?

That tends to grey things out a little during the fade-up, like this ...

Target GRB : 456

Step 0 GRB : 000
Step 1 GRB : 111
Step 2 GRB : 222
Step 3 GRB : 333
Step 4 GRB : 444
Step 5 GRB : 455
Step 6 GRB : 456
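The "add 1 if not at target" step in that list can be sketched in C roughly as follows. The function names are invented for the example; the color layout is the 9-bit GRB format used elsewhere in this thread. Stepping from black toward GRB 456 reproduces the sequence above, and the same function fades down if the target is black.

```c
#include <assert.h>
#include <stdint.h>

/* Move one 3-bit component one step toward its target value. */
static unsigned step_component(unsigned cur, unsigned target)
{
    if (cur < target) return cur + 1;
    if (cur > target) return cur - 1;
    return cur;
}

/* One fade step for a 9-bit color (green 6-8, red 3-5, blue 0-2). */
uint16_t step_toward(uint16_t cur, uint16_t target)
{
    assert(cur <= 0x1FF && target <= 0x1FF);
    unsigned g = step_component((cur >> 6) & 7, (target >> 6) & 7);
    unsigned r = step_component((cur >> 3) & 7, (target >> 3) & 7);
    unsigned b = step_component(cur & 7, target & 7);
    return (uint16_t)((g << 6) | (r << 3) | b);
}
```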

It's not a bad effect, and most people don't notice/care about it.

Just curious.

Quote from: elmer on 12/02/2016, 11:35 AM
Quote from: ccovell on 12/02/2016, 01:14 AMI have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now. For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.

I did something similar (using lookup tables) for my HuZero game.
I haven't heard of that stepping technique before, it sounds interesting.

Do you have any more details?

I presume that you're talking about taking advantage of the human eye's perception of brightness.

The RGB to Y (brightness) formula is ...

Y = 0.299R + 0.587G + 0.114B

So, to reduce perceived brightness, you need to remove more of the green than you do of the blue.
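As a quick way to compare colors with that formula, here is a hedged C sketch that evaluates it in integer math for a 9-bit GRB color. The function name and the x1000 scaling are choices made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate brightness of a 9-bit color (green 6-8, red 3-5, blue 0-2),
   scaled by 1000 to stay in integers.
   Uses the weights quoted above: Y = 0.299R + 0.587G + 0.114B. */
unsigned luma_x1000(uint16_t color)
{
    assert(color <= 0x1FF);
    unsigned g = (color >> 6) & 7;
    unsigned r = (color >> 3) & 7;
    unsigned b = color & 7;
    return 299u * r + 587u * g + 114u * b;
}
```

Full green scores about five times higher than full blue here, which is the reason for fading the green component first.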

From a practical implementation POV, do you mean something like this? :-k

It changes things to a (0..17) range instead of (0..15).

This would provide a sort-of-half-step in the color transition, and delay the blue and red component fades.

fade_table_g:   .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_r:   .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_b:   .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 1, 1, 1, 1
                .db     0, 0, 0, 1, 1, 1, 1, 1
                .db     0, 0, 1, 1, 1, 1, 2, 2
                .db     0, 0, 1, 1, 1, 2, 2, 2
                .db     0, 0, 1, 1, 2, 2, 2, 3
                .db     0, 0, 1, 1, 2, 2, 3, 3
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 3, 3, 4, 4
                .db     0, 1, 1, 2, 3, 3, 4, 5
                .db     0, 1, 2, 2, 3, 4, 5, 5
                .db     0, 1, 2, 2, 3, 4, 5, 6
                .db     0, 1, 2, 3, 4, 4, 5, 6
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7

Or just this easier-to-read version with step (0..9) ...

fade_table_g:   .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_r:   .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_b:   .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 1, 1, 1, 1
                .db     0, 0, 1, 1, 1, 1, 2, 2
                .db     0, 0, 1, 1, 2, 2, 3, 3
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 3, 4, 4, 5
                .db     0, 1, 2, 3, 3, 4, 5, 6
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
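Reading the (0..9) listing as three overlapping tables, where each label marks the row at which that component's table starts, the lookup could be sketched in C like this. The array is the 12 rows above copied verbatim; the offsets, names and the `fade_staggered()` function are my interpretation for illustration, not tested on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* The 12 rows of the (0..9) table above, as one shared array. */
static const uint8_t fade_rows[12][8] = {
    {0,0,0,0,0,0,0,0}, {0,0,0,0,0,0,0,0}, {0,0,0,0,0,0,0,0},
    {0,0,0,0,1,1,1,1}, {0,0,1,1,1,1,2,2}, {0,0,1,1,2,2,3,3},
    {0,1,1,2,2,3,3,4}, {0,1,1,2,3,4,4,5}, {0,1,2,3,3,4,5,6},
    {0,1,2,3,4,5,6,7}, {0,1,2,3,4,5,6,7}, {0,1,2,3,4,5,6,7},
};

/* Row where each component's table starts (assumed from the label order). */
enum { G_OFFSET = 0, R_OFFSET = 1, B_OFFSET = 2 };

/* step: 0 = black, 9 = full. Green fades out first, blue last. */
uint16_t fade_staggered(uint16_t color, int step)
{
    assert(step >= 0 && step <= 9);
    unsigned g = fade_rows[step + G_OFFSET][(color >> 6) & 7];
    unsigned r = fade_rows[step + R_OFFSET][(color >> 3) & 7];
    unsigned b = fade_rows[step + B_OFFSET][color & 7];
    return (uint16_t)((g << 6) | (r << 3) | b);
}
```

With this reading, white at step 8 has already lost one level of green while red and blue are still at full, and at step 1 only a trace of blue is left.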

Title: Re: Faster fade out code?
Post by: touko on 12/02/2016, 03:14 PM

Quotebut what did you do for the fade-up?

You start with an entirely black palette, and you add 1 to each component until you reach the target palette.
In fact, for both fade in and fade out I have a reference palette to reach (black for a fade out, and the object's palette for a fade in). I read the corresponding colors directly from the VCE, apply the fade, and store the result in a buffer (for sending later with TIA).
I have a 256-byte buffer for fading multiple palettes at the same time.

QuoteIt's not a bad effect, and most people don't notice/care about it.

You're right, but it's not noticeable as long as your fade is fast enough.
I think for the best result, you must normalise all the RGB components of each color first, to avoid a dominant color at the end of the fade.
E.g. reach 444 or 222, 333 first and start the add/sub after that, so you end with 0 for each component.

Title: Re: Faster fade out code?
Post by: TurboXray on 12/02/2016, 04:24 PM

I personally like the idea of the fade table, for speed reasons. As in, fade is actually just a "brightness" (loosely termed) state of the palette, and fade is the transition from one level of brightness to another over time.

But of course, I'd say make this not a built in library function - but something the programmer can just include. I mean, there's no critical reason why it should be the very main bank of the main lib - so having it as a function with ASM inside of it, is no slower than the far call to the far end of the main lib. Speaking of which, there's probably a good number of stuff that probably should be in the main lib bank to begin with. And some stuff could easily be moved to include-able functions. I'm going to look into this as soon as winter break starts.

Title: Re: Faster fade out code?
Post by: ccovell on 12/02/2016, 05:34 PM

My table was a wasteful 512 bytes, mapping all 512 colours to the next step down... so it's a fadeout routine only. Anyway, the code (minus the table):

Fade_Down:	;Fades a specified palette down 1 step!
	;A = Palette entry (0,$10,$20,$30...)
	;X = 0/1 = BG or Sprite
;-------------------------------------------------------
	;copy our specified palette from VCE to RAM
	sta $0402       ;Point to colours
	stx $0403
	pha
	phx
	phy
	TAI $0404,temp_pal,32
;----------
	clx
.fade_loop:
	lda	temp_pal,X
	tay
	lda	temp_pal+1,X	;(MSB is 0 or 1)
	and	#1		;All other bits were set in VCE.
	beq	.lopal
;MSB was high; (leave it as-is...)
	lda	PALFADE1HiTblLSB,Y	;Get LSB
	sta	temp_pal,X
	cpy	#64	;64th entry and up, MSB=1
	bcs	.next_entry
.zeromsb:
	stz	temp_pal+1,X
	bra	.next_entry
;----------
.lopal:	;MSB will always be zero anyway
	lda	PALFADE1LoTblLSB,Y	;Get LSB
	sta	temp_pal,X
.next_entry:
	inx
	inx
	cpx	#32
	bne	.fade_loop
;--------
	;now copy RAM back to VCE
	ply
	plx
	pla
	sta $0402       ;Point to colours
	stx $0403
	TIA temp_pal,$0404,32
	; A, Y, and X should be preserved here.
	rts
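Since the table itself was left out, here is a C sketch of how a "next step down" table could be built and used. It assumes the plain subtract-1-per-component rule, which may not be what the original table contained, and it stores full 16-bit entries (1024 bytes) for simplicity where the routine above uses two 256-byte LSB tables. `next_down`, `build_next_down` and `fade_palette_down` are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t next_down[512];   /* color -> same color one step darker */

/* Assumed rule: each non-zero component drops by one. */
void build_next_down(void)
{
    for (unsigned c = 0; c < 512; c++) {
        unsigned n = c;
        if (n & 0x007) n -= 0x001;   /* blue  */
        if (n & 0x038) n -= 0x008;   /* red   */
        if (n & 0x1C0) n -= 0x040;   /* green */
        next_down[c] = (uint16_t)n;
    }
}

/* Fade a RAM copy of a palette down one step by pure table lookup. */
void fade_palette_down(uint16_t *pal, int count)
{
    assert(pal != 0 && count >= 0);
    for (int i = 0; i < count; i++)
        pal[i] = next_down[pal[i] & 0x1FF];   /* ignore unused upper bits */
}
```

Calling `fade_palette_down()` seven times takes any palette to black, and a different table gives a different fade without touching the loop.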

Title: Re: Faster fade out code?
Post by: TurboXray on 12/02/2016, 05:37 PM

So.. maybe this would be helpful?

The call code...

      ldy iterations  
      ldx #low(xfer_source)
      clc
      jsr xfer_ZP

The self modifying code sitting in... Zeropage!

xfer_entry:
.loop
      tia source,dest,num
      set                               ;2
      adc #$nn                          ;5 (RMW+T)
    bcc .skip                           ;4:2
      inc <low((.loop & 0xff)+2)+1      ;6
      clc                               ;2
.skip
      dey                               ;2
      bne .loop                         ;4
    rts                                 ;7

xfer_source = (.loop & 0xff) + BASE_ZP + 1
xfer_dest = (.loop & 0xff) + BASE_ZP + 3
xfer_num = (.loop & 0xff) + BASE_ZP + 5     
xfer_ZP = (.loop & 0xff) + BASE_ZP

Of course, it needs to be copied at least once to ZP buffer. That's what all the address translations are for via the equates.

Title: Re: Faster fade out code?
Post by: elmer on 12/03/2016, 02:26 PM

Quote from: TurboXray on 12/02/2016, 05:37 PMOf course, it needs to be copied at least once to ZP buffer. That's what all the address translations are for via the equates.

Yes, I do like using a self-modifying Txx instruction in ZP. :wink:

This is what I've got, which uses HuC's __fastcall convention to have the compiler-itself set up the ZP locations that it can ...

; --------
; Alternate names when the parameter-passing area is used for
; a self-modifying Txx instruction.
;

__tc    = $20F8
__ts    = $20F9
__td    = $20FB
__tl    = $20FD
__tr    = $20FF

; set_colors(int *pbuffer [__ts] )
; set_colors(int index [color_reg], int *pbuffer [__ts], int count [__tl] )
; ----
; index:   index in the palette (0-511)
; pbuffer: source buffer
; count:   # of colors, (1-512)
; ----

_set_colors.1:  stz     color_reg_l
                stz     color_reg_h

                stz     <__tl+0
                lda     #>512
                sta     <__tl+1

_set_colors.3:  lda     #$E3 ; TIA
                sta     <__tc
                lda     #$60 ; RTS
                sta     <__tr
                lda     #<color_data
                sta     <__td+0
                lda     #>color_data
                sta     <__td+1
                asl     <__tl+0
                rol     <__tl+1
                jmp     __tc

Title: Re: Faster fade out code?
Post by: TurboXray on 12/03/2016, 04:46 PM

Yeah, classic Txx in ram setup.

But what I posted was a "safe" version, so that you don't delay interrupts (in this case, since it's in vblank, the TIRQ routine), through smaller transfers and a small/fast iteration overhead.

It can also work for active video, with scanline interrupts and TIMER interrupts all firing like mad. The sample playback might have a little bit of jitter, but nothing Genesis cringe worthy. And the VDC buffer for next line should be able to absorb any delay (as long as the routine is tight). The best of all worlds: TIMER, H-int, and Txx availability. And the cost, if you did 32-byte transfers, is 8 cycles per byte instead of 7 cycles per byte. It makes Txx more usable IMO. Plus, it's a chance to use the T flag... who doesn't like using the T flag???
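For readers more at home in C, the shape of the idea (one big transfer split into short ones, with a gap after each where a pending interrupt can get in) looks roughly like this. It is only an analogy: `chunked_copy` and its callback are invented for the example, `memcpy` stands in for the Txx instruction, and the 32-byte chunk size is just the case mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 32   /* bytes per transfer (assumed, as in the 32-byte case) */

/* Copy len bytes in CHUNK-sized pieces. between_chunks (may be NULL) marks
   the points where a pending interrupt would get a chance to run.
   Returns the number of chunks transferred. */
size_t chunked_copy(uint8_t *dst, const uint8_t *src, size_t len,
                    void (*between_chunks)(void))
{
    size_t chunks = 0;
    assert(dst != NULL && src != NULL);
    while (len > 0) {
        size_t n = len < CHUNK ? len : CHUNK;
        memcpy(dst, src, n);          /* stands in for one short Txx */
        dst += n;
        src += n;
        len -= n;
        chunks++;
        if (between_chunks)
            between_chunks();
    }
    return chunks;
}
```

The trade-off is the same as in the assembly: a little loop overhead per chunk in exchange for a bounded worst-case delay before an interrupt is serviced.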

Title: Re: Faster fade out code?
Post by: elmer on 12/03/2016, 07:41 PM

Quote from: TurboXray on 12/03/2016, 04:46 PMBut what I posted was a "safe" version, so that you don't delay interrupts (in this case, since it's in vblank, the TIRQ routine), through smaller transfers and a small/fast iteration overhead.

Good point! :wink:

I was trying to keep a similar interface to the original functions which read a single sample, and I was thinking that they'd be run after a vsync() in order to avoid snow on the screen.

But you're right, I should still limit the TAI size in order to avoid blocking the TIMER interrupt.

My bad. :oops:

QuotePlus, it's a chance to use the T flag... who doesn't like using the T flag???

Hahaha ... also a good point, but keeping the low-byte of the address in the A reg and doing ...

adc #$20 ;2
sta <low((.loop & 0xff)+2) ;4

... is a cycle faster, and it avoids messing with my stack-pointer-in-X.

Of course ... I throw away that cycle with the setup and the JSR, but that's a tradeoff for not needing a permanent routine in ZP.

There's certainly an argument for having a few of these Txx-32-byte subroutines in regular RAM to use for self-modifying code, and IIRC, HuC already has one somewhere. :-k

Here's a corrected code, and of course, things are much cleaner in the CDROM version ...

; --------
; Alternate names when the parameter-passing area is used for
; a self-modifying Txx instruction.
;

__tc    = $20F8
__ts    = $20F9
__td    = $20FB
__tl    = $20FD
__tr    = $20FF

; set_colors(int *pbuffer [__ts] )
; set_colors(int index [color_reg], int *pbuffer [__ts], unsigned char count [acc] )
; ----
; index:   index in the palette (0-511)
; pbuffer: source buffer
; count:   # of 16-color palettes, (1-32)
; ----

_set_colors.1:  stz     color_reg_l
                stz     color_reg_h
                lda     #32

_set_colors.3:  tay
.if (!CDROM)
                lda     #$E3 ; TIA
                sta     <__tc
                lda     #$60 ; RTS
                sta     <__tr
                lda     #$04
                sta     <__td+0
                sta     <__td+1
                lda     #$20
                sta     <__tl+0
                stz     <__tl+1
                lda     <__ts+0
.l1:            jsr     __tc
                clc                     ; carry is not known to be clear here
                adc     #$20
                sta     <__ts+0
                bcc     .l2
                inc     <__ts+1
.l2:            dey
                bne     .l1
                rts
.else
                lda     <__ts+1
                sta     .l1+2
                lda     <__ts+0
                sta     .l1+1
.l1:            tia     $0000,color_data,$0020
                clc                     ; carry is not known to be clear here
                adc     #$20
                sta     .l1+1
                bcc     .l2
                inc     .l1+2
.l2:            dey
                bne     .l1
                rts
.endif

Title: Re: Faster fade out code?
Post by: TurboXray on 12/04/2016, 11:57 AM

Quote from: elmer on 12/03/2016, 07:41 PM
QuotePlus, it's a chance to use the T flag... who doesn't like using the T flag???
Hahaha ... also a good point, but keeping the low-byte of the address in the A reg and doing ...

adc #$20 ;2
sta <low((.loop & 0xff)+2) ;4

... is a cycle faster, and it avoids messing with my stack-pointer-in-X.

Of course ... I throw away that cycle with the setup and the JSR, but that's a tradeoff for not needing a permanent routine in ZP.

Good catch! My code is actually a little bit longer (I clipped it for the post), as it was meant for a demoscene part where it transfers two 28-byte segments, then sends a sample to the DAC, in a large 90k cpu cycle loop. It's one of those dual circular interference pattern things, but translucent like the good ones.

Title: Re: Faster fade out code?
Post by: ccovell on 12/11/2016, 09:06 PM

Psychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.

The SMS has only 3 shades of each colour channel (compared to the 7 per channel on the PCE) so a fade would have only a total of 4 steps if all 3 channels were simply faded out at the same time. Doing it sequentially gives 10 steps total. I'm sure similarly nice effects can be achieved on the PCE.

(https://chrismcovell.com/images/Bonk2Anim.gif)
Above, using the SMS' limited colour space.
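A minimal C sketch of that channel-by-channel fade-out, to show where the 10 steps come from. It assumes an SMS-style 6-bit color packed as 00BBGGRR and uses a global step counter so the whole palette moves through the red, green and blue phases together; the function names are invented here and this is not taken from the game's code.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a channel value by amount, stopping at zero. */
static unsigned dim(unsigned value, int amount)
{
    if (amount <= 0) return value;
    return value > (unsigned)amount ? value - (unsigned)amount : 0;
}

/* step 0 = original color, step 9 = black.
   Steps 1-3 ramp red down, 4-6 green, 7-9 blue.
   Assumed packing: red bits 0-1, green bits 2-3, blue bits 4-5. */
uint8_t sequential_fade(uint8_t color, int step)
{
    assert(color <= 0x3F && step >= 0 && step <= 9);
    unsigned r = dim(color & 3, step);
    unsigned g = dim((color >> 2) & 3, step - 3);
    unsigned b = dim((color >> 4) & 3, step - 6);
    return (uint8_t)((b << 4) | (g << 2) | r);
}
```

Fading all three channels together would only give steps 0 to 3, which is the 4-step case mentioned above.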

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 12/11/2016, 11:38 PM

Quote from: ccovell on 12/11/2016, 09:06 PMPsychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.

I was doing this for Inferno on MSX, but the excessive amount of red in the game made it look pretty stupid, lol.

I ended up just going with a normal fade instead.

Title: Re: Faster fade out code?
Post by: TurboXray on 12/12/2016, 12:19 AM

Chris, do you have a gif of Sonic's fade out (on the Genesis)?

Title: Re: Faster fade out code?
Post by: ccovell on 12/12/2016, 01:02 AM

Sorry, no.

Title: Re: Faster fade out code?
Post by: elmer on 12/12/2016, 01:33 AM

Quote from: ccovell on 12/11/2016, 09:06 PMPsychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.

That's an interesting effect ... thanks for sharing that! :D

I don't know (yet) if I like it, or not, but it's definitely "effective".

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 12/12/2016, 02:23 AM

Quote from: elmer on 12/12/2016, 01:33 AM
Quote from: ccovell on 12/11/2016, 09:06 PMPsychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.
That's an interesting effect ... thanks for sharing that! :D

I don't know (yet) if I like it, or not, but it's definitely "effective".

there's a lot of interesting fades you can do if you focus on blue-shenanigans, but most people refer to them as coding errors/bugs/not knowing what you're doing.

As opposed to just wanting a weird blue swirly fade that looks cool.

Title: Re: Faster fade out code?
Post by: ccovell on 12/12/2016, 03:32 AM

Quote from: Psycho Arkhan on 12/12/2016, 02:23 AMthere's a lot of interesting fades you can do if you focus on blue-shenanigans, but most people refer to them as coding errors/bugs/not knowing what you're doing.

I beg to differ. I'd call it Sega's signature style on the Genesis.

Title: Re: Faster fade out code?
Post by: TurboXray on 12/12/2016, 12:06 PM

I don't think Arkhan is saying it's wrong, just that people perceive it as wrong - for whatever reason they apply (bug/unknowledgeable/etc).

Check this out:
http://info.sonicretro.org/SCHG_How-to:Improve_the_fade_in%5Cfade_out_progression_routines_in_Sonic_1

QuoteFrom Sonic Team's point of view, it may not be incorrect and is possibly intentional, however, from a logical point of view, this is incorrect fading

Some people's kids.. I swear.

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 12/12/2016, 01:28 PM

Yeah, I mean, what the fuck is "incorrect" fading, anyways. Did it make it to the target colors eventually? Did it look neat?

I find it pretty derpy that people commenting on stuff from the 80s are saying that fades that involve color swirlies or something are wrong.

I mean if it fades and never makes it to where you want, or it causes actual software issues, I'd say that is wrong.

But fades are inherently lawless. You can fade whatever to whatever.

People seem to think the only correct fade is one that fades from black or to black completely in unison.

Title: Re: Faster fade out code?
Post by: Gredler on 12/12/2016, 04:47 PM

So you'd say it's a derpy lerpy?

Thank you, I'll be here all week.

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 12/12/2016, 05:16 PM

Quote from: Gredler on 12/12/2016, 04:47 PM
So you'd say it's a derpy lerpy?

Thank you, I'll be here all week.

yes. exactly, lol.

I did a normalized palette fade (read: correct by codesnobbery standards), and it looked worse than the "wrong fade" that goes to grays first.

Sometimes "right" can get bent.
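
For comparison, one way to read "normalized" here is a proportional fade, where every channel is scaled by the same fraction so all three reach black in unison. This is only a guess at what was meant, not Arkhan's code. `scale_channel`, `scale_color` and the round-to-nearest choice are assumptions, and the GGGRRRBBB layout is the PC Engine's 9-bit colour format.

```c
/* Scale one 3-bit channel (0..7) by level/7, rounding to nearest.
   level 7 leaves the channel unchanged, level 0 gives black. */
static unsigned scale_channel(unsigned value, unsigned level)
{
    return (value * level + 3) / 7;
}

/* Scale a 9-bit GGGRRRBBB colour so all channels fade in proportion. */
unsigned short scale_color(unsigned short color, unsigned level)
{
    unsigned b = scale_channel(color & 7, level);
    unsigned r = scale_channel((color >> 3) & 7, level);
    unsigned g = scale_channel((color >> 6) & 7, level);

    return (unsigned short)((g << 6) | (r << 3) | b);
}
```

Unlike the decrement-in-place loops earlier in the thread, this needs an untouched copy of the original palette, since each step is computed from the starting colours rather than from the previous step.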

Title: Re: Faster fade out code?
Post by: ccovell on 12/12/2016, 05:59 PM

Heck, my favourite fade is the Game Over one from NES Ninja Gaiden. ;-D

edit:

Quote from: TurboXray on 12/12/2016, 12:06 PM
Check this out:
http://info.sonicretro.org/SCHG_How-to:Improve_the_fade_in%5Cfade_out_progression_routines_in_Sonic_1

Holy sperglord-levels of missing-the-point on that page! The eye isn't equally sensitive to the different colour channels, and Sega was exploiting that, not to mention that fading through only 8 levels will look too abrupt and might look like it's "shaking" to some viewers. :-P

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 12/13/2016, 12:41 AM

Quote from: ccovell on 12/12/2016, 05:59 PM
Holy sperglord-levels of missing-the-point on that page!

I loled while eating a cupcake.

Did you know cupcake can come out of your nose?

I do now.

Blue fades are probably one of the most interesting things to dick with, as blue is the most interesting of the 3 colors in terms of our eyes perceiving it.

Title: Re: Faster fade out code?
Post by: DildoKKKobold on 12/13/2016, 01:01 AM

Holy cow, this thread took on a life of its own. Next someone is going to figure out how to do a Lucas-wipe transition on the PC Engine!

Title: Re: Faster fade out code?
Post by: Arkhan Asylum on 12/13/2016, 02:06 AM

Quote from: DildoKKKobold on 12/13/2016, 01:01 AM
Holy cow, this thread took on a life of its own. Next someone is going to figure out how to do a Lucas-wipe transition on the PC Engine!

what one is that?

I made a few on MSX before settling on T&E soft's gridwipe because it's fuckin cool looking.

PCEngine-FX.com

PCE-FX Homebrew Development => Localizations, Games, Apps, Docs => Topic started by: DildoKKKobold on 09/20/2015, 11:30 AM