Here is my fade-out code:
fade_out()
{
int i, clr;
char j;
for (j = 0; j <8; j++)
{
for (i=0; i < 64; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=64; i < 128; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=128; i < 192; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=192; i < 256; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=256; i < 320; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=320; i < 384; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=384; i < 448; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
for (i=448; i < 512; i++)
{
clr = get_color(i);
if ((clr&7) > 0) clr = clr - 1;
if ((clr&56) > 0) clr = clr - 8;
if ((clr&448) > 0) clr = clr - 64;
set_color(i, clr);
}
vsync();
}
cls();
reset_satb();
satb_update();
vsync();
}
Unfortunately, even when dividing it up into 64-color blocks, it still causes flicker on real hardware. I'm guessing it's just too slow to write in C, but I'm not good enough with assembly. Any help would be appreciated!
I think the best way to optimise your routine is to do the fade in a buffer for all palettes, then transfer the whole buffer after a vsync in asm with a TIA block transfer,
like this:
/* A 256-byte buffer is enough for 8 palettes */
int my_buffer[1024];
#asm
  stz $402
  stz $403
  tia _my_buffer , $404 , 1024
#endasm
My fade routine is close to yours and is very fast, but it's in ASM.
int dv;
.
.
.
clr = get_color(i);
dv = 0;
/* low color bits : 0 0000 0111 */
if (clr & 0x0007) dv++;
/* mid color bits : 0 0011 1000 */
if (clr & 0x0038) dv = dv + 8; /* iirc, this is faster than += in huc. try both */
/* hi color bits : 1 1100 0000 */
if (clr & 0x01c0) dv = dv + 64;
clr = clr - dv;
set_color(i, clr);
.
.
.
...............................................................................
That's off the top of my head, so double-check the hex values :) And everything else...
Assuming it works, the next step is to move that to asm; iirc, an int parameter will arrive in the A and X registers, so the color won't have to be loaded; the rest is a pretty straightforward conversion.
(Look up the asm code in the listing to see how it's done. That's how I learned 650x asm :)
For the low sets of bits, you can just check the low byte; for the high set, which straddles the byte boundary, shift the color right once for the check (then you can AND with 0xe0). But that's a bit of asm optimization you may not need (or want) to do.
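For anyone who wants to check the mask logic before porting it, here is a minimal sketch in standard C (not HuC; `fade_out_step` and `green_nonzero_low_byte` are hypothetical names). It assumes the VCE's 9-bit color layout of GGG RRR BBB, with blue in the lowest three bits, and shows both the per-field subtraction and the shift-right test for the field that straddles the byte boundary.

```c
#include <stdint.h>

/* Field masks for a 9-bit VCE color laid out as GGG RRR BBB (assumed). */
#define BLUE_MASK  0x0007u /* bits 0-2: 0 0000 0111 */
#define RED_MASK   0x0038u /* bits 3-5: 0 0011 1000 */
#define GREEN_MASK 0x01C0u /* bits 6-8: 1 1100 0000 */

/* One fade-out step: subtract one unit from every field that is not
   already zero. Fields never borrow from each other, because a field is
   only decremented when it is at least 1. */
uint16_t fade_out_step(uint16_t clr)
{
    uint16_t dv = 0;
    if (clr & BLUE_MASK)  dv += 1;
    if (clr & RED_MASK)   dv += 8;
    if (clr & GREEN_MASK) dv += 64;
    return (uint16_t)(clr - dv);
}

/* The byte-sized test for the high field: green occupies bits 6-8, so
   shift right once and check bits 5-7 of the resulting low byte. */
int green_nonzero_low_byte(uint16_t clr)
{
    uint8_t low = (uint8_t)(clr >> 1);
    return (low & 0xE0) != 0;
}
```

Running `fade_out_step` seven times takes any color to black, since no field can hold more than 7.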
.................................................................................
Just out of curiosity, why are you checking for > 0 anyway? clr can't be negative, and an & won't change that. And any non-zero value is true in C, so you don't have to compare the result to anything...
What touko said. Wait for vblank, read all colors (sprite and BG, if desired) into a buffer in RAM. Do your alterations to those values in RAM, then wait for vsync and upload the changes during vblank. Rinse, repeat. HuC is still going to be slow, but if you do the operations during normal/whole frames and only update the changes during vblank, it should do the job.
Fading out is easy, fading in is a bit more complex.
On a side note: I had an idea for an RGB-to-YUV conversion table, for a special fading type of effect. YUV is nicer to work with IMO and gives you a wider range of features. Of course, going from YUV back to RGB will need a different set of tables and take a bit longer.
Quote from: TurboXray on 09/20/2015, 04:23 PM
Fading out is easy, fading in is a bit more complex.
When you fade in you know what your colors are going to be, so couldn't you pre-calculate/prepare the fade-in and then just cycle through the known palettes?
Yeah, you need to know the values to reach, rather than just checking for overflow or a floor (fixed value). It requires a little more logic to test for overflow in the addition process. There are quite a few ways to approach a fade-in, but they are all more complex, with more operations than a simple fade-out.
Fixed-point deltas are one way. That requires a very large buffer in RAM (entries * RGB * delta size, i.e. 512*3*2 bytes) and an initial distance calculation for each color (which can take some time). It's wasteful on RAM, but every R/G/B is faded in equally.
Rate-of-change delta is another way. It requires a smaller buffer, but doesn't fade in equally. You basically take the R/G/B values of the destination and copy them as the deltas to subtract from the destination palette block (your initial setup). On every call, you subtract 1 from each non-zero delta, then subtract the deltas from the RGB block and write the result to the buffer to be uploaded. The R/G/B destination values are never altered, just used as values to subtract from. Because you subtract one on each call, a delta reaching 0 means that particular R/G/B is at full value. As you can see, lower values will reach their max value faster than larger values. It looks decent, and it's fast.
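The rate-of-change approach might look like this as a standard C sketch (hypothetical names; one byte of delta per component, assuming the GGG RRR BBB layout). Only `delta` changes between calls; the destination palette stays read-only.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract the 3-bit field at bit offset `shift` (0 = blue, 3 = red, 6 = green). */
static unsigned field(uint16_t clr, int shift)
{
    return (clr >> shift) & 7u;
}

/* Initial setup: every delta equals its destination component, so
   destination - delta starts at 0 (black) for every component. */
void delta_fade_init(const uint16_t *dest, uint8_t (*delta)[3], size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (int c = 0; c < 3; c++)
            delta[i][c] = (uint8_t)field(dest[i], c * 3);
}

/* One fade-in step: shrink every non-zero delta by one and write
   destination - delta into out[]. dest[] is only read, never altered. */
void delta_fade_step(const uint16_t *dest, uint8_t (*delta)[3],
                     uint16_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        unsigned clr = 0;
        for (int c = 0; c < 3; c++) {
            if (delta[i][c] > 0)
                delta[i][c]--;
            clr |= (field(dest[i], c * 3) - delta[i][c]) << (c * 3);
        }
        out[i] = (uint16_t)clr;
    }
}
```

After seven calls every delta is zero and `out` equals `dest`. A component with target 1 is finished after the first call, while a component with target 7 needs all seven, which is the unequal fade described above.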
@DarkKobold: And maybe you don't need to fade all palettes.
Quote from: touko on 09/21/2015, 08:02 AM
@DarkKobold: And maybe you don't need to fade all palettes.
This is what I was thinking: only apply the fade to specific sprites that can't be animated onto the screen (background and UI elements).
Yes, it's useless to do it on all 32 palettes 99% of the time.
In practice it's 4 to 6 palettes max.
Quote from: TurboXray on 09/20/2015, 06:37 PM
Yeah, you need to know the values to reach, rather than just checking for overflow or a floor (fixed value). It requires a little more logic to test for overflow in the addition process. There are quite a few ways to approach a fade-in, but they are all more complex, with more operations than a simple fade-out.
Fixed-point deltas are one way. That requires a very large buffer in RAM (entries * RGB * delta size, i.e. 512*3*2 bytes) and an initial distance calculation for each color (which can take some time). It's wasteful on RAM, but every R/G/B is faded in equally.
Rate-of-change delta is another way. It requires a smaller buffer, but doesn't fade in equally. You basically take the R/G/B values of the destination and copy them as the deltas to subtract from the destination palette block (your initial setup). On every call, you subtract 1 from each non-zero delta, then subtract the deltas from the RGB block and write the result to the buffer to be uploaded. The R/G/B destination values are never altered, just used as values to subtract from. Because you subtract one on each call, a delta reaching 0 means that particular R/G/B is at full value. As you can see, lower values will reach their max value faster than larger values. It looks decent, and it's fast.
This is way too complex; you just need to subtract each color from 511 and do it backwards. Excuse the pseudocode.
int palette_holder[512];
for c = 0 to 511
    palette_holder[c] = 511 - get_color(c);   /* 7 - component, in each field */
    set_color(c, 0);                           /* start from black */
for i = 1 to 7
    vsync();
    for j = 0 to 511
        clr = palette_holder[j];
        dv = 0;
        /* low color bits : 0 0000 0111 */
        if (clr & 0x0007) dv++;
        /* mid color bits : 0 0011 1000 */
        if (clr & 0x0038) dv = dv + 8; /* iirc, this is faster than += in huc. try both */
        /* hi color bits : 1 1100 0000 */
        if (clr & 0x01c0) dv = dv + 64;
        clr2 = get_color(j);
        clr = clr - dv;                 /* fields still waiting count down */
        clr2 = clr2 + (dv XOR 0x49);    /* fields done waiting go up by one */
        set_color(j, clr2);
        palette_holder[j] = clr;
Not sure if HuC has XOR though.
If I remember correctly, XOR is A^B.
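One reading of the "subtract from 511 and do it backwards" idea, sketched in standard C: the inverted value holds (7 - target) per field, a field waits while that count is non-zero, and only then starts rising, so the sequence is a fade-out played in reverse. This interpretation, the `UNITS` constant (0x49, a 1 in each field's lowest bit) and the function names are assumptions, not the poster's code.

```c
#include <stddef.h>
#include <stdint.h>

/* A 1 in the lowest bit of each 3-bit field: 0 001 001 001 (assumed layout GGG RRR BBB). */
#define UNITS 0x049u

/* A 1 in the unit position of every non-zero field of clr. */
static uint16_t nonzero_units(uint16_t clr)
{
    uint16_t dv = 0;
    if (clr & 0x0007u) dv += 1;
    if (clr & 0x0038u) dv += 8;
    if (clr & 0x01C0u) dv += 64;
    return dv;
}

/* Setup: holder = 511 - target, i.e. (7 - component) per field with no
   borrows, and the visible palette starts at black. */
void reverse_fade_init(const uint16_t *target, uint16_t *holder,
                       uint16_t *shown, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        holder[i] = (uint16_t)(511u - target[i]);
        shown[i] = 0;
    }
}

/* One of seven steps: fields whose holder is still non-zero count down;
   fields whose holder has reached zero rise by one. */
void reverse_fade_step(uint16_t *holder, uint16_t *shown, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t dv = nonzero_units(holder[i]);
        holder[i] = (uint16_t)(holder[i] - dv);
        shown[i] = (uint16_t)(shown[i] + (dv ^ UNITS));
    }
}
```

With `^` doing the XOR, a field switches from "waiting" to "rising" once its holder reaches zero; after seven steps `shown` equals the target palette, and the intermediate steps match the fade-out of that palette in reverse order.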
Quote from: touko on 09/20/2015, 01:05 PM
I think the best way to optimise your routine is to do the fade in a buffer for all palettes, then transfer the whole buffer after a vsync in asm with a TIA block transfer,
like this:
/* A 256-byte buffer is enough for 8 palettes */
int my_buffer[1024];
#asm
  stz $402
  stz $403
  tia _my_buffer , $404 , 1024
#endasm
My fade routine is close to yours and is very fast, but it's in ASM.
So, rather than create a buffer for all the palettes at once, I'd like to split it up into chunks of 64 "colors" at a time. The problem is, I have no idea what the stz $402, stz $403 lines do. Set addresses to zero... but why?
#include "huc.h"
int my_buffer[1024];
main(){
#asm
stz $402
stz $403
tia _my_buffer, $404, 511
#endasm
}
compiles successfully...
A variable declared in C, e.g. my_buffer, is accessed in asm with a leading underscore, as in _my_buffer.
Quote:
This fails to compile:
19000 2D:B710 tia _my_buffer, $404, 511
Undefined symbol in operand field!
int my_buffer[1024] must be a global variable (declared before main).
Tested and it works fine for me, but beware: #asm and #endasm mustn't be at the left edge; you must add one or two space characters before them.
EDIT: cabbage has answered.
@DarkKobold: You can also copy the contents of your VCE palettes into the buffer the same way:
  tai $404 , _my_buffer , 512
Much faster than in C.
Quote from: touko on 10/01/2016, 08:05 AM
Quote:
This fails to compile:
19000 2D:B710 tia _my_buffer, $404, 511
Undefined symbol in operand field!
int my_buffer[1024] must be a global variable (declared before main).
Tested and it works fine for me, but beware: #asm and #endasm mustn't be at the left edge; you must add one or two space characters before them.
EDIT: cabbage has answered.
@DarkKobold: You can also copy the contents of your VCE palettes into the buffer the same way:
  tai $404 , _my_buffer , 512
Much faster than in C.
Oh, I didn't realize it needed to be a global. That's 2k of RAM dedicated just to the fade-out, then...
Quote:
Oh, I didn't realize it needed to be a global. That's 2k of RAM dedicated just to the fade-out, then...
Why 2k??
You really only need 1k, and in fact less, because all the palettes are rarely (if ever) used at the same time.
I think a 6-palette buffer is enough; it's 192 bytes, and it should speed up your fade routine a lot (because there's no need to copy/fade unused palettes).
Some PCE games just have precompiled fades for palettes. Sure, it takes up ROM (not a whole lot, though), but it doesn't need any RAM.
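A precompiled fade could be generated offline by a small host-side program like the sketch below (standard C; `build_fade_table` and the table shape are hypothetical, and the GGG RRR BBB layout is assumed). It stores every fade-out level of one 16-color sub-palette, so the game only has to upload `table[level]` at each step.

```c
#include <stdint.h>

#define PAL_SIZE   16 /* colors per sub-palette */
#define FADE_STEPS 8  /* level 0 = original colors, level 7 = black */

/* Decrement every non-zero 3-bit field of a 9-bit color by one. */
static uint16_t fade_out_step(uint16_t clr)
{
    uint16_t dv = 0;
    if (clr & 0x0007u) dv += 1;
    if (clr & 0x0038u) dv += 8;
    if (clr & 0x01C0u) dv += 64;
    return (uint16_t)(clr - dv);
}

/* Fill table[level] with the palette after `level` fade-out steps.
   Intended to run at build time; the result would be emitted as const data. */
void build_fade_table(const uint16_t *src, uint16_t table[FADE_STEPS][PAL_SIZE])
{
    for (int i = 0; i < PAL_SIZE; i++) {
        uint16_t clr = src[i];
        for (int level = 0; level < FADE_STEPS; level++) {
            table[level][i] = clr;
            clr = fade_out_step(clr);
        }
    }
}
```

In this layout each sub-palette costs 8 × 16 × 2 = 256 bytes of ROM; writing the table out as `const` source data is left to whatever build script generates it.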
The trick is to write one fade routine. It's not a fade in or a fade out. It's a "fade to this palette" routine.
It will work for fading to black, or for fading to a color. Just make sure you pay attention to the way PCE lays out a palette, and inc/dec accordingly. It's a little easier if you work with octal number formatting. It seems more intuitive that way.
I do this and use a work palette (the current screen palette), and a target palette, and simply fade to it, and copy when ready.
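A "fade to this palette" step could look like the following standard C sketch (hypothetical names, GGG RRR BBB layout assumed). C octal literals such as `0752` give exactly one digit per 3-bit field (green 7, red 5, blue 2), which is the octal convenience mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a color from one octal digit per field: GRB(7, 5, 2) == 0752. */
#define GRB(g, r, b) ((uint16_t)(((g) << 6) | ((r) << 3) | (b)))

/* Move each 3-bit field of cur one unit toward the same field of target. */
static uint16_t step_toward(uint16_t cur, uint16_t target)
{
    uint16_t out = 0;
    for (int shift = 0; shift < 9; shift += 3) {
        int c = (cur >> shift) & 7;
        int t = (target >> shift) & 7;
        if (c < t)
            c++;
        else if (c > t)
            c--;
        out |= (uint16_t)(c << shift);
    }
    return out;
}

/* One step of "fade to this palette": updates work[] in place. Returns 1 if
   any color changed during this step, 0 once work[] already matched target[]. */
int fade_to_step(uint16_t *work, const uint16_t *target, size_t count)
{
    int changed = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t next = step_toward(work[i], target[i]);
        if (next != work[i])
            changed = 1;
        work[i] = next;
    }
    return changed;
}
```

A fade-out is then just `fade_to_step` against an all-zero target palette, repeated until it returns 0; a fade-in uses the real palette as the target.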
Quote:
The trick is to write one fade routine. It's not a fade in or a fade out. It's a "fade to this palette" routine.
Yes, I did this too.
You can fade from/to a palette, and for example a fade-out is considered a fade to a black palette.
Yeah, a fade "to". Not that anyone probably cares, but there are weighted and unweighted fade methods (in how the R/G/B elements reach a target value). Obviously weighted takes more time, but looks better for fades to black or from black IMO.
And if you're ambitious enough, you can even do real time RGB->YUV, make changes, and then back again for additional effects. Though that's probably more fancy than practical - "look what I can do" type effect.
Quote:
Obviously weighted takes more time, but looks better for fades to black or from black IMO.
I fade each R/G/B value separately, not with a single inc/dec for the whole color.
Quote from: touko on 10/04/2016, 03:04 PM
Quote:
Obviously weighted takes more time, but looks better for fades to black or from black IMO.
I fade each R/G/B value separately, not with a single inc/dec for the whole color.
Yeah, but do you use a fixed-point value (8bit:8bit) for each R/G/B element? That's what I mean by weighted. Otherwise, one element will reach its target value faster than the others, i.e. it's not uniform across all elements within a color slot's R/G/B system.
Quote:
Yeah, but do you use a fixed-point value (8bit:8bit) for each R/G/B element? That's what I mean by weighted.
No, no, mine is simpler: better than a simple inc/dec, but definitely not as advanced as what you call weighted.
Quote:
it's not uniform across all elements within a color slot's R/G/B system.
Yes, but very acceptable.
For the best results with an RGB fade, you must normalise all the values before fading.
Weighted "looks" better, but in practice it's never noticeable to anyone playing a game.
I put in a weighted one into Inferno and ended up not using it.
Quote from: Psycho Arkhan on 10/04/2016, 08:21 PM
Weighted "looks" better, but in practice it's never noticeable to anyone playing a game.
That's been my experience, too.
If the fade is running fast enough that people aren't given the time to stare at and analyze each individual step, then you can actually get away with a really simple algorithm and find that it still looks good on the screen (as in, I've never, ever heard anyone complain about it).
Atlantean's fades aren't weighted IIRC. They just do your basic "fade each color til it gets where it needs to be".
People claim it "makes things go grey", but, whatever.
Quote from: Psycho Arkhan on 10/05/2016, 02:41 AM
People claim it "makes things go grey", but, whatever.
Yep, that's the simple way of doing it that I got away with for many years (until hardware fades took over). :wink:
If I ever wanted to be "clever" and do it properly, then I'd probably use the classic integer version of the Bresenham line-draw algorithm to decide when to change each RGB component.
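The Bresenham idea can be sketched in standard C with an integer error term per component. This swaps the lookup table for an error accumulator, and the type and function names are hypothetical (GGG RRR BBB layout assumed). Every component travels its own distance over the same number of steps, so they all arrive together.

```c
#include <stdint.h>
#include <stdlib.h>

/* Per-component state for an integer, Bresenham-style fade. */
typedef struct {
    int value; /* current level, 0..7 */
    int dist;  /* |target - start| */
    int dir;   /* +1, -1 or 0 */
    int err;   /* error accumulator */
} fade_comp;

/* All three components of one 9-bit color, faded over `steps` steps. */
typedef struct {
    fade_comp c[3]; /* index 0 = blue, 1 = red, 2 = green */
    int steps;      /* total number of steps; must be > 0 */
} color_fade;

void color_fade_init(color_fade *f, uint16_t from, uint16_t to, int steps)
{
    for (int i = 0; i < 3; i++) {
        int s = (from >> (3 * i)) & 7;
        int t = (to >> (3 * i)) & 7;
        f->c[i].value = s;
        f->c[i].dist = abs(t - s);
        f->c[i].dir = (t > s) - (t < s);
        f->c[i].err = 0;
    }
    f->steps = steps;
}

/* Advance one step and return the packed color. Each component adds its
   distance to its error term and moves one unit every time the error
   reaches `steps`, so after `steps` calls each component has moved exactly
   `dist` units and all of them land on their targets at the same time. */
uint16_t color_fade_step(color_fade *f)
{
    unsigned out = 0;
    for (int i = 0; i < 3; i++) {
        fade_comp *fc = &f->c[i];
        fc->err += fc->dist;
        while (fc->err >= f->steps) {
            fc->err -= f->steps;
            fc->value += fc->dir;
        }
        out |= (unsigned)fc->value << (3 * i);
    }
    return (uint16_t)out;
}
```

With 14 steps from black to 0x1D0 (green 7, red 2), the color after step 7 is 0x0C8: green is at 3 and red at 1, each roughly halfway to its target.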
That's basically what I was referring to with the fixed-point values (a tiny LUT for a Bresenham line algo).
Sometimes you just write stuff... just because you can. I think most fade-to routines don't need to run during realtime gameplay (i.e. end of stage, or transition into another area, etc., where the action is paused). At least, that's all I've needed it for, but I can see some situations where it might be needed during normal gameplay. I always just used precalculated subpalettes for that, but I guess if you set up a background process system or "thread", you could call such a routine ahead of time and have it do the fade steps as needed (kind of like a background time-sliced decompression process).
Arkhan, did you use the fade routines while the gameplay was in action, or did you pause the action and use the fade as a transition? Just curious.
Atlantean fades while SOME stuff is happening on screen. I am pretty sure the fades will work mid-game, though.
Most of the fading was for transitioning, though. No point in fading mid-game in a shootything
Reflectron had a bunch of palette cycling during the entire game.
Quote from: touko on 09/20/2015, 01:05 PM
I think the best way to optimise your routine is to do the fade in a buffer for all palettes, then transfer the whole buffer after a vsync in asm with a TIA block transfer,
like this:
/* A 256-byte buffer is enough for 8 palettes */
int my_buffer[1024];
#asm
  stz $402
  stz $403
  tia _my_buffer , $404 , 1024
#endasm
My fade routine is close to yours and is very fast, but it's in ASM.
So, I'm really confused.
First, why the store zero in $402 and $403?
Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64 etc?
This is why I hate assembly. I know, I'm definitely the idiot of the thread.
I feel the need to post something to make DK feel like less of an idiot.
Can't you just store all the palettes to an array, then create a loop that lerps from black to the array color or the array to black?
Quote:
First, why the store zero in $402 and $403?
It selects color slot 0.
Quote:
Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64 etc
You don't use $404+anything. $404/$405 is the color.
You set $402/$403 to the slot number you want (0, 64, 128, etc.).
In general, you set the starting slot # in $402/$403 and send the colors to $404/$405.
The slot increments after every write to the high byte (I think), so you can just loop through the palette and pump them out. If you want to break it into 64-color chunks, write the first 64 colors, then the next 64 colors, etc.
You can set the slot number to start at any slot, even in the middle of a palette, afaik.
Quote:
Can't you just store all the palettes to an array, then create a loop that lerps from black to the array color or the array to black?
Yes. But that won't update the palettes until you send it to the VCE.
If you really want to, you can read the color from the VCE, fade it, and write it back, without any intermediate array. It's just quicker to use an array (less overhead).
Quote from: DildoKKKobold on 10/06/2016, 04:46 PM
Quote from: touko on 09/20/2015, 01:05 PM
I think the best way to optimise your routine is to do the fade in a buffer for all palettes, then transfer the whole buffer after a vsync in asm with a TIA block transfer,
like this:
/* A 256-byte buffer is enough for 8 palettes */
int my_buffer[1024];
#asm
  stz $402
  stz $403
  tia _my_buffer , $404 , 1024
#endasm
My fade routine is close to yours and is very fast, but it's in ASM.
So, I'm really confused.
First, why the store zero in $402 and $403?
Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64 etc?
This is why I hate assembly. I know, I'm definitely the idiot of the thread.
It's not assembly, but direct hardware interfacing. You could access the ports directly in C.
$402/$403 make up a single port to the VCE color # or slot. Because there are 512 color slots in the VCE, the slot number is larger than the 8-bit value a single port can handle. So the 16-bit port is spread across two 8-bit ports; 0x402 is the LSByte and 0x403 is the MSByte.
VCE and VDC ports tend to have what is known as a "latch" system. This means that when you write the upper byte of a 16-bit port, it triggers the transfer of the contents of the two ports to the internal place they need to go (be it VCE or VDC).
In the case of the VCE, 0x402 is the LSB, and 0x403 is the MSB and latch. Once 0x403 is written to, the contents are transferred to whatever reg is internal to the VCE. But not until that latch port is accessed, so the order of port-pair access is very important.
On the VCE, here are some ports:
0x402/0x403 = the color slot you want to update.
0x404/0x405 = the color value to update on the corresponding color slot.
One other thing to note: while you can constantly tell the VCE what specific color you want to update, it does have an internal "auto increment" mechanism that automatically advances to the next color slot after a successful write/update (i.e. a write to the latch port). Same with reading color data from the VCE.
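To make the latch and auto-increment behaviour concrete, here is a simplified software model in standard C. It is only a sketch of what's described in this thread, not an emulator: the names are hypothetical, and it assumes only bit 0 of the two high-byte ports matters (slots and colors are both 9 bits). Bytes are fed low byte first, matching how a 6502-family CPU stores an int.

```c
#include <stdint.h>

/* Simplified model of the VCE color ports as described in this thread.
   Not cycle-accurate and no snow; slot and color are both 9-bit. */
typedef struct {
    uint16_t cram[512]; /* 512 color slots */
    uint16_t slot;      /* current slot pointer, shared by reads and writes */
    uint8_t addr_lo;    /* last value written to $402 */
    uint8_t data_lo;    /* last value written to $404 */
} vce_model;

void vce_write(vce_model *v, uint16_t port, uint8_t value)
{
    switch (port) {
    case 0x402: /* slot number, low byte: held until $403 is written */
        v->addr_lo = value;
        break;
    case 0x403: /* slot number, high byte: the latch */
        v->slot = (uint16_t)(((value & 1u) << 8) | v->addr_lo);
        break;
    case 0x404: /* color, low byte: held until $405 is written */
        v->data_lo = value;
        break;
    case 0x405: /* color, high byte: commit, then auto-increment the slot */
        v->cram[v->slot] = (uint16_t)(((value & 1u) << 8) | v->data_lo);
        v->slot = (uint16_t)((v->slot + 1) & 0x1FF);
        break;
    default:
        break;
    }
}

/* Reading the high byte ($405) also advances the shared slot pointer. */
uint8_t vce_read(vce_model *v, uint16_t port)
{
    uint8_t out = 0;
    if (port == 0x404) {
        out = (uint8_t)(v->cram[v->slot] & 0xFF);
    } else if (port == 0x405) {
        out = (uint8_t)(v->cram[v->slot] >> 8);
        v->slot = (uint16_t)((v->slot + 1) & 0x1FF);
    }
    return out;
}

/* Models `tia src, $404, len`: the source address increments while the
   destination alternates $404, $405, $404, ... */
void tia_to_vce(vce_model *v, const uint8_t *src, unsigned len)
{
    for (unsigned i = 0; i < len; i++)
        vce_write(v, (uint16_t)(0x404 + (i & 1u)), src[i]);
}
```

In this model, a 128-byte transfer fills 64 slots starting at whichever slot was latched, while an odd count such as 127 leaves the last color uncommitted because its high byte never arrives.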
int my_buffer[64];
fade_out()
{
int i, clr;
char j,k;
for (j = 0; j <8; j++)
{
#asm
stz $402
stz $403
#endasm
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
vsync();
#asm
tia _my_buffer , $404 , 64
#endasm
}
}
cls();
reset_satb();
satb_update();
vsync();
}
So, here's my attempt to split it into 64-color chunks. It... fails. Miserably. I'd assume it's latching fine, and that the latches should keep their current state through each vsync.
Quote:
int my_buffer[64];
.
.
.
tia _my_buffer , $404 , 64
Ints are 2 bytes. You're only transferring 32 'colors'. Try using 128 as the length.
Also, be careful mixing ints and chars. HuC doesn't promote chars to ints.
And be careful: there's only one pointer into the VCE's hardware color table.
You must select the color entry for writing AND for reading (unless you want to read a palette and write the next one), like this:
#asm
  ; May not be needed here, because get_color() selects the palette entry every time.
  stz $402
  stz $403
#endasm
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
vsync();
#asm
  stz $402   ; selects slot 0; for later chunks, write the chunk's starting slot (i) here instead
  stz $403
  tia _my_buffer , $404 , 128
#endasm
Otherwise you read from palette 0 and write to palette 2, as you did.
For posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.
fade_out()
{
int i, clr;
char j,k;
for (j = 0; j <8; j++)
{
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
vsync();
clr = get_color(i-1);
#asm
tia _my_buffer , $404 , 128
#endasm
}
}
cls();
reset_satb();
satb_update();
vsync();
}
EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.
Quote from: DildoKKKobold on 11/02/2016, 08:20 PM
For posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.
fade_out()
{
int i, clr;
char j,k;
for (j = 0; j <8; j++)
{
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
vsync();
clr = get_color(i-1);
#asm
tia _my_buffer , $404 , 128
#endasm
}
}
cls();
reset_satb();
satb_update();
vsync();
}
EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.
That would have been catastrophic
If you want your routine to be faster, you can translate this into assembly:
Quote:
for (j = 0; j <8; j++)
{
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
This loop is slow as hell.
Quote from: DildoKKKobold on 11/02/2016, 08:20 PM
For posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.
fade_out()
{
int i, clr;
char j,k;
for (j = 0; j <8; j++)
{
for (i=0;i<512;i+=64)
{
for (k=0; k<64; k++)
{
clr = get_color(i+k);
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
vsync();
clr = get_color(i-1);
#asm
tia _my_buffer , $404 , 128
#endasm
}
}
cls();
reset_satb();
satb_update();
vsync();
}
EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.
It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.
Use the Txx instruction and read the color data directly into your my_buffer during vblank. Then do your modifications on the array; when each iteration has completed (one iteration of j), wait for vsync and Txx the buffer back to the color RAM port. That should prevent any snow on screen.
If you slightly modify your code like this:
fade_out()
{
int i, clr;
char j,k;
vsync();
for (j = 0; j <8; j++)
{
for (i=0;i<512;i+=64)
{
temp = i;   /* temp must be a global so the asm block can read it */
#asm
lda temp
sta $402
lda temp+1
sta $403
tai $404, _my_buffer, 128
#endasm
for (k=0; k<64; k++)
{
clr = my_buffer[k];
if (clr&7) clr = clr - 1;
if (clr&56) clr = clr - 8;
if (clr&448) clr = clr - 64;
my_buffer[k] = clr;
}
vsync();
#asm
lda temp
sta $402
lda temp+1
sta $403
tia _my_buffer , $404 , 128
#endasm
}
}
cls();
reset_satb();
satb_update();
vsync();
}
It should get everything done during vblank and avoid snow on screen on the real system, including reading in the next 64 colors (both transfers together only take 1.5k CPU cycles). Note: you'll need a global variable "temp" (or some such name) so that you can access the function's instance variable in asm. Also, I think the read port is $404 and not $406. If not, then change it to $406.
Quote from: TurboXray on 11/03/2016, 02:07 PM
It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.
Use the Txx instruction and read the color data directly into your my_buffer during vblank. Then do your modifications on the array; when each iteration has completed (one iteration of j), wait for vsync and Txx the buffer back to the color RAM port. That should prevent any snow on screen.
So that is why I put the vsync right before the transfer: even if the code takes two frames, the transfer will still only occur right after a vblank. I'm actually concerned that if I make the code faster, I'll have to put in delays, as the fade is already pretty fast now. It's a minimum of 64 frames already.
Granted, I need to try my new code in hardware. I'll also try yours.
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?
Quote:
I'm actually concerned that if I make the code faster, I'll have to put in delays, as the fade is already pretty fast now.
Suck up the need to add a delay. Use a down-counter so it's tuneable. It's not too bad if you use a dedicated fade routine. You can use the wait time to do other things, like loading new gfx...
Just my opinion.
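The down-counter suggestion might look like this in standard C (a sketch; `fade_timer` and its functions are hypothetical, not part of HuC). Call the tick function once per frame, for example right after `vsync()`, and run one fade step only when it returns 1.

```c
#include <stdint.h>

/* Hypothetical pacing helper for a fade: one tick per frame. */
typedef struct {
    uint8_t reload;     /* frames between fade steps (the tuneable part) */
    uint8_t counter;    /* frames left before the next step */
    uint8_t steps_left; /* fade steps still to run */
} fade_timer;

void fade_timer_start(fade_timer *t, uint8_t frames_per_step, uint8_t steps)
{
    /* Treat 0 as 1 so the down-counter can never wrap around. */
    t->reload = frames_per_step ? frames_per_step : 1;
    t->counter = t->reload;
    t->steps_left = steps;
}

/* Returns 1 on frames where the caller should run one fade step, 0 otherwise.
   The idle frames in between are free for other work, e.g. loading graphics. */
int fade_timer_tick(fade_timer *t)
{
    if (t->steps_left == 0)
        return 0;
    if (--t->counter != 0)
        return 0;
    t->counter = t->reload;
    t->steps_left--;
    return 1;
}
```

With 4 frames per step and 8 steps, the fade fires on frames 4, 8, ..., 32; changing `frames_per_step` retunes the speed without touching the fade code itself.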
Quote:
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?
You can do that already. It's just a pain, as they have to be accessed via the HuC stack pointer, which is slow...
Quote from: DildoKKKobold on 11/03/2016, 03:19 PM
Quote from: TurboXray on 11/03/2016, 02:07 PM
It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.
Use the Txx instruction and read the color data directly into your my_buffer during vblank. Then do your modifications on the array; when each iteration has completed (one iteration of j), wait for vsync and Txx the buffer back to the color RAM port. That should prevent any snow on screen.
So that is why I put the vsync right before the transfer: even if the code takes two frames, the transfer will still only occur right after a vblank. I'm actually concerned that if I make the code faster, I'll have to put in delays, as the fade is already pretty fast now. It's a minimum of 64 frames already.
Hmm... there's an issue you might not be aware of: every time you read or write any VCE reg (that includes $400 and $401), you'll cause the VCE to be unable to read from the pixel bus that the VDC is constantly outputting to. Since it can't read from the pixel bus, it outputs the last color (pixel) that it read. You get horizontal 'stretches' of colors across the screen, i.e. snow. Not just at the borders, but anywhere on a scanline. This actually happens when you turn the display "off", but since the whole screen is one color, you can't see the pixel "stretching". This is different from the color-update interference on other systems, where if you update a color while the display is active, you see that color update as corruption on screen. The VCE doesn't do that, but reading and writing any VCE port gives the same stretching behavior regardless (read or write, color update regs or other VCE regs).
Here's an example video where I purposely do it:
http://youtu.be/-xU9uuRzLwo
So any access to the VCE does this, reads as well as writes. If your routine does manage to read in and modify all 64 colors within vblank, and you update on the following frame, then you'll be fine. And if that's the case, then don't worry about the code changes I made (unless you want more resources during vblank to do something else, but it doesn't look like it; you'd have to make a completely different system/function for that).
Quote:
Granted, I need to try my new code in hardware. I'll also try yours.
Test yours first, and if it's good, then don't worry about mine. Just keep my code in mind, i.e. the approach I took, as you might want to do more flexible fade routines in the future.
Quote:
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?
What TheOldMan said. It's a pain, because you have to generate the .s file, look at which index represents that variable, then go back and write an indirect-indexed load from it. It's not just that the instance variable inside the function is a stack object; there's no clean way to access it in asm without knowing its index on the stack. Indeed, it would be nice for HuC to pass this on to the asm block. If the assembler had a way to make scoped equates, then HuC could generate a function-scope equate list for each function (the index into the stack). Globals are just easier to transfer stuff to.
Quote:
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?
Actually, as declared:
int i, clr;
char j,k;
these are treated as local variables. If you want locals in asm, you must use the stack ($100 -> $106 should be safe enough), or you can use the classic push/pop.
But in fact you already have a bunch of temporary global variables reserved (like __temp, <_al, <_bl, etc.).
Quote from: TheOldMan on 11/03/2016, 03:44 PM
Quote:
I'm actually concerned that if I make the code faster, I'll have to put in delays, as the fade is already pretty fast now.
Suck up the need to add a delay. Use a down-counter so it's tuneable. It's not too bad if you use a dedicated fade routine. You can use the wait time to do other things, like loading new gfx...
Just my opinion.
Quote:
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?
You can do that already. It's just a pain, as they have to be accessed via the HuC stack pointer, which is slow...
The entirety of Atlantean is written with global variables.
Just saying.
lol
ASM doesn't really have a concept of local. You push/pop things to make them "local", simply by saving the state of all the registers so you can fuck around with them before popping the stack back to reset everything, but yeah.
global = <3
You'll get faster code.
I'm thinking that this sort of stuff is so commonly needed, that it really should be built into the HuC library.
Fading down is easy ... but the fun comes when you're fading back in. :wink:
There you need to know what the desired palette is, and have that stored in memory somewhere.
And you need the buffer where you're going to calculate the next set of colors that you're going to send to the VCE.
Has anyone avoided needing *both* of these "target" and "current" buffers in memory at once?
The classic "cheap" fade-down looks OK, but the same thing run in reverse (increment color component if not at target) has the effect of greying everything out, and then having the stronger colors appear later on.
It's not horrible, and I've done that before, and I believe that Arkhan does it, too.
A more sophisticated fade is nearly 3 times slower ... but that's still fast enough to calculate all 512 colors in only 2/3 of a 60Hz frame, and nobody runs a fade at 60-steps-per-second.
Does anyone have any opinion about including a decent fade in HuC?
So.. what are the dynamics at play in this design? Speed and memory size? Is the routine going to be an automatic thing? As in, it only takes a rate argument for fade in/out? If so, does it have complete control of the main code until it's finished? Or is it a lighter process, that only does up to 64 sets of colors per call and is divided into prep and update functions, allowing game code to run at the same time (or at least same frame)? How much memory are you going to require (important for hucard projects)? Is the work buffer user defined and passed along as a pointer (so if can be reused for something else in the project)? Or is it an internal static defined size, that takes away from ram regardless?
I never really liked this trying to make one size fits all thing when designing libs/stuff for HuC. It'd be nice if it was something they directly included into the main source file (different small libs), rather than trying to attach it to the existing library (in startup). Though I think doing that would require restructuring the main lib bank, and having support for the bank directive directly in HuC.
As an update, of course my code didn't work (for the reasons Bonknuts illustrated). His did. No surprise there.
I don't think this needs to be in the core of HuC. It would serve better as example code, which someone could work to their needs.
Quote from: TurboXray on 12/01/2016, 09:42 PM
So.. what are the dynamics at play in this design? Speed and memory size? Is the routine going to be an automatic thing? As in, it only takes a rate argument for fade in/out? If so, does it have complete control of the main code until it's finished? Or is it a lighter process, that only does up to 64 sets of colors per call and is divided into prep and update functions, allowing game code to run at the same time (or at least same frame)? How much memory are you going to require (important for hucard projects)? Is the work buffer user defined and passed along as a pointer (so it can be reused for something else in the project)? Or is it an internal static defined size, that takes away from ram regardless?
Good questions! :-k
Taking control of the system would be impolite.
The goal would be to provide function calls that the HuC user can use to provide fast alternatives to writing their own code.
For example ...
void __fastcall get_colors( int *pbuffer<__td> );
void __fastcall get_colors( int index<color_reg>, int *pbuffer<__td>, unsigned char count<__tl> );
void __fastcall set_colors( int *pbuffer<__ts> );
void __fastcall set_colors( int index<color_reg>, int *pbuffer<__ts>, unsigned char count<__tl> );
void __fastcall fade_colors( int *psource<__si>, int *pdestination<__di>, unsigned char count<__al>, unsigned char fade<acc> );
Those make up a simple set of functions that do everything that DK wanted in Catastrophy, and ended up writing in either slow C code, or fast inline-assembly.
They do it fast, and they keep things flexible enough that you can use as-much or as-little resources as you need.
The "get" and "set" functions use TAI & TIA instructions for fast processing.
"count" is limited to a maximum of 128 for fast indexing
"fade" is a value 0-7.
I *think* that's enough basic functionality for the end-user to build pretty much whatever they want.
Can you think of a better *practical* design?
Quote:
I never really liked this trying to make one size fits all thing when designing libs/stuff for HuC. It'd be nice if it was something they directly included into the main source file (different small libs), rather than trying to attach it to the existing library (in startup). Though I think doing that would require restructuring the main lib bank, and having support for the bank directive directly in HuC.
Making the libraries modular would be great ... but it's going to take a significant time-investment from whoever wants to do it.
Since there's no linker phase and no dead-code elimination, from what I'm seeing HuC is pretty much a behemoth right now.
But ... there is some argument for providing common functionality within the library itself, especially since the code that HuC generates to do the same stuff if you do it in C (like DK did for Catastrophy) is going to be much larger and slower than the same code hand-written in assembly.
Quote from: DildoKKKobold on 12/01/2016, 10:14 PM
I don't think this needs to be in the core of HuC. It would serve better as example code, which someone could work to their needs.
It's not like a "fade" routine is an uncommon requirement.
1) Your C code is big and slow, and generates a lot of Hu6280 code that a hand-written assembly function doesn't. That's not you ... that's just HuC.
2) Have you got a fade-up working, yet? :wink:
Quote from: elmer on 12/01/2016, 10:34 PM
"fade" is a value 0-7.
I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now. For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.
I did something similar (using lookup tables) for my HuZero game.
I made some fade-outs/fade-ins in this intro some time ago:
https://youtu.be/B3pdwEza6j4
Quote from: touko on 12/02/2016, 03:26 AM
I made some fade-outs/fade-ins in this intro some time ago:
https://youtu.be/B3pdwEza6j4
That looks nice! :)
So what technique did you use for processing each step of the fade up/down?
Quote from: ccovell on 12/02/2016, 01:14 AM
I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now. For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.
I did something similar (using lookup tables) for my HuZero game.
I haven't heard of that stepping technique before, it sounds interesting.
Do you have any more details?
It's trivial to switch to a 0..15 range, even if I'm only processing 8 steps, so I've done that.
I definitely agree with using a table-based approach ... it gives you the flexibility to change the tables and get a fade-to-white, or a fade-to-sepia, or to correct for any gamma differences.
For HuC, I suspect that it's just a case of the tradeoff between quality and memory usage for the tables.
I'm also limited by trying to keep compatibility with HuCard usage rather than just using self-modifying code.
Here's an implementation that uses a single 64-byte table for a simple 8-step fade ... can anyone suggest improvements?
; fade_colors(int *psrc [__si], int *pdst [__di], char count, char level)
; ----
; fade down an array of colors
; ----
; psrc: source buffer
; pdst: destination buffer
; count: # of colors, (0-128)
; level: level of fading (0-15, 0 = black, 15 = full; 8 distinct steps)
; ----
; color: color value, GREEN: bit 6-8
; RED: bit 3-5
; BLUE: bit 0-2
; ----
_fade_colors.4: asl a ; 2 fade level (0-15)
asl a ; 2
and #$38 ; 2
sta <__ah ; 4
lda <__al ; 4 # of colors
beq .l2 ; 2
asl a ; 2 # of colors * 2 = byte offset
tay ; 2 Y indexes the buffers from the end
phx ; 3
; 129 cycle inner loop.
; fade GREEN
.l1: dey ; 2
lda [__si],y ; 7 src color hi-byte
dey ; 2
lsr a ; 2
lda [__si],y ; 7 src color lo-byte
iny ; 2
sta <__al ; 4
rol a ; 2
rol a ; 2
rol a ; 2
and #7 ; 2
ora <__ah ; 4
tax ; 2
lda fade_table,x ; 5
asl a ; 2
asl a ; 2
asl a ; 2
tax ; 2
; fade RED
lda <__al ; 4 src color lo-byte
ror a ; 2
ror a ; 2
ror a ; 2
and #7 ; 2
ora <__ah ; 4
sax ; 3
ora fade_table,x ; 5
asl a ; 2
asl a ; 2
asl a ; 2
tax ; 2
cla ; 2
rol a ; 2
sta [__di],y ; 7 dst color hi-byte
dey ; 2
; fade BLUE
lda <__al ; 4 src color lo-byte
and #7 ; 2
ora <__ah ; 4
sax ; 3
ora fade_table,x ; 5
sta [__di],y ; 7 dst color lo-byte
cpy #0 ; 2
bne .l1 ; 4
plx ; 4
.l2: rts ; 7
fade_table: .db 0, 0, 0, 0, 0, 0, 0, 0
.db 0, 0, 0, 0, 1, 1, 1, 1
.db 0, 0, 1, 1, 1, 1, 2, 2
.db 0, 0, 1, 1, 2, 2, 3, 3
.db 0, 1, 1, 2, 2, 3, 3, 4
.db 0, 1, 1, 2, 3, 4, 4, 5
.db 0, 1, 2, 3, 3, 4, 5, 6
.db 0, 1, 2, 3, 4, 5, 6, 7
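For checking results on a PC, here's a plain C model of what fade_colors above computes. It's a sketch for testing, not HuC code; the names fade_color and the uint16_t buffers are assumptions. It uses the same 64-byte table, and, like the asl/asl/and #$38 sequence, only level >> 1 selects the row, so 0..15 gives 8 distinct steps:

```c
#include <assert.h>
#include <stdint.h>

/* Same 8x8 table as fade_table above: row = brightness step (0..7),
   column = input component (0..7), value = faded component. */
static const uint8_t fade_table[64] = {
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1, 1, 1,
    0, 0, 1, 1, 1, 1, 2, 2,
    0, 0, 1, 1, 2, 2, 3, 3,
    0, 1, 1, 2, 2, 3, 3, 4,
    0, 1, 1, 2, 3, 4, 4, 5,
    0, 1, 2, 3, 3, 4, 5, 6,
    0, 1, 2, 3, 4, 5, 6, 7,
};

/* Fade one 9-bit VCE color (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2).
   level is 0..15; as in the asm, only level >> 1 picks the table row. */
uint16_t fade_color(uint16_t color, unsigned level)
{
    const uint8_t *row;
    unsigned g, r, b;

    assert(level <= 15);
    row = &fade_table[(level >> 1) * 8];
    g = (color >> 6) & 7;
    r = (color >> 3) & 7;
    b = color & 7;
    return (uint16_t)((row[g] << 6) | (row[r] << 3) | row[b]);
}

/* Model of fade_colors(psrc, pdst, count, level). Each color is read
   before it is written, so psrc and pdst may be the same buffer. */
void fade_colors(const uint16_t *psrc, uint16_t *pdst, unsigned count,
                 unsigned level)
{
    unsigned i;

    assert(count <= 128);
    for (i = 0; i < count; i++)
        pdst[i] = fade_color(psrc[i], level);
}
```

For example, GRB 456 at level 8 (row 4) comes out as GRB 233.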
Quote:
So what technique did you use for processing each step of the fade up/down?
The simplest: add/sub 1 for each RGB component.
Quote from: touko on 12/02/2016, 12:23 PM
Quote:
So what technique did you use for processing each step of the fade up/down?
The simplest: add/sub 1 for each RGB component.
Well, the fade-down is easy ... but what did you do for the fade-up? :-k
Are you using the simple "add 1 if not at target" for each component?
That tends to grey things out a little during the fade-up, like this ...
Target GRB : 456
Step 0 GRB : 000
Step 1 GRB : 111
Step 2 GRB : 222
Step 3 GRB : 333
Step 4 GRB : 444
Step 5 GRB : 455
Step 6 GRB : 456
It's not a bad effect, and most people don't notice/care about it.
Just curious.
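The "add 1 if not at target" step can be sketched in C like this (step_toward is a hypothetical helper, using the same GREEN 6-8 / RED 3-5 / BLUE 0-2 layout). Starting from black with target GRB 456, it walks through exactly the 111, 222, ... 455, 456 sequence listed above:

```c
#include <assert.h>
#include <stdint.h>

/* One step of the "cheap" fade: each 3-bit component of a 9-bit VCE color
   (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2) moves one unit toward the
   matching component of target, and stops once it gets there. */
uint16_t step_toward(uint16_t cur, uint16_t target)
{
    uint16_t out = 0;
    unsigned shift;

    assert(cur <= 0x1FF && target <= 0x1FF);
    for (shift = 0; shift <= 6; shift += 3) {
        unsigned c = (cur >> shift) & 7;
        unsigned t = (target >> shift) & 7;

        if (c < t)
            c++;
        else if (c > t)
            c--;
        out |= (uint16_t)(c << shift);
    }
    return out;
}
```

Run toward 0 it's the plain fade-down; run toward a stored palette it's the fade-up. The greying comes from the smaller components reaching their targets first while the dominant ones keep climbing.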
Quote from: elmer on 12/02/2016, 11:35 AM
Quote from: ccovell on 12/02/2016, 01:14 AM
I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now. For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.
I did something similar (using lookup tables) for my HuZero game.
I haven't heard of that stepping technique before, it sounds interesting.
Do you have any more details?
I presume that you're talking about taking advantage of the human eye's perception of brightness.
The RGB to Y (brightness) formula is ...
Y = 0.299R + 0.587G + 0.114B
So, to reduce perceived brightness, you need to remove more of the green than you do of the blue.
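Applied to the PCE's 3-bit components, those weights mean one step of green counts for about five steps of blue. An integer sketch (luma_x1000 is a hypothetical helper, and it treats the 0..7 values as linear, ignoring any TV gamma):

```c
#include <assert.h>
#include <stdint.h>

/* Approximate perceived brightness of a 9-bit VCE color
   (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2) using the
   Y = 0.299R + 0.587G + 0.114B weights, scaled by 1000 to stay integer. */
unsigned luma_x1000(uint16_t color)
{
    unsigned g = (color >> 6) & 7;
    unsigned r = (color >> 3) & 7;
    unsigned b = color & 7;

    assert(color <= 0x1FF);
    return 299 * r + 587 * g + 114 * b;
}
```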
From a practical implementation POV, do you mean something like this? :-k
It changes things to a (0..17) range instead of (0..15).
This would provide a sort-of-half-step in the color transition, and delay the blue and red component fades.
fade_table_g: .db 0, 0, 0, 0, 0, 0, 0, 0
.db 0, 0, 0, 0, 0, 0, 0, 0
fade_table_r: .db 0, 0, 0, 0, 0, 0, 0, 0
fade_table_b: .db 0, 0, 0, 0, 0, 0, 0, 0
.db 0, 0, 0, 0, 1, 1, 1, 1
.db 0, 0, 0, 1, 1, 1, 1, 1
.db 0, 0, 1, 1, 1, 1, 2, 2
.db 0, 0, 1, 1, 1, 2, 2, 2
.db 0, 0, 1, 1, 2, 2, 2, 3
.db 0, 0, 1, 1, 2, 2, 3, 3
.db 0, 1, 1, 2, 2, 3, 3, 4
.db 0, 1, 1, 2, 2, 3, 3, 4
.db 0, 1, 1, 2, 3, 3, 4, 4
.db 0, 1, 1, 2, 3, 3, 4, 5
.db 0, 1, 2, 2, 3, 4, 5, 5
.db 0, 1, 2, 2, 3, 4, 5, 6
.db 0, 1, 2, 3, 4, 4, 5, 6
.db 0, 1, 2, 3, 4, 5, 6, 7
.db 0, 1, 2, 3, 4, 5, 6, 7
.db 0, 1, 2, 3, 4, 5, 6, 7
.db 0, 1, 2, 3, 4, 5, 6, 7
Or just this easier-to-read version with step (0..9) ...
fade_table_g: .db 0, 0, 0, 0, 0, 0, 0, 0
fade_table_r: .db 0, 0, 0, 0, 0, 0, 0, 0
fade_table_b: .db 0, 0, 0, 0, 0, 0, 0, 0
.db 0, 0, 0, 0, 1, 1, 1, 1
.db 0, 0, 1, 1, 1, 1, 2, 2
.db 0, 0, 1, 1, 2, 2, 3, 3
.db 0, 1, 1, 2, 2, 3, 3, 4
.db 0, 1, 1, 2, 3, 4, 4, 5
.db 0, 1, 2, 3, 3, 4, 5, 6
.db 0, 1, 2, 3, 4, 5, 6, 7
.db 0, 1, 2, 3, 4, 5, 6, 7
.db 0, 1, 2, 3, 4, 5, 6, 7
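In the 0..9 table the three labels share one block of rows: fade_table_g starts at row 0, fade_table_r at row 1 and fade_table_b at row 2, so at any step green is looked up one row "darker" than red and two rows darker than blue. A C sketch of that lookup, assuming the same 9-bit GRB layout as fade_colors (fade_staggered and stagger_rows are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* The 12 rows of the 0..9 table above, laid out contiguously.
   fade_table_g = row 0, fade_table_r = row 1, fade_table_b = row 2. */
static const uint8_t stagger_rows[12][8] = {
    {0, 0, 0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 1, 1, 1, 1},
    {0, 0, 1, 1, 1, 1, 2, 2},
    {0, 0, 1, 1, 2, 2, 3, 3},
    {0, 1, 1, 2, 2, 3, 3, 4},
    {0, 1, 1, 2, 3, 4, 4, 5},
    {0, 1, 2, 3, 3, 4, 5, 6},
    {0, 1, 2, 3, 4, 5, 6, 7},
    {0, 1, 2, 3, 4, 5, 6, 7},
    {0, 1, 2, 3, 4, 5, 6, 7},
};

/* step 0 = black, step 9 = original color. Each channel indexes the
   shared rows from its own starting row, so green darkens first. */
uint16_t fade_staggered(uint16_t color, unsigned step)
{
    unsigned g = (color >> 6) & 7;
    unsigned r = (color >> 3) & 7;
    unsigned b = color & 7;

    assert(step <= 9 && color <= 0x1FF);
    return (uint16_t)((stagger_rows[step][g] << 6) |
                      (stagger_rows[step + 1][r] << 3) |
                      stagger_rows[step + 2][b]);
}
```

For white at step 4 this gives G=2, R=3, B=4, so the fade passes through bluish greys rather than neutral ones.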
Quote:
but what did you do for the fade-up?
You start with an entirely black palette, and you add 1 to each component until you reach the target palette.
In fact, for both fade-in and fade-out I have a reference palette to reach (black for a fade-out, and the object's palette for a fade-in). I read the corresponding colors directly from the VCE, do the fade step, and store the result in a buffer (for sending later with TIA).
I have a 256-byte buffer for fading multiple palettes at the same time.
Quote:
It's not a bad effect, and most people don't notice/care about it.
You're right, but it's not noticeable as long as your fade is fast enough.
I think for the best result, you must normalise all the RGB components of each color first, to avoid a dominant color at the end of the fade.
E.g. first reach a grey such as 444, 333 or 222, then start the add/sub from there, and you end with 0 for each component.
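The description above is brief, so this is only one reading of it: while a color's components differ, every component above the color's smallest one steps down (456 to 445 to 444), and once the color is grey all three fall together. A sketch with step_normalised as a hypothetical name and the usual 9-bit GRB layout:

```c
#include <assert.h>
#include <stdint.h>

/* One step of a "normalise first" fade-down on a 9-bit VCE color
   (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2). Assumed rule: flatten
   the color to a grey at its smallest component, then fade the grey. */
uint16_t step_normalised(uint16_t color)
{
    unsigned g = (color >> 6) & 7;
    unsigned r = (color >> 3) & 7;
    unsigned b = color & 7;
    unsigned lo = g;

    assert(color <= 0x1FF);
    if (r < lo) lo = r;
    if (b < lo) lo = b;

    if (g == r && r == b) {
        /* Already grey: all three components drop together. */
        if (g > 0) { g--; r--; b--; }
    } else {
        /* Pull every component above the minimum down by one. */
        if (g > lo) g--;
        if (r > lo) r--;
        if (b > lo) b--;
    }
    return (uint16_t)((g << 6) | (r << 3) | b);
}
```

From 456 this takes 6 steps (445, 444, 333, 222, 111, 000), the same count as the plain fade, but the dominant blue goes first instead of last.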
I personally like the idea of the fade table, for speed reasons. As in, fade is actually just a "brightness" (loosely termed) state of the palette, and fade is the transition from one level of brightness to another over time.
But of course, I'd say don't make this a built-in library function, but something the programmer can just include. I mean, there's no critical reason why it should be in the very main bank of the main lib, so having it as a function with ASM inside of it is no slower than the far call to the far end of the main lib. Speaking of which, there's probably a good amount of stuff that should be in the main lib bank to begin with. And some stuff could easily be moved to include-able functions. I'm going to look into this as soon as winter break starts.
My table was a wasteful 512 bytes, mapping all 512 colours to the next step down... so it's a fadeout routine only. Anyway, the code (minus the table):
Fade_Down: ;Fades a specified palette down 1 step!
;A = Palette entry (0,$10,$20,$30...)
;X = 0/1 = BG or Sprite
;-------------------------------------------------------
;copy our specified palette from VCE to RAM
sta $0402 ;Point to colours
stx $0403
pha
phx
phy
TAI $0404,temp_pal,32
;----------
clx
.fade_loop:
lda temp_pal,X
tay
lda temp_pal+1,X ;(MSB is 0 or 1)
and #1 ;All other bits were set in VCE.
beq .lopal
;MSB was high; (leave it as-is...)
lda PALFADE1HiTblLSB,Y ;Get LSB
sta temp_pal,X
cpy #64 ;64th entry and up, MSB=1
bcs .next_entry
.zeromsb:
stz temp_pal+1,X
bra .next_entry
;----------
.lopal: ;MSB will always be zero anyway
lda PALFADE1LoTblLSB,Y ;Get LSB
sta temp_pal,X
.next_entry:
inx
inx
cpx #32
bne .fade_loop
;--------
;now copy RAM back to VCE
ply
plx
pla
sta $0402 ;Point to colours
stx $0403
TIA temp_pal,$0404,32
; A, Y, and X should be preserved here.
rts
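A table like that can be generated offline instead of typed in. A sketch in C, assuming "the next step down" means decrementing every non-zero component (the same rule as the fade_out() C code at the start of the thread); the function names are hypothetical, and the split mirrors the two LSB tables the asm reads:

```c
#include <assert.h>
#include <stdint.h>

/* Map every 9-bit VCE color (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2)
   to the color one step darker: each non-zero component minus one. */
void build_fade_down_table(uint16_t table[512])
{
    unsigned c;

    for (c = 0; c < 512; c++) {
        unsigned g = (c >> 6) & 7;
        unsigned r = (c >> 3) & 7;
        unsigned b = c & 7;

        if (g) g--;
        if (r) r--;
        if (b) b--;
        table[c] = (uint16_t)((g << 6) | (r << 3) | b);
    }
}

/* Split into two 256-byte low-byte tables, in the spirit of
   PALFADE1LoTblLSB / PALFADE1HiTblLSB: lo_lsb covers colors 0..255,
   hi_lsb covers colors 256..511, both indexed by the color's low byte. */
void split_fade_table(const uint16_t table[512], uint8_t lo_lsb[256],
                      uint8_t hi_lsb[256])
{
    unsigned y;

    for (y = 0; y < 256; y++) {
        /* Colors below 256 have green <= 3, so the faded MSB is always 0. */
        assert((table[y] >> 8) == 0);
        lo_lsb[y] = (uint8_t)(table[y] & 0xFF);
        hi_lsb[y] = (uint8_t)(table[256 + y] & 0xFF);
    }
}
```

For colors 256..511 the faded MSB stays 1 only when the low byte is 64 or more (green 5 or higher), which is what the cpy #64 test handles.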
So.. maybe this would be helpful?
The call code...
ldy iterations
ldx #low(xfer_source)
clc
jsr xfer_ZP
The self modifying code sitting in... Zeropage!
xfer_entry:
.loop
tia source,dest,num
set ;2
adc #$nn ;5 (RMW+T)
bcc .skip ;4:2
inc <low((.loop & 0xff)+2)+1 ;6
clc ;2
.skip
dey ;2
bne .loop ;4
rts ;7
xfer_source = (.loop & 0xff) + BASE_ZP + 1
xfer_dest = (.loop & 0xff) + BASE_ZP + 3
xfer_num = (.loop & 0xff) + BASE_ZP + 5
xfer_ZP = (.loop & 0xff) + BASE_ZP
Of course, it needs to be copied at least once to ZP buffer. That's what all the address translations are for via the equates.
Quote from: TurboXray on 12/02/2016, 05:37 PM
Of course, it needs to be copied at least once to ZP buffer. That's what all the address translations are for via the equates.
Yes, I do like using a self-modifying Txx instruction in ZP. :wink:
This is what I've got, which uses HuC's __fastcall convention to have the compiler-itself set up the ZP locations that it can ...
; --------
; Alternate names when the parameter-passing area is used for
; a self-modifying Txx instruction.
;
__tc = $20F8
__ts = $20F9
__td = $20FB
__tl = $20FD
__tr = $20FF
; set_colors(int *pbuffer [__ts] )
; set_colors(int index [color_reg], int *pbuffer [__ts], int count [__tl] )
; ----
; index: index in the palette (0-511)
; pbuffer: source buffer
; count: # of colors, (1-512)
; ----
_set_colors.1: stz color_reg_l
stz color_reg_h
stz <__tl+0
lda #>512
sta <__tl+1
_set_colors.3: lda #$E3 ; TIA
sta <__tc
lda #$60 ; RTS
sta <__tr
lda #<color_data
sta <__td+0
lda #>color_data
sta <__td+1
asl <__tl+0
rol <__tl+1
jmp __tc
Yeah, classic Txx in ram setup.
But what I posted was a "safe" version: it uses smaller transfers with a small/fast iteration overhead, so that you don't delay interrupts (in this case, since it's in vblank, the TIRQ routine).
It can also work for active video, with scanline interrupts and TIMER interrupts all firing like mad. The sample playback might have a little bit of jitter, but nothing Genesis-cringe-worthy. And the VDC buffer for the next line should be able to absorb any delay (as long as the routine is tight). The best of all worlds: TIMER, H-int, and Txx availability. And the cost, if you did 32-byte transfers, is 8 cycles per byte instead of 7 cycles per byte. It makes Txx more usable IMO. Plus, it's a chance to use the T flag... who doesn't like using the T flag???
Quote from: TurboXray on 12/03/2016, 04:46 PM
But what I posted was a "safe" version. So that you don't delay interrupts (in this case, since it's in vblank, the TIRQ routine), through smaller transfers and a small/fast iteration overhead.
Good point! :wink:
I was trying to keep a similar interface to the original functions which read a single sample, and I was thinking that they'd be run after a vsync() in order to avoid snow on the screen.
But you're right, I should still limit the TIA size in order to avoid blocking the TIMER interrupt.
My bad. :oops:
Quote:
Plus, it's a chance to use the T flag... who doesn't like using the T flag???
Hahaha ... also a good point, but keeping the low-byte of the address in the A reg and doing ...
adc #$20 ;2
sta <low((.loop & 0xff)+2) ;4
... is a cycle faster, and it avoids messing with my stack-pointer-in-X.
Of course ... I throw away that cycle with the setup and the JSR, but that's a tradeoff for not needing a permanent routine in ZP.
There's certainly an argument for having a few of these Txx-32-byte subroutines in regular RAM to use for self-modifying code, and IIRC, HuC already has one somewhere. :-k
Here's the corrected code, and of course, things are much cleaner in the CDROM version ...
; --------
; Alternate names when the parameter-passing area is used for
; a self-modifying Txx instruction.
;
__tc = $20F8
__ts = $20F9
__td = $20FB
__tl = $20FD
__tr = $20FF
; set_colors(int *pbuffer [__ts] )
; set_colors(int index [color_reg], int *pbuffer [__ts], unsigned char count [acc] )
; ----
; index: index in the palette (0-511)
; pbuffer: source buffer
; count: # of 16-color palettes, (1-32)
; ----
_set_colors.1: stz color_reg_l
stz color_reg_h
lda #32
_set_colors.3: tay
.if (!CDROM)
lda #$E3 ; TIA
sta <__tc
lda #$60 ; RTS
sta <__tr
lda #$04
sta <__td+0
sta <__td+1
lda #$20
sta <__tl+0
stz <__tl+1
lda <__ts+0
clc ; first adc must start with carry clear
.l1: jsr __tc
adc #$20
sta <__ts+0
bcc .l2
inc <__ts+1
clc ; inc leaves carry set, so clear it
.l2: dey
bne .l1
rts
.else
lda <__ts+1
sta .l1+2
lda <__ts+0
sta .l1+1
clc ; first adc must start with carry clear
.l1: tia $0000,color_data,$0020
adc #$20
sta .l1+1
bcc .l2
inc .l1+2
clc ; inc leaves carry set, so clear it
.l2: dey
bne .l1
rts
.endif
Quote from: elmer on 12/03/2016, 07:41 PM
Quote:
Plus, it's a chance to use the T flag... who doesn't like using the T flag???
Hahaha ... also a good point, but keeping the low-byte of the address in the A reg and doing ...
adc #$20 ;2
sta <low((.loop & 0xff)+2) ;4
... is a cycle faster, and it avoids messing with my stack-pointer-in-X.
Of course ... I throw away that cycle with the setup and the JSR, but that's a tradeoff for not needing a permanent routine in ZP.
Good catch! My code is actually a little bit longer (I clipped it for the post), as it was meant for a demoscene part where it transfers two 28-byte segments, then sends a sample to the DAC, in a large 90k-CPU-cycle loop. It's one of those dual circular interference pattern things, but translucent like the good ones.
Psychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.
The SMS has only 3 shades of each colour channel (compared to 7 per channel on the PCE), so a fade would have a total of only 4 steps if all 3 channels were simply faded out at the same time. Doing it sequentially gives 10 steps total. I'm sure similarly nice effects can be achieved on the PCE.
(https://chrismcovell.com/images/Bonk2Anim.gif)
Above, using the SMS' limited colour space.
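On the PCE each channel has 7 steps above black, so the same red, then green, then blue ordering would give a 21-step fade. A sketch under two assumptions: the three phases are palette-wide, each running a fixed 7 steps even if no color in the palette uses much of that channel, and the order is red, green, blue as described for Psychic World (fade_sequential is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Subtract amount from a 0..7 component, clamping at 0; amount <= 0 means
   this channel's phase hasn't started yet. */
static unsigned sub_clamped(unsigned v, int amount)
{
    if (amount <= 0)
        return v;
    return v > (unsigned)amount ? v - (unsigned)amount : 0;
}

/* Color after n steps (0..21) of a red-then-green-then-blue fade-out on a
   9-bit VCE color (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2).
   Steps 1-7 only touch red, 8-14 green, 15-21 blue. */
uint16_t fade_sequential(uint16_t color, unsigned n)
{
    unsigned g, r, b;
    int k = (int)n;

    assert(color <= 0x1FF && n <= 21);
    r = sub_clamped((color >> 3) & 7, k);
    g = sub_clamped((color >> 6) & 7, k - 7);
    b = sub_clamped(color & 7, k - 14);
    return (uint16_t)((g << 6) | (r << 3) | b);
}
```

Applying it with n = 0, 1, ... 21 to a saved copy of the palette, one step every frame or few frames, gives the whole fade.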
Quote from: ccovell on 12/11/2016, 09:06 PM
Psychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.
I was doing this for Inferno on MSX, but the excessive amount of red in the game made it look pretty stupid, lol.
I ended up just going with a normal fade instead.
Chris, do you have a gif of Sonic's fade out (on the Genesis)?
Sorry, no.
Quote from: ccovell on 12/11/2016, 09:06 PM
Psychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.
That's an interesting effect ... thanks for sharing that! :D
I don't know (yet) if I like it or not, but it's definitely "effective".
Quote from: elmer on 12/12/2016, 01:33 AM
Quote from: ccovell on 12/11/2016, 09:06 PM
Psychic World on the Sega Master System had a pretty smooth and nice-looking fade in & out routine, so I wanted to find out how it did it. Turns out it was a very simple 3-step process: for fade-outs, ramp down the red channel to zero, then do the same with green and blue sequentially. Almost a bit too primitive, but it actually looks good in-game.
That's an interesting effect ... thanks for sharing that! :D
I don't know (yet) if I like it or not, but it's definitely "effective".
there's a lot of interesting fades you can do if you focus on blue-shenanigans, but most people refer to them as coding errors/bugs/not knowing what you're doing.
As opposed to just wanting a weird blue swirly fade that looks cool.
Quote from: Psycho Arkhan on 12/12/2016, 02:23 AM
there's a lot of interesting fades you can do if you focus on blue-shenanigans, but most people refer to them as coding errors/bugs/not knowing what you're doing.
I beg to differ. I'd call it Sega's signature style on the Genesis.
I don't think Arkhan is saying it's wrong, just that people perceive it as wrong, for whatever reason they apply (bug/unknowledgeable/etc).
Check this out:
http://info.sonicretro.org/SCHG_How-to:Improve_the_fade_in%5Cfade_out_progression_routines_in_Sonic_1
Quote:
From Sonic Team's point of view, it may not be incorrect and is possibly intentional, however, from a logical point of view, this is incorrect fading
Some people's kids.. I swear.
Yeah, I mean, what the fuck is "incorrect" fading, anyways. Did it make it to the target colors eventually? Did it look neat?
I find it pretty derpy that people commenting on stuff from the 80s are saying fades that involve color swirlies or something are wrong.
I mean if it fades and never makes it to where you want, or it causes actual software issues, I'd say that is wrong.
but, fades are inherently lawless. You can fade whatever to whatever.
People seem to think the only correct fade is one that fades from black or to black completely in unison.
So you'd say it's a derpy lerpy?
Thank you, I'll be here all week.
Quote from: Gredler on 12/12/2016, 04:47 PM
So you'd say it's a derpy lerpy?
Thank you, I'll be here all week.
yes. exactly, lol.
I did a normalized palette fade (read: correct by codesnobbery standards), and it looked worse than the "wrong fade" that goes to grays first.
Sometimes "right" can get bent.
Heck, my favourite fade is the Game Over one from NES Ninja Gaiden. ;-D
edit:
Quote from: TurboXray on 12/12/2016, 12:06 PM
Check this out:
http://info.sonicretro.org/SCHG_How-to:Improve_the_fade_in%5Cfade_out_progression_routines_in_Sonic_1
Holy sperglord-levels of missing-the-point on that page! The eye is sensitive to the different channels of colour differently, and so Sega was exploiting that, not to mention fading through 8 levels only will look too abrupt and might look like it's "shaking" to some viewers. :-P
Quote from: ccovell on 12/12/2016, 05:59 PM
Holy sperglord-levels of missing-the-point on that page!
I loled while eating a cupcake.
Did you know cupcake can come out of your nose?
I do now.
Blue fades are probably one of the most interesting things to dick with, as blue is the most interesting of the 3 colors in terms of our eyes perceiving it.
Holy cow, this thread took a life of its own. Next someone is going to figure out how to do a Lucas-wipe transition on the PC Engine!
Quote from: DildoKKKobold on 12/13/2016, 01:01 AM
Holy cow, this thread took a life of its own. Next someone is going to figure out how to do a Lucas-wipe transition on the PC Engine!
what one is that?
I made a few on MSX before settling on T&E soft's gridwipe because it's fuckin cool looking.