
Graphic, Sound & Coding Tips/Tricks/FX/Etc. Tools for Development

Started by TurboXray, 12/03/2014, 01:39 PM


OldMan

Quote: That must be an HuC thing
Could be. I didn't hunt it down :)
I suspect part of the HuC startup code does things like disabling the user IRQs, resetting the display size, and turning the display off.
Which is why I wish I could replace the HuC startup sequence.

(Yes, I know it's possible. But I haven't gone through it enough to actually do it :)

elmer

Quote from: TheOldMan on 01/18/2016, 10:52 PM
Which is why I wish I could replace the HuC startup sequence.
One day you may decide to take a serious look at CC65.

Their PCE-specific support is still pretty shallow at the moment, but it's not actually necessary.

It only took me a few hours to put in my own startup code to get a test program running, and the flexibility that it offers in where you put things (like its startup code and libraries) is absolutely tremendous.

It was really pleasant to be able to actually compile real ANSI C library code on the PCE (for utilities, if nothing else).

touko

I found on real hardware that large VRAM transfers with Txx can cause some random artifacts on screen (you can see it at startup or after some repeated resets).
Even with 32-byte chunks. The problem is solved with classic LDA/STA transfers (I think the CPU can be halted with those, and not with Txx).
Maybe if a Txx occurs at certain hblank timings the VDC can miss a write because it is busy and the CPU can't be halted? I don't know if that's possible or not, but I can't find any other good explanation.
Of course, these large transfers were done with the display off.
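
For reference, a minimal sketch of the kind of chunked transfer being compared here (buffer name and chunk size are illustrative; it assumes MAWR has already been set and VDC register 2, the VWR data port, is selected):

        ; Txx version: copy one 32-byte chunk from a RAM buffer to the VDC.
        ; TIA alternates the destination between $0002 and $0003, which is
        ; exactly what a VRAM word write wants.
        tia chunk_buffer, $0002, 32

        ; the "classic" LDA/STA version of the same chunk:
        ldy #0
.byte_pair:
        lda chunk_buffer,y
        sta $0002           ; data low byte
        lda chunk_buffer+1,y
        sta $0003           ; data high byte; MAWR auto-increments
        iny
        iny
        cpy #32
        bne .byte_pair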

ccovell

Well, if you're doing HBlank timing, the problem could be this:

Acknowledging the HBlank and setting up the VDC for the next Hblank split requires writing to the VDC register.

Your non-interrupt code might be writing to the VDC as well.

So an H-interrupt might occur at any time between writing the VDC register select, the address low-byte, the address high-byte, and the data (or the Txx instruction), of course.

So, that's 3 different places close together where the wrong data could go to the wrong address/register, causing corruption (seen as BAT / tile errors).
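
A minimal sketch of one common guard (the register/address/data values here are just examples): wrap the whole select/address/data sequence so the H-int can't land in the middle of it. Another common approach is to keep a zero-page shadow of the last register select and have the interrupt handler restore it before returning.

        sei             ; hold off the H-int while the VDC is mid-sequence
        st0 #0          ; select MAWR (VRAM write address)
        st1 #$00        ; address low byte  (example value)
        st2 #$44        ; address high byte (example value)
        st0 #2          ; select VWR (data port)
        st1 #$cd        ; data low byte
        st2 #$ab        ; data high byte
        cli             ; re-enable interrupts once the full sequence is done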

TurboXray

Quote from: touko on 03/04/2016, 05:41 AM
I found on real hardware that large VRAM transfers with Txx can cause some random artifacts on screen (you can see it at startup or after some repeated resets).
Even with 32-byte chunks. The problem is solved with classic LDA/STA transfers (I think the CPU can be halted with those, and not with Txx).
Maybe if a Txx occurs at certain hblank timings the VDC can miss a write because it is busy and the CPU can't be halted? I don't know if that's possible or not, but I can't find any other good explanation.
Of course, these large transfers were done with the display off.
Txx cannot be interrupted by any interrupt (although /RDY works just fine). If you have a setup where the first hsync line is setting display attributes (X/Y position, etc.), it can delay it. Are you using HuC mixed with ASM?

touko

@bonk: I use a custom HuC, but I'm writing all my code in asm and all the routines are custom/rewritten.
I only let HuC manage the bank data.

@chris: No hblank timing; I'm only loading a simple background (no interrupts except vblank). I also tried a single Txx with SEI before it, with no code in my interrupt routines; same result.

touko

Hi, I did some tests on my SGX with the two methods for copying bytes between VRAM:
lda/sta and trb/tsb, with 255 loops. On my SGX (with the display off and on, same result), trb/tsb is much faster; it seems to take about 2 raster lines less than lda/sta.

I don't know exactly how long trb/tsb take compared to lda/sta, but it's fewer cycles for sure.

Switching the x-res to 512 pixels has no effect; the result is more or less the same. Both seem to get faster, but not by a lot.
My tests confirm the ~15 cycles/instruction (trb/tsb on $0002 and $0003); lda/sta are much slower.

lda/sta between the 2 VDCs is roughly the same, even if it is a little bit faster.

The question is: why does TIA get the expected results (7 cycles/byte) when the others don't?
My conclusion is that reading the VDC is much slower than writing.
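
For reference, the lda/sta variant being timed might look something like this (a sketch: it assumes MARR and MAWR have both been set beforehand and register 2, the VRR/VWR data port, is currently selected; Y holds the word count):

.copy_word:
        lda $0002       ; read data low byte from the VRR latch
        sta $0002       ; write it to the VWR latch
        lda $0003       ; read high byte; MARR auto-increments and prefetches the next word
        sta $0003       ; write high byte; the word is committed and MAWR auto-increments
        dey
        bne .copy_word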

elmer

Quote from: touko on 04/17/2016, 10:28 AM
My conclusion is that reading the VDC is much slower than writing.
Interesting results, thanks for doing the tests.

I wonder if Hudson put a 1-deep-write-buffer into the VDC chip so that the CPU wouldn't normally have to be delayed (beyond the single-cycle that we know about), and then the VDC could actually write the 16-bit value a few VDC cycles later when the next access-slot comes around.

It couldn't do that for a read request, and would (potentially) have to delay the processor until the next access-slot, plus maybe an extra VDC cycle to handle delays within the chip itself.

I've got absolutely no proof, this is just an off-the-cuff theory to explain the results.

touko

Interesting, but if you have a read/write buffer, isn't it there so that there's no delay (or at least as little as possible)?

elmer

Quote from: touko on 04/19/2016, 10:55 AM
Interesting, but if you have a read/write buffer, isn't it there so that there's no delay (or at least as little as possible)?
That's why I said "write-buffer", not "read-write-buffer" ... they're different things.  :wink:

IMHO, it wouldn't make a lot of sense to design the VDC to pre-read the read-pointer into a buffer because you've got limited bandwidth, and you're much-more-likely to be using it for writing to VRAM.

Wouldn't a pre-read also be a lot more complicated to implement in silicon, especially for a function that doesn't get used that much?

Don't the timings seem to show that we can write as-fast-as-possible (except for the added 1 cycle on all VDC accesses)?

TurboXray

You know.. I don't remember if I tested just straight reading from the VDC during active display.

touko


TurboXray

I don't think it's reading that's slower, but switching back and forth between reading and writing.

touko

Quote from: TurboXray on 06/06/2016, 06:29 PM
I don't think it's reading that's slower, but switching back and forth between reading and writing.
I noticed the same thing for read/write between the two VDCs.

TurboXray

TED mapping and register info updated on first post with links.

TurboXray

Dynamic tiles:

 You'd figure a topic like this would have been talked/discussed to death, but there was something I thought was simple in approach, yet it helped rid the graphics of the look of short repetitive tile patterns.

 The idea is as follows: have different dynamic images (sets of 8 frames), but make them all share a common tile on the left and the right. Then you can seam them all together into a long pattern that doesn't look like it consistently repeats; it'd look closer to a tilemap layout rather than dynamic tiles.

 Depending on how you implement it, it does have its limitations and resource drag, but it can help break up shallow patterns with little resource overhead if done right.

TurboXray

I find that macros are a nice way to do Case type scenarios (compare lists):
__DecodeMainFX:

        __CreateCase #$0c, __DirectVol
        __CreateCase #$0f, __SetSpeed
        __CreateCase #$0d, __PatternBreak
        __CreateCase #$e9, __NoteReTrigger
  rts

   
   
__DirectVol:
  rts
 
__SetSpeed:
  rts
 
__PatternBreak:
  rts

__NoteReTrigger:
  rts
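
 The __CreateCase macro itself isn't shown; a hypothetical definition matching that usage, assuming PCEAS's \1/\2 macro arguments and its \@ unique-label suffix, could be:

__CreateCase    .macro
        cmp     \1          ; compare the command byte in A against this case's value
        bne     .next\@     ; no match: fall through to the next __CreateCase line
        jmp     \2          ; match: jump to the handler (its RTS returns to the caller)
.next\@
        .endm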

TurboXray

This belongs here:

Quote from: ccovell on 07/05/2016, 07:14 PM
Quote from: TurboXray on 07/05/2016, 12:26 PM
ccovell: How did I not know about this demo!?
I've asked myself that question, too.  :-|  Maybe people were preoccupied at the time.

Anyway, depending on the odd/even phase of the VCE at the time that you make it static, dithered patterns consistently give either a usable red (verging a tiny amount towards orange) channel over 15 shades, or a cyan (harder to wrangle because it is 2 RGB channels combined.)  So:
IMG

Another test demo: https://chrismcovell.com/data/OldIsBeaut-Test.zip

I didn't use any special tools; removing the R channel from 24-bit pictures is good enough, and the R can be remapped separately to greyscale, or some ramped red/grey; the remaining G&B can be remapped down to PCE BG palettes, since there are now 64 colours possible per tile that have to be reduced.

elmer

Quote from: TurboXray on 07/06/2016, 02:37 PM
This belongs here:
Definitely! You guys are talking about really esoteric and fascinating stuff here ... way beyond my level as a "practical" programmer. It's a brilliant read!  :D

TurboXray

Something for PCEAS:

AlignByte256  .macro
  .org ( (* + 255) & $ff00)
 
  .endm

Like this:
    AlignByte256
MyData: .incbin "mydata.dat"

 Just in case anyone didn't know how to do alignment in PCEAS. Plus, the macro makes it look clean.

 Also, if you include your binary on the same line as your label, you can use sizeof() in PCEAS.
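
 For example (hypothetical usage - the exact expression syntax may vary between PCEAS versions), the size can then feed a block-transfer length, such as a TIA upload to the VDC data port, assuming MAWR is set and register 2 (VWR) is selected:

        tia MyData, $0002, sizeof(MyData)   ; alternates writes between $0002/$0003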


touko

I have an idea for map collision detection.
For now I'm doing detection directly in VRAM (very convenient for shooters), but how do you take into account many possibilities, such as destructible or not, instant death, walkable or not?
If you don't have too many possibilities, the trick is to use the tiles' palettes; you can have up to 16 possibilities.
But you also have to deal with the wrap-around scrolling (if your game scrolls, of course).

elmer

Quote from: touko on 08/10/2016, 11:01 AM
If you don't have too many possibilities, the trick is to use the tiles' palettes; you can have up to 16 possibilities.
Ouch, that sounds like a terrible waste of palettes!  :shock:

Traditionally you'd use a separate collision map in main RAM/ROM with all the info that you need.

Your main map would either be tile (8x8) or block (16x16) based.

Then you can either have a full collision map of the same size, or just index the collision/properties based upon the tile or block number.

You can either do the properties as 1 byte per tile/block, or if you just need a number of 1 bit flags, it's easy to access up to 2048 different tile/block attributes with a "bit attribute_table,x" instruction.
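
A minimal sketch of that "bit attribute_table,x" idea (the table layout and names are illustrative): one byte per block, with bit 7 and bit 6 testable straight off the flags, and any other flag via a mask in A:

        ldx     block_num           ; block/tile index, 0..255
        lda     #%00000001          ; mask for a "destructible" flag (bit 0), say
        bit     block_attr,x        ; N := bit 7 (solid), V := bit 6 (deadly),
                                    ; Z clear only if the masked bit is set
        bmi     .hit_solid
        bvs     .hit_deadly
        bne     .hit_destructible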

This sort of thing does (of course) need some tool-support to be usable in practice.

I believe that Mappy and ProMotion, for example, both support a separate collision layer.

Now that ProMotion has dropped the price on the full product, and finally has an older version for free, I can't see much reason for folks to avoid using it.

touko

Quote: Traditionally you'd use a separate collision map in main RAM/ROM with all the info that you need.
Of course, but the idea is to avoid that file.

Gredler

This script for Photoshop popped up in a VFX artist group I'm in. I'm going to give it a shot for some VFX for a side project I'm working on, but I thought it was something someone here might find useful.

The script creates animation sheets from a layered file of animation frames:

https://github.com/bogdanrybak/spritesheet-generator

TurboXray

Extended dynamic tiles: taking the idea of dynamic tiles to a more advanced approach.

Take this current image here:
IMG

 The dynamic tiles for this set would be something like this:
IMG

 The red arrows are there to indicate that the image shifts 8 pixels to its destination form. Typical for dynamic tiles, but here's the catch - they don't wrap. Instead, when the transition is to the last frame, or to the first frame, you update the tilemap itself by shifting the entries over in the map (left or right).

 In this demo, I set up a block of 256 tiles to be the dynamic tileset. Of course, only 121 tiles are used in this example.

 Why do this? Because you can make a whole second BG layer, as a real map layer, that can have these above "objects" anywhere in the map. The reason they can be placed next to each other is that there's also one common dynamic tile on the right or left side of ALL the objects. The skulls and the lava actually can't be placed on the same line (map line) going across the screen for obvious reasons, but the window frames can be placed next to each other, at different vertical positions - etc. The "window" object represents the dynamic objects that I'm trying to show as an example here. The skulls and lava are a special case.

 In this example, the second layer scrolls independently left or right, but not up and down - that's fixed. Only because the solution is a little more complicated, not that it can't be done. And in that case, the skulls would have to be sprites, and the upper-tile lava bubbles would also have to be sprites.

 This type of effect isn't limited to dungeons or caves. Instead, imagine an open area where there is sky and clouds. Instead of the brick being the common connecting block, you could have a solid color sky block (say.. blue). You could even have different horizontal strips of clouds moving at different speeds (parallax), and the foreground would scroll as its own layer (parallax clouds on the far layer, foreground interactive layer). The cloud "objects" wouldn't have to be a fixed pattern or placement either, as long as they are separated by a single common block (8x8) in their "map" section. Each object can also have its own subpalette associated with it, so you're not limited to 16 colors for the whole fake BG layer.

 Now for the resource part: this has to be all done in ASM. The tiles need to be embedded opcodes for speed. 256 tiles @ 4bit color depth take 41k cpu cycles to update in a single screen, or 34% cpu resource. Of course, this being the far background layer - it typically scrolls slower than the foreground layer for most games (not all, though). In that case, assuming the far BG layer scrolls at half speed or multiples of half speed - you can divide that 34% requirement over two frames. And use the VDC VDMA transfer (set the res to high-res mode before doing this) to move the final buffer of dynamic tiles over the map section, or if you like - keep two copies of the BAT active area in vram, updating both as needed, then switch to the alt one once the dynamic tile sequence is finished.
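
 For reference, a sketch of what one "tile as embedded opcodes" entry can look like (the label and pixel values are made up; it assumes MAWR already points at the tile's VRAM address and register 2 (VWR) is selected). Thirty-two byte writes at ~5 cycles each is roughly 160 cycles per tile, which is where the ~41k figure for 256 tiles comes from:

tile_042_frame_3:               ; one 8x8 4bpp tile = 16 VRAM words
        st1 #$18        ; word 0 low byte  (plane 0, row 0)
        st2 #$3c        ; word 0 high byte (plane 1, row 0); MAWR auto-increments
        st1 #$7e        ; word 1 low byte  (plane 0, row 1)
        st2 #$ff        ; word 1 high byte (plane 1, row 1)
        ; ...6 more ST1/ST2 pairs for planes 0/1, then 8 pairs for planes 2/3...
        rts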

 It's also not as easy as that. The BAT has to be updated. This is a read, test, conditionally modify, write method. The fastest way I could find to go about it cost 30% cpu resource (updates a whole screen), but a more flexible method I did that read and wrote vram was 32% cpu resource. Again, depending on how you're using this advanced dynamic tile setup - that could be divided over three frames; ~10.5% per frame. I won't go into details about that, because it would be easier to show what I mean in some demo code. But the above game (PC DOS game) would benefit from that.

 And honestly, I'd probably mix in a few sprites as BG objects (edges) like Ys III does, just to break up the hard edges.

 tl;dr
 You can do a whole screen with special objects inside a dynamic tile system without having to write to a full screen buffer. And of course, do something more advanced than the simple 16x16 block pattern of the typical dynamic tiles done in PCE games.


TurboXray


TurboXray

Here's an example with the cloud objects..

IMG

 This has seven dynamic objects; 212 total tiles. As you can see, the common tile for all of them is the leading blue tile (column). The dynamic objects can be placed anywhere on the 8x8 grid, as long as the one column of common tile separates them.. though they could be separated by more for whatever reason; that would be placement in a tilemap strip.

 This isn't the best example, but it should show something other than the classic brick style that's so common on the PCE.

 Also note, though not pictured above, objects can be joined by a common column of tiles - think about how mountain ranges are connected, or how clouds are normally joined in tilemap setups. So it's possible to have those types of connections too. Or actually mix it up; certain objects belong to certain common-column sharing (tiles).

 Lastly, the red "1" and "2" would represent two different tilemap strips at different speeds. But the drawback to this is that the objects in the second column would need their own distinct dynamic set. They can't share objects of "different speeds" for obvious reasons (the updates to the animation aren't the same between the two map strips). But it allows parallax on the pseudo layer.

touko

If you have fewer than 192 tiles (in H32) and enough VRAM, I think it's better to use DMA for the dynamic tiles.
In your first example, 121 tiles is 31KB of VRAM, easily doable (obviously more on SGX).
You can also mix the two techniques.

You could maybe use 1bpp or 2bpp tiles for the 2nd layer; this technique was used a lot in C64 games.
Note that those tiles were also used for the player's shots.

Of course it's what you call the "classic brick style", but I think done cleverly it can make a very good parallax/2nd-layer effect.

TurboXray

Quote from: touko on 11/12/2016, 05:36 AM
If you have fewer than 192 tiles (in H32) and enough VRAM, I think it's better to use DMA for the dynamic tiles.
In your first example, 121 tiles is 31KB of VRAM, easily doable (obviously more on SGX).
Well, if it's all sitting in vram with all 8 frames, then there's no need to even do a vDMA since you have direct access to any frame *and* you are modifying the map each frame.

 And that's an option. And it would make parallax parts of the map (pseudo layer) for any object pretty easy to do (since you have access to all objects, the scroll speed of the object is directly related to which set of frames you point to).

 But generally, I don't like wasting vram like that unless there's a really big benefit for doing it.

Quote: You could maybe use 1bpp or 2bpp tiles for the 2nd layer; this technique was used a lot in C64 games.
Note that those tiles were also used for the player's shots.

Of course it's what you call the "classic brick style", but I think done cleverly it can make a very good parallax/2nd-layer effect.
Yeah, but those are different because it's one single repeating pattern (brick) - different subject and different approach. The object method uses a map that allows any configuration and placement of those animated objects.

touko

Quote: Well, if it's all sitting in vram with all 8 frames, then there's no need to even do a vDMA since you have direct access to any frame *and* you are modifying the map each frame.
It's more difficult and consumes more CPU to modify each tile entry than to swap tile data, IMO; if you have enough VRAM to spare, VDMA is almost free.

Quote: you are modifying the map each frame.
What do you mean? By hand, with the CPU?

Quote: But generally, I don't like wasting vram like that unless there's a really big benefit for doing it.
Of course you're right, but if you have VRAM to spare, why not?
You can free up the CPU for other purposes ;-) .

TurboXray

Quote from: touko on 11/12/2016, 03:19 PM
Quote: Well, if it's all sitting in vram with all 8 frames, then there's no need to even do a vDMA since you have direct access to any frame *and* you are modifying the map each frame.
It's more difficult and consumes more CPU to modify each tile entry than to swap tile data, IMO; if you have enough VRAM to spare, VDMA is almost free.
More advanced effects need more involved setups and more CPU resource.

Quote
Quote: you are modifying the map each frame.
What do you mean? By hand, with the CPU?
Just as I said earlier, the pseudo BG layer is made up of dynamic tile objects - no simple brick style repeating pattern. Those objects are attached to a separate map layer that gets composited into the regular BAT layer. Once each object completes its frame rotation, it gets set back to #0 and the tilemap is updated with the new position (advance the pseudo tilemap layer to the next 8x8 entry and do a new composite into the regular map/bat).

Quote: Of course you're right, but if you have VRAM to spare, why not?
You can free up the CPU for other purposes ;-) .
*If* you have it to spare, sure. But it highly depends on the setup (how many tiles you want to use or how many sprite frames you want to have in memory to keep updating bandwidth down to a minimum).

touko

Quote: Just as I said earlier, the pseudo BG layer is made up of dynamic tile objects - no simple brick style repeating pattern. Those objects are attached to a separate map layer that gets composited into the regular BAT layer. Once each object completes its frame rotation, it gets set back to #0 and the tilemap is updated with the new position (advance the pseudo tilemap layer to the next 8x8 entry and do a new composite into the regular map/bat).
Ah OK, I see now ..  :P

Quote: *If* you have it to spare, sure. But it highly depends on the setup (how many tiles you want to use or how many sprite frames you want to have in memory to keep updating bandwidth down to a minimum).
I agree, and I think it's more suited to the SGX than the PCE.

TurboXray

Without context this might seem a little confusing, but..

st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
bbr0 zp0,.skip0
rts
.skip0
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
bbr1 zp0,.skip1
rts
.skip1
(Doesn't have to be all ST2 opcodes; can be st1/st2 as well)

 I.e. you can break up long runs of pixel writes into short blocks, and control the length (in a coarse amount) with a bitmask in a series of ZP variables. In this example, I'm writing lines instead of columns because I have the VDC write increment set just right.
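
 A tiny driver sketch for the stream above (the names are hypothetical): each BBRn check falls through to the RTS when its bit in <zp0 is set, so setting bit k (with the lower bits clear) ends the run after k+1 blocks of eight writes:

        lda #%00000100      ; stop after the third block of 8 writes (24 writes total)
        sta <zp0
        jsr line_writer     ; the ST2/BBRn stream shown above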

 My transparency demo (that uses the TF4 BG) could really benefit from this. You can do dynamic tiles as columns, or as single bitmap lines (the VDC allows either write method). Each has its advantages and disadvantages. Column writing allows easy re-positioning to make a large area scroll horizontally with only a few frames - but it's more complicated if you do vertical scrolling (shifting). Line mode allows vertical scrolling, as well as doing hsync sine wave effects and vertical scaling effects, as well as easy vertical mirroring - but is more difficult to scroll horizontally.

 All this is in relation to really large "brick style" dynamic blocks. Stuff half the size of the screen, or possibly larger than the screen itself (I have such a demo effect that uses this, it just needs a real demo to be part of).

 The TF4 transparency demo for PCE, if anyone has seen it, basically leaves the first two planes of a PCE tile (p0,p1) for tile data. That's 4 colors, but more if you use subpalettes (3*16 + 1 = 49 colors to be exact). The second composite tile of the 4bit tile is plane 2/3, which the cpu writes a large dynamic tileset's data to. The cloud layer is made up of three colors, and the tiles are 4 colors. Each color of the cloud layer corresponds to a set of 4 hue-tinted colors in the current subpalette, with color #0 on the cloud layer showing the normal colors of the tile underneath it. Like I said, you can use different subpalettes for any of the tiles, as long as they all have a cloud hue-tinted set in them (which can be whatever, and different from tile to tile).

 Two issues with this approach for the demo: the background "area" that's affected by the transparency part needs to be an actual bitmap buffer. This is easily done with tiles; you just stream the right edge of the screen (off screen) with a single column of tiles when needed. Not a big deal and barely any cpu resource to do this (the nice thing is you can easily do tile flipping this way, which the PCE normally doesn't support). In the TF4 PCE TP demo, only the area where the cloud layer is needs to be this bitmap thingy. The rest of the tilemap can be regular tiles, meaning the buffer doesn't need to be that large if you don't need it to be.

 The second issue: the most efficient way to write the cloud layer. If you've seen the demo, you'll notice at some point that when the map keeps scrolling, the transparency overlay gets stuck. That's because the demo was never finished. But it's also because the demo doesn't handle "wrap around". So what you're seeing is a linear stretch, and then something it can't handle (wrap around). To handle wrap around, you need to be able to write with specific start and stop points. In the TF4 demo, it does "line" mode. This allows it to write 1/8 of the whole image in one long st1/st2 opcode output to the VDC. To put that into perspective: 256x112 (I think that's the height of the cloud layer) would take 256x112 @ 2bit = 7,168 bytes to write to vram. At 5 cycles a byte, that's 35,840 cpu cycles or 30% cpu resource. In actuality though, since there are gaps in the cloud layer, those can be stored as ST2 opcodes - saving one write per 8x1 blank area. For the sake of example, let's say that is 15% blank space. That brings down the cpu write sequence to ~25% cpu resource. Now notice that the cloud layer is at the bottom of the screen? 7,168 * 5 cycles = 35,840 cycles / 455 cycles (one scanline) = 79 scanlines. This means I can actually do this during the top of active display; I have enough time to race the display, leaving vblank free and leaving the rest of active display free (I'll just assume active display is 224 scanlines tall). Another method, if the cloud layer was at the top of the screen instead of the bottom, would be to "trail" the display so the changes being made on that frame don't show as you're writing - blah blah blah.

 Here's the video.. (touko uploaded it)
Draft: I have a little more to write, so I'll either update this post or just post some more..

 Ok, so the second issue isn't cpu resource (at least not yet) but getting the dynamic offset image to the screen image buffer, and having it wrap around. One easy way to do this is to store the image as column data, and after you cycle through all 8 frames of the shifted image, offset the column by +1. Of course, mod (%) by the length for wrap around. The concept is simple. But herein lies the problem: the composite tile format. While this helps us by letting the VDC do the transparency work for us (this is how plane format facilitates transparency effects - a crude way), the composite format is now a hindrance. For column writes, you can only write to one set of plane pairs; this means only 16 bytes can be written before you have to reposition the vram pointer. This is going to take somewhere between 28-34 cycles *if* you embed that into the graphic data itself (using A:X to hold the vram offset, or X as an index to a table). That adds another 15k cycles on top of the 35k (unoptimized version; no gap optimization). Another approach is to write only one line of pixels per tile; write 1/8 of each composite tile. If the height of the image block to write to the screen is 112 pixels, that's 14 lines at 2 bytes each.. so 28 bytes written before a vram pointer reposition is needed. 7,168 / 28 * 34 cycles = 8,704 cycles. Not bad. Almost cut the overhead by half.
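
 For reference, a mid-stream VRAM pointer reposition embedded in the opcode stream could look like this (a sketch using zero-page variables rather than the A:X approach mentioned above; the names are illustrative), which lands in the same 28-34 cycle ballpark:

        st0 #0              ; select MAWR
        lda <vram_lo        ; new write address, low byte
        sta $0002
        lda <vram_hi        ; new write address, high byte
        sta $0003
        st0 #2              ; back to VWR so the ST1/ST2 stream can continue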

If we know the block of data is wider than it is tall, we could optimize for horizontal writes - but this introduces other problems. With column writing, it's easy to offset every 8 pixels when needed. Line writing doesn't allow that. If you break line writing up into smaller segments, like the original code shown above, you can have a string of data at a smaller coarse length to the buffer. You can even jump into the middle of a string of data (opcodes), or anywhere from start to finish. If you think about this as a left side vs right side problem, dealing with wrap around when there isn't alignment, the right side is going to be the problem. The left side can be dealt with by jumping into the middle or whatever offset of the stream length (above is 8 pixel writes, in segments of 8 pixels - and the shift frame takes care of the intra-pixel offset inside that 8-pixel segment).

 But how to deal with the right side? One way is to handle an over-spill area. This is an offscreen area that allows the extra data to have no effect on the display. The downside is that the buffer is now a little bit wider. Not doing an over-spill area means having to manually write out the remaining bytes (pick your poison). Both methods are more complex than the column mode, and both methods require a good-sized jmp/jsr table for offsets. They also require the data of "sets" (shift sets) to be bank-aligned so that one routine works for all data shift sets. And to top it off, you still have to reposition the vram pointer once per "line" write (if you start from the left first, this is only once per line even on a wrap-around point). What's the advantage in cycles? Well, the current cloud layer is something like 256px wide. So that's 32 paired writes (32 8x1 @2bit line segments) for 64 bytes. The code for 8x1 cells has an overhead of 8 cycles (BBR instruction), so that averages out to 5.5 cycles a byte written in a block of 8 segments (16 bytes). And only one 26-34 cycle overhead for the vram reposition. But you'll have overhead from the spill area write, each line, to take into account.

 In the end, the line method will be super convoluted and might only be slightly faster than the column mode version, and generating those tables for the offsets is going to be a huge pain in the ass - but all said and done, the line method would allow you to do sine wave effects both horizontally and vertically like with a normal PCE map/bg layer, on top of having vertical scrolling ability too (animate the layer scrolling up from the bottom). The column method is easier, but can't do anything like the line method can do. So like I said, super convoluted, but it's also a one-and-done type of deal. Once working, it'll be a really powerful effect for the PCE.

 As far as those jump tables are concerned, I'd most surely write a PC app to generate that code. No amount of macros in PCEAS is going to make that an easy job.

 Is this extreme? You bet. But is this doable? Completely. And from a cpu resource perspective, incredibly doable. It might not be representative of what any dev would do back in the day, but this isn't what that's about. This is about pushing the system to its limits - to see what it can do.

 Just to note: the cloud layer does not have to remain static. It can scroll at its own speed in either direction (right or left). Both methods work, and both methods allow the cloud layer to scroll left or right, but the line method allows for additional effects to be applied to that layer.

 Also note, the transition line.. right above the cloud layer - those are no longer 2bit tiles. They're 4bit tiles, allowing the mountain range to use 4 colors total, and still have a 5th one as well as any static cloud pixel data (more colors). So no, the whole screen doesn't need to be made up of 2bit colored tiles. But even for the areas that are, you still have subpalettes to break up the color usage, and the transparency layer will still apply to those subpalettes.

touko


roflmao


TurboXray

Fixed some typos. I should do a proper column mode one, with additional subpalettes. But I'll need something other than a mountain range to show it off.

TurboXray

Is anyone interested in doing some quick pixel art, tilemap work?

 I'm looking for a 256x4096 pixel image (for vertical scrolling), made of 8x8 blocks/tiles of 4 colors. Basically 2bit color, so 3 unique colors per tile plus one common global color. It can use any of the 16 subpalettes for those 3 unique colors.

 I have a vertical scrolling demo that I'm going to code over Thanksgiving weekend. Just need some graphic assets. Just to note, the tile count can be up to 4096 tiles or more, and tiles can be vertically or horizontally flipped. Space or canyon, or whatever theme you want (vertical shmups) - it can transition. If you're not an artist but have suggestions of graphics to use that could be converted down to these limitations, post them! A la SpaceMegaforce, Musha, etc.

ccovell

Here are some graphics that fit the spec.  You don't have to give me credit in your final game.

IMG


esteban

Quote from: ccovell on 11/17/2016, 09:05 AM
Here are some graphics that fit the spec.  You don't have to give me credit in your final game.

IMG
Rock•On

TurboXray

Not sure that serves my purpose. Hmm.. Tell you what, I'll make it unlockable via a control code.

johnnykonami

Quote from: ccovell on 11/17/2016, 09:05 AM
Here are some graphics that fit the spec.  You don't have to give me credit in your final game.

IMG
Dammit, I thought today at work (at my new job) I would just browse PCEFX for a minute, then this came up.  Thanks, Trump!

ccovell

Quote from: TurboXray on 11/17/2016, 06:49 PM
Not sure that serves my purpose. Hmm.. Tell you what, I'll make it unlockable via a control code.
Let me guess: waggle the joystick back and forth for 30 seconds to unlock?

TurboXray

Yuss! And.. maybe.. if detected while the code is being entered.. a little "fap" icon will appear on and off in sync.

TurboXray

I have a lot of ideas for doing stuff on the PCE, some public and mentioned here, and some not (private). I really want to get around to showcasing these ideas in some demo form. It's fine and dandy to talk about them, but I know in my heart of hearts that no one is probably going to implement them. Since these demos I'm working on are to do with tips and tricks, and how to implement them - I'll post my progress here.

A BG layer made up of sprites. I know I've already posted a bit on this, in this thread, but now that I have some free time (finished the last of my finals on Thursday), I wanted to demo this idea.  So this is what I'm working on:

IMG
I'm doing this one by hand (making the decoding LUTs for the above metatile set), and to get through it quickly (making the tables), I have some redundant tiles in vram. Just ignore those for now.

 Some perspective: for this demo the gameplay/action window is 208x176 (it could be longer with a status bar, which is irrelevant).  The foreground layer is a map where each entry is a 32x32 pixel metatile. Each entry is used to index a series of LUTs, to break down the metatile into hardware sprites. To keep things optimal, it's best to have the majority of metatile entries decode directly into a single hardware 32x32 sprite. This is to keep the SATB usage as low as possible, as well as the number of objects per scanline to a minimum (not pixels per scanline, but objects per scanline - since now with a clipped display of 208, the sprite pixel limit exceeds the width of the display by 48 pixels). Some entries are made up of paired 32x16 or 16x32 hardware sprites, to save on vram wastage. Some even have 16x16 sprite entries. The blue blocks in the metatiles, specifically in pairs, represent no hardware sprite.

 So a screen display of 208x176 has a max object capacity of 7x7 metatiles. I'm keeping this example simple by parsing every metatile of the map, relative to the display area, every time there is screen movement. I could optimize this to cover just the 2 sides of the screen (diagonal direction for scrolling), and cache the hardware objects so I only have to update their X/Y positions instead of rebuilding them every frame (on a scroll change), but the complexity of such a map engine takes time. And there are some other ideas/demos I want to make in my between-semester break. So, every time there's a scroll change, every object gets re-decoded. I've set the limit at 300 cpu cycles per 32x32 metatile decode, and assumed the worst-case scenario where all metatiles translate into real hardware sprites (blank or null entries are normally quickly bypassed), so I'm looking at 7x7 = 49 x 300 = 14,700 cpu cycles or 12% cpu resource. And in that process of decoding, I'm also filling/updating a buffer/temp collision map in ram. So some of that 300 cycles is used for the collision map. So far in the decode routine that I've written, I haven't come close to the 300 limit, but I'll know in the end.
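
 A rough sketch of one decode step for the common case (a metatile that maps to a single 32x32 hardware sprite); every table and buffer name here is illustrative, and the X/Y position words (SATB words 0 and 1) are assumed to be filled in separately from the metatile's screen row/column plus the fine scroll:

        ldy     metatile_index      ; entry read from the 32x32 metatile map
                                    ; X holds the SATB entry offset (8 bytes per entry)
        lda     meta_pattern_lo,y   ; SATB word 2: sprite pattern code
        sta     satb_buf+4,x
        lda     meta_pattern_hi,y
        sta     satb_buf+5,x
        lda     meta_attr_lo,y      ; SATB word 3: CGX/CGY size, palette, priority
        sta     satb_buf+6,x
        lda     meta_attr_hi,y
        sta     satb_buf+7,x
        ; (the temp collision map entry would be filled from another LUT in the same pass)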

 Just FYI - this foreground map setup isn't really made to simulate something like the Super Mario games; I just wanted something to show that was more than just "blocks". This method has limitations that affect design, but some games (with or without modification) could be used to represent what you can do with this. One of those limitations is depth. And by depth, I mean that the surface the main character walks on should be a solid line (how they walk on it and not necessarily how it's drawn). There are some careful designs where this doesn't have to be true, but then level design gets more complicated.

I think this video of Super Adventure Island presents some stages for visualization:
When I mentioned trying to keep a "flat surface" for the character to walk on, I don't mean that the pixels have to be flat all the way across the surface. In the video, look at 2:52 at the top of the stones graphics. Higgans walks on them as if they were flat, but clearly you can see small pixel gaps on the surface. This is fine. In the same level, the dirt/ground foreground area is fine, but the grass "foliage" should be made a little bit more sparse if it's going to appear in front of or behind the character.

 @13:19 - is probably the perfect example of how to use this sprite foreground method.
 @17:06 - the sprite map as the back layer, and the hardware BG layer as the foreground layer.
 @21:00 - imperfect surface (snow) treated as perfect flat surface and slopes. Perfectly doable. Trees are fine too. The snowflakes, in front or behind the sprite object layer, works too. Might be some slight issues for the fortress graphic at the end of the stage (would have to be modified).
@23:50-24:16 - implements fine with sprites, even the columns at the end. But the transition beyond 24:16 would have to be handled a little differently. But once @24:28, then it's fine again (switch to BAT as foreground, sprites as background).

TurboXray

I updated the main thread title and added a tools section to the first post. I'll add more links. If you guys have tools, post the links there and I'll update the main post.

 Added the BizHawk emulator for PCE; its main feature is Lua scripting. Perfect for when you need a graphic overlay of what's happening in your game (or hacking in general). Really nice and easy to use.

TurboXray

I was adapting my palette sorting app for 2bit and 3bit tile support when I came across a small bug in the merging routine. For some reason it was off by 1, so it only merged if the source palette was less than the destination palette instead of <=. It didn't have a big impact on 4bit tiles, but it sure did on 2bit and 3bit. But.. it did have a surprising effect:
IMG

Now the difference is between 1 and 3 subpalettes smaller for 4bit images. Hah! A nice xmas present for me :D

esteban


elmer

Quote from: TurboXray on 12/25/2016, 12:16 PM
Now the difference is between 1 and 3 subpalettes smaller for 4bit images. Hah! A nice xmas present for me :D
Nice! It's always good to fix a bug and have it make a noticeable improvement.   :)