CC65 and the PCE

Started by elmer, 02/27/2015, 02:36 PM

elmer

#100
Quote from: TheOldMan on 02/23/2016, 03:28 AM
Quote: And I guess really, the real dilemma for me is the whole Squirrel thing.
Don't worry about it. From what I've seen, we can put the player in whatever page we want, and fix the startup/irq code to handle it. It shouldn't be any worse than the HuC modifications.

My worry is getting the HuC functions to work. Guess I'll install this, and see what I can do.
It really shouldn't be too hard to get any ASM source transferred over.

It's the subtle changes in syntax that'll annoy you.

Like "TAM #$40" instead of "TAM #6", or the horrible old original "LDA (<$F8),Y" instead of "LDA [$F8],Y".

I might see if I can hack it to accept the square bracket syntax (the TAM syntax doesn't bother me, since that's what Mednafen displays ... although I could always hack that, too).

EDIT:

They do allow "TAM6" instead of "TAM #6".

Looks like it should be easy to change the '(' to a '[', but that will probably kill the assembler's ability to properly deal with the SNES's 65816 24-bit "far" addresses.

I suspect that we don't really care too deeply about that!  :wink:

Oh ... one last thing, the MACRO syntax is totally different.
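
For anyone porting source, the two differences mentioned above look roughly like this. It's only a sketch in ca65 syntax with the pceas spellings in comments; the bank number and the $F8 pointer are arbitrary example values, not taken from any real project.

```asm
        .setcpu "HuC6280"

        lda     #$01            ; bank number (arbitrary example value)
        tam     #$40            ; ca65: operand is a bit mask, bit 6 = MPR6
                                ; pceas: tam #6 (operand is the MPR index)

        ldy     #0
        lda     ($F8),y         ; ca65: standard 6502 parentheses
                                ; pceas: lda [$F8],y
```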

Gredler

Let me just say thank you guys for furthering the tool progression. For non-programmer types like myself, it's endlessly helpful to read through your opinions and documentation! Bravo, I owe you guys some beers or sodas!

MooZ

I've started working on some pceas/ca65 stuff.
The macro syntax is different. You can get something close, but not close enough to avoid doing the job twice.
Example:
https://github.com/BlockoS/HuDK/blob/master/include/pceas/word.inc
vs
https://github.com/BlockoS/HuDK/blob/master/include/ca65/word.inc
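
To give a feel for why the job has to be done twice, here is a hedged sketch of one macro in both flavours. The `stw` (store word) macro and its argument names are hypothetical illustrations, not copied from the linked files, and the pceas version is shown from memory as comments.

```asm
        .setcpu "HuC6280"

; ca65 flavour: named parameters, closed with .endmacro
.macro  stw     value, dest
        lda     #<(value)
        sta     dest
        lda     #>(value)
        sta     dest+1
.endmacro

; pceas flavour, kept as comments because ca65 will not assemble it:
; the macro name is a label and the parameters are positional (\1, \2)
;
; stw   .macro
;       lda     #low(\1)
;       sta     \2
;       lda     #high(\1)
;       sta     \2+1
;       .endm

        stw     $1234, $20      ; stores $34 at $20 and $12 at $21
```

The bodies are nearly identical, but the declaration, the argument references, and the low/high byte operators all differ, so a shared include file isn't really possible.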

elmer

Quote from: elmer on 02/23/2016, 11:39 AM
It's the subtle changes in syntax that'll annoy you.

Like "TAM #$40" instead of "TAM #6", or the horrible old original "LDA (<$F8),Y" instead of "LDA [$F8],Y".

I might see if I can hack it to accept the square bracket syntax (the TAM syntax doesn't bother me, since that's what Mednafen displays ... although I could always hack that, too).

...

They do allow "TAM6" instead of "TAM #6".

Looks like it should be easy to change the '(' to a '[', but that will probably kill the assembler's ability to properly deal with the SNES's 65816 24-bit "far" addresses.
I guess that I should really ask the programmers here ... do you care about this stuff?

In some ways, I don't like the idea of changing CC65 "just-for-the-sake-of-it".

As it is now, it's using the official syntax for both the HuC6280 and the 65816.

It's pceas that's got the "TAM" wrong, and while I personally find the '[' indirection easier to read than the '(' version, and while there are lots of 6502 assemblers that use it ... it's not the "official" syntax.

I think that I can make CA65 accept either syntax (on everything except the 65816) ... but is that an "ugly" hack that just shouldn't be done?

How do you guys feel about it?

MooZ

I would fix the non standard compiler instead.

OldMan

Quote: How do you guys feel about it?
Personally ... I'll adapt. Either '[' or '(' is fine.

Can I macro (something like) MPR6 = $40?

elmer

Quote from: TheOldMan on 02/24/2016, 02:04 AM
Quote: How do you guys feel about it?
Personally ... I'll adapt. Either '[' or '(' is fine.

Can I macro (something like) MPR6 = $40?
Do you mean just a regular constant, i.e. "TAM #MPR6"? Sure, that would work.

If you use the "TAM6" syntax, then you could just write a macro for that on pceas.
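
A minimal sketch of the constant approach in ca65, assuming you define the mask yourself (the `MPR6` name comes from the question above; the bank number is an arbitrary example value):

```asm
        .setcpu "HuC6280"

MPR6    = 1 << 6                ; = $40, the mask bit for MPR6

        lda     #$02            ; bank number (arbitrary example value)
        tam     #MPR6           ; assembles the same as tam #$40
```

Writing the constant as a shift keeps the MPR index visible, which is handy if you're used to the pceas "TAM #6" form.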


Quote from: MooZ on 02/24/2016, 01:41 AM
I would fix the non standard compiler instead.
??? CC65 ???

CC65 has properly compiled the standard C89-compliant code that I've thrown at it ... which HuC's small-C-based compiler doesn't.

I'm not sure what you mean.

MooZ

You misunderstood me :)
I meant pceas.

elmer

Quote from: MooZ on 02/24/2016, 12:42 PM
You misunderstood me :)
I meant pceas.
Ahhh ... I thought that was probably what you meant, thanks for the clarification.

--------------

I've asked the CC65 developers what their opinion is on putting in an option for using square brackets instead of parentheses.

As a "programmer", I personally prefer the square bracket, because it makes things easier to read (to me), and because I like to put expressions in parentheses, particularly when they are complex expressions, so I prefer

    lda (structure+member_offset),y

rather than

    lda structure+member_offset,y

It's just not possible to use a complex expression in some 6502 instructions if the opening parenthesis of that expression changes the meaning of the instruction.

But, like TheOldMan, I can adapt if necessary.
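
The ambiguity described above can be sketched like this in stock ca65 (no extra features enabled). The `structure` address and `member_offset` value are hypothetical, and the leading `0+` is just one common workaround, not something the CC65 developers have recommended in this thread.

```asm
        .setcpu "HuC6280"

structure       = $2200         ; hypothetical table address
member_offset   = 4             ; hypothetical field offset

        ldy     #0
        lda     structure+member_offset,y       ; absolute,y: reads $2204+Y

        ; lda   (structure+member_offset),y     ; a leading '(' selects (zp),y,
        ;                                       ; and $2204 is not a valid
        ;                                       ; zero-page operand

        lda     0+(structure+member_offset),y   ; operand no longer starts with
                                                ; '(', so it stays absolute,y
```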

MooZ

I agree. The code is more readable with square-brackets.

elmer

Quote from: elmer on 02/24/2016, 12:57 PM
I've asked the CC65 developers what their opinion is on putting in an option for using square brackets instead of parentheses.
They're OK with the idea in principle ... so now it's all up to the details.  :)

I've submitted a patch to them that will allow us to use the square-bracket syntax in any assembly source file, controlled by a ".feature" setting.

I look forward to hearing what changes they ask for.


Quote from: Gredler on 02/23/2016, 01:33 PM
Let me just say thank you guys for furthering the tool progression, for non-programmer types like myself it's endlessly helpful to read through your opinions and documentation! Bravo, I owe you guys some beers or sodas!
Definitely beer ... I've heard that soda is bad for you!  :wink:

As I said a year ago in this very thread ...


Quote from: elmer on 02/28/2015, 08:28 PM
I'm just stirring things up a bit with a different set of expectations.

If I don't back up my observations with some action to improve things ... then you'll have every right to just dismiss me as a whining idiot.

Arkhan Asylum

Quote from: TheOldMan on 02/23/2016, 03:28 AM
Quote: And I guess really, the real dilemma for me is the whole Squirrel thing.
Don't worry about it. From what I've seen, we can put the player in whatever page we want, and fix the startup/irq code to handle it. It shouldn't be any worse than the HuC modifications.

My worry is getting the HuC functions to work. Guess I'll install this, and see what I can do.
Yeah, I'd figure the HuCard stuff is the trickier part since the CD one just calls into existing stuff. Since potentially using CC65 lets us literally pick everything, we can stop things from getting shoved around how we don't want.

I haven't had time to look at PCE code due to MSXing.   


I'd also be more worried about the screen/vram related library functions, because they are more involved/goony.



As for the ( vs [.

Use both.   

tniasm for MSX supports () and [] for indirection.   It's nice to have that kind of support.


elmer

I've got no idea how long it's going to take to get the CC65 patch approved and put into the main line.

If anyone wants/needs a custom-build of CC65 with the new "square-bracket" option, then I can upload one to my dropbox.

OldRover

As long as by making things standard you don't introduce a whole shitload of additional bloat, I'm all for that. :lol:

elmer

Quote from: OldRover on 02/28/2016, 07:57 PM
As long as by making things standard you don't introduce a whole shitload of additional bloat, I'm all for that. :lol:
It's a tiny patch. The "documentation" and the stuff for enabling the patch by using the ".feature" command together dwarf the actual patch itself.

Here you go, it's in the "queue" ...

https://github.com/cc65/cc65/pull/269

elmer

FWIW, the option for using "[]" for indirection is now in the mainline CC65 repository.  :)

https://github.com/cc65/cc65
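
For reference, a small sketch of what the option looks like in use, assuming the merged setting is the `bracket_as_indirect` feature described in the ca65 documentation. The addresses are arbitrary example values.

```asm
        .setcpu "HuC6280"
        .feature bracket_as_indirect

        ldy     #0
        lda     [$F8],y                 ; (zp),y indirect, pceas-style
        lda     ($2200+4),y             ; parentheses now only group: absolute,y
        jmp     [$FFFE]                 ; indirect jump
```

With the feature on, a leading '(' no longer selects an indirect addressing mode, so complex expressions can be wrapped freely. It shouldn't be combined with 65816 mode, where '[' already means a "far" indirect.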

MooZ


freem

alright, I've been trying to get the hang of doing PCE development with ca65 and ld65...

Managed to load a font and write to BAT, though I've been having some trouble dealing with VRAM locations...

IMG

rom and source code is available here; requires gnu make and a recent ca65/ld65 set.

With relation to VRAM troubles, I have to offset the location the file gets loaded by 4 (it was previously 3 before I changed how I was calculating the VRAM address)... not sure how to go about fixing this, so I'm hoping one of the PCE gurus will check it out and lend me a hand :p

touko

#118
Hi, first you set your VRAM position badly

you did it like that:
st0 #VDCREG_MAWR
st1 #<$0804
st2 #>$0804

It must be:
st0 #VDCREG_MAWR
st1 #<($0800 >> 4)
st2 #>($0800 >> 4)

All VRAM read/write addresses must be (my_address >> 4), and for sprite patterns (my_address >> 5)

PS:Your source code is very clean  :wink:

freem

#119
Quote from: touko on 03/11/2016, 04:31 AM
PS:Your source code is very clean  :wink:
thanks, that was one of my goals :)

Quote from: touko on 03/11/2016, 04:31 AM
you set your VRAM position badly

you did it like that:
st0 #VDCREG_MAWR
st1 #<$0804
st2 #>$0804

It must be:
st0 #VDCREG_MAWR
st1 #<($0800 >> 4)
st2 #>($0800 >> 4)
I just tried this out in mednafen, and this seems wrong... the VRAM looks like this:

dropboxusercontent.com/u/6447287/pcedev/800_lsr4-vram.png

Quote from: touko on 03/11/2016, 04:31 AM
All VRAM read/write addresses must be (my_address >> 4), and for sprite patterns (my_address >> 5)
That sounds right for the BAT and SATB areas, but I'm having trouble with just getting the tiles into VRAM properly...

if I just use $0800 as the VRAM source address, a la

st0 #VDCREG_MAWR
st1 #<$0800
st2 #>$0800

this is what I get in the VRAM viewer:

dropboxusercontent.com/u/6447287/pcedev/800_regular-vram.png

(edit)
and for reference, here's what the VRAM looks like when I load into $0804:

dropboxusercontent.com/u/6447287/pcedev/804_dumbluck-vram.png

touko

#120
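EDIT: Sorry, I was wrong; the VRAM read/write addresses are good  :P
You only shift tile/sprite addresses, not VRAM ones.

I tried to load your font with pceas and it works fine.

Try this for loading your font:
      lda #.bank( gfx4BPP_font )   ; bank that holds the font data
      tam #2                       ; pceas syntax: MPR index 2 (ca65 wants the mask, tam #$04)
      inc A
      tam #3                       ; next bank into MPR3 (ca65: tam #$08)

      st0 #0                       ; VDC register 0: MAWR (VRAM write address)
      st1 #<( $800 )
      st2 #>( $800 )

      st0 #2                       ; VDC register 2: VRAM data write
      tia gfx4BPP_font , $0002 , $800   ; block copy to the VDC data port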

elmer

#121
Your vram_clearBAT code isn't correct ... you're using Y without initializing it.

Your loop should look more like ...


                lda     #$80
                ldx     #32
@rowLoop:       ldy     #32
@colLoop:       sta     a:VDC_DATA_LO
                stz     a:VDC_DATA_HI
                dey
                bne     @colLoop
                dex
                bne     @rowLoop



Your font loading code isn't correct ... you're using Y without initializing it (because font_loadSize is $0800).

                ; load loop goes here
                ldx     font_loadSize
                beq     @checkOuter
                cly
@fontLoadLoop:  lda     (fontDataAddr),y



Fix those bugs, and you'll find that you can write to VRAM $0800 as you expect.

freem

Quote from: touko on 03/11/2016, 05:47 AM
I tried to load your font with pceas and it works fine.

Try this for loading your font:
...
tia gfx4BPP_font , $0002 , $800
I keep forgetting I'm not coding straight 6502 and have access to things like this; thanks :)
Though I did have to change $800 to $800*2 since the VRAM expects word values.
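
In case it helps anyone else, the word/byte mismatch can be sketched like this. It assumes the font is $0800 words long (as the $800*2 fix suggests), and `FONT_WORDS`, `VDC_DATA_LO`, and the `gfx4BPP_font` address are placeholder definitions so the fragment stands alone.

```asm
        .setcpu "HuC6280"

VDC_DATA_LO     = $0002         ; VDC data port, low byte
FONT_WORDS      = $0800         ; font size in 16-bit VRAM words (assumed)
gfx4BPP_font    = $4000         ; placeholder source address for this sketch

        st0     #0              ; select MAWR (VRAM write address)
        st1     #<$0800         ; VRAM addresses count words, not bytes
        st2     #>$0800
        st0     #2              ; select the VRAM data register
        tia     gfx4BPP_font, VDC_DATA_LO, FONT_WORDS * 2   ; TIA length is in bytes
```

TIA alternates its destination between $0002 and $0003, so every pair of source bytes becomes one VRAM word.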

Quote from: elmer on 03/11/2016, 12:13 PM
Your vram_clearBAT code isn't correct ... you're using Y without initializing it.
aha, so I was getting by on pure luck. ;) I figured that routine was broken somehow, due to being a late night coding exercise.

Quote from: elmer on 03/11/2016, 12:13 PM
Your font loading code isn't correct ... you're using Y without initializing it (because font_loadSize is $0800).
ah, there we go, that would be it. guess I wasn't running the branch code properly in my head :)

Thanks for your help, elmer and touko; this weekend is going to be a lot of fun.

here's the fixed example, in case anyone wants it:
http://www.ajworld.net/pcedev/pce-example01_ca65-fixed.zip
other comments/critique are welcome :)

touko

#123
Quote: I keep forgetting I'm not coding straight 6502 and have access to things like this; thanks :)
Though I did have to change $800 to $800*2 since the VRAM expects word values.
Of course, $800 was just a value to see if your font starts to load correctly at $800 in VRAM.

Quote: other comments/critique are welcome :)
A little tip: there's no need to write the LOW byte to VRAM on every pass through your loop if it's always the same, because it's buffered.
Write it before the loop, and write only the HIGH byte in your loop; it's 2x faster  :wink:
For example, writing 32 words to VRAM:

   ldx #32
   lda #low_byte
   sta $0002
loop:
   stz $0003
   dex
   bne loop

elmer

Quote from: touko on 03/11/2016, 02:22 PM
A little tip: there's no need to write the LOW byte to VRAM on every pass through your loop if it's always the same, because it's buffered.
Write it before the loop, and write only the HIGH byte in your loop; it's 2x faster  :wink:
For example, writing 32 words to VRAM:

   ldx #32
   lda #low_byte
   sta $0002
loop:
   stz $0003
   dex
   bne loop
Hmmm ... I'm not sure if that counts as "very-clever", or "fugly". Either way ... for-gawd's-sake, please put some comments in the code when you do that, or it'll bite you in the ass when you least expect it.

BTW ... has it been confirmed that you get the full-benefit of the trick on real hardware?

I'm curious if the loop is still slow enough to avoid overrunning the CPU-to-VRAM bandwidth and causing cycle stalls from that (the way that we've only just found out about the unexpected TSB/TRB delay).

touko

#125
Quote: BTW ... has it been confirmed that you get the full-benefit of the trick on real hardware?
Yes, tested on my SGX: the 2 bytes are buffered, and written to VRAM only when you write to $0003.
You cannot update only the low byte in VRAM.

It's very well explained in Charles's doc.

elmer

Quote from: touko on 03/12/2016, 01:41 PM
Quote: BTW ... has it been confirmed that you get the full-benefit of the trick on real hardware?
Yes, tested on my SGX: the 2 bytes are buffered, and written to VRAM only when you write to $0003.
You cannot update only the low byte in VRAM.

It's very well explained in Charles's doc.
I'm not arguing about how the buffering works ... just questioning the actual speed improvement.

For instance ...


       st1 #$00 ; 4+1 cycles
.loop: st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       st2 #$00 ; 4+1 cycles
       dex
       bne .loop


Will this really clear VRAM 8-words-at-a-time, at 40 cycles per 16 bytes, i.e. 2.5 cycles per byte?

As Bonknuts found out with the TSB/TRB test ... there was an unexpected delay in the turn-around between the read and the write, presumably caused by the VDC having to wait for a CPU read/write slot in the VRAM cycle timings.

touko

#127
Ah, OK, I understand. Of course it's faster from a dev point of view (and I'm not taking any latency into account here), but I don't know if it's really faster, given possible latency between each write. I also don't know if those latencies exist for stx; I think they are present for lda/sta as well.

The tsb/trb case is different because it reads and writes the same VRAM region (it needs 2 CPU slots), and bonk's tests were in low-resolution mode.
I'll do some tests with the 2 methods to see if there is a difference.

Quote: presumably caused by the VDC having to wait for a CPU read/write slot in the VRAM cycle timings.
I think so, and if it's the case, latency should be done in MED/HIGH res mode.

TurboXray

The speed penalty from my TRB/TSB comes from the VDC doing something internal going from VRAM read to VRAM write - hence the delay.

 I didn't encounter any additional stuff for just sequential back-to-back writes (on screen or otherwise). Though technically you could hit an unavailable slot during active display, but that's going to be in partial master clock cycles (/RDY) and not whole instruction cycles. But I never noticed it in the timings of my sequential write tests (the error is probably spread over too many writes to be noticeable).

touko

#129
Quote: The speed penalty from my TRB/TSB comes from the VDC doing something internal going from VRAM read to VRAM write - hence the delay.
Yes, TRB/TSB assume that there is no delay, because they suppose you are using them on RAM. That's not the case with the VDC: you can only write/read when a CPU slot is available, and this can cause a delay, the most being at 256 px res.

TurboXray

Quote from: touko on 04/08/2016, 04:22 AM
Quote: The speed penalty from my TRB/TSB comes from the VDC doing something internal going from VRAM read to VRAM write - hence the delay.
Yes, TRB/TSB assume that there is no delay, because they suppose you are using them on RAM. That's not the case with the VDC: you can only write/read when a CPU slot is available, and this can cause a delay, the most being at 256 px res.
But that's the point; I don't think it's a CPU slot availability thing with TRB/TSB. I think it's something more. Because I can do a series of st2 or sta on the MSB, or straight read lda on the MSB, and never really see much of that slot offset phase in negative performance. With TRB/TSB, on the MSB, it has something to do with the VDC switching from a VRAM read operation to a VRAM write operation - and not the CPU access slot availability.