
PCE PCM

Started by TurboXray, 11/25/2016, 05:54 PM


TurboXray

I was digging through my PCM code from over the summer and looked at an unfinished project: 15.3khz, four 8bit PCM channels using just 2 hardware channels, for a total of 8 channels.

 It has no frequency scaling or "mod"-type sample-based synth, just regular sample playback. I had problems mixing in the last of the 4 channels. So yesterday I fixed it, and it's all working. There's software volume for each mixed channel too. I use the video scanline interrupt to play back a buffer of mixed samples to a paired-channel 10bit output. I added up all the numbers, and it's 47% cpu resource to do this. Probably not an attractive enough number for some projects, but it's a working proof of concept.

 So I started modifying it for 7khz output (no video interrupt required). I'm looking at ~24% cpu resource for the same thing, but at 7khz. So 8 channels on PCE, 4 of them PCM at 8bit resolution. I think someone was interested in this (touko maybe?). I dunno, but considering Air Zonk eats up that same amount for its music engine with just one channel - I thought that was pretty good. It's possible to mix more software channels in, but it doesn't seem worth it. I mean, what are you going to do with something like 8 PCM channels anyway???

 I might be able to drop the resource down a couple of percentage points on the 7khz 4-channel version with some optimization. I'll have to see what I can do.

Update:
 Here's my batch of 7khz vs 14khz sample-scaling demo roms. On the real system, the 14khz performs better than on emulators thanks to the analog filtering. It's still not as big a difference as I expected for doubling the frequency, but there is more 'punch' to some of the samples. Or at least on my stereo system. http://www.pcedev.net/HuPCMDriver/7khz_and_14khz.zip <- try them out on the real system (not an emulator).

TurboXray

http://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8bit samples mixed at a time. Anyone have any good 4 sample mix set they can think of to demo this driver?

touko

QuoteSo 8channels on PCE, 4 are PCM at 8bit res. I think someone was interested in this (touko maybe?)
Yes, of course, even if I already have a 2-channel PCM with compression.

elmer

#3
Quote from: TurboXray on 11/26/2016, 01:18 AMhttp://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8bit samples mixed at a time. Anyone have any good 4 sample mix set they can think of to demo this driver?
Cool, I really look forward to studying this!  :D

But there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!  :wink:

Now ... if I can cut it down to 2 channels of 8-bit sound, then that seems good to me.

If the sample channels can't be tuned, then they're limited (in practice) to percussion and speech/sound-effects anyway.

BTW ... do you have an estimate of the CPU time taken for 2 5-bit channels with volume control?

As far as a "new" driver goes ... I'm curious about using the ADPCM hardware for a drum channel.  :-k

<EDIT>

Whoops ... I thought that you'd released source code rather than a ROM demo.  :oops:

Sure ... it sounds good!

There's a little audible crackling in mednafen, but that could easily just be mednafen.

TurboXray

#4
So here's a basic demo of 4 PCM channels playing at the same time: a long stream followed by 3 other "FX"-type samples playing shortly into it (a voice, a Gate of Thunder explosion, and a Gate of Thunder voice).

 So this is how I'm handling this:
 The 4 channels are actually a set of paired soft-mixed channels. Each pair reads in an 8bit sample, uses a volume table to adjust the sample, and then adds them together. But I don't store this as 9bit; I store it as 8bit, which means I saturate on overflow. I do the same for the next pair, but this time the output is 9bit. Using a table - precalculated as multiplying all samples by 2 (so 9bit becomes 10bit) and divided into upper 5bits and lower 5bits (it's a split table) - the result is stored in a set of buffers.
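The pair-mix scheme above can be sketched on the host side in Python. This is a sketch of the idea, not the actual 6280 code; in particular, the exact volume-table scaling (here a 0..15 linear multiply) is my assumption:

```python
# Sketch of the 4-channel mix described above: two pair-mixes that
# saturate back to 8 bits, then the two pair results added to 9 bits,
# doubled to 10 bits, and split into upper/lower 5-bit halves for the
# paired PSG channels. Volume scaling (0..15 linear) is an assumption.

def mix_pair_8bit(a, b, vol_a, vol_b):
    """Mix two unsigned 8-bit samples; saturate the sum back to 8 bits."""
    s = (a * vol_a) // 15 + (b * vol_b) // 15   # software volume, 0..15
    return min(s, 0xFF)                          # saturate on overflow

def mix_frame(ch0, ch1, ch2, ch3, vols):
    """Mix one sample from each of 4 channels into a 10-bit value,
    returned as (upper 5 bits, lower 5 bits) for the paired channels."""
    pair_a = mix_pair_8bit(ch0, ch1, vols[0], vols[1])  # 8-bit, saturated
    pair_b = mix_pair_8bit(ch2, ch3, vols[2], vols[3])  # 8-bit, saturated
    mixed9 = pair_a + pair_b     # 9-bit; no saturation needed here
    mixed10 = mixed9 * 2         # what the precalculated split table does
    return mixed10 >> 5, mixed10 & 0x1F
```

In the real driver the multiply and the 10-bit split are table lookups, not arithmetic; the Python just shows the math the tables encode.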

 So while the H-int PCM driver is always outputting 10bit audio (in this 15.3khz version), the "mixer" can do all kinds of things, in all kinds of configurations (taking whatever resource it needs to do it). You can use different mixers as long as they output to that specific buffer.

 Ok, so the buffer: 256 bytes each for high and low. The display has 262 or 263 scanlines depending on the mode you choose, but I only output 256 samples regardless. So every so many scanlines, a sample is not output (I think this is something like 30 or 40 lines, I forget). Since the rate is so high, you won't hear this.

 There are a couple of reasons I did this: it makes things much easier on both the mixer side and the H-int PCM driver side. I also, with my conversion util, make sure all samples have a length that's a multiple of 256-byte blocks. Silence is appended to the sample block if the original sample ends before the 256-byte block boundary. This isn't an issue, because you don't need to start another sample MID frame, only on frame boundaries. It makes mixing and playback much faster, and at MOST your sample will be 255 bytes longer than normal. What this translates into on the mixer side is that you don't need to check EOF for every byte that you read. Multiply that check by 256, and then multiply that by 4, and you'll see that it quickly adds up.
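The conversion-util padding step described above is simple enough to sketch; `pad_to_blocks` is a hypothetical name, but the behavior (pad with $80-centered silence to a 256-byte boundary) is exactly what the post describes:

```python
# Sketch of the conversion-utility step above: pad each unsigned 8-bit
# sample to a multiple of 256 bytes so the mixer never has to check
# for end-of-sample mid-frame. $80 is the unsigned centre line, i.e.
# silence, so the padding is inaudible.

def pad_to_blocks(sample: bytes, block: int = 256, silence: int = 0x80) -> bytes:
    """Append silence bytes so len(sample) is a multiple of `block`."""
    remainder = len(sample) % block
    if remainder == 0:
        return sample
    return sample + bytes([silence]) * (block - remainder)
```

As the post says, the worst case is 255 bytes of padding, in exchange for dropping the per-byte EOF check across all four channels.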

 As for the mixer: why don't I just mix 8:8 to 9bit, then 8:8 to 9bit, and then 9:9 to 10bit? I could, and that's all in whatever mixer module I choose to use with the H-int PCM driver. But for now, I'm going for speed.

 In the current driver that I'm using (8:8->8, 8:8->8, 8:8->9, 9*2), you might notice that I'm mixing beyond my available resolution, because two 8bit samples added together really need 9bit to represent the result. So in cases where it overflows, I saturate it at #$FF. You might be thinking that's got to sound horrible. It can, if your samples average near max amplitude. In this next example, I boosted the amplitude of all samples to be really loud. But to avoid distortion, I set the volume of all channels using 8bit samples to 11 out of 15. This is roughly equivalent to 7.5-bit resolution. There's less occurrence of "clipping", they sound loud enough, and the resolution is pretty good.

 One more thing to add: the mixer, and the current samples, mix unsigned samples. This is done for speed (the clipping thing). It sounds fine, but there's a catch for any samples that start with a stretch of silence (though there shouldn't be one - that's wasteful) or end with silence: there's going to be a pop. The way this works is that the lowest sample amplitude is $00, and an unsigned 8bit sample centerlines at $80, so a string of $80 is silence. If a sample trails off at $80 and is then removed, that channel drops from $80 down to $00. That's going to result in a pop. So a small ramp to $00 is required to remove the pop. The same could be said of samples that have large parts of silence - ramp down to $00, then ramp back up from $00 at the end of the section - but this isn't for popping; it's for giving the other mixed-in channel of the pair its resolution back.
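The anti-pop tail ramp just described can be sketched like this (a minimal linear ramp; the actual ramp length and shape used by the tool aren't stated in the post, so those are assumptions):

```python
# Sketch of the anti-pop tail ramp described above. An unsigned 8-bit
# sample centre-lines at $80; when the channel's contribution is
# removed it drops to $00, so a short linear ramp from the last sample
# value down to $00 avoids the audible pop. Ramp length is arbitrary.

def append_tail_ramp(sample: bytearray, ramp_len: int = 32) -> bytearray:
    """Append a linear ramp from the final sample value down to 0."""
    last = sample[-1] if sample else 0x80   # silence centre line
    for i in range(1, ramp_len + 1):
        sample.append(last - (last * i) // ramp_len)
    return sample
```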

 If this unsigned mixing sounds convoluted in design, it kinda is. But it's surprisingly easy to work with, and it sounds great. The issue when working with higher-rate playback is that it affects a lot of things. The 65x is a fast processor for tight data sets; once you start to move outside that range, performance starts to really drop off. There are ways to handle it, such as subdividing the data set into smaller chunks, but with lots of multiple code paths - that results in code bloat and some complex code that can be difficult to follow (debug or understand). Anyway, my point is that something has to give, and I chose to change the mixing approach as my main approach.

 Here's an example of the tail end of a sample ramp down to avoid popping the output:
/HuPCMDriver/hypercocoon.png

Pretty simple stuff.

And here's an example of the output: all 8bit samples played at volume 11 (out of 15; linear volume scale), so you can judge the results for yourself:

http://www.pcedev.net/HuPCMDriver/8bitmixer_test2.zip




Quote from: elmer on 11/26/2016, 12:19 PM
Quote from: TurboXray on 11/26/2016, 01:18 AMhttp://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8bit samples mixed at a time. Anyone have any good 4 sample mix set they can think of to demo this driver?
Cool, I really look forward to studying this!  :D

But there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!  :wink:
I understand. There's always a trade-off for something. Honestly, shmups tend to be the most active in the sound FX department, IMO. That was primarily my idea for this mixer: when the big explosion samples happen in Blazing Lazers, the drum samples and some other samples immediately drop out. When playing something like GOT or LOT, which have CD audio and loud FX, it's not as noticeable. But even then, you can't have a loud creative death scream and explosion sounds at the same time with single-channel ADPCM. What I envisioned was something along those lines. Of course, it works with chip music too: 2 channels reserved for the drum kit and other music-related samples, and two channels for awesome FX. I would give up 25% resource for that in a shmup, no sweat.

QuoteNow ... if I can cut it down to 2 channels of 8-bit sound, then that seems good to me.

If the sample channels can't be tuned, then they're limited (in practice) to percussion and speech/sound-effects anyway.
Pretty much. There's no frequency scaling here. It's your basic PCE sample playback, but with more channels without using more hardware channels, and greater bit resolution. The 7khz version is less modular at the moment, so you can't just swap out mixers. I might change that and make it like the 15.3khz version, but still using the TIRQ. I'll have to play around with the numbers. If I did the modular version, then the PCM driver doesn't care how many mixed channels there are, because it always outputs a buffer to a paired hardware channel set. The downside of the modular version is that you need two sets of paired buffers (4 x 117 bytes total). Most flexible, but it eats up some ram.


QuoteBTW ... do you have an estimate of the CPU time taken for 2 5-bit channels with volume control?
At 7khz? At fixed frequency? Uncompressed? It looks pretty much like this:
;call
__skip_PCM                 ; early-out: a previous call is still in progress
        rti

PCM:
        stz $1403                           ; acknowledge the timer interrupt
        BBS0 <PCM_In_Progress, __skip_PCM   ; re-entry protection
        inc <PCM_In_Progress
        cli                                 ; let hsync interrupts through

        pha

        tma                                 ; save the current bank mapping
        pha

.ch0.on
        stz $800                            ; select PSG channel 0
.ch0.bank
        lda #00                             ; self-modified: sample bank
        tam #nn

.ch0
        lda $0000                           ; self-modified: sample pointer
        bmi .ch0_control                    ; high bit set = control byte
        sta $806                            ; write sample to the DDA port
        inc .ch0+1                          ; bump pointer low byte
        beq .msb_ch0                        ; wrapped: go bump the high byte

.ch1.on
        lda #01
        sta $800                            ; select PSG channel 1
.ch1.bank
        lda #00                             ; self-modified: sample bank
        tam #nn

.ch1
        lda $0000                           ; self-modified: sample pointer
        bmi .ch1_control                    ; high bit set = control byte
        sta $806                            ; write sample to the DDA port
        inc .ch1+1                          ; bump pointer low byte
        beq .msb_ch1                        ; wrapped: go bump the high byte

        pla
        tam                                 ; restore bank mapping
        pla
        stz <PCM_In_Progress
        rti
Sits in ram. Self-modifying code. Plays nice with the Hsync interrupts. If the Hsync interrupt for whatever reason takes too long, this interrupt has protection so it can't be re-entered. The self-modifying labels like ".ch1.on" would have their opcodes replaced with BRA $nn if the channel was disabled. This lets you use the channels both for samples and for regular synth work; a channel isn't reserved just for sample playback. Since these are independent hardware channels, volume only needs to be handled on a 60hz-or-less basis, not in the driver itself. And since it's hardware, no soft volume translation is needed. You'll have to count the cycles to see what it comes out to, ignoring the bank-adjustment cases. I tend to either do 116 samples a frame by resyncing the TIRQ in the Vblank int, or 117 - the same as 116 plus a sync, but I make a fake INT call to the routine inside vblank. Though I really can't tell the difference between 7000hz and 6960hz.
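The 116-samples-per-frame resync mentioned above is where the 6960hz figure comes from; a quick arithmetic check:

```python
# Quick check of the resync arithmetic above: 116 samples per ~60 Hz
# frame gives 6960 Hz, about half a percent below the nominal 7000 Hz
# rate -- consistent with "can't tell the difference".

samples_per_frame = 116
frame_rate = 60                 # NTSC vertical refresh, approximately
effective_rate = samples_per_frame * frame_rate
error = (7000 - effective_rate) / 7000
print(effective_rate, f"{error:.2%}")
```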

QuoteAs far as a "new" driver goes ... I'm curious about using the ADPCM hardware for a drum channel.  :-k
I can give you the source to a PCE soft ADPCM player. It handles saturation in the player itself, though it would be faster to handle those cases outside the player.

TurboXray

Quote from: elmer on 11/26/2016, 12:19 PMThere's a little audible crackling in mednafen, but that could easily just be mednafen.
The clicking, if you play the second example, is just me not initializing the driver before sending anything to it (I just haven't got around to it).

TurboXray

.loop
.ch0.a    ldx $0000,y       ; self-modified: channel 0 sample fetch
.ch0v.a   lda $0000,x       ; volume-table lookup

.ch1.a    ldx $0000,y       ; self-modified: channel 1 sample fetch
.ch1v.a   adc $0000,x       ; add the volume-adjusted sample

          bcc .skip00       ; 4:6
          lda #$ff          ; saturate on unsigned overflow
          clc
.skip00
          sta <D0.l
 I just realized something.. this is the mixing code for paired channels.. unsigned. Specifically the lda #$ff and clc are needed for overflow.

 But.. if I did signed 2's complement method with clamping..
        bvc .skip00       ; no signed overflow, result is fine
        lda #$7f
        adc #$00          ; C clear -> $7f (max), C set -> $80 (min)
.skip00
It's like the 65x was made for this.. lol! I can't believe I missed this. Same number of cycles, and it handles both overflow cases for signed addition.

 Ok.. I'm gonna have to change this whole thing over to signed mixing. Dammit.. I'll have to re-order my 10bit conversion tables.
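The two clamping schemes above can be emulated in Python, modelling the 65x carry and overflow flags, to show why the `bvc / lda #$7f / adc #$00` sequence covers both signed overflow directions:

```python
# Emulation of the two 8-bit clamping schemes above. For the signed
# trick: after ADC, V set means the true result has the wrong sign,
# and C tells you which way it overflowed -- so "lda #$7f : adc #$00"
# yields $7F (positive clamp, C clear) or $80 (negative clamp, C set).

def add_unsigned_clamped(a, b):
    """Unsigned mix: saturate the 8-bit sum at $FF (the bcc/lda #$ff path)."""
    return min(a + b, 0xFF)

def add_signed_clamped(a, b):
    """Signed mix: emulate ADC, then the bvc/lda #$7f/adc #$00 trick."""
    result = (a + b) & 0xFF
    carry = (a + b) > 0xFF
    # V is set when both operands share a sign but the result doesn't
    overflow = bool(~(a ^ b) & (a ^ result) & 0x80)
    if overflow:                        # bvc .skip00 not taken
        result = (0x7F + carry) & 0xFF  # lda #$7f : adc #$00
    return result
```

For example, two large positive samples ($70 + $70) clamp to $7F, and two large negative ones ($80 + $80) clamp to $80, with no extra cycles over the unsigned path.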

elmer

Quote from: TurboXray on 11/26/2016, 01:51 PMSo here's a basic 4 PCM playing at the same time; a long stream followed by 3 other "FX" type samples playing shortly into (voice one, gate of thunder explosion, and a gate of thunder voice).

So this is how I'm handling this:
Thanks for the detailed explanation ... that's really nice and clever!  =D>

You've put a lot of thought into the implementation details.  :D



Quote from: TurboXray on 11/26/2016, 01:51 PM
Quote from: elmer on 11/26/2016, 12:19 PMBut there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!  :wink:
I understand. There's always a trade off for something. ...
I would give up 25% resource for that in a shmup no sweat.
Absolutely ... the 4th-gen is all about finding creative solutions to problems, and to designing your game to fit within the bounds of the hardware.

For me, in that example, I'd have the CD music, 1 channel of ADPCM (12-bit samples), and either 1 channel of 7KHz 8-bit samples on the PSG, or 2 channels of 7KHz 5-bit samples (probably the latter).

That would keep the CPU cost low, and the memory cost low, and give decent results without compromising the rest of the game.

But that's just me!  :roll:


QuoteAt 7khz? At fixed frequency? Uncompressed? It looks pretty much like this:
Thanks, that's nice and simple and fast ... I really like the "fast" part.


QuoteThough I really can't tell the difference between 7000hz and 6960hz.
Yeah, not worth the bother of resyncing; you won't notice a 1% playback difference.

Even then, you can just resample to 6960Hz in SOX instead of 7000Hz.


QuoteI can give you the source to a PCE soft ADPCM player. Though it handles saturation in the player itself, but it would be faster to handle those cases outside the player.
Hahaha ... you misunderstand me!

I don't want to do realtime ADPCM conversion, either to read or write, it's way too slow for game use.

What I'm talking about would be to incorporate the PCE CD ADPCM playback into the sound driver as another channel.

That way it could be used for high-quality drums/percussion/voice whenever sound effects aren't using it.

It would just be another tool in the sound designer's arsenal, rather than a separate programmer-controlled feature, the way that it is now.

TurboXray

Quote from: elmer on 11/26/2016, 06:57 PMWhat I'm talking about would be to incorporate the PCE CD ADPCM playback into the sound driver as another channel.

That way it could be used for high-quality drums/percussion/voice whenever sound effects aren't using it.

It would just be another tool in the sound designer's arsenal, rather than a separate programmer-controlled feature, the way that it is now.
What sound driver though? The PSG player? It's been some years since I looked at it, but I suspect (and remember) you need a piece of code that spies on/monitors some PSG player attributes as it parses the other track data, so it can sync with it and then make its own calls to the ADPCM hardware to play whatever samples in sync. That also means either an outside channel parser that reads the MML-converted bytecode of the system player but parses it itself (to keep the compiler happy), or modifying whatever compiler to handle a special ADPCM track of code (MML or not - whatever). The easiest way might be to write your own Vbl or TIRQ routine so you can write a hook that executes first, before the call to the PSG player.

 I dunno. Interfacing it with the unmodified PSG player of the sys card is going to be hack-y. Probably doable, but still hack-y. And probably ugly hack-y too.

elmer

#9
Quote from: TurboXray on 11/26/2016, 09:53 PMWhat sound driver though? The PSG player?
God, no!  ](*,)

We've already identified that the SquirrelPlayer is a disassembled version of the System Card PSG player ... and therefore "tainted" in copyright terms.

Easier to just replace it with a new command-stream-per-channel player.

It could even accept the same bytecodes as the PSG player, if that made any kind of sense.

As Arkhan found out with the SquirrelCompiler ... the System Card PSG player is just a processed form of MML, with some extra bells-and-whistles.

FYI, the sound driver that I wrote back in the 1980s is also based on similar ideas ... it was custom-specified (by the musician) to replace the driver that he'd written for the C64 ... which was MML-based.

The biggest differences are in the byte-coding, and not in the background theory.

Processing a MIDI file, or possibly a Deflemask file, into some bytestream format isn't exactly rocket-science.

Arkhan was concerned with 100%-compatibility with the System Card PSG Player.

I don't see that as a useful/interesting/desirable goal for a new sound driver.

The most-important criteria, if a new driver is to be made at all, would be to make sure that there are usable tools surrounding it.

It's mostly a question of desire, and priorities.

Remember ... Arkhan wrote Squirrel for his own use, because he needed something.

TailChao wrote HuSound for similar reasons.

I'm perfectly-capable of doing the same if I decide that it's in my own self-interest.

Oh ... and we need a sound driver for the PC-FX, anyway.  :wink:

elmer

Quote from: TurboXray on 11/26/2016, 09:53 PMI dunno. Interfacing it with the unmodified PSG player of the sys card is going to be hack-y. Probably doable, but still hack-y. And probably ugly hack-y too.
BTW ... If someone actually wanted to make changes to the PSG player, then you wouldn't hack it at a binary level, just modify the source-code and assemble a new version.

After all, the player is available in source-code form, as the "SquirrelPlayer", with TheOldMan's excellent commented disassembly.

Heck, there's the old 2001 disassembly by zeograd and whoever-else that could be cleaned up back into fully-assemblable source.

But ... then you've still got the issue of modifying the toolchain surrounding it in order to support your new features, and at this point, that means modifying Squirrel.

It's all about the toolchain, and less about the driver.

TurboXray

In the 7khz resampler driver (XM/MOD-style frequency scaling), I was curious how to handle the issue of samples being crushed when they were played at higher playback rates but with 7khz output. Initially - and not included in any of the demos I released - some samples would need to be resampled to a lower base frequency (with something better than the nearest-neighbor method), and then transposed.

 But I got to thinking: what would it sound like if the 7khz XM driver did 14khz instead of 7khz? How would it sound for crushed samples? So I redid the driver, and the resource usage surprisingly isn't that bad. The original one, if you played all 4 XM channels AND 2 regular sample channels (6 total) at the same time, would take 35.7% cpu resource. I decided to move just the 4 XM channels to double frequency and keep the 2 fixed channels at 7khz (I was thinking sound FX for them). So 4 XM channels at 14khz plus 2 fixed channels at 7khz = 61.3% cpu resource. That's pretty good. Again, that's all 6 channels playing samples. I could further reduce that if I moved one of the XM frequency channels down to the 7khz domain (3x 14khz XM channels, 1x 7khz XM channel, 2x fixed 7khz channels) = 55.9% cpu resource.

TurboXray

I haven't tested the 14khz version on the real system yet, but under mednafen there really isn't a huge difference. The crushing of some samples is only somewhat alleviated. Drums/snares/etc. are more crisp, but I was expecting a much bigger difference for twice the output frequency. I'm beginning to think maybe it's the 5bit resolution paired with the higher frequency (sample skipping) that is the issue.

 I'm gonna do a 7khz 4-channel 8bit version and see what that sounds like. I have a feeling it's going to sound better than the 14khz one..

elmer

#13
Continued on from the MML thread in order to stop Arkhan from getting unhappy ...

https://www.pcengine-fx.com/forums/index.php?topic=21677.msg479278#msg479278


Quote from: TurboXray on 12/13/2016, 12:01 AMOn your 2nd channel one. I would add a cli, nop, sei right after .channel4. Your worst-case scenario for each channel is a 68 cycle delay for H-int, which would probably be fine, but I wouldn't push it with twice that in a worst-case scenario.
Why? It's not needed.

You have 455 cycles per scanline.

In your hsync-interrupt planning you already have to make allowances for a 32-byte TIA instruction that disables interrupts for 241 cycles.

I allow any already-triggered hsync interrupts to run at the start of the timer IRQ with the "cli" instruction.

The max time taken in the timer IRQ after interrupts are disabled again is far less than 241 cycles.

Where is the problem?  :-k


QuoteAlso, by not having a busy-flag system (having interrupts open for the whole thing), your setup is going to be a little less friendly with code using small Txx in 32byte segments during active display - and worst-case scenarios in all settings (H-int and TIRQ). Just something to note. Might want to recommend or write block transfers in 16byte or 8byte segments with Txx.
Errr ... now I could be missing something ... but "nope".

My maximum IRQ-disable is far-shorter than a 32-byte TIA.

If another hsync/vync IRQ happens during the 3-instructions that IRQs are enabled, then there is no problem.

If it's another timer-IRQ, then it means that I should have output 2 samples during this time-period ... and I do! I've eliminated the "jitter"!

As-far-as-I-can-see, there is no way that something "bad" can happen here in a single-processor system.

Now, if you go multi-core ... then this whole construct falls apart.  :-k

But that's not a problem on the PC Engine.


QuoteSomething I'm curious about: Why channel's 3 and 4? Why not channels 0 and 1? Channel 0 saves you 2 cycles. And leaving channels 4 and 5 free allow noise mode for both of those while samples are playing.
No massively earth-shaking reason, just practicality.

Channel 3 for a sampled sound-effect, and channel 4 for a sampled drum.

That leaves channel 5 for a regular drum, and the channels 0,1,2 for regular tune data.

It's just the most-likely usage from my POV. I could be wrong.

TurboXray

#14
Hmm.. let me think about this for a sec..

 Txx happens and delays all interrupts by 241 cycles. Let's assume the H-int and TIRQ fire at nearly the exact same time, right at the start of the Txx. So both are delayed by 241 cycles. Actually, I have no idea which IRQ has higher priority.. VDC or TIRQ. Anyway, whatever. Let's assume TIRQ has higher priority on the CPU and gets called first. Your TIRQ routine re-enables interrupts 15 cycles after the call. So worst case, the H-int is only stalled by 241+15 cycles. And the 15 is only from the TIRQ routine.

 Yeah, that works out pretty decent. Nevermind.




 But what I was going to tell you is that you can leave the channel in waveform mode and write to it as if it were DDA. If you set the channel frequency really close to your TIRQ output rate, you can get weird overlay-type effects. Depending on the sample itself, it can sound interesting or not. I don't have a video to show, though.


 Or, if you did an 8bit phase accumulator (overflow increments the pointer) on a 32byte waveform in memory, you could do rough timing to 'distort' a single channel's instrument waveform over time - giving somewhat predictable timbre-changing effects.
Like this:
But better, because this one updates the waveform a single sample at a time at 60hz, not at 6960hz per tick. You could make up for the phase-accumulator overhead by not having overhead for bank, EOF, or MSB checks. Why have a phase accumulator? So you can roughly scale the frequency of the distortion with the scale of the note (based on octave and note).
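The 8bit phase-accumulator idea above can be sketched like so (a host-side model of the behavior, not driver code; the function name is hypothetical):

```python
# Sketch of the 8-bit phase accumulator described above: the step is
# added every tick, and only the accumulator's carry-out (overflow
# past 8 bits) advances the read pointer through the 32-byte waveform.
# A bigger step advances faster, so the distortion rate roughly scales
# with the note.

def phase_accumulate(waveform, step, ticks):
    """Yield one waveform byte per tick; pointer advances on overflow."""
    assert len(waveform) == 32 and 0 <= step <= 0xFF
    phase = 0
    pointer = 0
    out = []
    for _ in range(ticks):
        out.append(waveform[pointer & 31])  # wrap within the 32-byte table
        phase += step
        if phase >= 0x100:                  # 8-bit accumulator overflowed
            phase &= 0xFF
            pointer += 1                    # advance one waveform sample
    return out
```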

Arkhan Asylum

Quote from: elmer on 12/13/2016, 01:15 AMContinued on from the MML thread in order to stop Arkhan from getting unhappy ...
It's not really me getting unhappy. It's the thread being 50+% non-MML discussion, which will cause people looking for MML to lose interest as they sift through the walls of text and technobabble that they don't want to read, because MML/music-making and balls-deep engine code are not really samey.

This "max-level forum psycho" (:lol:) destroyed TWO PC Engine groups in rage: one by Aaron Lambert on Facebook "Because Chris 'Shadowland' Runyon!," then the other by Aaron Nanto "Because Le NightWolve!" Him and PCE Aarons don't have a good track record together... Both times he blamed the Aarons in a "Look-what-you-made-us-do?!" manner, never himself nor his deranged, destructive, toxic turbo troll gang!

elmer

Quote from: TurboXray on 12/13/2016, 02:03 AMActually, I have no idea which IRQ has higher priority.. VDC or TIRQ. Anyway, whatever. Lets assume TIRQ has high priority on the CPU and gets called first.
... Yeah, that works out pretty decent. Nevermind.
The order is ... RESET > NMI > BRK > TIMER > IRQ1 > IRQ2.

I have no idea why Hudson decided that "TIMER > IRQ1", but the cli at the start of the timer handler lets the hsync take priority, if it's already queued-up.  :wink:

elmer

Quote from: TurboXray on 12/13/2016, 02:03 AMBut what I was going to tell you, is that you can leave the channel in waveform mode and write to it like it was DDA. If you set the channel frequency really close to your TIRQ output, you can get weird overlay type of effects. Depending on the sample itself, it can sound interesting or not. I don't have a video to show though.

 Or, if you did an 8bit phase accumulator (overflow increments pointer) on a 32byte waveform in memory, you would do rough timing to 'distort' a single channel's instrument waveform over time - giving somewhat predictable timbre changing effects.
Those are cool tricks ... thanks!  :D


Quote from: guest on 12/13/2016, 02:09 AM
Quote from: elmer on 12/13/2016, 01:15 AMContinued on from the MML thread in order to stop Arkhan from getting unhappy ...
It's not really me getting unhappy.
I know, I was teasing you!  :wink:

I like it when threads can organically follow interesting thoughts, and go off-topic for a while ... but, "yeah", we'd gone so far off that we needed to take it somewhere else.

Arkhan Asylum

Yeah, maybe SOMEONE

NAMED NECROMANCER

CAN MOVE THE OTHER POSTS TO THIS THREAD INSTEAD.

MAYBE.


;)  lol

TurboXray

Just create a thread called "the dump" and place off-topic musings in there.

Nazi NecroPhile

Quote from: Psycho Arkhan on 12/16/2016, 12:11 AMYeah, maybe SOMEONE

NAMED NECROMANCER

CAN MOVE THE OTHER POSTS TO THIS THREAD INSTEAD.

MAYBE.


;)  lol
Tell me exactly which ones to move and I can indeed merge 'em with this thread, but I'm not going to guess which ones don't belong.  I'm too dumb to understand what the fuck y'all are talking about, so I can't really tell which ones need moving.
Ultimate Forum Bully/Thief/Saboteur/Clone Warrior! BURN IN HELL NECROPHUCK!!!

TurboXray

Just delete my posts in that thread. I mean, I didn't post anything related to MML, midi, squirrel - whatever. No need to move it here. Just delete it.

Arkhan Asylum

It was useful rambling though.. so I don't know if deleting it is the best move if it can just be merged here, lol

elmer

Quote from: guest on 12/16/2016, 10:55 PMIt was useful rambling though.. so I don't know if deleting it is the best move if it can just be merged here, lol
Yeah ... but who *really* wants the source-code examples for playing back samples?  :wink:

BTW ...  it looks like it'll be safe to play back up to 3 channels of samples with my code before you get to the timing-limit imposed by needing to keep servicing hsync interrupts.

You haven't weighed-in yet upon the channel-selection for the samples, and you're the guy with the most-recent experience of actually creating a game with both music and sound-effect running.

From Chris Covell's YouTube video dissecting the normal channel usage, I'm thinking that the channels 1 and 2 are normally used for the main synth leads.

That leaves three channels out of the remaining four to make sample-capable. But which three? ... And why?  :-k

Arkhan Asylum

Quote from: elmer on 12/16/2016, 11:16 PM
Quote from: Psycho Arkhan on 12/16/2016, 10:55 PMIt was useful rambling though.. so I don't know if deleting it is the best move if it can just be merged here, lol
Yeah ... but who *really* wants the source-code examples for playing back samples?  :wink:

BTW ...  it looks like it'll be safe to play back up to 3 channels of samples with my code before you get to the timing-limit imposed by needing to keep servicing hsync interrupts.

You haven't weighed-in yet upon the channel-selection for the samples, and you're the guy with the most-recent experience of actually creating a game with both music and sound-effect running.

From Chris Covell's YouTube video dissecting the normal channel usage, I'm thinking that the channels 1 and 2 are normally used for the main synth leads.

That leaves three channels out of the remaining four to make sample-capable. But which three? ... And why?  :-k
The last 3, probably. 5 and 6 can be turned into noise channels. It's likely people will already want percussion there, and that's what the samples are probably for.

I don't think sampling on the first two is a good idea in case you want to use LFO (leads).

Windcharger

Quote from: guest on 12/16/2016, 11:17 PMthe last 3 probably.  5 and 6 can be turned to noise channels.  It's likely people will already want percussion there, and that's what the samples are probably for. 
Having noise for sound effects is important too though.   :-k  I would think using as many channels that don't have other capabilities as possible would be better as the remaining channels could then use their other capabilities if needed.  So maybe using 3, 4, and 5 would be effective leaving 1 and 2 still free for LFO (if wanted) and 6 for percussive noise freedom for sound effects?  Just a thought...

TurboXray

Quote from: guest on 12/16/2016, 11:17 PMI don't think sampling on the first two is a good idea incase you want to use LFO (leads).
Are you talking about hardware LFO? Lol nobody.. nobody should be using hardware LFO to do... regular LFO stuffs. If you want to do non-musical screechy stuff, kinda like what ccovell showed, then I guess. But I personally think it's a huge waste of a channel, for a system that already should have had a couple more, when you can do so much more interesting sounds by pairing the channels in phase, etc.

 I thought this was general knowledge, but if not: Every PCE channel is capable of software LFO, and isn't plagued like the Master System, or the triangle channel on the NES, and other systems of that era where you get frequency artifacts when you change the high and low bytes of the period register. PCE doesn't have any of those problems with any of its channels, making hardware LFO useless 99.9999% of the time.
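In other words, software LFO is just rewriting the 12-bit period every tick. Here's a minimal Python sketch of the idea (table length, depth, and base period are arbitrary illustration values, not from any real driver):

```python
import math

def vibrato_periods(base_period, depth, table_len=32):
    """Generate one LFO cycle of 12-bit period values around a base
    period. Because the PCE latches the full period cleanly, a driver
    can simply write new lo/hi bytes each tick with no artifacts."""
    out = []
    for i in range(table_len):
        offset = round(depth * math.sin(2 * math.pi * i / table_len))
        out.append(max(0, min(0xFFF, base_period + offset)))
    return out

# Hypothetical values: a base period of $1AA with +/-4 steps of depth.
periods = vibrato_periods(base_period=0x1AA, depth=4)
```

A real driver would bake such a table into ROM and index it per tick instead of computing sines at runtime.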

TurboXray

Quote from: Windcharger on 12/16/2016, 11:57 PM
Quote from: guest on 12/16/2016, 11:17 PMthe last 3 probably.  5 and 6 can be turned to noise channels.  It's likely people will already want percussion there, and that's what the samples are probably for. 
Having noise for sound effects is important too though.   :-k  I would think using as many channels that don't have other capabilities as possible would be better as the remaining channels could then use their other capabilities if needed.  So maybe using 3, 4, and 5 would be effective leaving 1 and 2 still free for LFO (if wanted) and 6 for percussive noise freedom for sound effects?  Just a thought...
You can easily do LFSR noise in software at 7khz via DDA mode. But that aside, there's the philosophy that you use all channels for music, then reserve some lesser sounds of the music for SFX overlay - so the majority of the music is still heard. If you play samples on any channel, it's just like regular mode - you stop outputting that sample and use the channel for whatever sound FX occupies it. I.e. if a SFX was made from noise mode, you can stop sample streaming to play the noise SFX just like you would if the channel is in waveform mode. Matter of fact, you can mix and match any modes at any time, as much as you want.
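On the "LFSR noise in software via DDA" point, the idea is just to clock a linear-feedback shift register once per DDA sample and write the result out as a 5-bit level. A sketch in Python using a generic 15-bit maximal-length LFSR (the taps here are a common textbook choice, not necessarily what the PCE noise hardware actually uses):

```python
def lfsr_noise(samples, seed=1):
    """Clock a 15-bit Fibonacci LFSR (feedback from bits 0 and 1)
    once per output sample, and map the output bit to a 5-bit DDA
    level (0 or 31)."""
    state = seed
    out = []
    for _ in range(samples):
        bit = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (bit << 14)
        out.append(31 if state & 1 else 0)
    return out
```

On hardware you'd precompute a run of these values (or clock the LFSR inside the timer interrupt) and stream them to $806 like any other sample.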

Windcharger

Quote from: TurboXray on 12/17/2016, 12:55 AM
Quote from: Windcharger on 12/16/2016, 11:57 PM
Quote from: guest on 12/16/2016, 11:17 PMthe last 3 probably.  5 and 6 can be turned to noise channels.  It's likely people will already want percussion there, and that's what the samples are probably for. 
Having noise for sound effects is important too though.   :-k  I would think using as many channels that don't have other capabilities as possible would be better as the remaining channels could then use their other capabilities if needed.  So maybe using 3, 4, and 5 would be effective leaving 1 and 2 still free for LFO (if wanted) and 6 for percussive noise freedom for sound effects?  Just a thought...
You can easily do LFSR noise in software at 7khz via DDA mode. But that aside, there the philosophy that you use all channels for music, then reserve some lesser sounds of the music for SFX overlay - so the majority of the music is still heard. If you play samples on any channel, just like regular mode - you stop outputting that sample and use the channel for whatever sound FX that occupies it. I.e. if a SFX was made from noise mode, you can stop sample streaming to play the noise SFX just like you would if the channel is in waveform mode. Matter of fact, you can mix and match of any modes at any time, as much as you want.
I sure am glad we have you here Tom to set the record straight.  :)

With that logic it sounds like having them as channels 2, 3, and 4 would be good then as you don't want to put all of your channel abilities in one basket in a manner of speaking.  This way choosing to use samples isn't a trade off of any kind on a channel vs using noise mode on 5 and 6 since they are separate, no?  Although I suppose this really depends on what types of things the samples would be used for (for example just percussion).

TurboXray

#29
Windcharger: Maybe I should make page 3 of the PCE Cribsheets; audio stuffs. Sometimes it's a lot to keep track of, especially if you're not always working with the hardware directly.

 I gave my old analog oscilloscope to me brother.. arrr! I need to buy a new (used) one in the $400-$500 range. There are certain things I want to document, such as the volume regs not taking updates immediately (they have a frequency response of something like 2khz - from testing with the mednafen author). There's also a filtering effect starting around.. IIRC 6khz, which actually makes sample streaming sound a little better on the real system than in emulators. Stuff like that.

elmer

Quote from: TurboXray on 12/18/2016, 11:46 AMThere are certain things I want to document, such as the volume regs not taking updates immediately (they have a frequency response of something like 2khz - from testing with the mednafen author). There's also a filtering effect starting around.. IIRC 6khz, which actually makes sample streaming sound a little better on the real system than emulators. Stuff like that.
The more information that's available, the better!  :)

BTW ... can you tell me if accessing the PSG registers has the same 1-cycle-extra delay as the VDC registers?  :-k

In my new driver, I'm using a TIN instruction to update a channel's waveform, and would like to know if that's going to cause a 17+32*6 or 17+32*7 interrupt delay (it makes a difference, because I need to specifically disable interrupts during the update).

TurboXray

From memory, anything in the $0000-$07FF range in bank $FF has the extra cycle delay (no matter where it's mapped). So from memory, yes 17+32*7. I'll see if I can retest it today to verify.

elmer

#32
Quote from: TurboXray on 12/18/2016, 02:05 PMFrom memory, anything from $000-7ff range in bank $FF has the extra cycle delay (no matter where it's mapped). So from memory, yes 17+32*7. I'll see if I can retest it today to verify.
OK, thanks!

If so, then I'll have to change the cycle-timings for the sample-playback interrupts, too.  #-o

<EDIT>

Hold on a second ... the PSG is at $0800, and reading Charles MacDonald's pcetech.txt seems to suggest that the extra-cycle only applies to the VDC and VCE ($0000-$07FF).

I *may* be OK.  [-o<

elmer

#33
Here is what the timings are looking like at-the-moment.


Writing to VDC ...

                  tia     $0000,VDC_DATA,32     ; 241 Self-modifying TIN instruction.

  Delay 241 cycles.


Writing to PSG ...

  mz_update_wave: sei                           ;   2 No interrupts while writing PSG.
                  stx     PSG_R0                ;   5 Select PSG hardware channel.
                  sta     PSG_R4                ;   5 Reset this channel's read/write
                  stz     PSG_R4                ;   5 address.
                  jsr     mz_tin                ;   7 Transfer waveform & enable IRQ.
                  ...

  mz_tin:         tin     $0000,PSG_R6,32       ; 209 Self-modifying TIN instruction.
                  cli                           ;   2 Allow an hsync/timer IRQ to run.
                  rts

  Delay 235 cycles.


Additional sample-playback delay ...

  tirq_ch234:     ;;;                           ;   8 (cycles for the INT)
                  stz     $1403                 ;   5 Acknowledge TIMER IRQ.
                  cli                           ;   2 Allow HSYNC to interrupt.

  Delay 15 cycles.


Maximum hsync delay 256 cycles.
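As a sanity check on the listing above, here's the block-transfer arithmetic in Python: a 17-cycle setup plus a per-byte cost, which is where the 17+32*6 vs 17+32*7 figures discussed earlier come from.

```python
def txx_cycles(n_bytes, per_byte=6):
    """HuC6280 block-transfer (TII/TIN/TIA/...) timing: a 17-cycle
    setup cost plus a fixed cost per byte moved (6 cycles normally,
    7 when the destination adds a wait-state)."""
    return 17 + per_byte * n_bytes

vdc_tia = txx_cycles(32, per_byte=7)  # the 241-cycle TIA to the VDC
psg_tin = txx_cycles(32)              # the 209-cycle TIN to the PSG
```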



********************

And bonknuts (or anyone else), is there some reason why the System Card IRQ1 handler delays  processing the hsync interrupt by checking for a vsync interrupt first, and then delays things even more by doing 2 dummy BSR+RTS calls before changing a VDC register?

Has anyone looked at the timing of the hsync interrupt in relation to the VDC's latching of the scroll registers for the next display line?  :-k

TurboXray

Hahaha. Sorry, I was thinking of the VCE for some reason  :oops:

TurboXray

About the sys card VDC handling routine; I dunno. But I never use it. I opt to use custom handling myself for everything VDC related (which is one jmp indirection, and.. maybe one BBx involved?). Or just straight out replacing the bank in MPR7 with something of my own.

 Yeah, checking vsync first is totally ass backwards. The VDC interrupt handler should be optimized for the hsync routine - who cares about vsync and whatever small delay it gets. But then again, nothing in the sys card lib is really "optimal".

touko

#36
I think if you want to decrease the CPU load when you're playing samples, a little buffer can do the job very well.
It's the banking which takes the most cycles, and reducing it to 1 mapping for 4 samples (4 bytes), for example, helps a lot.

TurboXray

#37
Quoteand reduce it to 1 mapping for 4 samples(4 bytes)
If switched to a buffer system, there would be no mapping (the buffer would be in fixed system ram).

Doing a buffer system is faster, but it also has some requirements. It's going to require two buffers in ram for all channels; the timer is 1024 cycles between interrupts - are you going to copy 4x116 bytes in 1024 cycles? Not gonna happen. Even just one channel gets too close for comfort (713 cycles via Txx).

 It's not just the bank mapping that the buffer system reduces. There's no MSB check on the buffer inside the TIRQ routine. Though that only saves you +2 cycles per sample, per channel. You could remove the EOF marker, and simply have all samples trail out zeros or $0f - both work (any value works, actually). So there's another +2 cycles per sample per channel saved.

Don't get me wrong; I use the double buffer system for my own stuff. But sometimes it's easier when you give other people functionality - to keep the interface a little more simple, and just eat a little overhead.

 For a single channel buffer system, you'd save ~1.8% cpu overhead. For a two channel buffer, you save ~2.2%. For a four channel buffer, you save ~2.7%. It's not a whole lot. The reason being is that mapping in a channel is only 9 cycles (lda <zp: tam #$nn). The larger overhead is from the tma #n:pha and pla:tam for saving the MPR. That's 16 cycles of overhead, but that overhead basically gets divided down as more channels are output inside the routine. So the biggest relative cost saving is for single channel use.

 Maybe I should be more clear; if you have 4 samples to stream - you don't map them into 4 individual banks. There's no reason to. You map them in sequential order, to the same MPR reg, as you use them. That way you only need to save/restore one bank for <n> number of channels to stream from. My above overhead savings assume this. If they didn't, then you'd take the 1.8% and multiply it by the number of channels used as your total savings. But that shouldn't be the case.

 What I do like about the buffer system, over the slight savings, is the flexibility of it. You can support both compressed samples and uncompressed samples. You could also support half frequency samples (3.5khz instead of 7khz; some sound FX actually sound decent at this playback rate. There are some PCE games that do this; playback samples at both rates). Do all kinds of stuff, and the main TIMER routine wouldn't have to know anything, other than what's in the buffer.
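To make the divide-down argument above concrete, here's a back-of-the-envelope Python calculation using the cycle counts bonknuts quotes (9 cycles to map a bank via `lda <zp : tam`, 16 cycles to save/restore the MPR, 116 timer interrupts per frame). The exact 1.8/2.2/2.7% figures depend on what else is counted, so treat this as showing the trend, not reproducing his numbers:

```python
# Cycle costs quoted above (HuC6280 instruction timings).
MAP_CYCLES     = 9    # lda <zp (4) + tam #n (5): map one bank
SAVE_RESTORE   = 16   # tma #n (4) + pha (3) + pla (4) + tam #n (5)
IRQS_PER_FRAME = 116  # 7khz timer interrupts per 60hz frame

# The 116-byte block copy that rules out naive double-buffering of
# four channels inside one 1024-cycle timer slot: 17 + 6 cycles/byte.
COPY_116 = 17 + 6 * 116

def banking_cycles_per_frame(channels):
    """Per-frame banking cost when all streamed channels share one
    MPR: one save/restore plus one map per channel, every interrupt."""
    return (SAVE_RESTORE + MAP_CYCLES * channels) * IRQS_PER_FRAME

def per_channel_cost(channels):
    """The same cost divided down across the channels it serves."""
    return banking_cycles_per_frame(channels) / channels
```

The fixed 16-cycle save/restore is amortized over every extra channel, which is why the single-channel case sees the biggest relative saving.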

elmer

#38
Quote from: touko on 12/21/2016, 05:23 AMi think if you want to decrease the CPU load when you're playing samples, a little buffer can do the job very well .
It's the banking which take the most cycles, and reduce it to 1 mapping for 4 samples(4 bytes) for exemple, help a lot .
That's an interesting idea, there do seem to be quite a few cycles spent in the banking.

I'd like to see an example of how you'd actually accomplish that in practice.

You'd be adding some overhead in creating that buffer every 4th interrupt (and some extra instructions in *not* creating it for the other 3 interrupts).

So it's all going to be in the details, and in how you ensure that you don't keep interrupts disabled for too long.


Here's an example that I came up with that shows the *maximum* benefit that you could obtain by dropping *all* the banking from the interrupts, and just buffering up an entire frame's worth of samples in three 116-sample buffers in RAM.

; Three Channel Sample Playback.
;
; Time (normal 0 channel):  71 * 116 calls =  8236 cycles (6.91%)
; Time (normal 1 channel): 107 * 116 calls = 12412 cycles (10.41%)
; Time (normal 2 channel): 143 * 116 calls = 16588 cycles (13.91%)
; Time (normal 3 channel): 179 * 116 calls = 20764 cycles (17.42%)
; Time (worst  3 channel): 179 * 115 calls +
;                          251 *   1 calls = 20836 cycles (17.48%)

; Three Channel Sample Playback with RAM buffer.
;
; Time (normal 0 channel):  55 * 116 calls =  6380 cycles (5.35%)
; Time (normal 1 channel):  80 * 116 calls =  9280 cycles (7.78%)
; Time (normal 2 channel): 105 * 116 calls = 12180 cycles (10.22%)
; Time (normal 3 channel): 130 * 116 calls = 15080 cycles (12.65%)
; Time (worst  3 channel): 130 * 115 calls +
;                          151 *   1 calls = 15101 cycles (12.67%)



OK, here's the first part, and the gain looks good!  :)

But you've then got to add the overhead for creating the RAM buffers.  :-k

When I do that, with the fastest TII code that I can think of, I get ...

; Three Channel Sample Playback with creating RAM buffer.
;
; Time (normal 0 channel):  6380 +   63 cycles =  6443 (5.40%)
; Time (normal 1 channel):  9280 +  924 cycles = 10204 (8.56%)
; Time (normal 2 channel): 12180 + 1785 cycles = 13965 (11.71%)
; Time (normal 3 channel): 15080 + 2646 cycles = 17726 (14.87%)
; Time (worst  3 channel): 15101 + 2646 cycles = 17747 (14.89%)



That's a 2.6% frame-time improvement at *best*, and I've not dealt with the issue of how to create those buffers safely without delaying the timer interrupt and causing an audio problem.

I'm not sure (yet) that the benefit is worth the cost.

I'd love to see what you can come up with!

Here's my code ...

;****************************************************************************
;
; Three Channel Sample Playback with RAM buffer.
;
; Time (normal 0 channel):  55 * 116 calls =  6380 cycles (5.35%)
; Time (normal 1 channel):  80 * 116 calls =  9280 cycles (7.78%)
; Time (normal 2 channel): 105 * 116 calls = 12180 cycles (10.22%)
; Time (normal 3 channel): 130 * 116 calls = 15080 cycles (12.65%)
; Time (worst  3 channel): 130 * 115 calls +
;                          151 *   1 calls = 15101 cycles (12.67%)
;
; Maximum hsync delay:     151 - 20 = 131 cycles

tirq_ch234:     ;;;                             ; 8 (cycles for the INT)
                stz     $1403                   ; 5 Acknowledge TIMER IRQ.
                cli                             ; 2 Allow HSYNC to interrupt.
                pha                             ; 3
                sei                             ; 2 Disable interrupts.

.channel2:      bbr2    <sample_flag,.channel3  ; 6
                lda     #2                      ; 2
                sta     PSG_R0                  ; 5
                lda     [sample2_ptr]           ; 7
                bmi     .eof2                   ; 2
                sta     PSG_R6                  ; 5
                inc     <sample2_ptr            ; 6

.channel3:      bbr3    <sample_flag,.channel4  ; 6
                lda     #3                      ; 2
                sta     PSG_R0                  ; 5
                lda     [sample3_ptr]           ; 7
                bmi     .eof3                   ; 2
                sta     PSG_R6                  ; 5
                inc     <sample3_ptr            ; 6

.channel4:      bbr4    <sample_flag,.done      ; 6
                lda     #4                      ; 2
                sta     PSG_R0                  ; 5
                lda     [sample4_ptr]           ; 7
                bmi     .eof4                   ; 2
                sta     PSG_R6                  ; 5
                inc     <sample4_ptr            ; 6

.done:          pla                             ; 4
                rti                             ; 7

.eof2:          stz     PSG_R4                  ; 5
                rmb2    <sample_flag            ; 7
                bra     .channel3               ; 4

.eof3:          stz     PSG_R4                  ; 5
                rmb3    <sample_flag            ; 7
                bra     .channel4               ; 4

.eof4:          stz     PSG_R4                  ; 5
                rmb4    <sample_flag            ; 7
                bra     .done                   ; 4


;****************************************************************************
;
; Three Channel Sample Playback with creating RAM buffer.
;
; Time (normal 0 channel):  6380 +   63 cycles =  6443 (5.40%)
; Time (normal 1 channel):  9280 +  924 cycles = 10204 (8.56%)
; Time (normal 2 channel): 12180 + 1785 cycles = 13965 (11.71%)
; Time (normal 3 channel): 15080 + 2646 cycles = 17726 (14.87%)
; Time (worst  3 channel): 15101 + 2646 cycles = 17747 (14.89%)
;

buffer_samples: tma3                            ; 4
                pha                             ; 3
                tma4                            ; 4
                pha                             ; 3

; Prepare channel 2's sample 116-byte buffer.

.channel2:      bbr2    <sample_flag,.channel3  ; 6

                lda     s2_bnk                  ; 5
                tam3                            ; 3
                inc     a                       ; 2
                tam4                            ; 3
.smod0:         tii     s2_ptr,s2_buf+$00,32    ; 209
.smod1:         tii     s2_ptr,s2_buf+$20,32    ; 209
.smod2:         tii     s2_ptr,s2_buf+$40,32    ; 209
.smod3:         tii     s2_ptr,s2_buf+$60,20    ; 137
                lda     smod0+1                 ; 5   lo
                ldy     smod0+2                 ; 5   hi

                clc                             ; 2
                adc     #116                    ; 2
                sta     smod0+1                 ; 5   lo
                bcc     .addr0                  ; 2
                iny                             ; 2
                bpl     .addr0                  ; 4
                inc     s2_bnk                  ; -
                ldy     #$60                    ; -

.addr0:         sty     smod0+2                 ; 5    hi
                clc                             ; 2
                adc     #32                     ; 2
                sta     smod1+1                 ; 5    lo
                bcc     .addr1                  ; 2
                iny                             ; 2
.addr1:         sty     smod1+2                 ; 5    hi

                clc                             ; 2
                adc     #32                     ; 2
                sta     smod2+1                 ; 5    lo
                bcc     .addr2                  ; 2
                iny                             ; 2
.addr2:         sty     smod2+2                 ; 5    hi

                clc                             ; 2
                adc     #32                     ; 2
                sta     smod3+1                 ; 5    lo
                bcc     .addr3                  ; 2
                iny                             ; 2
.addr3:         sty     smod3+2                 ; 5    hi

; Prepare channel 3's sample 116-byte buffer.

.channel3:      bbr3    <sample_flag,.channel4  ; 6
                ...

; Prepare channel 3's sample 116-byte buffer.

.channel4:      bbr4    <sample_flag,.done      ; 6
                ...

.done:          pla                             ; 4
                tam4                            ; 5
                pla                             ; 4
                tam3                            ; 5
                rts                             ; 7
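For anyone checking the percentage figures in the tables above: they line up if you assume a frame budget of 455 CPU cycles per scanline times 262 scanlines, about 119,210 cycles (my assumption; the divisor isn't stated in the post). A quick Python recomputation:

```python
FRAME_CYCLES = 455 * 262   # assumed NTSC frame budget in CPU cycles

def pct(cycles_per_frame):
    """Express an interrupt routine's per-frame cost as a CPU percentage."""
    return round(100 * cycles_per_frame / FRAME_CYCLES, 2)

# Per-interrupt costs from the two versions: 71 + 36*n cycles without
# the RAM buffer, 55 + 25*n with it, over 116 timer interrupts/frame.
no_buffer   = [pct((71 + 36 * n) * 116) for n in range(4)]
with_buffer = [pct((55 + 25 * n) * 116) for n in range(4)]
```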

elmer

Quote from: TurboXray on 12/21/2016, 12:22 PMDoing a buffer system is faster, but it also has some requirements. It's going to require a two buffers in ram for all channels; a timer is 1024cycles between interrupt - you're going to copy 4x116bytes in 1024 cycles? Not gonna happen.
Yeah, I was trying to avoid the double-buffer, but I'm not sure that I can easily do so.

You *could* interleave the buffer updates, i.e. update the 1st 16-bytes of each channel within a single TIRQ time-period, and then update the rest ... but your code is getting *excessively* timing-dependent at that point.


QuoteDon't get me wrong; I use the double buffer system for my own stuff. But sometimes it's easier when you give other people functionality - to keep the interface a little more simple, and just eat a little overhead.
Ahhh ... OK, so that's why you're so keen on keeping a consistent 116 interrupts-per frame and resyncing the timer every vsync!  :wink:

The simple code doesn't really care whether there are 116 or 117 interrupts in a frame, or the exact synchronization.

Yeah ... the more that I think about it, if the target is a generic sound driver that could be used in HuC as a replacement for the System Card Player, then I'd prefer to keep things simple-but-reliable, and accept the 2..3% CPU hit.  :-k


QuoteWhat I do like about the buffer system, over the slight savings, is the flexibility of it. You can support both compressed samples and uncompressed samples. You could also support half frequency samples (3.5khz instead of 7khz; some sound FX actually sound decent at this playback rate. There are some PCE games that do this; playback samples at both rates). Do all kinds of stuff, and the main TIMER routine wouldn't have to know anything, other than what's in the buffer.
All good points ... but I'll leave that for the "advanced" developers like you!  :wink:

touko

#40
No need for a big buffer, a 4-byte buffer per voice is enough; you need to map data only 1 time for 4 samples.
Banking data each time is 50/60 cycles per sample, so it's 100/120 cycles per sample for 2 voices.
For 8 samples (4 samples/voice) you lose 400/480 cycles vs only 100/120 with banking.
You can reduce the CPU load in your frame drastically.

You can also do bit packing to reduce the need for mapping (and also reduce the sample size by 1/3): 3 samples in 2 bytes.

elmer

#41
Quote from: touko on 12/21/2016, 01:33 PMNo needs a big buffer, a 4 bytes buffer /voice is enough,you need to map datas only 1 time for 4 samples.
Show a code example, please.


QuoteBanking datas each time is 50/60 cycles / sample, it's 100/120 cycles /sample for 2 voices .
For 8 samples (4 samples/voice) you lost 400/480 cycles vs only 100/120 with banking.
You can reduce drastically the CPU load in your frame .
You're not making any sense.

Please show some code example of why you think this is so.

The code that I posted earlier has a banking overhead of ...

1 channel sample playback = 27 cycles per timer interrupt
2 channel sample playback = 38 cycles per timer interrupt
3 channel sample playback = 49 cycles per timer interrupt


And 3 channels is the maximum before I'd have to re-enable interrupts or risk delaying an hsync too much.

Are you seeing something wrong in the code that I posted?


Quoteyou can also doing a bit packing to reduce the need of mapping(and also reduce the sample size by 1/3), 3 samples in 2 bytes .
Yes, you can, at the cost of more overhead, and more cycles.

Again ... please show a code example of how you're doing all of this without overhead, or provide some timing calculations to show the cost.


<edit>

OK, my code was actually in the MML thread, so here's the latest 3 channel version for reference ...

;****************************************************************************
;
; Three Channel Sample Playback.
;
; Time (normal 0 channel):  71 * 116 calls =  8236 cycles (6.91%)
; Time (normal 1 channel): 107 * 116 calls = 12412 cycles (10.41%)
; Time (normal 2 channel): 143 * 116 calls = 16588 cycles (13.91%)
; Time (normal 3 channel): 179 * 116 calls = 20764 cycles (17.42%)
; Time (worst  3 channel): 179 * 115 calls +
;                          251 *   1 calls = 20836 cycles (17.48%)
;
; Maximum hsync delay:     251 - 25 = 226 cycles

tirq_ch234:     ;;;                             ; 8 (cycles for the INT)
                stz     $1403                   ; 5 Acknowledge TIMER IRQ.
                cli                             ; 2 Allow HSYNC to interrupt.
                pha                             ; 3
                tma3                            ; 4
                pha                             ; 3
                sei                             ; 2 Disable interrupts.

.channel2:      bbr2    <sample_flag,.channel3  ; 6
                lda     <sample2_bnk            ; 4
                tam3                            ; 5
                lda     #2                      ; 2
                sta     PSG_R0                  ; 5
                lda     [sample2_ptr]           ; 7
                bmi     .eof2                   ; 2
                sta     PSG_R6                  ; 5
                inc     <sample2_ptr            ; 6
                beq     .msb2                   ; 2

.channel3:      bbr3    <sample_flag,.channel4  ; 6
                lda     <sample3_bnk            ; 4
                tam3                            ; 5
                lda     #3                      ; 2
                sta     PSG_R0                  ; 5
                lda     [sample3_ptr]           ; 7
                bmi     .eof3                   ; 2
                sta     PSG_R6                  ; 5
                inc     <sample3_ptr            ; 6
                beq     .msb3                   ; 2

.channel4:      bbr4    <sample_flag,.done      ; 6
                lda     <sample4_bnk            ; 4
                tam3                            ; 5
                lda     #4                      ; 2
                sta     PSG_R0                  ; 5
                lda     [sample4_ptr]           ; 7
                bmi     .eof4                   ; 2
                sta     PSG_R6                  ; 5
                inc     <sample4_ptr            ; 6
                beq     .msb4                   ; 2

.done:          pla                             ; 4
                tam3                            ; 5
                pla                             ; 4
                rti                             ; 7

.msb2:          inc     <sample2_ptr+1          ; 6
                bpl     .channel4               ; 2
                inc     <sample2_bnk            ; 6
                lda     #$60                    ; 2
                sta     <sample2_ptr+1          ; 4
                bra     .channel3               ; 4

.msb3:          inc     <sample3_ptr+1          ; 6
                bpl     .channel4               ; 2
                inc     <sample3_bnk            ; 6
                lda     #$60                    ; 2
                sta     <sample3_ptr+1          ; 4
                bra     .channel4               ; 4

.msb4:          inc     <sample4_ptr+1          ; 6
                bpl     .done                   ; 2
                inc     <sample4_bnk            ; 6
                lda     #$60                    ; 2
                sta     <sample4_ptr+1          ; 4
                bra     .done                   ; 4

.eof2:          stz     PSG_R4                  ; 5
                rmb2    <sample_flag            ; 7
                bra     .channel3               ; 4

.eof3:          stz     PSG_R4                  ; 5
                rmb3    <sample_flag            ; 7
                bra     .channel4               ; 4

.eof4:          stz     PSG_R4                  ; 5
                rmb4    <sample_flag            ; 7
                bra     .done                   ; 4

TurboXray

Ahh ok - I think I know what Touko is talking about now. Touko, can you post your code example?

TurboXray

I forgot how much better the PCE/SGX sounds through a stereo system. So much bassier and less tinny than emulation through the TV or even earphones on the laptop. And the analog filtering makes it a bit softer on the real system too. I wish emulators could emulate that.

 Anyway, here's my batch of 7khz sample scaling vs 14khz sample scaling. On the real system, the 14khz performs better than on emulators thanks to the analog filtering. It's still not a big difference, or as much as I expected going with double the frequency. But there is more 'punch' to some of the samples. Or at least on my stereo system. http://www.pcedev.net/HuPCMDriver/7khz_and_14khz.zip <- try them out on the real system (not emulator).

elmer

Quote from: TurboXray on 12/21/2016, 02:26 PMAhh ok - I think know what Touko is talking about now. Touko can you post your code example?
Are you thinking that touko is talking about your multichannel PCM driver?

The one that you've said takes 12% CPU to mix 8 PCM channels into 2 PSG channels?

https://www.pcengine-fx.com/forums/index.php?topic=20035.msg464140#msg464140

Now that I've seen the cost of 1/2/3 PSG sample-channels, something like that starts to sound quite tempting!  :-k

esteban

Quote from: TurboXray on 12/21/2016, 02:39 PMI forgot how much better the PCE/SGX sounds through a stereo system. So much more bass-y-er and less tinny than emulation through TV or even earphones on the laptop. And the analog filtering makes is a bit softer on the real system too. I wish emulators could emulate that.
Yes, absolutely. :)

Now, some games, like the venerable China Warrior, have awesome bass (in main tune)... now, imagine how ridiculously awesome the bass is with SUBWOOFER.

:)
IMGIMG IMG  |  IMG  |  IMG IMG

touko

#46
Quote from: TurboXray on 12/21/2016, 02:26 PMAhh ok - I think know what Touko is talking about now. Touko can you post your code example?
Yes.
Sorry, the comments were originally in French (some English comments added).

User_Timer_Irq:   
      stz    $1403            ; // Reset the TIMER
      pha
      phx      
      phy               
      
   ; // Avoid disabling the voice if there's no sample
      bbs   #7 , <test_octet_voix1 , .fin_sample_voix1      
      
      lda   #VOIX_DDA1            ; // Select DDA voice 1
      sta   $800          
      
      bbs   #0 , <test_octet_voix1 , .prep_octets_voix1
      
      lda   <sample_base_voix1
      cmp   <sample_taille_voix1
      bcc   .fin_comp1
      lda   <sample_base_voix1 + 1
      cmp   <sample_taille_voix1 + 1   ; // If end of sample
      bcc   .fin_comp1
            
      stz   $804               ; // Silence voice 1
      smb   #7 , <test_octet_voix1   ; // Disable sample playback for voice 1
      bra   .fin_sample_voix1
      
   ; // Cache 3 samples (2 bytes) for voice 1 and read the first sample
     .fin_comp1:

      ;  Map the data banks
      tma  #3
      tax
           tma  #4
      tay
              
      lda    <sample_bank_voix1             
      tam  #3
      inc    A
      tam  #4

     ; Buffering the 2 bytes
      lda    [ sample_base_voix1 ] 
      sta   <cache_memory_voix1            
      
      inc   <sample_base_voix1   
      lda   [ sample_base_voix1 ]
      sta   <cache_memory_voix1 + 1      
      
  ; Restoring the banks context
      txa
      tam   #3 
      tya
      tam    #4
    
      lda   #3
      sta    <test_octet_voix1
   
      lda   <cache_memory_voix1   ; Reading the first sample
            
      inc   <sample_base_voix1      
      bne   .transfert_data_sample_voix1
      inc   <sample_base_voix1 + 1
      
      bra   .transfert_data_sample_voix1
      
   ; // Read the second sample
     .prep_octets_voix1:      
      lda   <cache_memory_voix1 + 1    ; Reading the second sample
      
   ; // If the second sample was read, unpack sample 3
      bbs   #1 , <test_octet_voix1 , .octet_suiv_voix1  ; If the second sample was already sent
            ; Otherwise unpack the third sample
      and    #$60      
      lsr    A
      lsr    A      
      sta   <cache_memory_voix1 + 1
      lda    <cache_memory_voix1   
      lsr    A
      lsr    A
      lsr    A
      lsr    A
      lsr    A
      ora   <cache_memory_voix1 + 1
      
   ; // Shift the sample counter for voice 1
     .octet_suiv_voix1:
      lsr   <test_octet_voix1   ; Right shift of the sample to send      
      
     .transfert_data_sample_voix1:      
      sta    $806                                    
         
     .fin_sample_voix1:        
      ply
      plx
      pla                     
      
      rti                  

There is 1 voice here, and I use only a 2-byte buffer (3 samples, since the bits are packed).
After mapping the bank, I send the first sample; on the next interrupt, the second; and on the one after that, I unpack and send the third sample.
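In C terms, the bit layout the code above implies looks roughly like this: three 5-bit DDA samples in two bytes, with the third sample split across the upper bits of both. Function names are mine, not from the post; the layout is read off the asm:

```c
#include <stdint.h>

/* Three 5-bit samples packed into two bytes:
     b0 bits 4..0 : sample 1      b0 bits 7..5 : sample 3, bits 2..0
     b1 bits 4..0 : sample 2      b1 bits 6..5 : sample 3, bits 4..3 */
static void pack_three(uint8_t s1, uint8_t s2, uint8_t s3,
                       uint8_t *b0, uint8_t *b1)
{
    *b0 = (uint8_t)((s1 & 0x1F) | ((s3 & 0x07) << 5));
    *b1 = (uint8_t)((s2 & 0x1F) | ((s3 & 0x18) << 2));
}

/* Mirrors the asm's unpack: (b1 & $60) >> 2 recovers bits 4..3,
   b0 >> 5 recovers bits 2..0. */
static uint8_t unpack_third(uint8_t b0, uint8_t b1)
{
    return (uint8_t)(((b1 & 0x60) >> 2) | (b0 >> 5));
}
```

So one 2-byte fetch feeds three timer interrupts, which is why the buffering only has to happen every third tick.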

elmer

Quote from: touko on 12/22/2016, 04:08 AMSorry, the comments are in French (some English comments added)
Your English is far better than my French!  :wink:

I'm sure that we can manage with a bit of help from Google Translate.

Thanks!

*************************

OK, so now I'm outputting some real music data to the PSG, and it's obvious that the 5-bit volume in PSG register 4 is not linear ... it drops off very, very quickly.

Does anyone have a calibrated linear-volume to PSG-volume lookup table to share?

I can earball it and come up with a rough approximation ... but I don't have an oscilloscope to create a proper one.

Gredler

Quote from: elmer on 12/22/2016, 01:29 PMI can earball it
Hahah, my new favorite term :P Elmer is an earballer!

TurboXray

Quote from: elmer on 12/22/2016, 01:29 PM
Quote from: touko on 12/22/2016, 04:08 AMSorry, the comments are in French (some English comments added)
Your English is far better than my French!  :wink:

I'm sure that we can manage with a bit of help from Google Translate.

Thanks!

*************************

OK, so now I'm outputting some real music data to the PSG, and it's obvious that the 5-bit volume in PSG register 4 is not linear ... it drops off very, very quickly.

Does anyone have a calibrated linear-volume to PSG-volume lookup table to share?

I can earball it and come up with a rough approximation ... but I don't have an oscilloscope to create a proper one.
For channel volume, each step down is a 1.5 dB drop. For pan, each step is a 3.0 dB drop. For main channel volume, 0 is -infinity, but that's not true for pan: a pan value of 0 is not true silence.

 Here's a chart I made for Amiga/XM linear to PCE: http://www.pcedev.net/blog/files/XM_volume_tables.txt