Practical audio streaming while limiting kbps and CPU usage

Posted: Fri Jan 20, 2012 3:13 am
by Bregalad
It's a fact that audio streaming is possible on the SNES; blargg proved it.
However I wonder if there is a way to limit the size of the data (kbps) and CPU usage.

I'll explain:
In blargg's demo, he writes data into the echo buffer at 32kHz (the output rate of the SPC). This works well, but he used raw uncompressed data, which is not acceptable for a system such as the SNES where memory is limited.
Uncompressed 16-bit mono data at 32kHz is 512kbps, so a one-minute song will take about 3.8MB, more than a big game like Final Fantasy VI, which is not acceptable.
Also this almost monopolizes CPU usage, similarly to using $4011 on the NES.

The first idea is to use the SNES' native BRR format. It compresses data to a ratio of 9/32, i.e. about 144kbps. That way a one-minute song will take about 1.1MB, which is more acceptable.
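To make the arithmetic concrete, here is a small Python sketch. It assumes 16-bit mono PCM and the standard BRR layout of 9 bytes per 16 samples; the function names are made up for illustration.

```python
# Rough size estimates for one minute of audio at the SPC output rate.
# Assumptions: 16-bit mono PCM, BRR = 9 bytes per block of 16 samples.
SAMPLE_RATE_HZ = 32_000
BITS_PER_SAMPLE = 16
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16


def pcm_bitrate_bps(rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Bits per second of raw uncompressed mono PCM."""
    return rate_hz * bits


def brr_bitrate_bps(rate_hz=SAMPLE_RATE_HZ):
    """Bits per second of the same stream stored as BRR."""
    return rate_hz * BRR_BLOCK_BYTES * 8 // BRR_BLOCK_SAMPLES


def bytes_per_minute(bitrate_bps):
    """Storage needed for 60 seconds at the given bitrate."""
    return bitrate_bps * 60 // 8


if __name__ == "__main__":
    for name, bps in (("PCM", pcm_bitrate_bps()), ("BRR", brr_bitrate_bps())):
        print(name, bps, "bps,", bytes_per_minute(bps), "bytes per minute")
```

With these assumptions raw PCM comes out at 512000 bps (3,840,000 bytes per minute) and BRR at 144000 bps (1,080,000 bytes per minute).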

This can be done by having a huge sample that takes up a significant part of the memory in the SNES, where you update the first half while the second half is playing and vice versa (double buffering).
The problem is syncing the updates to the BRR sample between the CPU and the SPC. If you do it open loop (with carefully timed code), chances are that it will depend on NTSC/PAL settings, and it may not even work well on every SNES, as there are two different crystals for the SPC and the CPU/PPU (I think).
So you'll need some way of keeping track of the playback position to tell the CPU what to update and when. If this is possible, then it'll be possible to stream at a more reasonable bitrate without monopolizing the CPU.
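The double-buffering idea can be sketched in Python as below. This is only a model of the bookkeeping, not SPC code: `half_to_refill` and `DoubleBuffer` are hypothetical names, and it assumes the playback position is somehow known, which is exactly the open problem described above.

```python
def half_to_refill(play_pos, buffer_len):
    """Return 0 or 1: the half of the buffer that is NOT currently playing."""
    half_len = buffer_len // 2
    playing = 0 if (play_pos % buffer_len) < half_len else 1
    return 1 - playing


class DoubleBuffer:
    """Looping buffer made of two equal halves (a model, not real hardware)."""

    def __init__(self, half_len):
        self.half_len = half_len
        self.data = bytearray(2 * half_len)

    def refill(self, play_pos, chunk):
        """Copy one chunk into the idle half; return which half was written."""
        if len(chunk) != self.half_len:
            raise ValueError("chunk must be exactly one half long")
        target = half_to_refill(play_pos, len(self.data))
        start = target * self.half_len
        self.data[start:start + self.half_len] = chunk
        return target
```

While playback is inside the first half, `refill` writes the second half, and the other way round, so the part being played is never overwritten.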

The best option would be to use low-bitrate Ogg Vorbis encoding, which can go as low as 45kbps with acceptable loss of quality; a one-minute song would then take only about 340kB!
The problem is that of course the SPC can't decode this format natively, so it'll be up to the SPC and/or the CPU to handle the decoding and use the echo buffer for playback. So I wonder whether the computing power of either is sufficient for Vorbis decoding.

The best would be for the CPU to send compressed data to the SPC, which would itself handle the decoding on the fly and write the result into its echo buffer. However, the SPC is clocked at only 1.024 MHz, while the CPU can reach about 3 MHz. So if the SPC can't decode Vorbis, it'll be up to the CPU to do it, and then it'll monopolize it of course.

Posted: Fri Jan 20, 2012 3:25 am
by TmEE
You can forget OGG Vorbis on SNES. Chilly Willy has made a decoder for 32X and it struggles on it, and 32X has much more CPU power than several SNES together. But OGG quality is quite acceptable at such low bit rates.

Double buffering of BRR data should not be so difficult, and you can upload it to the SPC faster than it can play it, so you can have chunks of CPU time between uploads to do other things.

Posted: Fri Jan 20, 2012 4:00 am
by mic_
Bregalad wrote:However, the SPC is clocked at only 1.024 MHz, while the CPU can reach about 3 MHz. So if the SPC can't decode Vorbis, it'll be up to the CPU to do it, and then it'll monopolize it of course.
You'll only get full speed (3.55/3.57 MHz) when accessing certain regions of memory. WRAM is not among them, so you'll be limited to 2.68 MHz when accessing WRAM.
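For reference, those figures fall out of dividing the master clock. The sketch below assumes the commonly quoted master clock values (about 21.477 MHz NTSC, 21.281 MHz PAL) and access dividers of 6, 8 and 12; the labels "fast", "slow" and "xslow" are just names chosen for this example.

```python
# Assumed master clocks (Hz) and CPU access dividers; treat as approximate.
NTSC_MASTER_HZ = 21_477_272
PAL_MASTER_HZ = 21_281_370
DIVIDERS = {"fast": 6, "slow": 8, "xslow": 12}


def cpu_mhz(master_hz, access):
    """Effective CPU clock in MHz for one kind of memory access."""
    return master_hz / DIVIDERS[access] / 1e6


if __name__ == "__main__":
    for region, master in (("NTSC", NTSC_MASTER_HZ), ("PAL", PAL_MASTER_HZ)):
        for access in DIVIDERS:
            print(region, access, round(cpu_mhz(master, access), 2), "MHz")
```

Under these assumptions the divide-by-8 case gives the 2.68 MHz quoted for WRAM on NTSC.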

Posted: Fri Jan 20, 2012 4:31 am
by Bregalad
If Vorbis decoding is too CPU-intensive, is there some other format that compresses better than plain BRR and that could be decoded in real time by the 65816 or the SPC700 (preferably by the SPC700, so that less data has to be transferred and the 65816 is free for gameplay)?

Personally I think Vorbis is quite perfect, as I've never heard any loss even at q-1. However, with MP3 you can hear occasional loss at around 160kbps. You hear it the most in music with saw-wave-ish melodies.

Posted: Fri Jan 20, 2012 5:25 am
by mic_
You've got a peak MIPS count of ~1.8 on the S-CPU. I haven't heard of any vorbis/mp3 decoder implementations that come even close to meeting that sort of performance constraint, even on far more capable processors.

Posted: Fri Jan 20, 2012 9:13 am
by tepples
I still think Moero Pro Yakyuu (Japanese Bases Loaded) had the right idea: audio decompression hardware on the cartridge board. Are there any MP3 player chipsets that can be controlled with SPI or I2C?

Posted: Fri Jan 20, 2012 10:15 am
by psycopathicteen
You can probably use HDMA to send data to the SPC700, to save on CPU power.

Posted: Fri Jan 20, 2012 10:46 am
by tepples
Or use zero CPU power and use the analog inputs on the cart edge. Unlike the NES, the Super NES shares the Super Famicom's pinout, and Super Game Boy depends on this.

Posted: Fri Jan 20, 2012 10:58 am
by psycopathicteen
tepples wrote:Or use zero CPU power and use the analog inputs on the cart edge. Unlike the NES, the Super NES shares the Super Famicom's pinout, and Super Game Boy depends on this.
That sounds like a good way to bullshit friends.

Posted: Fri Jan 20, 2012 11:11 am
by Bregalad
Tepples, I was talking about something that would work with the PowerPak and, if possible, emulators (at least BSNES) of course.

Is it really possible to use HDMA to transfer data to the SPC? Even if it is, I bet the bitrate would be very low (something like 200 bytes per frame or thereabouts), but yeah, it'd use little CPU power.

Posted: Fri Jan 20, 2012 12:00 pm
by psycopathicteen
It's enough for BRR compressed 22kHz mono.
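A quick sanity check of that claim in Python. It assumes "22kHz" means 22000 samples per second and uses nominal frame rates of 60 (NTSC) and 50 (PAL); real frame rates differ slightly, and the function names are invented for this sketch.

```python
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16


def brr_bytes_per_second(sample_rate_hz):
    """Bytes of BRR data consumed per second of mono playback."""
    return sample_rate_hz * BRR_BLOCK_BYTES / BRR_BLOCK_SAMPLES


def brr_bytes_per_frame(sample_rate_hz, frames_per_second):
    """Bytes that must reach the SPC every video frame to keep up."""
    return brr_bytes_per_second(sample_rate_hz) / frames_per_second


if __name__ == "__main__":
    for fps in (60, 50):
        print(fps, "fps:", brr_bytes_per_frame(22_000, fps), "bytes per frame")
```

At 60 frames per second this works out to 206.25 bytes per frame, right around the 200 bytes per frame guessed earlier in the thread; at 50 frames per second it is 247.5.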

Posted: Sat Jan 21, 2012 2:18 am
by d4s
Bregalad wrote:Tepples, I was talking about something that would work with the PowerPak and, if possible, emulators (at least BSNES) of course.

Is it really possible to use HDMA to transfer data to the SPC? Even if it is, I bet the bitrate would be very low (something like 200 bytes per frame or thereabouts), but yeah, it'd use little CPU power.
I did this back in 2008 with N-Warp Daisakusen, so yeah, it is possible.
Makes minimal use of the main CPU and if you do it right, it will work on PAL and NTSC consoles all the same.
I wouldn't recommend using my audio player directly, but you can at least get an idea of how this is achieved by looking at my source code.

Inferior emulators such as Zsnes and Snes9x will have trouble with this kind of timing-critical hardware usage, though.

Posted: Sun Jan 22, 2012 1:36 pm
by KungFuFurby
It seems like you can combine the streamed audio with the music... at least that's what I think. N-Warp Daisakusen doesn't quite combine them, but there is a piece of SFX still playing at the end of a round when the streaming sample starts up.

Posted: Mon Jan 23, 2012 8:56 am
by l_oliveira
d4s wrote:Inferior emulators such as Zsnes and Snes9x will have trouble with this kind of timing-critical hardware usage, though.
This reply made me think of Tales of Phantasia .... Oh it did ...

Posted: Fri Jan 27, 2012 4:12 am
by Bregalad
Tales of Phantasia only streams small BRRs. If you listen to the intro song, there are interruptions in the singing all the time. That's because every phrase of singing is a different BRR.

I'm pretty sure the timing is driven from the CPU side, because the intro song doesn't work well on my PAL console. The game even sometimes crashes during the intro. Because the CPU and SPC use different crystal oscillators, there is no way to keep them completely in sync without some kind of synchronization in the communication.

However I think there is a way to make BRR streaming work fine on both NTSC and PAL consoles without changing anything. Since the SPC timers increment at a frequency of 8 kHz, they tick exactly once every 4 output samples.

Therefore if you run your engine in the typical way, that is, waiting until the timer has reached some value N, you know the DSP has output exactly 4*N samples.

So based on this you can signal the CPU when it needs to send more data. Of course, since the CPU has other things to do, you'll have to wait until it's available, so the "sample" should be long enough to compensate for this.
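Here is a rough Python model of that signalling, counted entirely in SPC timer ticks so it does not depend on NTSC/PAL frame timing. It assumes both halves of the looping sample are filled before key-on, and that the CPU finishes each refill a fixed number of ticks after the request; `refill_schedule` and its parameters are hypothetical.

```python
SAMPLES_PER_TICK = 4  # 32 kHz output rate / 8 kHz timer rate


def refill_schedule(half_samples, n_requests, cpu_latency_ticks):
    """List (half, request_tick, done_tick, in_time) for each refill request.

    A half becomes free once playback has left it, and must be rewritten
    before playback wraps around to it one half-length later.
    """
    if half_samples % SAMPLES_PER_TICK:
        raise ValueError("half length must be a multiple of 4 samples")
    ticks_per_half = half_samples // SAMPLES_PER_TICK
    events = []
    for k in range(1, n_requests + 1):
        request_tick = k * ticks_per_half       # playback just left this half
        half = (k - 1) % 2                      # halves alternate 0, 1, 0, ...
        deadline_tick = request_tick + ticks_per_half
        done_tick = request_tick + cpu_latency_ticks
        events.append((half, request_tick, done_tick, done_tick <= deadline_tick))
    return events
```

In this model the stream only underruns when the CPU takes longer than one half-length to respond, which is why a longer "sample" gives the CPU more slack.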

For example, say you want to stream a BRR sample in chunks of 720 bytes (= 80 BRR blocks = 1280 samples).
Then you'll have to reserve memory for twice that size, that is 1440 bytes, in a special BRR sample reserved for this (which is 160 blocks long, and loops back to block 0). The CPU should also have sent the initial chunk before playback starts.

The duration of a chunk is 1280/32000 = 40ms.
The SPC needs to watch one of its timers, and it knows that, after keying on the channel, the timer must increment exactly 1280/4 = 320 times before the SPC needs the CPU to send new data.
The timer is only 8-bit, but this part of the timing can be done in software from smaller timer increments.
The CPU doesn't have to respond immediately, as there are still ~2 PAL frames before it gets critical, but the sooner the better. Since the SPC's loop supposedly runs much faster than the CPU's, the CPU will then have to wait for the SPC to accept data before the actual transfer can be done.
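The numbers in this example can be checked with a short Python sketch. The split of 320 ticks into a hardware period times a software count assumes the timer can count at most 256 ticks per period; that limit and the helper names are assumptions for illustration, not taken from any existing player.

```python
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16
OUTPUT_RATE_HZ = 32_000
TIMER_RATE_HZ = 8_000


def chunk_timing(chunk_bytes):
    """Return (brr_blocks, samples, duration_ms, timer_ticks) for one chunk."""
    blocks, rest = divmod(chunk_bytes, BRR_BLOCK_BYTES)
    if rest:
        raise ValueError("chunk size must be a multiple of 9 bytes")
    samples = blocks * BRR_BLOCK_SAMPLES
    duration_ms = samples * 1000 / OUTPUT_RATE_HZ
    ticks = samples * TIMER_RATE_HZ // OUTPUT_RATE_HZ
    return blocks, samples, duration_ms, ticks


def split_ticks(total_ticks, max_period=256):
    """Split a tick count into (hardware_period, software_count).

    Picks the largest period <= max_period that divides total_ticks evenly,
    so that hardware_period * software_count == total_ticks.
    """
    if total_ticks < 1:
        raise ValueError("total_ticks must be positive")
    for period in range(min(max_period, total_ticks), 0, -1):
        if total_ticks % period == 0:
            return period, total_ticks // period


if __name__ == "__main__":
    print(chunk_timing(720))   # (80, 1280, 40.0, 320)
    print(split_ticks(320))    # (160, 2)
```

So for the 720-byte chunk, the SPC could for instance let the timer fire every 160 ticks and count two of those in software before asking the CPU for the next chunk.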