Practical audio streaming while limiting kbps and CPU usage
It's a fact that audio streaming is possible on the SNES; blargg proved it.
However, I wonder if there is a way to limit both the size of the data (kbps) and the CPU usage.
I'll explain:
In blargg's demo, he writes data into the echo buffer at 32 kHz (the output rate of the SPC). This works well, but he used raw uncompressed data, which is not acceptable on a system such as the SNES where memory is limited.
Uncompressed 16-bit mono data at 32 kHz is 512 kbps, so a one-minute song takes about 3.8 MB, more than a big game like Final Fantasy VI, which is not acceptable.
Also this almost monopolizes CPU usage, similarly to using $4011 with the NES.
The first idea is to use the SNES's native BRR format. It compresses data to a ratio of 9/32, to about 144 kbps. That way a one-minute song takes about 1 MB, which is more acceptable.
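The arithmetic behind these bitrates can be checked with a short sketch. It assumes 16-bit mono PCM and the standard BRR layout of 9 bytes per 16 samples; the function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 32_000       # DSP output rate mentioned in the post
PCM_BITS_PER_SAMPLE = 16      # assumption: raw data is 16-bit mono
BRR_BYTES_PER_BLOCK = 9       # one BRR block: 1 header byte + 8 data bytes
BRR_SAMPLES_PER_BLOCK = 16    # each BRR block decodes to 16 samples


def pcm_bits_per_second(rate_hz=SAMPLE_RATE_HZ):
    """Bitrate of raw 16-bit mono PCM."""
    return rate_hz * PCM_BITS_PER_SAMPLE


def brr_bits_per_second(rate_hz=SAMPLE_RATE_HZ):
    """Bitrate of the same audio stored as BRR blocks."""
    blocks_per_second = rate_hz // BRR_SAMPLES_PER_BLOCK
    return blocks_per_second * BRR_BYTES_PER_BLOCK * 8


def bytes_per_minute(bits_per_second):
    """Storage needed for 60 seconds at the given bitrate."""
    return bits_per_second * 60 // 8


if __name__ == "__main__":
    for name, bps in (("PCM", pcm_bits_per_second()),
                      ("BRR", brr_bits_per_second())):
        print(name, bps // 1000, "kbps,", bytes_per_minute(bps), "bytes/minute")
```

With these assumptions, one minute of raw PCM comes out at 3,840,000 bytes and one minute of BRR at 1,080,000 bytes.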
This can be done by having one large sample that takes a significant part of the memory in the SNES, and updating the first half while the second half is playing and vice versa (double buffering).
The problem is syncing the updates to the BRR sample between the CPU and the SPC. If you do it open loop (with carefully timed code), chances are it will depend on NTSC/PAL settings, and it may not even work well on every SNES, since there are two different crystals for the SPC and the CPU/PPU (I think).
So you need some way of keeping track of the playback position to tell the CPU what to update and when. If this is possible, then it will be possible to stream at a more reasonable bitrate without monopolizing the CPU.
The best option would be low-bitrate Ogg Vorbis encoding, which can go as low as 45 kbps with an acceptable loss of quality; a one-minute song would then take only about 340 KB!
The problem is that of course the SPC can't decode this format natively, so it would be up to the SPC and/or the CPU to handle the decoding and use the echo buffer for playback. I wonder if the computing power of both is sufficient for Vorbis decoding.
Ideally the CPU would send compressed data to the SPC, which would decode it on the fly and write it into its echo buffer. However, the SPC is clocked at only 1.024 MHz, while the CPU can reach about 3 MHz. So if the SPC can't decode Vorbis, it will be up to the CPU to do it, and that will of course monopolize it.
Useless, lumbering half-wits don't scare us.
- TmEE
- Posts: 789
- Joined: Wed Feb 13, 2008 9:10 am
- Location: Estonia, Rapla city (50 and 60Hz compatible :P)
- Contact:
You can forget Ogg Vorbis on the SNES. Chilly Willy has made a decoder for the 32X and it struggles there, and the 32X has much more CPU power than several SNESes together. But Ogg quality is quite acceptable at such low bit rates.
Double buffering of BRR data should not be so difficult, and you can upload it to the SPC faster than it can play it, so you can have chunks of CPU time between uploads to do other things.
Quote: "However, the SPC is clocked at only 1.024 MHz, while the CPU can reach about 3 MHz. So if the SPC can't decode Vorbis, it'll be up to the CPU to do it, and then it'll monopolize it of course."
You'll only get full speed (3.55/3.57 MHz) when accessing certain regions of memory. WRAM is not included among those, so you'll be limited to 2.68 MHz when accessing WRAM.
If Vorbis decoding is too CPU intensive, is there some other codec that compresses better than plain BRR and could be decoded in real time by the 65816 or the SPC700 (preferably the SPC700, so that less data has to be transferred and the 65816 is free for gameplay)?
Personally I think Vorbis is quite perfect, as I've never heard any loss even at q-1. With MP3, however, you can hear occasional loss at around 160 kbps. You hear it most on music with saw-wave-ish melodies.
I still think Moero Pro Yakyuu (Japanese Bases Loaded) had the right idea: audio decompression hardware on the cartridge board. Are there any MP3 player chipsets that can be controlled with SPI or I2C?
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Tepples, I was talking about something that would work with the PowerPak and, if possible, emulators (at least bsnes), of course.
Is it really possible to use HDMA to transfer data to the SPC? Even if it is, I bet the bitrate would be very low (something like 200 bytes per frame), but yeah, it would use little CPU power.
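To put the "200 bytes per frame" guess in perspective, here is a rough sketch of the per-frame budget. It assumes nominal frame rates of 60 Hz (NTSC) and 50 Hz (PAL) and 9-byte BRR blocks holding 16 samples; the 200-byte figure is only the guess from this post, not a measured HDMA limit, and the function names are hypothetical.

```python
BRR_BYTES_PER_BLOCK = 9
BRR_SAMPLES_PER_BLOCK = 16


def bytes_per_frame_needed(sample_rate_hz, frames_per_second):
    """Bytes that must reach the SPC each video frame to sustain a BRR stream."""
    bytes_per_second = sample_rate_hz * BRR_BYTES_PER_BLOCK // BRR_SAMPLES_PER_BLOCK
    # Round up: a partial byte budget per frame would starve the stream.
    return -(-bytes_per_second // frames_per_second)


def max_brr_rate_hz(bytes_per_frame, frames_per_second):
    """Highest BRR sample rate that a given per-frame byte budget can feed."""
    whole_blocks_per_second = bytes_per_frame * frames_per_second // BRR_BYTES_PER_BLOCK
    return whole_blocks_per_second * BRR_SAMPLES_PER_BLOCK


if __name__ == "__main__":
    for fps in (60, 50):  # nominal NTSC / PAL frame rates (assumption)
        print(fps, "Hz:", bytes_per_frame_needed(32_000, fps), "bytes/frame for 32 kHz BRR;",
              "200 bytes/frame feeds up to", max_brr_rate_hz(200, fps), "Hz")
```

Under these assumptions a full 32 kHz BRR stream needs 300 bytes per frame on NTSC and 360 on PAL, while 200 bytes per frame would still feed a BRR sample played at roughly 21 kHz (NTSC) or 18 kHz (PAL).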
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Bregalad wrote: "Tepples, I was talking about something that would work with the power pak and if possible emulators (at least BSNES) of course. Is it really possible to use HDMA to transfer data to the SPC? Even if this is possible I bet the bitrate would be very low (something like 200 bytes per frame or something in the like), but yeah it'd use few CPU power."
I did this back in 2008 with N-Warp Daisakusen, so yeah, it is possible.
Makes minimal use of the main CPU and if you do it right, it will work on PAL and NTSC consoles all the same.
I wouldn't recommend using my audio player directly, but you can at least get an idea of how this is achieved by looking at my source code.
Inferior emulators such as ZSNES and Snes9x will have trouble with this kind of timing-critical hardware usage, though.
-
KungFuFurby
- Posts: 264
- Joined: Wed Jul 09, 2008 8:46 pm
- l_oliveira
- Posts: 409
- Joined: Wed Jul 13, 2011 6:51 am
- Location: Brasilia, Brazil
Tales of Phantasia only streams small BRRs. If you listen to the intro song, there are interruptions in the singing throughout. That's because every sung phrase is a different BRR.
I'm pretty sure the timing is driven from the CPU side, because the intro song doesn't work well on my PAL console. The game even crashes sometimes during the intro. Because the CPU and SPC use different crystal oscillators, there is no way to keep them completely in sync without some kind of synchronization during communication.
However, I think there is a way to make BRR streaming work on both NTSC and PAL consoles without changing anything. Since the SPC timers count at 8 kHz (timers 0 and 1, at least), they tick exactly once every 4 output samples.
Therefore, if you run your engine in the typical way, then when the timer has counted N ticks you know the DSP has output exactly 4*N samples.
Based on this, the SPC can signal the CPU when it needs more data. Of course, since the CPU has other things to do, you'll have to wait until it is available, so the "sample" should be long enough to absorb this delay.
For example, say you want to stream a BRR sample in chunks of 720 bytes (= 80 BRR blocks = 1280 samples).
Then you have to reserve memory for twice that size, 1440 bytes, in a special BRR sample reserved for this (160 blocks long, looping back to block 0). The CPU should also have sent the initial chunk before playback starts.
The duration of a chunk is 1280/32000 = 40 ms.
The SPC needs to watch one of its timers: after keying on the channel, it knows the timer must tick exactly 1280/4 = 320 times before it needs the CPU to send new data.
The timer is only 8 bits, but this part of the timing can be done in software by accumulating smaller timer increments.
The CPU doesn't have to respond immediately, as there are still ~2 PAL frames before it becomes critical, but the sooner the better. Since the SPC's loop supposedly runs much faster than the CPU's, the CPU will then have to wait for the SPC to accept data before the actual transfer can be done.
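The bookkeeping described above can be modelled in a few lines. This is only a host-side simulation of the idea, not SPC700 code: the class name, the method name and the choice to report the half that just finished playing are all assumptions for illustration, and it assumes both halves of the buffer were filled before key-on (a variant would request the second half right at key-on).

```python
SAMPLE_RATE_HZ = 32_000
SAMPLES_PER_TICK = 4                 # 32 kHz output / 8 kHz timer
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16
CHUNK_BLOCKS = 80                    # size of one streamed unit, as in the example

CHUNK_BYTES = CHUNK_BLOCKS * BRR_BLOCK_BYTES            # 720 bytes
CHUNK_SAMPLES = CHUNK_BLOCKS * BRR_BLOCK_SAMPLES        # 1280 samples
TICKS_PER_CHUNK = CHUNK_SAMPLES // SAMPLES_PER_TICK     # 320 timer ticks
CHUNK_MS = CHUNK_SAMPLES * 1000 // SAMPLE_RATE_HZ       # 40 ms of audio


class StreamTracker:
    """Hypothetical SPC-side counter that tracks which buffer half is playing."""

    def __init__(self):
        self.ticks = 0          # ticks accumulated since the last half boundary
        self.playing_half = 0   # 0 = first 720 bytes, 1 = second 720 bytes

    def on_timer(self, elapsed_ticks):
        """Add ticks read from the timer; return the halves that became free."""
        self.ticks += elapsed_ticks
        free_halves = []
        while self.ticks >= TICKS_PER_CHUNK:
            self.ticks -= TICKS_PER_CHUNK
            free_halves.append(self.playing_half)  # just finished, safe to refill
            self.playing_half ^= 1                 # playback moved to the other half
        return free_halves


if __name__ == "__main__":
    tracker = StreamTracker()
    for step in range(64):
        for half in tracker.on_timer(10):   # poll in small increments
            print("after", (step + 1) * 10, "ticks: ask CPU to refill half", half)
```

Because the counter is driven only by the SPC's own timer, the refill requests follow actual playback regardless of the video frame rate; the CPU has one chunk duration (40 ms here) to deliver each refill.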