What if Sinistar's voice sample chip was used to play music?
-
- Posts: 3140
- Joined: Wed May 19, 2010 6:12 pm
What if Sinistar's voice sample chip was used to play music?
I was looking through YouTube and found this video that uses the NES's DPCM channel to play music:
https://m.youtube.com/watch?v=YvRqV9tBI7c&t=72s
...and it sounds pretty lousy. I think Sinistar's chip would be similar, but slightly better, just because it uses 1-bit ADPCM instead of 1-bit DPCM.
Re: What if Sinistar's voice sample chip was used to play music?
They turned the volume up a lot in those encodings, favoring clipping over hiss. None of those should sound anywhere near that bad.
Actually, the same clip encoded at 33143 bit/s in NES DPCM and in sox's "CVU" format sounds pretty comparable side by side.
Re: What if Sinistar's voice sample chip was used to play music?
I'm actually surprised how easy it was to find a WAV-to-CVSD converter. https://convertio.co/wav-cvsd/
I've played around with some samples, and I noticed that 32000 Hz CVSD sounds about as muffled as 8000 Hz PCM.
Re: What if Sinistar's voice sample chip was used to play music?
sox's implementation of CVSD only supports 8kHz output, so I wouldn't be surprised if that's why you're hearing that muffling. That's also why I ended up going with "cvu" format for an apples-to-apples comparison.
That said, it does support arbitrary bitrates regardless of the filtering.
Re: What if Sinistar's voice sample chip was used to play music?
What's weird is that even though the file comes out as 8 kHz, it plays in slow motion and I have to speed it up to 16 kHz for it to be at the correct speed. To do 32 kHz, I have to slow the input down 2x, convert it, then speed the result back up by 4x.
What I'm saying is that even at 32 kHz, it sounds muffled like 8 kHz PCM. I'm guessing the muffling happens because the slope rate automatically decreases whenever there are fewer than 3 bits of the same value in a row.
Re: What if Sinistar's voice sample chip was used to play music?
I was doing experiments with the CVU converter and I had a few observations on how the algorithm worked.
- There is a max slope rate that appears to be about 1/32
- The previous sample seems to get multiplied by a value slightly less than 1 (maybe 63/64?)
- When sampling a 6kHz sine wave at 48kHz, the amplitude gradually increases
- When sampling a 12kHz sine wave at 48kHz, the amplitude gradually decreases
- It must be incrementing and decrementing the slope at different rates, based on the two observations
Re: What if Sinistar's voice sample chip was used to play music?
In backward-adaptive[1] DPCM in general, it is common to increment scale in response to slope overload faster than decrementing in response to granular noise. This ensures an adequately fast response to slope overload and decay over the course of a syllable without triggering excessive slope overload for multiple cycles in a single syllable.
I'd draw an analogy to IMA ADPCM (4-bit), as described in "DS Sound Notes" by Martin Korth. It uses a table of step scales where each entry is about 10% bigger than the previous, and the table position is updated after each sample. Sample codes whose magnitude is less than half of full scale cause the next sample to use the previous entry in the scale table, whereas codes whose magnitude is at least half of full scale cause a jump forward by 2, 4, 6, or 8 entries.
[1] "Backward-adaptive" refers to an encoder that decides how to encode the next sample based only on previous output. Distinguish from a "forward-adaptive" method such as SNES BRR, in which the encoder looks at future samples and tells the decoder in advance how they will be encoded. Source: "Research Interests: Adaptive Quantization" by Youngjun Yoo
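To make the IMA ADPCM analogy concrete, here is a minimal Python sketch of just the step-index update. The index adjustment table is the standard IMA one; `APPROX_STEPS` is only an approximation I generate from the "10% bigger" rule (the real codec uses a fixed table of 89 integers), and the names `next_index`, `INDEX_ADJUST`, and `APPROX_STEPS` are made up for illustration.

```python
# Index change for each 3-bit code magnitude (the sign bit is ignored).
# Magnitudes 0-3 step back one entry; 4-7 jump forward by 2, 4, 6, or 8.
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]

# ASSUMPTION: rough stand-in for the real 89-entry step table, built from
# the "each entry is about 10% bigger" rule. Not the exact IMA values.
APPROX_STEPS = [min(32767, round(7 * 1.1 ** i)) for i in range(89)]


def next_index(index, code):
    """Return the step-table index to use for the sample after `code`.

    `code` is a 4-bit ADPCM nibble: bit 3 is the sign, bits 0-2 the magnitude.
    """
    magnitude = code & 0b0111
    index += INDEX_ADJUST[magnitude]
    # Clamp to the valid range of the 89-entry table.
    return max(0, min(len(APPROX_STEPS) - 1, index))


if __name__ == "__main__":
    index = 10
    for code in (0b0001, 0b0111, 0b1100, 0b0010):
        index = next_index(index, code)
        print(code, index, APPROX_STEPS[index])
```

The asymmetry is the point: one large code can raise the scale by up to 8 entries at once, while small codes only walk it back one entry per sample.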
Re: What if Sinistar's voice sample chip was used to play music?
I just tested an 8kHz wave (3 samples up, 3 samples down) and I found out that it does work, so there are two possibilities:
1) it increases slope every 2 same bits in a row, by a rate smaller than it decreases, but larger than half the rate
2) it increases slope every 3 same bits in a row, by a rate more than double the size it decreases
I wonder if there is an equilibrium frequency where the slope rate can sustain itself but can't increase on its own, where the amount of decreasing is exactly the same as the amount of increasing.
Re: What if Sinistar's voice sample chip was used to play music?
You could look at the implementation in SoX: https://sourceforge.net/p/sox/code/ci/m ... src/cvsd.c
Re: What if Sinistar's voice sample chip was used to play music?
Overview of CVSD:
It seems SoX also upsamples by 4 when encoding and downsamples by 4 when decoding. So if you encode an 8 kHz narrowband wave, you get 32 kbps output.
tc0 = e^(-200 / rate)    # Decay to 1/e-th in 1/200 s
tc1 = 0.1 * (1 - tc0)    # Increase stepsize to full scale in 1/20 s
for each sample:
    stepsize *= tc0                                 # exponential decay
    if last 3 are 000 or 111: stepsize += tc1       # linear increase
    output +stepsize if sample else -stepsize through a lowpass filter
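For anyone who wants to poke at the step-size behaviour, here is a rough Python sketch of only the adaptation part of that pseudocode. It is not SoX's actual code: the output lowpass filter is left out, I assume one bit per sample at a 48 kHz bit rate, and the function names (`cvsd_constants`, `final_stepsize`, `square_bits`) are my own.

```python
import math


def cvsd_constants(rate):
    """Return (tc0, tc1) as defined in the pseudocode above."""
    tc0 = math.exp(-200.0 / rate)  # per-bit decay factor
    tc1 = 0.1 * (1.0 - tc0)        # amount added when the last 3 bits match
    return tc0, tc1


def final_stepsize(bits, rate, stepsize=0.0):
    """Run the step-size update over `bits` and return the last step size."""
    tc0, tc1 = cvsd_constants(rate)
    prev, run = None, 0
    for bit in bits:
        # Count how many identical bits we have seen in a row.
        run = run + 1 if bit == prev else 1
        prev = bit
        stepsize *= tc0            # exponential decay on every bit
        if run >= 3:               # last 3 bits are 000 or 111
            stepsize += tc1        # linear increase
    return stepsize


def square_bits(run_length, cycles):
    """Bit pattern of `run_length` ones then `run_length` zeros, repeated."""
    return ([1] * run_length + [0] * run_length) * cycles


if __name__ == "__main__":
    rate = 48000  # ASSUMPTION: bit rate equals the 48 kHz test sample rate
    for run_length in (2, 3, 4):   # roughly 12 kHz, 8 kHz, 6 kHz patterns
        bits = square_bits(run_length, rate // (2 * run_length))
        print(run_length, final_stepsize(bits, rate, stepsize=0.05))
```

Under this model there is no single equilibrium frequency. The step size settles near 0.1 times the fraction of bits that complete a run of three, so runs of 2 decay toward zero, runs of 3 hold at about 0.033, and runs of 4 hold at about 0.05. That seems consistent with the 12 kHz, 8 kHz, and 6 kHz observations earlier in the thread, assuming the CVU converter behaves like this pseudocode.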