Sound Emulation, Resources, Tips, Etc?
I'm curious about writing my own sound emulation for my NES emulator, but it doesn't seem like there is a whole lot of help aimed at emulating that explains some of the basics, mainly just a lot of technical documents that assume you know things about sound that you don't. So does anyone have any good documents, tutorials, advice, or suggestions?
Go through Bores Introduction to DSP, then when you come back and read our APU docs, things should make more sense.
Certain terminology, as it relates to the NES, was not explained so well in the information you linked. I've been talking to someone, though, to try to get a better idea of what is going on. I haven't looked at a huge assortment of APU documents, but the Brad Taylor and Blargg documents seemed quite technical and didn't really help with certain basic aspects I did not understand yet. Makes me wish I had taken on writing my own APU when Blargg was still around.
Start with a C program that plays a simple song with square waves. I usually use Twinkle Twinkle Little Star for this purpose.
You know the frequencies of the notes, because the A note after middle C is 440Hz.
Formula for frequency of any note: F = f0 * 2^((n-n0)/12)
where f0 is the reference frequency (440 Hz)
n0 is your reference note number (note "A3", 3*12+9 = 45)
n is your note number (12 * octave + note number within octave), 0 = C, 1 = C#, 2 = D, 3 = D#, 4 = E..., 12 = C on next octave
Middle C (C3) is note number 3*12 + 0 = 36, so its frequency is 440 * 2^((36-45)/12),
which is ~261.625... Hz.
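A minimal sketch of the note formula above, in C. The function name `note_frequency` is made up for illustration, and instead of calling pow() it multiplies or divides by 2^(1/12) once per semitone, which gives the same result as F = f0 * 2^((n-n0)/12) without needing the math library. It assumes the same numbering as the post (note 45 is the 440 Hz A, note 36 is middle C).

```c
#include <assert.h>

/* Hypothetical helper (not from the post): frequency in Hz of note number n,
   where n = 12 * octave + note within octave, and note 45 is the 440 Hz A. */
static double note_frequency(int n)
{
    const double semitone = 1.0594630943592953; /* 2^(1/12) */
    const int n0 = 45;                          /* reference note number */
    double f = 440.0;                           /* reference frequency f0 */

    assert(n >= 0 && n < 128);                  /* keep the loops short */
    for (int i = n0; i < n; i++)
        f *= semitone;                          /* notes above the reference */
    for (int i = n0; i > n; i--)
        f /= semitone;                          /* notes below the reference */
    return f;
}
```

Calling `note_frequency(36)` gives the middle C value worked out above, and every 12 notes up doubles the frequency.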
You have your sampling rate. (Let's make it 44100Hz)
Dividing the sampling rate by the frequency gives the period measured in samples. So the C3 note is 44100 / 261.625... = ~168.561... samples for a full period.
We're making a square wave, so half of the time is max level, and half of the time is min level.
Anyway, let's output 1/60s of audio, middle C note.
1/60s of audio is 735 samples long. With our Middle C note, that's ~4.36 periods long.
Make 84 samples of VolHigh, a fractional sample, then 84 samples of VolLow, and another fractional sample. Repeat 4.36 times.
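One possible way to handle the fractional samples, sketched in C: keep a running phase (position within the current period, from 0.0 to 1.0) and advance it by frequency / sampling rate on every sample. The function and parameter names here are assumptions for illustration, not part of the post. With this approach the high and low runs come out as 84 or 85 samples each, averaging ~84.28, and the leftover phase carries over to the next 1/60 s chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with `count` samples of a 50% square wave.
   `phase` is the position within the current period (0.0 .. 1.0) and is
   carried from one call to the next, so fractional samples are not lost.
   All names are illustrative. */
static void square_fill(int16_t *out, size_t count, double freq_hz,
                        double sample_rate, double *phase,
                        int16_t vol_high, int16_t vol_low)
{
    assert(freq_hz > 0.0 && sample_rate > 0.0);
    double step = freq_hz / sample_rate;  /* periods advanced per sample */

    for (size_t i = 0; i < count; i++) {
        out[i] = (*phase < 0.5) ? vol_high : vol_low;  /* first half high */
        *phase += step;
        if (*phase >= 1.0)
            *phase -= 1.0;                /* wrap, keeping the remainder */
    }
}
```

For the example above you would call it with a 735-sample buffer, 261.6255653 Hz, and 44100 Hz, starting with a phase of 0.0.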
more to come...
Last edited by Dwedit on Tue Jan 03, 2012 11:33 pm, edited 1 time in total.
the audio stuff confused me at first, too. Dwedit's post is a good simple example of generating a digital square wave based on a frequency. when emulating the NES audio unit, determining what each sample byte should be is actually just derived from the duty cycle loop and timer period, so you don't actually need to know the frequency of any notes being played.
basically, on every square wave clock, if the channel is enabled then its period value is decremented. the current position in the duty cycle loop is stepped if the square channel's period has reached zero. if that happens, then the period value is also reset so the countdown begins again. that is highly oversimplifying it, other variables are involved, but that is the gist of it.
the square channels have four possible duty cycles that could be used. this is what my duty cycle array looks like:
```c
/* uint8_t comes from <stdint.h>; one row per duty setting, one column per step */
uint8_t square_duty[4][8] = {
    { 0, 1, 0, 0, 0, 0, 0, 0 },   /* high for 1 of 8 steps */
    { 0, 1, 1, 0, 0, 0, 0, 0 },   /* high for 2 of 8 steps */
    { 0, 1, 1, 1, 1, 0, 0, 0 },   /* high for 4 of 8 steps */
    { 1, 0, 0, 1, 1, 1, 1, 1 }    /* high for 6 of 8 steps */
};
```
if the current array value for a square channel is 1, then the channel's sample output is equal to that channel's current envelope value. otherwise, silence.
i'm not good at explaining this, and maybe i got some details wrong. i hope it makes a little sense. i can provide more code if you want. my APU is not entirely complete, it doesn't handle the sweeping yet but otherwise sounds pretty good, and i think it's easy to follow when reading.
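A rough sketch in C of the clocking described in this post: count the period down, step through the duty loop when it reaches zero, and output the envelope value when the current duty entry is 1. The struct and function names are invented for illustration, and it follows the post's simplified description only (no sweep, length counter, or other details of the real hardware).

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t square_duty[4][8] = {
    { 0, 1, 0, 0, 0, 0, 0, 0 },
    { 0, 1, 1, 0, 0, 0, 0, 0 },
    { 0, 1, 1, 1, 1, 0, 0, 0 },
    { 1, 0, 0, 1, 1, 1, 1, 1 }
};

/* Illustrative state for one square channel; field names are assumptions. */
struct square_chan {
    int      enabled;
    uint16_t period;    /* reload value for the countdown */
    uint16_t counter;   /* counts down on every square wave clock */
    uint8_t  duty;      /* which row of square_duty, 0..3 */
    uint8_t  step;      /* position in the 8-step duty loop */
    uint8_t  envelope;  /* current envelope value */
};

/* One square wave clock: decrement, and on zero step the duty loop and reload. */
static void square_clock(struct square_chan *c)
{
    if (!c->enabled)
        return;
    if (c->counter > 0)
        c->counter--;
    if (c->counter == 0) {
        c->step = (c->step + 1) & 7;
        c->counter = c->period;
    }
}

/* Current output: the envelope value while the duty entry is 1, else silence. */
static uint8_t square_output(const struct square_chan *c)
{
    assert(c->duty < 4 && c->step < 8);
    if (!c->enabled)
        return 0;
    return square_duty[c->duty][c->step] ? c->envelope : 0;
}
```

Note that nothing here needs the note frequency; the pitch falls out of whatever period value the game wrote.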
Basics
Here is my attempt at conveying the basics, beginning from fundamentals, in a short but organized manner. There are other tutorials, but I wanted to write one just for the exercise of it :-)
In order to produce sound, you have to generate "PCM sound".
"PCM sound" is a type of signal.
A signal means anything that changes over time.
In the case of sound, the signal is the elevation of the diaphragm of the loudspeaker (which reproduces air pressure waves by pushing and pulling the air in front of it).
Sampling rate is how often it is measured (and emitted).
For example, a PCM signal at 8000 Hz sampling rate is a numeric value that is emitted 8000 times in a second.
If you have an array of 40000 integers, and you know the sampling rate is 8000, you have 5 seconds of signal. (5*8000=40000). If the sampling rate is 22050, you have about 1.8 seconds of signal (40000/22050).
A signal has two fundamental properties: Frequency and amplitude.
Amplitude is how large the differences are between values. Frequency is how fast the value changes from small to large and back.
For example, a PCM signal, sampled at 22050 Hz rate, that happens to have the amplitude of 20000 and a frequency of 2205 hertz, could look like this:
-10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
-10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
-10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
(repeated for thousands of times).
Within 22050 samples (which represents 1.0 seconds of audio, because of the sampling rate of 22050), it oscillates 2205 times between -10000 and 10000, hence an amplitude of 20000 and frequency of 2205 Hz. The wave length is 10 samples (sampling rate divided by frequency).
If the amplitude were smaller, it would be quieter (the diaphragm moves very little); if it were larger, it would be louder (the diaphragm moves a lot).
If the frequency were lower, the pitch would be lower (the diaphragm moves slowly). The intervals between the extremes (wave length) would be greater.
If the frequency were higher, the pitch would be higher (the diaphragm moves rapidly). The intervals between the extremes (wave length) would be shorter.
When the signal samples are plotted in a graph, they form a shape. The shape is called a wave. Different wave shapes have different names.
There is the square wave, which goes from maximum value to minimum value and back in an abrupt manner, with no intermediates. For example, 100 100 100 100 100 20 20 20 20 20 100 100 100 100 100 20 20 20 20 20.
There is the triangle wave, which goes from maximum to minimum, and back, in a linear fashion. For example, 100 90 80 70 60 50 40 30 20 30 40 50 60 70 80 90 100.
There is the sine wave, which is a smooth wave that is generated with the mathematical sin() function.
An unlimited number of different wave types exist and can be devised.
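To make the square and triangle shapes concrete, here is a small sketch in C that computes sample number `pos` of each, given the wave length in samples and the minimum and maximum values. The function names are made up for this example. With a wave length of 10 and values 20 and 100, the square function reproduces the example sequence above; with a wave length of 16, the triangle function reproduces the 100 90 80 ... 20 30 ... 100 sequence.

```c
#include <assert.h>

/* Sample `pos` of a square wave with a wave length of `period` samples:
   `hi` for the first half of each period, `lo` for the second half. */
static int square_sample(int pos, int period, int lo, int hi)
{
    assert(period >= 2 && pos >= 0);
    return (pos % period) < period / 2 ? hi : lo;
}

/* Sample `pos` of a triangle wave with a wave length of `period` samples:
   falls linearly from `hi` to `lo` over the first half, then rises back. */
static int triangle_sample(int pos, int period, int lo, int hi)
{
    assert(period >= 2 && period % 2 == 0 && pos >= 0);
    int half = period / 2;
    int p = pos % period;
    int dist = p < half ? p : period - p;  /* samples away from the nearest peak */
    return hi - (hi - lo) * dist / half;
}
```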
Here is example C code that generates ten seconds of 8000 hertz PCM signal, consisting of a 440 hertz sinewave that has the amplitude of 60 (peak value; the samples swing between -60 and 60):
```c
/* needs <stdio.h> and <math.h>; each sample goes to stdout as one signed 8-bit byte */
for (int pos = 0; pos < 80000; pos++)
    putchar((int)(60 * sin(440 * pos * 2 * M_PI / 8000)));
```
To mix different signals together, you usually simply add them. For example, this code outputs a 440 hertz sinewave and a 300 hertz sinewave together:
```c
for (int pos = 0; pos < 80000; pos++)
    putchar((int)(60 * sin(440 * pos * 2 * M_PI / 8000) + 60 * sin(300 * pos * 2 * M_PI / 8000)));
```
This covers the basics; the rest is extrapolation. :-)
Last edited by Bisqwit on Wed Jan 04, 2012 4:17 am, edited 1 time in total.
Re: Sound Emulation, Resources, Tips, Etc?
MottZilla wrote: (...)but it doesn't seem like there is a whole lot of help aimed at emulating that explains some of the basics, mainly just a lot of technical documents that assume you know things about sound that you don't.
There are three units:
1. the emulated APU generating sound samples,
2. the audio output, usually SDL, DirectX or... Allegro, and
3. the resample unit.
Generating samples is pretty easy. I use a down-counter loaded with the frequency value. That value represents the number of CPU cycles until the next sample.
```c
chan->freq--;
if (0 == chan->freq)
{
    // do stuff
    // ...
    chan->output = data;            // sound sample
    chan->freq = chan->freq_cache;  // value written to freq registers
}
```
I'll write more later.
Zepper
RockNES author
The way I picture it in my head, is that each channel has a certain "Update" logic, that is executed every x input cycles, where x is the frequency value. For triangle, the update logic is that another step is taken through the pyramid:
```
timer += cycles since last update
if (timer >= frequency value)
{
    timer -= frequency value; // do NOT set to 0, otherwise you will lose cycles
    step = (step + 1) % 32;
}
```
Channels also have a configurable way to disable output (register 4015h), as well as internal ways to disable output. For instance, the square channel's duty cycles can disable output while in a low cycle, or the LFSR in the noise channel can disable output when bit 0 is set.
```
if (channel can output)
{
    return amplitude;
}
else
{
    return 0;
}
```
One thing to note is that not all channels output 0 when silenced. Triangle, for instance, always outputs its current amplitude, but when it stops counting, there is no more sound wave being generated and the channel flat lines.
Each channel is made up of a few different 'primitive' components (Envelope, Sweep, Duty, etc). For my purposes, I found it easier to code the individual components, and reference them as objects in my channel classes. This keeps the logic for those components static between all channels you make. These components also have an update logic, but their logic is invoked depending on the APU's "frame" sequencer.
I hope that wasn't convoluted or confusing, and I also hope it helped. Audio was the biggest problem for me, and it sounds like many others had issues with it as well.
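Here is a hedged sketch in C of the triangle update logic from this post. The struct and function names are placeholders, not anyone's actual emulator code. It uses a `while` loop rather than a single `if`, so that several steps are taken when more cycles than one period have passed since the last update, and it keeps returning the current amplitude when the channel has stopped counting, as described above.

```c
#include <assert.h>

/* Illustrative triangle channel state; names are assumptions. */
struct tri_chan {
    int timer;     /* input cycles accumulated since the last step */
    int period;    /* the "frequency value": input cycles per step (> 0) */
    int step;      /* position in the 32-step sequence */
    int counting;  /* nonzero while the channel is allowed to advance */
};

/* Amplitude for a step: 15 down to 0, then 0 back up to 15. */
static int tri_amplitude(int step)
{
    assert(step >= 0 && step < 32);
    return step < 16 ? 15 - step : step - 16;
}

/* Advance the channel by `cycles` input cycles and return its output. */
static int tri_run(struct tri_chan *t, int cycles)
{
    assert(t->period > 0 && cycles >= 0);
    if (t->counting) {
        t->timer += cycles;
        while (t->timer >= t->period) {
            t->timer -= t->period;        /* keep the remainder, do not reset to 0 */
            t->step = (t->step + 1) % 32;
        }
    }
    /* A halted triangle still outputs its current amplitude (a flat line). */
    return tri_amplitude(t->step);
}
```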
beannaich wrote: The way I picture it in my head, is that each channel has a certain "Update" logic, that is executed every x input cycles.
This is part of the APU, the "quarter" and "half" frames.
beannaich wrote:
Zepper wrote: This is part of the APU, the "quarter" and "half" frames.
Incorrect, the "quarter" and "half" frames, are where the components are updated (Sweep, Envelope, Linear Counter, etc).
Absolutely, but don't call me "incorrect". That's EXACTLY what I mean... and what I understood from you: "The way I picture it in my head, is that each channel has a certain "Update" logic, that is executed every x input cycles."
"The channel's output is constantly updating. With every CPU cycle"
Yes, but you need to resample it to the PC sample rate.
You just misunderstood me, is all Zepper. 
By channel update logic, I meant what the channel does to actually render its waveforms (taking steps through a duty cycle, shifting the noise register, etc). And by component update logic, I mean what each individual part of a channel does on the APU's "half" and "quarter" frame counter clocks.
It's a very important distinction to make, and I called you "incorrect" so as not to confuse MottZilla and other people in the future. I meant no harm by it.
And yes, you have to re-sample for whatever audio rendering API you're using, and most people do so using the 44.1kHz sample rate. Those calculations weren't included in my first post because that comes later, and MottZilla wanted to know about emulation, not so much playback at present.
But, now that we're on the topic, the amount of cycles in between samples is simply:
```
sample delay = cpu frequency / sample frequency
```
In the case of the NES, with 44.1kHz sample rate:
```
sample delay = 1789772.72 / 44100
```
and the following logic (simplified) is executed:
```
sample timer += cycles since last update;
if (sample timer >= sample delay)
{
    sample timer -= sample delay;
    render sample();
}
```
Hello, I am new to emulation.
"Well, you must find an algorithm to resample the generated NES sound. The most simple that I've found & use is adding the samples and divide by the number of updates."
I don't understand that one. E.g. the triangle output ranges from 0-15, and the square too? How does that work with the division exactly...
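One way to read the "add the samples and divide by the number of updates" idea, combined with the sample delay timer from earlier in the thread, is sketched below in C. This is only an interpretation of the quoted advice, and all names are invented. On every CPU cycle the current output level is added to a running sum; when about 40.58 cycles (1789772.72 / 44100) have passed, the sum is divided by the number of levels added, giving one output sample. Since it is an average, a channel whose output ranges from 0 to 15 still yields samples between 0 and 15.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_HZ    1789772.72   /* NES CPU frequency used in the thread */
#define SAMPLE_HZ 44100.0      /* output sample rate */

/* Illustrative resampler state; names are assumptions. */
struct resampler {
    double timer;  /* CPU cycles since the last output sample */
    double sum;    /* sum of the levels seen since then */
    int    count;  /* how many levels were added to sum */
};

/* Feed the output level for one CPU cycle. Returns 1 and stores the
   averaged sample in *out when a sample is due, otherwise returns 0. */
static int resample_cycle(struct resampler *r, double level, double *out)
{
    const double sample_delay = CPU_HZ / SAMPLE_HZ;  /* about 40.58 cycles */

    assert(out != NULL);
    r->sum += level;
    r->count++;
    r->timer += 1.0;
    if (r->timer < sample_delay)
        return 0;
    r->timer -= sample_delay;   /* keep the fractional remainder */
    *out = r->sum / r->count;   /* average of the accumulated levels */
    r->sum = 0.0;
    r->count = 0;
    return 1;
}
```

For example, if a channel outputs 15 for 40 cycles and 0 for one cycle, the resulting sample is 600 / 41, a little under 15.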