In the past, I might have guesstimated a nominal level of 12 decibels below full scale (-12 dBFS) to leave the signal some headroom when the operating system mixes it with other audio sources. That would mean 0 dBFS corresponds to +12 dB nominal, and 0 dB nominal corresponds to -12 dBFS. In 16-bit, -12 dBFS corresponds to 8192 units, as full scale is 32768 units and each 6 dB of attenuation roughly halves the output voltage. I don't remember where I got that 12 dB headroom value, and in my experience, FCEUX's default settings tend to be much louder than Mesen 2's.
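As a sanity check on that arithmetic, here is a quick sketch of the conversion. Note that "-6 dB is a halving" is itself an approximation (a true halving is about -6.02 dB), so the exact formula gives roughly 8231 units rather than the round 8192:

```python
FULL_SCALE = 32768  # full scale of 16-bit signed PCM

def dbfs_to_units(dbfs: float) -> float:
    """Convert a dBFS level to linear 16-bit PCM units (exact formula)."""
    return FULL_SCALE * 10 ** (dbfs / 20)

# Exact conversion: -12 dBFS is about 8231 units...
print(round(dbfs_to_units(-12.0)))  # 8231

# ...while treating each -6 dB as exactly one halving gives the
# round binary figure quoted above: 32768 / 2 / 2 = 8192.
print(FULL_SCALE // 2 ** 2)  # 8192
```

The 8192 figure is the convenient one for fixed-point scaling; the ~0.5% discrepancy is inaudible.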
If I had to come up with appropriate absolute output levels from first principles, maybe the default overall scaling of an NES emulator's audio output should be based on how the output voltage compares to a nominal
"line level." For home audio equipment, the nominal level is -10 decibel volts root mean square (-10 dBV). For a sine wave, this corresponds to 0.894 volts peak-to-peak (Vp-p), 0.447 volts amplitude, or 0.316 volts root-mean-square (Vrms). The article claims that the maximum level of a line output is 2 Vp-p, which provides some headroom over the nominal level. Since 2 Vp-p means a swing of ±1 volt, a strict interpretation would treat each unit of 16-bit signed PCM as 1/32768 of a volt.
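The dBV-to-voltage figures above follow from the definition of dBV (decibels relative to 1 Vrms) plus the sine-wave crest factor of √2. A minimal sketch of that derivation:

```python
import math

def dbv_to_vrms(dbv: float) -> float:
    """dBV is decibels relative to 1 volt RMS."""
    return 10 ** (dbv / 20)

vrms = dbv_to_vrms(-10.0)        # ≈ 0.316 Vrms (nominal consumer line level)
amplitude = vrms * math.sqrt(2)  # ≈ 0.447 V peak, for a sine wave
vpp = 2 * amplitude              # ≈ 0.894 V peak-to-peak
print(round(vrms, 3), round(amplitude, 3), round(vpp, 3))
```

The √2 step only holds for a sine wave; NES channel waveforms (pulse, triangle, noise) have different crest factors, which is one more reason the "right" scaling is not obvious.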
Another possibility is to normalize the levels to those of a Super NES Control Deck connected to the same A/V switch. We know its full scale because that system's audio output comes from a 16-bit DAC.
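Under this scheme, the emulator's scale factor falls out of two measurements: the loudest Vp-p the NES can produce and the Vp-p of the SNES at digital full scale. A hedged sketch, with placeholder voltages that are NOT real measurements:

```python
def snes_normalized_scale(nes_max_vpp: float, snes_full_scale_vpp: float) -> float:
    """Scale factor mapping the NES's loudest analog output onto the
    PCM level the SNES would need to produce the same voltage."""
    return nes_max_vpp / snes_full_scale_vpp

# Placeholder numbers for illustration only; the real values would
# have to be measured from actual consoles on the same A/V switch.
scale = snes_normalized_scale(nes_max_vpp=1.2, snes_full_scale_vpp=2.0)
print(scale)  # 0.6

# The emulator would then multiply each NES sample, normalized to
# the range ±1.0, by scale * 32767 before writing 16-bit output.
```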
Further questions, if we go down this route of trying to standardize what the headroom "should" be:
- Is 1/32768 V per unit a reasonable convention?
- What is the actual Vp-p of each model's AV output (front-loading NES and new Famicom) when playing the loudest possible signal for each of the five internal channels?
- What is the actual Vp-p of a Super NES playing a sine wave, whether at full scale or at 25%?