Issues with musical tones via PCM ($4011 writes)

Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

I have had musical training but no formal education in digital audio. I decided to try playing some musical tones via PCM (direct writes to $4011) because the square and triangle channels were not adequate: the square channels cannot go that low, and the triangle has high overtones/harmonics that are louder than the desired low tones. The results have been mostly OK, but some of the notes sound out of tune. My tests were with FCEUX and Mesen. I don't know whether the problem is due to my theoretical understanding, my implementation, hardware versus emulator behavior, or some other factor.

THEORY (as I understand it)

Let c be the CPU clock speed. If samples are written to $4011 every a cycles, then the sampling rate is c/a. A tone that has frequency f will require (c/a)/f = c/(af) writes to $4011 for one period. For a square wave, half of these writes should be the high value and half should be the low value. Let w be the number of these writes for the high or low value. Then w = c/(2 a f). You can rearrange this to get that f = c/(2 a w) is the frequency of a wave generated by a given combination of values of a and w.
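
As a rough illustration of the formula above, here is a minimal Python sketch that evaluates f = c/(2 a w) and brute-forces the (a, w) pair closest to a target pitch. It is not the script mentioned in this post; the function names, the search range for a (108 to 255 cycles), and the limit on w are assumptions made for the example. The CPU clock is the NTSC value of about 1.789773 MHz.

```python
import math

CPU_HZ = 1_789_773  # NTSC NES CPU clock in Hz


def square_freq(a, w, c=CPU_HZ):
    """Frequency of a square wave with w writes per half period, a cycles apart."""
    return c / (2 * a * w)


def cents(f, target):
    """Signed distance from target to f in cents (negative means flat)."""
    return 1200 * math.log2(f / target)


def best_a_w(target, a_range=range(108, 256), w_max=255):
    """Brute-force the (a, w) pair whose frequency is closest to target.

    a_range and w_max are illustrative assumptions, not limits taken
    from the original script.
    """
    best = None
    for a in a_range:
        # Ideal (non-integer) w for this a; try the integers on both sides.
        ideal = CPU_HZ / (2 * a * target)
        for w in (math.floor(ideal), math.ceil(ideal)):
            if 1 <= w <= w_max:
                err = abs(cents(square_freq(a, w), target))
                if best is None or err < best[0]:
                    best = (err, a, w)
    return best  # (absolute error in cents, a, w), or None if nothing fits


err, a, w = best_a_w(440.0)
print(f"A4: a={a}, w={w}, f={square_freq(a, w):.3f} Hz, error={err:.2f} cents")
```

Only the product a * w matters for the pitch, so several (a, w) pairs can tie; the sketch keeps the first one it finds.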

IMPLEMENTATION

My target is the NTSC NES. ASM6 is the assembler, and I use its ALIGN directive where needed in order to avoid branches and loads from crossing page boundaries. The goal is to be able to play the notes of the C major scale ranging from C1 to B6, where A4 is the A above middle C. I decided to use Pythagorean tuning based on A4 = 440 Hz. Using the formula for f from above, I wrote a Python script to find the combination of a and w that would yield the closest frequency to the desired one for each note in each octave. This is my code for PCM playback:

Code:

;Variables:
;Y = value written
;delay_target  (zp)
;num_writes_lo (zp)
;num_writes_hi (zp)

write_y_4011:
    sty $4011         ; 4 cycles
    sec               ; 2 cycles
    lda num_writes_lo ; 3 cycles
    sbc #1            ; 2 cycles
    sta num_writes_lo ; 3 cycles
    lda num_writes_hi ; 3 cycles
    sbc #0            ; 2 cycles
    sta num_writes_hi ; 3 cycles
    bcc +             ; 3 cycles if taken, 2 cycles not
    ; delay_target = TARGET - 57 cycles
    ; 3 + 27 + 3 + 4 + 2 + 3 + 2 + 3 + 3 + 2 + 3 + 2 = 57
    lda delay_target  ; 3 cycles
    jsr delay_a_27_clocks
    jmp write_y_4011  ; 3 cycles
  + rts               ; 6 cycles
    ; 2 + 3 + 2 + 3 + 3 + 2 + 3 + 3 + 6 = 27 cycles elapsed after final write

;Variables:
;X = note index
;duration_lo    (zp)
;duration_hi    (zp)
;writes_lo      (zp)
;writes_hi      (zp)
;cycle_target   (zp)

play_note:
    lda note_dur_lo,x    ; 4 cycles
    sta duration_lo      ; 3 cycles
    lda note_dur_hi,x    ; 4 cycles
    sta duration_hi      ; 3 cycles

    lda note_wr_lo,x     ; 4 cycles
    sta writes_lo        ; 3 cycles
    lda note_wr_hi,x     ; 4 cycles
    sta writes_hi        ; 3 cycles

    lda note_cycles,x    ; 4 cycles
    sta cycle_target     ; 3 cycles
    sec                  ; 2 cycles
    sbc #57              ; 2 cycles
    sta delay_target     ; 3 cycles

  - ldy #$73             ; 2 cycles

    lda writes_lo        ; 3 cycles
    sta num_writes_lo    ; 3 cycles
    lda writes_hi        ; 3 cycles
    sta num_writes_hi    ; 3 cycles

    jsr write_y_4011     ; 6 cycles jsr + 4 sty, 27 cycles after final write

    ldy #$0C             ; 2 cycles

    lda writes_lo        ; 3 cycles
    sta num_writes_lo    ; 3 cycles
    lda writes_hi        ; 3 cycles
    sta num_writes_hi    ; 3 cycles

    ; TODO (done): delay TARGET - 85 cycles
    ; 27 + 2 + 12 + 3 + 2 + 2 + 27 + 10 = 85
    lda cycle_target     ; 3 cycles
    sec                  ; 2 cycles
    sbc #85              ; 2 cycles
    jsr delay_a_27_clocks

    jsr write_y_4011     ; 6 cycles jsr + 4 sty, 27 cycles after final write

    sec                  ; 2 cycles
    lda duration_lo      ; 3 cycles
    sbc #1               ; 2 cycles
    sta duration_lo      ; 3 cycles
    lda duration_hi      ; 3 cycles
    sbc #0               ; 2 cycles
    sta duration_hi      ; 3 cycles
    bcc +                ; 3 cycles if taken, 2 cycles not

    ; TODO (done): delay TARGET - 108 cycles
    ; 27 + 2 + 3 + 2 + 3 + 3 + 2 + 3 + 2 + 3 + 2 + 2 + 27 + 3 + 2 + 12 + 10 = 108
    lda cycle_target     ; 3 cycles
    sec                  ; 2 cycles
    sbc #108             ; 2 cycles
    jsr delay_a_27_clocks

    jmp -                ; 3 cycles
  + rts                  ; 6 cycles

PROBLEMS

Some of the high tones sound flat (lower pitch than expected). The most problematic ones to my ear are G5, C6, and F6. This is also evident when playing notes an octave apart (e.g., E6 sounds flat when played after E5). Something also sounds off about the tones in the lowest octave, but I find it difficult to express just what it is. When using a note tuning app on my phone to check the tuning, all of the notes are slightly flat starting from the lowest octave, and the deviation from expected worsens with each succeeding octave. An interesting thing is that the deviation reported by the app (in cents) is higher than expected based on calculating the cents between the frequencies. In case this were a side effect of using Pythagorean tuning, I wrote another script to determine the a and w values for 12-tone equal temperament tuning, but the results were no better.

I have attached the ROMs that I made for these tests. Any input on the situation is welcome.
Attachments
sound-test-pyth.nes
Pythagorean tuning
(16.02 KiB) Downloaded 43 times
sound-test-12tet.nes
12-tone equal temperament
(16.02 KiB) Downloaded 40 times
Bavi_H
Posts: 193
Joined: Sun Mar 03, 2013 1:52 am
Location: Texas, USA

Re: Issues with musical tones via PCM ($4011 writes)

Post by Bavi_H »

I focused on examining the twelve-tone equal temperament version.

I used Mesen's debugger to find the code and data tables you didn't provide. See the attached file "1. code-comments.txt".

Instead of using an audio tuner, I wrote a FCEUX lua script that counts the number of CPU cycles between the writes to $4011 that have different values. This is half of the pitch period in units of CPU cycles. Then I used that "half-period" amount to calculate what the pitch is in terms of the closest 12TET A4 = 440 Hz pitch plus or minus some amount of cents.
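
The conversion from a measured half-period to a pitch name can be sketched in a few lines of Python. This is not the attached Lua script, only a restatement of the arithmetic described above; the function name and the note-naming convention (A4 = MIDI note 69, sharps only) are assumptions for the example.

```python
import math

CPU_HZ = 1_789_773  # NTSC NES CPU clock in Hz
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def describe_half_period(half_period_cycles):
    """Return (note name, cents offset, frequency) for a half-period in CPU cycles."""
    freq = CPU_HZ / (2 * half_period_cycles)
    # Distance from A4 = 440 Hz in (fractional) 12TET semitones.
    semitones = 12 * math.log2(freq / 440.0)
    nearest = round(semitones)
    cents_off = 100 * (semitones - nearest)  # negative means flat
    midi = 69 + nearest                      # A4 is MIDI note 69
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents_off, freq


name, cents_off, freq = describe_half_period(2034)
print(f"{name} {cents_off:+.2f} cents ({freq:.3f} Hz)")
```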

(Note: I originally tried to write a lua script for Mesen 0.9.9, but Mesen's emu.getState() -- needed to get the number of CPU cycles -- causes a division by zero crash with your NES file.)

How to use the lua script:
  1. Open sound-test-12tet.nes in FCEUX 2.6.4.
  2. Drag and drop the attached file "2. calculate-pitches.lua" into the FCEUX window.
    The script will power cycle the NES emulation and start displaying output in the Lua Script window "Output Console" text box.
  3. Once the tones finish, you can right-click on the "Output Console" text and choose Select All, then right-click and choose Copy. You can paste the text into a text editor or spreadsheet to look at it further.
You can see the output I got in the attached file "3. output.txt".


Questions:

Can you tell me more about why you would write the same value to $4011 more than once? (Does it help various code fragments take the same amount of CPU cycles?) For a square wave, I think you only need to repeat the sequence of: write the high value once, wait half a period, write the low value once, wait half a period.

Can you share the Python code or otherwise explain how you chose the values for note_wr_hi, note_wr_lo, and note_cycles?

Can you provide a more "zoomed-out" overview of the cycle counts? I get the feeling there must be some larger segments of code with specific cycle counts that are important for choosing the values for note_wr_hi, note_wr_lo, and note_cycles, but I don't yet understand the code well enough to see those larger patterns.

Here are some things I noticed:
  • note_wr_hi is always zero.
  • (note_wr_lo + 1) × note_cycles equals the number of CPU cycles taken each half period. (See the attached file "4. output with note_wr and note_cycles.txt") But I suspect this might not remain true for every possible value for note_wr_lo and note_cycles, right?
Attachments
dmc-direct-tones-attachments.zip
(4.29 KiB) Downloaded 32 times
Bavi_H
Posts: 193
Joined: Sun Mar 03, 2013 1:52 am
Location: Texas, USA

Re: Issues with musical tones via PCM ($4011 writes)

Post by Bavi_H »

I just noticed if I change the note_wr_lo table in sound-test-12tet.nes by subtracting 1 from each value, most of the pitches are much closer now. Perhaps there is an off-by-one error in your Python script?
Attachments
5. modified output with note_wr and note_cycles.txt
(917 Bytes) Downloaded 38 times
Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

Bavi_H wrote: Wed Jan 18, 2023 1:30 pm I just noticed if I change the note_wr_lo table in sound-test-12tet.nes by subtracting 1 from each value, most of the pitches are much closer now. Perhaps there is an off-by-one error in your Python script?
I realized the same thing when I saw (note_wr_lo + 1) in your previous comment. It's because of the bcc in write_y_4011. This could well explain the flatness. I will test this change after work and also reply to some of your other points/questions.
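
A quick way to see why this makes the higher notes worse: if the loop performs w + 1 writes per half period instead of w, the frequency is scaled by w / (w + 1) regardless of a. The Python sketch below just evaluates that ratio in cents. The w values are made up for illustration and are not taken from the ROM's tables.

```python
import math


def cents_flat_from_extra_write(w):
    """Cents error when w + 1 writes happen per half period instead of w."""
    # The frequency is scaled by w / (w + 1); the cycle count a cancels out.
    return 1200 * math.log2(w / (w + 1))


# Illustrative values only: at a fixed a, w halves with each octave up.
for w in (64, 32, 16, 8, 4):
    print(f"w={w:3d}: {cents_flat_from_extra_write(w):8.1f} cents")
```

The error is always negative (flat) and grows as w shrinks, which would be consistent with the deviation worsening in each higher octave.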
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Issues with musical tones via PCM ($4011 writes)

Post by rainwarrior »

Aside from correcting the pitch tables, there are two different ways you could improve the tuning precision greatly:


1. Don't use an intermediate samplerate. Just use C / (F * 2) as the number of clocks to wait between writing a high or low value to $4011. This basically upgrades you to using a samplerate of C instead of the C / 110 (~16 kHz) that you've picked.

I have an example of this for Apple II here: apple2flat: sound_pulse.s

In the Apple II's case the speaker is toggled high/low with BIT $C030 but it could be replaced by LDA #0 / STA $4011 and LDA #$7F / STA $4011, and then just change the 16-cycle pre-adjustment to 18 cycles instead.

This method is appropriate for square or pulse waves. It is more precise at low frequencies than high ones.
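
A small Python sketch of the arithmetic for this first method, assuming the NTSC CPU clock and simple rounding to the nearest whole cycle (the function names are made up for the example):

```python
import math

CPU_HZ = 1_789_773  # NTSC NES CPU clock in Hz


def half_period_cycles(freq):
    """Clocks to wait between level changes: C / (F * 2), rounded to a whole cycle."""
    return round(CPU_HZ / (2 * freq))


def tuning_error_cents(freq):
    """Cents error introduced by rounding the half period to whole cycles."""
    n = half_period_cycles(freq)
    actual = CPU_HZ / (2 * n)
    return 1200 * math.log2(actual / freq)


for name, freq in (("C1", 32.703), ("A4", 440.0), ("B6", 1975.533)):
    print(f"{name}: {half_period_cycles(freq)} cycles, "
          f"{tuning_error_cents(freq):+.3f} cents")
```

Because the rounding error is at most half a cycle, it is a smaller fraction of a long (low-pitched) half period than of a short one.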


2. Keep the samplerate concept, but keep track of the waveform position with a 16-bit accumulator instead of dividing the CPU clock.

This can be used to play arbitrary waveforms at a regular samplerate. I made a demonstration of this here, which was able to simulate 2 pulse channels while playing back music on the other NES channels. There's another tool called superNSF that uses a similar technique to play MOD music on NES: supernsf source code

With an accumulator, the output frequency is a multiplier on the samplerate instead of a division of it. Every sample tick you add a fixed number to the accumulator, and the high byte of that accumulator will be a lookup to your waveform. Or, for a square wave maybe you just want to use the high bit, instead of a lookup table.

For a 16-bit accumulator, the value to add per tick would be: (65536 * F * S) / C

Where C is CPU clock rate, S is cycles per sample/tick, and F is the frequency.

This progresses through the waveform at a rate that will overflow the accumulator at the desired frequency. The precision is decoupled from the samplerate, and instead is based on the width of your accumulator. (You could use 24 bits for more precision, etc.). It is more precise at high frequencies than low ones.
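
The accumulator idea can be modelled in Python as follows. This is only a sketch of the arithmetic, not NES code: the function names are hypothetical, S = 110 cycles per tick is borrowed from the C / 110 figure above, and the high/low output values $73 and $0C are the ones used in the first post's code.

```python
CPU_HZ = 1_789_773  # NTSC NES CPU clock in Hz


def phase_increment(freq, cycles_per_sample, bits=16):
    """Value to add per tick: (2**bits * F * S) / C, rounded to an integer."""
    return round((1 << bits) * freq * cycles_per_sample / CPU_HZ)


def actual_freq(increment, cycles_per_sample, bits=16):
    """Frequency actually produced by a given integer increment."""
    return increment * CPU_HZ / (cycles_per_sample * (1 << bits))


def simulate_square(increment, ticks, bits=16, high=0x73, low=0x0C):
    """Model the per-tick update: add, wrap, output by the accumulator's top bit."""
    mask = (1 << bits) - 1
    top_bit = 1 << (bits - 1)
    acc = 0
    out = []
    for _ in range(ticks):
        acc = (acc + increment) & mask  # the overflow wraps around, as on the 6502
        out.append(high if acc & top_bit else low)
    return out


inc = phase_increment(440.0, 110)
print(inc, f"{actual_freq(inc, 110):.3f} Hz")
```

With this scheme the level changes land on tick boundaries, so individual half periods vary by one tick while the average frequency follows the increment.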
Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

The tones sound much better both to my ear and the tuner after fixing that off-by-one error. I have attached my Python scripts (fixed to adjust for the off-by-one value) and the updated complete assembly source for the 12-tone equal temperament ROM. The values in the note_wr_lo and note_cycles areas are the only differences from the Pythagorean tuning, and you can get those from the wri and cyc output values of the Python scripts. Much of the initialization code is from the wiki, as is the delay_a_27_clocks subroutine.
Bavi_H wrote: Wed Jan 18, 2023 1:00 pm Can you tell me more about why you would write the same value to $4011 more than once?
The following should answer all of your questions. I was under the impression that I had to write something at every time point in the sample rate, not just when the value changes. Like approximating the graph of a curve from a sequence of dots, which is how a lot of tutorials explain sample rate. Continuing that graph analogy, note_cycles is the x-distance (time/clock cycles in real life) between samples, and note_wr_hi/lo are the high and low bytes for how many times you need to put down a dot in half a period. It turns out that note_wr_hi is always zero, as you noticed, but I did not know that beforehand. It certainly simplifies things to just write once and wait. I hope that this is the "zoomed-out" explanation that you wanted.

The key thing that I do not understand might be what the hardware physically does in order to generate sound waves when you write values to $4011.
rainwarrior wrote: Wed Jan 18, 2023 3:38 pm Aside from correcting the pitch tables, there are two different ways you could improve the tuning precision greatly:
Thanks for the info! I'll experiment with those approaches since it looks like I have to redo the code anyway in order to get rid of unnecessary $4011 writes.
Attachments
sound-test-code.7z
Assembly source and Python scripts
(2.25 KiB) Downloaded 33 times
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Issues with musical tones via PCM ($4011 writes)

Post by rainwarrior »

Dacicus wrote: Wed Jan 18, 2023 7:50 pm The key thing that I do not understand might be what the hardware physically does in order to generate sound waves when you write values to $4011.
When you write a value to $4011, it just directly sets an output level for the sample sound channel. It will hold at that level until you write it again.

(...or if you use the automated DPCM sample playback, it will automatically move up and down according to the sample data.)

So, to produce a square wave you only need to write it when the level should change up or down. If you wanted to produce a more complicated waveform, then you might want evenly spaced writes to create a consistent samplerate.
Bavi_H
Posts: 193
Joined: Sun Mar 03, 2013 1:52 am
Location: Texas, USA

Re: Issues with musical tones via PCM ($4011 writes)

Post by Bavi_H »

Dacicus wrote: Wed Jan 18, 2023 7:50 pm note_cycles is the [time] between samples, and note_wr[] how many times you need to put down a dot in half a period.
Thanks for explaining. Based on your Python script and your explanation, I'm beginning to realize that (note_wr + 1) × note_cycles does indeed always equal the pitch half period in CPU cycles for every possible value of note_wr and note_cycles in the ranges you've chosen. (I was previously discouraged by various comments in the code about added cycles and subtracted cycles, and figured there must be some values that don't work out neatly.)
Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

rainwarrior wrote: Wed Jan 18, 2023 9:22 pm When you write a value to $4011, it just directly sets an output level for the sample sound channel. It will hold at that level until you write it again.
OK, I see. Thanks again.

For my project, I'll probably go with the first method that you described, since getting good low tones was what prompted this in the first place. This is the PCM playback code that I've put together as a replacement for that from my first post:

Code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; X = Note index
; duration_lo (zp)
; duration_hi (zp)
;
; duration_hi * 256 + duration_lo
;   = number of periods played - 1
; (i.e., how long to play the note)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

play_note:
    lda note_dur_lo,x     ; 4 cycles
    sta duration_lo       ; 3 cycles
    lda note_dur_hi,x     ; 4 cycles
    sta duration_hi       ; 3 cycles

; Write high value
  - ldy #$73              ; 2 cycles
    sty $4011             ; 4 cycles

; Wait half of a period = note_cyc_hi * 128 + note_cyc_lo cycles
    ldy note_cyc_hi,x     ; 4 cycles
    jsr wait_hi           ; 128 * Y + 17 cycles

    ; TODO (done): Delay note_cyc_lo,x - 62 cycles
    ; 4 + 17 + 4 + 2 + 2 + 27 + 2 + 4 = 62 cycles
    lda note_cyc_lo,x     ; 4 cycles
    sec                   ; 2 cycles
    sbc #62               ; 2 cycles
    jsr delay_a_27_clocks ; A + 27 cycles

; Write low value
    ldy #$0C              ; 2 cycles
    sty $4011             ; 4 cycles

; Determine if further writes are needed.
; This section takes 20 cycles when the branch is not taken.
    sec                   ; 2 cycles
    lda duration_lo       ; 3 cycles
    sbc #1                ; 2 cycles
    sta duration_lo       ; 3 cycles
    lda duration_hi       ; 3 cycles
    sbc #0                ; 2 cycles
    sta duration_hi       ; 3 cycles
    bcc +                 ; 3 cycles if taken, 2 cycles not

; Wait half of a period = note_cyc_hi * 128 + note_cyc_lo cycles
    ldy note_cyc_hi,x     ; 4 cycles
    jsr wait_hi           ; 128 * Y + 17 cycles

    ; TODO (done): Delay note_cyc_lo,x - 85 cycles
    ; 20 + 4 + 17 + 4 + 2 + 2 + 27 + 3 + 2 + 4 = 85 cycles
    lda note_cyc_lo,x     ; 4 cycles
    sec                   ; 2 cycles
    sbc #85               ; 2 cycles
    jsr delay_a_27_clocks ; A + 27 cycles

    jmp -                 ; 3 cycles
  + rts                   ; 6 cycles

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Waits 128 * Y + 17 cycles (including jsr/rts)
; n = 128 chosen so that (n - 1) + 85 <= 255
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

wait_hi:
    ; 27 + 2 = 29 -> 128 - 29 = 99
    lda #99               ; 2 cycles
  - jsr delay_a_27_clocks ; A + 27 cycles
    dey                   ; 2 cycles
    beq +                 ; 3 cycles if taken, 2 cycles not
    ; 27 + 2 + 2 + 2 + 3 = 36 -> 128 - 36 = 92
    lda #92               ; 2 cycles
    bne -                 ; 3 cycles if taken, 2 cycles not
  + rts                   ; 6 cycles
    ; After final delay_a_27_clocks: 2 + 3 + 6 = 11 cycles
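
For reference, one way the note_cyc_hi / note_cyc_lo tables for this routine could be generated is sketched below in Python. This is an assumption-laden illustration, not the script actually used: the function name is made up, and the constraints are read off the listing above (the low byte must be at least 85 so that the sbc #85 does not underflow, and Y must be at least 1 because wait_hi decrements before testing for zero).

```python
CPU_HZ = 1_789_773  # NTSC NES CPU clock in Hz


def split_half_period(freq):
    """Split a half period into (note_cyc_hi, note_cyc_lo).

    half period = note_cyc_hi * 128 + note_cyc_lo, with 85 <= note_cyc_lo <= 212.
    """
    half = round(CPU_HZ / (2 * freq))
    hi = (half - 85) // 128   # leave at least 85 cycles for the low part
    lo = half - 128 * hi      # falls in 85..212 by construction
    if not 1 <= hi <= 255:
        # hi = 0 would wrap around in wait_hi (dey before beq); hi > 255 needs more than a byte.
        raise ValueError(f"{freq} Hz is outside the range this routine can play")
    return hi, lo


hi, lo = split_half_period(440.0)
print(f"A4: note_cyc_hi={hi}, note_cyc_lo={lo}, total={hi * 128 + lo} cycles")
```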
za909
Posts: 249
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

Re: Issues with musical tones via PCM ($4011 writes)

Post by za909 »

Something you might want to try to find a workaround for is the issue with OAM DMA. If the program uses sprites, your code has to ask the CPU to copy a page of the CPU memory space to the special sprite RAM (OAM) once per frame. During this time the CPU is stalled for 513 or 514 cycles while all 256 bytes from the selected page are automatically read and written. Nothing can stop or interrupt the CPU while this is happening except for a DPCM byte fetch if it happens to occur during the transfer. So depending on how and when you intend to create these tones with $4011, you might need to work around the distortion caused by missing an update during those OAM cycles.
Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

za909 wrote: Sat Jan 28, 2023 5:08 pm Something you might want to try to find a workaround for is the issue with OAM DMA.
Yes, I did consider the OAM DMA issue. The program is going to be visually simple, so I believe that I can get away without using sprites during the part that involves $4011 writes. My current plan is to disable NMI generation at the start of that play_note subroutine via writing a 0 to bit 7 of $2000, then re-enable NMI at the end of the subroutine. Per the wiki, this also requires a read of $2002 prior to re-enabling NMI in case that occurs during VBlank.
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Issues with musical tones via PCM ($4011 writes)

Post by rainwarrior »

Yes, it's generally best not to attempt that.

Any way you do it will negatively affect the sound quality. In the best case, to avoid more distortion than necessary and to keep the pitch stable, you must "fast-forward" over a number of updates equal to the ones missed during the OAMDMA. In the accumulator version basically you just want to do something like add 6x the usual value to it, and then make sure the next sample comes right on time for the 7th one.

Even if it weren't a hassle, it's better not to mess with OAM if you don't have to. The sound quality will be better if you don't have to skip samples.

Though, the problem is mostly if you want to do it every frame, which ends up creating a 60 Hz buzz as it affects the playback. If you just want to do a sprite update now and then, an individual pause is just a little blip. If you can time them during a quiet or silent moment in the tune, they don't have to be noticeable.
Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

rainwarrior wrote: Sun Jan 29, 2023 12:05 am Yes, it's generally best not to attempt that.
Just to clarify, do you mean attempting the NMI on/off thing?
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Fiskbit »

Note that you can still write OAM manually during vblank at a substantial cost, allowing you to sprinkle $4011 writes throughout at the right times, and you can cut down on this cost by limiting how much of OAM you use and accounting for the glitches from updating $2003.
Dacicus
Posts: 32
Joined: Sat Dec 20, 2008 4:59 pm

Re: Issues with musical tones via PCM ($4011 writes)

Post by Dacicus »

OK. I did read about manually writing to OAM on the wiki. If I do everything without sprites, though, and turn NMI on/off, why does that affect the sound quality? I would be playing the entire pitch (currently thinking it would last about half of a second or a full second) before re-enabling NMI, so what would be the source of pitch distortion if the CPU does nothing except the play_note subroutine until the subroutine is complete?