Writing any value from $80 through $FF to any address from $8000 through $FFFF will reset the MMC1. This switches the PRG mode so that the last bank is fixed at $C000, and it clears the 5-bit shift register.
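For anyone who wants to see that reset behavior spelled out, here is a rough Python model of the MMC1 serial port. It is only a sketch: the class and method names are made up for illustration, it assumes the commonly documented behavior where a reset write ORs the control register with $0C, and it leaves out details such as the mapper ignoring writes on consecutive CPU cycles.

```python
class MMC1Model:
    """Toy model of the MMC1 serial port (hypothetical names, not real hardware code)."""

    def __init__(self):
        self.shift = 0       # bits collected so far, LSB first
        self.count = 0       # how many bits have been collected (0..4)
        self.control = 0x00  # deliberately not assumed to power on in a known state

    def write(self, addr, value):
        if not 0x8000 <= addr <= 0xFFFF:
            return  # not a mapper write
        if value & 0x80:
            # Reset: discard any partial serial write and force PRG mode 3
            # (last bank fixed at $C000). Assumed to be control |= $0C.
            self.shift = 0
            self.count = 0
            self.control |= 0x0C
            return
        # Normal write: shift in bit 0 of the value.
        self.shift |= (value & 1) << self.count
        self.count += 1
        if self.count == 5:
            # The address of the fifth write selects the target register.
            if (addr >> 13) & 3 == 0:
                self.control = self.shift
            self.shift = 0
            self.count = 0

    def prg_mode(self):
        return (self.control >> 2) & 3


mapper = MMC1Model()
mapper.write(0x8000, 0x01)   # two bits of an interrupted serial write
mapper.write(0x8000, 0x00)
mapper.write(0xFFF2, 0xFF)   # any value $80-$FF at any address $8000-$FFFF
print(mapper.count, mapper.prg_mode())
```

The point of the model is that the reset write works no matter how many bits were already shifted in, which is why a reset stub can use it without knowing what the program was doing when reset was pressed.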
It shouldn't be JMP [$FFFC]; it should be JMP to the real entry point of your program. Otherwise, you'll end up in an infinite loop executing MMC1GlitchBoot31 over and over again because $FFFC still points at MMC1GlitchBoot31.
It's also wise to have these vectors at the end of each 16 KiB bank of the program, just in case you get reset while using 32 KiB mode or fixed-$8000 mode, or just in case you end up running on an MMC1 revision with poorly defined initial PRG bank values. See my SGROM/SNROM template for how to get ca65 to do this. I haven't used NESASM, but I think you'll probably have to repeat that code in every odd-numbered half-bank (1, 3, 5, 7, ... 31).
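One way to check that a build actually has the vectors in every bank is to inspect the PRG ROM after assembling. The following Python sketch assumes you have the raw PRG data (no iNES header) as a bytes-like object; the function names are hypothetical, and the half-bank numbering assumes 8 KiB banks numbered from 0, as in the list above.

```python
BANK_SIZE = 0x4000        # 16 KiB
VECTORS_OFFSET = 0x3FFA   # NMI, reset, and IRQ vectors fill the last 6 bytes


def bank_vectors(prg):
    """Return the 6 vector bytes found at the end of each 16 KiB bank."""
    if len(prg) == 0 or len(prg) % BANK_SIZE != 0:
        raise ValueError("PRG ROM size must be a nonzero multiple of 16 KiB")
    return [
        bytes(prg[base + VECTORS_OFFSET:base + BANK_SIZE])
        for base in range(0, len(prg), BANK_SIZE)
    ]


def banks_missing_vectors(prg):
    """List the 16 KiB banks whose vectors differ from the last bank's."""
    vectors = bank_vectors(prg)
    reference = vectors[-1]
    return [i for i, v in enumerate(vectors) if v != reference]


def odd_half_banks(prg_size):
    """8 KiB bank numbers that make up the upper half of each 16 KiB bank."""
    return list(range(1, prg_size // 0x2000, 2))


print(odd_half_banks(256 * 1024))
```

Matching vectors are necessary but not sufficient: the stub they point to also has to exist at the same address in every bank, which this check does not verify.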
The DMC has three actions it can apply at the end of the waveform, selected by the top two bits of $4010: stop ($00), repeat ($40), or stop and IRQ ($80). In stop and IRQ mode, the DMC will fire an interrupt eight samples (one byte) before playback ends, giving you a few hundred CPU cycles to set up the next sample for gapless playback. It won't trigger any IRQs unless you CLI (set minimum interrupt priority to 0) and actually put the DMC in stop and IRQ mode, but I still wouldn't recommend handling NMI and IRQ from the same routine.
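To put a number on "a few hundred CPU cycles", here is a small Python sketch that works out how long one byte (eight samples) lasts at each DMC rate. The period table is the commonly documented NTSC one and is not from this post, so treat it as an assumption; the function names are made up for illustration.

```python
# CPU cycles per output sample for each $4010 rate index (NTSC, assumed table).
NTSC_DMC_PERIODS = [428, 380, 340, 320, 286, 254, 226, 214,
                    190, 160, 142, 128, 106, 84, 72, 54]

# End-of-waveform actions as written to the top bits of $4010.
END_ACTIONS = {"stop": 0x00, "repeat": 0x40, "irq": 0x80}


def cycles_per_byte(rate_index):
    """CPU cycles taken to play one byte (eight 1-bit samples)."""
    return 8 * NTSC_DMC_PERIODS[rate_index]


def dmc_control(rate_index, action):
    """Build the value to write to $4010 for a rate and end-of-waveform action."""
    if not 0 <= rate_index <= 15:
        raise ValueError("rate index must be 0..15")
    return END_ACTIONS[action] | rate_index


for rate in (0, 15):
    print(rate, cycles_per_byte(rate), hex(dmc_control(rate, "irq")))
```

Under that table the budget is tightest at the fastest rate, so that is the case to design the IRQ handler around.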