MMC5 Hacking Adventures

Ben Boldt
Posts: 1149
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: MMC5 Hacking Adventures

Post by Ben Boldt »

I had actually lost that original test but I was able to get it to do it again after a lot of horsing around. It is very finicky and very difficult to nail down any sort of consistent behavior. Bear in mind, this whole exercise may be an artifact of my setup being marginal or invalid.

My setup has a starting point where I write a random byte to a random address in the range $5120-5127. My CHR Address bits all change randomly, no visible patterns. This is shown graphically in my GUI, so I am just watching the bits change and not finding a pattern. CHR A19 and A18 are always 0 in this test and A17-A10 are random.

When I expand the range in this test to $5120-5128, I can now see patterns in the CHR bits. It will toggle between 2 values, then move on randomly, toggle between 2 new values, move on randomly, etc. Expanding to $5120-512B produces similar behavior. With the toggling between 2 values, I am probably witnessing 8x16 sprite behavior.

That was like that for a long time. Then suddenly it was A19 to A12 that were random and then A11 and A10 were coming from the PPU address bus. Not sure what I did to make that change but whatever it is got stuck doing it that way now.....

Earlier yet, when I added a read of $5208, it would keep updating the CHR address bus when writing randomly in the range $5120-512A, and it locked up when $512B was included in the range, just like before. I was very excited; I had definitely touched on what I was seeing before. I went back and forth at least 10 times doing various things (described in a moment) and it did it consistently. Not sure what I did, but NOW, any time that I have the $5208 read in there, it locks up no matter the range, even if I keep it to $5120-5127. In fact, if I change the read to some other address it still locks up, which points away from any connection to $5208 specifically. If I take away the read, it goes back to normal. I did save a copy of the code that depended on $512B being in the mix, and it too does not behave the same anymore. So apparently I have a moving target here. I power cycle the MMC5 inconsistently in this testing, but when I do, I remove power for a couple of seconds... Maybe I broke something, because it never does it now and I have been dorking around with it for a long time; it is getting pretty late actually.

When it was locking up with $512B, I noticed that if I had it unlocked with range only to $512A, it would dance only between 2 values with the read in there. One of the values was CHR A19,18=0,0, A17-A10=all 1s, the other value it toggled to was a random value that stayed the same. If I changed my PPU address bus, it didn't change this value, but I noticed that when I went below $0C00, it was always CHR A19,18=0,0, A17-A10=all 1s, no longer toggling. Changing the PPU address back above $0C00, toggled again, still to that same value that got latched before. All the while writing random data to $5120-512A. I tried lots of PPU addresses. Very consistent cutoff at $0C00.

Once locked by writing to $512B, it could not be unlocked by simply stopping the writes to $512B and continuing to write to $5120-512A. With $512B removed, if I removed the read, it would unlock, first toggling between the 2 old values for a moment and then proceeding like normal, and I could add the read back and it would stay unlocked. Adding $512B back locked it again. I did this for several cycles. Things changed for some reason and none of this is repeatable anymore.

I think I am a little tired of chasing these things that keep changing on me. This is using my old slow setup without the microcontroller where reads and writes worked really well. My guess is that it is a setup issue somehow, I probably have the wrong edges of M2 or something but I just can't explain how the behavior keeps changing on me if my incorrect setup is staying incorrect.

Edit:
The RAM chips have been shipped! At least some good news today.

In hindsight, I think I moved to whitebox style testing with this particular thing too early. I got lots of different things to happen, which are all advancements to the understanding of a black box, but with the whitebox mentality I had, I rushed and pushed through those things trying specifically to find my preconceived pattern, which didn't really show up as expected. I think that was my main error tonight.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: MMC5 Hacking Adventures

Post by lidnariq »

Ben Boldt wrote:That was like that for a long time. Then suddenly it was A19 to A12 that were random and then A11 and A10 were coming from the PPU address bus. Not sure what I did to make that change but whatever it is got stuck doing it that way now.....
That really really sounds like a spurious write to $5101

Post by Ben Boldt »

lidnariq wrote:
Ben Boldt wrote:That was like that for a long time. Then suddenly it was A19 to A12 that were random and then A11 and A10 were coming from the PPU address bus. Not sure what I did to make that change but whatever it is got stuck doing it that way now.....
That really really sounds like a spurious write to $5101
Yes, that makes perfect sense, a very good clue into my setup. I think you are really onto something with spurious writes because I do not currently have a solid understanding of how a write works. I had experimented at one point tonight expanding the range down to $5000, which could have done it once intentionally, but I also did power cycles and it persisted somehow or got written to spuriously again. There are too many variables in my setup right now I think.

One good thing is that the read in this situation is always locking it and it theoretically should not be doing that. I suspect my attempt to read screws up the sequence for the next write or something. It seems approachable to debug that.

When does a write get registered? Here is my understanding:
  • The CPU can only read with M2 high. I base this on my observation that with M2 Low, MMC5 does not drive the data bus.
  • To register a read, for example, to clear an interrupt flag, I believe that the CPU would drive the register's address with M2 low, set CPU R/W high, and the rising edge of M2 would count as the read that clears the interrupt flag.
  • To register a write, I am not sure. Either:
    A. With M2 low, the CPU would drive the register's address and data and then set CPU R/W low, and the rising edge of M2 would register the write (??)
    B. With M2 low and CPU R/W high, the CPU would drive the register's address and data and the falling edge of CPU R/W would register the write (??)
    C. With M2 high, the CPU would set CPU R/W low, then drive the register's address and data, and the falling edge of M2 would register the write (??)
I really do not know if it is A, B, or C. If you wouldn't mind letting me know, thanks.

Post by lidnariq »

With the 6502/2A03, φ2/M2 high means "data bus is active", regardless of whether it's reading or writing.

As to exactly how the MMC5 is reacting to M2, I can only guess. It could be transparent latches or triggered by some edge of M2. It's even conceivable that different functions inside the MMC5 could work differently. But I'd hunch most of the functions are transparent latches.

Regardless, in your emulator: you should always leave M2 low, change everything, and only then raise and then lower M2.
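
A minimal sketch of that sequencing, assuming a bit-banged setup where the host controls every cartridge pin directly. `MockBus`, `bus_write`, `bus_read` and the pin names are hypothetical, made up for illustration; the log just records the order in which the pins change.

```python
# Hypothetical model of a bit-banged cartridge bus. The class, the pin names
# and the helper functions are illustrative, not part of any real tool.
class MockBus:
    def __init__(self):
        self.m2 = 0          # M2 idles low
        self.rw = 1          # 1 = read, 0 = write
        self.addr = 0
        self.data = None     # None = host is not driving the data bus
        self.log = []        # every pin change, in order

    def set(self, **pins):
        for name, value in pins.items():
            setattr(self, name, value)
            self.log.append((name, value))


def bus_write(bus, addr, value):
    """One write cycle: set everything up with M2 low, then pulse M2."""
    assert bus.m2 == 0, "M2 must already be low before changing anything"
    bus.set(addr=addr, rw=0, data=value)  # change everything while M2 is low
    bus.set(m2=1)                         # only then raise M2 ...
    bus.set(m2=0)                         # ... and lower it again
    bus.set(rw=1, data=None)              # release the bus with M2 low


def bus_read(bus, addr):
    """One read cycle with the same M2 discipline."""
    assert bus.m2 == 0, "M2 must already be low before changing anything"
    bus.set(addr=addr, rw=1, data=None)   # host must not drive data on a read
    bus.set(m2=1)                         # the cartridge may drive data while M2 is high
    # a real setup would sample the data pins here, before M2 falls
    bus.set(m2=0)


bus = MockBus()
bus_write(bus, 0x5120, 0x3C)
bus_read(bus, 0x5209)
for change in bus.log:
    print(change)
```

The point of the sketch is only the ordering: address, R/W and data never change while M2 is high, so there is no window in which a half-updated address could look like a write to some other register.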

Post by Ben Boldt »

lidnariq wrote:Regardless, in your emulator: you should always leave M2 low, change everything, and only then raise and then lower M2.
Oh wow, okay, that is definitely not what I am doing, that explains a lot. I will give it a try.

Post by Ben Boldt »

Ben Boldt wrote:
lidnariq wrote:Regardless, in your emulator: you should always leave M2 low, change everything, and only then raise and then lower M2.
Oh wow, okay, that is definitely not what I am doing, that explains a lot. I will give it a try.
I am having nothing abnormal or inconsistent now that I am writing and reading in that way, thanks a lot. I will also update my fast setup to work that way, hopefully that will fix it too.

Post by Ben Boldt »

I have a question for you guys. Is it possible to get the 2A03's DMC to do its fetches anywhere in the range $8000-BFFF but to also keep it silenced somehow? In this case, the MMC5 DAC's read mode and $00 delimiter interrupt might make more sense.

Post by lidnariq »

No.

While you can start the DPCM DMA at $FFC0, wait for it to overflow from $FFFF to $8000, and then fetch up to another $FB1 bytes from the bottom of the address space, its DAC can't be disabled.
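
The $FB1 figure can be checked with a few lines of arithmetic. This is a sketch that assumes the usual 2A03 DMC register encoding ($4012 selects the start address in 64-byte steps from $C000, $4013 selects the length in 16-byte steps plus one); the function names are made up for illustration.

```python
# 2A03 DMC register encoding: $4012 selects the start address in 64-byte
# steps from $C000, and $4013 selects the length in 16-byte steps plus one.
def dmc_start(reg_4012):
    return 0xC000 + reg_4012 * 64


def dmc_length(reg_4013):
    return reg_4013 * 16 + 1


start = dmc_start(0xFF)              # highest possible start address
longest = dmc_length(0xFF)           # longest possible sample
before_wrap = 0x10000 - start        # bytes fetched from the start up to $FFFF
after_wrap = longest - before_wrap   # bytes fetched from $8000 upward

print(f"start=${start:04X} length=${longest:X}")
print(f"{before_wrap} bytes before the wrap, ${after_wrap:X} after it")
print(f"last fetch at ${0x8000 + after_wrap - 1:04X}")
```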

Post by Ben Boldt »

lidnariq wrote:No.

While you can start the DPCM DMA at $FFC0, wait for it to overflow from $FFFF to $8000, and then fetch up to another $FB1 bytes from the bottom of the address space, its DAC can't be disabled.
I wouldn't have thought so. Bummer... It is really hard to think of a reasonable way to use this DAC. Using the timer interrupt, trying to follow and update a 16-bit pointer, servicing other interrupts, it all seems like a deal breaker for any sort of reasonable sample rate without stopping everything else. The patent says it is "TWICE AS EFFICIENT" because you only have to read and not write. With our current understanding, the gain in efficiency is almost nothing compared to the overhead to get there via interrupt and keep track of and increment a 16-bit address.

The only timed DMA is that DMC, which is going to try playing the raw sample data and producing garbage sound off of it. The thought crossed my mind that the DMC would be 8x higher sample rate than the MMC5 DAC, and maybe a way to make the garbage ultrasonic, but it doesn't take more than a second to realize that won't work. White noise, frequency * 8 = white noise. The effort put into this by Nintendo, and patenting it, there ought to be a good way to use it.

I will spend some time trying to read the patent carefully tonight. I am sure that this has all been done and thought of before but I will give it a try anyway.

Edit:
I am really doing some reading between the lines in the patent here and letting my imagination run away, with extreme biases, which might be a bad thing. So bear with me.

In column 4, line 40, it says "On the other hand", suggesting that what follows is a separate aspect from what had been described previously. Definitely after this phrase, it begins describing DAC read mode and write mode as we already understand. But it seems that before this line, they could be describing some other aspect or mode of the DAC.

Focusing on column 4, lines 4-39, with the predisposition that this could describe a different function: in this section, they describe the audio data and its layout, in connection with Figure 2. This figure specifies that the audio data is stored in incrementing order, in the address range $8000-BFFF, and that each sound sample ends with $00.
Column 4 Line 28 wrote:If a first address (that is, a start address) of a desired quantized data train is designated, quantized data are continuously and sequentially read until the stop code is detected.
"continuously and sequentially read" is very odd to me, if you were to do that in read mode as we understand it, it would play with extremely high sample rate. The way I read it, the word "continuously" means that there literally is no delay between reads. Maybe it is just worded funny but it seems strange and important.

Furthermore, proceeding with the next sentence, column 4, line 31:
Column 4 Line 31 wrote:To this end, start address data for designating a start address of a certain quantized data train (X, Y, or the like) is stored in advance in a certain address in the program data storage area 14a.
This might have meant that your ROM program code keeps track of the start address of the sound data, but the next sentence:
Column 4 Line 35 wrote:The address corresponds to a timing when a desired sound corresponding to the quantized data train is to be generated.
That sounds like you store/write a start address into a certain/particular address in program space, and it will start doing whatever it does (i.e. loading data all at once "continuously and sequentially") to play the sound that corresponds to that start address.

Also mentioned here and there is a "temporary storing means" on address/data bus 2, i.e. bus 2 meaning inside the cartridge and not inside the Nintendo. It is not clear if this is a RAM area or if it is just 1 byte storage, maybe only referring to item 16 in figure 3. Not sure.


Coming at this whole thing from a different angle, if we were Nintendo and wanted to make something that could play the DAC very efficiently, how would we do that?

Here is an idea of my own design, restricted to not violating anything described in the MMC5 DAC patent.

Let's say that memory range $8000-BFFF is all nice and full of $00-delimited PCM sound data. Let's also say that there is a magic "DAC playback rate" and "DAC start address" register somewhere. The start address would be a 1-byte value, corresponding to 64-byte chunks of range $8000-BFFF, i.e. actual address = (byte * 64) + $8000. And the playback rate would correspond to an M2 clock cycle count at which to update the DAC. Once the start address is written, an interrupt occurs. MMC5 disables ROM and replaces the interrupt vector with the actual address corresponding to the DAC start address, so the CPU literally goes there to fetch an instruction. During the level and/or edges of the clock where the CPU grabs the instruction, the MMC5 disables RAM and ROM, and drives a NOP (i.e. $EA) onto the data bus. On the other part of the clock, the MMC5 enables the RAM or ROM and stores the actual PCM data into its own "temporary storage means" / internal RAM. The CPU continues doing NOPs continuously and sequentially, allowing the MMC5 to fetch all PCM data into its RAM extremely rapidly. When it encounters a $00, it simply changes its NOP instruction to an RTI instruction, and all is resumed, the MMC5 playing the PCM audio on its own time until it hits that $00 delimiter.

This is a made-up story, I am not asking anyone to believe this. Please think about it to make your mind go new places and let us know if any ideas happen.
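
For what it is worth, the address mapping in that thought experiment does cover the window exactly. A small sketch, with the loud caveat that the register is purely hypothetical and `dac_start_address` is a name invented here:

```python
# Purely hypothetical register from the thought experiment above;
# nothing here is known MMC5 behaviour.
CHUNK = 64
WINDOW_BASE = 0x8000


def dac_start_address(start_byte):
    """Map the imagined 1-byte start value to a CPU address."""
    if not 0 <= start_byte <= 0xFF:
        raise ValueError("start byte must fit in 8 bits")
    return WINDOW_BASE + start_byte * CHUNK


first = dac_start_address(0x00)
last_chunk = dac_start_address(0xFF)
print(f"first chunk at ${first:04X}")
print(f"last chunk at ${last_chunk:04X}, ending at ${last_chunk + CHUNK - 1:04X}")
```

256 chunks of 64 bytes is $4000 bytes, so one byte is just enough to address every chunk in $8000-BFFF.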

Post by Ben Boldt »

Enough pretending, I am trying to figure out the most efficient way to use the DAC in read mode now. Here is the best I have come up with:

Set IRQ vector to address $010C.

RAM locations, initialized to:
$FC = Hardware timer reload value (setting to $00 stops the timer, non-zero sets the DAC sample rate)
$FD,FE,FF = LDA $8000 ; Cause DAC update by reading
$100,101 = INC $FE ; Increment the low byte of the LDA instruction at $FD.
$102,103 = BNE $02 ; Skip next instruction if previous increment did not roll over.
$104,105 = INC $FF ; Increment the high byte of the LDA instruction at $FD.
$106,107 = LDA $FC
$108,109,10A = STA $5209 ; Restart or stop Hardware Timer based on $FC value
$10B = RTI ; Exit immediately. Other pending interrupts will presumably cause reentry and be handled per $111.
IRQ Entry:
$10C,10D,10E = LDA $5209
$10F,110 = BNE $EC ; i.e. goto $FD if hardware timer interrupt flag is set.
$111,112,113 = JMP $xxxx ; i.e. Go to normal IRQ handler
The normal IRQ handler will check for the DAC IRQ and handle it by setting $FC = 0.

When you want a DAC sound to play, set DAC to read mode, set address $FE,FF to the starting address of the sound data, and set $FC as the period corresponding to the sample rate of your DAC data, and then write $01 to $5209 to trigger the first hardware timer interrupt.
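
The two branch operands in the listing above are relative offsets, so they are easy to get wrong by hand. A quick sketch to check where they land; `branch_target` is a helper written for this post, and it relies only on the standard 6502 rule that the signed 8-bit operand is added to the address of the instruction after the branch.

```python
def branch_target(opcode_addr, operand):
    """Target of a 6502 relative branch whose opcode sits at opcode_addr."""
    offset = operand - 0x100 if operand & 0x80 else operand  # signed 8-bit
    return (opcode_addr + 2 + offset) & 0xFFFF               # relative to the next instruction


skip_inc = branch_target(0x102, 0x02)    # BNE $02 at $102
to_reader = branch_target(0x10F, 0xEC)   # BNE $EC at $10F
print(f"BNE $02 -> ${skip_inc:04X}")
print(f"BNE $EC -> ${to_reader:04X}")
```

Both operands check out: the first skips the INC $FF, the second jumps back to the read instruction at $FD. Note that the second branch starts in page 1 and lands in page 0.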

I calculate that the above method would take the following number of cycles to run (please correct me if there are errors):
LDA $8000 -> 4
INC $FE -> 5 (zero-page inc)
BNE $02 -> 2
INC $FF -> 5/256 (zero-page inc, only 1 out of 256 times)
LDA $FC -> 3
STA $5209 -> 4
RTI -> 6
IRQ Entry: -> 7
LDA $5209 -> 4
BNE $EC -> 2

Total = 37.02 cycles.
The NES CPU is 1,790,000 cycles/second
100% CPU usage would be 1,790,000/37.02 = 48352 DAC samples / second.

Now backwards to find %CPU usage at 11.025 kHz DAC sample rate:
1,790,000/11,025 = 162.36 CPU cycles occur each DAC update
37.02 / 162.36 = 22.8%! And therefore 22.050kHz audio = only 45.6% CPU

It seems 8-bit, 11 or 22 kHz uncompressed audio is reasonably possible??
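
The budget above can be re-run for other sample rates with a short script. This is a sketch using the per-instruction cycle counts exactly as listed in this post; `cpu_share` and the variable names are made up here. The last lines apply one adjustment that is an assumption about how the routine runs rather than something stated above: a taken 6502 branch costs 3 cycles (4 if it crosses a page), and both branches in the handler are normally taken.

```python
CPU_HZ = 1_790_000  # rounded CPU clock, as used in the post

# (instruction, cycles) using the per-instruction figures from the post
handler = [
    ("LDA $8000", 4),
    ("INC $FE", 5),
    ("BNE $02", 2),
    ("INC $FF", 5 / 256),   # only runs on 1 of every 256 samples
    ("LDA $FC", 3),
    ("STA $5209", 4),
    ("RTI", 6),
    ("IRQ entry", 7),
    ("LDA $5209", 4),
    ("BNE $EC", 2),
]


def cpu_share(cycles_per_sample, sample_rate, cpu_hz=CPU_HZ):
    """Fraction of CPU time spent in the handler at a given sample rate."""
    return cycles_per_sample * sample_rate / cpu_hz


total = sum(cycles for _, cycles in handler)
print(f"cycles per sample: {total:.2f}")
print(f"max sample rate:   {CPU_HZ / total:.0f} Hz")
for rate in (11_025, 22_050):
    print(f"{rate} Hz -> {cpu_share(total, rate):.1%} CPU")

# Assumption: BNE $02 is taken 255 times out of 256 (+1 cycle each time),
# and BNE $EC is always taken and crosses from page 1 to page 0 (+2 cycles).
adjusted = total + 255 / 256 + 2
print(f"with taken-branch costs: {adjusted:.2f} cycles per sample")
```

If the taken-branch adjustment applies, the handler is about 3 cycles longer per sample than the 37.02 figure, which nudges the CPU percentages up slightly without changing the overall picture.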

Post by lidnariq »

Note that you're trashing A there (only P is saved by the hardware), so that's not usable with anything in the main thread.

As far as I can tell, the "set DAC on contents of read from $8000-$BFFF" is only useful for skipping an extra stX $5011; nothing better.

Post by Ben Boldt »

lidnariq wrote:Note that you're trashing A there (only P is saved by the hardware), so that's not usable with anything in the main thread.
Embarrassed to say so but I did not know that. That is nice though, I guess -- it lets you pick and choose what you want to push and pull. I see that the return address and P are saved by hardware when an interrupt occurs, and just the return address when you do a JSR, thus the difference between RTS and RTI. Is there a good throw-away read method, or am I going to have to add a push and a pull?
lidnariq wrote:As far as I can tell, the "set DAC on contents of read from $8000-$BFFF" is only useful for skipping an extra stX $5011; nothing better.
It does seem per the patent at face value that the STX would be the only savings. But on a resource constrained system like this, I think it makes an actual difference. A 4-cycle savings at 22 kHz represents 22,050 * 4 = 88,200 cycles per second saved. 88,200/1,790,000 = 4.93% CPU capacity recovered. It is definitely not a 50% savings, or any savings of program space or "a programmer's time to write the code" as they seem to suggest in the patent. That troubles me big-time. Probably written by some self-righteous hardware guy that thought he made the world better. :mrgreen: And did. :wink:
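
The same estimate for a couple of sample rates, as a sketch. It assumes the saving is exactly one 4-cycle absolute store per sample and uses the rounded CPU clock from earlier in the thread; `recovered_share` is a name invented here.

```python
CPU_HZ = 1_790_000   # rounded CPU clock used throughout the thread
SAVED_CYCLES = 4     # one absolute store such as STX $5011


def recovered_share(sample_rate, saved=SAVED_CYCLES, cpu_hz=CPU_HZ):
    """Fraction of CPU time recovered by skipping the store on every sample."""
    return sample_rate * saved / cpu_hz


for rate in (11_025, 22_050):
    print(f"{rate} Hz: {rate * SAVED_CYCLES} cycles/s, "
          f"{recovered_share(rate):.2%} of the CPU")
```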

There is the definite possibility that this feature really isn't as great as they say, and there is nothing more to it than what we already understand. We have to drive past that and keep thinking in different ways if we are going to have a chance to find something. We can only find things if we look at new angles and/or in new places, telling stories, Ouija boards, whatever it takes.

I think that this code, if fixed and modified, makes it possible to use the DAC, and I think that's a step in the right direction.

Another question for you -- How does the IRQ interact with V-blank? As I understand, V-blank and IRQ are 2 separate interrupts, each with their own vector. Can an IRQ interrupt happen from within a V-blank interrupt?

Edit:
How about this:
$FC = Hardware timer reload value (setting to $00 stops the timer, non-zero sets the DAC sample rate)
$FD,FE,FF = INC $8000 ; Cause DAC update by reading, non-destructive read
$100,101 = INC $FE ; Increment the low byte of the INC instruction at $FD.
$102,103 = BNE $02 ; Skip next instruction if previous increment did not roll over.
$104,105 = INC $FF ; Increment the high byte of the INC instruction at $FD.
$106,107 = LDA $FC
$108,109,10A = STA $5209 ; Restart or stop Hardware Timer based on $FC value
$10B = RTI ; Exit immediately. Other pending interrupts will presumably cause reentry and be handled per $111.
IRQ Entry:
$10C,10D,10E = LDA $5209
$10F,110 = BNE $EC ; i.e. goto $FD if hardware timer interrupt flag is set.
$111,112,113 = JMP $xxxx ; i.e. Go to normal IRQ handler

INC $8000 -> 6 (+2 from before)
INC $FE -> 5 (zero-page inc)
BNE $02 -> 2
INC $FF -> 5/256 (zero-page inc, only 1 out of 256 times)
LDA $FC -> 3
STA $5209 -> 4
RTI -> 6
IRQ Entry: -> 7
LDA $5209 -> 4
BNE $EC -> 2

Total = 39.02 cycles.

11.025 kHz:
39.02 / 162.36 = 24.03%
22.050kHz:
*2 = 48.07% CPU

Edit 2:

Ooops I still used A. That won't work.
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Re: MMC5 Hacking Adventures

Post by thefox »

Ben Boldt wrote:Another question for you -- How does the IRQ interact with V-blank? As I understand, V-blank and IRQ are 2 separate interrupts, each with their own vector. Can an IRQ interrupt happen from within a V-blank interrupt?
A normal IRQ can happen within V-blank as long as the interrupt disable flag in the processor status register is cleared (it's set to 1 automatically when the interrupt handler is entered, but you can clear it in code with CLI to allow overlapping interrupts). V-blank uses the NMI (non-maskable interrupt) so it will trigger regardless of the state of the interrupt disable flag.
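
A toy model of those rules, in case it helps to see them side by side. This is a sketch, not a CPU emulator: the `Cpu` class and the return addresses are made up, and it only tracks the interrupt disable flag and what the hardware pushes.

```python
class Cpu:
    """Tiny model of 6502 interrupt entry; not a full CPU."""

    def __init__(self):
        self.i_flag = False   # interrupt disable bit of P
        self.stack = []       # what the hardware pushes

    def interrupt(self, kind, return_addr):
        """Return True if the interrupt is taken."""
        if kind == "IRQ" and self.i_flag:
            return False                   # masked: IRQ waits while I is set
        # Hardware pushes the return address and P; A, X and Y are not saved.
        self.stack.append(("PC", return_addr))
        self.stack.append(("P", self.i_flag))
        self.i_flag = True                 # set on entry to either handler
        return True

    def cli(self):
        self.i_flag = False

    def rti(self):
        _, self.i_flag = self.stack.pop()  # restore P, and with it the I flag
        _, addr = self.stack.pop()
        return addr


cpu = Cpu()
in_nmi = cpu.interrupt("NMI", 0xC123)    # V-blank NMI: always taken
blocked = cpu.interrupt("IRQ", 0xC200)   # I was set on NMI entry, so this waits
cpu.cli()                                # handler opts in to nesting
nested = cpu.interrupt("IRQ", 0xC200)    # now the IRQ interrupts the NMI handler
print(in_nmi, blocked, nested)
```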

Post by lidnariq »

Ben Boldt wrote:Is there a good throw-away read method, or am I going to have to add a push and a pull?
Because self-modifying code, sure?
Say you have

Code:

a: BIT $8000
INC a+1
BNE done
INC a+2
done:
Ben Boldt wrote:How does the IRQ interact with V-blank?
NMI is reentrant. Any time /NMI line falls, the CPU will load the NMI vector, push the return address onto the stack, push the flags, and disable interrupts.

IRQ isn't. If interrupts are enabled, when the /IRQ line is low, the CPU will load the IRQ vector, push the return address onto the stack, push the flags, and still disable interrupts. As thefox points out, either interrupt handler can cooperate and re-enable interrupts. Won't help in case of OAM DMA, though.
Ben Boldt wrote:$106,107 = LDA $FC
$108,109,10A = STA $5209 ; Restart or stop Hardware Timer based on $FC value
Still trashing A here. I think the only way you could detect completion without using A/X/Y is by detecting underflow from $8000 to $7FFF:

Code:

a: BIT $8000
INC a+1
BNE cont
DEC a+2
BMI cont
RTI
cont: DEC $5209
LSR $5209
RTI
... and restarting the cycle-timed IRQ is even worse.
Ugly. At 12cy here, and 14cy for PHA / LDA zp / STA $5209 / PLA, it's not worth it unless you specifically want 6.4kHz (just DEC) or 12kHz (DEC and LSR).

You'd also have to structure the PCM as blocks of 256 played forward, blocks backwards. Assuming 6.4kHz and the BIT instruction's operand is in zero page, typical overhead: 18+14cy, worst case 26+14cy. (14cy being unavoidable IRQ+RTI overhead)
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: MMC5 Hacking Adventures

Post by tepples »

Might it have been meant for a looping 256-entry wavetable in ROM, where the main program uses the IRQ's rate to control the playback frequency?
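
A rough sketch of what that would imply for timing, assuming one table entry is read per IRQ so that one pass through the 256 entries is one period of the tone. The function names are invented here, and the 37.02-cycle handler cost is just the estimate from earlier in the thread.

```python
CPU_HZ = 1_790_000        # rounded CPU clock used earlier in the thread
TABLE_ENTRIES = 256       # one pass through the table is one waveform period


def irq_period_cycles(tone_hz, cpu_hz=CPU_HZ, entries=TABLE_ENTRIES):
    """CPU cycles between IRQs needed to play the table at tone_hz."""
    return cpu_hz / (tone_hz * entries)


def highest_tone_hz(handler_cycles, cpu_hz=CPU_HZ, entries=TABLE_ENTRIES):
    """Tone at which the handler alone would use 100% of the CPU."""
    return cpu_hz / (handler_cycles * entries)


print(f"{irq_period_cycles(110):.1f} cycles between IRQs at 110 Hz")
print(f"{highest_tone_hz(37.02):.0f} Hz ceiling with a 37.02-cycle handler")
```

Under those assumptions the IRQ handler cost caps the pitch fairly low, so a scheme like this would probably need to step through the table more than one entry at a time for higher notes.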