INL-ROM custom MMC3 hybrid mapper design
- infiniteneslives
- Posts: 2102
- Joined: Mon Apr 04, 2011 11:49 am
- Location: WhereverIparkIt, USA
- Contact:
Re: INL-ROM custom MMC3 hybrid mapper design
lidnariq wrote: "infiniteneslives wrote: 'If we were to use something like this for the bi-annual homebrew compo you could just sell the new SPI chip and users could swap it out with their cart.' Which is a pretty good argument for considering using (Micro)SD, actually. More expensive, but reprogrammable by almost everyone."
True, although it would be significantly more difficult to design and implement, since it effectively requires an MCU. For the SPI flash I 'merely' need to toss a shift register with proper controls into my CPLD. If I'm going through the work of adding an MCU, I'd rather give it USB connectivity to reprogram the SPI flash, especially since I've already got most of that work done with the NESDEV1; I just need to swap to SPI instead of parallel. A USB socket is also cheaper than a microSD socket and card. Having USB would have the added benefit of making game development less cumbersome, and not necessarily slower if the whole ROM didn't need to be programmed. Plus, if you only wanted to publish a game with this setup, you wouldn't have to include the added cost of the MCU, socket, and flash card.
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
- infiniteneslives
Re: INL-ROM custom MMC3 hybrid mapper design
Okay, so going with the assumption that this is a go, I'd like to come up with an idea of how the data exchange might work so I can come up with the hardware design. Sorry for the HUGE post; I was basically bored this evening and decided to type this up while I thought through everything.
So we've got an issue of where ROM/RAM/mapper registers are all located. I figure the best way is to keep the MMC3 registers untouched, which would require all loading/writes to PRG memory to occur only while that memory is mapped to $6000-7FFF. So you'd map the 8KB bank there via some variant of MMC3 PRG bank control, then prepare to load it up with your game data.
So the SPI byte can't very easily be mapped to $6000-FFFF if we want to keep the data exchange simple. To avoid requiring too many more address inputs, we could place it at $5000-5FFF with the addition of PRG A12 as an input. The most recently read byte would sit there until the mapper was 'signaled' to read the next byte.
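Why A12 is the extra input: the stock MMC3 only sees A13, A14, A0 and /ROMSEL, which can't tell $5000-5FFF apart from $4000-4FFF, where the APU and I/O registers live. Below is a minimal Python sketch of the decode, assuming the proposed layout ($5xx even = SPI data, $5xx odd = SPI control, picked by A0). The function name and region labels are hypothetical, and A15 is tested directly for simplicity even though a real cart only sees it through /ROMSEL.

```python
def decode(addr: int):
    """Classify a CPU address for the proposed hybrid mapper (hypothetical sketch).

    A15 is tested directly here; on a real cart only /ROMSEL carries that information.
    """
    a15, a14, a13, a12 = ((addr >> bit) & 1 for bit in (15, 14, 13, 12))
    if a15:
        return "PRG_ROM_OR_MMC3"        # $8000-$FFFF: ROM reads, MMC3 register writes
    if a14 and a13:
        return "PRG_RAM"                # $6000-$7FFF
    if a14 and not a13 and a12:         # $5000-$5FFF: A12 rejects $4000-$4FFF
        return "SPI_CTRL" if addr & 1 else "SPI_DATA"   # A0 picks odd/even
    return None                         # internal RAM, PPU, APU/IO: not the mapper's


for a in (0x4016, 0x5000, 0x5001, 0x6000, 0x8000):
    print(f"${a:04X} -> {decode(a)}")
```

Every even address in $5000-5FFF aliases the data register and every odd one aliases the control register, just like the MMC3's own even/odd register pairs.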
Now to figure out how to handle the commands and addresses sent to the SPI flash via the mapper. For anyone interested in the details, this is the data sheet I'm looking at for the Winbond 2MB SPI flash.
So I was trying to figure out some way to make writes to the SPI flash non-serial, so that writing the 8-bit instruction and 24-bit address wouldn't be so slow. But after some looking into things, I don't think it's worth spending the logic to keep writes from being serial. I figure we start with the bare-bones essentials; then, if additional things are needed or desired, we'll consider adding them, weighing the trade-off between CPU time and mapper logic. Additionally, this keeps things independent of what type of SPI memory you're using. The mapper doesn't care how big it is, how large the pages are, whether it's EEPROM or flash, etc. So even if someone were interested in this for something small like save data alone, the mapper doesn't care. Emulator authors, you're on your own I guess... The good news is that there are data sheets for this stuff and the commands are pretty universal.
For anyone unaware of, or not interested in reading, the data sheet: SPI flash is pretty simple, so I'll spell out the basics. You write an 8-bit command followed by the address, if applicable. For reads you just keep clocking the chip and it spits out data bit by bit, byte after byte, on each clock until you disable it by taking /CS high. Similarly, for writes you just keep writing the data you'd like to save, assuming you set things up properly and erased the page in flash beforehand. Once you're done with the long stream of reading/writing, you take /CS high to finish the process. To start another access, you take /CS low and repeat the process with the next command, address, data, etc. Trust me though: if you want to write anything to the chip from the NES, you'll have to look through the data sheet. If you're just reading data, the discussion below is probably enough.
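To make the read sequence concrete, here is a toy Python model of a 25-series SPI flash that understands only the READ command (03h). While /CS is low, 8 command bits and 24 address bits go in MSB first; after that, every further clock shifts out one data bit, MSB first, and the address auto-increments from byte to byte. The class and function names are made up for illustration. Real chips add clock-edge timing, status registers and many more commands that this sketch ignores.

```python
class ToySpiFlash:
    """READ-only (03h) SPI flash model; clock edges and timing are abstracted away."""

    READ = 0x03

    def __init__(self, contents: bytes):
        self.mem = bytes(contents)
        self.selected = False
        self._new_transaction()

    def _new_transaction(self):
        self.header = 0          # command + address bits received so far
        self.header_bits = 0
        self.addr = None         # set once all 32 header bits have arrived
        self.bit_index = 0       # next data bit to send, 0 = MSB

    def select(self):            # /CS low starts a new transaction
        self.selected = True
        self._new_transaction()

    def deselect(self):          # /CS high ends the stream
        self.selected = False

    def clock(self, di: int) -> int:
        """One SCK pulse: sample DI, return the bit driven on DO."""
        if not self.selected:
            raise RuntimeError("clocked while /CS is high")
        if self.addr is None:
            self.header = (self.header << 1) | (di & 1)
            self.header_bits += 1
            if self.header_bits == 32:
                if self.header >> 24 != self.READ:
                    raise ValueError("only READ (03h) is modeled")
                self.addr = self.header & 0xFFFFFF
            return 0
        byte = self.mem[self.addr % len(self.mem)]
        out = (byte >> (7 - self.bit_index)) & 1
        self.bit_index += 1
        if self.bit_index == 8:  # byte finished: move on to the next address
            self.bit_index = 0
            self.addr += 1
        return out


def spi_read(flash: ToySpiFlash, address: int, count: int) -> bytes:
    flash.select()
    for byte in (0x03, (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF):
        for bit in range(7, -1, -1):          # command and address go out MSB first
            flash.clock((byte >> bit) & 1)
    data = bytearray()
    for _ in range(count):                    # keep clocking: one byte per 8 clocks
        value = 0
        for _ in range(8):
            value = (value << 1) | flash.clock(0)
        data.append(value)
    flash.deselect()
    return bytes(data)


flash = ToySpiFlash(bytes(range(256)))
print(spi_read(flash, 0x10, 4).hex())   # 10111213
```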
I figure the best way to signal the mapper to read the next byte is to write to a control register. Conveniently, we've also got PRG A0 as an input, so I figure we'll have two 'SPI registers' at $5000 and $5001 (more specifically $5xx EVEN and $5xx ODD, in normal MMC3 style). Here are the definitions I'm thinking of:
-----------------------
$5000 "SPI WRITE": All writes to this register are fed directly to the SPI flash. This is where you write commands and data to the SPI flash. Only PRG D7 is seen by the SPI flash. This is where you'll have to give the read command followed by the address before data can be pumped out by the mapper. You'll also have to write save data here serially, bit by bit (like controllers, but writes). Don't forget you'll have to supply the write command followed by the SPI address you want to save data to. This is ALSO where you read the full bytes that the mapper pumps out for you.
-----------------------
$5001 "SPI READ/Mapper command": We need this register to enable and disable the SPI flash by controlling the /CS pin on the chip. I figure we'll just use D7: writing any value with D7=0 enables the SPI flash, and writing one with D7=1 disables it. Additionally, this is the register used to command the mapper to fetch the next byte from the SPI flash so you can read it out as one full byte. So for now we'll say writing any value with D7=0 commands the mapper to fetch the next byte, and writing D7=1 disables the flash and stops the read data stream.
So I basically need 9 CPU clock cycles to fetch each byte: one cycle per bit and one more to reset my control circuitry and set things up to pump out the next byte. I *COULD* cut this down to 8 cycles and ABSOLUTELY REQUIRE 8 cycles, no more, no less. Basically I'd only clock in 7 bits; the 8th bit wouldn't be clocked into the shift register, it'd just be placed on PRG D0 for the required READ on the 8th cycle. I don't think this is very user-friendly, though, and it could easily cause data to be read improperly. In my loop I insert a NOP, so the STA abs,X (5 cycles), NOP (2 cycles) and LDA (4 cycles) between the fetch command and the next read add up to 11 cycles.
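A quick way to sanity-check the 9-cycle rule against a loop is to add up the 6502 cycle counts between the fetch command (the STY $5001 write) and the next LDA $5000 read. The Python sketch below assumes, as is standard for absolute-mode instructions, that the bus access lands on the instruction's last cycle. It also treats the 9-cycle requirement as a minimum distance between those two bus cycles, which is an assumption about how the CPLD counts. It also assumes each pass of the unrolled 8KB copy loop ends with BEQ/JMP rather than BNE, because a ~320-byte unrolled body (32 blocks of 10 bytes) is beyond a branch's -128-byte reach.

```python
# NMOS 6502 cycle counts for the instructions in the copy loop.
CYCLES = {
    "LDA abs": 4,
    "STY abs": 4,
    "STA abs,X": 5,       # indexed stores always take 5 cycles
    "NOP": 2,
    "LDX #imm": 2,
    "INX": 2,
    "BEQ not taken": 2,
    "BEQ taken": 3,       # assumes no page crossing (+1 otherwise)
    "JMP abs": 3,
}
REQUIRED_GAP = 9          # 8 shift clocks + 1 to reset the pump, per the post


def gap(between):
    """CPU cycles from the STY $5001 write to the next LDA $5000 read.

    The write lands on STY's last cycle and the read on LDA's last cycle,
    so the gap is everything in between plus all four cycles of the LDA.
    """
    return sum(CYCLES[name] for name in between) + CYCLES["LDA abs"]


entry_gap = gap(["LDX #imm"] + ["NOP"] * 4)                   # before the loop
body_gap = gap(["STA abs,X", "NOP"])                          # 31 blocks per pass
last_gap = gap(["STA abs,X", "INX", "BEQ not taken", "JMP abs"])
assert min(entry_gap, body_gap, last_gap) >= REQUIRED_GAP

block = CYCLES["LDA abs"] + CYCLES["STY abs"] + CYCLES["STA abs,X"] + CYCLES["NOP"]
last_block = (CYCLES["LDA abs"] + CYCLES["STY abs"] + CYCLES["STA abs,X"]
              + CYCLES["INX"] + CYCLES["BEQ not taken"] + CYCLES["JMP abs"])
total = 256 * (31 * block + last_block)
# On the final pass BEQ is taken and the JMP never runs.
total += CYCLES["BEQ taken"] - CYCLES["BEQ not taken"] - CYCLES["JMP abs"]

print(entry_gap, body_gap, last_gap, block, total, round(total / 29_780, 2))
# 14 11 16 15 124158 4.17
```

So every fetch-to-read distance clears 9 cycles with margin, and a full 8KB copy costs roughly 124K cycles, a little over 4 NTSC frames at ~29,780 CPU cycles per frame.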
TL;DR:
Here would be the sequence of operations to read data from the SPI flash:
1) Write to $5001 with D7=0 to enable the SPI flash. (While this does reset my pump/shift register control circuitry, this initial command WILL NOT count as the command to fetch the FIRST byte.)
2) Write serially to the SPI flash via $5000, using PRG D7. Data is written MSB first, so you'll have to write 03h for the read command followed by the 24-bit address.
3) Now write to $5001 with PRG D7=0 to give the 'read full byte' command to the mapper.
4) Wait 9 clock cycles. You can do anything during this time except read/write $5xxx. In a loop, this is where I store the data from the previous read.
5) Read the first/next byte from $5000.
6) Read the next byte by looping back to step 3.
7) When DONE reading, write to $5001 with PRG D7=1.
You can save yourself some CPU time with step 7. If you know the next stream of data you'd like to read is sequential from your current read, you can just let the mapper and flash sit there idle. You could then come back 5 minutes later and read the next byte in the stream. Maybe the best approach is to just leave it enabled after the read; then, before you start your next read/write cycle, you decide whether you need to disable, re-enable, and issue another command.
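The seven steps, plus the "leave it enabled" trick, can be exercised against a behavioral model. This Python sketch rests on stated assumptions: the class names are hypothetical, and the 9-cycle pump delay isn't modeled (the fetch completes instantly). It also assumes a D7=0 write to $5001 means "enable" when /CS is high and "fetch the next byte" when /CS is already low, which is one way to read the note in step 1.

```python
class ToyFlash:
    """READ (03h) only: 32 header bits in, then data bits out MSB first."""

    def __init__(self, data: bytes):
        self.data = data
        self.set_cs(low=False)

    def set_cs(self, low: bool):
        self.selected = low
        self.header, self.header_bits, self.addr, self.bit = 0, 0, None, 0

    def clock(self, di: int) -> int:
        if self.addr is None:                         # still receiving command + address
            self.header = (self.header << 1) | di
            self.header_bits += 1
            if self.header_bits == 32:
                assert self.header >> 24 == 0x03, "only READ is modeled"
                self.addr = self.header & 0xFFFFFF
            return 0
        out = (self.data[self.addr % len(self.data)] >> (7 - self.bit)) & 1
        self.bit += 1
        if self.bit == 8:
            self.bit, self.addr = 0, self.addr + 1
        return out


class HybridMapperModel:
    """Hypothetical model of the proposed $5000/$5001 SPI registers."""

    def __init__(self, flash: ToyFlash):
        self.flash, self.cs_low, self.latch = flash, False, 0

    def write(self, addr: int, value: int):
        d7 = (value >> 7) & 1
        if addr & 1 == 0:                  # $5000: D7 goes straight to the flash, one clock
            if self.cs_low:
                self.flash.clock(d7)
        elif d7:                           # $5001, D7=1: /CS high, stream ends
            self.cs_low = False
            self.flash.set_cs(low=False)
        elif not self.cs_low:              # $5001, D7=0 while disabled: enable only
            self.cs_low = True
            self.flash.set_cs(low=True)
        else:                              # $5001, D7=0 while enabled: pump one byte
            self.latch = 0
            for _ in range(8):
                self.latch = (self.latch << 1) | self.flash.clock(0)

    def read(self, addr: int) -> int:
        assert addr & 1 == 0, "only $5000 is readable in this sketch"
        return self.latch


def start_read(m: HybridMapperModel, address: int):
    m.write(0x5001, 0x00)                                   # step 1: enable, not a fetch
    for byte in (0x03, (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF):
        for bit in range(7, -1, -1):
            m.write(0x5000, ((byte >> bit) & 1) << 7)       # step 2: MSB first on D7


def fetch(m: HybridMapperModel) -> int:
    m.write(0x5001, 0x00)        # step 3: 'read full byte' command
    # step 4: real code must let >= 9 CPU cycles pass here
    return m.read(0x5000)        # step 5


m = HybridMapperModel(ToyFlash(bytes(range(256))))
start_read(m, 0x20)
first = [fetch(m) for _ in range(4)]     # step 6: loop back to step 3
later = [fetch(m) for _ in range(2)]     # flash left enabled: the stream just continues
m.write(0x5001, 0x80)                    # step 7: /CS high
print(first, later)                      # [32, 33, 34, 35] [36, 37]
```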
Here is the code I wrote up as an example. There may well be better ways to do this, but it should explain how it all works.
Obviously you wouldn't have to do an entire 8KB loop, but assuming I haven't made too many mistakes, it should work, I'd think. Additionally, it requires your data to be arranged in the correct order on the SPI flash to support this non-sequential copy loop. Maybe you guys can come up with a better solution/loop; I just did this to sort things out for myself. Copying data to the pattern tables is even easier, with just a repetitive read $5000, delay, write $2007 loop.
Code:
;;;;;;copy SPI to $6000-$7FFF routine;;;;;;;;
;first you must place the desired PRG RAM bank at $6000-7FFF via the MMC3 style control registers. (details later)
LDY #00
STY $5001 ;writing to $5001 with D7=0 enables the SPI flash for access. (takes /CS low)
;Now you must serially write to the SPI via $5000 bit 7: the read command (03h) followed by the 24bit address, MSB first.
;Start unloading data now that everything is set up!
LDY #00
STY $5001 ;command to read the FIRST byte (with D7=0 still)
LDX #$00 ;2cyc; set up loop counter and provide 2 cycle delay for SPI data pump
NOP ;2cyc;
NOP ;2cyc; need total of 9 cycles to setup pump timing for entry to loop
NOP ;2cyc;
NOP ;2cyc; okay it's been 10 cycles since STY $5001, enter loop
load_spi_to_wram: ;copies 8KB from SPI flash into the bank at $6000-7FFF
LDA $5000 ;mapper places most recent flash read at $5000 (decoded by PRG A0,12-15)
STY $5001 ;command to mapper to fetch next byte
STA $6000, x ;5cyc; store first byte that was read
NOP ;provides at least 9 cycle delay from STY $5001 (STA + NOP + LDA = 11)
LDA $5000 ;read byte
STY $5001 ;fetch command
NOP ;delay
STA $6100, x ;store byte
LDA $5000
STY $5001
NOP
STA $6200, x
LDA $5000 ;4cyc
STY $5001 ;4cyc
NOP ;2cyc
STA $6300, x ;5cyc (indexed stores always take 5)
...
LDA $5000
STY $5001 ;fetch the byte for the next pass
STA $7F00, x
INX
BEQ load_done ;X wrapped to 0: all 256 passes are done
JMP load_spi_to_wram ;the ~320 byte unrolled body is out of range for a branch (-128 bytes max)
load_done:
;note: the last pass fetches one byte beyond the 8KB that is never read
;;end the read stream if you know your next SPI access isn't going to be a sequential read.
LDY #$80
STY $5001 ;writing to $5001 with D7=1 disables the SPI flash (takes /CS high)
;;;15cyc per byte * 8192bytes + loop overhead = ~124K cycles / ~29780 per NTSC frame = ~4.2 NTSC frames
Last edited by infiniteneslives on Sun Sep 16, 2012 10:58 am, edited 2 times in total.
Re: INL-ROM custom MMC3 hybrid mapper design
infiniteneslives wrote: "So I basically need 9 CPU clock cycles to fetch each byte: one cycle per bit and one more to reset my control circuitry and set things up to pump out the next byte."
Could you use the NES's 21/26MHz clock source? I guess the downside is that it's not Famicom/Famiclone compatible. A crystal/resonator (digikey:HWZT-12.00MD, 12MHz, 28¢/1)? Or use both edges of M2 somehow? Winbond's large SPI EEPROMs can be clocked at up to 104MHz, so there doesn't seem to be a relevant upper bound. Or can you use the Winbond quad/dual SPI modes?
On the other hand, you can't really beat 224kB/s, and you're still talking about aggregate read speeds of 200kB/s, so whatever.
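The figures here follow directly from the NTSC CPU clock, which is the 21.477 MHz master clock divided by 12 (about 1.79 MHz). Reading a byte and storing it takes at least 8 cycles (LDA abs 4 + STA abs 4), so ~224 kB/s is the ceiling, and the 9-cycle pump gives ~200 kB/s. A short Python check, with the 15-cycle-per-byte unrolled copy loop (LDA abs 4 + STY abs 4 + STA abs,X 5 + NOP 2) added for comparison. Here kB means 1000 bytes.

```python
NTSC_CPU_HZ = 21_477_272 / 12    # ~1.789773 MHz


def throughput_kb(cycles_per_byte: float) -> float:
    """kB/s (1000 bytes) when one byte moves every `cycles_per_byte` CPU cycles."""
    return NTSC_CPU_HZ / cycles_per_byte / 1000


for cycles in (8, 9, 15):
    print(f"{cycles:2d} cycles/byte -> {throughput_kb(cycles):6.1f} kB/s")
```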
- infiniteneslives
Re: INL-ROM custom MMC3 hybrid mapper design
lidnariq wrote: "infiniteneslives wrote: 'So I basically need 9 CPU clock cycles to fetch each byte: one cycle per bit and one more to reset my control circuitry and set things up to pump out the next byte.' Could you use the NES's 21/26MHz clock source? I guess the downside is that it's not Famicom/Famiclone compatible. A crystal/resonator (digikey:HWZT-12.00MD, 12MHz, 28¢/1)? Or use both edges of M2 somehow? Winbond's large SPI EEPROMs can be clocked at up to 104MHz, so there doesn't seem to be a relevant upper bound. Or can you use the Winbond quad/dual SPI modes? On the other hand, you can't really beat 224kB/s, and you're still talking about aggregate read speeds of 200kB/s, so whatever."
Yeah, I considered most of those things, actually. I also thought about using an RMW instruction with direct reads from the SPI flash and writes to $6000, so a LARGE unrolled loop could conceivably do it in 6 cycles (~290KB/s), with a lot of trickery, complexity, logic expense, I/O, components, etc. Like yourself, I realized it was plenty fast anyway, so none of that is really justified.
Super simple, super cheap, plenty fast, tons of ROM, so I'm happy.
Re: INL-ROM custom MMC3 hybrid mapper design
Writing the saved game serially might be a little slow, but I guess players expect that.
- infiniteneslives
Re: INL-ROM custom MMC3 hybrid mapper design
tepples wrote: "Writing the saved game serially might be a little slow, but I guess players expect that."
Slow for the CPU, but pretty quick for the player. I'm used to saving taking a few seconds on everything except the NES, where it takes no time with battery backing. So even if you somehow managed to come up with 256 bytes of save data (a full page of flash), it'll still take under a frame's worth of time.
I wrote up the routine quickly; keep in mind it could be even faster if you unrolled each byte. Here I assumed the worst case of looping on each bit.
Additionally, I think I'm going to change my mind about D0 being connected to the SPI flash for the $5000 register. The SPI flash handles everything MSB first, so instead of hassling with rolling the MSB around to the LSB, it just makes more sense to connect D7 to the SPI flash data input.
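Why D7 is the convenient bit: SPI flash takes bits MSB first, and a STA $5000 / ASL A pair puts bit 7, then bit 6, and so on, onto D7. Here is a small Python model of that register-level behavior. The function name is made up for illustration. It confirms that the D7 sequence is just the byte's binary digits in MSB-first order.

```python
def d7_sequence(value: int) -> list[int]:
    """Bits seen on D7 by eight STA $5000 writes, with ASL A between them."""
    a = value & 0xFF
    bits = []
    for _ in range(8):
        bits.append(a >> 7)          # STA $5000: only D7 reaches the flash
        a = (a << 1) & 0xFF          # ASL A: bit 6 moves up into bit 7
    return bits


page_program = 0x02
print(d7_sequence(page_program))     # [0, 0, 0, 0, 0, 0, 1, 0]
```

With the flash wired to D0 instead, each bit would first have to be rotated down from bit 7 to bit 0 before every write, which is the hassle the D7 wiring avoids.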
Code:
;;;;save data to SPI routine;;;;
;this routine writes a full page (256 bytes) of SPI flash
;before running you must erase the page and send the write enable command (06h),
;then write the page program command (02h) and 24bit address
;alternatively you could load the command and address into your 'save_data' array:
;02h, addr3, addr2, addr1, save data (252 bytes)
;then this routine would give the page program command, address, and save_data all at once
LDY #00
STY $5001 ;writing to $5001 with D7=0 enables the SPI flash for access. (takes /CS low)
LDX #$00
write_to_SPI:
LDA save_data, X ;4cyc; load byte
LDY #$08 ;2cyc; bit counter
save_byte:
STA $5000 ;4cyc * 8; write MSB to SPI (only D7 is connected)
ASL A ;2cyc * 8; move bit 6 to D7
DEY ;2cyc * 8;
BNE save_byte ;3cyc * 7 + 2cyc last;
INX ;2cyc; increment byte counter
BNE write_to_SPI ;3cyc
LDY #$80
STY $5001 ;writing to $5001 with D7=1 disables the SPI flash to end the write (takes /CS high)
;TOTAL time: ~100 cycles per byte * 256 bytes = ~25.6K cycles = under 1 NTSC frame
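The "~100 cycles per byte" estimate can be tallied exactly from the cycle counts in the routine's comments. The Python sketch below assumes `save_data` is page-aligned, so LDA save_data,X never pays the +1 page-crossing cycle. It counts CPU time only. The flash's own internal page-program time after /CS goes high is not included; that figure comes from the datasheet.

```python
# Cycle counts taken from the routine's comments (NMOS 6502).
LDA_ABS_X = 4                    # +1 if save_data + X crosses a page; assumed page-aligned
LDY_IMM = 2
STA_ABS, ASL_A, DEY, INX = 4, 2, 2, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2
NTSC_CYCLES_PER_FRAME = 29_780   # approximate

inner = 8 * (STA_ABS + ASL_A + DEY) + 7 * BNE_TAKEN + BNE_NOT_TAKEN   # one byte, bit by bit
per_byte = LDA_ABS_X + LDY_IMM + inner + INX + BNE_TAKEN
total = 256 * per_byte - (BNE_TAKEN - BNE_NOT_TAKEN)   # final outer BNE falls through

print(inner, per_byte, total, round(total / NTSC_CYCLES_PER_FRAME, 2))   # 87 98 25087 0.84
```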
Re: INL-ROM custom MMC3 hybrid mapper design
I'm sure with good game design you can make it seamless. Palette fade? Save on top of it, since the game won't be playing. Screen switches? Write a page. Save point in your game? Make it save, then play a sound effect so the player knows. I'm sure you can find 1-10 frames in gameplay that you can reuse to seamlessly add a save.
Re: INL-ROM custom MMC3 hybrid mapper design
I really don't think this is an issue... People are used to messages like "saving the game, please don't turn the power off" being displayed for a few seconds, and I honestly don't remember anyone complaining. And apparently this will be fast enough to not even require a message. As long as it takes less than 1 second, I don't think you need a message.
Re: INL-ROM custom MMC3 hybrid mapper design
tokumaru wrote: "I really don't think this is an issue... People are used to messages like 'saving the game, please don't turn the power off' being displayed for a few seconds, and I honestly don't remember anyone complaining. And apparently this will be fast enough to not even require a message. As long as it takes less than 1 second, I don't think you need a message."
If it takes longer than one frame, display a message anyway, just to make sure.
Re: INL-ROM custom MMC3 hybrid mapper design
zzo38 wrote: "tokumaru wrote: 'I really don't think this is an issue... People are used to messages like "saving the game, please don't turn the power off" being displayed for a few seconds, and I honestly don't remember anyone complaining. And apparently this will be fast enough to not even require a message. As long as it takes less than 1 second, I don't think you need a message.' If it takes longer than one frame, display a message anyway, just to make sure."
Come on. It's actually BAD to display messages if the message won't be visible long enough for the player to see it properly.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
Re: INL-ROM custom MMC3 hybrid mapper design
Or use an icon rather than a message. I know games that save extraordinarily fast and still have a little SD card/floppy disk icon that appears in a bottom corner while they're saving.
Re: INL-ROM custom MMC3 hybrid mapper design
Yeah, sort of like the little floppy disk that would blink in the corner when Doom 1 would stall for loading. Super Smash Bros. Melee has a "saving" icon in the corner as well.
Re: INL-ROM custom MMC3 hybrid mapper design
If you have the SPI flash and a shift register, would it be hard to add a way to stream PCM or DPCM audio from the flash to the expansion sound pin? I guess 1-bit PCM would be trivial (just connecting the LSB of the shift register to a spare CPLD pin and then automatically retriggering the read command) but for 4 or 8 bit audio you would need another latch and more spare pins on the CPLD.
- infiniteneslives
Re: INL-ROM custom MMC3 hybrid mapper design
Grapeshot wrote: "If you have the SPI flash and a shift register, would it be hard to add a way to stream PCM or DPCM audio from the flash to the expansion sound pin? I guess 1-bit PCM would be trivial (just connecting the LSB of the shift register to a spare CPLD pin and then automatically retriggering the read command) but for 4 or 8 bit audio you would need another latch and more spare pins on the CPLD."
That's an interesting thought... I had only considered storing DPCM samples on the SPI flash, loading them to RAM and playing them. But I'd think something like what you're imagining could be possible as well, assuming the EXP audio jumper/resistors were installed.
You'll have to forgive me, I'm not much of a sound buff, but I am interested in the possibilities, so feel free to correct me on this stuff or suggest better solutions. I made the EXP pins easily accessible by extending all the pins into the cart (you don't have to chip away at the cart shell to access them). The CPLD that's going to handle the SPI flash should have a free pin that could be assigned to the task. Or, if you were accepting of a 0-3.3V signal, you wouldn't even need a CPLD pin; the SPI flash could be connected directly to the EXP pin.
So really I'd imagine doing it a little differently than using the SPI flash for game/save/graphics data. It could be set up to just run free: after writing the command and address to the SPI flash via $5000 bit 7, reads would be automatically enabled (all this really means is that the SPI flash needs to be continually clocked after the read command/address). The SPI flash would then just spit out the data stream until the chip was disabled by writing to $5001 with D7=1. You wouldn't even bother with the shift register; just let the flash stream bits on each clock pulse. I'm guessing 1.79MHz would be a little faster than desired for an audio stream, so instead of a shift register, a clock divider could be put in its place.
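For a sense of scale, here is a quick Python calculation of the 1-bit stream rate, and how long the 2MB Winbond part would play, for a few clock dividers off the ~1.79 MHz CPU clock (M2). The divider values are arbitrary examples, not part of the design. Power-of-two dividers are just taps off a binary counter, which tends to be cheap in a CPLD.

```python
CPU_HZ = 21_477_272 / 12           # NTSC M2, ~1.79 MHz
FLASH_BITS = 2 * 1024 * 1024 * 8   # 2MB SPI flash, streamed one bit per divided clock


def stream(divider: int):
    """Return (bits per second, seconds until the flash runs out) at CPU_HZ / divider."""
    rate = CPU_HZ / divider
    return rate, FLASH_BITS / rate


for divider in (1, 32, 64):
    rate, seconds = stream(divider)
    print(f"/{divider:<3d} {rate / 1000:8.1f} kHz  {seconds:6.1f} s")
```

Undivided, the whole chip streams out in under 10 seconds. Dividing by 64 gives roughly a 28 kHz bit rate and about 10 minutes of 1-bit audio.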
I'd guess you'd also want a low-pass filter, which could easily be located in the perf area.
If there were logic to spare, both the shift register and the clock divider could be implemented at once. I'd just have to add another definition to $5001, perhaps something like D6=0 for the divided-clock bit stream to the EXP pin and D6=1 for byte feeding as discussed previously. D7 would still enable/disable the SPI flash, which would stop either the bit stream or the byte-feed reads.
EDIT: it wouldn't be required, but might be nice. The SPI flash's hold pin/function would basically act like a 'pause' for the bit stream, so you could stop the stream and pick up where you left off if control were given to that pin, perhaps via D5 of $5001. We'll see how much logic and how many pins are available, but if desired this could be considered as well.