Posted: Tue Feb 28, 2012 9:32 pm
by infiniteneslives
bunnyboy wrote:Designing for USB power might be a good idea anyways. That way you could do power on tests without losing the SRAM/CPLD contents.
Yeah it's set up to run on either NES or USB power and seamlessly switch between the two. So you don't need the NES to be on to program it and you don't need it to be plugged into USB to play on the console. But like you're saying you can leave it plugged into USB the entire time so you won't lose power/memory contents after shutting off the console. In the event that a clone doesn't supply 3.5V or more you'd always need USB power supplied.

Also we've got a battery on board that can be used to power WRAM, PRG RAM, or CHR RAM as desired. If you stored something in the CPLD's RAM you'd lose it though, but there is flash in the CPLD one could use. The CPLD user flash would be nifty for saving game data without use of a battery or external non-volatile memory.
drk421 wrote:Hmmm, what about hooking up a bluetooth spp module to transfer data?
I probably won't implement it myself, I find it hard to beat the speed, reliability, and compatibility of a USB cable. But the good news is, I left the serial pins available on the mcu with solder contacts on the board. So in the spirit of the project one could take this as a base to easily add something like that and DEVELOP your own BT interface for the cartridge :)

Posted: Wed Feb 29, 2012 6:35 am
by tepples
Anything that can be a USB host can be a Bluetooth host. In fact, that's how the Wii console connects to its remote: through a built-in USB Bluetooth adapter. Is this USB chipset OTG (cable-selected host or client) or client-only?

Posted: Wed Feb 29, 2012 7:15 am
by infiniteneslives
tepples wrote:Is this USB chipset OTG (cable-selected host or client) or client-only?
I never really thought about someone wanting to use it as a host...

No it's not OTG, just client only. I'm using V-USB, and I'm pretty sure there is nothing available with it that allows it to act as a host. Another option was LUFA, I'm not sure if hosting is possible with that or not. I don't have any USB specific hardware aside from the AVR mcu and a couple resistors. I've never looked into the possibility of hosting but my guess is you'd have to add hardware or write your own USB host code to do it.

Did you have something specific in mind?

I'm guessing using it for peripherals such as a keyboard/mouse or something? Simple items like that would probably be easier and need no extra hardware (aside from a socket/cable) if you used the serial port that's already built into the mcu.

If one wanted to make it a host for data storage for something like a flash drive, the easier/nicer option would be an SD card slot. There are tools already out there for getting an SD card on an AVR, it would then just need to be connected to the SPI bus on the cart and maybe 1-2 of the free pins on the mcu.

Aside from that adding the BT card might add some hosting capabilities. But that one at least is still slower than USB which should beat it out in data transfer speed by 2-3 times.

Posted: Thu Mar 01, 2012 2:36 am
by Karatorian
Nice looking piece of kit you've got there. I'm looking forward to this entering production.
infiniteneslives wrote:Also everything is open source and I hope to have tutorials and stuff on how to modify or create mappers making it a useful tool that doesn't need to be reverse engineered to make full use of for mapper and game development.
This is the number one draw for me. I've come up with a few different ideas for new mappers and it would be nice to have a platform to prototype them on. (That and I'm a pretty big fan of open source. Not quite a zealot, but definitely convinced of the benefits.) Most of my ideas could be built out of a handful of discrete chips or a small programmable device, so they'd be cheap to make carts of later.
It has 10bit of distributed SRAM that can be configured as dual port without adding much more logic.
I'm assuming that's a typo. Ten bits!? How much RAM does it actually have? Dual ported RAM gives me all kinds of crazy ideas. (But those ideas require a lot of it...)

Posted: Thu Mar 01, 2012 4:12 am
by infiniteneslives
Karatorian wrote:
It has 10bit of distributed SRAM that can be configured as dual port without adding much more logic.
I'm assuming that's a typo. Ten bits!? How much RAM does it actually have? Dual ported RAM gives me all kinds of crazy ideas. (But those ideas require a lot of it...)
Ahh yeah... Not sure how I came up with that... This CPLD has 74Kbits of SRAM that can be easily configured as true dual port, which is still much more than the MMC5 has. The big dog of the family (pin compatible) has a whopping 240Kbits! Should satisfy most desires on the NES :)

In other news we finished porting over all the AVR code today and I just finished debugging it all. Everything works GREAT! I even tested it out quick on my portable clone that is only operating at ~3.5V and it worked beautifully even without USB power. All my mappers tested out great as well and there don't seem to be any issues with the buffer circuitry either.

So now it's time to start exercising this thing :)

Posted: Thu Mar 01, 2012 7:06 am
by Karatorian
infiniteneslives wrote:This CPLD has 74Kbits of SRAM that can be easily configured as true dual port still much more than MMC5.
Are you using any of that?

The idea I had was to have a dual ported CHR-RAM mapped into the CPU's address space. This would allow for more time to update the tiles by allowing writes to the offscreen page during rendering and utilizing a simple page swap during v-blank. (It's funny how I just had this idea early this morning and then I read about how your cart could be used to prototype the idea.)

Posted: Thu Mar 01, 2012 11:43 am
by infiniteneslives
No I'm not using any of it in the base mapper as of right now so it's completely free to play around with. Although I did have plans to play around with it to see how it works.

Isn't your idea basically the same thing as EXRAM that the MMC5 has?

Posted: Thu Mar 01, 2012 6:31 pm
by tepples
ExRAM can be used only as a nametable, not as a pattern table. MMC5 has no provision for writable pattern tables.

Posted: Fri Mar 02, 2012 3:42 am
by infiniteneslives
Okay so I dug a little deeper into the SRAM capabilities of this thing. Basically there are two different types of SRAM available. There is Embedded Block RAM (EBR) and Distributed RAM in each LUT (Look-up table, similar to a Macrocell)

The EBR is designed to provide "large" amounts of configurable SRAM (single port, dual port, pseudo dual port etc) and as far as we're concerned comes in 1KB chunks that are 9 bits wide (9216 bits), but assuming the 9th bit isn't used they are just 1KB x 8 bits wide. This EBR can be single, true dual, or pseudo dual ported (read only on one port, read & write on the other) without costing any logic elements (LUTs). This CPLD has 7 blocks of EBR so I can easily have 7KBytes of true dual ported SRAM. Larger members of the family have 8, 10, and 26 blocks (one KByte per block).
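To make the block arithmetic concrete, here is a small Python sketch using only the figures quoted above (1K deep, 9 bits wide, and the 7/8/10/26 block counts). The function name `ebr_capacity` is made up for illustration and is not part of any vendor tool.

```python
# Figures taken from the post: each EBR block is 1K deep and 9 bits wide.
EBR_DEPTH = 1024
EBR_WIDTH_BITS = 9


def ebr_capacity(blocks, used_width_bits=8):
    """Return (raw_bits, usable_bytes) for a given number of EBR blocks.

    raw_bits counts all 9 bits per word; usable_bytes assumes only
    used_width_bits of each word are used (8 when the 9th bit is ignored).
    """
    raw_bits = blocks * EBR_DEPTH * EBR_WIDTH_BITS
    usable_bytes = blocks * EBR_DEPTH * used_width_bits // 8
    return raw_bits, usable_bytes


# Block counts mentioned for this part and the larger family members.
for blocks in (7, 8, 10, 26):
    raw_bits, usable_bytes = ebr_capacity(blocks)
    print(f"{blocks:2d} blocks: {raw_bits} raw bits, "
          f"{usable_bytes // 1024} KB usable at 8 bits wide")
```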

The Distributed RAM is contained within each logic element (LUT, but you can think of this like a macrocell). So along the lines of what Tepples brought up about making SRAM from macrocells, SRAM can be created from the general logic cells available. However this can only be configured as single or pseudo dual port (true dual port isn't available here without using obscene amounts of logic). Putting SRAM here is also very costly, like Tepples brought up. Configuring LUTs as RAM costs about 21bits per LUT. And I've got 640 LUTs. So to make 1KB of pseudo dual ported SRAM it takes about 328 LUTs which is HALF of the logic I have available.

So long story short, there isn't much point to implement distributed SRAM unless you REALLY need it and have lots of logic to spare. The EBR true dual ported SRAM is great in the EBR (why it's there). Unfortunately I'm 1KB short of having 8KB for both nametables.

The only trick is that the SRAM is synchronous... Not much of a problem with PRG RAM, I can just drive it off of M2. But the CHR side is a little tricky. From what I can see on my scope the CHR /CE (A13) can't be used as a clock because it doesn't toggle each access like PRG /CE does, which makes sense. The only real signals available are CHR /RD and CHR /WR. It looks like CHR /RD toggles nearly every cycle, except for a write cycle in which case CHR /WR toggles. So I'm thinking a clock could be generated by NORing CHR /RD and /WR. The only issue is that if the clock is delayed behind CHR /WR then there could be some timing violations with using CHR /WR as my /WE line. But adding some delay to CHR /WR could resolve this if needed. It looks like the address and data lines hang out long enough to prevent issues there.

Karatorian: did you want to write something up to demo this? If so, what kind of mapper set up were you thinking? Would you want just one full nametable (4KB) or the full 7KB I've got available and just map the original NT to the last/first 1KB? Where would you want it to be mapped on the PRG side? The convenient thing about being just under 8KB is that it would fit in the MMC5's EXRAM location. So if my math is right 7KB could be mapped to $4800-$7FFF. Otherwise it could just sit where WRAM normally does assuming there wasn't any. Or a single NT could be mapped to $6000-$7FFF.

As for the bank switching I'm thinking something like a smaller CNROM but with CHR-RAM. Then just swap the standard VRAM out for the dual ported SRAM like you're saying. The standard VRAM would just fill in the whole of the 7KB. Or just swap out a single NT in the 4KB option.

Posted: Fri Mar 02, 2012 4:13 am
by tepples
infiniteneslives wrote:So long story short, there isn't much point to implement distributed SRAM unless you REALLY need it and have lots of logic to spare. The EBR true dual ported SRAM is great in the EBR (why it's there). Unfortunately I'm 1KB short of having 8KB for both nametables.
But with 7 KiB, you could still make a bank of MMC5 style extended attributes for each nametable (1 KiB each), four 1 KiB pattern table banks like Chinese TQROM, and 1 KiB extra for saving like MMC6.
So if my math is right 7KB could be mapped to $4800-$7FFF.
7 KiB would fit in $4400-$5FFF.
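For anyone checking the math, a short Python sketch of the two address windows and of the suggested 7 KiB split. The helper `window_size` and the dictionary labels are invented for illustration, and counting two nametables is an assumption taken from the "both nametables" remark quoted above.

```python
KIB = 1024


def window_size(first, last):
    """Size in bytes of an inclusive CPU address range."""
    return last - first + 1


# The two ranges mentioned in the thread.
print(window_size(0x4800, 0x7FFF) // KIB, "KiB in $4800-$7FFF")
print(window_size(0x4400, 0x5FFF) // KIB, "KiB in $4400-$5FFF")

# One reading of the suggested split. Two nametables is an assumption.
budget_kib = {
    "extended attributes, nametable 0": 1,
    "extended attributes, nametable 1": 1,
    "pattern table banks (4 x 1 KiB)": 4,
    "extra 1 KiB (MMC6-style save area)": 1,
}
print(sum(budget_kib.values()), "KiB allocated")
```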

Posted: Fri Mar 02, 2012 4:32 am
by infiniteneslives
tepples wrote:
infiniteneslives wrote:So long story short, there isn't much point to implement distributed SRAM unless you REALLY need it and have lots of logic to spare. The EBR true dual ported SRAM is great in the EBR (why it's there). Unfortunately I'm 1KB short of having 8KB for both nametables.
But with 7 KiB, you could still make a bank of MMC5 style extended attributes for each nametable (1 KiB each), four 1 KiB pattern table banks like Chinese TQROM, and 1 KiB extra for saving like MMC6.


So if my math is right 7KB could be mapped to $4800-$7FFF.
7 KiB would fit in $4400-$5FFF.
Yeah I need to brush up on my PPU memory map a bit, I'm getting all mixed up. I really could do a lot more with that 7KB than I was thinking. As for the last 1KB of saving like MMC6 it would have to work a little differently. That SRAM is volatile and battery backing the whole CPLD isn't really an option. But there is 8KB of user flash memory available on chip. But you could still use that last 1KB for all kinds of stuff. Dual porting with the AVR or other functions within the CPLD.

Posted: Fri Mar 02, 2012 7:22 pm
by Karatorian
I hadn't really thought about name tables yet, just pattern tables. The idea was inspired by some GBA code I wrote that faked a bitmap mode by filling the screen with a sequential tile pattern and using a custom tile for each one. (Yes, I'm aware the GBA has a real bitmap mode). So the name tables never needed changing. (Yes, it's horribly inefficient.)

Of course, on the NES, with only 512 tiles total (using both banks), this wouldn't quite work. So it would be necessary to update the name tables too. Assuming that you only use 256 tiles for the background, 8k would be enough. One 4k page for the front buffer and one 4k page for the back buffer. This requires that the sprites share the same bank as the background.

If you wanted the sprites to have their own bank, then you'd need 12k. I'm assuming you could use whatever RAM you currently have onboard for CHR-RxM already. However, switching banks for the sprite table reads would require a level of PPU monitoring similar to the MMC5. If you wanted the sprites double buffered too, you'd need 16k, all of it dual ported. (Or at least pseudo dual: PPU read, CPU read/write.)

Unfortunately, the 7k you've got easy access to isn't quite enough for even the basic setup. My suggestion would be to use 6k of it as two 3k pages. That leaves 1k for other stuff (a third name table, extended attributes, etc.) That would give 192 double buffer tiles. The other 64 could be used for something fixed, like alphanumerics. Not too shabby if you ask me.
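The tile counts follow from the 16 bytes each NES tile occupies (8x8 pixels at 2 bits per pixel). A quick Python check of the 6k/1k split, with names chosen only for this sketch:

```python
BYTES_PER_TILE = 16  # 8x8 pixels, 2 bits per pixel
KIB = 1024


def tiles_per_page(page_bytes):
    """Number of whole tiles that fit in a page of the given size."""
    return page_bytes // BYTES_PER_TILE


total = 7 * KIB           # dual ported RAM available
page = 3 * KIB            # one buffer page
leftover = total - 2 * page

print(tiles_per_page(4 * KIB), "tiles in a full 4k pattern table")
print(tiles_per_page(page), "double buffered tiles per 3k page")
print(256 - tiles_per_page(page), "tiles left for fixed graphics")
print(leftover // KIB, "k left over for other uses")
```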

Of course the real limitation (which I ought to know, but don't) is how fast the NES can update these tiles. How many bytes can the NES move in one frame? Assuming you're just grabbing the tiles from PRG-ROM and they're not dynamically composed (like one needs to do for variable width fonts, vector graphics, or bitmap emulation), then all you need to do is move the bits. There's not much point in supporting double buffers larger than the CPU can fill in a frame anyway. (Unless you wanted to cut the frame rate to 30 FPS.)
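A rough upper bound can be sketched in Python. Everything here is an assumption rather than something stated in the thread: NTSC timing (341 PPU dots x 262 scanlines, 3 dots per CPU cycle), the CPU doing nothing but copying for the whole frame, no page-crossing penalties, and two hypothetical copy styles (fully unrolled LDA absolute + STA absolute at 8 cycles per byte, or an indexed loop of LDA abs,X / STA abs,X / INX / BNE at 14 cycles per byte).

```python
# Assumed NTSC frame timing; not taken from the thread.
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262
DOTS_PER_CPU_CYCLE = 3

CPU_CYCLES_PER_FRAME = (DOTS_PER_SCANLINE * SCANLINES_PER_FRAME) // DOTS_PER_CPU_CYCLE

# Assumed cost per copied byte for two hypothetical copy routines.
UNROLLED_CYCLES = 4 + 4          # LDA abs + STA abs
LOOPED_CYCLES = 4 + 5 + 2 + 3    # LDA abs,X + STA abs,X + INX + BNE (taken)


def bytes_per_frame(cycles_per_byte, frames=1):
    """Upper bound on bytes copied if the CPU does nothing else."""
    return (CPU_CYCLES_PER_FRAME * frames) // cycles_per_byte


print(CPU_CYCLES_PER_FRAME, "CPU cycles per frame")
print(bytes_per_frame(UNROLLED_CYCLES), "bytes per frame, unrolled copy")
print(bytes_per_frame(LOOPED_CYCLES), "bytes per frame, looped copy")
```

Under these assumptions even the unrolled case lands a little under a full 4k page per frame, and a real game would have less once its own logic takes its share of the frame.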

As for how the rest of the mapper would be set up, I hadn't gotten that far. When I first had the idea, which was basically "Hey, dual ported CHR-RAM could be used for double buffering", I was assuming the dual ported RAM would be a separate chip. Then I started thinking about how many IO lines the mapper would need:

PRG-CART-A    16
PRG-CART-D     8
PRG-ROM-A     15+
PRG-ROM-D      8
CHR-CART-A    14
CHR-CART-D     8
CHR-RAM-P1-A  15+
CHR-RAM-P1-D   8
CHR-RAM-P2-A  15+
CHR-RAM-P2-D   8
Which is a bare minimum of 115 pins for just addressing and data. Not to mention the chip enables and stuff. Plus even more for PRG-RAM. Which isn't required, but nice to have. (At least as an option.) So that's as far as my design went.
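The 115 figure is just the sum of the list above, taking each "15+" entry at its minimum of 15. A tiny Python tally, with the dictionary keys copied from the list:

```python
# Pin counts from the list above; "15+" entries are taken as 15.
pins = {
    "PRG-CART-A": 16,
    "PRG-CART-D": 8,
    "PRG-ROM-A": 15,
    "PRG-ROM-D": 8,
    "CHR-CART-A": 14,
    "CHR-CART-D": 8,
    "CHR-RAM-P1-A": 15,
    "CHR-RAM-P1-D": 8,
    "CHR-RAM-P2-A": 15,
    "CHR-RAM-P2-D": 8,
}

total_pins = sum(pins.values())
print(total_pins, "address and data pins")
```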

Another idea I had for working around the 7k limit was to only use 4k and implement a DMA engine to copy it to the real CHR-RAM during V-blank. It's probably not a viable idea though.

And now for something completely different...

With 7k of built in RAM, you could implement the various things the MMC5 uses ExRam for, all at the same time. The first thing that comes to mind is true four screen mirroring (which is a misnomer, 'cause they're not mirrored at that point), rather than the three the '5 has. And extended attributes on all of them at the same time. That alone would be pretty impressive.

Posted: Fri Mar 02, 2012 10:34 pm
by infiniteneslives
Unfortunately, the 7k you've got easy access to isn't quite enough for even the basic setup. My suggestion would be to use 6k of it as two 3k pages. That leaves 1k for other stuff (a third name table, extended attributes, etc.) That would give 192 double buffer tiles. The other 64 could be used for something fixed, like alphanumerics. Not too shabby if you ask me.
When I started the project the only XO2 CPLD available was the one I have now. But since the bigger ones have become available I've been tempted to officially step up to a bigger one. I didn't have much legitimate reason but when you put it all like this it's more convincing, assuming people want me to produce these. The next larger chip has 8KB of EBR (true dual port) but the one bigger than that is the SAME cost and gives 10KB of EBR. In production quantities we're only talking $3 or less. For a dev cart it seems justifiable. On the flip side if one was ever to want to produce a game with the MachXO2 you could downscale to the smaller $5-6 devices if not using the extra features. Interestingly enough 8K of dual ported SRAM is about the same cost of larger cplds anyways. So if one wanted dual ported SRAM the MachXO2 really looks like the best option (not considering FPGAs).

Of course the real limitation (which I ought to know, but don't) is how fast the NES can update these tiles. How many bytes can the NES move in one frame. Assuming you're just grabbing the tiles from PRG-ROM and they're not dynamically composed (like one needs to do variable width fonts, vector graphics, or bitmap emulation), then all you need to do is move the bits. There's not much point in supporting double buffers larger than the CPU can fill in a frame anyway. (Unless you wanted to cut the frame rate to 30 FPS.)
So I feel like I'm starting to dive too deep into what may be possible with this, I'm sure some people will say if you want to do all this go to a different console. But Nintendo went from NROM to MMC5 didn't they? I'll share the thought and you can do with it what you will. Depending on what you wanted to do exactly there are several different ways you could greatly increase the number of bits that got banged around. It really all depends on what you were trying to do, but if you were just moving bits from the PRG-ROM you could provide specific instructions to some logic in the CPLD running at HIGH speed (50-100MHz). Then have it remove the PRG-ROM from the NES with the buffers (not possible at the moment but would be by re-appropriating one CPLD pin). Then while the CPU sat idle for a couple cycles several KB of data could be moved around. And even more complex yet, if you wanted some processing done you could do all kinds of stuff with the AVR.

But enough of all that nonsense...
Which is a bare minimum of 115 pins for just addressing and data. Not to mention the chip enables and stuff. Plus even more for PRG-RAM. Which isn't required, but nice to have. (At least as an option.) So that's as far as my design went.
I think you're a little off there. Many of those assignments can and should be doubled up. Why are PRG CART and ROM on different pins? For the higher non-addressable pins sure, but not A0-13 and the data bus. Same argument with CHR side. And why would each page of CHR RAM have its own full set of address and data lines??? Unless I'm missing something you only need to toggle ONE upper address bit to swap the pages. If you wanted a cart with CHR-RxM, PRG-ROM, WRAM and separate dual ported SRAM mapped to fixed locations on both busses most of those memories would be tied together. So the PLD would only need to have IO for the upper address lines, control signals, and PRG-data bus for controlling bank switching. That could be done with a cheap little ~40 pin CPLD. Something comparable to the MMC3 really. Now my cart has other things going on and really does need the full CHR and PRG address and data busses since the dual ported SRAM is inside it, but still you could do it with quite a bit less than 115 IO. I'm doing it with 108 but could do it with 80-90 assuming you didn't have a mcu to interface with like I do.
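The "toggle ONE upper address bit" point can be modelled in a few lines of Python. This is only a behavioural sketch under assumptions of its own: two 4 KB pages (the 8k front/back layout discussed earlier), the PPU always reading the front page, the CPU always writing the back page, and a class name invented for the example. It is not the cart's actual logic.

```python
PAGE_SIZE = 4 * 1024  # assumed 4 KB pages: one front buffer, one back buffer


class DoubleBufferedChr:
    """Toy model of dual ported CHR RAM with a one-bit page select."""

    def __init__(self):
        self.ram = bytearray(2 * PAGE_SIZE)
        self.front = 0  # page select bit: the page the PPU currently sees

    def _phys(self, page, addr):
        # The page bit acts as the single upper address line.
        return page * PAGE_SIZE + (addr % PAGE_SIZE)

    def ppu_read(self, addr):
        """PPU port: always reads the front page."""
        return self.ram[self._phys(self.front, addr)]

    def cpu_write(self, addr, value):
        """CPU port: always writes the back (offscreen) page."""
        self.ram[self._phys(self.front ^ 1, addr)] = value & 0xFF

    def swap(self):
        """Flip the page bit, e.g. once per frame during v-blank."""
        self.front ^= 1


vram = DoubleBufferedChr()
vram.cpu_write(0x0010, 0xAB)      # lands in the offscreen page
before = vram.ppu_read(0x0010)    # PPU still sees the old front page
vram.swap()
after = vram.ppu_read(0x0010)     # now the freshly written page is shown
print(hex(before), hex(after))
```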


But yes like we're all saying there is still a LOT that can be done with what I've already set up and that 7KB available.

Posted: Sat Mar 03, 2012 7:07 am
by tepples
Karatorian wrote:The idea was inspired by some GBA code I wrote that faked a bitmap mode by filling the screen with a sequential tile pattern and using a custom tile for each one.
And I did the same thing for the menu system in the last versions of Lockjaw. I'd bet some GBA programs did the same so that they could mix bitmapped text with tiled game objects or get a backdrop layer behind the bitmap layer, as the GBA's 8bpp and 16bpp bitmap modes support only one layer. Furthermore, the DS's 2D is mostly the same as the GBA, and a 4bpp surface takes up far less VRAM than an 8bpp or 16bpp bitmap.
If you wanted the sprites double buffered too, you'd need 16k, all of it dual ported. (Or at least pseudo dual: PPU read, CPU read/write.)
Even if you have a separate pair of tiles for each of 64 8x16 pixel sprites, double-buffered sprite cels would need only 2 KiB per buffer.
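The 2 KiB figure works out as follows, in a short Python check that assumes the usual 16 bytes per 8x8 tile:

```python
BYTES_PER_TILE = 16      # 8x8 pixels, 2 bits per pixel
SPRITES = 64             # all 64 sprites on screen
TILES_PER_SPRITE = 2     # an 8x16 sprite uses a pair of tiles

sprite_buffer_bytes = SPRITES * TILES_PER_SPRITE * BYTES_PER_TILE
double_buffered_bytes = 2 * sprite_buffer_bytes

print(sprite_buffer_bytes // 1024, "KiB per buffer")
print(double_buffered_bytes // 1024, "KiB for front and back buffers")
```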
There's not much point in supporting double buffers larger than the CPU can fill in a frame anyway. (Unless you wanted to cut the frame rate to 30 FPS.)
And look at how slow the frame rates were in a few Super NES games, namely Wolfenstein 3D, Jurassic Park, and Star Fox/Wing.
Another idea I had for working around the 7k limit was to only use 4k and implement a DMA engine to copy it to the real CHR-RAM during V-blank. It's probably not a viable idea though.
That or reuse the circuitry for counting fetches and detecting end of scanline to implement what kevtris has called a "stuffer": queue up to sixteen writes in a FIFO, take CHR RAM off the bus, and execute them during the garbage nametable fetches at x=257, 259, 265, 267, ...
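The series of fetch positions can be generated with a short Python sketch. It assumes eight sprite fetch groups of eight dots each, starting at x=257, with the two garbage nametable fetches at the start of each group. That layout is inferred from the series above and from the sixteen-write FIFO, and the function name is made up for the example.

```python
def garbage_fetch_slots(first_dot=257, groups=8, group_len=8):
    """Return the x positions where each garbage nametable fetch begins.

    Assumes each sprite fetch group starts with two 2-dot garbage
    fetches, followed by the pattern fetches for that sprite.
    """
    slots = []
    for group in range(groups):
        base = first_dot + group * group_len
        slots.extend((base, base + 2))
    return slots


slots = garbage_fetch_slots()
print(slots)
print(len(slots), "write slots per scanline")
```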

Posted: Sun Mar 04, 2012 12:16 am
by Karatorian
infiniteneslives wrote:Interestingly enough 8K of dual ported SRAM is about the same cost of larger cplds anyways.
That's kinda strange. Good to know for future reference.
So I feel like I'm starting to dive too deep into what may be possible with this, I'm sure some people will say if you want to do all this go to a different console.
"The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb
I think you're a little off there.
More than a little off actually. Thanks for pointing out this glaring thinko. As the design never made it out of my head and onto paper (or pixels), I missed the obvious.
Why are PRG CART and ROM on different pins? For the higher non-addressable pins sure, but not A0-13 and the data bus. Same argument with CHR side.
So we can have bitwise granularity with mapping! Just kidding. Simply because the ideas in my head were abstract and when I tried to make them concrete, I didn't take the time to think things through all the way.

Here's a block diagram of the version with all those pins:
Image
Obviously, this is not ideal.
And why would each page of CHR RAM have its own full set of address and data lines?
Um, P1 and P2 were the two ports of the dual ported SRAM. If the CPU is writing to VRAM and the PPU is reading from VRAM, then they need to be on separate addressing and data buses. Of course, as you pointed out, the CHR and PRG buses are already separate.
If you wanted a cart with CHR-RxM, PRG-ROM, WRAM and separate dual ported SRAM mapped to fixed locations on both busses most of those memories would be tied together.
So the PLD would only need to have IO for the upper address lines, control signals, and PRG-data bus for controlling bank switching. That could be done with a cheap little ~40 pin CPLD.
You are entirely correct. Here's a block diagram of the proper way:
Image
tepples wrote:Even if you have a separate pair of tiles for each of 64 8x16 pixel sprites, double-buffered sprite cels would need only 2 KiB per buffer.
Doh. I didn't think of that. So then, no combination of sprites really needs the whole 4k page. I'll have to remember that.
That or reuse the circuitry for counting fetches and detecting end of scanline to implement what kevtris has called a "stuffer": queue up to sixteen writes in a FIFO, take CHR RAM off the bus, and execute them during the garbage nametable fetches at x=257, 259, 265, 267, ...
That is an interesting idea. Sounds like it could have its uses. Kinda like an H-Blank DMA.