Posted: Mon Dec 07, 2009 7:46 am
by ikari_01
Now, with the PowerPak out and all I'm not sure anybody cares, but there has been some progress. A nice little PCB came out, and then one more with some fixes. And the third revision is on the way. :roll:
There are some pics here.

Currently there's only Map 0x20, 0x21, and 0x25 support with near-time SRAM save.

I think the next step will be DSP-1 emulation.

There's a 200kgate Spartan 3 on the board, which can be replaced by its pin-compatible 400kgate variant. That should definitely be sufficient for chip emulation in case the former is not enough.

Posted: Mon Dec 07, 2009 8:20 am
by smkd
ikari_01 wrote: There's a 200kgate Spartan 3 on the board which can be replaced by its pin compatible 400kgate variant, which should be sufficient for chip emulation.
ikari_01 wrote: DSP-1 emulation.
nice. Bonus points for the SD card interface. How long does it take games to load? Say, what is the figure on a 32mbit ROM?

Looks to be a very capable cart, I'm wondering how much that SRAM is costing though =(

The fat FPGA is a huge plus, I'm guessing that's your main draw. Playing starfox on the real deal with a flash cart would be wicked.

Posted: Mon Dec 07, 2009 8:54 am
by ikari_01
smkd wrote: nice. Bonus points for the SD card interface. How long does it take games to load? Say, what is the figure on a 32mbit ROM?
It's a good deal slower than the PowerPak, due to the AVR accessing the SD card in SPI mode.
A 32Mbit ROM takes about half a minute to load.

It should be possible to offload the bulk transfers to the FPGA which should result in at least a 100% speed increase. ATM the AVR uses its SPI to load a block from the card, then uses the same SPI to push it to SRAM via the FPGA. With the data flowing directly to the FPGA I have direct SRAM access (no shared SPI) and could achieve higher SPI clocks (currently 6.144MHz).
EDIT: to clarify, this is possible with a software update as the FPGA is already connected to the SPI bus.
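
As a rough sanity check on the numbers above, here is a minimal sketch of the raw SPI transfer time. It assumes one bit per SPI clock and ignores all command, CRC and filesystem overhead, so it only gives a lower bound; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope lower bound on ROM load time over SPI.
# Assumption: 1 bit per SPI clock, no protocol or filesystem overhead.

ROM_BITS = 32 * 1024 * 1024   # 32 Mbit ROM
SPI_HZ = 6.144e6              # SPI clock quoted in the post

def transfer_seconds(bits, spi_hz, passes):
    """Time to clock `bits` across the SPI bus `passes` times."""
    return passes * bits / spi_hz

# Shared SPI: card -> AVR, then AVR -> FPGA over the same bus (two passes).
shared = transfer_seconds(ROM_BITS, SPI_HZ, passes=2)
# Offloaded: card -> FPGA directly (one pass).
direct = transfer_seconds(ROM_BITS, SPI_HZ, passes=1)

print(f"shared SPI: {shared:.1f} s, direct: {direct:.1f} s")
```

The measured load time is well above this bound, so the raw clock rate is clearly not the only cost. The sketch only shows why dropping the second pass over the bus should roughly halve the transfer portion.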

smkd wrote: Looks to be a very capable cart, I'm wondering how much that SRAM is costing though =(
Quite a lot. :/
smkd wrote: The fat FPGA is a huge plus, I'm guessing that's your main draw. Playing starfox on the real deal with a flash cart would be wicked.
True, that would be awesome. Still a long way to go though.

Posted: Mon Dec 07, 2009 11:45 am
by MottZilla
Just because the SNES PowerPAK came out doesn't mean there is no interest in other projects anymore. If anything, it should encourage other projects to think about how to differ from it. I'm sure that if GSU (SuperFX) support existed in a similar type of device, it would find lots of interest and buyers.

Posted: Mon Dec 07, 2009 4:58 pm
by naI
:shock: Those SRAM prices are outrageous! FWIW, I'm excited about this cart.

Posted: Mon Dec 07, 2009 6:02 pm
by MottZilla
SRAM makes designing any product easier but more expensive. With DRAM and SDRAM you must have a refresh cycle in your design. I think that PSRAM is supposed to do this for you, so you don't have to worry about it yourself and it acts like SRAM.

Posted: Mon Dec 07, 2009 6:57 pm
by tepples
If your memory is twice as fast as the CPU, I don't see how hard it would be to work DRAM into your design. The Apple II multiplexed CPU, video, and refresh with just discrete logic. On the Super NES, memory would need to be capable of 7.2 MHz operation for this to work in fast ROM mode.

Posted: Mon Dec 07, 2009 7:07 pm
by Super-Hampster
FWIW I'm also interested in this cart. It will support add-on chips. That's really cool.

Posted: Mon Dec 07, 2009 8:02 pm
by MottZilla
I don't really know how hard it is. It's obviously not impossible, since tons of SNES copiers were made using DRAM. And as I heard, the PowerPAK uses SDRAM. But by using some sort of memory that doesn't need refreshing, like SRAM or PSRAM, all you have to worry about is a BIOS, a source for loading data, and memory-mapping functions for your device. So at least you'd have one less thing to worry about.

Posted: Mon Dec 07, 2009 9:52 pm
by smkd
I wouldn't mind waiting 15 seconds to load a 32Mbit ROM if that means anything. At least when I play Mario Kart or some hack of it, it would only be a couple of seconds of waiting, with no need for a donor DSP-1 chip. I still have to import an NTSC deck though..

SRAM cost is clearly going to be killer (hope you got a decent quantity discount) but surely it would make stuff like SFX/SA-1 high speed ROM / RAM access easier to implement. I think I read in the docs that SA-1 runs at 10MHz when it has ROM or RAM to itself.

Posted: Tue Dec 08, 2009 6:25 am
by ikari_01
smkd wrote: I wouldn't mind waiting 15 seconds to load a 32Mbit ROM if that means anything. At least when I play Mario Kart or some hack of it, it would only be a couple of seconds of waiting, with no need for a donor DSP-1 chip. I still have to import an NTSC deck though..

SRAM cost is clearly going to be killer (hope you got a decent quantity discount) but surely it would make stuff like SFX/SA-1 high speed ROM / RAM access easier to implement. I think I read in the docs that SA-1 runs at 10MHz when it has ROM or RAM to itself.
Frankly, I did not pay too much attention to cost. I have neither the guts nor the time for commercial distribution anyway; being in Germany only makes it harder, with a f*ckton of regulations to meet. I do intend to release the source and Gerber files at some point. (GNU GPL or similar)

Actually I need more than 7.2MHz of random access throughput. SuperFX and SA1 (among others) have separate buses for ROM and RAM which can be accessed simultaneously. e.g. while the SuperFX has the RAM for itself, the SNES can still access the ROM freely.

My board has only one memory bus so I need at least three time slices per "master clock" (roughly: AVR, SNES, GSU). With the FPGA maxed out at ~100MHz and an estimated 11 clocks per SDRAM access using autorefresh and autoprecharge, it would be a bit of a stretch. EDIT: This may be inaccurate, I haven't looked into the various SDRAM datasheets for quite some time now.
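
To put rough numbers on that, here is a minimal sketch of the time-slice budget. The 100MHz and 11-clock figures are the estimates from the paragraph above; taking the ~280ns FastROM cycle (the figure that comes up later in the thread) as the window that all three slices must fit into is an assumption of this sketch.

```python
# Rough SDRAM time-slice budget using the estimates from the post.
# Assumption: all three masters (AVR, SNES, GSU) need one access each
# within a single ~280 ns FastROM cycle.

FPGA_HZ = 100e6          # estimated maximum FPGA clock
CLOCKS_PER_ACCESS = 11   # estimated clocks per SDRAM access
SLICES = 3               # AVR, SNES, GSU
CYCLE_NS = 280.0         # approximate FastROM cycle length

access_ns = CLOCKS_PER_ACCESS / FPGA_HZ * 1e9   # duration of one SDRAM access
needed_ns = SLICES * access_ns                   # three back-to-back accesses
fits = needed_ns <= CYCLE_NS

print(f"needed {needed_ns:.0f} ns, available {CYCLE_NS:.0f} ns, fits: {fits}")
```

Under these assumptions the three accesses overrun the cycle, which matches the "bit of a stretch" assessment. Fewer clocks per access or a faster FPGA clock would change the outcome.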

How many clock cycles does the SA-1 need for memory access? "The Documentation" does give some numbers for the SuperFX but not for the SA-1...

However, I'm considering it maybe for next year. ;) I like the idea as SDRAM is dirt cheap and having a single 64M or 128M chip instead of four 16M chips would save a lot of space. I'm kind of an FPGA noob so I need some time to make myself comfortable.

EDIT: I was a bit off, loading a 32Mbit ROM currently takes about 21 seconds.

Posted: Tue Dec 08, 2009 1:26 pm
by Near
The SuperFX is actually the simpler chip. It has a switch that toggles ROM or RAM access between the S-CPU and the GSU. The SuperFX2 side can access the ROM/RAM at 5 clocks per fetch, or 21.47MB/s / 5. But it also caches instruction reads in a 256-byte window, and can fetch from it at the full 21.47MB/s. If you aren't concerned with high accuracy, games should still work okay if you omit the instruction cache; they'll just run a little slow and music will end too quickly in some games.

The SA-1 is the real bastard. It can access ROM, I-RAM and the I/O registers at the full 10.74MB/s. It can access BW-RAM at 5.37MB/s. But both the S-CPU and SA-1 share the memory at the same time. There is a bus contention chip inside the SA-1 that detects this, and it will stall out the SA-1 until the S-CPU is done accessing it, as it obviously can't delay the S-CPU's request.

It's not the two chips accessing the same address at the same time that is a problem, but both accessing the same chip at the same time that will make SA-1 reads take longer.
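
The contention rule described above can be sketched as a tiny predicate. This is only an illustration of the rule as stated in this post, not of the real SA-1 arbiter; the `Chip` enum and the function name are hypothetical.

```python
from enum import Enum
from typing import Optional

class Chip(Enum):
    """Memory chips the two CPUs can target (illustrative names)."""
    ROM = "ROM"
    BWRAM = "BW-RAM"
    IRAM = "I-RAM"

def sa1_stalls(scpu_target: Optional[Chip], sa1_target: Optional[Chip]) -> bool:
    """Return True when the SA-1 has to wait for the S-CPU.

    The SA-1 is stalled only if both CPUs hit the same chip in the same
    cycle; the address within the chip does not matter. None means the
    CPU is not accessing memory this cycle.
    """
    return scpu_target is not None and scpu_target is sa1_target

print(sa1_stalls(Chip.ROM, Chip.ROM))    # same chip: SA-1 waits
print(sa1_stalls(Chip.BWRAM, Chip.ROM))  # different chips: no stall
```

On a board with a single physical memory, the hard part is recovering `scpu_target` and `sa1_target` from the address decode so that this rule can still be applied per emulated chip.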

So to get both of these right with only one SRAM chip, you'll need some crazy fast SRAM and some logic to determine which emulated memory chip each device is accessing at which time.

Posted: Tue Dec 08, 2009 2:19 pm
by ikari_01
Thank you for the details, byuu.
So let me think.. my FPGA logic is currently running @86.02MHz, using 6 clocks for one SRAM access cycle, resulting in roughly 70ns per cycle or 14.336 Mwords/s. (Data width is 16bit on the SRAM side, which might be beneficial.) My SRAM is 55ns so there should be some headroom.

With FastROM, cycle length appears to be ~280ns, meaning I can do four 16-bit transfers "at 14.336MHz" each, with the first one being the S-CPU access to meet the required 120ns with some margin.

The documentation states that with the SA-1, the S-CPU must run at 2.68MHz, so I have ~372 ns between two S-CPU accesses which is enough to serve the S-CPU and run four more SA-1 cycles at full speed. Pausing the "virtual" SA-1 when a new S-CPU cycle is detected should not be too hard.
Having only one bus, I would also have to stall the SA-1 even when the two are not trying to access the same region, where the original SA-1 would allow true parallel access (e.g. S-CPU -> BWRAM, SA-1 -> ROM), resulting in only 7.16MB/s in this case. Damn! Does this happen a lot?
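
The slot arithmetic from this post can be checked with a short sketch. All figures are the ones quoted above; treating the SRAM as a sequence of fixed, equal-length slots is a simplification made for this sketch.

```python
# Slot budget for a single SRAM shared between the S-CPU and an emulated chip.
# Assumption: accesses occupy fixed, equal-length slots of 6 FPGA clocks.

FPGA_HZ = 86.02e6         # current FPGA logic clock
CLOCKS_PER_ACCESS = 6     # FPGA clocks per SRAM access cycle
FASTROM_NS = 280.0        # approximate FastROM cycle length
SA1_SCPU_NS = 372.0       # approximate S-CPU cycle at 2.68 MHz

slot_ns = CLOCKS_PER_ACCESS / FPGA_HZ * 1e9       # duration of one SRAM slot
mwords_per_s = FPGA_HZ / CLOCKS_PER_ACCESS / 1e6  # 16-bit words per second

slots_fastrom = int(FASTROM_NS // slot_ns)   # slots per FastROM cycle
slots_sa1 = int(SA1_SCPU_NS // slot_ns)      # slots per S-CPU cycle with SA-1

print(f"slot: {slot_ns:.2f} ns, {mwords_per_s:.3f} Mwords/s")
print(f"slots per FastROM cycle: {slots_fastrom}")
print(f"slots per 2.68 MHz S-CPU cycle: {slots_sa1} (1 S-CPU + {slots_sa1 - 1} others)")
```

With one slot reserved for the S-CPU, the remaining slots per S-CPU cycle are what the emulated SA-1 has to live on.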

I-RAM (2kBytes iirc?) and registers (and the SFX's cache) will be implemented inside the FPGA as block RAM or maybe even distributed RAM.

But first things first, which will be an emulated DSP-1 in some sort of soft core. :)

Posted: Mon Dec 21, 2009 6:55 pm
by ikari_01
Yay, I managed to offload the SD bulk transfers from AVR to the FPGA.
SPI clock is now 21.5MHz (as opposed to 6.144MHz) and Tales of Phantasia loads in 5 seconds. That's nice!

Posted: Mon Dec 28, 2009 12:12 am
by smkd
ikari_01 wrote: Frankly, I did not pay too much attention to cost. I have neither the guts nor the time for commercial distribution anyway; being in Germany only makes it harder, with a f*ckton of regulations to meet. I do intend to release the source and Gerber files at some point. (GNU GPL or similar)
that's a shame, I'd definitely be one of the guys willing to fork out for something as cool as this. I say that as a gamer/homebrewer though.

You said you want to release the source files at some point, and though I had to Google what a Gerber file is, it looks like I'd be able to produce a cart for my own use with these files. I wouldn't mind having one or two of these for personal use. If you have any idea, what kind of money would I be looking to spend to make it actually happen? Where would I have to order from, and what potential difficulties would I run into? Just curious; it'd be great to have something like this for myself even if you won't be handling the production yourself.