Then the obvious solution is to build real hardware that runs them.
Expand ARAM to 128K
Option to put echo buffer in top half ($10000-$1FFFF).
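
As a rough illustration of the echo buffer option, the sketch below computes where an echo sample would land in a 128K ARAM. It assumes the stock behavior that the ESA register selects the starting 256-byte page and that the buffer wraps within a 64K half. The `top_half` flag is hypothetical and stands in for whatever control bit such hardware would expose.

```python
ARAM_SIZE = 0x20000  # 128 KiB, as proposed above


def echo_buffer_address(esa: int, offset: int, top_half: bool) -> int:
    """Return the ARAM address of an echo buffer byte.

    esa      -- value of the ESA register (starting page, assumed stock behavior)
    offset   -- byte offset into the echo buffer
    top_half -- hypothetical option bit: place the buffer in $10000-$1FFFF
    """
    base = (esa & 0xFF) << 8
    # Assumption: the buffer wraps inside its own 64K half rather than
    # spilling from one half into the other.
    addr16 = (base + offset) & 0xFFFF
    return addr16 | (0x10000 if top_half else 0)


if __name__ == "__main__":
    print(hex(echo_buffer_address(0x80, 0, top_half=True)))
```

With the flag clear, the address stays in the bottom 64K, so existing sound drivers would be unaffected.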
No incomplete results for CPU multiplier and divider
Make the multiplication started by a write to $4203 and the division started by a write to $4206 finish within 4 cycles, before the CPU can get around to reading the result, or at least stall the CPU until the result is ready. The DMA function already stalls the CPU by internally pulling RDY low.
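
The stall alternative can be pictured with a small timing model. This is only a sketch: `stall_cycles` and its parameters are hypothetical names, and the latencies passed in are illustrative rather than measured values.

```python
def stall_cycles(start_cycle: int, read_cycle: int, latency: int) -> int:
    """Cycles the CPU would be held (RDY low) before a result read completes.

    start_cycle -- cycle on which the write to $4203 or $4206 lands
    read_cycle  -- cycle on which the CPU tries to read the result
    latency     -- cycles the multiplier or divider needs (illustrative)
    """
    ready_cycle = start_cycle + latency
    # No stall if the result is already complete when the read happens.
    return max(0, ready_cycle - read_cycle)


if __name__ == "__main__":
    # A 4-cycle unit never stalls a read that comes 4 or more cycles later.
    print(stall_cycles(start_cycle=0, read_cycle=4, latency=4))
    # A slower unit stalls an early read for the remaining cycles.
    print(stall_cycles(start_cycle=0, read_cycle=3, latency=8))
```

Either way the CPU never observes a partial result: the fast unit finishes before any read can arrive, and the stalling unit delays the read until the result is complete.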
Genesis-style VRAM access slots
Writes to VRAM outside vertical or forced blanking go to a FIFO. While the FIFO is full, the PPU tells the CPU to stall. The PPU commits a write to VRAM when any of the following is true:
- Background fetch from a background layer that is not enabled in TM, TS, TMW, or TSW
- Background fetch from a (tile number, Y) combination identical to the previous tile on the same layer (will help with large blank areas in HUD)
- The 34th background fetch
- Half of the sprite pattern fetches (at the cost of a momentary increase in sprite dropout)
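
The FIFO and the slot rules above can be sketched as a small behavioral model. Everything here is an assumption made for illustration: the FIFO depth of 4, the class and function names, counting the 34th fetch from 1, and the choice of odd-numbered sprite fetches as the donated half.

```python
from collections import deque


class VramWriteFifo:
    """Behavioral sketch of the proposed write FIFO (depth is hypothetical)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.pending = deque()
        self.vram = {}

    def cpu_write(self, addr: int, value: int, in_blanking: bool) -> bool:
        """Return True if the write was accepted, False if the CPU must stall."""
        if in_blanking:
            self.vram[addr] = value  # direct access, as on stock hardware
            return True
        if len(self.pending) >= self.capacity:
            return False  # FIFO full: the PPU tells the CPU to stall
        self.pending.append((addr, value))
        return True

    def free_slot(self) -> None:
        """Commit one queued write when the PPU has a spare access slot."""
        if self.pending:
            addr, value = self.pending.popleft()
            self.vram[addr] = value


def bg_fetch_is_free(layer_enabled, tile_key, prev_tile_key, fetch_index):
    """True if this background fetch slot can carry a CPU write instead.

    tile_key and prev_tile_key are (tile number, Y) pairs for the same layer;
    fetch_index counts fetches from 1 (an assumption).
    """
    if not layer_enabled:
        return True  # layer not enabled in TM, TS, TMW, or TSW
    if tile_key == prev_tile_key:
        return True  # same data as the previous tile, no fetch needed
    return fetch_index == 34


def sprite_fetch_is_free(fetch_index):
    """Assumption: odd-numbered sprite pattern fetches are given up."""
    return fetch_index % 2 == 1
```

A caller would check `bg_fetch_is_free` or `sprite_fetch_is_free` for each fetch slot on a scanline and call `free_slot()` whenever one returns True.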