Overriding CPU divider from 16 to 15

krzysiobal
Posts: 1036
Joined: Sun Jun 12, 2011 12:06 pm
Location: Poland
Post by krzysiobal »

As we all know, there exist two revisions of the Dendy CPU (UA6527P):
* an older one (UMC logo on the left) with an internal divider of 16, which gives M2 = 26.601712 / 16 = 1.662607 MHz
* a newer one (UMC logo on top) with an internal divider of 15, which gives M2 = 26.601712 / 15 = 1.773447 MHz

(I did not check for other differences, or how the old one compares with the PAL NES CPU.)
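As a quick check of the two figures above, here is a minimal Python sketch that divides the master clock by each divider. The function name `m2_hz` is made up for illustration; only the clock value and the two dividers come from the post.

```python
MASTER_HZ = 26_601_712  # master clock quoted in the post (26.601712 MHz)


def m2_hz(divider, master_hz=MASTER_HZ):
    """M2 (CPU clock) in Hz for a given internal divider."""
    return master_hz / divider


if __name__ == "__main__":
    for label, divider in (("old revision, /16", 16), ("new revision, /15", 15)):
        print(f"{label}: M2 = {m2_hz(divider) / 1e6:.6f} MHz")
    # The new revision runs 16/15 times faster than the old one.
    print(f"speed ratio: {m2_hz(15) / m2_hz(16):.6f}")
```

The ratio of 16/15 (about 6.7% faster) is what the rest of the thread tries to recover for the old chips.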

The main source of these CPUs today is Chinese sellers on AliExpress, but when ordering you never know which revision you will get. The obvious issue is that many games that rely on CPU cycle counting simply won't work correctly on the old revision.
This is especially true for Codemasters games like Big Nose Freaks Out or Micro Machines.

Because I have plenty of those chips, which I would eventually like to use for building consoles, I was looking for a way to "fix" them.

The first idea would be to feed the CPU a clock of
26.601712 * 16 / 15 = 28.37516 MHz (crystals of that value seem to exist, though they are not very common).
But feeding this frequency from a second crystal, with no feedback between it and the console's main crystal, would eventually cause frequency drift and, as a result, CPU/PPU sync drift.
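To put rough numbers on the drift concern, the sketch below computes the replacement crystal frequency and how many CPU cycles per second the CPU would slip relative to the PPU for a given frequency mismatch between two free-running crystals. The 50 ppm figure used in the demo is purely an assumed example value, not a measurement from the thread, and the function names are hypothetical.

```python
from fractions import Fraction

MASTER_HZ = 26_601_712  # console master clock from the post


def replacement_crystal_hz():
    """Crystal frequency that makes a /16 CPU run at the /15 rate."""
    return Fraction(MASTER_HZ) * 16 / 15


def drift_cycles_per_second(mismatch_ppm):
    """CPU cycles gained or lost per second against the PPU clock,
    for a relative frequency mismatch given in parts per million."""
    target_m2_hz = MASTER_HZ / 15
    return target_m2_hz * mismatch_ppm * 1e-6


if __name__ == "__main__":
    print(f"replacement crystal: {float(replacement_crystal_hz()) / 1e6:.5f} MHz")
    # 50 ppm is an assumed, illustrative mismatch between the two crystals.
    print(f"slip at 50 ppm: {drift_cycles_per_second(50):.1f} CPU cycles/s")
```

Even a small mismatch accumulates without bound, which is why a scheme that re-synchronises to the original clock is attractive.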

The second option would be to use an FPGA with a PLL to obtain a phase-locked 16/15 frequency, but I doubt this would be affordable.

So my idea is to use a cheap CPLD, clock it at a rate of at least 2x 26.601712 MHz (100 MHz would be enough) and make the CPLD generate a "new" clock signal for the CPU:
- after the first edge of the original clock, the CPLD generates a signal "a little faster" than 26.601712 MHz
- there are 16 clocks of the new signal per 15 clocks of the original one
- after the 16th clock of the output clock, the CPLD waits until 15 clocks of the original signal have passed, and the whole generation starts again
- that way, we maintain the 16/15 ratio AND regain phase sync every 16 clocks

Code:

original _|_---___---___---___---___---___---___---___---___---___---___---___---___---___---___---_|_
          | 00    01    02    03    04    05    06    07    08    09    10    11    12    13    14  |
          |                                                                                         |
new      _|_---__---__---__---__---__---__---__---__---__---__---__---__---__---__---__------------_|_
          | 00   01   02   03   04   05   06   07   08   09   10   11   12   13   14   15           |
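The timing in the diagram can be sketched numerically. The Python snippet below lists the rising-edge times of both clocks inside one 15-clock window and the idle time left at the end of the burst. It is an idealised model (no CPLD sampling latency), and `window_edges` is a name invented for this illustration.

```python
F_ORIG_HZ = 26_601_712  # original clock from the post


def window_edges(f_new_hz, f_orig_hz=F_ORIG_HZ, n_in=15, n_out=16):
    """Rising-edge times (ns) of both clocks within one window, plus the
    idle time (ns) between the end of the burst and the next window."""
    t_orig = 1e9 / f_orig_hz
    t_new = 1e9 / f_new_hz
    orig_edges = [i * t_orig for i in range(n_in)]
    new_edges = [i * t_new for i in range(n_out)]
    idle_ns = n_in * t_orig - n_out * t_new
    return orig_edges, new_edges, idle_ns


if __name__ == "__main__":
    # 100 MHz / 3 is the burst frequency chosen later in the post.
    orig, new, idle = window_edges(100e6 / 3)
    print(f"{len(orig)} original clocks, {len(new)} new clocks per window")
    print(f"window length: {15e9 / F_ORIG_HZ:.2f} ns, idle at the end: {idle:.2f} ns")
```

A positive idle time means the 16 fast clocks fit inside the window, so the output can simply be held until the next original edge arrives.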
Now the question is, what frequency should the "new" signal have? The CPLD is clocked at 100 MHz and, without a PLL, we can only use integer dividers, so:
100 MHz / 2 = 50 MHz -> but this might be too fast for the CPU
100 MHz / 3 = 33.3 MHz -> this is the best choice
100 MHz / 4 = 25 MHz -> this is slower than 26.601712 MHz, so 16 of its clocks cannot fit into 15 original ones
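The selection above can be written as a small check: a divider is usable only if the divided clock is at least 16/15 of the original frequency, otherwise 16 clocks do not fit into the window. This is a sketch with hypothetical helper names; whether the CPU tolerates a given burst frequency (the 50 MHz concern) is a separate limit that the code does not model.

```python
F_ORIG_HZ = 26_601_712   # original clock from the post
SYS_CLK_HZ = 100_000_000  # CPLD clock from the post


def min_burst_hz(f_orig_hz=F_ORIG_HZ, n_in=15, n_out=16):
    """Lowest burst frequency that still fits n_out clocks into n_in originals."""
    return f_orig_hz * n_out / n_in


def usable_dividers(sys_clk_hz=SYS_CLK_HZ, max_divider=6):
    """Integer dividers of the system clock that are fast enough."""
    return [d for d in range(1, max_divider + 1)
            if sys_clk_hz / d >= min_burst_hz()]


if __name__ == "__main__":
    print(f"minimum burst frequency: {min_burst_hz() / 1e6:.5f} MHz")
    for d in range(2, 5):
        verdict = "fits" if d in usable_dividers() else "too slow"
        print(f"100 MHz / {d} = {SYS_CLK_HZ / d / 1e6:.2f} MHz -> {verdict}")
```

Among the dividers that fit, the largest one (3) gives the slowest, and therefore gentlest, burst clock for the CPU.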

After wiring up my EPM3064 test board, I can proudly confirm that it works!

This code occupies 20 of the 64 CPLD macrocells. It could probably also be done in a PAL16V8.
I am thinking of making some kind of adapter that could be fitted under a DIP-40 socket.

Code:

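library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity proteza is
	port (
		clk100MHz    : in std_logic;   -- free-running 100 MHz CPLD clock
		clk26mhz     : in std_logic;   -- original 26.601712 MHz console clock
		clkout       : out std_logic;  -- clock fed to the CPU
		fix_enabled  : in std_logic    -- '1' = patched clock, '0' = pass clk26mhz through
	);
end proteza;

architecture Behavioral of proteza is
	-- two most recent samples of clk26mhz, used for edge detection
	signal last_clk26mhz : std_logic_vector(1 downto 0);
	-- S1: wait for the first rising edge of a window, S2: burst in progress
	type state_t is (S1, S2);
	signal state : state_t;
	
	signal counter26mhz : integer range 0 to 15;   -- original clocks seen in this window
	signal counter100mhz : integer range 0 to 47;  -- 48 ticks = 16 output clocks of 3 ticks each
	signal clk33mhz : std_logic;
	signal clk33mhzreset : std_logic;
	
begin
	rtl : entity work.divide_by_3(rtl)
		port map(cout => clk33mhz, clk => clk100Mhz, reset =>  clk33mhzreset);
		
	clkout <= clk33mhz when fix_enabled = '1' else  clk26mhz;

	process (clk100Mhz) is begin
		if rising_edge(clk100Mhz) then 
			
			case state is
			when S1 =>
				-- rising edge of the original clock: start a new burst
				if last_clk26mhz = "01" then
					state <= S2;
					counter100mhz <= 0;
					counter26mhz <= 0;
					clk33mhzreset <= '0';
				end if;
			when S2 =>
				-- let the divider run for 48 ticks, then hold it in reset
				if counter100mhz /= 47 then
					counter100mhz <= counter100mhz + 1;
				else
					clk33mhzreset <= '1';
				end if;
				
				-- count original clocks; after the falling edge of the 15th one,
				-- go back and wait for the next rising edge
				if last_clk26mhz = "01" then
					counter26mhz <= counter26mhz + 1;
				elsif last_clk26mhz = "10" and counter26mhz = 14 then
					state <= S1;
				end if;
			when others =>
			end case;
		
			-- shift in the current sample of the original clock
			last_clk26mhz <= last_clk26mhz(0) & clk26mhz;
		end if;
	end process;

	
end;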

library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

-- Divide-by-3 clock divider with a 50% duty cycle output.
-- Two mod-3 counters run on opposite clock edges; combining them gives
-- an output that is high for 1.5 and low for 1.5 input clock periods.
entity divide_by_3 is
    port (
        cout   :out std_logic; -- Output clock
        clk    :in  std_logic; -- Input clock
        reset  :in  std_logic  -- Asynchronous reset, active high (cout is held high)
    );
end entity;

architecture rtl of divide_by_3 is
    signal pos_cnt :std_logic_vector (1 downto 0);
    signal neg_cnt :std_logic_vector (1 downto 0);
begin
    -- counter clocked on the rising edge: 0, 1, 2, 0, ...
    process (clk, reset) begin
        if (reset = '1') then
            pos_cnt <= (others=>'0');
        elsif (rising_edge(clk)) then
            if (pos_cnt /= 2) then
                pos_cnt <= pos_cnt + 1;
            else
                pos_cnt <= "00";
            end if;
        end if;
    end process;
    
    -- counter clocked on the falling edge: 0, 1, 2, 0, ...
    process (clk, reset) begin
        if (reset = '1') then
            neg_cnt <= (others=>'0');
        elsif (falling_edge(clk)) then
            if (neg_cnt /= 2) then
                neg_cnt <= neg_cnt + 1;
            else
                neg_cnt <= "00";
            end if;
        end if;
    end process;
    
    -- output is low whenever either counter is at 2
    cout <= '1' when ((pos_cnt /= 2) and (neg_cnt /= 2)) else
            '0';
end architecture;
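For readers without a VHDL simulator, here is a rough behavioural model of the two entities above in Python. It steps through rising and falling edges of the 100 MHz clock, samples an ideal 26.601712 MHz square wave, and counts the rising edges of the divider output in each window. It is a simplified sketch, not a substitute for simulating the real design: it ignores propagation delays and metastability, and it assumes the divider starts out held in reset.

```python
F_IN_HZ = 26_601_712   # clk26mhz frequency from the post
TICK_NS = 10.0         # one clk100MHz period


def clk26(t_ns):
    """Ideal 50% duty square wave standing in for clk26mhz."""
    period = 1e9 / F_IN_HZ
    return 1 if (t_ns % period) < period / 2 else 0


def div3_out(pos, neg):
    """Combinational output of divide_by_3."""
    return 1 if pos != 2 and neg != 2 else 0


def simulate(n_ticks):
    """Return the number of output rising edges in each complete window."""
    state, last = "S1", (0, 0)       # last = (older sample, newer sample)
    c26 = c100 = 0
    reset, pos, neg = 1, 0, 0        # assumption: divider starts held in reset
    prev_out, edges, per_window = 1, None, []
    for k in range(n_ticks):
        # Rising edge of clk100MHz: all registers see the pre-edge values.
        n_state, n_c26, n_c100, n_reset = state, c26, c100, reset
        if state == "S1":
            if last == (0, 1):                   # start of a new window
                n_state, n_c26, n_c100, n_reset = "S2", 0, 0, 0
                if edges is not None:
                    per_window.append(edges)
                edges = 0
        else:
            if c100 != 47:
                n_c100 = c100 + 1
            else:
                n_reset = 1
            if last == (0, 1):
                n_c26 = c26 + 1
            elif last == (1, 0) and c26 == 14:
                n_state = "S1"
        n_pos = 0 if (reset or pos == 2) else pos + 1
        last = (last[1], clk26(k * TICK_NS))
        state, c26, c100, reset, pos = n_state, n_c26, n_c100, n_reset, n_pos
        if reset:                                # asynchronous reset
            pos = neg = 0
        out = div3_out(pos, neg)
        if out and not prev_out and edges is not None:
            edges += 1
        prev_out = out
        # Falling edge of clk100MHz: only neg_cnt changes.
        neg = 0 if (reset or neg == 2) else neg + 1
        out = div3_out(pos, neg)
        if out and not prev_out and edges is not None:
            edges += 1
        prev_out = out
    return per_window


if __name__ == "__main__":
    print(simulate(2000))
```

In this model every window of 15 original clocks contains 16 output clocks, and the last rising edge coincides with the divider being put back into reset, so the output stays high until the next window starts.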

I wired a button that enables/disables this patch on the fly. You can see the difference not only in the video timing, which becomes correct, but also in the sound pitch:

https://youtu.be/P1cv4_y3PgA
My website: http://krzysiobal.com | My NES/FC flashcart: http://krzysiocart.com
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Overriding CPU divider from 16 to 15

Post by lidnariq »

The programmable logic cells in a PIC or AVR might be up to this too...

At the very least, there's a new line ("Q43") of PICs with three "Numerically-Controlled Oscillators" which should be able to generate both the needed PAL×6 and PAL×6.4 from their reference clock. There are probably other more-clever options too.
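For reference, the two multiples mentioned here can be checked against the PAL colour subcarrier frequency of 4.43361875 MHz (that value is general PAL knowledge, not stated in the thread). A small Python sketch:

```python
PAL_SUBCARRIER_HZ = 4_433_618.75  # PAL colour subcarrier (not quoted in the thread)


def pal_multiple_hz(factor):
    """Frequency of a given multiple of the PAL subcarrier, in Hz."""
    return PAL_SUBCARRIER_HZ * factor


if __name__ == "__main__":
    print(f"PAL x 6   = {pal_multiple_hz(6) / 1e6:.6f} MHz")    # console master clock
    print(f"PAL x 6.4 = {pal_multiple_hz(6.4) / 1e6:.6f} MHz")  # clock for a /16 CPU
    print(f"6.4 / 6   = {6.4 / 6:.6f} (16 / 15 = {16 / 15:.6f})")
```

PAL×6 is the console's master clock and PAL×6.4 is the same 28.37516 MHz figure computed in the first post, since 6.4 / 6 = 16 / 15.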
NewRisingSun
Posts: 1510
Joined: Thu May 19, 2005 11:30 am

Re: Overriding CPU divider from 16 to 15

Post by NewRisingSun »

Wait, does that mean that there are actually two "Dendy" timings that need to be considered when emulating games?
krzysiobal
Posts: 1036
Joined: Sun Jun 12, 2011 12:06 pm
Location: Poland

Re: Overriding CPU divider from 16 to 15

Post by krzysiobal »

NewRisingSun wrote: Mon Nov 21, 2022 8:21 am Wait, does that mean that there are actually two "Dendy" timings that need to be considered when emulating games?
I have never come across those "old revision" chips in any famiclone, so technically not.
Individualised
Posts: 309
Joined: Mon Sep 05, 2022 6:46 am

Re: Overriding CPU divider from 16 to 15

Post by Individualised »

krzysiobal wrote: Mon Nov 21, 2022 8:50 am
NewRisingSun wrote: Mon Nov 21, 2022 8:21 am Wait, does that mean that there are actually two "Dendy" timings that need to be considered when emulating games?
I have never come across those "old revision" chips in any famiclone, so technically not.
Just because they're not common doesn't mean they're not worth emulating. If anything, that means it's worth emulating more for preservation purposes.

I've said it before but the way emulators currently handle regions/hardware variants is flawed anyway. Ideally, an emulator would emulate multiple different CPU and PPU revisions and their different quirks (Ricoh or otherwise) and allow you to pick and choose your hardware configuration, rather than just a simple "Region" drop-down plus maybe some additional options to emulate certain quirks such as swapped duty cycles or no periodic noise. Of course though, this would be a massive undertaking for emulator devs.
NewRisingSun
Posts: 1510
Joined: Thu May 19, 2005 11:30 am

Re: Overriding CPU divider from 16 to 15

Post by NewRisingSun »

The distinction between swapped and non-swapped duty cycles is not clearly-delineated between particular UMC CPU revisions -- on some revisions, different runs of the same revision can differ between the swapped and non-swapped behavior. I don't think there even exists a complete table as to which chip has which duty cycle behavior.

Furthermore, selecting chip revisions instead of chip attributes would be a massive usability fail, as even the advanced user may not know which CPU/PPU model has which periodic noise/duty cycle/$2004 readback behavior, but may know what periodic noise or what a duty cycle is.

There are further distinctions between CPUs with regards to DPCM bit order, and the presence of a working decimal mode. For most of these attributes, it is not perfectly known which chip exhibits which behavior. Even if one were to agree with you that selecting chips rather than chip attributes is better, one would have to delay such an implementation until all details of every chip are fully known.
Individualised
Posts: 309
Joined: Mon Sep 05, 2022 6:46 am

Re: Overriding CPU divider from 16 to 15

Post by Individualised »

NewRisingSun wrote: Mon Nov 21, 2022 3:02 pm The distinction between swapped and non-swapped duty cycles is not clearly-delineated between particular UMC CPU revisions -- on some revisions, different runs of the same revision can differ between the swapped and non-swapped behavior. I don't think there even exists a complete table as to which chip has which duty cycle behavior.

Furthermore, selecting chip revisions instead of chip attributes would be a massive usability fail, as even the advanced user may not know which CPU/PPU model has which periodic noise/duty cycle/$2004 readback behavior, but may know what periodic noise or what a duty cycle is.

There are further distinctions between CPUs with regards to DPCM bit order, and the presence of a working decimal mode. For most of these attributes, it is not perfectly known which chip exhibits which behavior. Even if one were to agree with you that selecting chips rather than chip attributes is better, one would have to delay such an implementation until all details of every chip are fully known.
Very good points. Apologies for my ignorance.
org
Posts: 155
Joined: Tue Aug 07, 2012 12:27 pm

Re: Overriding CPU divider from 16 to 15

Post by org »

I've said it before but the way emulators currently handle regions/hardware variants is flawed anyway. Ideally, an emulator would emulate multiple different CPU and PPU revisions and their different quirks
Absolutely agree with this point of view, I finally found a like-minded person :)

But at the moment it can't be done, because a) ordinary users are more used to the phenomenological concept of "region", and b) as already said, a shift to a feature-based approach can't be made now, because not all chips have been properly studied. So for now there would be nothing but waves of criticism :) Once all chips have been studied (I hope), users will also have "matured". In the end all the features can be combined into a single "model" of the game console, and from the GUI side it won't be any different to choose NTSC/PAL/Dendy or NES USA/NES European/Famicom/Famiclone-of-your-choice (Dendy, Pegasus, etc.).
Pokun
Posts: 2675
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: Overriding CPU divider from 16 to 15

Post by Pokun »

I agree as well. I would probably give the user the possibility to mix and match revisions or features manually, but also provide setting "packages" that make it easier for mortals to emulate a specific region/model. Mesen and Nintendulator partly do this with their options to swap duty cycles, disable $2004 writability and other such advanced options.
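One way to picture the "packages plus manual overrides" idea is a small configuration record with named presets. The Python sketch below is entirely hypothetical: `ConsoleConfig`, `PRESETS` and the `swapped_duty_cycles` flag are illustrative names, not the settings of Mesen, Nintendulator or any other emulator. The clock values and dividers are the ones discussed in this thread, plus the PAL NES CPU divider of 16.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConsoleConfig:
    """Hypothetical per-feature hardware description for an emulator."""
    master_clock_hz: int
    cpu_divider: int
    swapped_duty_cycles: bool = False  # example of an independent quirk flag

    @property
    def m2_hz(self):
        return self.master_clock_hz / self.cpu_divider


# Presets ("packages") for users who just want to pick a machine.
PRESETS = {
    "NTSC": ConsoleConfig(master_clock_hz=21_477_272, cpu_divider=12),
    "PAL": ConsoleConfig(master_clock_hz=26_601_712, cpu_divider=16),
    "Dendy": ConsoleConfig(master_clock_hz=26_601_712, cpu_divider=15),
}

# Manual override: a Dendy with the old /16 UA6527P revision.
old_dendy = replace(PRESETS["Dendy"], cpu_divider=16)

if __name__ == "__main__":
    for name, cfg in {**PRESETS, "Dendy (old CPU)": old_dendy}.items():
        print(f"{name}: M2 = {cfg.m2_hz / 1e6:.6f} MHz")
```

The preset stays a plain value, so an advanced user can start from a package and change a single attribute without the emulator needing a separate "region" concept.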
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Overriding CPU divider from 16 to 15

Post by rainwarrior »

I think for most emulators it is more about not yet having enough documented information for that kind of detailed behaviour selection. There's been a ton of stuff unearthed in the past few years that we just didn't have any centralized information about before.

You see this kind of detailed platform configuration in emulators like PCem or OpenMSX where they might let you select from hundreds of known machines, where each might have an associated BIOS ROM, RAM size, peripheral connections, etc. and it's mostly controlled by a small number of discrete data points from a database. There's probably not enough research/documentation/time to implement finer details about specific machines, for the most part, but at least some of the behaviours are being captured, and the setting holds a place for a more accurate machine-specific implementation if that ever comes to light.
Pokun
Posts: 2675
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: Overriding CPU divider from 16 to 15

Post by Pokun »

Yes, the OpenMSX approach is basically what I'm talking about. You can choose from existing MSX computers, or you can basically create your own MSX machine by mixing and matching hardware features. As you say, it might not make much sense on the NES at the moment (and the NES definition is of course not as loose as the MSX definition), but it could be a good goal as the different variations become better documented.

The emulator also needs to cover the features in the help documentation. It's quite frustrating when you have an emulator with tons of different parameters to adjust but no or very poor explanation of what they are all about. It's seldom a problem with a NES emulator for us who are familiar with almost every little hardware quirk, but for anyone else using it it's essential.
Eugene.S
Posts: 317
Joined: Sat Apr 18, 2009 4:36 am
Location: UTC+3

Re: Overriding CPU divider from 16 to 15

Post by Eugene.S »

My position is fundamentally different.

Famiclones were based on many different chips from different manufacturers. If you try to count all revisions, there were several dozen of these chips (separate CPU/PPU, single-chip CPU/PPU with external RAM, full NOACs). And almost every one of them had its own specific bugs/flaws.
In some the duty cycles were swapped; in some there were obvious sound troubles (problems with the DMC, clicks on the triangle channel); in some the emphasis bits coloured/darkened the picture more than necessary (UM6561). Some early UM6561 models (the UM6561 "AF" revision) even had serious compatibility issues (the game Prince of Persia is broken on it).

But despite all the flaws of specific chips, they were all united by a common "Dendy-like" timing. And my main goal was to separate the main point from the secondary ones: emulate the timing in order to run NTSC games with high compatibility, but do not emulate bugs that interfere with normal operation and that differ from model to model (or make them optional, but not the default).

I have seen that UA6527P with the /16 divider, but what real reason is there to emulate it? It just ruins compatibility.
This is an early revision that was abandoned.
Also, I have seen broken/buggy UA6538 specimens (from AliExpress) that raise NMI on line 286 instead of the correct 291 (tepples' 240pee test ROM shows it).
This chip ruins the timing completely.

I strongly believe that there is no need to add obvious bugs to emulators.

In conclusion
Unfortunately, there are no absolutely glitch-free pirate chips.
If Nintendo/Ricoh had made a PAL NES with more sensible timing (Dendy-like instead of the official 2A07/2C07 timing), it would have been a much better solution. And we wouldn't have this bunch of headaches with pirate chip emulation.
Pokun
Posts: 2675
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: Overriding CPU divider from 16 to 15

Post by Pokun »

That sounds like a philosophical question about what the goal of your emulator is. For many emulators you want as high compatibility as possible, or you just want games to be reasonably playable. For emulators like Mame, documenting the hardware, including all its flaws, is an important part of the project, and if a bug comes with compatibility problems on real hardware, that is ideally what you want in the emulation as well.
krzysiobal
Posts: 1036
Joined: Sun Jun 12, 2011 12:06 pm
Location: Poland

Re: Overriding CPU divider from 16 to 15

Post by krzysiobal »

I was just wondering whether the above technique could be used to adapt a Dendy CPU (with either the 16 or the 15 built-in divider) as a replacement for an NTSC CPU, that is, one that should output M2 = 1.7897727 MHz (= 21.477272 / 12, PPU/CPU div = 3). The PPU in that console would be NTSC, which forces the CPU to be clocked from the 21.477272 MHz crystal.

To keep the PPU/CPU div at 3, for every 15 or 16 main clock cycles (depending on the CPU version), 3 (or 4) clock cycles should be "eaten" by the CPLD. The speed of the sound in games run in such a configuration should be the same as on a genuine NTSC console (60 frames/sec), but I wonder about the pitch of the sound.
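The arithmetic behind the 3 and 4 can be laid out as follows. This Python sketch only computes the target M2 and the average clock rate a /15 or /16 CPU would need to see in order to produce it; how the CPLD would actually shape the clock is not specified in the post, so nothing here should be read as the intended implementation. The function names are invented for illustration.

```python
NTSC_MASTER_HZ = 21_477_272  # NTSC master clock from the post
TARGET_DIVIDER = 12          # genuine NTSC CPU divider (PPU/CPU ratio of 3)


def target_m2_hz():
    """M2 of a genuine NTSC console."""
    return NTSC_MASTER_HZ / TARGET_DIVIDER


def required_input_hz(cpu_divider):
    """Average input clock a CPU with the given internal divider needs
    so that its M2 equals the NTSC target."""
    return target_m2_hz() * cpu_divider


if __name__ == "__main__":
    print(f"target M2: {target_m2_hz() / 1e6:.7f} MHz")
    for divider in (15, 16):
        print(f"/{divider} CPU: {required_input_hz(divider) / 1e6:.6f} MHz average input, "
              f"{divider - TARGET_DIVIDER} clocks of difference per M2 cycle")
```

In both cases the difference between the CPU's divider and 12 is what has to be made up in every M2 cycle, which is where the figures 3 and 4 come from.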

I am going to build a console similar to this one: viewtopic.php?t=13851
but instead of just a Dendy/NTSC switch, I want it to work in all 3 modes (Dendy + PAL + NTSC). So instead of using 3 CPUs and 3 PPUs, I want just 3 PPUs and 1 CPU, with a selectable CPU clock divider.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Overriding CPU divider from 16 to 15

Post by lidnariq »

The 2A07 has different tuning tables for DPCM and noise (as well as the frame IRQ, but who uses that?), so music that uses things like "Sunsoft bass" will be out of tune if played on a 2A03/Dendy and vice versa.

But I don't see any reason you couldn't use a differentially-clocked 2A03/6540 to run the other's software.