* an older one (UMC logo on the left) with an internal divider of 16, which gives M2 = 26.601712 / 16 = 1.662607 MHz
* a newer one (UMC logo on top) with an internal divider of 15, which gives M2 = 26.601712 / 15 = 1.773447 MHz
(I did not check for other differences, or how the old one compares with the PAL NES CPU.)
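As a quick sanity check of the two figures above, here is a small Python sketch (not part of the original design; the function and variable names are my own) that recomputes M2 for both dividers from the PAL master clock:

```python
MASTER_CLOCK_HZ = 26_601_712  # PAL master crystal, 26.601712 MHz


def m2_frequency_hz(divider: int) -> float:
    """CPU (M2) frequency for a given internal clock divider."""
    return MASTER_CLOCK_HZ / divider


old_rev = m2_frequency_hz(16)  # older chip, UMC logo on the left
new_rev = m2_frequency_hz(15)  # newer chip, UMC logo on top

print(f"divider 16: {old_rev / 1e6:.6f} MHz")
print(f"divider 15: {new_rev / 1e6:.6f} MHz")
print(f"old revision runs at {old_rev / new_rev:.4f} of the new one's speed")
```

The ratio is simply 15/16, so the old revision is 6.25% slower, which is enough to break cycle-counted code.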
The main source of these CPUs today is Chinese sellers on AliExpress, but when ordering you never know which revision you will get. The obvious issue is that many games that rely on CPU cycle counting simply won't work correctly on the old revision.
This is especially true for Codemasters games like Big Nose Freaks Out or Micro Machines.
Because I have plenty of those chips that I would eventually like to use for building consoles, I was looking for a way to "fix" those CPUs.
The first idea would be to feed the CPU with a clock of
26.601712 * 16 / 15 = 28.37516 MHz (crystals of that value seem to exist, though they are not very common).
But feeding this frequency from a second crystal would ultimately cause frequency drift, because there is no feedback between it and the console's main crystal, and the result would be CPU/PPU sync drift.
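To put rough numbers on this, the sketch below recomputes the required crystal frequency and estimates how quickly a free-running second crystal would slip against the PPU. The 50 ppm mismatch is purely an assumed figure for illustration, not something measured on real hardware:

```python
MASTER_CLOCK_HZ = 26_601_712  # console's main crystal

# Clock an old (divide-by-16) chip would need so that M2 matches a divide-by-15 chip.
required_hz = MASTER_CLOCK_HZ * 16 / 15

# Hypothetical frequency mismatch between two independent crystals (assumption).
MISMATCH_PPM = 50

m2_hz = required_hz / 16
slip_cycles_per_second = m2_hz * MISMATCH_PPM / 1e6

print(f"required crystal: {required_hz / 1e6:.5f} MHz")
print(f"CPU cycles gained or lost per second at {MISMATCH_PPM} ppm: "
      f"{slip_cycles_per_second:.1f}")
```

Even a small mismatch adds up to dozens of CPU cycles per second under this assumption, so code that counts cycles relative to the PPU would not stay aligned for long.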
The second option would be to use an FPGA with a PLL to obtain a phase-locked 16/15 frequency, but I doubt this would be affordable.
So my idea is to use a cheap CPLD, clock it at a rate of at least 2x 26.601712 MHz (100 MHz would be enough) and make the CPLD generate a "new" clock signal for the CPU:
- after the first edge of the original clock, the CPLD generates a signal "a little faster" than 26.601712 MHz
- there are 16 clocks of the new signal per 15 clocks of the original one
- after the 16th clock of the output clock, the CPLD waits until 15 clocks of the original signal have passed, and the whole generation starts again
- that way, we maintain the 16/15 ratio AND phase sync every 16 clocks
Timing diagram:
original _|_---___---___---___---___---___---___---___---___---___---___---___---___---___---___---_|_
          | 00    01    02    03    04    05    06    07    08    09    10    11    12    13    14  |
          |                                                                                         |
new      _|_---__---__---__---__---__---__---__---__---__---__---__---__---__---__---__------------_|_
          | 00   01   02   03   04   05   06   07   08   09   10   11   12   13   14   15           |
100 MHz / 2 = 50 MHz -> this might be too fast for the CPU
100 MHz / 3 = 33.3 MHz -> this is the best choice
100 MHz / 4 = 25 MHz -> slower than 26.601712 MHz, so it cannot fit 16 clocks into 15 original ones
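The choice of divider can be checked with a small timing budget: 16 output clocks must fit inside the time taken by 15 original clocks. The sketch below assumes the CPLD clock is exactly 100 MHz; `burst_budget` is a helper name of my own, not something from the design:

```python
from fractions import Fraction

MASTER_CLOCK_HZ = 26_601_712
CPLD_CLOCK_HZ = 100_000_000  # assumed exactly 100 MHz


def burst_budget(divider: int):
    """Return (fits, idle_ns) for 16 output clocks inside 15 master clocks."""
    window = Fraction(15, MASTER_CLOCK_HZ)         # time of 15 original clocks, s
    burst = Fraction(16 * divider, CPLD_CLOCK_HZ)  # time of 16 output clocks, s
    idle_ns = float((window - burst) * 10**9)      # how long the output is held
    return burst <= window, idle_ns


for divider in (2, 3, 4):
    fits, idle_ns = burst_budget(divider)
    verdict = "fits" if fits else "does not fit"
    print(f"100 MHz / {divider}: {verdict}, idle time {idle_ns:.2f} ns")
```

With a divider of 3 the burst takes 480 ns out of a window of roughly 563.9 ns, so the output is held for about 84 ns before the next burst. Whatever divider is used, the average rate is still 16/15 of the master clock, because each burst is restarted by an edge of the original clock.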
After wiring up my EPM3064 test board, I can proudly confirm it works!
This code occupies 20 of the 64 CPLD macrocells. It could probably also be done in something like a PAL16V8.
I am thinking of making some kind of adapter that could sit under a DIP40 socket.
VHDL source:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity proteza is
    port (
        clk100MHz   : in  std_logic; -- fast reference clock for the CPLD
        clk26mhz    : in  std_logic; -- original 26.601712 MHz master clock
        clkout      : out std_logic; -- clock delivered to the CPU
        fix_enabled : in  std_logic  -- '1' = patched clock, '0' = original clock passed through
    );
end proteza;

architecture Behavioral of proteza is
    -- Last two samples of clk26mhz: "01" = rising edge, "10" = falling edge
    signal last_clk26mhz : std_logic_vector(1 downto 0);
    type state_t is (S1, S2);
    signal state : state_t;
    signal counter26mhz : integer range 0 to 15;
    signal counter100mhz : integer range 0 to 47;
    signal clk33mhz : std_logic;
    signal clk33mhzreset : std_logic;
begin
    -- 100 MHz / 3 = 33.3 MHz burst clock; while reset is '1' its output is held high
    rtl : entity work.divide_by_3(rtl)
        port map(cout => clk33mhz, clk => clk100MHz, reset => clk33mhzreset);

    clkout <= clk33mhz when fix_enabled = '1' else clk26mhz;

    process (clk100MHz) is begin
        if rising_edge(clk100MHz) then
            case state is
                -- S1: wait for a rising edge of the original clock, then start a burst
                when S1 =>
                    if last_clk26mhz = "01" then
                        state <= S2;
                        counter100mhz <= 0;
                        counter26mhz <= 0;
                        clk33mhzreset <= '0';
                    end if;
                -- S2: emit 16 output clocks, then hold until 15 original clocks have passed
                when S2 =>
                    if counter100mhz /= 47 then
                        counter100mhz <= counter100mhz + 1;
                    else
                        -- 48 cycles of 100 MHz = 16 output clocks done; freeze the divider
                        clk33mhzreset <= '1';
                    end if;
                    if last_clk26mhz = "01" then
                        counter26mhz <= counter26mhz + 1;
                    elsif last_clk26mhz = "10" and counter26mhz = 14 then
                        -- falling edge of the 15th original clock: re-arm for the next burst
                        state <= S1;
                    end if;
                when others =>
                    null;
            end case;
            -- shift in the newest sample of the original clock
            last_clk26mhz <= last_clk26mhz(0) & clk26mhz;
        end if;
    end process;
end;
library ieee;
use ieee.std_logic_1164.all;
-- std_logic_unsigned (non-standard Synopsys package) provides + and /= on std_logic_vector
use ieee.std_logic_unsigned.all;

entity divide_by_3 is
    port (
        cout  : out std_logic; -- Output clock
        clk   : in  std_logic; -- Input clock
        reset : in  std_logic  -- Input reset
    );
end entity;

architecture rtl of divide_by_3 is
    signal pos_cnt : std_logic_vector (1 downto 0);
    signal neg_cnt : std_logic_vector (1 downto 0);
begin
    -- Modulo-3 counter on the rising edge
    process (clk, reset) begin
        if (reset = '1') then
            pos_cnt <= (others=>'0');
        elsif (rising_edge(clk)) then
            if (pos_cnt /= 2) then
                pos_cnt <= pos_cnt + 1;
            else
                pos_cnt <= "00";
            end if;
        end if;
    end process;

    -- Modulo-3 counter on the falling edge
    process (clk, reset) begin
        if (reset = '1') then
            neg_cnt <= (others=>'0');
        elsif (falling_edge(clk)) then
            if (neg_cnt /= 2) then
                neg_cnt <= neg_cnt + 1;
            else
                neg_cnt <= "00";
            end if;
        end if;
    end process;

    -- Combining both counters gives a 50% duty cycle at clk / 3
    cout <= '1' when ((pos_cnt /= 2) and (neg_cnt /= 2)) else
            '0';
end architecture;
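To see why the divide-by-3 block uses two counters, here is a small behavioural model in Python. It is only a sketch of the logic above, stepping one half-cycle of the input clock at a time and ignoring real propagation delays; the function name is my own:

```python
def divide_by_3_waveform(half_cycles: int) -> list[int]:
    """Sample cout after each input clock edge, starting right after reset."""
    pos_cnt = 0  # counts on rising edges
    neg_cnt = 0  # counts on falling edges
    samples = []
    for step in range(half_cycles):
        if step % 2 == 0:  # even steps model rising edges
            pos_cnt = 0 if pos_cnt == 2 else pos_cnt + 1
        else:              # odd steps model falling edges
            neg_cnt = 0 if neg_cnt == 2 else neg_cnt + 1
        # same expression as the VHDL output assignment
        samples.append(int(pos_cnt != 2 and neg_cnt != 2))
    return samples


waveform = divide_by_3_waveform(12)
print("".join(str(bit) for bit in waveform))  # prints 110001110001
```

Each output period spans six half-cycles of the input clock, three high and three low, which is the 50% duty cycle a single-edge counter could not produce when dividing by an odd number.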
I wired a button that enables/disables this patch on the fly. You can see the difference not only in the video timing; the sound pitch changes as well:
https://youtu.be/P1cv4_y3PgA