Posted: Sun Jan 17, 2010 3:17 am
by neviksti
caitsith2 wrote:All of this was figured out by segher, one of the few that hangs out on IRC.
Did he reproduce this himself by looking at the IC chip, or is this just based on my circuit tracing here (and other circuit tracing comments in other threads)?
http://nesdev.com/bbs/viewtopic.php?p=43119#43119

It sounds like he traced out the first three PC cells? (And possibly the "reset" state?) Both of which I did not get around to doing yet. That would be great.

Breaking the ROM into 'quadrants', instead of the usual scheme where the row and column addresses are simply the high and low address bits (or vice versa), was discussed as well. But I'd have to reread all the threads and/or find the rest of my notes to see if I got exactly the same address decoding as you wrote there.

As an aside, the earlier CICs had 12 "columns" instead of just 8.
segher wrote:> Then please enlighten me

Hint: the fixed point of an LFSR is 0, and that is the start address here.

The correct name for this kind of thing is "polynomial counter".
I have to agree with Tepples here.
We've been calling this a LFSR from the beginning of that revelation about the PC, and while I feel the term is correct, even if you do not it should be clear what was meant by it. We're all friendly here.

If there is some information we are lacking by all means point it out, but please don't create semantic barriers and then not provide clarifying information when dismissing what others say. OK? Let's all work together. The atmosphere was much more congenial when everyone worked on the Tengen chip.
segher wrote:It is neither Harvard architecture nor Von Neumann architecture.

Quoting wackypedia doesn't make you look smart, btw; quite the opposite.
How is it not Harvard?
The program code is even a different bit width than the ram, and the ROM addressing can come ONLY from the PC. Those two facts seem to require it to be Harvard to me.

Again, please don't be dismissive without explaining why. Just disagreeing and then taking a crack at tepples is not helpful.

Posted: Sun Jan 17, 2010 2:59 pm
by segher
> Did he reproduce this himself by looking at the IC chip,

I couldn't get any good photos of the chip, I would love to see some.

I reverse engineered this all from just the bits in the rom, and a lot of
datasheet/patent archeology. Couldn't find the exact cpu fwiw.

> or is this just based on my circuit tracing here (and other circuit tracing
> comments in other threads)?
> http://nesdev.com/bbs/viewtopic.php?p=43119#43119

No, interesting though. For some reason I totally missed that thread.

> It sounds like he traced out the first three PC cells? (And possibly the
> "reset" state?) Both of which I did not get around to doing yet.
> That would be great.

I messed around with permuting and inverting the address bits until I
got something that looked reasonable. Then I figured out the jump/call/ret
opcodes, which then allowed me to fine-tune the address decoding (jumps
have to go somewhere reasonable).
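
A minimal sketch of the "permute and invert the address bits" step described above, in Python. The function names, the 3-bit example, and the particular permutation are all made up for illustration; this is not the actual decoding of the CIC ROM, just the general shape of trying one candidate mapping.

```python
def remap_address(addr, perm, invert_mask=0):
    """Reorder the bits of addr, then flip the bits selected by invert_mask.

    perm[i] is the source bit that ends up at output bit i
    (hypothetical convention chosen for this sketch).
    """
    out = 0
    for dst, src in enumerate(perm):
        out |= ((addr >> src) & 1) << dst
    return out ^ invert_mask


def reorder_rom(raw, perm, invert_mask=0):
    """Build a candidate image where logical address a holds raw[remap(a)].

    Assumes len(raw) == 2 ** len(perm), so the mapping is a bijection.
    """
    return bytes(raw[remap_address(a, perm, invert_mask)]
                 for a in range(len(raw)))


if __name__ == "__main__":
    raw = bytes(range(8))  # toy 8-byte "dump", not real ROM data
    candidate = reorder_rom(raw, perm=[2, 0, 1], invert_mask=0b001)
    print(candidate.hex())
```

Each candidate image would then be scored by eye or by some heuristic (for example, whether jump targets land somewhere sensible), which is the trial-and-error part.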

Figuring out the rest of the opcodes is harder: I did not even know what
registers there are! So reversing the opcodes and reversing the cpu has
to be done in parallel.

Right now I have one opcode left to do. It is used only in the "nescic"
rom, and only once.

> As an aside, the earlier CICs had 12 "columns" instead of just 8.

Those are _earlier_?! The code in the "12-bit cic" is much nicer, smaller
code, which does pretty much the same thing.

It does make sense though: it has a few opcodes swapped, while "nescic"
and "d411" are the same.

> I have to agree with Tepples here.
> We've been calling this a LFSR from the beginning

That doesn't make it right though ;-)

An LFSR is a _linear feedback_ shift register. XNOR isn't linear.
It matters a lot mathematically, and e.g. when you try to brute-force
this stuff by trying out all LFSRs and giving each a score. I ended
up with the rom backwards, since the complement of the PC counter
_is_ an LFSR of course (there is an even number of taps).
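
A small Python sketch of the point about XNOR feedback and complements. The 7-bit width and the choice of bits 0 and 1 as taps are assumptions made for illustration, not something stated in this thread; the property being checked only needs an even number of taps.

```python
WIDTH = 7                 # assumed counter width, for illustration only
MASK = (1 << WIDTH) - 1   # 0x7f


def step(state, xnor):
    """Shift right by one; the new top bit is XOR (or XNOR) of bits 0 and 1."""
    fb = (state ^ (state >> 1)) & 1
    if xnor:
        fb ^= 1
    return (state >> 1) | (fb << (WIDTH - 1))


def cycle_length(start, xnor):
    """Number of steps until the counter returns to start."""
    state, n = step(start, xnor), 1
    while state != start:
        state, n = step(state, xnor), n + 1
    return n


if __name__ == "__main__":
    # With XOR feedback, 0 is the fixed point; with XNOR it is all-ones.
    print(step(0, xnor=False), hex(step(MASK, xnor=True)))
    # With XNOR feedback, 0 is an ordinary state on the long cycle.
    print(cycle_length(0, xnor=True))
    # Complementing every state of the XNOR counter gives the XOR LFSR.
    print(all(step(s, True) ^ MASK == step(s ^ MASK, False)
              for s in range(MASK + 1)))
```

With two taps, complementing both tap bits leaves their XOR unchanged, so the complemented XNOR sequence obeys the plain XOR recurrence. With an odd number of taps that cancellation would not happen.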

> We're all friendly here.

I hope so, so am I, I hope.

> If there is some information we are lacking

As I said already, I'll post about this later, just not yet, give me some
more time. It's much more efficient for everyone if I simply explain
this stuff, rather than you having to pry it from me with a
question-and-answer game.

> How is it not Harvard?

It can read data from insn rom.


Segher

Posted: Sun Jan 17, 2010 7:58 pm
by tepples
byuu wrote:Dear god why, why would anyone design hardware like this? ;_;
Today NovaYoshi was making a really simple CPU in KLogic as an experiment. I told him how the CIC's program counter was a polynomial counter, much like the noise generator in the 2A03 and the POKEY.

His first question: "Wait, what? That's a CPU? Why would Nintendo include a CPU in every Game Pak for lockout and not for DSP?" I told him the 4-bit microcontroller was cheap enough for Nintendo to have made that it was worth avoiding the pile of total crap released for a certain Atari console around the mid-1980s video game recession.

His second question was like yours: "Why does it use a polynomial counter?" I told him my guess: a row of latches and an XNOR gate save a few gates off the row of half adders that make up a linear program counter.
Is there a place one can obtain the SNES CIC ROM, and some logs of observed output from a real CIC?
We could try with the (well-known) NES data, provided it actually is the same CPU.
segher wrote:It can read data from insn rom.
Then it's a modified Harvard architecture. But if the instructions for reading from data memory and from instruction memory are separate, and there's no mechanism for writing to instruction memory or executing from data memory, it's still a lot closer to Harvard than von Neumann. But my point was it'd still be a pain to squeeze a linearly indexed array into the program space if the program itself is not linearly indexed.
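
To make the "not linearly indexed" point concrete, here is a rough Python sketch of how a tool could translate between sequential position and counter address within one bank. It assumes a 7-bit XNOR polynomial counter with taps on bits 0 and 1 that starts at 0; those details and all the names are illustrative assumptions, not taken from this thread.

```python
def next_pc(pc):
    """Advance an assumed 7-bit XNOR polynomial counter by one step."""
    fb = ((pc ^ (pc >> 1)) & 1) ^ 1
    return (pc >> 1) | (fb << 6)


def execution_order(start=0):
    """List the addresses in the order the counter visits them."""
    order = []
    pc = start
    while True:
        order.append(pc)
        pc = next_pc(pc)
        if pc == start:
            return order


ORDER = execution_order()                              # step n -> address
INDEX_OF = {addr: n for n, addr in enumerate(ORDER)}   # address -> step n


def linearize(bank):
    """Return the bytes of one 128-byte bank in execution order."""
    return bytes(bank[addr] for addr in ORDER)


if __name__ == "__main__":
    print(len(ORDER), [hex(a) for a in ORDER[:4]])
    print(0x7F in INDEX_OF)  # the all-ones lockup state is never visited
```

An assembler or disassembler only needs a table like ORDER to place or read code sequentially. Indexing a data array at run time is a different matter, since the hardware would have to do that lookup itself.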

I just got uncomfortable when you started disrespecting the work of Wikipedia contributors. Nevertheless, I raised the issue of "XNOR polynomial counters are not strictly linear" on the LFSR article's talk page.

Posted: Sun Jan 17, 2010 9:48 pm
by segher
I have posted an article at http://hackmii.com/2010/01/the-weird-and-wonderful-cic/ .

It describes the architecture and instruction set of the CIC. I also posted disassemblies of the NES and SNES ROMs, and the source code for the disassembler (which is at http://git.infradead.org/users/segher/dis-cic.git ).

Have fun! Questions welcome, of course.


Segher

Posted: Sun Jan 17, 2010 9:58 pm
by segher
> His second question was like yours: "Why does it use a polynomial
> counter?" I told him my guess: a row of latches and an XNOR gate save
> a few gates off the row of half adders that make up a linear program
> counter.

Yes, it is actually only about half the area, which is worthwhile given that
this chip is so simple that _anything_ is big :-)

>> It can read data from insn rom.
> Then it's a modified Harvard architecture.

Sure, it is closer to Harvard arch than to Von Neumann arch.

> But my point was it'd still be a pain to squeeze a linearly indexed array
> into the program space if the program itself is not linearly indexed.

Not really; the table lookup insns use a different bank of ROM anyway.

> I just got uncomfortable when you started disrespecting the work of
> Wikipedia contributors.

I don't disrespect their work, or the contributors; I don't find wikipedia
a trustworthy source of information though.


> Nevertheless, I raised the issue

Thank you.

Posted: Mon Jan 18, 2010 12:24 am
by neviksti
Thanks for taking the time to write that up.
segher wrote:I reverse engineered this all from just the bits in the rom, and a lot of datasheet/patent archeology. Couldn't find the exact cpu fwiw.
Did you compare with the Tengen code, ala 'rosetta' style? Otherwise I don't understand how you are gleaning information on opcodes.

And stuff like "there is a four entry stack for the PC; it’s not in RAM, it is separate". As I mentioned, I could see that when I traced out the circuit, but I don't understand how you can see that just from the ROM dump and with no knowledge of which cpu is used.

Basically, it is important for my learning process to know what assumptions went into creating this ... where intuition and guesses are separated from derivations. This is important to me. I would like to understand the process better, so I can understand the current state of knowledge better.
segher wrote:> How is it not Harvard?

It can read data from insn rom.

...

Not really; the table lookup insns use a different bank of ROM anyway.
Since I've traced out the PC and stack myself, I don't see how a lookup physically is possible (unless there are complicated multi-cycle instructions involving the stack). It looks to me like you can only have data from the ROM in the sense that an "immediate" address mode opcode contains data in the opcode itself.

Scanning your opcode list, I don't see any table lookup instructions. So it still looks like Harvard architecture to me. If I'm missing something material here, please let me know. (If you feel I'm misusing terminology again, I guess it can only help to clear that up as well.)

While this line of questions may seem silly, since I approached this rev-engineering mostly from bottom->up from the IC circuitry itself, it is important to me to reconcile this information with what I have learned from studying the circuitry myself.

---
EDIT:
You wrote:
"that third ROM is 768 bytes, which I don’t handle in my little conversion script, so you’ll need to remove the extra columns (they are empty anyway)"

Actually, there is one byte in there that is not empty. I wonder if this corresponds with your 'mystery' opcode in some way. I'm tired right now, so I'm going to bed. I'll re-read all this stuff tomorrow after some rest.

Posted: Mon Jan 18, 2010 12:37 am
by caitsith2
neviksti wrote: ---
EDIT:
You wrote:
"that third ROM is 768 bytes, which I don’t handle in my little conversion script, so you’ll need to remove the extra columns (they are empty anyway)"

Actually, there is one byte in there that is not empty. I wonder if this corresponds with your 'mystery' opcode in some way. I'm tired right now, so I'm going to bed. I'll re-read all this stuff tomorrow after some rest.
That not-empty byte is only a "t 400" instruction. (machine code 0x80.)

Posted: Mon Jan 18, 2010 12:59 am
by segher
> Thanks for taking the time to write that up.

My pleasure, it was a wonderful ride. I hope this info is useful in some way.

> Did you compare with the Tengen code, ala 'rosetta' style?

Yes I did. The Tengen code isn't a 1-1 translation, it does some things
in a different order, and it doesn't do most of the work at all.

It certainly helped though, esp. the timing info.

> Otherwise I don't understand how you are gleaning information on opcodes.

You take a frequent opcode, see in what patterns it is used, and go
from there. Lots of trial and error.
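
As a rough illustration of that first step, here is a Python sketch that counts how often each byte value occurs and which byte pairs occur together. The function name and the sample bytes are invented for the example; real input would be a ROM bank already put into execution order.

```python
from collections import Counter


def opcode_stats(rom, top=5):
    """Return the most common byte values and adjacent byte pairs in rom."""
    singles = Counter(rom)
    pairs = Counter(zip(rom, rom[1:]))
    return singles.most_common(top), pairs.most_common(top)


if __name__ == "__main__":
    # Made-up bytes, purely to show the output format.
    sample = bytes([0x80, 0x12, 0x80, 0x12, 0x7C, 0x80, 0x12])
    singles, pairs = opcode_stats(sample)
    for value, count in singles:
        print(f"{value:02x}: {count}")
    for (first, second), count in pairs:
        print(f"{first:02x} {second:02x}: {count}")
```

Frequent pairs hint at which opcodes take an operand byte or tend to appear in fixed idioms. The guessing about what they mean still has to be done by hand.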

> And stuff like "there is a four entry stack for the PC; it’s not in RAM, it is
> separate". As I mentioned, I could see that when I traced out the
> circuit, but I don't understand how you can see that just from the ROM
> dump and with no knowledge of which cpu is used.

A big part of the work was finding and reading as much documentation
on this chip as possible. None is there that I could find, but some family
members have a bit of info hidden in various patents. Also, some newer
family members have actual datasheets available.

I think the actual chip is a Sharp SM4 (or some very old SM5).

Gleaning the PC stack from the best die photographs I could find
( https://netfiles.uiuc.edu/mantey/www/D4 ... erview.jpg ) was easy. It is not so easy to find registers on there etc., and I cannot read
out the insn decoder PLA on that resolution/quality ;-)

> Basically, it is important for my learning process to know what
> assumptions went into creating this ... where intuition and guesses are
> separated from derivations. This is important to me. I would like to
> understand the process better, so I can understand the current state of
> knowledge better.

Find me on IRC, it is hard to explain in a forum. You can write it up if you
want to, it's not like I want to keep it a secret or something, I just don't
know where to start.

> > > How is it not Harvard?
> > It can read data from insn rom.

> > Not really; the table lookup insns use a different bank of ROM anyway.

> Since I've traced out the PC and stack myself, I don't see how a lookup
> physically is possible (unless there are complicated multi-cycle
> instructions involving the stack). It looks to me like you can only have
> data from the ROM in the sense that an "immediate" address mode
> opcode contains data in the opcode itself.

It indeed does a push to stack, then it fetches a byte (from a special bank, offset X and A), to X and A, and finally it pops PC again. That is how it is
described for different SM5 anyway.

> Scanning your opcode list, I don't see any table lookup instructions.

Yes, I have only insns that are _used_ in there, I have no way of figuring
out the rest without a much better die photograph, or some ancient docs
showing up magically. There might not _be_ a table insn on this even, I
thought there was though.

> So it still looks like Harvard architecture to me. If I'm missing
> something material here, please let me know. (If you feel I'm misusing
> terminology again, I guess it can only help to clear that up as well.)

Many opcodes have immediate operands in the opcode; that isn't "pure
Harvard". But almost nothing is, heh. It is pretty silly to want to divide
all CPU designs into these two groups; some CPUs are *very* much not like
either Harvard or Von Neumann arch!

I agree now that the CIC CPU is quite like Harvard though.

> While this line of questions may seem silly, since I approached this
> rev-engineering mostly from bottom->up from the IC circuitry itself, it is
> important to me to reconcile this information with what I have learned
> from studying the circuitry myself.

Yeah, it is great to hear (and see) how other people approach the problem,
in my experience everyone has a very different way of working.

> > "that third ROM is 768 bytes, which I don’t handle in my little conversion
> > script, so you’ll need to remove the extra columns (they are empty
> > anyway)"

> Actually, there is one byte in there that is not empty.

It's a "t 0" at offset 7f, you have that in every bank. Doesn't do anything
useful, probably an artifact of the assembler?

> I wonder if this corresponds with your 'mystery' opcode in some way.

Not as far as I can see, sorry :-(

I think the mystery 5e sets some internal flag or I/O something, btw.


Segher

Posted: Tue Jan 19, 2010 5:14 am
by ikari_01
OK, so this is kind of groundbreaking and also great news for people who would like to build small series of SNES dev carts or the likes. ;)
So, do any ROM dumps or die photographs exist of the D413A (PAL CIC)?

Posted: Tue Jan 19, 2010 6:18 am
by Jeroen
^^I could be wrong but I don't think they've actually cracked it yet. They're getting closer though.

Posted: Tue Jan 19, 2010 6:39 am
by ikari_01
The progress is amazing. AFAICS there are only two mysteries left:
  • Instruction $5e (which is not used in the currently dumped SNES CIC ROMs)
  • multiple consecutive ldi instructions.
AFAICS the latter are used in seed initialization so it might actually help to have another ROM dump.
I do have a spare D413A but I seriously have no idea how to go about decapping it and taking pictures. I'm afraid I don't have access to the necessary equipment.

Posted: Tue Jan 19, 2010 10:55 am
by Jeroen
Oh wow, that's awesome. Wonder how long it'll take till we have a working clone then :-D

Posted: Tue Jan 19, 2010 2:11 pm
by segher
> OK, so this is kind of groundbreaking and also great news for people who
> would like to build small series of SNES dev carts or the likes. ;)
> So, do any ROM dumps or die photographs exist of the D413A (PAL CIC)?

I haven't seen any ROM dumps or die shots for those.

They aren't needed though (assuming the PAL versions etc. use the same
algorithm, just different initial values / "keys"). The easiest way to create
a "sciclone" would be to sniff the data stream on a D411, and verify that we
understand the algorithm from that; also, you can derive the timings from
that (much easier than counting cycles in the disassembly; well, much less
tedious and error-prone anyway).

Then, take a D413 etc., and sniff the streams on that; then, write some
simple program that derives the initial values from that (it isn't exactly
a cryptographically secure cipher).

The algorithm for the SNES CIC is almost identical to the one on the NES
CIC: the main (only?) difference is that it runs the "mangle" function three
times in a row where the NES CIC does it only once.

Someone _could_ have figured that out from the data alone even. It's
hard to speculate how much of a long shot that would have been.

Posted: Tue Jan 19, 2010 2:19 pm
by segher
> The progress is amazing. AFAICS there are only two mysteries left:

Depends what you think is the goal of all of this. If the goal is to actually
figure out the instruction set, there probably are some more opcodes that
aren't used in the CIC code. If the goal is to figure out the algorithm used
on the SNES CIC, then we have plenty of information (together with
stream dumps) to figure it all out.

I am also not claiming there are no mistakes in my interpretation of what
the instruction set and CPU architecture is.

> • Instruction $5e (which is not used in the currently dumped SNES CIC ROMs)

Yeah, I have no real clue what this does.

> • multiple consecutive ldi instructions.
> AFAICS the latter are used in seed initialization so it might actually help
> to have another ROM dump.

They are used in a few other places. It isn't necessary to really understand
those to create a clone, it is almost trivial to reconstruct the initial state
from a few dumps of the actual data streams.

Posted: Tue Jan 19, 2010 4:00 pm
by bunnyboy
Not sure if this has been posted yet, but a while ago I made a logic analyzer dump of the D411 when stream 1111 is picked.

http://nesmuseum.com/10nes/snes1111.png

The idea is similar to the NES CIC: the console sends the 4-bit stream to use (shown before the -400uS line), then it's sparse bits from console and cart. I can make more accurate pics if needed, but there should be a couple of people with manual logging equipment who can make large cycle-accurate tables.