PC-Engine General questions

za909
Posts: 249
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

PC-Engine General questions

Post by za909 »

Hi, I made a similar thread a few years ago, but now that I've actually been writing code for this machine and have the real hardware with a flash card to test things on, I keep bumping into incomplete and sometimes outright incorrect or contradictory documentation. It makes producing something for this machine very cumbersome, but it also gives a more "authentic" experience of what development for an old system may have been like in its time. I'm mainly surprised by how much freedom the VDC gives when it comes to setting up the display, and the possibility of displaying a "non-standard" resolution image with it has always intrigued me. I'm trying to understand what needs to be done to achieve this, but I have many uncertainties.

This is the deepest explanation I've read about how the pixel output works, but what is unclear is the CPU access "slots". So in order to "safely" use the 10MHz pixel clock (to display a 512x240 image) VRAM access must be set to the 2 or 4 slot mode. What I assume is that in 4-slot mode, no time is allocated for the CPU to access VRAM so you end up with a situation similar to the NES and other systems, and basically you can only ever safely access VRAM during VBlank? But what is even more confusing is what happens in 2-slot mode. It says slots 3 and 4 are used by the CPU, but what is the result? Does it mean the CPU is stalled until those slots occur, and reads/writes become slower? Or can the CPU accidentally read/write at the wrong time if the slots do not line up with the CPU access?

If there is anyone knowledgeable about the PCE hardware, any help would be much appreciated!
Pokun
Posts: 2681
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: PC-Engine General questions

Post by Pokun »

Here are my notes on VRAM Access Width Mode; maybe they explain it a bit more clearly:

Code:

MWR = $09
R09 Memory Access Width Register
FEDCBA98 76543210
         ||||||++- VM - VRAM Access Width Mode
         ||||++--- SM - Sprite Access Width Mode
         |+++----- SCREEN - Screen Width Mode
         +-------- CM - CG Mode

VM - VRAM Access Width Mode
Selects how many clocks should be used for VRAM, BAT (Background Attribute
Table), CG (Character Generator) or DMA. A value that matches the speed of
VRAM should be selected. After writing, it takes effect on the start of the
next vblank period. Accessing VRAM is allowed after next vblank interrupt.

  Bit   Dot   Dot cycles within an 8-dot unit
  1 0  Width   1   2   3   4   5   6   7   8
  -------------------------------------------
  0 0    1    CPU BAT CPU  X  CPU CG0 CPU CG1
  0 1    2    --BAT-- --CPU-- --CG0-- --CG1--
  1 0    2    --BAT-- --CPU-- --CG0-- --CG1--
  1 1    4    ------BAT------ ---CG0 or CG1--
CPU - A read or write to register R02
BAT - The palette block and character name from the BAT
X   - Unused "dummy" access
CG0 - Bitplanes 0, 1 from the character generator
CG1 - Bitplanes 2, 3 from the character generator

Note: Both values %01 and %10 do the same thing. Which one is used varies
between games, but they are supposedly identical in function.
Note: In 4-dot Mode background data is displayed using 4 instead of 16 of
the 256 colors.
Note: This setting can be adjusted to match the speed of the VRAM used.
The PC Engine uses very fast VRAM though so 1-dot mode can often be used.
If the 10.7 MHz dot clock is used, setting VM to %01 or %10 is a good idea.


SM - Sprite Access Width Mode
...

I don't think you ever need to set VM to %11 (dot width 4). Hudson made that option so that the chip could be used with slower VRAM as well, but NEC picked the fastest RAM chips available for the PC-Engine (supposedly to make sure it was ready for the CD technology they had planned from the beginning).
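
To make the bit layout in the notes above concrete, here is a minimal Python sketch that splits an MWR value into its fields. The function and dictionary names are my own for illustration; only the bit positions and the VM-to-dot-width mapping come from the table above.

```python
# Field layout taken from the MWR (R09) notes above.
# All names here are illustrative, not from any official document.
VM_DOT_WIDTH = {0b00: 1, 0b01: 2, 0b10: 2, 0b11: 4}


def decode_mwr(value):
    """Split the low byte of MWR into VM, SM, SCREEN and CM."""
    vm = value & 0b11
    return {
        "vm": vm,                          # bits 1-0
        "sm": (value >> 2) & 0b11,         # bits 3-2
        "screen": (value >> 4) & 0b111,    # bits 6-4
        "cm": (value >> 7) & 1,            # bit 7
        "vm_dot_width": VM_DOT_WIDTH[vm],  # dots per VRAM access
    }


print(decode_mwr(0x11))
```

Both %01 and %10 map to a dot width of 2 here, matching the note that they are supposedly identical in function.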


Setting a custom resolution is quite complicated, and not all parameters matter on the PC-Engine because the VCE is used for output instead of the VDC (which is supposedly capable of video output by itself) and overrides it.
I have a summary of what the parameters do in my notes (but I forgot the details). Here it is:

Code:

Summary of resolution parameters:
Dot clock = set to one of 3 possible horizontal resolutions
HDS       = desired start position of display area in BG characters -1
HDW       = desired width of display area in BG characters -1
HSW       = 1~32, -1
HDS + HSW = does not matter on the PC Engine
HDE       = horizontal end position of display area
VDW       = desired height of the active display area in lines -1
VDS       = desired start position from the top for the active display area
VDS + VSW = 14 or more
VCR       = remaining number of display area lines of the 242 total (if any)

And here are some examples (stolen from Charles MacDonald's docs, I believe):

Code:

256x239 - most common setting:
  Clock    HSW     HDS     HDW     HDE    VSW    VDS     VDW     VCR
   5.37    $02     $02     $1F     $04    $02    $0F     $00EF   $04

224x192 - small screen with large visible borders on all four sides:
  Clock    HSW     HDS     HDW     HDE    VSW    VDS     VDW     VCR
   5.37    $02     $04     $1B     $04    $00    $24     $00BF   $04

I think the numbers come from what games actually tend to use.
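
As a rough check of the "-1" convention in the summary, here is a small Python sketch that turns HDW and VDW back into a pixel size. It assumes 8-pixel-wide BG characters and that both registers hold the desired size minus one; the function name is made up for illustration.

```python
def display_size(hdw, vdw):
    """Active display size in pixels from the HDW and VDW register values."""
    width = (hdw + 1) * 8   # HDW = width in 8-pixel BG characters, minus one
    height = vdw + 1        # VDW = height in lines, minus one
    return width, height


print(display_size(0x1B, 0x00BF))  # values from the "224x192" example
print(display_size(0x1F, 0x00EF))  # values from the "256x239" example
```

The second example's values work out to 224x192 as labelled. The first example's values give 256x240 under this rule rather than 239, so either the label or my reading of the convention is off by one there; I haven't verified which.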
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: PC-Engine General questions

Post by turboxray »

za909 wrote: Fri Apr 01, 2022 1:16 pm Hi, I've made a similar thread a few years ago, but now that I've actually been writing code for this machine and have the real hardware with a flash card to test things, I keep bumping into incomplete and sometimes outright incorrect or contradictory documentation. It makes the process of producing something for this machine very cumbersome but also gives a more "authentic" experience of what development for an old system may have been like during its time. I'm mainly surprised by how much freedom the VDC gives when it comes to setting up the display, and the possibility of displaying a "non-standard" resolution image with it always intrigued me. I'm trying to understand what needs to be done to achieve this, but I have many uncertainties.

This is the deepest explanation I've read about how the pixel output works, but what is unclear is the CPU access "slots". So in order to "safely" use the 10MHz pixel clock (to display a 512x240 image) VRAM access must be set to the 2 or 4 slot mode. What I assume is that in 4-slot mode, no time is allocated for the CPU to access VRAM so you end up with a situation similar to the NES and other systems, and basically you can only ever safely access VRAM during VBlank? But what is even more confusing is what happens in 2-slot mode. It says slots 3 and 4 are used by the CPU, but what is the result? Does it mean the CPU is stalled until those slots occur, and reads/writes become slower? Or can the CPU accidentally read/write at the wrong time if the slots do not line up with the CPU access?

If there is anyone knowledgeable about the PCE hardware, any help would be much appreciated!
So bits D0-D1 of MWR are known as "VM" in the official docs. They control the cpu access slots during active display. Think of vram access during active display as a group of 8 vram access slots. During this period, again for active display, you get the following:

For value $00:

| CPU | BAT | CPU | internal | CPU | CG0 | CPU | CG1 |

Vram is 16 bits wide, so each of these 8 slots is a WORD, or two bytes. BAT is the tilemap, CG0 is the first 2 bit planes of a tile, and CG1 is the second 2 bit planes of the tile (giving 4 bits). This is independent of the VDC dot clock setting. But to put that in real-world terms, assume the 5.369mhz dot clock (typical for 256x240 res or whatever). That means the VDC gives the cpu four WORD vram (read or write) slots per 8 pixels of the VDC. For this purpose, at this dot clock, there are 341 'vdc' pixels on a given line. So each access slot lasts ~1.33 cpu cycles, or ~186ns. For reference, a CPU cycle is ~139.7ns. So every 2.66 cpu cycles, the CPU can write two bytes. You'll never saturate that bandwidth, even with Txx or the faster approach of graphics embedded as ST1/ST2 opcodes. If the CPU is misaligned with its slot, /RDY will be asserted to the CPU. This happens on the write to the latch at $0003, not on the $0002 lsb write (that simply goes into a buffer, not vram). So during active display, you effectively have near full vram read or write bandwidth.

One thing to note about all of this: sprite pixel data is fetched only during hblank. So sprite pixels do not interfere during active display, since they come from an internal buffer. But.. during hblank, the sprite dma fetch uses ALL slots in an 8 pixel block. So if you write or read vram during this period, you will be stalled until it finishes. The length of this depends on how many sprite 16x1 cells are fetched (up to 16 will be fetched, no more, but it could be fewer). The VDC does not take into account whether the sprite is off screen horizontally - it counts those sprites and fetches those pixels. So always put off screen sprites at Y=0 or Y > screen height.

For 5.37mhz and 7.16mhz dot clocks, you can use the setting of $00. For 10.74mhz, if you use $00 then you're overclocking vram. While lots of homebrewers, including myself, have done this for decades without any ill effects... it's probably a good idea to set it to the slower vram fetch mode of $01 ($02 behaves the same). With the 10.74mhz dot clock (i.e. "512x240" mode) and a setting of $01, you only have 1 cpu WORD access slot per 8 pixel block. The upside is that even though you only get a CPU word access slot every ~745ns, you'll still probably not saturate it; the fastest access is ST1/ST2, so 5 cycles is 698ns (1396ns for the pair).. and you have 745ns to write to the latch, not the LSB buffer. There's no way you're going to write one WORD of data to vram in 745ns or less... unless you do tricks like a series of only ST2 #nn back to back opcodes to the latch (leaving the old value in the LSB).

So all of this is for active display. When the display ends, the last scanline defined by the VDC settings, then you'll be in 'burst mode'. In burst mode, you have all access slots open to the CPU (assuming you're not running a vram-vram dma request). If you leave BG and SPR bits as disabled, by the time it preps for the next frame (vsync signal from the VCE), the next frame will be disabled (you can't turn it on mid display) and it will also be in 'burst mode'. Burst mode continues to happen as long as BG and SPR are off when latched during VCE's vsync (this is not to be confused with 'vblank' interrupt or mode on the VDC - they are independent). The only exception to 'burst mode' is SPRITE DMA. If you have auto-sprite dma flag enabled, the SAT in vram will be copied to the internal registers. This uses all access slots. If you write or read to vram during this time, which happens right after vblank fires on the VDC side, the cpu will be paused the whole time. For 5.37mhz dot clock, this is about 3 scanlines. So just be aware of that if you're trying to maximize vblank time for whatever.

Also to note: if you use the slowest vram BG access setting of $03 (both bits set), you will be in 2bit color mode - not 4bit. Just be aware of that. The 'CM' bit tells the VDC which 2 bit planes to fetch; 0/1 or 2/3.


Edit: Here are the VM access table modes:

$00
| CPU | BAT | CPU | internal | CPU | CG0 | CPU | CG1 |

CPU can access vram; 5.37mhz @ every 2.66 cpu cycles, 7.16mhz @ every 2 cpu cycles, 10.74mhz @ every 1.33 cpu cycles.


$01 or $02
| BAT | *(BAT) | CPU | *(CPU)| CG0 | *(CG0) | CG1 | *(CG1) |

CPU can access vram; 5.37mhz @ every 10.67 cpu cycles, 7.16mhz @ every 8 cpu cycles, 10.74mhz @ every 5.33 cpu cycles.


$03
| BAT | *(BAT) | *(BAT) | *(BAT) | CGx | *(CGx) | *(CGx) | *(CGx) |

No CPU access slots during active display.

x = CG0 or CG1 which is controlled by 'CM' bit #7 of MWR.
* = continues access of previous slot.
NOTE: These are all 16bit wide access slots. 'Access' is reading or writing to the latch; $0003.
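
For anyone who wants to redo the arithmetic behind the table above, here is a rough Python sketch. It assumes the commonly cited 21.47727mhz master clock with the CPU at master/3 and the three dot clocks at master/4, /3 and /2, and it reads the slot tables as one CPU slot every 2 dots for $00 and one every 8 dots for $01/$02. The names are only for illustration.

```python
MASTER_HZ = 21_477_270            # assumed master clock
CPU_HZ = MASTER_HZ / 3            # ~7.16 MHz CPU clock
DOT_DIVIDERS = {"5.37": 4, "7.16": 3, "10.74": 2}

# Dots from one CPU access slot to the next, per VM setting (None = no slots)
DOTS_BETWEEN_CPU_SLOTS = {0: 2, 1: 8, 2: 8, 3: None}


def cpu_cycles_between_slots(vm, dot_clock):
    """CPU cycles between CPU VRAM slots during active display."""
    dots = DOTS_BETWEEN_CPU_SLOTS[vm]
    if dots is None:
        return None
    dot_hz = MASTER_HZ / DOT_DIVIDERS[dot_clock]
    return dots * CPU_HZ / dot_hz


for vm in (0, 1, 3):
    for clock in DOT_DIVIDERS:
        cycles = cpu_cycles_between_slots(vm, clock)
        text = "no slots" if cycles is None else f"{cycles:.2f} cycles"
        print(f"VM=${vm:02X} @ {clock}mhz: {text}")
```

Multiplying a cycle count by ~139.7ns gives the slot spacing in nanoseconds.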
Pokun
Posts: 2681
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: PC-Engine General questions

Post by Pokun »

Thanks for this explanation, it's much more detailed than the official dev docs. I heard that VM dot width 1 shouldn't be used together with the 10.74 MHz dot clock, but I didn't understand why.
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: PC-Engine General questions

Post by turboxray »

Pokun wrote: Sun Apr 03, 2022 8:25 am I heard that VM dot width 1 shouldn't be used together with the 10.74 MHz dot clock, but I didn't understand why.
Not sure who said that. Basically, a setting of 0x1 tells the VDC to wait an extra cycle when accessing vram. So instead of 93ns in 10.74mhz mode, the vram access is 186ns. The VRAM static ram chips are marked as 120ns. So a VM setting of 0x1 is definitely safe timing-wise for the 10.74mhz dot clock. SM should be set to 0x01 (IIRC) for 10.74mhz mode too, if you're not willing to overclock vram access. I've run 10.74mhz mode with the 0x0 setting for hours (I use the mode for text display when running tests, on many PCE systems) and have never seen any graphic corruption. But like I said, it is technically overclocking the ram.
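
The 93ns and 186ns figures can be reproduced with a few lines of Python. This is only a sketch: it assumes a 21.47727mhz master clock divided by 2 for the 10.74mhz dot clock, and it treats the 120ns chip marking as a minimum access time. The names are made up.

```python
MASTER_HZ = 21_477_270   # assumed master clock
VRAM_RATED_NS = 120      # chip marking quoted above


def vram_access_ns(dot_divider, dot_width):
    """Time the VDC allows for one VRAM access, in nanoseconds."""
    dot_ns = 1e9 * dot_divider / MASTER_HZ
    return dot_ns * dot_width


for width in (1, 2):
    ns = vram_access_ns(2, width)   # divider 2 = 10.74mhz dot clock
    verdict = "within rating" if ns >= VRAM_RATED_NS else "overclocked"
    print(f"dot width {width}: {ns:.0f}ns ({verdict})")
```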
Pokun
Posts: 2681
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: PC-Engine General questions

Post by Pokun »

turboxray wrote: Sun Apr 03, 2022 11:42 am Not sure who said that.
You just said that it would overclock VRAM although you haven't really seen any problems.
za909
Posts: 249
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

Re: PC-Engine General questions

Post by za909 »

Thank you everyone, it really helps to see it broken down like this. To be sure I would probably refrain from using the high-res mode for anything other than high-quality images during cutscenes, menus, maps etc., something where I don't have to worry about VRAM updates as much as during gameplay.

Another question I have concerns the inconsistent explanations of some of the HuC6280-specific instructions. Especially TST, which is sometimes said to actually modify the contents of the memory location being used (the result of the AND between the immediate argument and the memory location is written back to the memory location), but is sometimes described as an equivalent of BIT, just with an immediate argument instead of the accumulator. If the former explanation is correct, then TST is a pretty good alternative to TRB in some cases for bitflag manipulation.
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: PC-Engine General questions

Post by turboxray »

za909 wrote: Sun Apr 03, 2022 3:56 pm Thank you everyone, it really helps to see it broken down like this. To be sure I would probably refrain from using the high-res mode for anything other than high-quality images during cutscenes, menus, maps etc., something where I don't have to worry about VRAM updates as much as during gameplay.

Another question I have is regarding some inconsistent explanation of some of the HuC6280-specific instructions. Especially TST, which is sometimes said to actually modify the contents of the memory location being used (the result of the AND between the immediate argument and the memory location is written back to the memory location), but sometimes it is described as an equivalent of BIT, just with an immediate argument instead of the accumulator. If the former explanation is correct, then TST is a pretty good alternative to TRB in some cases for bitflag manipulation.
Yeah, TST is a test instruction. It's non-destructive except for flags. TRB and TSB are read-modify-write instructions and are 'destructive' for the operand address. Where did you see this info on TST?
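
To spell out the difference, here is a tiny Python model of the three instructions. It is a simplification that only tracks the Z flag and the memory byte (N and V are left out), and the function names are just illustrative.

```python
def tst(mem, addr, imm):
    """TST #imm, addr: flags only, memory is left untouched. Returns Z."""
    return (mem[addr] & imm) == 0


def trb(mem, addr, a):
    """TRB addr: Z from A AND memory, then clear the bits set in A."""
    z = (mem[addr] & a) == 0
    mem[addr] &= ~a & 0xFF
    return z


def tsb(mem, addr, a):
    """TSB addr: Z from A AND memory, then set the bits set in A."""
    z = (mem[addr] & a) == 0
    mem[addr] |= a
    return z


mem = bytearray([0xA5])
print(tst(mem, 0, 0x04), hex(mem[0]))   # memory unchanged
print(trb(mem, 0, 0x05), hex(mem[0]))   # bits 0 and 2 cleared
print(tsb(mem, 0, 0x0F), hex(mem[0]))   # low nibble set
```

So for bitflag manipulation you still need TRB/TSB; TST only saves you from loading the mask into A when testing.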

I made some PCE cribsheets like 10 years ago: http://www.turboxraypce.org/pce_cribsheet/
There are two sheets, in png and bmp format. "EA" is Effective Address (addressing mode).
za909
Posts: 249
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

Re: PC-Engine General questions

Post by za909 »

turboxray wrote: Sun Apr 03, 2022 4:13 pm Yeah, TST is a test instruction. It's non-destructive except for flags. TRB and TSB are read-modify-write instructions and are 'destructive' for the operand address. Where did you see this info on TST?

I made some PCE cribsheets like 10 years ago: http://www.turboxraypce.org/pce_cribsheet/
There are two 2 sheets, in png and bmp format. "EA" is Effective Address (addressing mode).
Admittedly this is very outdated: https://web.archive.org/web/20050304115 ... rnals/isa/
I just like that I can quickly check any addressing mode nuance or cycle cost that I don't quite have memorized yet, of course in many cases it's even a different cycle count compared to a 6502.
za909
Posts: 249
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

Re: PC-Engine General questions

Post by za909 »

I've been trying to find information on the LFO, but it seems that there's not much more than the info in the US Patent documents and its simplified form circulating around on the internet. Emulation seems to be unable to fully capture how it works as well. So now that I have a mostly working music engine, would anyone be interested in testing the LFO on real hardware? I'm really looking forward to hearing something akin to the Famicom Disk System FM, although this seems to be much more limited (scales only to 3 depth levels via shifting which bits are modified by the ch1 fm data).
Pokun
Posts: 2681
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: PC-Engine General questions

Post by Pokun »

Sure, as long as you are willing to assemble HuCard ROMs. I don't want to burn CDs just for testing.
I have a PC-Engine Duo-R and an Everdrive v2. I don't know if it's the HuC6280 or the HuC6280A as I've never opened it. Rev A is found in all SuperGrafx systems (as it's part of its specs) but has also been found in some CoreGrafx I and GT systems, so I guess there is always a chance it's found in some Duo-R systems as well.
za909
Posts: 249
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

Re: PC-Engine General questions

Post by za909 »

As a fun teaser for what I have in store, here's a test of my code using the Timer interrupt to reset the wave phase on a channel playing a pitch bending sine wave. This creates a hard sync effect similar to what's possible on a SID chip. I'm fairly certain nobody has done this sort of thing on the PCE hardware. The best part is that this costs very little CPU effort and the lower you go with the Timer frequency, the less CPU time you need. One iteration takes 62 cycles and allows itself to be interrupted by a VDC interrupt so code that affects the display is always prioritized.
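
To put a rough number on the CPU cost, here is a back-of-the-envelope Python sketch. It assumes the timer ticks at the CPU clock divided by 1024 and fires once every (reload + 1) ticks, which is my understanding of the timer rather than something verified here, and it only counts the 62 cycles per iteration mentioned above. All names are made up.

```python
CPU_HZ = 7_159_090              # assumed CPU clock (~7.16 MHz)
TIMER_TICK_HZ = CPU_HZ / 1024   # assumed timer prescaler


def timer_irq_rate(reload):
    """Timer interrupts per second for a given reload value."""
    return TIMER_TICK_HZ / (reload + 1)


def cpu_load(reload, cycles_per_irq=62):
    """Fraction of CPU time spent in the handler."""
    return timer_irq_rate(reload) * cycles_per_irq / CPU_HZ


for reload in (0, 1, 3, 7):
    rate = timer_irq_rate(reload)
    load = cpu_load(reload)
    print(f"reload={reload}: {rate:7.1f} Hz, {load * 100:.2f}% CPU")
```

Under those assumptions even the fastest timer setting stays at around 6% of the CPU, and it drops in proportion as the timer frequency goes down.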
Pokun
Posts: 2681
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: PC-Engine General questions

Post by Pokun »

I'm not sure what you can do with that, but it sounds pretty cool.
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: PC-Engine General questions

Post by turboxray »

za909 wrote: Thu Apr 14, 2022 7:37 am As a fun teaser for what I have in store, here's a test of my code using the Timer interrupt to reset the wave phase on a channel playing a pitch bending sine wave. This creates a hard sync effect similar to what's possible on a SID chip. I'm fairly certain nobody has done this sort of thing on the PCE hardware. The best part is that this costs very little CPU effort and the lower you go with the Timer frequency, the less CPU time you need. One iteration takes 62 cycles and allows itself to be interrupted by a VDC interrupt so code that affects the display is always prioritized.
Nice! Love seeing stuff like this. There's actually quite a bit of hidden gem stuff you can do with PCE audio that no one's really trying to do. The problem, though, is that it requires people to write their own sound driver. Here's my old blog entry for some TIRQ sync related stuff (trying out new ideas and approaches): https://pcedev.wordpress.com/2010/07/18 ... -modeling/

But yeah, hard sync is pretty easy to do with TIRQ. Just out of curiosity, what method are you using to reset the waveform pointer without turning the channel on/off (i.e. avoiding the 'pop' on non-A revision chips)?

I did a bunch of tests with LFO and eventually came to the conclusion that it was "mostly" a waste of a channel. Maybe it has some value for CD games, where chip generated SFX have more channels to play with. But for chip music, if you can incorporate "macros" into your sound engine - then anything has value. I remember like 10 years ago someone in the FM community ("mad" or something like that) used the PC interrupt timer and macros to create some incredible OPL3 instruments. After chatting with him, he was the one that inspired me to look into abusing all sorts of tricks on the PCE for creating new sounds. And there's actually quite a bit of stuff no one has done yet haha. If you're interested, I can share what I have found. The more people trying this stuff out, the better! :beer:

Do you know about the waveform corruption method that the Fire Pro Wrestling games use? It's also super light weight (uses 60hz ticks instead of TIRQ). It's sooo under utilized that it's criminally overlooked hahah.
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: PC-Engine General questions

Post by turboxray »

Pokun wrote: Tue Apr 12, 2022 1:15 pm Sure, as long as you are willing to assemble HuCard ROMs. I don't want to burn CDs just for testing.
I have a PC-Engine Duo-R and an Everdrive v2. I don't know if it's the HuC6280 or the HuC6280A as I've never opened it. Rev A is found in all SuperGrafx systems (as it's part of its specs) but has also been found in some CoreGrafx I and GT systems, so I guess there is always a chance it's found in some Duo-R systems as well.
No Duo systems have been found to have A revision, as well as any Core Grafx II. Looks like they just stopped using it. On a side note: I found that Duo-RX has the best waveform output (cleanest output, best bass and low frequency resolution) out of all the systems I've tested; unclear if an internal change but I suspect it is - but it doesn't have A revision behavior.