Kulor's Guide to Mode 7 Perspective Planes

rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

kulor wrote: Wed Aug 03, 2022 7:23 am
rainwarrior wrote: Tue Aug 02, 2022 9:49 pm All 4 of the SNES games I mentioned can dynamically change the tilt. I don't think it's unreasonable to ask of the SNES... but if you allow a dynamic perspective, I don't think there's a practical way to use a table to bypass the divide.
Tilt would be pitch, not FOV. Dynamic FOV would be something like, being able to zoom in with a sniper rifle in a game like Goldeneye or MDK, or I seem to recall Ocarina of Time animates camera position with FOV to get this kind of "background zooming out" effect...it's a bit hard to explain, if I can find it I'll give it a link here. I've never seen an SNES game with an adjustable FOV, that was only something that started cropping up with real polygon engines in 5th gen stuff.
Well, what I was really trying to describe was for which situations you can pre-compute a perspective divide, and which you can't.

For the purposes of a perspective plane, "FOV" should really just be your horizontal scaling factor. I don't really understand why the SNES shouldn't do this. Pilotwings and Mario Kart both have independent horizontal scale. F-Zero and Final Fantasy VI could change horizontal scale but not independently of vertical scale, due to the compromise I mentioned. Mario Kart and Pilotwings don't exploit this extra parameter to do a dolly zoom... but not because it would be difficult. Their engines can do it, as far as I can tell. I think it's only because it's a pretty striking aesthetic effect, and they don't want it. Most movies don't do dolly zooms either. A consistent view scaling is usually important to keep the viewer oriented.


So, we're rasterizing a tilted quad. You have a horizontal scale and vertical position at the far/horizon/top scanline, and a different horizontal scale and vertical position at the bottom scanline. You interpolate between them from top to bottom, but the vertical spacing of that interpolation needs to be corrected with a division step (perspective divide).

So, if you want to use a pre-computed table for the perspective divide, anything that changes the vertical spacing/scale of the scanlines would require a different table. Tilting the plane or adjusting the distance to the horizon would change the spacing.

FOV by itself wouldn't, as it should just change the two horizontal scales you're interpolating between. In the compromise used by F-Zero/FF6, FOV can't be separately controlled from the vertical draw distance, so they couldn't do an effective dolly zoom, but they can and do adjust it while tilting.
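
To make the table question concrete, here is a rough sketch of what a pre-computed divide table might hold. It assumes a simple pinhole camera at height `camHeight` above the plane, pitched down by `pitch` radians, with 224 scanlines centred on line 112. `buildDepthTable` and its parameter names are made up for illustration and aren't taken from any of the games mentioned.

```javascript
// Hypothetical helper: one reciprocal-depth entry per scanline.
// Assumes a pinhole camera; focal is the distance to the projection
// plane in pixels (so it encodes the vertical view angle).
function buildDepthTable(pitch, focal, camHeight, lines = 224) {
  const table = [];
  for (let s = 0; s < lines; s++) {
    const v = s - lines / 2; // rows below screen centre are positive
    // How steeply the ray through this row descends toward the plane.
    const descent = focal * Math.sin(pitch) + v * Math.cos(pitch);
    // At or above the horizon the ray never reaches the plane.
    table.push(descent > 0 ? camHeight / descent : null);
  }
  return table;
}

// Changing the tilt changes every entry, so a table baked for one
// pitch cannot be reused for another.
const shallow = buildDepthTable(Math.PI / 6, 128, 16);
const steep = buildDepthTable(Math.PI / 3, 128, 16);
```

Anything that changes `descent` (here the pitch, and the vertical view angle folded into `focal`) means a different table. A purely horizontal scale factor never enters it.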

So... in my estimation dynamic FOV isn't the thing that should be harder on the SNES, and all the examples I looked at do indeed have dynamic tilt, even though independent horizontal scale is a little bit harder.


So the bottom line, I think, is that unless the only thing the camera can do is rotate and translate around the plane (i.e. no tilting), you can't get away with tabling the divide. FZ/FF6 do 1 divide and 2 multiplies per HDMA line. MK/PW do 4 multiplies and get independent horizontal scale. The latter probably needs a DSP to get below the 1-scanline per HDMA line computation threshold, though as I mentioned I think interpolating every second line would be enough to get it back under that metric.

I'm not actually sure how much performance tabling the divide could buy, here... I haven't spent much time considering it, because I didn't want to give up dynamic control of the perspective, which already seems practical.

Anyway, if you're doing all 4 multiplies, you sure can do a dolly zoom, if you want it. I don't think that's unfeasible for SNES at all! It might be tricky to put that particular effect to good artistic use, though. That OOT scene is the classic Hitchcock thing where you're using it to frame a person's face within the scene... doing it with only a tilted plane is going to be a bit more limited, visually.
psycopathicteen
Posts: 3082
Joined: Wed May 19, 2010 6:12 pm

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by psycopathicteen »

I haven't read every single post so I don't know if people mentioned this before but I thought of an approach.

Have two HDMA channels for controlling Matrix A and Matrix C, and a third HDMA channel controlling BG1Y?

Edit:
It looks like it would need Matrix A and C, not A and B.
Last edited by psycopathicteen on Wed Aug 03, 2022 12:07 pm, edited 1 time in total.
rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

psycopathicteen wrote: Wed Aug 03, 2022 10:33 am Have one HDMA channel set to controlling Matrix A and Matrix B, while having another HDMA channel controlling BG1Y?
I believe you need to do 2 things per scanline:
  • 1. Set the u,v increment for each pixel along the row (matrix A and C).
  • 2. Set the position of the line of texels on the map (two ways to accomplish this).
So 1 is just matrix A/C. You have to do that either way.

2 is done in games I've looked at with matrix B/D, but you could also do it with M7HOFS/M7VOFS.

However, the equivalent calculation for HOFS/VOFS is a lot more complicated, at least if you want to deal with being able to rotate. If your camera points along the X or Y axis only, it gets a lot simpler and you could update only one of HOFS or VOFS.

Instead B/D is used, which accomplishes the same goal of shifting the new centre of the image for each scanline, but it does it relative to M7X/M7Y and, in the way FZ/FF6 do it, it becomes entangled with the horizontal scale as well.

Edit: after psycopathicteen clarified, I do believe that only changing M7Y is a third alternative for 2. Cool!
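
For reference, the per-pixel lookup those registers feed can be sketched like this. It follows the commonly documented Mode 7 formula but ignores the hardware's fixed-point widths, sign extension and map wrapping, so treat it as an illustration only. `mode7Texel` and the `regs` object are hypothetical names.

```javascript
// Simplified Mode 7 texel lookup for screen pixel (sx, sy).
// regs.a..d are the matrix entries as plain numbers (the hardware stores
// them as 8.8 fixed point); regs.x/y are M7X/M7Y; hofs/vofs are the scroll.
function mode7Texel(regs, sx, sy) {
  const px = sx + regs.hofs - regs.x;
  const py = sy + regs.vofs - regs.y;
  return {
    u: regs.a * px + regs.b * py + regs.x,
    v: regs.c * px + regs.d * py + regs.y,
  };
}
```

Stepping `sx` by one moves the texel by (A, C), which is why A/C set the row's scale and direction. The remaining terms only decide where the row starts, which is why there are several equivalent ways to position it.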
Last edited by rainwarrior on Fri Aug 05, 2022 10:05 pm, edited 1 time in total.
rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

I had been meaning to make this diagram illustrating how I think about this problem. Maybe now is a good time for it:
[Image: mode7_trapezoid.png]
This is an overview of the perspective plane rendering, visualized in the space of the tilemap texture. We're rasterizing a trapezoid from the tilemap.

So, scale 0 is how wide our view is at the horizon/top-scanline, and scale 1 is how wide it is at the bottom. The blue lines converge at our effective viewpoint (where z=0).

The green lines are an exaggerated representation of individual scanlines. These are widely spaced near the horizon, and densely spaced near the bottom. This non-linear spacing is what requires the perspective divide step, to properly locate each scanline.

Anyway, not going to rehash the calculations here, Kulor has mostly covered that I think, but maybe this provides a visual intuition of what needs to be accomplished. The perspective plane is made of parallel lines from a trapezoid, with the corrected spacing.

There are different paths to deciding on the dimensions of our trapezoid. We can consider this in terms of a camera in 3D space pointed at a plane, and that gives us one set of parameters to play with. We could also consider the trapezoid directly and parameterize that way.

However we set that up, the rasterization problem of setting up our HDMA tables should break down to this same thing. Each line you must account for the non-linear spacing to find the next row of texels, set their horizontal scale, and somehow set their position (either via vertical scale, or tilemap offset).
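
As a sketch of that breakdown, working directly from the trapezoid rather than from a camera: the reciprocal of the width is interpolated linearly down the screen, and the position along the view axis is interpolated with the matching perspective correction. `trapezoidRows` and its parameters are names invented here, and the widths and positions are in arbitrary texel units.

```javascript
// w0/pos0: view width and position along the view axis at the top
// (horizon) scanline; w1/pos1: the same at the bottom scanline.
function trapezoidRows(w0, pos0, w1, pos1, lines = 224) {
  const rows = [];
  for (let s = 0; s < lines; s++) {
    const t = s / (lines - 1);
    // 1/width is linear in screen space; this is the perspective divide.
    const invW = (1 - t) / w0 + t / w1;
    const width = 1 / invW;
    // Perspective-correct position: interpolate pos/width, then divide.
    const pos = ((1 - t) * pos0 / w0 + t * pos1 / w1) / invW;
    rows.push({ width, pos });
  }
  return rows;
}
```

With `w0 > w1` the rows come out widely spaced near the top and dense near the bottom. With `w0 === w1` the formula collapses to even linear spacing.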


I might also point out that we could shear or wobble the trapezoid per-scanline. Like there's a lot of funky effects we could do here... though it'd probably be hard to calculate them in an efficient manner, given the problems I pointed out earlier (i.e. the existing plain transformation is already in a 1:1 computation time ballpark).

In the non-perspective case the sides of the trapezoid become parallel (scale 0 = scale 1) and the scanlines become evenly spaced. In the above-and-below lawnmower-man effect one of the scales is inverted and the trapezoid crosses itself in an hourglass shape.


Edit: Another thought... maybe I should try to hack Mesen-S to be able to draw the mode 7 scanlines on the tilemap. It'd probably be a very useful visualization of existing implementations.
none
Posts: 102
Joined: Thu Sep 03, 2020 1:09 am

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by none »

I've restructured my code a little; in this version it should be easier to see what actually needs to be computed per scanline. I also tried to simulate the precision limitations of doing the math in 16-bit fixed point.

With some optimizations applied, it now looks very close to what a perspective-correcting rasterizer looks like. I.e. if you start with the rasterization idea, instead of raytracing, you could arrive at a very similar result.

With an 8x8 bit multiplication LUT and a fixed division LUT, all the math here can be done with five 16-bit additions, and five table lookups per scanline, if I count correctly - and precision still seems quite ok. I guess it should be doable in real time.

Also, what rainwarrior said about precomputing division tables still holds, and it's reflected here. Computing the divisor is only dependent on the camera pitch, nothing else - so if pitch were constant, the divisor could be precomputed for every scanline.

Code:


function cos(a) {return(Math.cos(a))}
function sin(a) {return(Math.sin(a))}
function floor(a) {return(Math.floor(a))}
function highbyte(a) {return(floor(a / 256))}

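// setup
// (var1, var2, scanline and rectangle() are assumed to be supplied by the host script environment)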

var FOV = 90;
var forward = 128 / Math.tan(FOV * (Math.PI * 2 / 360) / 2);

var yaw = var1 * 5 * Math.PI / 180;
var pitch = var2 * 5 * Math.PI / 180;

var camera_x = 0;
var camera_y = 0;
var camera_z = 16;

// sprite stuff

var sprite_x = -8;
var sprite_y = 40;
var sprite_z = 0;

sprite_x = camera_x - sprite_x;
sprite_y = camera_y - sprite_y;
sprite_z = camera_z - sprite_z;

var tf_sprite_x = sprite_x * cos(yaw) + sprite_y * sin(yaw);
var tf_sprite_y = sprite_x * -sin(yaw) * cos(pitch) + sprite_y * cos(yaw) * cos(pitch) + sprite_z * -sin(pitch);
var tf_sprite_z = sprite_x * -sin(yaw) * sin(pitch) + sprite_y * cos(yaw) * sin(pitch) + sprite_z * cos(pitch);

var ss_sprite_x = tf_sprite_x * forward / -tf_sprite_y + 128;
var ss_sprite_y = tf_sprite_z * forward / -tf_sprite_y + 112;

rectangle(ss_sprite_x - 8, ss_sprite_y - 8, 16, 16);

// mode 7 stuff

// constant across the frame

var dx = 256 * cos(yaw);
var dy = 256 * sin(yaw);

var ax = forward * -sin(yaw) * cos(pitch);
var ay = forward * cos(yaw) * cos(pitch);
var az = forward * sin(pitch) / camera_z;

var bx = sin(yaw) * sin(pitch);
var by = -cos(yaw) * sin(pitch);
var bz = cos(pitch) / camera_z;

// scale values for subpixel precision
// and simulate fixed point math

ax *= 4; ay *= 4; bx *= 4; by *= 4; camera_x *= 4; camera_y *= 4;

dx = floor(dx * 256); dy = floor(dy * 256);
ax = floor(ax * 256); ay = floor(ay * 256); az = floor(az * 256);
bx = floor(bx * 256); by = floor(by * 256); bz = floor(bz * 256);
camera_x = floor(camera_x); camera_y = floor(camera_y);

// per scanline

var cx = ax + (scanline - 112) * bx;
var cy = ay + (scanline - 112) * by;
var cz = az + (scanline - 112) * bz;

// this would require 16 bit by 16 bit division

// var point_center_x = camera_x + cx / cz;
// var point_center_y = camera_y + cy / cz;

// var offset_x = dx / cz;
// var offset_y = dy / cz;

// it is important that inverse z fits in 8 bits

var iz = floor(65536 / cz);  
if(iz < 0) iz = 0;
if(iz > 255) iz = 255;

var point_center_x = camera_x + highbyte(cx) * iz / 256;
var point_center_y = camera_y + highbyte(cy) * iz / 256;

var offset_x = highbyte(dx) * iz / 256;
var offset_y = highbyte(dy) * iz / 256;

m7a = offset_x;
m7b = point_center_x;
m7c = offset_y;
m7d = point_center_y;
m7x = 0;
m7y = 0;
m7hofs = -128;
m7vofs = -64 -scanline;

return [m7a, m7b, m7c, m7d, m7x, m7y, m7hofs, m7vofs];


Edit: Accidentally posted previous version of the code that still had a mistake.
rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

So I hacked Mesen-SX to visualize Mode 7 view frustums.
[Image: fzero_frustum.png]
It changes the tilemap viewer whenever mode 7 is active. The "show scroll overlay" rectangle is replaced with pink scanlines, showing how they have been rendered out of the tilemap.

It keeps a record of the last render state of each scanline, so if you stop mid-frame, half the frustum might be from the previous frame. Generally best to have it update at the bottom.


Edit: I've attached a build of the hack, in case anyone else wants to play with it. I ended up posting a bunch of examples on twitter because it was fun to take a look at them.
Attachments
Mesen-SX-mode7.zip
Windows x64 build of experimental Mesen-SX hack.
(2.15 MiB)
kulor
Posts: 33
Joined: Thu Mar 15, 2018 12:49 pm

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by kulor »

rainwarrior wrote: Wed Aug 03, 2022 8:58 am So, we're rasterizing a tilted quad. You have a horizontal scale and vertical position at the far/horizon/top scanline, and a different horizontal scale and vertical position at the bottom scanline. You interpolate between them from top to bottom, but the vertical spacing of that interpolation needs to be corrected with a division step (perspective divide).
I think the disconnect here is probably due to me using probably a very unorthodox solution for this problem, mainly because I had (have?) no idea what I was doing, couldn't understand any of the established explanations for how to accomplish the effect, and was kinda just trying to solve the problem visually with geometry. In my solution, I believe you're correct in that the interpolation needs to be corrected with a division step, and I believe that division step happens in the getm7y function. However, that function defines a m7y value that's valid for an entire perspective plane, it doesn't need to be recalculated more than once per frame.
The result of this is the texture offset needing to be corrected, which is another once-per-frame/plane value, and I'm fairly sure the "mystery curve" is somehow related to this m7y value...though I haven't been able to find a link yet.
But, in my implementation, the normalized height and the distance-to-scale multiplier are both calculated from the FOV, and one of the per-line multiplications uses distance-to-scale:

Code:

var scale = (1/sl) * distanceToScale;
...so if distance-to-scale is a const, I don't see any reason why (1/value) * const couldn't just be a lookup table, with the only consequence being that FOV wouldn't be adjustable; camera pitch, camera yaw, camera height would still be freely adjustable. In fact, this is how my script currently works, just because I did a lot of the calculations assuming an FOV of 60 and never put the work in to try to properly parameterize it. For me, having a non-adjustable FOV is a perfectly acceptable constraint for an SNES game, and I can't imagine many people would miss it.
Also this is amazing! I was really wishing for this exact thing a few months ago when I started trying to crank this out.
rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

kulor wrote: Wed Aug 03, 2022 10:21 pm In my solution, I believe you're correct in that the interpolation needs to be corrected with a division step, and I believe that division step happens in the getm7y function. However, that function defines a m7y value that's valid for an entire perspective plane, it doesn't need to be recalculated more than once per frame. ... I'm fairly sure the "mystery curve" is somehow related to this m7y value...though I haven't been able to find a link yet.
The implementations I've seen only update A/B/C/D each scanline and don't need to update M7X/Y or M7HOFS/M7VOFS. I kinda spelled this out a bit better a few posts back in my response to psycopathicteen but A/C is essential to set the scale of each scanline, and the position for a scanline can be set in a few different ways.

So, you could do it with B/D, or with HOFS (edit: or with Y, according to psycopathicteen). The value of M7X/Y can be anywhere along the centre-axis of your frustum, but the meaning of whichever parameters you decide to vary to set the scanline position will depend on all of B/D/M7X/Y/HOFS together. It's definitely possible that whatever you set for M7X/Y is causing an unexpected offset.
kulor wrote: Wed Aug 03, 2022 10:21 pm I don't see any reason why (1/value) * const couldn't just be a lookup table, with the only consequence being that FOV wouldn't be adjustable; camera pitch, camera yaw, camera height would still be freely adjustable.
I'm pretty certain that pitch cannot change without adjusting the table. I don't really understand how it could be otherwise... but I'd like to be able to follow the why (or why not, if I'm wrong) through the code. Unfortunately, at this point I couldn't figure out which code you're referring to. There are so many different snippets in the first post, and many more since. Can you point out which one is the snippet under discussion?

Whether FOV could change without rebuilding the table depends on what FOV means. If FOV is just the horizontal scale, then it would simply be a once-per-frame adjustment of the base scale to A/C. If FOV is a shared horizontal and vertical view factor, it might also affect similar depth stuff as pitch does, requiring a table rebuild.

Either way, I think existing games demonstrate that both these things can be done without tables. Eventually I'll finish my example and hopefully that can demonstrate it. Maybe it'd be fun to make a little dolly zoom demo afterward.
Last edited by rainwarrior on Fri Aug 05, 2022 10:06 pm, edited 1 time in total.
psycopathicteen
Posts: 3082
Joined: Wed May 19, 2010 6:12 pm

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by psycopathicteen »

rainwarrior wrote: Wed Aug 03, 2022 10:44 am
psycopathicteen wrote: Wed Aug 03, 2022 10:33 am Have one HDMA channel set to controlling Matrix A and Matrix B, while having another HDMA channel controlling BG1Y?
I believe you need to do 2 things per scanline:
  • 1. Set the u,v increment for each pixel along the row (matrix A and C).
  • 2. Set the position of the line of texels on the map (two ways to accomplish this).
So 1 is just matrix A/C. You have to do that either way.

2 is done in games I've looked at with matrix B/D, but you could also do it with M7HOFS/M7VOFS.

However, the equivalent calculation for HOFS/VOFS is a lot more complicated, at least if you want to deal with being able to rotate. If your camera points along the X or Y axis only, it gets a lot simpler and you could update only one of HOFS or VOFS.

Instead B/D is used, which accomplishes the same goal of shifting the new centre of the image for each scanline, but it does it relative to M7X/M7Y and, in the way FZ/FF6 do it, it becomes entangled with the horizontal scale as well.
I think only VOFS would need to be changed because it gets added to the Y-position before the matrix multiply happens.
kulor
Posts: 33
Joined: Thu Mar 15, 2018 12:49 pm

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by kulor »

rainwarrior wrote: Wed Aug 03, 2022 11:01 pm I'm pretty certain that pitch cannot change without adjusting the table. I don't really understand how it could be otherwise... but I'd like to be able to follow the why (or why not, if I'm wrong) through the code. Unfortunately, at this point I couldn't figure out which code you're referring to. There are so many different snippets in the first post, and many more since. Can you point out which one is the snippet under discussion?
Sure, it's snippet 7.3 in my OP. I suppose it might be difficult to find the "final version" at-a-glance, so I added a little blurb to the preamble to point it out. Also updated the script to make it a little easier to do multiple plane calculations, and also made it a bit easier to see what's happening per-frame or per-scanline.
Thinking about it some more, that (1/sl) * const line wouldn't be the perspective divide equivalent, it would be this:

Code:

var sl = lerp(1/topdist, 1/btmdist, scanline / 223);
But I don't think that would need to be a per-scanline divide/multiplication, because it doesn't actually need to be a lerp. 1/const could easily be a lookup table, then you can just do one divide for (1/btmdist - 1/topdist) / 223, and use that value to iteratively add to 1/topdist to get your per-scanline 1/scale, which you would then use as a value for the (1/value) * const lookup. If the lack of precision causes too much error buildup, you could do a handful of actual lerps and do iterative adding for the in-betweens of those. I believe the original Quake had something like this for texture perspective correction?
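
A quick sketch of that incremental form, using integer fixed point to show where the error buildup would come from. The step is taken as (1/btmdist - 1/topdist) / 223 so that adding it walks from the top value to the bottom one. `lerp` stands in for the script's helper, and `fracBits` and the function names are assumptions made for this example.

```javascript
function lerp(a, b, t) { return a + (b - a) * t; }

// Direct form: a multiply per scanline.
function reciprocalsDirect(topdist, btmdist, lines = 224) {
  const out = [];
  for (let s = 0; s < lines; s++) {
    out.push(lerp(1 / topdist, 1 / btmdist, s / (lines - 1)));
  }
  return out;
}

// Incremental form: the divides happen once per frame, then each
// scanline is a single fixed-point add.
function reciprocalsStepped(topdist, btmdist, fracBits, lines = 224) {
  const one = 2 ** fracBits;
  let acc = Math.round(one / topdist);
  const step = Math.round((one / btmdist - one / topdist) / (lines - 1));
  const out = [];
  for (let s = 0; s < lines; s++) {
    out.push(acc / one);
    acc += step; // rounding error in step accumulates line by line
  }
  return out;
}

// Largest absolute difference between the two forms.
function maxError(a, b) {
  return a.reduce((m, v, i) => Math.max(m, Math.abs(v - b[i])), 0);
}
```

With distances of 256 and 16, 16 fractional bits drift by a bit under 0.001 by the bottom of the screen, while 24 bits stay under 1e-5. That gives a rough feel for how often a real lerp might be needed to re-anchor the running value.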
rainwarrior wrote: Wed Aug 03, 2022 11:01 pm Either way, I think existing games demonstrate that both these things can be done without tables. Eventually I'll finish my example and hopefully that can demonstrate it. Maybe it'd be fun to make a little dolly zoom demo afterward.
Yeah, to be clear I'm not saying "dolly zoom is impossible on SNES" by any means. I'm just saying that, in my implementation, dynamic vertical FOV makes the const in (1/value) * const into a variable, so dynamic vertical FOV would add another per-scanline multiply. For my purposes, FOV is the angle between the top and bottom edges of the viewing frustum (which I simplified to a triangle in my explanation). Personally, I'd just take having a fixed FOV as a constraint over spending the effort on the extra per-scanline multiply...but a solution that only needed 2 (or less??) multiplies per-scanline and could accommodate camera pitch, height, yaw, and FOV would be superior.
rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

kulor wrote: Thu Aug 04, 2022 8:13 am Thinking about it some more, that (1/sl) * const line wouldn't be the perspective divide equivalent, it would be this:

Code:

var sl = lerp(1/topdist, 1/btmdist, scanline / 223);
But I don't think that would need to be a per-scanline divide/multiplication, because it doesn't actually need to be a lerp. 1/const could easily be a lookup table, then you can just do one divide for (1/btmdist - 1/topdist) / 223, and use that value to iteratively add to 1/topdist to get your per-scanline 1/scale, which you would then use as a value for the (1/value) * const lookup.
1/sl is indeed the divide step, and when rasterizing what you've described is exactly how a lerp should be accomplished (adding a fixed amount each iteration).

Your topdist/btmdist do seem to be dependent on these three things: cam.FOV, cam.pitch, cam.y

So, changing any of those 3 should necessitate a new table, I think? (This is definitely the set of things I would expect to affect the divide.)

Also, that clarifies that by FOV you meant both horizontal and vertical at the same time. If you had two independent FOV for horizontal and vertical, only the vertical one would be relevant to the perspective divide. (...and if you did have independent horizontal scale, it could do a very good dolly-zoom-like effect without regenerating the table.)
kulor wrote: Thu Aug 04, 2022 8:13 am If the lack of precision causes too much error buildup, you could do a handful of actual lerps and do iterative adding for the in-betweens of those. I believe the original Quake had something like this for texture perspective correction?
Incidentally, I have Abrash's Black Book in a stack that props up my trackball, so it's literally sitting right next to me.

The relevant stuff is in chapter 70 (Quake: a Post Mortem), in the subsection "Drawing the World". He describes calculating the full result only once per each 16-pixel horizontal span (or at any triangle edge), and interpolating between them.

In our case our horizontal spans are already linear, every scanline represents a constant depth along the plane. In the vertical case... 16 pixels is probably far too much to interpolate and have it look nice. Quake wasn't subdividing vertically, AFAIK. I think every second line will probably be fine (will report back after I try it), but I think much more than that would start to degrade quality quickly.
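
A minimal sketch of the every-second-line idea, assuming some `exact(scanline)` function that does the full calculation. `fillEveryOtherLine` is a made-up name, and averaging the two neighbours is just one plausible way to fill the gaps.

```javascript
// Compute even scanlines exactly and fill odd ones by averaging their
// neighbours. A trailing odd line with no neighbour below is computed
// exactly instead.
function fillEveryOtherLine(exact, lines = 224) {
  const out = new Array(lines);
  for (let s = 0; s < lines; s += 2) {
    out[s] = exact(s);
  }
  for (let s = 1; s < lines; s += 2) {
    out[s] = (s + 1 < lines) ? (out[s - 1] + out[s + 1]) / 2 : exact(s);
  }
  return out;
}
```

For a 1/x-shaped curve the averaged lines come out slightly too large, most noticeably where the curve bends hardest.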

A worse situation... floors and walls in PlayStation 1 games had notoriously "swimmy" textures for want of hardware perspective correction. Generally what they had to do was subdivide the mesh as much as they could afford, because you get perspective-correct vertices but everything in between is interpolated. (Interesting video discussing this.)

Based on the uneven spacing pattern I see in Seiken Densetsu 3's flying intro in my hacked viewer, I suspect it might be interpolating 2 out of each 3 lines, though I haven't hit it with a debugger to analyze the code.
none
Posts: 102
Joined: Thu Sep 03, 2020 1:09 am

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by none »

Here's yet another iteration that has the lookup tables simulated.

The multiplication table would need to be quite big, but you can get away with the reciprocal table only having a few entries (1024 in this case). Also, both tables can be used for general purposes.

I've also tried combining the tables into one; it works, but precision suffers a lot.

There's a tradeoff between precision and view distance, but maybe it can be improved upon, for example by choosing the radix more carefully.

Code:

function cos(a) {return(Math.cos(a))}
function sin(a) {return(Math.sin(a))}
function floor(a) {return(Math.floor(a))}
function highbyte(a) {return(floor(a / 256))}
function high10bit(a) {return(floor(a / 64))}
function clamp(a) {return a<0?0:a>255?255:a}
//function clamphalf(a) {return a<0?0:a>127?127:a}
function clampsigned(a) {return a<-128?-128:a>127?127:a}
function clamp10bit(a) {return a<0?0:a>1023?1023:a}

function lut_mul(a,b) {return floor(clampsigned(a)*clamp(b)/32) }

function lut_reciprocal(a) {return clamp(floor(256 * 64 / clamp10bit(a))) }

// setup

var FOV = 90;
var forward = 128 / Math.tan(FOV * (Math.PI * 2 / 360) / 2);

var yaw = var1 * 2 * Math.PI / 180;
var pitch = var2 * 2 * Math.PI / 180;

var camera_x = 0;
var camera_y = 0;
var camera_z = 32;

// sprite stuff

var sprite_x = -8;
var sprite_y = 40;
var sprite_z = 0;

sprite_x = camera_x - sprite_x;
sprite_y = camera_y - sprite_y;
sprite_z = camera_z - sprite_z;

var tf_sprite_x = sprite_x * cos(yaw) + sprite_y * sin(yaw);
var tf_sprite_y = sprite_x * -sin(yaw) * cos(pitch) + sprite_y * cos(yaw) * cos(pitch) + sprite_z * -sin(pitch);
var tf_sprite_z = sprite_x * -sin(yaw) * sin(pitch) + sprite_y * cos(yaw) * sin(pitch) + sprite_z * cos(pitch);

var ss_sprite_x = tf_sprite_x * forward / -tf_sprite_y + 128;
var ss_sprite_y = tf_sprite_z * forward / -tf_sprite_y + 112;

rectangle(ss_sprite_x - 8, ss_sprite_y - 8, 16, 16);

// mode 7 stuff

// constant across the frame

var dx = 256 * cos(yaw);
var dy = 256 * sin(yaw);

var ax = forward * -sin(yaw) * cos(pitch);
var ay = forward * cos(yaw) * cos(pitch);
var az = forward * sin(pitch) / camera_z;

var bx = sin(yaw) * sin(pitch);
var by = -cos(yaw) * sin(pitch);
var bz = cos(pitch) / camera_z;

// scale values for subpixel precision
// and simulate fixed point math

ax *= 4; ay *= 4; bx *= 4; by *= 4; camera_x *= 4; camera_y *= 4;

dx = floor(dx * 63); dy = floor(dy * 63);
ax = floor(ax * 63); ay = floor(ay * 63); az = floor(az * 256 * 16);
bx = floor(bx * 63); by = floor(by * 63); bz = floor(bz * 256 * 16);
camera_x = floor(camera_x); camera_y = floor(camera_y);

// per scanline

var cx = ax + (scanline - 112) * bx;
var cy = ay + (scanline - 112) * by;
var cz = az + (scanline - 112) * bz;

var iz = lut_reciprocal(high10bit(cz));

var point_center_x = camera_x + lut_mul(highbyte(cx), iz);
var point_center_y = camera_y + lut_mul(highbyte(cy), iz);

var offset_x = lut_mul(highbyte(dx), iz);
var offset_y = lut_mul(highbyte(dy), iz);

m7a = offset_x;
m7b = point_center_x;
m7c = offset_y;
m7d = point_center_y;
m7x = 0;
m7y = 0;
m7hofs = -128;
m7vofs = -64 -scanline;

return [m7a, m7b, m7c, m7d, m7x, m7y, m7hofs, m7vofs];
rainwarrior wrote: Thu Aug 04, 2022 11:44 am So, changing any of those 3 should necessitate a new table, I think? (This is definitely the set of things I would expect to affect the divide.)
Can you clarify what kind of division table you mean? Is it a table that gives values you multiply with something else, or is it a fixed HDMA table that does not need to be changed at all? If it's the former, the above code should demonstrate that you only actually need one reciprocal table, regardless of pitch, FOV, or camera height - although you could argue that precision is not as good as it could be with a per-scanline table.
rainwarrior wrote: Thu Aug 04, 2022 11:44 am The relevant stuff is in chapter 70 (Quake: a Post Mortem), in the subsection "Drawing the World". He describes calculating the full result only once per each 16-pixel horizontal span (or at any triangle edge), and interpolating between them.

In our case our horizontal spans are already linear, every scanline represents a constant depth along the plane. In the vertical case... 16 pixels is probably far too much to interpolate and have it look nice. Quake wasn't subdividing vertically, AFAIK. I think every second line will probably be fine (will report back after I try it), but I think much more than that would start to degrade quality quickly.
Wasn't it 8 pixels? I think they did it that way in Quake, among other things, because division would stall the Pentium's instruction pipeline if they did it too often, and that doesn't really apply here.

Anyways I think 8 or 16 pixels of linear interpolation wouldn't look all too bad depending on the steepness of the perspective.

However, I actually think this kind of interpolation would not buy a real benefit on SNES hardware because everything has to be done with lookup tables anyway (because the Mode 7 hardware is busy, so no hardware multiplication or division). Even if hardware was used, performance isn't much worse with division than with multiplication (13 against 5 wait cycles I think), so the time you save there you will lose by doing the actual interpolation itself, I'd wager.

There's another thing that I've been thinking about, and that is avoiding the need for the additional HDMA channel for setting VOFS. I just can't find a good way around that without losing lots of precision. You could try and divide b and d by the current scanline number, for example, but I don't like that option.
93143
Posts: 1591
Joined: Fri Jul 04, 2014 9:31 pm

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by 93143 »

none wrote: Thu Aug 04, 2022 1:28 pm However, actually I think this kind of interpolation would not buy a real benefit on snes hardware because everything has to be done with lookup tables anyway (because mode7 hardware is busy so no hardware multiplication or division). Even if hardware was used, performance isn't much worse with division than with multiplication (13 against 5 wait cycles I think), so the time you save there you will lose by doing the actual interpolation itself, I'd wager.
Mode 7 only ties up the instantaneous 16x8 signed multiplier. The 8-cycle 8x8 unsigned multiplier and 16-cycle 16/8 unsigned divider are for the exclusive use of the S-CPU, and the only caveat is that you can't use them both inside and outside an interrupt routine, because they're accessed through shared MMIO registers and an interrupt could clobber an operation in progress.
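
For anyone prototyping in JavaScript like the rest of the thread, here is a minimal sketch of what those two S-CPU units compute. The function names `cpuMul8x8` and `cpuDiv16by8` are made up for illustration. The divide-by-zero result (quotient $FFFF, remainder equal to the dividend) is the commonly documented hardware behaviour. The sketch only models the results, not the cycle timing or the MMIO registers.

```javascript
// 8x8 unsigned multiply: both operands are truncated to 8 bits,
// so the product always fits in 16 bits.
function cpuMul8x8(a, b) {
  return ((a & 0xFF) * (b & 0xFF)) & 0xFFFF;
}

// 16/8 unsigned divide: 16-bit dividend, 8-bit divisor,
// giving a 16-bit quotient and a 16-bit remainder.
function cpuDiv16by8(dividend, divisor) {
  const n = dividend & 0xFFFF;
  const d = divisor & 0xFF;
  if (d === 0) {
    // Commonly documented divide-by-zero result.
    return { quotient: 0xFFFF, remainder: n };
  }
  return { quotient: Math.floor(n / d), remainder: n % d };
}

console.log(cpuMul8x8(200, 3));    // 600
console.log(cpuDiv16by8(1000, 7)); // { quotient: 142, remainder: 6 }
```

Both units are unsigned, so a signed numerator needs its sign stripped before the divide and reapplied afterwards.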
none
Posts: 102
Joined: Thu Sep 03, 2020 1:09 am

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by none »

93143 wrote: Thu Aug 04, 2022 1:34 pm
none wrote: Thu Aug 04, 2022 1:28 pm However, actually I think this kind of interpolation would not buy a real benefit on SNES hardware because everything has to be done with lookup tables anyway (because Mode 7 hardware is busy so no hardware multiplication or division). Even if hardware was used, performance isn't much worse with division than with multiplication (13 against 5 wait cycles I think), so the time you save there you will lose by doing the actual interpolation itself, I'd wager.
Mode 7 only ties up the instantaneous 16x8 signed multiplier. The 8-cycle 8x8 unsigned multiplier and 16-cycle 16/8 unsigned divider are for the exclusive use of the S-CPU.
Oh, I didn't know that; that changes things. So it would actually be possible to do the division in place, like this. That would give much better precision and take a little longer, but it still seems feasible to do in real time.

Code:

function cos(a) {return(Math.cos(a))}
function sin(a) {return(Math.sin(a))}
function floor(a) {return(Math.floor(a))}
function highbyte(a) {return(floor(a / 256))}
function high10bit(a) {return(floor(a / 64))}
function clamp(a) {return a<0?0:a>255?255:a}
//function clamphalf(a) {return a<0?0:a>127?127:a}
function clampsigned(a) {return a<-128?-128:a>127?127:a}
function clamp10bit(a) {return a<0?0:a>1023?1023:a}


// setup

var FOV = 90;
var forward = 128 / Math.tan(FOV * (Math.PI * 2 / 360) / 2);

// var1 and var2 are inputs supplied from outside this snippet
var yaw = var1 * 2 * Math.PI / 180;
var pitch = var2 * 2 * Math.PI / 180;

var camera_x = 20;
var camera_y = 0;
var camera_z = 16;

// sprite stuff

var sprite_x = -8;
var sprite_y = 40;
var sprite_z = 0;

sprite_x = camera_x - sprite_x;
sprite_y = camera_y - sprite_y;
sprite_z = camera_z - sprite_z;

var tf_sprite_x = sprite_x * cos(yaw) + sprite_y * sin(yaw);
var tf_sprite_y = sprite_x * -sin(yaw) * cos(pitch) + sprite_y * cos(yaw) * cos(pitch) + sprite_z * -sin(pitch);
var tf_sprite_z = sprite_x * -sin(yaw) * sin(pitch) + sprite_y * cos(yaw) * sin(pitch) + sprite_z * cos(pitch);

var ss_sprite_x = tf_sprite_x * forward / -tf_sprite_y + 128;
var ss_sprite_y = tf_sprite_z * forward / -tf_sprite_y + 112;

rectangle(ss_sprite_x - 8, ss_sprite_y - 8, 16, 16);

// mode 7 stuff

// constant across the frame

var dx = 256 * cos(yaw);
var dy = 256 * sin(yaw);

var ax = forward * -sin(yaw) * cos(pitch);
var ay = forward * cos(yaw) * cos(pitch);
var az = forward * sin(pitch) / camera_z;

var bx = sin(yaw) * sin(pitch);
var by = -cos(yaw) * sin(pitch);
var bz = cos(pitch) / camera_z;

// scale values for subpixel precision
// and simulate fixed point math

ax *= 4; ay *= 4; bx *= 4; by *= 4; camera_x *= 4; camera_y *= 4;

dx = floor(dx * 63); dy = floor(dy * 63);
ax = floor(ax * 63); ay = floor(ay * 63); az = floor(az * 256 * 16);
bx = floor(bx * 63); by = floor(by * 63); bz = floor(bz * 256 * 16);
camera_x = floor(camera_x); camera_y = floor(camera_y);

// per scanline

var cx = ax + (scanline - 112) * bx;
var cy = ay + (scanline - 112) * by;
var cz = az + (scanline - 112) * bz;

// 8-bit divisor for the 16/8 divide; a value of 0 is not guarded against here
var depth = clamp(highbyte(cz));

var point_center_x = camera_x + cx / depth;
var point_center_y = camera_y + cy / depth;

var offset_x = dx / depth;
var offset_y = dy / depth;

m7a = offset_x;
m7b = point_center_x;
m7c = offset_y;
m7d = point_center_y;
m7x = 0;
m7y = 0;
m7hofs = -128;
m7vofs = -64 -scanline;

return [m7a, m7b, m7c, m7d, m7x, m7y, m7hofs, m7vofs];
rainwarrior
Posts: 8399
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

none wrote: Thu Aug 04, 2022 1:28 pm can you clarify what kind of division table you mean?
I meant a table replacing the result in "scale" (i.e. the lerp+1/sl etc.) for each scanline: one set of values per camera setting (excluding the rotation).

Is the actual range of inputs for all camera settings narrow enough to generically just be a table of reciprocals?
none wrote: Thu Aug 04, 2022 1:28 pm Wasn't it 8 pixels? I think they did in Quake that way amongst other things because division would stall the Pentium's instruction pipeline if they did it too often, and that doesn't really apply here.
It said 16 in the chapter of the book I cited. I don't have any firsthand knowledge of Quake's source. The book is available for free online if you want to read it. There's a whole chapter about Quake, and it does talk about Pentium pipeline stuff.
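
On the "table of reciprocals" question above, here is a rough sketch of what a generic table could look like: 256 entries of 65536/n, so a divide by an 8-bit value becomes a multiply and a 16-bit shift. The table size, the scaling by 65536 and the names `RECIP` and `divViaTable` are assumptions for illustration, not what any of the games mentioned actually do.

```javascript
// Hypothetical reciprocal table: RECIP[n] = floor(65536 / n) for n = 1..255.
// Entry 0 is unused and set to the largest 16-bit value as a stand-in.
// RECIP[1] is 65536, which needs 17 bits; a real 16-bit table would
// have to special-case it.
const RECIP = new Uint32Array(256);
RECIP[0] = 0xFFFF;
for (let n = 1; n < 256; n++) {
  RECIP[n] = Math.floor(65536 / n);
}

// Approximate x / n (x unsigned 16-bit, n in 1..255) with a multiply
// and a shift. Because the table entries are rounded down, the result
// can come out one lower than the true quotient.
function divViaTable(x, n) {
  return Math.floor((x * RECIP[n]) / 65536);
}

console.log(divViaTable(1000, 8)); // 125
```

Whether one generic table like this is precise enough depends on the range of depths the camera settings actually produce, which is the open question.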