Kulor's Guide to Mode 7 Perspective Planes

rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Kulor's Guide to Mode 7 Perspective Planes

Post by rainwarrior »

An animation that adds framecount to S0, demonstrating independent horizontal scaling:
zolly.gif
Not exactly a "dolly zoom", but similar. We can see that the bottom and top centre-points are fixed, along with the bottom width, and the only thing that's changing is the top width (i.e. the horizontal field of view).

Simulating a real camera's dolly zoom would need a somewhat more complicated setup: pulling the camera back while simultaneously increasing both horizontal and vertical scale. The result would still boil down to a trapezoid that this can render, though, so even the real thing is doable if you want to put in the work.
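For reference, the usual pinhole-camera rule behind a dolly zoom: a subject of width w at distance d exactly fills a horizontal FOV when w = 2·d·tan(fov/2), so keeping the subject's on-screen size fixed while the FOV changes means moving the camera to d = w / (2·tan(fov/2)). This is a minimal sketch assuming a symmetric horizontal FOV; `dollyDistance` is a hypothetical helper, not part of any script in this thread.

```javascript
// Distance at which a subject of the given width exactly fills a horizontal
// field of view of fovDeg degrees (pinhole camera model).
function dollyDistance(subjectWidth, fovDeg) {
  const halfFov = (fovDeg / 2) * Math.PI / 180;
  return subjectWidth / (2 * Math.tan(halfFov));
}

// Narrowing the FOV while pulling the camera back (or the reverse) keeps the
// subject the same size on screen; the background changes scale instead.
for (const fov of [30, 60, 90]) {
  console.log(fov, dollyDistance(256, fov).toFixed(1));
}
```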
Pokun
Posts: 2681
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Post by Pokun »

This thread is so amazing that it should be stickied or referred to from a relevant mode 7 page on the wiki or something like that.
Finally, a homebrew deep dive into the system's most famous display mode!
ehaliewicz
Posts: 18
Joined: Thu Oct 10, 2013 3:30 pm

Post by ehaliewicz »

rainwarrior wrote: Thu Aug 04, 2022 11:44 am
Incidentally, I have Abrash's Black Book in a stack that props up my trackball, so it's literally sitting right next to me.

The relevant stuff is in chapter 70 (Quake: a Post Mortem), in the subsection "Drawing the World". He describes calculating the full result only once per 16-pixel horizontal span (or at any triangle edge), and interpolating between those points.

In our case the horizontal spans are already linear: every scanline represents a constant depth along the plane. In the vertical case... 16 pixels is probably far too much to interpolate and have it look nice. Quake wasn't subdividing vertically, AFAIK. I think every second line will probably be fine (will report back after I try it), but I think much more than that would start to degrade quality quickly.

I think you can get away with more than just one scanline, even for flat or vertical surfaces. My engine on the Genesis does a similar trick. Textures have 4px-wide columns, and I draw two of those columns for each perspective division, which comes out to 8px for each affine chunk of columns, and I can't really see any distortion (maybe because the whole renderer is a little sloppy :) ). The key is that surfaces with large perspective distortion generally also tend to cover less screen space, e.g. walls at narrow angles, although that may not apply as much to mode 7-style planes.
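A minimal sketch of the subdivision idea from both posts above: compute the exact perspective-correct coordinate only at chunk boundaries and interpolate linearly (affinely) inside each chunk. It assumes a planar surface, where u/z and 1/z are linear in screen space; the span values and the helper names (`exactU`, `subdividedSpan`, `maxSpanError`) are made up for illustration, and the error number says nothing about what actually looks acceptable on real hardware.

```javascript
// Perspective-correct texture coordinate at fraction t across a span whose
// endpoints have coordinates u0, u1 at depths z0, z1: u/z and 1/z are
// linear in screen space, u itself is not.
function exactU(u0, z0, u1, z1, t) {
  const uOverZ = (1 - t) * (u0 / z0) + t * (u1 / z1);
  const invZ = (1 - t) * (1 / z0) + t * (1 / z1);
  return uOverZ / invZ;
}

// Divide only at chunk boundaries, interpolate linearly inside each chunk.
function subdividedSpan(u0, z0, u1, z1, width, chunk) {
  const tAt = (x) => x / (width - 1);
  const out = [];
  for (let x = 0; x < width; x++) {
    const xa = Math.floor(x / chunk) * chunk;   // exact sample at chunk start
    const xb = Math.min(xa + chunk, width - 1); // exact sample at chunk end
    const ua = exactU(u0, z0, u1, z1, tAt(xa));
    if (xb === xa) {
      out.push(ua);
      continue;
    }
    const ub = exactU(u0, z0, u1, z1, tAt(xb));
    out.push(ua + ((ub - ua) * (x - xa)) / (xb - xa));
  }
  return out;
}

// Largest deviation from the fully perspective-correct span, in texels.
function maxSpanError(u0, z0, u1, z1, width, chunk) {
  const approx = subdividedSpan(u0, z0, u1, z1, width, chunk);
  let worst = 0;
  for (let x = 0; x < width; x++) {
    const exact = exactU(u0, z0, u1, z1, x / (width - 1));
    worst = Math.max(worst, Math.abs(approx[x] - exact));
  }
  return worst;
}

// A 64-pixel span receding from depth 1 to depth 4, texture u from 0 to 64.
for (const chunk of [1, 8, 16, 64]) {
  console.log(chunk, maxSpanError(0, 1, 64, 4, 64, chunk).toFixed(3));
}
```

A chunk of 1 is the fully correct case; a chunk equal to the span width is plain affine mapping.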
kulor
Posts: 33
Joined: Thu Mar 15, 2018 12:49 pm

Post by kulor »

I still need to update the guide, but here's probably the final-final version of the script:

Code: Select all

var baseXOffset = 128;
var baseYOffset = 112;

var lerp = function(v0, v1, t) {
	return v0 + t * (v1 - v0);
};

var rad = function(d) {
	return d * Math.PI / 180;
};

var dist = function(cam, a) {
	return cam.y / Math.cos(rad(a));
}

var texelSpan = function(cam, a) {
	return Math.tan(rad(a)) * cam.y;
}

var setFovDependencies = function(cam) {
	cam.normalizedHeight = baseYOffset / Math.tan(rad(cam.fov / 2));
	cam.distanceToScale = 256 / (cam.normalizedHeight / Math.cos(rad(cam.fov / 2)));
	return cam;
}

var getm7y = function(cam, topa, btma, neg) {
	var i, ta, ba, td, bd;
	ta = topa;
	ba = btma;
	td = dist(cam, ta);
	bd = dist(cam, ba);
	if (neg) {
		i = (texelSpan(cam, ta) + texelSpan(cam, ba)) * ((cam.normalizedHeight / Math.cos(rad(cam.fov / 2))) / 224);
	}
	else {
		i = (texelSpan(cam, ta) - texelSpan(cam, ba)) * ((cam.normalizedHeight / Math.cos(rad(cam.fov / 2))) / 224);
	}
	var dd = td - bd;
	var ib = i - bd;
	if (dd == 0) {
		return 112;
	}
	else {
		return ib / dd * 223;
	}
}

var getModelViewMatrix = function(cam) {
	var p = cam.pitch;
	var w = 360 - cam.yaw;
	var x = cam.x;
	var y = cam.y;
	var z = cam.z;
	return [
		[Math.cos(rad(w)), Math.sin(rad(p)) * Math.sin(rad(w)), -Math.sin(rad(w)) * Math.cos(rad(p)), 0], 
		[0, Math.cos(rad(p)), Math.sin(rad(p)), 0], 
		[-Math.sin(rad(w)), Math.sin(rad(p)) * Math.cos(rad(w)), -Math.cos(rad(p)) * Math.cos(rad(w)), 0], 
		[
			x * -Math.cos(rad(w)) + z * Math.sin(rad(w)), 
			x * Math.sin(rad(p)) * -Math.sin(rad(w)) + y * -Math.cos(rad(p)) + z * Math.sin(rad(p)) * -Math.cos(rad(w)), 
			x * Math.sin(rad(w)) * Math.cos(rad(p)) + y * -Math.sin(rad(p)) + z * Math.cos(rad(p)) * Math.cos(rad(w)), 
			1
		]
	];
}

var getProjectionMatrix = function(cam) {
	//far and near clipping planes, hardcoding because they're not really used otherwise
	var n = 0.3;
	var f = 10000;
	return [
		[(1/Math.tan((cam.fov/2)*(Math.PI/180)))/(8/7), 0, 0, 0], 
		[0, 1/Math.tan((cam.fov/2)*(Math.PI/180)), 0, 0], 
		[0, 0, -(f + n) / (f - n), -1], 
		[0, 0, -2 * n * f / (f - n), 0]
	];
}

var pointTimesMatrix = function(p, m) {
	return [
		m[0][0] * p[0] + m[1][0] * p[1] + m[2][0] * p[2] + m[3][0] * p[3], 
		m[0][1] * p[0] + m[1][1] * p[1] + m[2][1] * p[2] + m[3][1] * p[3], 
		m[0][2] * p[0] + m[1][2] * p[1] + m[2][2] * p[2] + m[3][2] * p[3], 
		m[0][3] * p[0] + m[1][3] * p[1] + m[2][3] * p[2] + m[3][3] * p[3]
	];
}

var normalizePoint = function(p) {
	return [p[0]/p[3], p[1]/p[3], p[2]/p[3], p[3]/p[3]];
}

var calcPlane = function(cam) {
	//prep (per-frame)
	var da = 90 - cam.pitch;
	var hfov = cam.fov / 2;
	var topa = da + hfov;
	var btma = da - hfov;
	var negedge = btma < 0;
	btma = Math.abs(btma);
	//centering offset (per-frame)
	var lineoffs = getm7y(cam, topa, btma, negedge);
	//rectangle(125, lineoffs-3, 6, 6, "green", true)
	//texel recentering (per-frame)
	topa = Math.abs(topa);
	var camcenter = (-Math.tan(((cam.pitch - 90) * Math.PI) / 360) * cam.normalizedHeight) * (cam.y / cam.normalizedHeight);
	var voffscentercomp = Math.cos(cam.yaw/180 * Math.PI) * camcenter;
	var hoffscentercomp = -Math.sin(cam.yaw/180 * Math.PI) * camcenter;
	//rotation (per-frame)
	var a = rad(cam.yaw);
	//scale part 1 (per-frame)
	var topdist = dist(cam, topa);
	var btmdist = dist(cam, btma);
	//scale part 2 (per-scanline)
	var sl = lerp(1/topdist, 1/btmdist, scanline / 223);
	var scale = (1/sl) * cam.distanceToScale;
	
	var ret = {};
	ret.sl = sl;
	ret.m7a = Math.cos(a) * scale;
	ret.m7b = Math.sin(a) * scale;
	ret.m7c = -Math.sin(a) * scale;
	ret.m7d = Math.cos(a) * scale;
	ret.m7hofs = -baseXOffset + hoffscentercomp + cam.x;
	ret.m7vofs = -baseYOffset + (112 - lineoffs) - voffscentercomp - cam.z;
	//use the offsets computed above (not the global m7hofs/m7vofs)
	ret.m7x = 128 + ret.m7hofs;
	ret.m7y = ret.m7vofs + lineoffs;
	return ret;
}

var groundCam = setFovDependencies({
	x: Math.sin(((framecount / 10) % 256) / 256 * 2 * Math.PI) * 512,
	y: var2 * 5,
	z: Math.cos(((framecount / 10) % 256) / 256 * 2 * Math.PI) * 512,
	fov: 60,
	pitch: var1,
	yaw: 360 - var3 * 5,
});

//Do sprite transforms
var objs = [
	{
		x: 0, y: 0, z: 0
	},
	{
		x: 50, y: Math.abs(Math.sin(framecount / 30)) * 50, z: 50
	},
	{
		x: 100, y: Math.abs(Math.sin(framecount / 30)) * 100, z: 100
	}
]
//the camera doesn't change per object, so build the matrices once
var matmv = getModelViewMatrix(groundCam);
var matp = getProjectionMatrix(groundCam);
for (var e = 0; e < objs.length; e++) {
	var cur = objs[e];
	var intermediary = pointTimesMatrix([cur.x, cur.y, cur.z, 1], matmv);
	var transformed = pointTimesMatrix(intermediary, matp);
	var normalized = normalizePoint(transformed);
	cur.ny = 112 + normalized[1] * 112;
	if (normalized[2] < 1) {
		var sprxpos = 128 + normalized[0] * 128;
		var sprypos = 112 + normalized[1] * 112;
		rectangle(sprxpos - 3, 224 - sprypos - 3, 6, 6, "green", true);
	}
}

//Do plane effect
var gp = calcPlane(groundCam);

if (gp.sl > 0) {
	m7a = gp.m7a;
	m7b = gp.m7b;
	m7c = gp.m7c;
	m7d = gp.m7d;
	m7x = gp.m7x;
	m7y = gp.m7y;
	m7hofs = gp.m7hofs;
	m7vofs = gp.m7vofs;	
}
else {
	m7a = 0x0000;
	m7b = 0x0000;
	m7c = 0x0000;
	m7d = 0x0000;
	m7x = 128;
	m7y = 112;
	m7hofs = 0;
	m7vofs = 0;
}
return [m7a, m7b, m7c, m7d, m7x, m7y, m7hofs, m7vofs];
Image

Variable FOV!
Well, for some reason it breaks down at FOV < 15... but I don't really care enough to try to fix it. I doubt any actual SNES games would bother to calculate a variable FOV anyway, but at least now this "ideal" isn't locked to an FOV of 60.

And, for good measure, the now-infamous SNES dolly zoom:
Image
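The per-scanline step in calcPlane above lerps reciprocals of distance (`lerp(1/topdist, 1/btmdist, scanline / 223)`) rather than the distances themselves. As far as I can tell from the script, the top and bottom rays are both fov/2 away from the view direction, so those reciprocals are 1/depth times the same cos(fov/2) factor, and 1/depth on a flat plane is exactly linear in screen row. The sketch below checks that linearity numerically with a plain ray/ground-plane intersection; `depthAtRow` and the camera numbers are illustrative assumptions, not part of kulor's script.

```javascript
// Hypothetical helper: view-space depth of the ground point seen by a screen
// row, for a camera at height h pitched down by pitchDeg with focal length f
// (pixels). Rows are measured from the screen centre, positive = downward.
function depthAtRow(h, pitchDeg, f, row) {
  const p = pitchDeg * Math.PI / 180;
  // Downward world-space component of the view ray (0, -row, f) once the
  // camera is pitched down by p. Rays that don't point down never hit.
  const down = f * Math.sin(p) + row * Math.cos(p);
  if (down <= 0) return Infinity;
  return (f * h) / down;
}

// Example camera steep enough that every row (-112..111) sees the ground.
const h = 64, pitchDeg = 60, f = 128;
const topRow = -112, bottomRow = 111;
const invTop = 1 / depthAtRow(h, pitchDeg, f, topRow);
const invBottom = 1 / depthAtRow(h, pitchDeg, f, bottomRow);

// Lerp 1/depth between the edges (the same shape as calcPlane's `sl`) and
// compare with the exact reciprocal on every row.
let worst = 0;
for (let row = topRow; row <= bottomRow; row++) {
  const t = (row - topRow) / (bottomRow - topRow);
  const lerped = invTop + t * (invBottom - invTop);
  const exact = 1 / depthAtRow(h, pitchDeg, f, row);
  worst = Math.max(worst, Math.abs(lerped - exact));
}
console.log("largest 1/depth error:", worst);
```

The remaining error is floating-point noise only; lerping the depths themselves instead of their reciprocals would not have this property.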
none
Posts: 117
Joined: Thu Sep 03, 2020 1:09 am

Post by none »

I've noticed that in my previous attempts I wasn't checking whether all the values fit in their respective ranges when the math is done in 16 bits.
It's actually a little trickier than I expected to achieve acceptable precision.

Here's a code snippet for comparing a few different variations. Using the "mode" variable, you can choose one of three:
  • division: this could be implemented with hardware division, but since the hardware divide is unsigned, it would need additional code to correct for the sign of the numerator (the denominator is always unsigned). This is the most accurate method but probably also the slowest. Per-scanline cost: five 16-bit additions and four 16-bit signed by 8-bit unsigned divisions.
  • hardware multiplication with reciprocal: this multiplies B and D by 1/z using the mode 7 registers. It should be the fastest and least accurate of the three methods. Per-scanline cost: two 24-bit + 16-bit additions, one 16-bit addition, one table lookup, some bit shifting, and two 8-bit signed by 8-bit unsigned multiplications. If a multiplication LUT is provided, both of those could be done via indirect HDMA.
  • software multiplication with reciprocal: this multiplies B and D by 1/z on the CPU. It is similar but not identical to kulor's solution. It needs five 16-bit additions, some bit shifting, two 8x8 multiplications that cannot be offloaded to indirect HDMA, and two that can.
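To get a feel for why the reciprocal methods are the least accurate, here is a rough sketch in plain integer arithmetic comparing a direct divide with "look up an 8-bit reciprocal, then multiply". It assumes a signed 16-bit numerator, an unsigned 8-bit denominator, and a reciprocal table scaled by 256 and clamped to 8 bits; that scale factor and the helper names are illustrative assumptions, not the exact bit layout of the script that follows.

```javascript
// Direct division, truncating toward zero (what a sign-corrected 16/8
// divide would give).
const divide = (num, den) => Math.trunc(num / den);

// 8-bit reciprocal table: recip[d] = floor(256 / d), clamped to 255.
// The scale factor of 256 is an arbitrary choice for this illustration.
const recip = Array.from({ length: 256 }, (_, d) =>
  d === 0 ? 255 : Math.min(255, Math.floor(256 / d)));

// Reciprocal approach: one table lookup and one multiply instead of a divide.
const viaReciprocal = (num, den) => Math.trunc((num * recip[den]) / 256);

// The error grows as the denominator grows and its reciprocal gets coarser.
for (const den of [3, 10, 100, 200]) {
  console.log(den, divide(1000, den), viaReciprocal(1000, den));
}
```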

Of course you could mix these, for example using "real" division for A and C and multiplication with reciprocal for B and D.

When multiplying by a reciprocal, the result of the table lookup needs to fit in 8 bits. That is not enough range to be both precise and have a good view distance, so there is a "dscale" parameter that controls this trade-off. It would be possible to change this parameter a few times each frame; however, each setting would need its own 1/z LUT and its own variant of the per-scanline function with different bit shifting. That would give a better accuracy vs. performance ratio at the cost of a few kilobytes of ROM space.
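A small sketch of the 8-bit constraint behind "dscale": for a table whose entries are floor(2^s / d), it reports which denominators d produce a value that fits in 8 bits without clamping and without collapsing to zero. In the script that follows, `lut_reciprocal` uses 131072 >> dscale as its numerator (so s = 17 - dscale there) and indexes by the input divided by 16; this sketch only models the table's value range, not the script's view geometry, and `usableRange` is a hypothetical name.

```javascript
// For a reciprocal table with entries floor(2^s / d), find which
// denominators d (1..maxDen) give a value that fits in 8 bits without
// clamping (<= 255) and without collapsing to zero.
function usableRange(s, maxDen) {
  let first = -1;
  let last = -1;
  for (let d = 1; d <= maxDen; d++) {
    const raw = Math.floor(2 ** s / d);
    if (raw >= 1 && raw <= 255) {
      if (first < 0) first = d;
      last = d;
    }
  }
  return { first, last };
}

// A bigger numerator gives finer steps for large denominators but pushes
// the clamp point up; a smaller one does the opposite.
for (const s of [8, 12, 16]) console.log(s, usableRange(s, 4095));
```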

Code: Select all


var mode = 0;     // 0 for division
                  // 1 for hw multiplication with reciprocal
                  // 2 for sw multiplication with reciprocal

var dscale = 0;   // set to 0, 1, 2, 3, ... for accuracy / view distance tradeoff
                  // only necessary for reciprocal modes

var radix = 128;  // for signed 16 bit numerator
var radix2 = 256; // for unsigned 8 bit denominator



function shift(a, b){return(b > 0 ? floor(a >> b) : floor(a << -b))}

function cos(a) {return(Math.cos(a))}
function sin(a) {return(Math.sin(a))}
function floor(a) {return(Math.floor(a))}
function highbyte(a) {return(floor(a / 256))}
function high10bit(a) {return(floor(a / 16))}
function clamp(a) {return floor(a<0?0:a>255?255:a)}
function clamps(a) {return floor(a<-32768 ?-32768:a>32767 ?32767 :a)}  // signed 16 bit clamp
function clampu(a) {return floor(a<0 ?0:a>65535 ?65535 :a)}

function lut_reciprocal(a) {return clamp(shift(256 * 256 * radix2 / radix, dscale) / high10bit(a)) }

// setup

var FOV = 90;
var forward = 128 / Math.tan(FOV * (Math.PI * 2 / 360) / 2);

var yaw = (var1 + framecount * 0.1) * Math.PI / 180;
var pitch = var2 * Math.PI / 180;

var camera_x = framecount;
var camera_y = 0;
var camera_z = var3;

// sprite stuff

var sprite_x = -8;
var sprite_y = 40;
var sprite_z = 0;

sprite_x = camera_x - sprite_x;
sprite_y = camera_y - sprite_y;
sprite_z = camera_z - sprite_z;

var tf_sprite_x = sprite_x * cos(yaw) + sprite_y * sin(yaw);
var tf_sprite_y = sprite_x * -sin(yaw) * cos(pitch) + sprite_y * cos(yaw) * cos(pitch) + sprite_z * -sin(pitch);
var tf_sprite_z = sprite_x * -sin(yaw) * sin(pitch) + sprite_y * cos(yaw) * sin(pitch) + sprite_z * cos(pitch);

var ss_sprite_x = tf_sprite_x * forward / -tf_sprite_y + 128;
var ss_sprite_y = tf_sprite_z * forward / -tf_sprite_y + 112;

rectangle(ss_sprite_x - 8, ss_sprite_y - 8, 16, 16);

// mode 7 stuff

// constant across the frame

var dx = cos(yaw) * camera_z;
var dy = sin(yaw) * camera_z;

var ax = forward * -sin(yaw) * cos(pitch);
var ay = forward * cos(yaw) * cos(pitch);
var az = forward * sin(pitch);

var bx = sin(yaw) * sin(pitch);
var by = -cos(yaw) * sin(pitch);
var bz = cos(pitch);

// scale values for subpixel precision
// and simulate fixed point math

dx = clamps(dx * radix); dy = clamps(dy * radix);
ax = clamps(ax * radix); ay = clamps(ay * radix); az = clampu(az * radix2);
bx = clamps(bx * radix); by = clamps(by * radix); bz = clampu(bz * radix2);

camera_x = floor(camera_x); camera_y = floor(camera_y);

// per scanline

var cx = clamps(ax + (scanline - 112) * bx);
var cy = clamps(ay + (scanline - 112) * by);
var cz = clampu(az + (scanline - 112) * bz);


if (mode == 1) {

var icz = lut_reciprocal(cz);

var point_center_x = shift(highbyte(cx) * camera_z, 4 - dscale);
var point_center_y = shift(highbyte(cy) * camera_z, 4 - dscale);

var offset_x = shift(dx * icz, 12 - dscale);
var offset_y = shift(dy * icz, 12 - dscale);

m7a = offset_x;
m7b = point_center_x;
m7c = offset_y;
m7d = point_center_y;
m7x = -floor(camera_x / 16);
m7y = -floor(camera_y / 16);
m7hofs = -floor(camera_x / 16) -128;
m7vofs = -floor(camera_y / 16) -icz - scanline;

} else if (mode == 2) {

camera_x = floor(camera_x * 16 / shift(camera_z, -dscale));
camera_y = floor(camera_y * 16 / shift(camera_z, -dscale));

var icz = lut_reciprocal(cz);

var point_center_x = camera_x + shift(highbyte(cx) * icz, 4);
var point_center_y = camera_y + shift(highbyte(cy) * icz, 4);

var offset_x = shift(dx * icz, 12 - dscale);
var offset_y = shift(dy * icz, 12 - dscale);

m7a = offset_x;
m7b = point_center_x;
m7c = offset_y;
m7d = point_center_y;
m7x = 0;
m7y = 0;
m7hofs = -128;
m7vofs = -shift(camera_z, -dscale) - scanline;

} else {

camera_x = floor(camera_x * 16 / camera_z);
camera_y = floor(camera_y * 16 / camera_z);

var point_center_x = camera_x + floor(cx / highbyte(cz)) * (radix2 / radix);
var point_center_y = camera_y + floor(cy / highbyte(cz)) * (radix2 / radix);

var offset_x = floor(dx / highbyte(cz)) * (radix2 / radix);
var offset_y = floor(dy / highbyte(cz)) * (radix2 / radix);

m7a = offset_x;
m7b = point_center_x;
m7c = offset_y;
m7d = point_center_y;
m7x = 0;
m7y = 0;
m7hofs = -128;
m7vofs = -camera_z - scanline;

}


return [m7a, m7b, m7c, m7d, m7x, m7y, m7hofs, m7vofs];

Also, I've investigated how sign correction could be done for the signed multiplication / division. The addition for interpolating the numerator, followed by the sign correction, can be done in exactly 13 cycles, so it fits perfectly into the division waiting period. This leads me to believe that if you don't want to use a multiplication LUT, division could actually be preferable to multiplication by reciprocal performance-wise (because it can skip the 1/z table lookup).
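The sign correction described above boils down to: divide the magnitude of the numerator with the unsigned unit, then negate the quotient if the numerator was negative. A minimal sketch in plain arithmetic; `unsignedDiv16by8` is a hypothetical stand-in for the hardware divider, and neither the 65816 cycle timing nor the hardware's divide-by-zero behaviour is modelled.

```javascript
// Stand-in for an unsigned 16-bit / 8-bit hardware divider (hypothetical
// name): truncating quotient. Division by zero is not modelled here.
function unsignedDiv16by8(dividend, divisor) {
  if (divisor === 0) throw new RangeError("division by zero not modelled");
  return Math.floor((dividend & 0xffff) / (divisor & 0xff));
}

// Signed numerator, unsigned denominator: divide the magnitude, then put
// the sign back. The result truncates toward zero.
function signedDiv(num, den) {
  const negative = num < 0;
  const quotient = unsignedDiv16by8(negative ? -num : num, den);
  return negative ? -quotient : quotient;
}

console.log(signedDiv(1000, 7), signedDiv(-1000, 7));
```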

On the other hand, it should be possible to precalculate the scanlines at which the signs of the interpolated numerators flip, divide the screen into sections based on that, and use a different per-scanline routine for each section, which would avoid the sign correction issue. This would of course have some additional setup cost.
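Since each numerator in the script is interpolated linearly (e.g. cx = ax + (scanline - 112) * bx), the line where its sign flips can be found once per frame by solving for the zero crossing. A minimal sketch of that setup step; `signFlipLine` and `sections` are hypothetical names and the inputs are example values. With two numerators (x and y) you would merge both split points, giving up to three sections.

```javascript
// First scanline (1..lines-1) at which a + (s - 112) * b changes sign
// relative to scanline 0, or null if it keeps one sign for the whole screen.
// A value of exactly zero counts as non-negative.
function signFlipLine(a, b, lines = 224) {
  const startsNegative = a - 112 * b < 0;
  const root = 112 - a / b;                 // where the numerator is zero
  let s;
  if (b > 0 && startsNegative) {
    s = Math.ceil(root);                    // first line with value >= 0
  } else if (b < 0 && !startsNegative) {
    s = Math.floor(root) + 1;               // first line with value < 0
  } else {
    return null;                            // flat, or moving away from zero
  }
  return s < lines ? s : null;
}

// Split the screen into [start, end) ranges with a fixed numerator sign, so
// each range can use a per-scanline routine that needs no sign correction.
function sections(a, b, lines = 224) {
  const flip = signFlipLine(a, b, lines);
  return flip === null ? [[0, lines]] : [[0, flip], [flip, lines]];
}

console.log(sections(-100, 1));
console.log(sections(500, 1));
```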
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

Still needs some refinement, but I've got my method more or less working.
dizworld3.png
video demonstration
source code

As I mentioned above, it more or less boils down to 1 divide and 4 multiplies per line to generate the ABCD entries. I'd like to try and improve its accuracy some more, it's a bit wobbly, but it's pretty close to what I want. Similarly it's almost as fast as I'd like it to be, but I'm confident I'll find some optimization for it. I have a few ideas how to improve it on both fronts, but it will take some experimentation. It's tricky, because I've tried to map the values for the divide/multiply operations into ranges suitable for the hardware 16/8 and 8x8 units, but I think I can find a little more breathing room in there somewhere.
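One way the per-line work could come down to one divide and four multiplies (a guess at the structure, not rainwarrior's actual code): fold cos/sin of the rotation and a separate vertical scale into four per-frame constants, then per line do a single divide for the line's 8-bit scale and multiply it into each constant. Following the convention in kulor's script (A = cos, B = sin, C = -sin, D = cos), A and C carry the horizontal scale and B and D the vertical one. Values are 8.8 fixed point; `frameConstants`, `lineMatrix`, `K` and `denom` are hypothetical, and this is plain arithmetic rather than a model of the 8x8 hardware multiplier.

```javascript
// Hypothetical per-frame setup: rotation plus separate horizontal (A, C)
// and vertical (B, D) scale, folded into four 8.8 fixed-point constants.
function frameConstants(angleRad, verticalRatio) {
  const c = Math.cos(angleRad), s = Math.sin(angleRad);
  return {
    hCos: Math.round(c * 256),
    hSin: Math.round(s * 256),
    vCos: Math.round(c * verticalRatio * 256),
    vSin: Math.round(s * verticalRatio * 256),
  };
}

const clamp16 = (v) => Math.max(-32768, Math.min(32767, v));

// Per scanline: one divide gives the line's 8-bit scale (clamped to $FF,
// which also covers a zero denominator), then four multiplies give ABCD.
// K and denom are placeholders for the real per-frame and per-line terms.
function lineMatrix(k, K, denom) {
  const scale = Math.min(255, Math.floor(K / denom));
  return {
    a: clamp16((k.hCos * scale) >> 8),
    b: clamp16((k.vSin * scale) >> 8),
    c: clamp16((-k.hSin * scale) >> 8),
    d: clamp16((k.vCos * scale) >> 8),
  };
}

console.log(lineMatrix(frameConstants(Math.PI / 6, 1.5), 65280, 256));
```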
creaothceann
Posts: 611
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Post by creaothceann »

I'm pretty sure the 2D part of the sky should be scrolling to the right when you turn left, and vice versa. :wink:
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

Yes, I already have it going the other way in the source.

I had quickly thrown in scroll=angle for the sky at that point just to have something to test, but hadn't adjusted it yet. I'm more concerned about the guts of the mode 7 perspective than aesthetic details at the moment tho...
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

Replaced the 1 / 8-bit z hardware divide with a 1 / 12-bit z table divide (code), and that clears up my precision issues nicely: video comparison.

Basically I was interpolating 1/z as an 8.8 fixed point, and with the table it becomes a 12.4 fixed point. Before the divide, everything past the point gets discarded. Even though the result of the divide is still only 8-bit, having more precision on the reciprocal fills in a lot of "in between" values that I couldn't hit with just an 8 bit denominator. Also lets me bake a clamp to $FF for the low values.
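A rough way to see why the 12-bit table helps: take the same 16-bit interpolated value, divide using only its top 8 bits (8.8 interpretation) versus look it up in a 4096-entry table using its top 12 bits (12.4 interpretation), with the clamp to $FF baked into the table. The numerator `C` is an arbitrary example chosen so results land in 0..255; this is a sketch of the idea, not rainwarrior's linked code.

```javascript
// Example numerator, chosen so results fall in 0..255 over the sweep.
const C = 2048;

// Hardware-style path: only the integer part of an 8.8 value is used.
function viaHardware(v) {
  const d = v >> 8;
  return d === 0 ? 255 : Math.min(255, Math.floor(C / d));
}

// Table path: treat the same 16-bit value as 12.4 and index a 4096-entry
// table of 8-bit results (4 KiB), with the clamp to $FF baked in.
const table12 = Array.from({ length: 4096 }, (_, i) =>
  i === 0 ? 255 : Math.min(255, Math.floor((C * 16) / i)));
function viaTable(v) {
  return table12[v >> 4];
}

// Count how many distinct results each path can produce over a sweep.
function distinctResults(fn) {
  const seen = new Set();
  for (let v = 0x0800; v <= 0xffff; v++) seen.add(fn(v));
  return seen.size;
}
console.log("8-bit denominator:", distinctResults(viaHardware));
console.log("12-bit table:", distinctResults(viaTable));
```

The extra "in between" results in the table path are what smooth out the steps; both paths agree at the ends of the range.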

Takes about the same amount of cycles as the hardware divide did, too, so no performance change.

The 12-bit table is 4k. Bigger than that doesn't seem to offer any improvement. It'd be easy to use 11 or 10 bits instead. 10-bit at 1k was not too bad, but I don't feel like 4k is much of a space burden for this to begin with.

So, I'm happy with the result, and it's quite viable as-is, but I still want to improve performance a bit more. Or if not, maybe more interpolation is an option. I should see if 4x rather than 2x interpolation looks good. An interpolated line currently takes less than half as much time as a calculated perspective line. Once I'm happier with the performance I'll work on finishing the demo with an animated sprite and a few other useful things.
none
Posts: 117
Joined: Thu Sep 03, 2020 1:09 am

Post by none »

rainwarrior wrote: Sun Aug 14, 2022 2:50 pm
Replaced the 1 / 8-bit z hardware divide with a 1 / 12-bit z table divide (code), and that clears up my precision issues nicely: video comparison.

Basically I was interpolating 1/z as an 8.8 fixed point, and with the table it becomes a 12.4 fixed point. Before the divide, everything past the point gets discarded. Even though the result of the divide is still only 8-bit, having more precision on the reciprocal fills in a lot of "in between" values that I couldn't hit with just an 8 bit denominator. Also lets me bake a clamp to $FF for the low values.
Yes, this is what I found too. If you do use the hardware 16/8 division, you need to divide each value separately to get enough precision, but the result will then be better than multiplying by 1/z. I can also confirm the 12-bit thing with the LUT.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

Re: interpolation. I'm finding that 2x interpolation looks just fine, but 4x starts to amplify precision errors unpleasantly, especially at the bottom of the screen. Just kinda looks like a "ripple" across specific lines of the screen.

Not a huge deal, but enough that I personally don't want to go higher than 2x, after trying it. Depends on your performance requirements, maybe worthwhile if you really need the CPU. An interpolated line is more than twice as fast as a calculated one for me, at the moment.

I get some rippling effect with or without interpolation, due to the lack of precision, but basically the height of the ripple is multiplied by the interpolation. At 2x it doesn't really feel significantly worse to me than the 1x version... but 4x I do notice it more.
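A rough model of the interpolation being compared here: compute a per-line value "exactly" (rounded to 8.8) every n lines and lerp the lines in between, then measure the worst deviation from an ideal curve. Any error at a computed line gets smeared across the whole interpolated run, which seems consistent with the ripple growing with the interpolation factor. The `exactScale` curve is made up, and this only measures numeric error against an ideal curve; it cannot show how visible the ripple is on hardware.

```javascript
const LINES = 224;

// Made-up per-line scale curve in 8.8 units: large at the top (far away),
// smaller toward the bottom (near), roughly like a perspective plane.
const exactScale = (y) => (256 * 512) / (y + 64);
const quantized = (y) => Math.round(exactScale(y)); // rounded to 8.8

// Fully compute every n-th line (rounded), linearly interpolate the rest.
function interpolated(n) {
  const out = new Array(LINES);
  for (let y0 = 0; y0 < LINES; y0 += n) {
    const y1 = Math.min(y0 + n, LINES - 1);
    const q0 = quantized(y0);
    const q1 = quantized(y1);
    for (let y = y0; y < Math.min(y0 + n, LINES); y++) {
      out[y] = y1 === y0
        ? q0
        : q0 + Math.round(((q1 - q0) * (y - y0)) / (y1 - y0));
    }
  }
  return out;
}

// Worst deviation from the ideal curve, in 1/256 units.
function maxError(n) {
  const out = interpolated(n);
  let worst = 0;
  for (let y = 0; y < LINES; y++) {
    worst = Math.max(worst, Math.abs(out[y] - exactScale(y)));
  }
  return worst;
}

for (const n of [1, 2, 4]) console.log(`${n}x:`, maxError(n).toFixed(2));
```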
ehaliewicz
Posts: 18
Joined: Thu Oct 10, 2013 3:30 pm

Post by ehaliewicz »

rainwarrior wrote: Sun Aug 14, 2022 6:54 pm
Re: interpolation. I'm finding that 2x interpolation looks just fine, but 4x starts to amplify precision errors unpleasantly, especially at the bottom of the screen. Just kinda looks like a "ripple" across specific lines of the screen.

Not a huge deal, but enough that I personally don't want to go higher than 2x, after trying it. Depends on your performance requirements, maybe worthwhile if you really need the CPU. An interpolated line is more than twice as fast as a calculated one for me, at the moment.

I get some rippling effect with or without interpolation, due to the lack of precision, but basically the height of the ripple is multiplied by the interpolation. At 2x it doesn't really feel significantly worse to me than the 1x version... but 4x I do notice it more.
Perhaps you could switch to 4x interpolation further away?
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

I dunno, it's noticeable in the distance too, and 4x interpolation is a diminishing return... only has 1/2 as much speedup as 2x interpolation did. Doing half the screen again would mean it only makes 1/4 of the difference... So it'd be a bit fussy in code for maybe not much gain, and you don't save any of the overhead costs either.

Mostly I just wanted to test interpolation at various scales until I thought it made it noticeably worse. My verdict was 2x is fine, but 4x is too much.

It's hard to explain in words. Here's a video: video demonstration

That's 1x, 2x and 4x side by side. 1x is 120% CPU load per scanline, 2x is 90%, 4x is 75%.

The visual artifact I object to is a strong ripple that you can see just above the status numbers. All 3 have ripples but the strength increases with interpolation. Hopefully the video isn't too compressed to see this. (I linked mastodon because it doesn't recompress like twitter.)
Last edited by rainwarrior on Mon Aug 15, 2022 3:32 pm, edited 1 time in total.
creaothceann
Posts: 611
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Post by creaothceann »

(You could also use imgur.)

IMO it seems plausible that even the 4x version could have been used back in the day (especially if it's not just vertical scrolling but free rotation). This still looks better than the PSX texture warping, which people were mostly fine with.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

I'm leaving it in my demo as an option. It's just that interpolation is only useful up to a point. Like, it's not a magic way to get CPU down to zero. It has some pretty strong limitations.

Versus my full perspective calculation, interpolating a line takes about 50% as much time. Versus the abbreviated version (non-independent vertical scale) it's more like 60%. Versus a non-rotating version it's maybe 80%. There's also a pretty non-trivial amount of overhead before the per-scanline stuff begins.

So, at 2x, working on only half the lines, it saves only 25% of the total per-scanline cost. At 4x this only increases to 37%, despite interpolating three out of every four lines instead of every other line.
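The percentages above follow from a simple cost model: if only every n-th line is fully calculated and an interpolated line costs `relativeCost` of a full one, the saving is ((n - 1) / n) · (1 - relativeCost). A small sketch of that arithmetic; `savings` is a hypothetical helper, and the 0.5 / 0.6 / 0.8 relative costs are the rough figures from the paragraph above.

```javascript
// Share of per-scanline CPU time saved when only every n-th line is fully
// calculated and the rest are interpolated at `relativeCost` of a full line.
function savings(n, relativeCost) {
  const interpolatedShare = (n - 1) / n;
  return interpolatedShare * (1 - relativeCost);
}

console.log(savings(2, 0.5), savings(4, 0.5)); // 0.25 0.375
// Full, abbreviated (non-independent vertical scale), non-rotating:
for (const cost of [0.5, 0.6, 0.8]) console.log(cost, savings(2, cost));
// Starting from the 120% load of the 1x version in the video comparison:
console.log(1.2 * (1 - savings(2, 0.5)), 1.2 * (1 - savings(4, 0.5)));
```

The last line reproduces the roughly 90% and 75% loads quoted for 2x and 4x.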

The benefits are even less if you want to accept the vertical scale compromise (as many games seemed to)... and that compromise is actually pretty reasonable, since it gives a field of view where a square is still square, locally. Straying very far from this tends to look distorted, fisheye, etc.

So, I'm not really stamping 4x interpolation as acceptable or not, I'm just trying to get a feel for how good it is. I wanted to know: is the CPU gain enough to offset the loss in quality? Depends on your situation, but 2x seems to have much better value vs. the tradeoff.

There are other compromises that can be taken to address CPU usage. You can lower the horizon or raise the bottom and reduce the number of lines to draw. You can run at 30 Hz instead of 60 Hz, etc.