I don't see the benefit of doing that. PLA takes 4 cycles, and so does lda absolute,x (I never cross page boundaries, which would add a cycle). I'd also have to change $2006 to point to different sections. Currently, each tile update takes 160 cycles, which really isn't bad. That figure includes JSR/RTS, setting up pointers, and bankswitching, and the routine can read from either ROM or RAM. All the other routines read from RAM. Plus, you'd have to make sure you aren't clobbering return addresses when putting data on the stack, and that could get complicated.
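A quick sanity check of the cycle argument, as a Python sketch. It assumes an unrolled copy that does one load and one STA $2007 per byte of a 16-byte tile. The 160-cycle figure above also covers JSR/RTS, pointer setup and bankswitching, which this sketch deliberately leaves out; the names here are illustrative only.

```python
# Data-movement cost of copying one 8x8 CHR tile (16 bytes) to $2007,
# assuming a fully unrolled copy: one load plus one STA $2007 per byte.
# 6502 cycle counts: PLA = 4, LDA abs,X = 4 (+1 on page cross), STA abs = 4.
BYTES_PER_TILE = 16
STA_ABS_CYCLES = 4

LOAD_CYCLES = {
    "PLA": 4,
    "LDA abs,X": 4,
    "LDA abs,X (page crossed)": 5,
}


def unrolled_copy_cycles(load_cycles: int, bytes_per_tile: int = BYTES_PER_TILE) -> int:
    """Cycles for the copy loop only (no JSR/RTS, setup, or bankswitching)."""
    return bytes_per_tile * (load_cycles + STA_ABS_CYCLES)


if __name__ == "__main__":
    for name, cycles in LOAD_CYCLES.items():
        print(f"{name:26s} {unrolled_copy_cycles(cycles)} cycles/tile")
```

Under these assumptions PLA and LDA abs,X both come out at 128 cycles per tile, so switching to a stack-based copy buys nothing unless the absolute reads actually cross pages.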
EDIT: I've implemented extended Vblank in a simple demo. It extends Vblank by about 10 or 11 scanlines, with no status bar, and I don't think it looks weird. In my opinion, it doesn't look much more skewed than left clipping does. With the top and bottom clipped, the difference is so small that you don't notice it. But if the top and bottom aren't clipped (as in Nintendulator), you'll definitely notice it more.
http://www.freewebs.com/the_bott/ExtendedTest.nes
You can move around with the control pad, and it does that cool Metal Storm effect with fake parallax.
Here's the deal: this is for NTSC only. Most NTSC TVs clip off the top and bottom of the picture, and on those that don't, it will look like it does in Nintendulator. The only benefit I'd get from the blanked lines is that they hide vertical mirroring glitches, just as left clipping does for horizontal mirroring; otherwise, they make the picture look kind of skewed. For PAL, however, there would be no extended Vblank, so there would be no skewing. So, would it be safe to extend Vblank by this much?
Also, this lets me update 10 CHR RAM tiles per frame instead of just 5. I think the flexibility this engine gives me heavily outweighs the cost of a slightly skewed-looking picture.
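A back-of-the-envelope check of the tile counts, as a Python sketch. It assumes standard NTSC timing (341 PPU dots per scanline, 3 PPU dots per CPU cycle, 20 scanlines of normal vblank) and the 160-cycle per-tile figure quoted earlier, and it ignores all other vblank work. The function names are hypothetical.

```python
# Rough NTSC vblank budget in CPU cycles (assumed standard NTSC timing).
DOTS_PER_SCANLINE = 341
PPU_DOTS_PER_CPU_CYCLE = 3
NTSC_VBLANK_SCANLINES = 20
TILE_UPDATE_CYCLES = 160  # per-tile cost from the post, including JSR/RTS and setup


def cpu_cycles_for_scanlines(scanlines: int) -> float:
    """CPU cycles that elapse during the given number of scanlines."""
    return scanlines * DOTS_PER_SCANLINE / PPU_DOTS_PER_CPU_CYCLE


def extra_tiles(extra_scanlines: int) -> int:
    """Additional tile updates that fit in the time gained by blanking extra scanlines."""
    return int(cpu_cycles_for_scanlines(extra_scanlines) // TILE_UPDATE_CYCLES)


if __name__ == "__main__":
    print(f"normal vblank: {cpu_cycles_for_scanlines(NTSC_VBLANK_SCANLINES):.1f} cycles")
    for lines in (10, 11):
        gained = cpu_cycles_for_scanlines(lines)
        print(f"+{lines} lines: {gained:.1f} cycles, room for {extra_tiles(lines)} more tiles")
```

By this estimate, 10 or 11 extra blank scanlines buy roughly 1,137 to 1,250 CPU cycles, comfortably more than the 800 cycles that five extra 160-cycle tile updates need, though sprite DMA and any other vblank work still have to fit in the same window.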