The problem is that dealing with bank switching in CC65 is painful. There are two solutions: the EE solution and the programmer solution. The EE solution is "increase the linear range to 46kB", and it can be implemented with a single 74-series chip that costs 18c in single quantities from Mouser. Yes, it's a stopgap, but it does in fact solve the problem as outlined. It also doesn't pose any problems for future-proofing: it doesn't conflict with any existing hardware, and "here's 48kB of PRG" couldn't mean anything else.*
The programmer solution is fixing CC65. While this is not hard, it is a huge time sink, and since no one has mentioned that they're now investing effort into fixing CC65, I just don't understand the nay-sayers.**
It's not even like one'd need to do this the really hard way with the aforementioned call graph: you could implement something like Borland Turbo C++'s Medium (bankswitched code, no bankswitched data, one trampoline in fixed bank, all calls involve bankswitch) or Compact (bankswitched data, no bankswitched code, all data fetches involve bankswitch) memory models.
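To make the Medium idea concrete, here is a rough host-side sketch of the "one trampoline in fixed bank" pattern. It is only a simulation under my own assumptions, not CC65 output or real mapper code: `select_bank`, `far_call`, `struct far_ptr` and the example functions are all hypothetical names, and the "bank register" is just a variable. On real hardware `select_bank` would be a write to whatever register the mapper uses.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated bank-select register (assumption: stands in for a mapper write). */
static uint8_t current_bank = 0;
static unsigned bank_switches = 0;   /* counts switches, for illustration only */

static void select_bank(uint8_t bank)
{
    current_bank = bank;
    ++bank_switches;
}

typedef int (*banked_fn)(int);

/* Hypothetical "far pointer": which bank the code lives in, plus its address. */
struct far_ptr {
    uint8_t bank;
    banked_fn fn;
};

/* The single trampoline. It must live in the fixed bank so that it is
   always mapped in: it saves the caller's bank, switches to the target
   bank, makes the call, then restores the caller's bank. */
static int far_call(struct far_ptr target, int arg)
{
    uint8_t caller_bank = current_bank;
    int result;

    select_bank(target.bank);
    result = target.fn(arg);
    select_bank(caller_bank);
    return result;
}

/* Pretend this function is linked into bank 2. */
static int add_ten_in_bank2(int x)
{
    assert(current_bank == 2);       /* would crash on hardware if not mapped */
    return x + 10;
}

static const struct far_ptr FAR_ADD_TEN = { 2, add_ten_in_bank2 };

/* Pretend this function is linked into bank 1. It calls into bank 2
   through the trampoline and expects to find bank 1 mapped afterwards. */
static int double_then_add_in_bank1(int x)
{
    int result;

    assert(current_bank == 1);
    result = far_call(FAR_ADD_TEN, x * 2);
    assert(current_bank == 1);       /* trampoline restored our bank */
    return result;
}

static const struct far_ptr FAR_DOUBLE_THEN_ADD = { 1, double_then_add_in_bank1 };
```

Calling `far_call(FAR_DOUBLE_THEN_ADD, 5)` from bank 0 goes 0 → 1 → 2 → 1 → 0, which is the "all calls involve bankswitch" cost of this model: every far call pays two switches, whether or not the target happened to be mapped in already. The Compact model would do the same dance around data fetches instead of calls.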
But I'm not volunteering, and until one of you does, I'm going to stick by the EE solution.
* Someone's going to bring up the "why is it 48kB-2kB instead of 48kB-24 bytes?" complaint I read earlier. Answer: because the hardware implementation is the important implementation. Emulators could implement the full window, but the concern is that doing so would encourage the development of programs that require significantly more expensive hardware.
** If a programmer in your employ came up to you and said "Boss, I'm having problems writing bigger games and $simple_change would help me," and your reaction was "No! You're going to use the same tools everyone else does!", well, it makes me glad I don't work for you.