I thought about it, and I wrote a program that can find the shortest (in bytes) program that takes the least number of cycles. There is a linear algorithm, but unfortunately it has very high constants, so it is really slow -- probably impractical. I'll see if I can speed it up and release it. EDIT: Turns out the program does work on smallish inputs.
Basically, it performs a graph search to find the shortest path to writing all the bytes, where the length is decided by cycle count, and ties are broken by byte length. A node is a 4-tuple of index, a, x, and y, which represents the current state of the registers and the current index in the buffer. The start node is all registers uninitialized and an index of -1 (nothing written). The goal node is any node with index at the end of the buffer. An edge goes from one state to another if there is an operation to go from the state to the other. Each operation is something like "lda #00; sta PPUDATA" or "stx PPUDATA".
The graph is a DAG -- the index always increases by 1 when going from one node to another. DAGs have a O(V) shortest path algorithm -- visit each node in topographic order. The shortest path to the node is the shortest path from any incoming edge.
So, the algorithm looks like this:
Code: Select all
Initialize a queue with the start node.
While there are nodes in the queue:
pop a node from the queue.
if the node's index is the last in the input buffer, check if it is the shortest.
otherwise,
Find all of the possible next nodes (nodes that are reachable with one operation. Up to 3 -- it only makes sense to consider
the shortest node for each register. If the value is already in a register, just store it)
For each possible next node:
add the node to the queue if it isn't in it already.
update the path to the node if it is cheaper to get to the node through the current node.
This algorithm is linear based on the number of nodes. To show that it is linear on the number of input bytes, consider the possible states at index i. One of the registers must be set to the input byte at i. Another can be uninitialized, or one of the other 255 values. The third can be uninitialized, or one of the remaining 254 values. (Note that the registers will never be equal). That gives 3 * 256 * 255 = 195840 possible states. This *is* a constant, so the algorithm is technically linear. In practice, the constant will be lower, so it is practical for < 1000 byte inputs.
I've attached the code. I'm on a Linux machine at the moment, but
I'll attach a Windows binary later I've attached a Windows binary. It should only take a second for most inputs.