Alp wrote:...Now if only I could make the 2nd sprite follow the first!

Are you familiar with the
MVC programming pattern? The basic idea is to separate the model and the view of your game world as much as possible. This means that pressing right on the controller shouldn't directly cause the X coordinate byte of an object's OAM entry to increase. Such hardcoded programming would make your code difficult to maintain, because everything is so stuck together that you can't make changes without affecting a huge part of the engine.
The model is the state of the game world, it's how the game is represented in memory. Things like level maps (including collision data) and objects (including position, speed, health, etc.) are part of the model. The view is just a graphical representation of that model, so that us humans can see what's happening in there, but the MODEL SHOULD BE ABLE TO EXIST AND FUNCTION WITHOUT A VIEW. Another interesting side effect of this design is that you can have several views of the same game world. This is what allows games like Doom or Duke Nukem 3D to be played in 3D mode or map mode. It's the same game world, only displayed differently. In Duke Nukem 3D there were also security cameras that would display other parts of the level, which is another type of view the game uses. None of this would be possible if the enemies were trying to draw themselves directly to your screen.
The NES is a machine with very limited resources though, so full separation isn't always possible. But sprites are certainly one of the things that should be abstracted from the main engine. Most games have a meta-sprite rendering routine. A meta-sprite is a list of sprites that represent one animation frame of a single object. So if your player uses 2 sprites at any given animation frame, each of his meta-sprite definitions will have 2 entries. Now, the main difference between these entries and the regular OAM entries, is that the coordinates of meta-sprites are relative to the object they represent, not absolute to the screen. There are many ways to optimize the space used by meta-sprites (automatic pattern increment - so that only the first index has to be defined, automatic coordinate increment of grid-arranged sprites, etc.), but for simplicity and full freedom each meta-sprite entry can be 4 bytes, just like normal OAM entries.
So, you have the player object as part of the model in RAM, and it has a single set of coordinates that specify where in the current room it is. When you call the mata-sprite rendering routine, it will loop through the entries of the selected meta-sprite and add the relative sprite coordinates to the object's coordinates in order to generate the final OAM entries that will be sent to the PPU later. This may sound a bit complicated, and it is at first, but once you have this routine ready, you'll never have to worry about manually updating 8 different sprites every time a big character moves... just call the meta-sprite routine and you're done!
Like I said before, this improves maintainability of your code. For example, if you decide to change the type of OAM cycling (since the NES can only show 8 sprites per scanline, it's common practice to rotate or randomize the OAM positions for all the objects every frame, so that all objects affected will flicker instead of some disappearing), you just have to modify this particular routine. If the sprite code was hardcoded in each object, you would have to revise every single object in the game, with big chances of missing something along the way.
Does that make sense? It may seem a little overwhelming, but the sooner you realize that math is more important than knowing what each opcode and register does, the sooner you'll be on the right track. You can always look up the documentation for opcodes and registers, but the internal workings of your game should always be clear to you.