ARM Assembler Question

Discussion of development of software for any "obsolete" computer or video game system. See the WSdev wiki and ObscureDev wiki for more information on certain platforms.
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Post by nicklausw »

(Assuming the Nintendo DS is "obsolete" so this thread can be in Other Retro Dev, move if need be).

I wanted to check out ARM assembly language, but don't have any money for a Raspberry Pi (my preferred method), so I decided to go with the Nintendo DS using libnds.

ARM makes sense to some extent, but due to problems with GAS not letting me grab just #defines from a file and ignore C code, if I continue with the NDS then C will be the primary language rather than the assembler.

That's not exactly what I'm wondering about, though. What I want to know is why GAS chooses to have registers "point to pointers" instead of be pointers.

As an example, for iprintf("HI!");, GAS would generate:

Code: Select all

ldr r0, message_pointer   @ PC-relative load of the word stored at message_pointer
bl iprintf

message_pointer:
.word message             @ 32-bit address of the string

message:
.ascii "HI!\0"
The pointer isn't necessary, though, since afaik registers can be pointers for themselves:

Code: Select all

ldr r0, =message
bl iprintf

message:
.ascii "HI!\0"
This is kind of a difficult question now that I think about it (and may or may not be kind of an excuse to have people to talk about ARM with) but does anyone know why GAS chooses to manually make pointers rather than just skip that step of the assembly process entirely? Is there a good reason, or is that just a random consequence of having a computer make code for you?
lidnariq
Site Admin
Posts: 11803
Joined: Sun Apr 13, 2008 11:12 am

Re: ARM Assembler Question

Post by lidnariq »

ARM has an explicit instruction for "pull out the 32-bit pointer from this 12-bit offset in memory", but no "just load a 32-bit number directly into a register". (The latter is a pseudoinstruction that might string several individual instructions together, depending on the specific numeric value of the pointer.)
tepples
Posts: 22993
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: ARM Assembler Question

Post by tepples »

Probably because of the literal pool thing that ARM does. An immediate operand is limited to 8 bits but can be shifted, but pointers are 32 bits. There are two workarounds:
  1. The MIPS approach of constructing a constant in two steps with a load immediate shifted followed by an OR immediate. This works well on MIPS because constants can be 16 bits, with an optional 16-bit left shift only for load immediate (lui). On ARM, this can be effective for Game Boy Advance and Nintendo DS MMIO ports in 0x04000000-0x040003FC or 0x04800000-0x048003FC because they can be expressed as (0x40 << 20) | (regaddr << 2). But generic addresses aren't necessarily easy to build this way.
  2. Store the pointer in a table between one subroutine and the next and load it using PC-relative addressing. The assembler offers a ldr rxx, =whatever syntax to add the address to a nearby pool and generate a load instruction.
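
To make the 8-bit limit concrete, here is a rough Python sketch that is not from the thread. It assumes the classic ARM-state data-processing encoding, in which the "shift" is an 8-bit value rotated right by an even number of bits, and checks whether a 32-bit constant fits in a single instruction. The names rotate_left32 and is_arm_immediate are made up for illustration.

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def is_arm_immediate(value):
    """True if `value` is an 8-bit number rotated right by an even amount.

    Hypothetical helper: models the ARM-state (not Thumb) immediate field.
    """
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Undo a rotate-right by `rotation`; if only 8 bits remain, it encodes.
        if rotate_left32(value, rotation) <= 0xFF:
            return True
    return False


if __name__ == "__main__":
    for constant in (0x04000000, 0x00000208, 0x04000208, 0xFFFFFFFF):
        print(f"0x{constant:08X}", is_arm_immediate(constant))
```

Under that assumption the MMIO base 0x04000000 and a register offset such as 0x208 each fit on their own, while the combined address 0x04000208 does not, which is why it takes either two instructions or a literal pool entry.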
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: ARM Assembler Question

Post by nicklausw »

Confusion ensues.
tepples wrote:An immediate operand is limited to 8 bits but can be shifted, but pointers are 32 bits.
Would this be an immediate operand? Because it loads a 32-bit number and assembles just fine: ldr r3, =0xffffffff
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: the universe

Re: ARM Assembler Question

Post by thefox »

Assemble the code, then disassemble (you can use e.g. objdump to disassemble an object file). You should see that your instruction has changed into a different form.
tepples
Posts: 22993
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: ARM Assembler Question

Post by tepples »

nicklausw wrote:
An immediate operand is limited to 8 bits but can be shifted, but pointers are 32 bits.
Would this be an immediate operand? Because it loads a 32-bit number and assembles just fine: ldr r3, =0xffffffff
The mvn instruction takes a number and EORs it with 0xFFFFFFFF before storing it. For example, mvn r3, #0x000B puts 0xFFFFFFF4 into r3.

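Because ldr rxx,=value is a pseudo-instruction, the assembler picks its expansion. GAS assembles it to a single mov or mvn when the constant can be encoded that way, and otherwise to an ldr from a constant pool; sequences such as mov then orr, or mvn then bic, are something you write by hand or get from a compiler. The constant pool is most likely in Thumb.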
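
As a hedged illustration of that choice, the Python sketch below models only the single-instruction cases and the literal-pool fallback for ARM state. It is not how any particular assembler is implemented, and the helper names fits_immediate and expand_ldr_equals are hypothetical.

```python
def fits_immediate(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return True
    return False


def expand_ldr_equals(reg, value):
    """Return the instruction an assembler might emit for `ldr reg, =value`."""
    value &= 0xFFFFFFFF
    if fits_immediate(value):
        return f"mov {reg}, #0x{value:X}"
    inverted = value ^ 0xFFFFFFFF  # mvn writes the bitwise NOT of its operand
    if fits_immediate(inverted):
        return f"mvn {reg}, #0x{inverted:X}"
    # Otherwise the constant goes into a pool and is loaded PC-relative.
    return f"ldr {reg}, [pc, #<offset to pool entry holding 0x{value:08X}>]"


if __name__ == "__main__":
    for constant in (0xFF, 0xFFFFFFFF, 0xFFFFFFF4, 0x12345678):
        print(expand_ldr_equals("r3", constant))
```

In this model ldr r3, =0xffffffff comes out as mvn r3, #0x0, which would explain why it assembles fine even though 0xFFFFFFFF is not an 8-bit immediate.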
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: ARM Assembler Question

Post by nicklausw »

Oh! So the "double-pointers" take up less space and are faster then? Okay, I had it all backwards.

Is using immediate addressing that much of a slow-down, though, if at all? Because having to have a section of code just for pointers is not ideal.
lidnariq
Site Admin
Posts: 11803
Joined: Sun Apr 13, 2008 11:12 am

Re: ARM Assembler Question

Post by lidnariq »

The literal pool takes two 32-bit memory cycles. (or one 16-bit plus one 32-bit in thumb mode)
Two in-line instructions take two 32-bit memory cycles (or ... some number of 16-bit fetches in thumb mode)

More to the point, if you try to prevent the compiler and/or assembler from using a literal pool, you're going to be fighting it the whole way.
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: ARM Assembler Question

Post by nicklausw »

lidnariq wrote:More to the point, if you try to prevent the compiler and/or assembler from using a literal pool, you're going to be fighting it the whole way.
Yeah, I'm not really trying to get the compiler to be evasive with that type of thing. If libnds functions will run like that, then I never have to look at that machine code mess, so it doesn't really concern me. It's just that I don't want to be redundant with assembler code written on my own.
tepples
Posts: 22993
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: ARM Assembler Question

Post by tepples »

Premature optimization is a root of all kinds of evil. Get it working first, and then get it fast once it's working.
Bregalad
Posts: 8181
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: ARM Assembler Question

Post by Bregalad »

This is a problem typical of RISC processors. The entire design is based on a word size, which is also the instruction size and register size. Additionally, ALL instructions are that word size, and there are no instructions that (directly) use multiple words. As such, it is impossible to have a "load immediate into register" instruction, because the immediate already takes the whole word (in this case, 32 bits).

The only solution would be a two-word instruction, but this is also impossible because RISC philosophy says all instructions should be 1 word long. Supporting variable-length instructions would make it much harder to pipeline the processor.

The solution used by ARM is to have the "second word of the instruction" (in this case, your pointer) stored not in the code, but right after the code. (This was an arbitrary decision; it could go before and work just as well.) This turned out to be very practical for romhacking, as the parameters used in a function sit in a pool right next to each other, which often lets you change things without even disassembling the routine :)

Note that RISC was never made to save memory (on the other hand, it wastes a lot of ROM as programs are stored very inefficiently - except for THUMB mode in ARM where it gets decent). It was made to simplify instruction decoding within the processor in order to get them to run faster.

Also note that there is NO pointer to your pointer. It is just an instruction relative to the program counter (r15). ARM assembly uses many pseudo-instructions, which are actually assembled to different instructions. Google "arm pseudo instruction" to get more details.

In this case

Code: Select all

ldr r0, =something
is probably equivalent to something like

Code: Select all

here:
    ldr r0, [r15], #something-here-8
The PC is always 2 words (8 bytes) ahead because of the pipeline.
Last edited by Bregalad on Tue May 17, 2016 8:58 am, edited 1 time in total.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: ARM Assembler Question

Post by AWJ »

AFAIK Hitachi SuperH (used by the Saturn and Dreamcast) also uses literal pools.
Jarhmander
Formerly ~J-@D!~
Posts: 570
Joined: Sun Mar 12, 2006 12:36 am
Location: Rive nord de Montréal

Re: ARM Assembler Question

Post by Jarhmander »

Bregalad wrote:In this case

Code: Select all

ldr r0, =something
is probably equivalent to something like

Code: Select all

here:
    ldr r0, [r15], #something-here-8
The PC is always 2 words (8 bytes) ahead because of the pipeline.
Hell no: not only do you fetch the wrong word, you'll also corrupt the PC, or it will fault. This is post-indexed addressing instead of regular offset addressing, which is the only form accepted when the base register is the PC.
So it's more like:

Code: Select all

    ldr r0, [pc, #off-8]
The -8 thing is true: PC is "ahead" because of the pipeline. This is important to consider upon receiving imprecise faults (if I remember correctly!), where the old PC points after the faulty instruction.
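
The offset arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ARM state (PC reads as the instruction address plus 8), the 12-bit offset range of a single ldr, and made-up example addresses. The function name pc_relative_offset is hypothetical.

```python
def pc_relative_offset(instr_addr, literal_addr):
    """Offset field for `ldr rX, [pc, #offset]` in ARM state.

    The PC reads as the instruction's own address plus 8.
    """
    offset = literal_addr - (instr_addr + 8)
    if not -4095 <= offset <= 4095:
        raise ValueError("literal is out of range for a single ldr")
    return offset


if __name__ == "__main__":
    # Hypothetical addresses: the ldr at 0x02000000, its literal 16 bytes later.
    print(pc_relative_offset(0x02000000, 0x02000010))
```

A literal placed in the word directly after the ldr ends up with an offset of -4, which looks odd at first but follows from the same rule.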
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: ARM Assembler Question

Post by nicklausw »

Um...does anyone have any idea why lr might magically turn into pc in a subroutine? Because I have a problem where my subroutines will randomly turn into a bx lr loop sometimes, and I can't figure things out at all. Not sure what other information to provide.
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: ARM Assembler Question

Post by nicklausw »

nicklausw wrote:Um...does anyone have any idea why lr might magically turn into pc in a subroutine? Because I have a problem where my subroutines will randomly turn into a bx lr loop sometimes, and I can't figure things out at all. Not sure what other information to provide.
Update, I figured this one out on my own.

Putting:

Code: Select all

stmfd  sp!, {lr}
at the beginning of subroutines, and:

Code: Select all

ldmfd  sp!, {pc}
at the end prevents recursive lr's. Now to figure out what the crap "stmfd" and "ldmfd" mean.
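
For reference, stmfd and ldmfd are the "store multiple" and "load multiple" instructions in their full descending stack form: stmfd sp!, {lr} decrements sp and then stores lr, and ldmfd sp!, {pc} loads the saved word into pc and then increments sp. The toy Python model below is a guess at why the bx lr loop appeared, assuming the subroutine itself called another subroutine with bl, which overwrites lr. The class, function names and addresses are all invented for illustration.

```python
class TinyCpu:
    """Toy model: just sp, lr and word-addressed memory in a dict."""

    def __init__(self, sp=0x1000):
        self.sp = sp
        self.lr = 0
        self.memory = {}

    def stmfd_lr(self):
        """stmfd sp!, {lr}: decrement sp first, then store lr there."""
        self.sp -= 4
        self.memory[self.sp] = self.lr

    def ldmfd_pc(self):
        """ldmfd sp!, {pc}: load the saved word, then increment sp."""
        value = self.memory[self.sp]
        self.sp += 4
        return value  # this becomes the new pc


def nested_call(save_lr):
    """Return the address `outer` jumps to when it tries to return."""
    cpu = TinyCpu()
    cpu.lr = 0x100          # caller did `bl outer`; return address is 0x100
    if save_lr:
        cpu.stmfd_lr()      # outer saves its return address on the stack
    cpu.lr = 0x204          # outer does `bl inner`, which overwrites lr
    # inner returns with `bx lr`, landing at 0x204 inside outer
    if save_lr:
        return cpu.ldmfd_pc()   # outer returns to the saved address
    return cpu.lr               # `bx lr` in outer jumps back to 0x204 again


if __name__ == "__main__":
    print(hex(nested_call(save_lr=True)), hex(nested_call(save_lr=False)))
```

Without the save, the outer routine's bx lr targets an address inside itself; if that address is the bx lr, it spins there forever, which would match the symptom described.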