Subroutines and Calling Conventions

Every non-trivial program needs a way to reuse fragments of code that are needed in more than one place. Subroutines allow the program to jump to another part of the program, execute some instructions, then jump back to where the subroutine was called from. This allows the program to have only one copy of the code, which is important for machines with small amounts of memory. It also allows a programmer to write a series of instructions once and debug it in one place, instead of having to fix the instruction sequence throughout the program. Subroutines also let a programmer break down a complex problem into a set of smaller problems.

The PDP-11 instruction set includes the JSR (Jump to SubRoutine) instruction. This instruction takes two operands: the first is a register, and the second is the address of the subroutine. When the processor executes the JSR instruction, it pushes the value of the register operand to the stack, and stores the address of the instruction following the JSR instruction into the register. It then sets the program counter (PC) to the value of the destination operand, so the next instruction to be run is the first instruction of the subroutine.

The RTS (Return from SubRoutine) instruction does the opposite of this. It takes a register operand. When the processor executes the RTS instruction, it sets the PC to the value of the register operand, and pops the value from the top of the stack into the register operand.

In early PDP-11 software, it was customary to use R5 as the register operand, which is also known as the "link register" or "linkage register". As a trivial example, say we have a subroutine that multiplies the R0 register by two, by shifting its bits to the left. A program that wants to use this subroutine may look like the following:

                                ; PROGRAM CODE HERE
        JSR     R5,TIMES2       ; CALL TIMES2 WITH R5 LINKAGE
                                ; MORE PROGRAM CODE HERE

TIMES2: ASL     R0              ; SHIFT LEFT ONE BIT
        RTS     R5              ; RETURN TO CALLER USING R5 LINKAGE

When the program reaches the JSR instruction, R5 is pushed to the stack, R5 is set to the address after the JSR instruction, and PC is set to the address of the ASL instruction. When the program reaches the RTS instruction, it sets the PC to the value stored in R5 (the address after the JSR instruction), and pops the value from the top of the stack into R5.
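
To make the sequence concrete, here is one way the registers and stack might change during the call. The numbers are assumptions chosen only for illustration (all values octal): the JSR instruction is taken to sit at address 1000 and to occupy two words, so the instruction after it is at 1004; before the call, SP is 500 and R5 holds 123.

                        PC        R5      SP      WORD AT 476
        BEFORE JSR      1000      123     500     (NOT IN USE)
        AFTER JSR       TIMES2    1004    476     123
        AFTER ASL       TIMES2+2  1004    476     123
        AFTER RTS       1004      123     500     (NOT IN USE)

Note that R5 and SP end up exactly as they started: the old value of R5 is saved by JSR and restored by RTS.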

Subroutines are more useful if they can take parameters. For example, some parts of the program may want to multiply the value of a certain memory location by two, and other parts of the program may want to multiply the value of a different memory location by two. Instead of writing two subroutines that do largely the same thing, a programmer can write one subroutine that takes one or more parameters.

There needs to be some agreement between the caller of the subroutine and the author of the subroutine about how these parameters are passed from the caller to the subroutine, and how results of the routine are returned to the caller. These agreements are called "calling conventions".

A simple calling convention would be "R0 holds the first input parameter, and upon return, R1 will contain the result. Use R5 as the linkage register". For example, with the address of the value to be doubled passed in R0:

        MOV     #2000,R0        ; WE WANT TO MULTIPLY THE VALUE AT 2000
        JSR     R5,TIMES2       ; CALL THE SUBROUTINE

; MULTIPLY THE VALUE AT THE ADDRESS IN R0 BY TWO
; RETURN THE ORIGINAL VALUE IN R1
TIMES2: MOV     (R0),R1         ; COPY THE ORIGINAL VALUE TO R1
        ASL     (R0)            ; SHIFT THE VALUE AT THE ADDRESS IN R0 LEFT
        RTS     R5              ; RETURN TO CALLER AT THE ADDRESS IN R5

The problem with this is that the caller may already be using R0 or R1 for something else; if it needs those values later, it must save them somewhere before the call and restore them afterwards. (R5 is less of a concern for the caller, since JSR saves its old value on the stack and RTS restores it.) As a caller, you would also need to know which registers the subroutine modifies, which may be different for each subroutine. Finally, if there are more parameters than registers, this convention won't work.
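
As a sketch of what that saving and restoring looks like, a caller that still needs its own R1 after the call could stash it on the stack first. The label RESULT is a hypothetical memory location introduced only for this illustration.

        MOV     R1,-(SP)        ; SAVE THE CALLER'S R1 ON THE STACK
        MOV     #2000,R0        ; ADDRESS OF THE VALUE TO MULTIPLY
        JSR     R5,TIMES2       ; CALL THE SUBROUTINE, RESULT RETURNS IN R1
        MOV     R1,RESULT       ; COPY THE RESULT TO A HYPOTHETICAL LOCATION
        MOV     (SP)+,R1        ; RESTORE THE CALLER'S R1

The extra push and pop must be balanced, so that the stack pointer is back where it started once the sequence finishes.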

A different calling convention used by early PDP-11 systems uses a bit of trickery: it puts the parameters in the words directly after the JSR instruction, and expects the subroutine to use the linkage register to find the parameter values, incrementing it along the way, so that by the time RTS is executed, the linkage register holds the address of the instruction after the last parameter. For example:

        JSR     R5,TIMES2       ; CALL THE SUBROUTINE
        .WORD   2000            ; FIRST PARAMETER

; MULTIPLY THE VALUE AT THE ADDRESS GIVEN AFTER THE JSR BY TWO
; RETURN THE ORIGINAL VALUE IN R1
TIMES2: MOV     @0(R5),R1       ; R5 POINTS TO THE PARAMETER WORD, WHICH HOLDS
                                ;   THE ADDRESS. COPY THE VALUE THERE TO R1
        ASL     @(R5)+          ; SHIFT THE VALUE AT THAT ADDRESS LEFT,
                                ;   AND INCREMENT R5 TO POINT TO THE ADDRESS
                                ;   AFTER THE .WORD DIRECTIVE
        RTS     R5              ; RETURN TO CALLER AT THE ADDRESS IN R5

One of the downsides to this approach is that the address is "burned in" to the program. If you wanted this part of the code to use a different address, the program would have to modify the word after the JSR instruction, and if the program were in read-only memory, this would not be possible. Also, later PDP-11 machines can prevent the memory that holds instructions from being altered by other instructions.

Any register can be used as the linkage register, even PC. The first example could also be written as:

                                ; PROGRAM CODE HERE
        JSR     PC,TIMES2       ; CALL TIMES2 WITH PC LINKAGE
                                ; MORE PROGRAM CODE HERE

TIMES2: ASL     R0              ; SHIFT LEFT ONE BIT
        RTS     PC              ; RETURN TO CALLER USING PC LINKAGE

In this case, JSR pushes the current value of PC (the address of the instruction after JSR) to the stack, then sets the PC to the address of the ASL instruction. At RTS, the processor sets the PC to the value stored in PC (which does nothing), and pops the value from the top of the stack into PC.

Another common calling convention was introduced in the UNIX operating system and its associated C programming language, both developed on the PDP-11. Also known as the "C" calling convention, this uses the stack to pass parameters into the subroutine, using PC as the linkage register, and using R0 and R1 to return values. The caller is responsible for putting parameters on the stack and cleaning up the stack after the call. Parameters are pushed "right to left", in reverse order: if a subroutine needs three parameters, the caller pushes the third parameter, then the second, then the first. Our example here only has one parameter.

        MOV     #2000,-(SP)     ; PUSH 2000 TO STACK
        JSR     PC,TIMES2       ; CALL TIMES2 WITH PC LINKAGE
        ADD     #2,SP           ; CLEAN UP STACK

; MULTIPLY THE VALUE AT THE ADDRESS IN THE FIRST PARAMETER BY TWO
; RETURN THE ORIGINAL VALUE IN R0
TIMES2: MOV     @2(SP),R0       ; SAVE ORIGINAL VALUE IN R0
        ASL     @2(SP)          ; SHIFT LEFT ONE BIT
        RTS     PC              ; RETURN TO CALLER USING PC LINKAGE

Remember that @n(Rx) is the index deferred addressing mode: the processor adds n to the value of Rx, reads the word at that address, and uses that word as the address of the operand. At the beginning of a subroutine, the top of the stack (i.e., the address given by SP) contains the address where the program should return when RTS is executed. Since the stack grows "downward", 2(SP) is the word "underneath" the top of the stack, in this case the value 2000 that was put there by the MOV instruction. @2(SP) means the operand is the value at the address stored in 2(SP), in this case the value at address 2000. So, MOV @2(SP),R0 moves the value at 2000 into R0. Likewise, ASL @2(SP) tells the processor to shift the value at 2000 left. After returning, the caller should "pop" the parameters from the stack. Adding 2 to the stack pointer restores it to the value it had before the parameter was pushed.
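
The right-to-left order is easier to see with more than one parameter. The following is a minimal sketch, not taken from any real system: SUM3 is a hypothetical subroutine that adds three values passed on the stack.

        MOV     #3,-(SP)        ; PUSH THIRD PARAMETER
        MOV     #2,-(SP)        ; PUSH SECOND PARAMETER
        MOV     #1,-(SP)        ; PUSH FIRST PARAMETER
        JSR     PC,SUM3         ; CALL SUM3 WITH PC LINKAGE
        ADD     #6,SP           ; CLEAN UP THREE WORDS (6 BYTES)

; ADD THE THREE PARAMETERS AND RETURN THE SUM IN R0
SUM3:   MOV     2(SP),R0        ; FIRST PARAMETER
        ADD     4(SP),R0        ; PLUS SECOND PARAMETER
        ADD     6(SP),R0        ; PLUS THIRD PARAMETER
        RTS     PC              ; RETURN WITH THE SUM IN R0

Because the first parameter is pushed last, it always sits at 2(SP) on entry to the subroutine, no matter how many parameters follow it.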

This may seem like additional overhead for the caller, who has to remember to clean up the stack. After all, a subroutine could be designed to manipulate the stack so that it cleans up after itself. But sometimes, the programmer may want to use the same parameters for a series of subroutine calls:

        MOV     #2000,-(SP)     ; PUSH 2000 AS FIRST PARAMETER
        JSR     PC,SUB1         ; CALL SUB1 WITH 2000 PARAMETER
        JSR     PC,SUB2         ; CALL SUB2 WITH 2000 PARAMETER
        JSR     PC,SUB3         ; CALL SUB3 WITH 2000 PARAMETER
        ADD     #2,SP           ; RESTORE THE STACK

If the subroutines were responsible for cleaning up the stack themselves, then the programmer would have to push 2000 to the stack before each call, because the subroutine would "helpfully" clean the parameter from the stack at the end of the call. Having the caller be responsible for this can speed up the program and provide the caller with more flexibility, at the expense of the caller having to clean up the stack.
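
For contrast, here is a sketch of what a subroutine that removes its own single parameter might look like. This is not part of the "C" convention described here; the body of SUB1 is a placeholder for illustration only.

SUB1:   MOV     @2(SP),R0       ; USE THE PARAMETER
        MOV     (SP)+,(SP)      ; POP THE RETURN ADDRESS AND STORE IT OVER
                                ;   THE PARAMETER
        RTS     PC              ; RETURN. THE PARAMETER IS GONE FROM THE STACK

After this version of SUB1 returns, the stack pointer is already back where it was before the caller pushed the parameter, so a following call would find nothing to work with unless the caller pushed the parameter again.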

A well-behaved subroutine should leave the other registers as they were before the call. A common idiom is to push the registers that the subroutine modifies to the stack at the beginning of the subroutine, and restore them from the stack at the end. For example:

TIMES2: MOV     R3,-(SP)        ; STASH R3 IN STACK
        MOV     R4,-(SP)        ; STASH R4 IN STACK

        CLR     R3              ; MODIFY R3
        CLR     R4              ; MODIFY R4

        MOV     @6(SP),R0       ; SAVE ORIGINAL VALUE IN R0
        ASL     @6(SP)          ; SHIFT VALUE LEFT

        MOV     (SP)+,R4        ; RESTORE R4 FROM STACK
        MOV     (SP)+,R3        ; RESTORE R3 FROM STACK
        RTS     PC              ; TOP OF STACK IS RETURN ADDRESS
                                ; RESULT IS IN R0

Notice the value is no longer at 2(SP), but at 6(SP). This is because at the MOV @6(SP),R0 instruction, the top of the stack (SP) contains the original value of R4, 2(SP) contains the original value of R3, 4(SP) contains the return address put there by the JSR instruction, and 6(SP) contains the value put there by the MOV value,-(SP) instruction before the JSR. Thus, it is very important that just before the RTS instruction, the top of the stack contains the return address set by the JSR instruction. Popping R4 and R3 from the stack "undoes" the pushes of R3 and R4 at the beginning of the subroutine.

This calling convention is also used on later architectures: it is seen in C code compiled for many 32-bit architectures. Today's 64-bit architectures have many registers, and use calling conventions that pass subroutine arguments in registers. C compilers often allow a programmer to specify which calling convention to use; some default to this "C" convention (modified to suit the processor and OS architecture). An example of a variant "fast call" convention would put the first two parameters into R0 and R1, and the remainder of the parameters on the stack as in the "C" convention. Since many subroutines need two or fewer parameters, this would save the time needed to push and pop parameters to and from the stack.
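
As a rough sketch of how such a "fast call" variant could look on the PDP-11, the following passes the first two parameters in R0 and R1 and the third on the stack. Both the convention and the subroutine SUM3 are hypothetical, made up for this illustration.

        MOV     #3,-(SP)        ; THIRD PARAMETER GOES ON THE STACK
        MOV     #2,R1           ; SECOND PARAMETER IN R1
        MOV     #1,R0           ; FIRST PARAMETER IN R0
        JSR     PC,SUM3         ; CALL SUM3 WITH PC LINKAGE
        ADD     #2,SP           ; CLEAN UP THE ONE STACKED PARAMETER

; ADD THE THREE PARAMETERS AND RETURN THE SUM IN R0
SUM3:   ADD     R1,R0           ; FIRST PLUS SECOND
        ADD     2(SP),R0        ; PLUS THIRD, UNDERNEATH THE RETURN ADDRESS
        RTS     PC              ; RETURN WITH THE SUM IN R0

With only two parameters, the caller would not touch the stack at all apart from the return address pushed by JSR.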

In the end, whichever convention is chosen, it is important that the conventions used within a program are consistent and documented.