Assembly Language

Assembly language is a thin, human-readable layer over the instruction set of the processor you are programming. For our SUBLEQ instruction set it is very simple: there are no general-purpose registers and no operations other than subleq itself.

START:
  subleq 7,6, END
  subleq 8,8, START
  subleq 3,3, 0
END:
  subleq 8,8, END

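After we assemble the program, the actual machine code is 7 6 9 8 8 0 3 3 0 8 8 9. The labels are gone: START became address 0 and END became address 9, because every subleq instruction takes three memory cells.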
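To make the label bookkeeping concrete, here is a minimal two-pass SUBLEQ assembler sketch in Python. It assumes every line is either a label ending in `:` or `subleq a, b, c` whose operands are numbers or label names. `assemble` is a hypothetical helper, not a tool from this book. Pass one only counts memory cells (three per instruction) to find each label's address, and pass two replaces label names with those addresses.

```python
# Minimal two-pass SUBLEQ assembler sketch (hypothetical helper, for illustration only).

def assemble(source: str) -> list[int]:
    lines = [line.strip() for line in source.splitlines() if line.strip()]

    # Pass 1: find the address of every label. Each subleq takes 3 memory cells,
    # a label takes none (it just names the address of the next instruction).
    labels = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += 3

    # Pass 2: emit the three operands, replacing label names with addresses.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, operands = line.split(maxsplit=1)
        if mnemonic != "subleq":
            raise ValueError(f"unknown instruction: {line}")
        for operand in operands.split(","):
            operand = operand.strip()
            code.append(labels[operand] if operand in labels else int(operand))
    return code


program = """
START:
  subleq 7,6, END
  subleq 8,8, START
  subleq 3,3, 0
END:
  subleq 8,8, END
"""

print(*assemble(program))  # 7 6 9 8 8 0 3 3 0 8 8 9
```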

If we write a program for a processor that implements the RISC-V ("RISC Five") instruction set, we have access to 32 registers and all kinds of operations: add, subtract, shift, and so on. We can load a value from RAM into a register, store a register into RAM, and more. These operations are common to almost all modern CPUs, but the details differ, and each architecture has its own assembly language.

Let's examine the same count-to-3 program, but in RISC-V assembly:

addi x5, x0, 3
loop: 
  addi x5, x5, -1
  bne x5, x0, loop
end:
  jal x0, end

It takes a moment to get used to the notation. Don't panic.

We start with addi x5, x0, 3. x5 is one of the general-purpose registers we can use. addi takes 3 parameters: a destination register (rd), a source register (rs1) and an immediate value (imm), which is a constant encoded directly inside the instruction. It adds the immediate to the source register and stores the result in the destination register: rd = rs1 + imm. x0 is a special zero register: reading it always gives zero, and writing to it is allowed but has no effect. So addi x5, x0, 3 is the same as x5 = 0 + 3, and x5 becomes 3.

Then we have addi x5, x5, -1, which is x5 = x5 + (-1). It decrements x5, so in the first iteration x5 goes from 3 to 2.
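Here is a small Python sketch of these two instructions, assuming 32-bit registers. Two details are worth seeing in code: x0 ignores writes, and the immediate is stored in the instruction as a 12-bit two's complement number, so -1 is the bit pattern 0xFFF and must be sign-extended before the add. `Registers`, `addi` and `sign_extend` are hypothetical names for illustration.

```python
# Sketch of an RV32I register file and addi, assuming 32-bit registers.

MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


class Registers:
    def __init__(self):
        self.x = [0] * 32  # x0..x31

    def read(self, n: int) -> int:
        return self.x[n]

    def write(self, n: int, value: int) -> None:
        if n != 0:                      # writes to x0 are silently ignored
            self.x[n] = value & MASK32  # keep only 32 bits, like the hardware


def addi(regs: Registers, rd: int, rs1: int, imm12: int) -> None:
    # imm12 is the raw 12-bit field from the instruction, e.g. 0xFFF for -1
    regs.write(rd, regs.read(rs1) + sign_extend(imm12, 12))


regs = Registers()
addi(regs, 5, 0, 3)      # addi x5, x0, 3   -> x5 = 0 + 3
addi(regs, 5, 5, 0xFFF)  # addi x5, x5, -1  -> x5 = 3 + (-1)
addi(regs, 0, 0, 42)     # addi x0, x0, 42  -> ignored, x0 stays 0
addi(regs, 6, 0, 0xFFF)  # addi x6, x0, -1  -> wraps to 0xFFFFFFFF
print(regs.read(5), regs.read(0), hex(regs.read(6)))  # 2 0 0xffffffff
```

Note that x6 does not end up as a negative Python number: a 32-bit register holding -1 simply contains the bit pattern 0xFFFFFFFF.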

bne x5, x0, loop means if x5 != x0: jump to loop. So if the content of x5 is different from x0 (which is always zero), it sets the program counter to where the label loop is; otherwise execution continues with the next instruction. The computer does not understand labels. In RISC-V, branch targets are relative to the address of the branch instruction itself, and in RV32I, the base instruction set we use, every instruction is 32 bits, or 4 bytes, long. The label loop is at address 4 and the bne is at address 8, so bne x5, x0, loop is assembled as bne x5, x0, -4. A branch just sets the program counter: if x5 != x0: pc = pc - 4. The assembler must know where each instruction will be in memory and how big it is in order to calculate where the labels are.
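The two steps the assembler performs for the branch can be sketched in Python: first give every instruction an address and compute the offset as target minus branch address, then pack the fields into 32 bits. The packing follows the RV32I B-type layout (imm[12|10:5], rs2, rs1, funct3, imm[4:1|11], opcode 1100011, funct3 001 for bne). The program-as-list-of-strings input and `encode_btype` are hypothetical, a sketch rather than a real assembler.

```python
# Sketch: how an assembler could turn "bne x5, x0, loop" into machine code.

program = [
    "addi x5, x0, 3",
    "loop:",
    "addi x5, x5, -1",
    "bne x5, x0, loop",
    "end:",
    "jal x0, end",
]

# Pass 1: give every instruction an address. Each RV32I instruction is 4 bytes,
# and a label names the address of the instruction that follows it.
labels, addresses, address = {}, {}, 0
for line in program:
    if line.endswith(":"):
        labels[line[:-1]] = address
    else:
        addresses[line] = address
        address += 4

# Branch targets are relative to the branch instruction itself.
bne_address = addresses["bne x5, x0, loop"]
offset = labels["loop"] - bne_address
print(labels, bne_address, offset)  # {'loop': 4, 'end': 12} 8 -4


def encode_btype(funct3: int, rs1: int, rs2: int, offset: int) -> int:
    """Pack a branch as imm[12|10:5] rs2 rs1 funct3 imm[4:1|11] opcode."""
    imm = offset & 0x1FFF  # 13-bit two's complement; bit 0 is always 0 and not stored
    return ((imm >> 12 & 1) << 31
            | (imm >> 5 & 0x3F) << 25
            | rs2 << 20
            | rs1 << 15
            | funct3 << 12
            | (imm >> 1 & 0xF) << 8
            | (imm >> 11 & 1) << 7
            | 0b1100011)  # opcode shared by all branch instructions


BNE_FUNCT3 = 0b001
print(hex(encode_btype(BNE_FUNCT3, rs1=5, rs2=0, offset=offset)))  # 0xfe029ee3
```

The same first pass also gives the offset for jal x0, end: end is at 12 and the jal itself is at 12, so the offset is 0.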

jal x0, end means x0 = pc + 4; pc = pc + offset, where offset is the distance from this instruction to the label end. In other words: store the address of the next instruction in x0, and set the program counter to wherever end is. Again the target is relative, and in our case we want to jump to the instruction itself, so the offset is 0: x0 = pc + 4; pc = pc + 0. JAL means Jump And Link. It is usually used with x1, also called the return address register, or ra, so that you can jump into a subroutine and later come back to continue your program. In our case we don't want to remember where we came from, we just want to jump, so we link to the zero register x0, which throws the value away.

The assembled program is 0x00300293 0xfff28293 0xfe029ee3 0x0000006f, or in decimal 3146387 4294083219 4261584611 111. The processor fetches one instruction, decodes it, executes it, and then moves on to the next one, wherever the program counter points. This is very similar to our SUBLEQ processor, except that there we did not need a "decode" step, because there was only one instruction. Decoding basically means picking which mini program the control unit should run for this instruction.
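To see fetch, decode and execute in one place, here is a Python sketch that runs exactly those four words. It is not how the hardware is built, and it only decodes the three instructions this program uses (addi, bne, jal), using the RV32I field positions. The real processor would spin on jal x0, 0 forever, so the sketch stops when an instruction jumps to itself. `run` is a hypothetical name.

```python
# Fetch-decode-execute sketch for the four machine words above (addi, bne, jal only).


def sign_extend(value: int, bits: int) -> int:
    return value - (1 << bits) if value >> (bits - 1) & 1 else value


def run(program: list[int], max_steps: int = 100):
    x = [0] * 32  # registers x0..x31
    pc = 0
    for _ in range(max_steps):
        inst = program[pc // 4]           # fetch: one 4-byte word per list entry
        opcode = inst & 0x7F              # decode: the low 7 bits select the instruction kind
        rd = inst >> 7 & 0x1F
        funct3 = inst >> 12 & 0x7
        rs1 = inst >> 15 & 0x1F
        rs2 = inst >> 20 & 0x1F
        next_pc = pc + 4
        if opcode == 0x13 and funct3 == 0:    # addi
            imm = sign_extend(inst >> 20, 12)
            x[rd] = (x[rs1] + imm) & 0xFFFFFFFF
        elif opcode == 0x63 and funct3 == 1:  # bne
            imm = ((inst >> 31 & 1) << 12 | (inst >> 7 & 1) << 11
                   | (inst >> 25 & 0x3F) << 5 | (inst >> 8 & 0xF) << 1)
            if x[rs1] != x[rs2]:
                next_pc = pc + sign_extend(imm, 13)
        elif opcode == 0x6F:                  # jal
            imm = ((inst >> 31 & 1) << 20 | (inst >> 12 & 0xFF) << 12
                   | (inst >> 20 & 1) << 11 | (inst >> 21 & 0x3FF) << 1)
            x[rd] = pc + 4
            next_pc = pc + sign_extend(imm, 21)
        else:
            raise ValueError(f"unknown instruction {inst:#010x}")
        x[0] = 0                              # x0 is always zero
        if next_pc == pc:                     # jumps to itself: stop the demo here
            return x, pc
        pc = next_pc
    return x, pc


x, pc = run([0x00300293, 0xfff28293, 0xfe029ee3, 0x0000006f])
print(x[5], pc)  # 0 12
```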

The same program written for other architectures:


ARM:

    mov r5, #3
  loop:
    sub r5, r5, #1
    cmp r5, #0
    bne loop
  end:
    b end


x86:

    mov ecx, 3
  loop:
    dec ecx
    cmp ecx, 0
    jne loop
  end:
    jmp end


Z80:

    ld a, 3
  loop:
    dec a
    cp 0
    jr nz, loop
  end:
    jr end


6502:

    lda #3
    sta count
  loop:
    dec count
    lda count
    cmp #0
    bne loop
  end:
    jmp end
  
  count:  .byte 0

The idea is the same: the syntax is different, and yet the programs are the same. In this book we will use RISC-V because I think it is the coolest one. It is an open standard that anyone can implement for free, it is very well thought out, there are hundreds of emulators and simulators for it, and there are many very cheap chips that use it, like the ESP32-C3.

Before we continue I will explain the most important RISC-V instructions.

I will actually ask Claude to write a list of the important instructions with their explanations. Since RISC-V is an open project, Claude has certainly been trained on material about it, and I know enough to tell when it's wrong. The prompt I used: "i want to add most important riscv instructions to my book, can you make a list with descriptions, explanations and also examples please."

Essential RISC-V Instructions

Arithmetic Instructions

ADD (Add)

Format: add rd, rs1, rs2
Description: Adds the values in two source registers and stores the result in the destination register
Example:
```
add x5, x6, x7    # x5 = x6 + x7
```

ADDI (Add Immediate)

Format: addi rd, rs1, immediate
Description: Adds a 12-bit signed immediate value (-2048 to 2047) to a source register and stores the result in the destination register

Example:

addi x5, x6, 10    # x5 = x6 + 10
addi x5, x0, 42    # Load immediate value 42 into x5

SUB (Subtract)

Format: sub rd, rs1, rs2
Description: Subtracts the value in rs2 from rs1 and stores the result in rd
Example:
```
sub x5, x6, x7    # x5 = x6 - x7
```

Logical Instructions

AND

Format: and rd, rs1, rs2
Description: Performs bitwise AND operation between two registers
Example:
```
and x5, x6, x7    # x5 = x6 & x7
```

OR

Format: or rd, rs1, rs2
Description: Performs bitwise OR operation between two registers
Example:
```
or x5, x6, x7     # x5 = x6 | x7
```

XOR

Format: xor rd, rs1, rs2
Description: Performs bitwise XOR operation between two registers
Example:
```
xor x5, x6, x7    # x5 = x6 ^ x7
```

Load/Store Instructions

LW (Load Word)

Format: lw rd, offset(rs1)
Description: Loads a 32-bit word from memory into a register

Example:

lw x5, 8(x6)      # Load word from address (x6 + 8) into x5

SW (Store Word)

Format: sw rs2, offset(rs1)
Description: Stores a 32-bit word from a register into memory

Example:

sw x5, 12(x6)     # Store word from x5 into address (x6 + 12)

Branch Instructions

BEQ (Branch if Equal)

Format: beq rs1, rs2, offset
Description: Branches to pc + offset (in assembly usually written as a label) if rs1 equals rs2

Example:

beq x5, x0, loop  # Jump to loop if x5 equals zero

BNE (Branch if Not Equal)

Format: bne rs1, rs2, offset
Description: Branches to pc + offset if rs1 is not equal to rs2

Example:

bne x5, x0, loop  # Jump to loop if x5 is not zero

BLT (Branch if Less Than)

Format: blt rs1, rs2, offset
Description: Branches to pc + offset if rs1 is less than rs2 (signed comparison)

Example:

blt x5, x6, loop  # Jump to loop if x5 is less than x6
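Registers only hold bit patterns; it is the instruction that decides whether they mean signed numbers. A Python sketch of that, assuming 32-bit registers: blt treats 0xFFFFFFFF as -1, while bltu, the unsigned variant that RV32I also provides, treats it as 4294967295. The helper names are hypothetical.

```python
# Sketch: the same 32-bit pattern compared as signed (blt) and unsigned (bltu).

MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def blt_taken(rs1_value: int, rs2_value: int) -> bool:
    return to_signed32(rs1_value) < to_signed32(rs2_value)


def bltu_taken(rs1_value: int, rs2_value: int) -> bool:
    return (rs1_value & MASK32) < (rs2_value & MASK32)


minus_one = 0xFFFFFFFF           # what addi x5, x0, -1 leaves in x5
print(blt_taken(minus_one, 1))   # True:  -1 < 1
print(bltu_taken(minus_one, 1))  # False: 4294967295 < 1 is false
```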

Jump Instructions

JAL (Jump and Link)

Format: jal rd, offset
Description: Jumps to pc + offset and stores the return address (pc + 4) in rd

Example:

jal x1, function  # Jump to function, store return address in x1

JALR (Jump and Link Register)

Format: jalr rd, rs1, offset
Description: Jumps to the address rs1 + offset (with the lowest bit cleared) and stores the return address (pc + 4) in rd

Example:

jalr x0, x1, 0    # Return from function (when x1 holds return address)

Shift Instructions

SLL (Shift Left Logical)

Format: sll rd, rs1, rs2
Description: Shifts rs1 left by the amount in the low 5 bits of rs2 (logical shift: zeros shift in from the right)
Example:
```
sll x5, x6, x7    # x5 = x6 << x7
```

SRL (Shift Right Logical)

Format: srl rd, rs1, rs2
Description: Shifts rs1 right by the amount in the low 5 bits of rs2 (logical shift: zeros shift in from the left)

Example:

srl x5, x6, x7    # x5 = x6 >> x7 (zero-extended)

SRA (Shift Right Arithmetic)

Format: sra rd, rs1, rs2
Description: Shifts rs1 right by the amount in the low 5 bits of rs2 (arithmetic shift: copies of the sign bit shift in from the left, so negative numbers stay negative)

Example:

sra x5, x6, x7    # x5 = x6 >> x7 (sign-extended)
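The difference between the three shifts shows up when the top bit is set. A Python sketch on 32-bit values, assuming RV32I, where only the low 5 bits of rs2 are used as the shift amount; `sll`, `srl` and `sra` here are hypothetical Python functions named after the instructions.

```python
# Sketch of the three RV32I register shifts on 32-bit values.

MASK32 = 0xFFFFFFFF


def sll(rs1: int, rs2: int) -> int:
    # zeros shift in from the right; bits pushed past bit 31 are lost
    return (rs1 << (rs2 & 0x1F)) & MASK32


def srl(rs1: int, rs2: int) -> int:
    # zeros shift in from the left
    return (rs1 & MASK32) >> (rs2 & 0x1F)


def sra(rs1: int, rs2: int) -> int:
    # copies of the sign bit shift in from the left
    rs1 &= MASK32
    signed = rs1 - (1 << 32) if rs1 & 0x80000000 else rs1
    return (signed >> (rs2 & 0x1F)) & MASK32  # Python's >> on negative ints is arithmetic


value = 0x80000010
print(hex(sll(value, 4)))  # 0x100
print(hex(srl(value, 4)))  # 0x8000001
print(hex(sra(value, 4)))  # 0xf8000001
print(sll(1, 33))          # 2, because 33 & 0x1F == 1
```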

Important Register Conventions

x0: Zero register (always contains 0)
x1: Return address (ra)
x2: Stack pointer (sp)
x3: Global pointer (gp)
x4: Thread pointer (tp)
x5-x7: Temporary registers (t0-t2)
x8-x9: Saved registers (s0-s1); s0 is also used as the frame pointer (fp)
x10-x11: Function arguments/return values (a0-a1)
x12-x17: Function arguments (a2-a7)
x18-x27: Saved registers (s2-s11)
x28-x31: Temporary registers (t3-t6)

Common Programming Patterns

Initialize a Register

addi x5, x0, 42     # Load immediate value 42 into x5

Simple Loop

    addi x5, x0, 10    # Initialize counter to 10
loop:
    addi x5, x5, -1    # Decrement counter
    bne x5, x0, loop   # Loop if counter != 0

Function Call

    jal x1, function   # Call function
    # ... more code ...
function:
    # function body
    jalr x0, x1, 0     # Return
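A Python sketch of what jal x1 and jalr x0, x1, 0 do to the program counter and ra. The addresses (call at 0x00, function at 0x40, return from 0x48) are made up for the example, and `jal`/`jalr` here are hypothetical Python functions that mirror the instructions.

```python
# Sketch: how jal x1 / jalr x0, x1, 0 implement call and return.

regs = [0] * 32  # x0..x31; x1 is ra


def jal(pc: int, rd: int, offset: int) -> int:
    """jal rd, offset: x[rd] = pc + 4, then jump to pc + offset. Returns the new pc."""
    if rd != 0:
        regs[rd] = pc + 4
    return pc + offset


def jalr(pc: int, rd: int, rs1: int, offset: int) -> int:
    """jalr rd, rs1, offset: jump to (x[rs1] + offset) with bit 0 cleared, x[rd] = pc + 4."""
    target = (regs[rs1] + offset) & ~1  # read rs1 before rd is written
    if rd != 0:
        regs[rd] = pc + 4
    return target


pc = 0x00
pc = jal(pc, rd=1, offset=0x40)       # jal x1, function (function is 0x40 bytes ahead)
print(hex(pc), hex(regs[1]))          # 0x40 0x4: inside the function, ra remembers 0x4

pc = 0x48                             # pretend the function body ran up to its last line
pc = jalr(pc, rd=0, rs1=1, offset=0)  # jalr x0, x1, 0
print(hex(pc))                        # 0x4: right after the call
```

Because the return uses x0 as rd, the return itself does not overwrite ra; that is why jalr x0, x1, 0 is the usual way to return.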

Memory Access

    # Store value
    sw x5, 8(x2)       # Store x5 to address in x2+8
    
    # Load value
    lw x6, 8(x2)       # Load from address in x2+8 to x6

Now it's back to me.

You are quite familiar with the jumps and the arithmetic operations, but we did not have lw and sw in our SUBLEQ computer. We could build them up, the same way we made the MOV subroutine, but they were not native to that machine.

RISC-V is very consistent with data sizes: w means word, which is 32 bits or 4 bytes; h means half word, 16 bits or 2 bytes; and b means byte, 8 bits. So there are lw/sw for words, lh/sh for half words and lb/sb for bytes.

lw means Load Word: load one word, 32 bits, from memory into a register. sw means Store Word: take the 32 bits in a register and store them in memory. The syntax is a bit strange: lw x6, 8(x2) is the same as x6 = memory[x2 + 8], and sw x5, 8(x2) is memory[x2 + 8] = x5. The address is always a base register plus a 12-bit signed offset (-2048 to 2047); there is no form of the instruction that takes just an address. If you want to read address 64, memory[64], you can't write lw x6, 64. Because x0 is always zero, lw x6, 64(x0) works for small addresses like this one, but in general you first load the address into some register and then use that register in lw.

Like this:

addi x5, x0, 64
lw x6, 0(x5)

It is the same with sw, with one more rule: the value you store must also come from a register. If you want to store the value 7 at address 64, you can't write sw 7, 64. You have to put 7 in a register, put 64 in another register (or use 64(x0), since x0 is always zero), and then do sw.

addi x5, x0, 7
addi x6, x0, 64
sw x5, 0(x6)
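A Python sketch of lw and sw on byte-addressed memory. It assumes a little-endian machine, which RISC-V is: the least significant byte of a word goes to the lowest address. The 128-byte `memory` and the `lw`/`sw` helpers are hypothetical. The last lines also show that, because x0 is always zero, a small address can be written as an offset from x0.

```python
# Sketch of byte-addressed, little-endian memory with lw/sw.

memory = bytearray(128)  # 128 bytes of pretend RAM, all zero
regs = [0] * 32          # x0..x31


def sw(rs2: int, offset: int, rs1: int) -> None:
    """sw rs2, offset(rs1): memory[x[rs1] + offset] = x[rs2] (4 bytes)"""
    address = regs[rs1] + offset
    memory[address:address + 4] = (regs[rs2] & 0xFFFFFFFF).to_bytes(4, "little")


def lw(rd: int, offset: int, rs1: int) -> None:
    """lw rd, offset(rs1): x[rd] = memory[x[rs1] + offset] (4 bytes)"""
    address = regs[rs1] + offset
    if rd != 0:  # writes to x0 are ignored
        regs[rd] = int.from_bytes(memory[address:address + 4], "little")


regs[5] = 7                 # addi x5, x0, 7
regs[6] = 64                # addi x6, x0, 64
sw(5, 0, 6)                 # sw x5, 0(x6)
print(list(memory[64:68]))  # [7, 0, 0, 0]

regs[8] = 0x12345678        # hypothetical value, just to show the byte order
sw(8, 4, 6)                 # sw x8, 4(x6): fills bytes 68..71
print(memory[68:72].hex())  # 78563412, lowest byte first

lw(7, 64, 0)                # lw x7, 64(x0): x0 is the base register, 64 the offset
print(regs[7])              # 7
```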

It takes a bit of time to get used to, but the assembly language is very consistent and things make a lot of sense. If you get confused, ask Claude or ChatGPT and they will help you out. There are also many RISC-V resources online: all kinds of guides, simulators like https://github.com/TheThirdOne/rars or https://www.cs.cornell.edu/courses/cs3410/2019sp/riscv/interpreter/, instruction decoders, debuggers and so on.

We will use RISC-V assembly to implement a higher-level language. We could implement C, but I don't think that is very educational, so I will make a Forth compiler and interpreter. In the spirit of our infinite loop book, Forth is probably the best language for the purpose: it can modify itself, and most of it is written in itself.

Machine Code