Forth
Forth (or FORTH) is a stack-based programming language created in the 1970s by Chuck Moore. It is an incredibly compact and expressive language, but most of all, it is beautiful and elegant. And we must always strive towards beauty.
Stack
A stack is an abstract thing; it is pretty much what you are thinking of when you see the word stack: a bunch of things on top of each other, like a deck of cards. You can add (push) one more card on top, or you can take (pop) the top card into your hand. Those two operations define a stack. We call them push and pop instead of add and take. We can have a stack of cards, or a stack of books, or a stack of pancakes. For all of them you can do push and pop: you can add one more pancake on top, and take the top pancake.
Anything that can do push and pop efficiently can be used as a stack, so whatever you can do with a stack, this thing can do. When you add a pancake on top of the stack of pancakes, it takes no time: you just put it on top, you don't have to do anything else. Taking the top one is the same, no other work, just take it. Imagine however that you want the middle pancake; now you have a problem, you have to move multiple pancakes off the top, take the pancake, then put them back.
This is what defines the stack abstract data structure. A data structure is just a way to organize data so that we can access it and modify it, and each data structure has different properties. The stack makes it easy to push and pop, but hard to modify the middle. It is also difficult to look up values in it, e.g. if you want to know whether the value 3 exists in a stack of numbers, you have to go through the whole stack and check one by one. There are others where lookup is easy, like sets, but then it is hard to have a concept of 'top'. Some make it easy to add, some easy to delete, others easy to scan, or to search, and so on.
Data structures are more general than computers; you can see them in nature: self-organizing trees, like the V shape that ducks form when they fly together, or self-sorting organizations like the cells in our bodies. In our computers, however, they must live in memory, the same electrons and flip-flops you already know about.
Our addressable memory allows us to implement a stack in a very, very efficient way. It almost comes out for free. We just keep track of where the top of the stack is. Let's say our stack works with just 4 byte values; then a push would mean memory[top] = value; top += 4, and a pop would be top -= 4; value = memory[top]. That's pretty much it. top is just a variable which we can store at some memory address, or we can keep it in a special purpose register.
When I talk about memory I always imagine this
address | value
0 | 0
4 | 0
8 | 0
12 | 0
16 | 0
20 | 0
24 | 0
28 | 0
32 | 0
36 | 0
40 | 0
44 | 0
48 | 0
52 | 0
56 | 0
60 | 0
...
Now when I think of a variable, let's say in our case the variable top, I just imagine it at some random address, in our case address 248, and we want our stack to start at address 256 (again, just a number I picked). So you see the value at address 248 is 256, or top = 256:
address | value
...
240 | 0
244 | 0
248 | 256 <-- top
252 | 0
256 | 0
260 | 0
264 | 0
268 | 0
272 | 0
276 | 0
280 | 0
...
Let's push the value 3 onto the stack; first we will do memory[top] = 3:
address | value
...
240 | 0
244 | 0
248 | 256 <-- top
252 | 0
256 | 3 <-- memory[top] = 3
260 | 0
264 | 0
268 | 0
272 | 0
276 | 0
280 | 0
...
Then we want to move the top of the stack by doing top += 4, and 256 + 4 is 260:
address | value
...
240 | 0
244 | 0
248 | 260 <-- top
252 | 0
256 | 3 <-- memory[top] = 3
260 | 0
264 | 0
268 | 0
272 | 0
276 | 0
280 | 0
...
Let's push a few more values, 4 5 6, which will get our top to 272:
address | value
...
240 | 0
244 | 0
248 | 272 <-- top
252 | 0
256 | 3
260 | 4
264 | 5
268 | 6
272 | 0
276 | 0
280 | 0
...
Now let's do a pop, and let's store the result in some variable we will call v (people are quite upset when single character variable names are used, but they don't mind when i is used).
First we do top -= 4; 272 - 4 is 268:
address | value
...
240 | 0 <-- v (just a random address I picked)
244 | 0
248 | 268 <-- top
252 | 0
256 | 3
260 | 4
264 | 5
268 | 6
272 | 0
276 | 0
280 | 0
...
Then we do v = memory[top]
address | value
...
240 | 6 <-- v
244 | 0
248 | 268 <-- top
252 | 0
256 | 3
260 | 4
264 | 5
268 | 6
272 | 0
276 | 0
280 | 0
...
Let's pop again, and again into v:
address | value
...
240 | 5 <-- v
244 | 0
248 | 264 <-- top
252 | 0
256 | 3
260 | 4
264 | 5
268 | 6
272 | 0
276 | 0
280 | 0
...
That's it. We did push and pop; we have a stack. You see, because of the way our digital computer with addressable memory works, the operations are really fast: since we know the address of top and the address of v, we can update and read them. top is a normal 4 byte integer, but you can see we use it to look up another address, memory[top]. This is called dereferencing, because top is actually a pointer to the place we are interested in.
Let's implement push and pop examples in RISC-V assembly, and we will discuss it line by line. It will seem frightening all at once, but remember that nothing is as complicated as water.
addi x6, x0, 256 # x6 = 256
addi x5, x0, 248 # x5 = 248 (top)
sw x6, 0(x5) # memory[x5] = x6
addi x5, x0, 240 # x5 = 240 (v)
sw x0, 0(x5) # memory[x5] = 0
jal x1, push_3
jal x1, push_4
jal x1, push_5
jal x1, pop_into_v
end:
jal x0, end
push_3:
# memory[top] = 3
addi x5, x0, 248 # x5 = 248 (top)
lw x5, 0(x5) # x5 = memory[x5]
addi x6, x0, 3 # x6 = 3
sw x6, 0(x5) # memory[x5] = x6
# top += 4
addi x5, x0, 248 # x5 = 248 (top)
lw x6, 0(x5) # x6 = memory[x5]
addi x6, x6, 4 # x6 += 4
sw x6, 0(x5) # memory[x5] = x6
jalr x0, 0(x1)
push_4:
# memory[top] = 4
addi x5, x0, 248 # x5 = 248 (top)
lw x5, 0(x5) # x5 = memory[x5]
addi x6, x0, 4 # x6 = 4
sw x6, 0(x5) # memory[x5] = x6
# top += 4
addi x5, x0, 248 # x5 = 248 (top)
lw x6, 0(x5) # x6 = memory[x5]
addi x6, x6, 4 # x6 += 4
sw x6, 0(x5) # memory[x5] = x6
jalr x0, 0(x1)
push_5:
# memory[top] = 5
addi x5, x0, 248 # x5 = 248 (top)
lw x5, 0(x5) # x5 = memory[x5]
addi x6, x0, 5 # x6 = 5
sw x6, 0(x5) # memory[x5] = x6
# top += 4
addi x5, x0, 248 # x5 = 248 (top)
lw x6, 0(x5) # x6 = memory[x5]
addi x6, x6, 4 # x6 += 4
sw x6, 0(x5) # memory[x5] = x6
jalr x0, 0(x1)
pop_into_v:
# top -= 4
addi x5, x0, 248 # x5 = 248 (top)
lw x6, 0(x5) # x6 = memory[x5]
addi x6, x6, -4 # x6 -= 4
sw x6, 0(x5) # memory[x5] = x6
# v = memory[top]
addi x5, x0, 248 # x5 = 248 (top)
lw x5, 0(x5) # x5 = memory[x5]
addi x6, x0, 240 # x6 = 240 (v)
lw x5, 0(x5) # x5 = memory[x5]
sw x5, 0(x6) # memory[x6] = x5
jalr x0, 0(x1)
Everything after # is a comment; the assembler just ignores it.
We made a few subroutines: push_3, push_4, push_5, pop_into_v. A subroutine is just a bunch of reusable code we can jump to. Let's say our assembler prepares our program to be executed at address 0; this is the machine code produced: 0x10000313 0x0f800293 0x0062a023 0x0f000293 0x0002a023 0x014000ef 0x034000ef 0x054000ef 0x074000ef 0x0000006f 0x0f800293 0x0002a283 0x00300313 0x0062a023 0x0f800293 0x0002a303 0x00430313 0x0062a023 0x00008067 0x0f800293 0x0002a283 0x00400313 0x0062a023 0x0f800293 0x0002a303 0x00430313 0x0062a023 0x00008067 0x0f800293 0x0002a283 0x00500313 0x0062a023 0x0f800293 0x0002a303 0x00430313 0x0062a023 0x00008067 0x0f800293 0x0002a303 0xffc30313 0x0062a023 0x0f800293 0x0002a283 0x0f000313 0x0002a283 0x00532023 0x00008067.
Quite intense, but each number will map almost exactly to our assembly code.
Zooming into the first instruction, addi x6, x0, 256: the instruction is 0x10000313, in decimal 268436243, in binary 00010000000000000000001100010011. You can see that it has 3 parameters: x6 (the destination register, rd), x0 (the source register, rs1), and the immediate value 256, and of course the fact that it is the addi instruction. So somehow in the number 268436243 all this information is encoded. Below, the bits are split apart to show which part of the number is which part of the instruction.
addi x6, x0, 256
000100000000 00000 000 00110 0010011
From the official documentation you can see how the instruction is defined:
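addi is an I-type instruction, and its 32 bits are laid out like this:
imm[11:0] | rs1 | funct3 | rd | opcode
bits 31-20 | bits 19-15 | bits 14-12 | bits 11-7 | bits 6-0
For addi, the funct3 field is 000 and the opcode is 0010011; the immediate, the source register and the destination register are whatever we put in the instruction.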
In our example the immediate field 000100000000 is 256, which it is, rs1 is 00000, which is x0, and rd is 00110, which is x6. So if we change 256 to 3, or 000100000000 to 000000000011, we get the number 00000000001100000000001100010011, or 0x00300313 in hex. And if you look at our program, 0x00300313 is addi x6,x0,3! Success! We can write actual RISC-V machine code.
You can imagine how the instruction is decoded: once you know which instruction is about to be executed, special logic extracts the parameters and does the appropriate things, like in our SUBLEQ example.
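The hardware does this with wires and gates of course, but you can get a feel for the extraction in software with shifts and masks (srli, srai and andi are just shift-right and bitwise-and instructions from the same base instruction set). A little sketch, assuming the 32-bit instruction word 0x10000313 is already sitting in register x5:
srli x6, x5, 7 # shift right so rd is in the lowest bits
andi x6, x6, 0x1f # keep only 5 bits, x6 = rd = 6
srli x7, x5, 15 # shift right so rs1 is in the lowest bits
andi x7, x7, 0x1f # x7 = rs1 = 0
srai x28, x5, 20 # the top 12 bits are the immediate, the arithmetic shift keeps its sign, x28 = 256
andi x29, x5, 0x7f # the lowest 7 bits are the opcode, x29 = 0x13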
So addi has only 12 bits for the number you want to use, and the top bit is actually the sign bit, is it + or -. So the biggest number you can addi is 011111111111 or 2047, and the smallest number is 100000000000 or -2048. You can see how addi x6, x6, -4 is translated to the machine code 0xffc30313; when you decode it you see the first 12 bits are 111111111100, which is the two's complement for -4. In the code below it is shown as addi x6,x6,0xfffffffc, and 0xfffffffc is a 32 bit number, but this is just a convention, only 12 bits are actually in the machine code. What do you do then if you want to set a value to 4828327, for example? You must use 2 instructions to do that: lui, Load Upper Immediate, which can put 20 bits in the upper bits of a register, and then addi for the lower 12 bits. Or you can use a pseudo instruction, meaning we write li x5, 4828327, which the assembler will translate into lui x5, 0x49b; addi x5, x5, 0xca7.
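You can check the arithmetic: the lower 12 bits 0xca7 have their sign bit set, so the addi actually subtracts, and lui compensates by loading one more in the upper 20 bits than you might expect.
lui x5, 0x49b # x5 = 0x49b << 12 = 4829184
addi x5, x5, 0xca7 # 0xca7 sign-extends to -857, and 4829184 - 857 = 4828327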
This is the same program, but showing which machine code goes to which memory address, the human readable form of each instruction, and the actual line in our source code.
Address Code Basic Line Source
0x00000000 0x10000313 addi x6,x0,0x00000100 1 addi x6, x0, 256 # x6 = 256
0x00000004 0x0f800293 addi x5,x0,0x000000f8 2 addi x5, x0, 248 # x5 = 248 (top)
0x00000008 0x0062a023 sw x6,0(x5) 3 sw x6, 0(x5) # memory[x5] = x6
0x0000000c 0x0f000293 addi x5,x0,0x000000f0 5 addi x5, x0, 240 # x5 = 240 (v)
0x00000010 0x0002a023 sw x0,0(x5) 6 sw x0, 0(x5) # memory[x5] = 0
0x00000014 0x014000ef jal x1,0x00000014 9 jal x1, push_3
0x00000018 0x034000ef jal x1,0x00000034 10 jal x1, push_4
0x0000001c 0x054000ef jal x1,0x00000054 11 jal x1, push_5
0x00000020 0x074000ef jal x1,0x00000074 12 jal x1, pop_into_v
0x00000024 0x0000006f jal x0,0x00000000 15 jal x0, end
0x00000028 0x0f800293 addi x5,x0,0x000000f8 20 addi x5, x0, 248 # x5 = 248 (top)
0x0000002c 0x0002a283 lw x5,0(x5) 21 lw x5, 0(x5) # x5 = memory[x5]
0x00000030 0x00300313 addi x6,x0,3 22 addi x6, x0, 3 # x6 = 3
0x00000034 0x0062a023 sw x6,0(x5) 23 sw x6, 0(x5) # memory[x5] = x6
0x00000038 0x0f800293 addi x5,x0,0x000000f8 27 addi x5, x0, 248 # x5 = 248 (top)
0x0000003c 0x0002a303 lw x6,0(x5) 28 lw x6, 0(x5) # x6 = memory[x5]
0x00000040 0x00430313 addi x6,x6,4 29 addi x6, x6, 4 # x6 += 4
0x00000044 0x0062a023 sw x6,0(x5) 30 sw x6, 0(x5) # memory[x5] = x6
0x00000048 0x00008067 jalr x0,x1,0 32 jalr x0, 0(x1)
0x0000004c 0x0f800293 addi x5,x0,0x000000f8 38 addi x5, x0, 248 # x5 = 248 (top)
0x00000050 0x0002a283 lw x5,0(x5) 39 lw x5, 0(x5) # x5 = memory[x5]
0x00000054 0x00400313 addi x6,x0,4 40 addi x6, x0, 4 # x6 = 4
0x00000058 0x0062a023 sw x6,0(x5) 41 sw x6, 0(x5) # memory[x5] = x6
0x0000005c 0x0f800293 addi x5,x0,0x000000f8 45 addi x5, x0, 248 # x5 = 248 (top)
0x00000060 0x0002a303 lw x6,0(x5) 46 lw x6, 0(x5) # x6 = memory[x5]
0x00000064 0x00430313 addi x6,x6,4 47 addi x6, x6, 4 # x6 += 4
0x00000068 0x0062a023 sw x6,0(x5) 48 sw x6, 0(x5) # memory[x5] = x6
0x0000006c 0x00008067 jalr x0,x1,0 50 jalr x0, 0(x1)
0x00000070 0x0f800293 addi x5,x0,0x000000f8 55 addi x5, x0, 248 # x5 = 248 (top)
0x00000074 0x0002a283 lw x5,0(x5) 56 lw x5, 0(x5) # x5 = memory[x5]
0x00000078 0x00500313 addi x6,x0,5 57 addi x6, x0, 5 # x6 = 5
0x0000007c 0x0062a023 sw x6,0(x5) 58 sw x6, 0(x5) # memory[x5] = x6
0x00000080 0x0f800293 addi x5,x0,0x000000f8 62 addi x5, x0, 248 # x5 = 248 (top)
0x00000084 0x0002a303 lw x6,0(x5) 63 lw x6, 0(x5) # x6 = memory[x5]
0x00000088 0x00430313 addi x6,x6,4 64 addi x6, x6, 4 # x6 += 4
0x0000008c 0x0062a023 sw x6,0(x5) 65 sw x6, 0(x5) # memory[x5] = x6
0x00000090 0x00008067 jalr x0,x1,0 67 jalr x0, 0(x1)
0x00000094 0x0f800293 addi x5,x0,0x000000f8 72 addi x5, x0, 248 # x5 = 248 (top)
0x00000098 0x0002a303 lw x6,0(x5) 73 lw x6, 0(x5) # x6 = memory[x5]
0x0000009c 0xffc30313 addi x6,x6,0xfffffffc 74 addi x6, x6, -4 # x6 -= 4
0x000000a0 0x0062a023 sw x6,0(x5) 75 sw x6, 0(x5) # memory[x5] = x6
0x000000a4 0x0f800293 addi x5,x0,0x000000f8 78 addi x5, x0, 248 # x5 = 248 (top)
0x000000a8 0x0002a283 lw x5,0(x5) 79 lw x5, 0(x5) # x5 = memory[x5]
0x000000ac 0x0f000313 addi x6,x0,0x000000f0 80 addi x6, x0, 240 # x6 = 240 (v)
0x000000b0 0x0002a283 lw x5,0(x5) 81 lw x5, 0(x5) # x5 = memory[x5]
0x000000b4 0x00532023 sw x5,0(x6) 82 sw x5, 0(x6) # memory[x6] = x5
0x000000b8 0x00008067 jalr x0,x1,0 84 jalr x0, 0(x1)
Now back to the subroutines. The very interesting calls are jal x1,0x00000014 and jalr x0,x1,0. As I said before, JAL is Jump And Link. It has 2 parameters, a destination register rd and an immediate value. It stores the return address pc+4 into rd; pc is the program counter register, and its value is the address of the current instruction being executed, which is the jal instruction itself, so pc+4 is the next instruction, the one we want to come back to in order to continue from where we left off before we jumped into the subroutine. The immediate value is a relative offset from pc: once we link, we set pc += immediate value, and the next instruction is going to be executed there.
Address Code Basic Line Source
...
0x00000014 0x014000ef jal x1,0x00000014 9 jal x1, push_3
0x00000018 0x034000ef jal x1,0x00000034 10 jal x1, push_4
0x0000001c 0x054000ef jal x1,0x00000054 11 jal x1, push_5
0x00000020 0x074000ef jal x1,0x00000074 12 jal x1, pop_into_v
0x00000024 0x0000006f jal x0,0x00000000 15 jal x0, end
0x00000028 0x0f800293 addi x5,x0,0x000000f8 20 addi x5, x0, 248 # x5 = 248 (top)
...
0x00000048 0x00008067 jalr x0,x1,0 32 jalr x0, 0(x1)
...
We want to execute the push_3 subroutine. We know it is at address 0x00000028, and we know we are at address 0x00000014, so if we add 0x14 (20 in decimal) to pc we will go right where we want. jal x1, 0x14 will do x1 = pc+4; pc += 0x14. In this case pc is 0x00000014 and pc+4 is 0x00000018, so x1 = pc+4; pc += 0x14 is x1 = 0x18; pc += 0x14 (you see, sometimes I leave the leading zeroes in front to remind you that the address is just a 32 bit number, but sometimes I remove them for brevity). We then start executing instructions from address 0x28, one by one: 0x0f800293 bing, 0x0002a283 bang, 0x00300313 ting, 0x0062a023 tang.. and so on, until we reach 0x00008067, ah, the famous 0x8067, my favorite instruction: jalr x0, 0(x1). JALR means Jump And Link Register. It has 3 parameters: rd, rs, and an immediate value. It sets rd to pc+4 and then sets pc to rs + immediate value, so you can jump relative to rs: rd = pc + 4; pc = rs + immediate. Now in our case rs is x1, the immediate value is 0, and rd is x0, which is the zero register, so x0 = pc + 4; pc = x1 + 0. The write to x0 will be ignored, this is its purpose after all, the zero register, but after that, magic happens: previously, when we jumped to the subroutine, we stored the return address 0x18 in x1, which means that x0 = pc + 4; pc = x1 + 0 becomes pc = 0x18, and BANG, we are back to where we were going to be before we executed the subroutine call. Then we will execute the instruction at address 0x18, which is a jump to push_4, then we will be back again and execute the jump to push_5, and so on, until we execute a halt instruction, or in our case the infinite loop of jal x0, 0, or 0x6f, my other favorite instruction: jump to itself, x0 = pc + 4; pc = pc + 0.
It is a bit weird having push_3 and push_4 and push_5; the code is exactly the same, the only difference is the addi parameter, whether it is 3 or 4 or 5. We could use a register to just pass a parameter to the subroutine. Rewriting the program:
# top = 256
li x6, 256 # x6 = 256
li x5, 248 # x5 = 248
sw x6, 0(x5) # memory[x5] = x6
# v = 0
li x5, 240 # x5 = 240 (v)
sw x0, 0(x5) # memory[x5] = 0
# push 3
li x10, 248
li x11, 3
jal x1, push
# push 4
li x10, 248
li x11, 4
jal x1, push
# push 5
li x10, 248
li x11, 5
jal x1, push
# pop into v
li x10, 248
li x11, 240
jal x1, pop
end:
jal x0, end
push:
# x10: address of top
# x11: value
# memory[x10] = x11
lw x5, 0(x10) # x5 = memory[x10]
sw x11, 0(x5) # memory[x5] = x11
# top += 4
lw x5, 0(x10) # x5 = memory[x10]
addi x5, x5, 4 # x5 += 4
sw x5, 0(x10) # memory[x10] = x5
jalr x0, 0(x1) # return
pop:
# x10: address of top
# x11: address of v
# top -= 4
lw x5, 0(x10) # x5 = memory[x10]
addi x5, x5, -4 # x5 -= 4
sw x5, 0(x10) # memory[x10] = x5
# v = memory[top]
lw x5, 0(x10) # x5 = memory[x10]
lw x5, 0(x5) # x5 = memory[x5]
sw x5, 0(x11) # memory[x11] = x5
jalr x0, 0(x1) # return
I have been using the registers in their raw form, x0, x1, x2 and so on, but there are conventions and mnemonics to make the code easier to write and read: x1 is ra, the return address, x10 is a0, the argument 0 register. We also have a bunch of pseudo instructions, for example jal push will expand to jal x1, push, ret will expand to jalr x0, 0(x1), j 0 will expand to jal x0, 0, and many more.
x0/zero: Hardwired zero
x1/ra: Return address
x2/sp: Stack pointer
x3/gp: Global pointer
x4/tp: Thread pointer
x5-x7/t0-t2: Temporary registers
x8/s0/fp: Saved register/Frame pointer
x9/s1: Saved register
x10-x11/a0-a1: Function arguments/return values
x12-x17/a2-a7: Function arguments
x18-x27/s2-s11: Saved registers
x28-x31/t3-t6: Temporary registers
Rewriting the program again to use the mnemonics and the pseudo instructions
# top = 256
li t0, 248 # t0 = 248
li t1, 256 # t1 = 256
sw t1, 0(t0) # memory[t0] = t1
# v = 0
li t0, 240 # t0 = 240 (v)
sw zero, 0(t0) # memory[t0] = 0
# push 3
li a0, 248 # First argument: address of top
li a1, 3 # Second argument: value to push
jal push
# push 4
li a0, 248 # First argument: address of top
li a1, 4 # Second argument: value to push
jal push
# push 5
li a0, 248 # First argument: address of top
li a1, 5 # Second argument: value to push
jal push
# pop into v
li a0, 248 # First argument: address of top
li a1, 240 # Second argument: address of v
jal pop
end:
j end
push:
# a0: address of top
# a1: value
# memory[a0] = a1
lw t0, 0(a0) # t0 = memory[a0]
sw a1, 0(t0) # memory[t0] = a1
# top += 4
lw t0, 0(a0) # t0 = memory[a0]
addi t0, t0, 4 # t0 += 4
sw t0, 0(a0) # memory[a0] = t0
ret
pop:
# a0: address of top
# a1: address of v
# top -= 4
lw t0, 0(a0) # t0 = memory[a0]
addi t0, t0, -4 # t0 -= 4
sw t0, 0(a0) # memory[a0] = t0
# v = memory[top]
lw t0, 0(a0) # t0 = memory[a0]
lw t0, 0(t0) # t0 = memory[t0]
sw t0, 0(a1) # memory[a1] = t0
ret
OK, now we have usable push and pop that we can call as much as we want. Almost all modern systems use a stack to keep the temporary variables of their subroutines, and also to preserve data: if I call a subroutine that itself calls a subroutine, the inner call will mangle the return address in x1 (ra), so we need to save it in memory and later get it back out. x2 is the sp register, specifically designated for that, and we will use it later when our program gets more complicated. The idea is exactly the same as our stack, except instead of keeping the top address in memory, we keep it in the register x2 (sp).
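Just to get a feel for it, here is a minimal sketch with two made-up subroutines, outer and inner, assuming sp already points at some free memory (by convention this stack grows downward, so pushing means decrementing sp):
outer:
addi sp, sp, -4 # make room on the stack
sw ra, 0(sp) # save our return address before jal overwrites it
jal inner # this overwrites ra with the address of the next instruction
lw ra, 0(sp) # restore our original return address
addi sp, sp, 4 # release the stack space
ret # jump back to whoever called outer
inner:
ret # a leaf subroutine, ra is still intact, so it can just return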
We are slowly building up. We started with wires and electrons, up to control logic, up to instruction decoding, to instruction parameters, to pseudo instructions, and now we have our abstract concept of a stack and a pointer. We are so far away from the electrons, it's almost as if they don't exist, and yet when you open a file and inside of it you write those 4 bytes, 0x0000006f, you can imagine what the machine would do. Even if you don't know how it is wired, you can pretend. You will have certain expectations, like when you have a sequence of instructions, they will be executed in the order you wrote them.
# top = 256
li t0, 248 # t0 = 248
li t1, 256 # t1 = 256
sw t1, 0(t0) # memory[t0] = t1
And, I will now break everything you have built. The order of instructions is not guaranteed in the way you think. In the name of speed, the wiring might fetch multiple instructions at the same time, and execute them in parallel, or in a different order, if it decides that it is better. In the example above, li t0, 248 and li t1, 256 are completely independent, so we could exploit that fact; we just have to make sure both are done before sw t1, 0(t0) is executed. Modern processors are so complicated, they are whole distributed systems inside. There are all kinds of synchronous and asynchronous processes going on: message passing, pipelining, out of order execution, register renaming (Apple's M1 for example has 600 registers, and it uses them to store and read temporary values, in order to be able to run more instructions in parallel), branch prediction, speculative execution..
Depending on how much you want to think like the machine, how much you want to extract out of it, you have to understand it to a different depth. Some people stop at 'I understand basic assembly, I don't want to know anything lower', others have to go down to the electrons. I am a bit in between: I have a simplified model of wires and flipflops and a few instructions, but I don't understand the sophisticated complexity of the modern processor, I just "guess" how it works, unless I need to do something very performant, and then I need to know how much SRAM it has, how big the write line is, how far apart things are in memory, what the memory organization is, and so on. Others don't want to know anything about it; they are just interested in its abstract operations, "it can add, it can store data", or even higher, "I can push and pop data from a stack". Or even higher, they just think about how objects interact through messages, and what kinds of relationships and structure they can build through this interaction.
You will have to find out what works for you. I am just trying to show you that it is not so scary to go closer to the electrons, and that it will allow you to have some empathy for the machine.
Forth is simple. Forth is complicated. Forth is extremely powerful. Forth is extremely minimal.
-- Everyone who has written a Forth
Forth, Again
A stack language is exactly what you imagine: every symbol in the language either pushes to or pops from a stack. For example, + will pop the top 2 elements, add them together, and then push the result back. 1 will push the number 1 onto the stack, 2 will push 2, and so on.
1
2
+
4
+
bye
This Forth program will first push 1, then push 2, then evaluate +, which will pop 2 and pop 1 and push 3 onto the stack. Then 4 will be pushed, and then + again will pop 4, pop 3, and push 7, so after executing it the stack will have just the value 7 in it.
It is kind of the same as this pseudo assembly code (pseudo code is just a mock of code, it won't compile, its goal is just to illustrate an idea):
push 1
push 2
jal plus
push 4
jal plus
jal bye
plus:
a = pop
b = pop
c = a + b
push c
ret
bye:
j bye
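Just for flavor, here is a sketch of how plus could look in the RISC-V style we used earlier, assuming a0 holds the address of top exactly like in our push and pop subroutines (our real interpreter will come later):
plus:
# a0: address of top
lw t0, 0(a0) # t0 = top
addi t0, t0, -4 # pop: move top down to the last pushed value
lw t1, 0(t0) # t1 = first popped value
addi t0, t0, -4 # pop again
lw t2, 0(t0) # t2 = second popped value
add t1, t1, t2 # add them together
sw t1, 0(t0) # push: store the result where the second value was
addi t0, t0, 4 # top now points just above the result
sw t0, 0(a0) # store the new top back
ret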
Even in this simple program, 1 2 + 4 + bye, we already have a language. We have symbols, we have semantic rules of how to interpret and evaluate them, we have syntax.
Syntax (grammar rules):
- Each symbol is separated by whitespace
- The program is read from left to right
Semantics (meaning):
- Numbers are pushed to the stack
- Words:
  - plus (+): pops two values from the stack, adds them together, pushes the result back
  - bye: stops the program; in real Forth it exits the program, but in our pseudocode we just go into an infinite loop
Operational Semantics (how to process/understand the symbols):
- The program is evaluated symbol by symbol, left to right, each symbol according to its semantic properties
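Evaluating 1 2 + 4 + bye with these rules, symbol by symbol, the stack looks like this:
symbol | stack after
1 | 1
2 | 1 2
+ | 3
4 | 3 4
+ | 7
bye | 7 (the program stops)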
The language lives in a different plane from the wires. Whoever writes Forth does not need to know how our assembly will implement the + operation, or how the control logic will manipulate the circuits, or how exactly it will use the ALU. They know that when + is executed, it will do what it is supposed to do. At the same time they expect that + is fast and does not depend on the values; imagine if 3 + 5 was doing 3 + 1 + 1 + 1 + 1 + 1 under the hood. So the way the machine works does leak a bit into the language, and into the programmer's thoughts. Other things matter as well, like how much RAM the machine has, so that you know how much stack you could use.
This is the eternal tension between us and the machine.
Programming languages must take advantage of what the machine can do, and of what our minds can think. A language that ignores this principle is doomed to fail, regardless of how powerful or beautiful it is. At the same time, we keep writing code like we still use computers from 1979 with 64KB of RAM and 5 registers beating with 1MHz clocks; now we have 600 registers, instruction parallelism, 5GHz and 64GB of RAM. The machines have grown a millionfold, but we haven't. Some people say we keep writing dead programs. Until recently I felt we have not made a real phase transition, you know, when you boil water, it just keeps getting hotter and hotter until it reaches 100 degrees, and then from liquid it becomes gas, a true change, a new material, a new phase. But today, I am so excited. I read a lot of old computer books from the 80s and a lot of new books, I write code in old languages and in new languages, in order to understand, both myself and the machine. As Kierkegaard says: "life can only be understood backwards, but must be lived forwards". To understand the new computers, the new phase, we must understand the old, but they should not keep us hostage. The soul of the new machine must be explored.
A language, you see, is meaningless; it cannot do anything, just like the symbol '7' does not do anything. The wires, however, can do things. In this world of ours, by some miracle, the physical law was gracious enough to reveal some of its mysteries, and we have learned how to ask electrons politely to go through the wire. Who is really evaluating the symbolic language then? Is it our machine or the physical law? When the electrons go through the feed forward gates of the ALU, who is doing the addition?
Wires, assembly and Forth are possibly the best way to study the machine, language and expression. You might ask, why not C, and it is a good question, but C is almost assembly; once you get used to it you can almost compile it in your head. It is an amazing language, it allows you to build incredible structures and organizations, it hides almost nothing, it tries to give you all the power over the machine. At least in the 80s that was the case; now the underlying hardware is so complicated that even gcc doesn't know how the instructions will be executed. But to explore language, it is not a great tool. LISP and Forth are better, and I have picked Forth because I think it is cool and not appreciated enough.
They say: you understand Forth once you implement Forth. So let's implement it. We will start with this tic-tac-toe Forth program, and slowly implement a Forth interpreter that will be able to execute it.
create board 9 allot
: board[] board + ;
: reset-board ( -- )
9 0 do
'-' i board[] c!
loop
;
: print ( -- )
3 0 do \ j
3 0 do \ i
j 3 * i + board[] c@ emit
loop
cr
loop
;
: check-line ( a b c -- flag )
board[] c@ rot board[] c@ rot board[] c@
dup '-' = if
drop drop drop 0
else
over \ a b c -> a b c b
= \ a b c==b
rot rot \ c==b a b
= \ c==b a==b
and \ c==b && a==b
then
;
: check-win ( -- )
0 1 2 check-line if 1 exit then
3 4 5 check-line if 1 exit then
6 7 8 check-line if 1 exit then
0 3 6 check-line if 1 exit then
1 4 7 check-line if 1 exit then
2 5 8 check-line if 1 exit then
0 4 8 check-line if 1 exit then
2 4 6 check-line if 1 exit then
0
;
: play ( -- )
'X' 'O'
begin
over emit ." 's turn" cr
print
over key '0' - board[] c!
swap
1 check-win = if
print cr emit ." wins" cr
exit
then
again
;
reset-board play bye
Ἴκαρος was warned that the sun would melt the wax on his wings, and yet, he flew towards it. Why did he do that? I often wonder. And sometimes I know.