Operating System
An Operating System (OS) is just a program, its a program with the only purpose of allowing the user of the computer to use the computer to its fullest potential, be able to run other programs and be able to access the hardware.
There are many operating systems, the most used are Linux, Windows, MacOS, iOS, Android (which is on top of Linux), FreeBSD, Minix, and FreeRTOS, but there are many others, BeOS, Haiku, CollapseOS, and so on.
Pretty much all devices you use have some operating system, even the oven. The TV usually runs Android, your phone is with iOS, your laptop Windows. There are operating systems even in parts of your computer, for example in most Intel CPUs itself there is small Minix that handles certain security features and power management. There might be a small operating system in your bluetooth headphones. even in your keyboard.
Again, its just a program, it does not have to be the first program the computer
starts, in our SUBLEQ computer the first program would be a circuit that copies
the actual SUBLEQ program from non volatile storage to RAM and then disconnects
itself, set the program counter to 0, and start the normal execution. Then we
could have an operating system written in SUBLEQ (if we had a bigger version of
the computer, its not much we can do in 16 memory locations. After the computer boots, and on some computers this is quite complicated process, somehow at some at point the operating system will be jumped into, like in our Forth, we have j forth
, and from then on, it is in charge.
A lot of complexity goes into it because most popular operating systems try to
be very generic, for example Windows has to work on AMD processors and on Intel
processors, and even though they both have x86 instruction set, there are subtle
hardware bugs, certain functionalities exist only in one not in the other.
Windows also has to work on all kinds of motherboards, with all kinds of RAM,
and network cards, and USB controllers, all kinds of graphic cards, and so on.
When you write a program for Windows, you don't know at all what kind of
computer it will run on, you compile putc(65)
, and know that it will somehow
print 'A' on the screen, and you do not care how this is going to happen.
In our QEMU computer, the operating system would have to check if the UART is
ready to write by busy polling address 0x10000005
and when its available we
write the character to address 0x10000000
putc:
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
ret
Now imagine we have different QEMU computers, one has the UART register at address 0x10000000, the other at 0x20000000, we could write two versions of the OS, compile them and let the users choose which one they need.
putc:
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
ret
and
putc:
li t0, 0x20000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
ret
There are multiple problems with this, first the user would have to very carefully install the right operating system, and if there are 200 kinds of computers, chances are they will try to install the wrong one, the user does not care, they just want to display text on the screen. Also the combinations of devices is ridiculous, you have 50 kinds of RAM, 200 kinds of wifi controllers, maybe families 150 network adapters, .. how many variants of the operating systems we have to compile to support all of the combinations?
The hardware manufacturers want their device to be used, so they make them
compatible with another device that is already supported by some operating
system, or sometimes we have standards that attempt to guarantee compatibility,
for example the USB HID specification, or the RISC-V specification. If I make a
RISC-V CPU and it adheres to the specification, than I can say my CPU is RISC-V
compatible, any program written for another RISC-V should be able to be executed
on my CPU, but, the CPU is not the whole computer, even if my CPU executes the
same sb
instruction, as the other ones, writing to address 0x10000000 could
mean a very different thing on each computer. Some say that software
compatibiility is the biggest technological advancement since the
transistor. Everybody works really hard for things to be compatible. I think
this will change soon, as the main reason for this compatibility is because it
was hard to write software, and almost nobody can program computers, but now,
the wind of change is here, I am quite excited what it will bring. There is
price we must pay for compatibility, for example your Intel x86 CPU, still boots
in "Real Mode" which emulates Intel's 1978 8086 processor until it is switched
to protected mode. I often wonder, what world would it be if we knew how to
program computers, and are free from the chains of the past. Your bluetooth
headphones have to adhere to a standard definition that is more than 10,000
pages, why cant you just make some headphones and describe how their wires work
and someone else can write their program to use them? We keep adding standards
and layers of indirection everywhere, so that its easier to do things, but, I am
not sure anything is easier. In Commodore64 BASIC, in 1984 or so, people just
did peek
and poke
and were able to read and write memory, now computers do
not even come with a programming environment installed, not to mention you cant
read any memory you want (we will discuss this more in the security
section). The entire software industry has given up on teaching people how to
program, it has lost all trust in us. It is too complicated they say.. but is
it? Is it really complicated? Just tell us how it works, wires are wires, bits
are bits. It is too dangerous they say, we have to secure the computer for you,
otherwise someone might hack into it, well.. what if I want to hack into it, its
my computer!
Anyway, in order to make it easy to add support for more kinds of hardware, and
for hardware manufacturers to also add code to the operating systems, because
after all they know best how their hardware actually works, we "hide" hide how
putc
works under the hood, we add a layer of indirection between the user and
the hardware.
boot:
li t0, 2 # somehow check what kind of QEMU are we running on
li t1, 2 # if the QEMU kind is 2, load the correct putc variant
beq t0, t1, .L_qemu_2
.L_qemu_1:
la t0, putc
la t1, putc_qemu_1
j .L_boot_done
.L_qemu_2:
la t0, putc
la t1, putc_qemu_2
j .L_boot_done
.L_boot_done:
j main
putc:
.word putc_qemu_1
putc_qemu_1:
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
ret
putc_qemu_2:
li t0, 0x20000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
ret
syscall:
addi sp, sp, -4
sw ra, 0(sp)
lw t0,0(a5)
jalr t0
lw ra, 0(sp)
addi sp, sp, 4
and use it like this
...
li a0, 65
la a5, putc
call syscall
...
This is an example of doing so, adding a layer of indirection between the
programmer and the UART, so they only say "system please print a character on
screen", we call syscall, with the function we want, in this case putc
, it
will load the value of putc and jump to wherever it points to. We wont even care
if its written on the UART or displayed on a screen, the operating system will
hide that from us. This also allows us to actually change during the runtime of
the program, what putc does, for example imagine I plugin some different
display, the operating system somehow detects that, and now putc() could write
to the display, instead of UART.
This is what a device driver
is BTW, it is just a bit of code that enables
certain hardware functionality, like putc
support. Our display driver could
put the machine code that controls the leds somewhere in memory and change
putc
to point to that location, so we have added support for displaying
characters on the screen. You might have heard 'you need to install a driver for
it' or 'there is new version of the NVIDIA driver, you have to update'
This is one aspect of the operating system, to abstract the hardware functionality, print, display pixel, get keyboard input, read from memory, write to memory, etc.
There is another aspect of it, which is to create an environment that allow the user to use the computer, write programs and execute programs, to run multiple programs in the same time. The most basic use could be to be able to edit a file, compile it and execute it. This is what we will focus on with our operating system.
At the moment our program is somewhere in memory at the label "human_program",
and we can only execute that one program, also our getch
function is busy
looping, it is consuming all the cpu resources just to check if the UART is
ready to read, if our cpu runs at 100 mhz, and we type one character per second,
it will execute 100,000,000 useless instructions. Most CPUs have a way to go
into 'wait' mode, where they consume very little power until an interrupt
happens, for example UART interrupt or timer interrupt. Our code at the moment
does not take advantage of it. Our interpreter is also quite ridgit, we can only
start it once. We dont have the concept of a process, or inter process
communication.
It seems like a lot to build, but its not that much, and we will start small, first we will create the operating system scheduler, which is a tiny bit of code that runs at regular intervals, e.g. every millisecond, and just changes the running process, it snapshots all the registers as they are at the moment of the interrupt, and loads the registers of another process.
Imagine the following code:
process_a:
li a0, 'a'
jal putc
j process_a
process_b:
li a0, 'b'
jal putc
j process_b
If we jump into process_a
we will be forever stuck there. We will inline putc
into the code, so that we can see through everything:
process_a:
li a0, 'a'
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
j process_a
process_b:
li a0, 'b'
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
j process_b
Zooming in even more, imagine you have jumped into process_a
, and you are executing the instructions one by one.
800019a4: 06100513 li a0,97
800019a8: 100002b7 lui t0,0x10000
800019ac: 0052c303 lbu t1,5(t0)
800019b0: 02037313 andi t1,t1,32
800019b4: fe030ce3 beqz t1,800019ac <--- YOU ARE HERE
800019b8: 00a28023 sb a0,0(t0)
800019bc: fe9ff06f j 800019a4
So pc
is at address 800008a0, there we have the instruction beqz t1, 1b
,
1b
means the label 1
backwards, which is actually a pseudo instruction that
expands to beq x0, x6, -8
, or the machine code 0xfe030ce3
. Even though our
disassembler outputs beqz t1,800019ac
, the offset in the machine code is
actually relative, and 800019ac - 800019b4 is -8.
Just for practice lets look at 0xfe030ce3
in binary it is 11111110000000110000110011100011 which corresponds to beq x6, x0, -8.
So you are the program counter, and at the moment you are at address 800019b4
, at this point the program has set the value 97 in a0, the value 0x10000000 in t0 and it has loaded the value from memory address 0x10000005 into register t1, lets imagine the value at that point in time was 0, so the whole state of the our program is:
a0: 67
t0: 0x10000000
t1: 0
pc: 0x800019b4
What if I turn off the computer, for 100 years, and then I start the computer with those values preloaded into the registers? Would the program execution know in any way that it was paused for one century?
What if our universe pauses for 1 billion years every nanosecond, would we know?
Our computer has no way of knowing if it just started in a certain configuration, or reached that configuration while executing some program. Because our computer is a deterministic machine, meaning if it starts from a known configuration and follows some program, if we run it 100 times, every time the output will be the same for the same input, setting a0 to 97 will always set it to 97, well almost always, as there could be errors in the wires, or the universe itself, but this is beyond the scope of this book.
So, if we can just store the registers process_a
we can pause it, and later we
can just set them to their values and continue the process. You see there is no
such thing as a process
its just code with a dedicated piece of memory where
we can store its registers when we switch it out. This way we can run multiple
programs in the same CPU, this is called time-sharing, every program gets a
chance to run for a little bit, and then its swapped, the scheduler tries to
give equal time to all running programs.
The piece of memory where we hold the process information is called Process Control Block (PCB), or Task/Thread Control Block. It usually hold the registers, some information like why was the process switched out, was it because it is waiting for something, or sleeping, or the scheduler just decided that its time to give chance to another process. PCB also holds the process identifier or PID, and accounting information like how much cpu was it using, which files it has open so when the process is shut down the files can be closed, some security infromation, priority level, what its status is (running, blocked, etc) and whatever else you need in the operating system in order to manage the running proceses.
So this is all fine, we store the registers, load another set of registers and jump to the new pc, but what does it mean to 'interrupt', how do we interrupt a program?
In our SUBLEQ computer we had a flag register and its output was controlling the microcode on one of the eeproms. In our microcode that is how we execute different micro instructions depending if the result was less than zero or not.
OR(A==B, MSB==1)
|
D0 OE ---------------.
| | |
.------------------. |
| 74ls173 FlagReg | |
'------------------' |
| | |
Q0 C |
| `----------------|--.
'--------. | |
| | |
| | |
| | |
| | |
| | |
| | |
0 1 2 3 4 5 address lines | |
| | | | | | .. | |
.------------------. | |
| 28at64c eeprom | | |
'------------------' | |
| | | | | | .. | |
0 1 2 3 4 5 I/O lines | |
| | | |
'--------------------' |
| |
`------------------'
Weirdly enough, we control the output of the flag register from the I/O lines, and its output controls the eeprom's address line which of course controls its output. If you were to remember one thing from this book, I want it to be this feedback loop. The interplay between the input and the output of a system, and how to build structures on top of it.
So if the value of register A is the same as register B, OR the most significant bit of the ALU output is 1 (meaning the number is negative), AND we have latched the value in the flag register, once we enable it, it will control the I/O lines by whatever we have as value on the new address.
We could use a similar technique, we can create a small timer circuit, that every 1 milliosecond enables an address line, which would allow us to execute different microcode, that could turn off the timer circuit when done and continue normal execution. Once we go in interrupt mode we could store the program counter in some temporary register, and jump to a predefined location, where we will have our interrupt handler (just a piece of code to be executed once interrupt occurs). Now the interrupt handler can fetch the value of the old program counter from the temporary register, and jump there if it wants, or store it somewhere and load some other value. This is what it means to interrupt a program, the processor is forced to jump to the interrupt handler code.
There are many kinds of interrups, there are timer interrupts, or for example uart interrupts, or exception interrupts, and so on, what if you get an interrupt while you are in the interrupt handler? There are certain interrupts you can ignore, they are called maskable interrupts, e.g. timer or soft interrupts, others you cant and you must handle, those are non-maskable interrupts, e.g. hardware errors, watchdog timer.
When I was young, I was quite terrified by interrupts, I just couldn't imagine how can the computer jump somewhere on its own, it was freaking me out a bit.
In QEMU we can use the UART hardware interrupts and the timer interrupts to make
our operating system scheduler. To learn a new thing, you should always start
with the absolute minimum. Make a new empty folder, just copy the Makefile and
we start again from boot.s
and nothing else, later once the scheduler is done
we will bring back our Forth.b
# boot.s
.section .text
.globl _start
_start:
la sp, _stack_top
j process_a
j process_b # never reached
process_a:
li a0, 'a'
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
j process_a
process_b:
li a0, 'b'
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
j process_b
.end
Now we get infinite stream of 'a', which is a good place to start, you understand every single line in this code, and we are on solid ground. The next step would be to create an interrupt handler code and to create a timer interrupt, and we will just print 'c' from the interrupt handler.
There are many ways to break down something you think is impossible or unachievable, one common way is to split it into known unknowns and unknown unknowns, basically first do the things you are not sure are going to work, jump in the deep end. There is another way, which is do the smallest step in the direction you want to go, we are trying to experience interrupt handling, for that we need interrupts.. and handling, so we can just do those. And of course there is my favorite way, explore, because when you start with something, you think you know where you want to go, but as you are on the journey, you see it was not the right destination. Allow yourself to explore, try things out, be ready to delete your code, start from scratch, do not be afraid of the code that you will write, is it going to be stupid or embarassing, just play. It is hard to show you this in the book, but I don't know Forth nor RISC-V assembly, nor I have written a scheduler before, or an operating system, didnt know anything about the 74LS TTL chips before I made the 4 bit SUBLEQ computer, I didnt even know about SUBLEQ itself. I just try things out, and I am sure there are infinitely better ways to do what I did, but so what? In the beginning you have a path, and it seems quite straight bath, you plan your stops, 1) I will make a chapter about AND gates, 2) I will write about z80, 3) I will make small z80 program, never wrote z80 assembly, but why not.. and so on, midway however I realize, wait I can actually make my computer, that would be cool, why not, then I went on this sidequest for like 1 month, and it was was so much fun! My advice is, don't be afraid to experiment, only the blind know where they are going, so close your eyes and see.
Don't value your code that much, be happy when you delete it, it means you have outgrown it.
Its time to make some timers interrupts and their handlers.
In qemu first we need to setup the time at which we want to get the interrupt, now + 5000 clock ticks
for example, we do that by the tick value is 64 bit value and our registers hold 32 bits, but this is fine because want to trigger the interrupt after 5000 ticks.
li t1, 0x0200BFF8 # address of mtime register
lw t0, 0(t1) # load the lower 32 bits
li t1, 5000
add t0, t0, t1 # add 5000 to it
li t1, 0x02004000 # address of time compare register mtimecmp
sw t0, 0(t1) # store the new value in it
mtime
is continously incrementing hardware time register, well hardware is somewhat of a soft term in qemu, but if you are using a physical computer there will be some way to set up timers, and it will probably be similar.
mtimecmp
is a compare register, once mtime >= mtimecmp
it triggers machine timer interrupt.
In our assembly we first load the value of mtime, then we add 5000 to it and set it in the mtimecmp, now we only need to tell qemu that we are interested in receiving timer interrupt and we have to tell it how we are ghoing to handle them, adn the RISC-V specification defines how mtimecmp and mtime are used.
_start:
la sp, _stack_top
jal setup_timer
process_a:
...
process_b:
...
setup_timer:
la t0, interrupt_handler
csrw mtvec, t0
li t1, 0x0200BFF8 # address of mtime register
lw t0, 0(t1) # load the lower 32 bits
li t1, 5000
add t0, t0, t1 # add 5000 to it
li t1, 0x02004000 # address of time compare register mtimecmp
sw t0, 0(t1) # store the new value in it
li t0, (1 << 7) # MTIE bit in mie
csrw mie, t0
csrr t1, mstatus
li t0, (1 << 3) # MIE bit in mstatus
or t1, t1, t0
csrw mstatus, t1
# initialize current process pointer
la t0, current_process
la t1, process_a
sw t1, 0(t0)
j process_a
interrupt_handler:
# setup the next timer interrupt
li t1, 0x0200BFF8 # mtime address
lw t0, 0(t1) # get current time
li t1, 5000 # timer interval
add t0, t0, t1 # next interrupt time
li t1, 0x02004000 # mtimecmp address
sw t0, 0(t1) # set next interrupt
la t1, current
lw t2, 0(t1)
beqz t2, set_to_one # if current == 0, jump
set_to_zero:
li t2, 0
sw t2, 0(t1)
la t0, process_a
j set_mepc
set_to_one:
li t2, 1
sw t2, 0(t1)
la t0, process_b
# set mepc (machine exception program counter)
set_mepc:
csrw mepc, t0
mret
current:
.word 0
.end
In setup_interrupt
we also have to enable timer interrupt MTIE (bit 7), in the mie
register (_M_achine _I_nterrupt _E_nable). After that we actually need to enable MIE interrupts in mstatus, so we read the current value and or it with 00000000 00000000 00000000 00000100 (3rd bit), and then write it back, bit 3 is the MIE bit in mstatus.
# enable MTIE
li t0, (1 << 7) # MTIE bit in mie
csrw mie, t0
# enable MIE
csrr t1, mstatus
li t0, (1 << 3) # MIE bit in mstatus
or t1, t1, t0
csrw mstatus, t1
You have noticed that we dont use the normal li
or lw
and sw
for mie, mstatus, mtvec
, they are 'Control and Status Registers', or CSRs, a special
kind of registers, csrr
means read, csrr t1, mstatus
is read the value of
mstatus into t1, and csrw
means write, csrw mie, t0
means write the value of
t0 into mie.
When interrupt happens, the processor has to jump somewhere, in the mtvec
csr
register, Machine Trap Vector, it has the base address of where the CPU will
jump when machine mode interrupt or exception occurs.
la t0, interrupt_handler
csrw mtvec, t0
All is well so far, once the next interrupt happens, it will jump to our
interrupt_handler
code, there we have to set a new mtimecmp
value so that we
get another interrupt next time, and we just check the value of the current
variable if its the address of process_a
we switch it to process_b
and later
we set mepc
(Machine Exception Program Counter) to the new value, mepc
holds
the value of pc
at the time of the interrupt. At the end we call mret
instead of ret
, you know there is no such thing as ret
, it is jalr zero, 0(ra)
or 0x00008067, so ret just jumps to wherever ra
was pointing to, but
mret
jumps to wherever mepc
is, so by setting mepc
we control where we go
back to.
If you had nice 80 column monitor you would see something like this:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
In pseudocode the program looks like this
start:
setup_timer()
process_a:
print('a')
goto process_a
process_b:
print('b')
goto process_b
setup_timer:
set mtvec to interrupt_handler
interrupt after 5000 ticks
enable timer interrupts
enable machine interrupts
process_a()
interrupt_handler:
interrupt after 5000 ticks
if current == 0:
t0 = process_b
current = 1
else:
t0 = process_a
current = 0
mepc = t0
mret
So now we switch between two procesees, but we dont actually restore their state
at all, we just start process_a
and process_b
from scratch every time, we
jump to the very beginning of their machine code at every interrupt.
Think again, at the time of the interrupt it is as if the world is frozen from
the point of view of process_a
, when the cpu jumps into interrupt_handler
all the registers are just as they were, again, try to imagine as if you
were executing those instructions on paper, and the world freezes.
process_a:
li a0, 'a'
li t0, 0x10000000
1:
lbu t1, 5(t0) <--- FREEZE
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
j process_a
What would the values be. What would the value of pc
be at this point? The instruction is not partially executed, it must be executed, so pc
points to th enext instruction, t1
is set with the value from memory address t0+5
, a0
is 97 and t0 is 0x10000000
.
We want to "continue" the process when its its turn again, so it can never know that something happened.
process_a:
li a0, 'a'
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20 <--- CONTINUE
beqz t1, 1b
sb a0, 0(t0)
j process_a
Our current example is quite trivial, so its not of much difference if we continue or start from the beginning, but anything code that changes memory will be extremely confused if we keep restarting it, or maybe if the process is more complicated and it cant finish the work in 5000 ticks then it will never return a result as we will keep starting it over and over again.
We will save the registers of the running process in memory (PCB) and then next
time its its turn we will restore them. We will just unfreeze its world. This is
quite strange if you think about it, that the program can never know if it was
stopped by itself, it must look at some external reference to know. There are a
lot of things in our universe like that, for example the principle of
relativity, if you are in a space ship with constant velocity, and there are no
stars around you, you can not know if you are moving, even more, if another ship
is coming towards you, you dont know if it is coming to you, or you are coming
to it. How bizzare it is that constant velocity is undetectable from inside the
ship? How is that not magic, stop and think for a second, as this is quite
disturbing. How can it be? We can freeze cells, then we unfreeze them and they
go as if nothing happened. The frog rana sylvatica
actually freezes during
winter, they have no heart beat, no breathing and no blood circulation, nor
brain activity while frozen, but when it gets warm they get back to life. When
it wakes up, maybe one winter passed, maybe 10 winters passed, how would it
know?
When frozen, wood frogs have no detectable vital signs: no heartbeat, breathing, blood circulation, muscle movement, or detectable brain activity. Wood frogs in natural hibernation remain frozen for 193 +/- 11 consecutive days and reached an average (October–May) temperature of −6.3 °C (20.7 °F) and an average minimum temperature of −14.6 ± 2.8 °C (5.7 ± 5.0 °F). The wood frog has evolved various physiological adaptations that allow it to tolerate the freezing of 65–70% of its total body water. When water freezes, ice crystals form in cells and break up the structure, so that when the ice thaws the cells are damaged. Frozen frogs also need to endure the interruption of oxygen delivery to their tissues as well as strong dehydration and shrinkage of their cells when water is drawn out of cells to freeze.
-- https://en.wikipedia.org/wiki/Wood_frog
-- https://www.youtube.com/watch?v=UvCdOXG2rPo

Anyway, lets get back to our registers and process control blocks, this is how saving and restoring registers would look through time:
Time flows this way ------>
Process A executing: Interrupt Handler: Process B executing:
.-------------------. .-------------------. .-------------------.
| a0: 'a' (97) | | Save A's state | | a0: 'b' (98) |
| t0: 0x10000000 | -> | Set up new timer | -> | t0: 0x10000000 |
| t1: 0 | | Load B's state | | t1: 0 |
| pc: 0x800019b4 | | Jump to B (mret) | | pc: 0x80001bc4 |
'-------------------' '-------------------' '-------------------'
At the moment the way process_a is written it always jumps to its start, so the whole point is kind of moot, we will complicate it a bit in order to see that we actually do store and restore the registers, we do freeze and unfreeze it.
process_a:
li a0, 'a'
li a1, 0
mv a2, a0
li t0, 0x10000000
li t2, 26
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
addi a1, a1, 1 # a1++
rem a1, a1, t2 # a1 = a1 % 26
add a0, a2, a1
j 1b
process_b:
li a0, 'A'
li a1, 0
mv a2, a0
li t0, 0x10000000
li t2, 26
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
addi a1, a1, 1 # a1++
rem a1, a1, t2 # a1 = a1 % 26
add a0, a2, a1
j 1b
this change will make process 'a' print 'abcdefghijklmnopqrstuvwxyz' and process
'b' prints 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', instead of jumping to their start
labels they now jump to an inner loop where they both keep incrementing a1 = a1 % 26
and then make a0 = a2 + a1
, a2
is the base character, and a1 = a1 % 26
will cycle between 0 and 25, so a0 = a2 + a1
will cycle from '97 + 0, 97 +
1, 97 + 2, 97 + 3, 97 + 4.. 97 + 25', 97 is ascii for lowercase 'a' and 122
(97 + 25) is lowercase 'z', and for process 'b' we get the same but starting
from upper case 'A' which is 65 and ending at 90 (65 + 25) which is 'Z'. Now it
will be very visible if we don't do a good job restoring the registers.
Examine the code first, then we will discuss it.
setup_timer:
la t0, interrupt_handler
csrw mtvec, t0
# PCB A init
la t0, pcb_a
la t1, process_a
sw t1, 124(t0) # set initial PC
la t1, stack_a_top
sw t1, 4(t0) # set stack top
# PCB B init
la t0, pcb_b
la t1, process_b
sw t1, 124(t0) # set initial PC
la t1, stack_b_top
sw t1, 4(t0) # set stack top
# set up initial current process
la t0, current_pcb
la t1, pcb_a
sw t1, 0(t0)
# set up initial timer interrupt
li t1, 0x0200BFF8 # mtime address
lw t0, 0(t1) # get current time
li t1, 5000 # timer interval
add t0, t0, t1 # next interrupt time
li t1, 0x02004000 # mtimecmp address
sw t0, 0(t1) # set next interrupt
# enable timer interrupts (set MTIE bit in mie)
li t0, (1 << 7)
csrw mie, t0
# enable global interrupts (set MIE bit in mstatus)
csrr t1, mstatus
li t0, (1 << 3)
or t1, t1, t0
csrw mstatus, t1
# start with process A
j process_a
interrupt_handler:
# temporary save t6 on top of the current process's stack
# we just use it as scratch memory
addi sp, sp, -4
sw t6, 0(sp) # save t6, temporary
add sp, sp, 4 # we will use -4(sp) to take it back
# now we can use t6 safely
la t6, current_pcb
lw t6, 0(t6) # get current PCB address
# FREEZE!
sw ra, 0(t6) # x1
sw sp, 4(t6) # x2
sw gp, 8(t6) # x3
sw tp, 12(t6) # x4
sw t0, 16(t6) # x5
sw t1, 20(t6) # x6
sw t2, 24(t6) # x7
sw s0, 28(t6) # x8
sw s1, 32(t6) # x9
sw a0, 36(t6) # x10
sw a1, 40(t6) # x11
sw a2, 44(t6) # x12
sw a3, 48(t6) # x13
sw a4, 52(t6) # x14
sw a5, 56(t6) # x15
sw a6, 60(t6) # x16
sw a7, 64(t6) # x17
sw s2, 68(t6) # x18
sw s3, 72(t6) # x19
sw s4, 76(t6) # x20
sw s5, 80(t6) # x21
sw s6, 84(t6) # x22
sw s7, 88(t6) # x23
sw s8, 92(t6) # x24
sw s9, 96(t6) # x25
sw s10, 100(t6) # x26
sw s11, 104(t6) # x27
sw t3, 108(t6) # x28
sw t4, 112(t6) # x29
sw t5, 116(t6) # x30
lw t5, -4(sp) # NB: Get original t6 value from -4(sp)
sw t5, 120(t6) # Save original t6 (x31)
csrr t0, mepc
sw t0, 124(t6) # Save mepc to PCB
# set up next timer interrupt
li t0, 0x0200BFF8 # mtime address
lw t1, 0(t0) # get current time
li t2, 5000 # timer interval
add t1, t1, t2 # next interrupt time
li t0, 0x02004000 # mtimecmp address
sw t1, 0(t0) # set next interrupt
# switch to other PCB
la t6, pcb_a # load PCB A address
lw t5, current_pcb # get current PCB
bne t5, t6, switch_done # if current != A, switch to A
la t6, pcb_b # otherwise switch to B
switch_done:
# store new PCB pointer
la t5, current_pcb
sw t6, 0(t5)
# UNFREEZE!
lw t5, 124(t6) # get saved mepc
csrw mepc, t5 # restore it
lw ra, 0(t6)
lw sp, 4(t6)
lw gp, 8(t6)
lw tp, 12(t6)
lw t0, 16(t6)
lw t1, 20(t6)
lw t2, 24(t6)
lw s0, 28(t6)
lw s1, 32(t6)
lw a0, 36(t6)
lw a1, 40(t6)
lw a2, 44(t6)
lw a3, 48(t6)
lw a4, 52(t6)
lw a5, 56(t6)
lw a6, 60(t6)
lw a7, 64(t6)
lw s2, 68(t6)
lw s3, 72(t6)
lw s4, 76(t6)
lw s5, 80(t6)
lw s6, 84(t6)
lw s7, 88(t6)
lw s8, 92(t6)
lw s9, 96(t6)
lw s10, 100(t6)
lw s11, 104(t6)
lw t3, 108(t6)
lw t4, 112(t6)
lw t5, 116(t6)
lw t6, 120(t6)
mret
current_pcb:
.word 0
pcb_a:
.space 128 # 32 registers * 4 bytes each
pcb_b:
.space 128
stack_a:
.space 4096 # 4KB stack for process A
stack_a_top:
stack_b:
.space 4096 # 4KB stack for process B
stack_b_top:
First lets check out the output:
...
RSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTwxyzabcdefghijklmnopqrst
uvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstu
vwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuv
wxyzabcdefUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJK
LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKL
MNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCghijklmnopqrstuvwxyzabcdefghijklmnop
qrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq
rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
ZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl
mnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklm
nopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmn
opqrstuMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEF
GHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFG
HIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZAvwxyzabcdefghijklmnopqrstuvwxyzab
cdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc
defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd
efghijkBCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTU
VWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUV
WXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMlmnopqrstuvwxyzabcdefghijklmnopqrstu
vwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuv
wxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvw
...
When you are debugging something, you must find anchors, and by that I mean, you must find pillars of truth, pullars that are unequivocally true, based on those truths you will walk the path. If one of them is a lie, it will collapse everything, or even worse, it will rot.
Looking at this output, without even thinking about the programs, can you think of how the programs would have to work in order to produce it?
Look for patterns.
XYZABCghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwDEFGHI
You see how capitalized text is interrupted by lower case text, and then
continues. This is what I would expect to happen when process B is interrupted
and it gets unfrozen from where it was, this output at least guarantees that we
properly save a0
.
OK, diving into the code, there are few important parts, first is we will create
separate stacks for each process, and we will also allocate space for each
process's PCB. The stack of a process is wherever we point sp
at before we
jump into it, there is nothing special, its the same memory, the same wires.
Allocating the space for pcbs, stack and the current_pcb
variable.
current_pcb:
.word 0
pcb_a:
.space 128
pcb_b:
.space 128
stack_a:
.space 4096
stack_a_top:
stack_b:
.space 4096
stack_b_top:
In our setup_timer
function we will setup the pcb structs
# PCB A init
la t0, pcb_a
la t1, process_a
sw t1, 124(t0) # set initial PC
la t1, stack_a_top
sw t1, 4(t0) # set stack top
On offset 124 we will have pc
, and on offset 4 we have sp
, We write the
address of process_a
at memory[pcb_a + 124]
and the address of stack_a_top
at memory[pcb_a + 4]
. Similarly we setup pcb_b
, and we also set current_pcb
to pcb_a
.
This is how the memory will look when we are setting up pcb_a
.
process_a 8000000c: 06100513 <------. addi a0,zero,97
80000010: 00000593 | addi a1,zero,0
80000014: 00050613 | addi a2,a0,0
80000018: 100002b7 | lui t0,0x10000
8000001c: 01a00393 | addi t2,zero,26
... |
|
|
current_pcb 8000027c: 80000280 --. |
... | |
| |
| |
pcb_a 80000280: 00000000 <-' |
pcb_a+4 80000284: 80001380 |
... |
pcb_a+124 8000037c: 8000000c -------'
After the pcb setup we setup the timer, enable interrupts and jump to
process_a
. Few ticks later, depending on how we setup the timer, interrupt
will trigger and the cpu will jump into interrupt_handler
, where the magic happens.
We want to save all the registers, basically we want to do something like
pcb = current_pcb
pcb[0] = x1
pcb[1] = x2
pcb[2] = x3
pcb[3] = x4
...
Sadly at the time of the interrupt we do not know where where current_pcb points
to, and even if we did, sw
requires a resister with the destination address to
write to, so how can we save the registers when we need a register to save the
registers?
Well we will just use a tiny bit of scratch memory at the current process's
stack, so we will save the value of t6
at sp - 4
, and then we will use t6
to fetch the address of the current pcb we want to save the registers at. It
might feel "unsafe" that we are doing this, but it is perfectly fine, this
memory should not be used by anyone, and if it is, its a major problem for both
of us, the only way it could happen is if they have a bug and mangle their
stack, write where they are not supposed to.
interrupt_handler:
addi sp, sp, -4
sw t6, 0(sp) # save t6, temporary
add sp, sp, 4 # we will use -4(sp) to take it back
# now we can use t6 safely
la t6, current_pcb
lw t6, 0(t6) # get current PCB address
# FREEZE!
sw ra, 0(t6) # x1
sw sp, 4(t6) # x2
sw gp, 8(t6) # x3
sw tp, 12(t6) # x4
...
BTW, I can also rewrite it like this
sw t6, -4(sp)
lw t6, current_pcb
lw
with a label will do la
and then lw
, and I don't need to do -4, sw 0(sp), +4
for sp, i can just sw
with offset -4
.
So now we actually have t6
saved at memory location sp - 4
, and then we have t6 with the value of the pcb, so we can actually proceed saving all the registers.
sw ra, 0(t6) # x1
sw sp, 4(t6) # x2
sw gp, 8(t6) # x3
sw tp, 12(t6) # x4
sw t0, 16(t6) # x5
sw t1, 20(t6) # x6
sw t2, 24(t6) # x7
sw s0, 28(t6) # x8
sw s1, 32(t6) # x9
sw a0, 36(t6) # x10
sw a1, 40(t6) # x11
sw a2, 44(t6) # x12
sw a3, 48(t6) # x13
sw a4, 52(t6) # x14
sw a5, 56(t6) # x15
sw a6, 60(t6) # x16
sw a7, 64(t6) # x17
sw s2, 68(t6) # x18
sw s3, 72(t6) # x19
sw s4, 76(t6) # x20
sw s5, 80(t6) # x21
sw s6, 84(t6) # x22
sw s7, 88(t6) # x23
sw s8, 92(t6) # x24
sw s9, 96(t6) # x25
sw s10, 100(t6) # x26
sw s11, 104(t6) # x27
sw t3, 108(t6) # x28
sw t4, 112(t6) # x29
sw t5, 116(t6) # x30
lw t5, -4(sp) # NB: Get original t6 value from -4(sp)
sw t5, 120(t6) # Save original t6 (x31)
csrr t0, mepc
sw t0, 124(t6) # Save mepc to PCB
Even though process_a
used very few registers, but we dont actually know that,
we dont even know that process_a
is running, we have to load the pcb value
from current_pcb
. At this point our interrupt handler does not care about
process_a
or process_b
it just wants to save the registers somewhere.
Then in our code we setup the new timer and we swap the pcb. if current_pcb
was pcb_a
we set it to pcb_b
, thats it.
# set up next timer interrupt
li t0, 0x0200BFF8 # mtime address
lw t1, 0(t0) # get current time
li t2, 5000 # timer interval
add t1, t1, t2 # next interrupt time
li t0, 0x02004000 # mtimecmp address
sw t1, 0(t0) # set next interrupt
# switch to other PCB
la t6, pcb_a # load PCB A address
lw t5, current_pcb # get current PCB
bne t5, t6, switch_done # if current != A, switch to A
la t6, pcb_b # otherwise switch to B
switch_done:
# store new PCB pointer
la t5, current_pcb
sw t6, 0(t5)
After that we will unfreeze the registers from the new pcb. When saving you see
we saved mepc
at pcb+124
, so now we just load it from there, we stored ra
at pcb+0
, so we load it from there, and so on, even though this is actually
the other pcb, it does not matter, all pcbs look the same.
lw t5, 124(t6) # get saved mepc
csrw mepc, t5 # restore it
lw ra, 0(t6)
lw sp, 4(t6)
lw gp, 8(t6)
lw tp, 12(t6)
lw t0, 16(t6)
lw t1, 20(t6)
lw t2, 24(t6)
lw s0, 28(t6)
lw s1, 32(t6)
lw a0, 36(t6)
lw a1, 40(t6)
lw a2, 44(t6)
lw a3, 48(t6)
lw a4, 52(t6)
lw a5, 56(t6)
lw a6, 60(t6)
lw a7, 64(t6)
lw s2, 68(t6)
lw s3, 72(t6)
lw s4, 76(t6)
lw s5, 80(t6)
lw s6, 84(t6)
lw s7, 88(t6)
lw s8, 92(t6)
lw s9, 96(t6)
lw s10, 100(t6)
lw s11, 104(t6)
lw t3, 108(t6)
lw t4, 112(t6)
lw t5, 116(t6)
lw t6, 120(t6)
mret
In the end mret returns to wherever mepc points to.
And here we go, our scheduler, toggles between two processees, smooth like butter. This is super tiny toy example, but it is very important to understand what we did here, we are using one cpu to run multiple programs, it is a big deal.
Now we just have to worry about getting interrupt inside the interrupt handler.
interrupt_handler:
sw t6, -4(sp)
la t6, current_pcb
la t6, 0(t6)
sw ra, 0(t6)
sw sp, 4(t6)
...
...
...
...
...
lw t5, 124(t6)
csrw mepc, t5
lw ra, 0(t6)
lw sp, 4(t6)
lw gp, 8(t6)
lw tp, 12(t6)
lw t0, 16(t6)
lw t1, 20(t6)
lw t2, 24(t6)
lw s0, 28(t6)
lw s1, 32(t6)
lw a0, 36(t6)
lw a1, 40(t6)
lw a2, 44(t6)
lw a3, 48(t6) <-------- INTERRUPT
lw a4, 52(t6)
lw a5, 56(t6)
lw a6, 60(t6)
lw a7, 64(t6)
lw s2, 68(t6)
lw s3, 72(t6)
lw s4, 76(t6)
lw s5, 80(t6)
lw s6, 84(t6)
lw s7, 88(t6)
lw s8, 92(t6)
lw s9, 96(t6)
lw s10, 100(t6)
lw s11, 104(t6)
lw t3, 108(t6)
lw t4, 112(t6)
lw t5, 116(t6)
lw t6, 120(t6)
mret
Walk through the code and think what would happen if we get randomly interrupted
mid-way of restoring the registers. mepc
will be overwritten at this very
moment, it will point to the next instruction so that the interrupt handler can
go back to it..but we just set it from within the interrupt handler itself.
This is quite a nightmare-ish scenario, to be interrupted while handling an
interrupt, and it is quite nuanced to deal with it, in RISCV there are very
strict rules around what exactly happens on trap entry (when interrupt happens),
and on mret
. On each architecture there are subtleties you must know.
This is called double-trap, and to make a function that can be interrupted mid way and then resumed is called 'reentrant' function. So you must either understand the architecture or you must make your trap handler reentrant.
Reentrancy is a programming concept where a function or subroutine can be interrupted and then resumed before it finishes executing. This means that the function can be called again before it completes its previous execution. Reentrant code is designed to be safe and predictable when multiple instances of the same function are called simultaneously or in quick succession
There are also functions you can apply multiple to the same input, they will always produce the same result, as if they were called only once, this is called idempotence. Reentrancy is not the same as idempotence.
Let me give you few examples of reentrant and non reentrant functions.
reentrant_sum(array):
total = 0
for each element in array:
total += element
return total
---
global_total = 0
non_reentrant_sum(array):
global_total = 0
for each element in array:
global_total += element
return global_total
If reentrant_sum
is interrupted and called gain, the result will not change,
in this case because it has the total sum in a local variable, however
non_reentrant_sum
if interrupted and called again before the first run
finishes, the global_total value will be wrong. Imagine we are summing the array
[1,2,3,4,5], and as we are on the number 2 we get interrupted, then some other
process jumps again into us, as one version of the function is paused, the other
version will set global_total
to 0, and then start adding again, and lets say
it gets interrupted as it is at number 4, so global_total
is now 10 (1 + 2 +
3 + 4), and we go back to the first version, but there we were frozen at the
second element, so we continue to add to global_total 12 (3 + 4 + 5) and we
return 22, which is obviously not 1+2+3+4+5, of course the second run will also
be wrong as it will return
It is not always trivial to make reentrant functions, especially if you stricly
depend on things to happen in certain order, and you need multiple instructions
to be executed untorn, like in our case even if we move setting mepc to the very
last place lw t5, 124(t6) ; csrw mepc, t5; mret
, setting mepc and mret is 3
instructions, we can always be interrupted mid-way. even if it was 2
instructions, we can still be torn. Imagine having some magical instruction
mepc=t5; mret
we get interrupted right after mepc=t5
, interrupt occurs, then
it overwrites mepc with mepc+4 and when we come back we go to mret which will
then jump back to mepc=t5
.
To help us out, RISCV makes it really hard for double traps to occur, In RISCV this is what roughly happens during an interrupt (amongst a bunch of other things, like MPP set and MDT and etc):
On trap entry:
- MPIE(Machine Previous Interrupt Enable) gets the current value of MIE(Machine Interrupt Enable)
- MIE is set to 0 (disabling interrupts)
- MEPC is set to PC + 4
On mret
:
- MIE gets the value from MPIE (restoring previous interrupt state)
- MPIE is set to 1
- Jump to MEPC
This way interrupts are disabled until we execute mret
, and we can not get
into double trap easilly. Those interrupts are 'maskable interrupts'. The non
maskable interrupts can still occur, and they require carefull code to be
handled. For now we will not care if exception occurs in our interrupt handler,
but if you are making a real system, you will care.
The real question is, what happens with the actual interrupts while you are doing the interrupt handling, e.g. a keyboard is pressed, or a network packet comes in? On some systems they will be pending and once interrupts are enabled again the trap handler will be called, on others they will be lost. RISCV has interrupt priority levels M-mode (machine level interrupts), and S-mode (supervisor level), and it does have MIP (Machine Interrupt Pending) register.
This scheduler is just a toy example, but enough to spark your curiocity, and to have a mental model of what happens during a context switch, and kind of what happens when you press a key on your keyaboard.
Interrupts are the very heart of our computers. They are fundamental for any system, in order to change itself, it allows it to change its direction. For example, now chatgpt can think, it produces its thoughts as text.
Check out this thinking text from deepseek when I asked it the question "pick a random number from 0 to 10"
thinking..Okay, let's see. The user wants me to pick a random number between 0 and 10. Hmm, how do I approach this? Well, first, I need to make sure I understand the range correctly. The user said "from 0 to 10." So does that include both 0 and 10? I think it does. Usually, when someone says from X to Y without specifying exclusivity, it's inclusive. So numbers 0, 1, 2, ..., up to 10. That's 11 numbers in total.
Now, how do I generate a random number here? Since I'm an AI and don't have a physical way to randomize, I need to simulate randomness. But in programming terms, generating a random integer in a range is straightforward. For example, in Python, you can use the random module's randint function, which includes both endpoints. So random.randint(0, 10) would give me an integer between 0 and 10 inclusive.
Wait, but maybe the user expects a more human-like approach. If I were a person asked to pick a random number between 0 and 10, I might just think of a number off the top of my head. But since I'm a machine, I have to rely on algorithms. However, to keep it simple, I can just state the number as if it were a quick pick.
I should also consider if there's any bias. If I just pick a number without using a proper randomization method, it might not be truly random. But given the constraints, using a pseudorandom method should suffice for this purpose. The user probably just wants a quick answer without needing cryptographic security levels of randomness.
Another thing to check: sometimes people might have a preference for certain numbers. For example, 7 is often considered a "lucky" number. Should I account for that? Probably not, since the user asked for a random selection. It's better to stick with an unbiased approach.
Let me also think about possible edge cases. If the user is using this number for something specific, like a game or a decision-making process, the randomness is important. But again, without more context, providing a straightforward random number is the way to go.
So, to summarize: the task is to pick a number between 0 and 10 inclusive. The method would be to use a pseudorandom number generator. Implementing that in code would give the result, but since I can't execute code here, I'll simulate the outcome. Let me just mentally select one. Alternatively, I can use a timestamp or some other method to seed a number, but that's complicating it. Maybe the simplest way is to just pick a number based on the current state of my internal processes, which are designed to simulate randomness.
Alright, time to decide. Let's go with the number 7. Wait, but that's a common choice. Maybe I should pick a different one to avoid bias. How about 3? Or 5? No, better to let the system handle it. Alternatively, using modulo operation on some arbitrary number. For example, if I take the number of letters in the user's request, which is "pick a random number from 0 to 10" – that's 28 characters. 28 modulo 11 (since there are 11 possible numbers) would be 28 divided by 11 is 2 with a remainder of 6. So 6. But that's deterministic based on the input. Hmm, maybe that's not truly random. Alternatively, using the current time in seconds. Let's see, if I take the current seconds value when processing this, say it's 30 seconds past the minute. 30 modulo 11 is 8. So 8. But I don't have access to real-time data. So that's not feasible.
Alternatively, since I can't actually generate a random number, perhaps I should just list all numbers from 0 to 10 and pick one. But how to choose without bias? Maybe alternate between numbers each time. But if multiple users ask, the same number might repeat. However, since each request is isolated, it's okay. But given that I can't track previous choices, the best I can do is pick a number that feels arbitrary.
Wait, but the user might expect a different answer each time. If I hardcode a number, that's not helpful. However, in the context of a single response, I just need to provide one number. Since I can't generate a new one each time, perhaps the best approach is to pick a number like 5, which is in the middle. But that's not random. Alternatively, pick 7, as it's a common "random" choice. But again, that's not truly random.
Alternatively, I could use a mathematical formula. For example, take the sum of the ASCII values of the letters in the user's message modulo 11. Let's try that. The message is "pick a random number from 0 to 10". Let's calculate the sum:
p (112) + i (105) + c (99) + k (107) + space (32) + a (97) + space (32) + r (114) + a (97) + n (110) + d (100) + o (111) + m (109) + space (32) + n (110) + u (117) + m (109) + b (98) + e (101) + r (114) + space (32) + f (102) + r (114) + o (111) + m (109) + space (32) + 0 (48) + space (32) + t (116) + o (111) + space (32) + 1 (49) + 0 (48).
Let me add these up step by step:
112 + 105 = 217
217 + 99 = 316
316 + 107 = 423
423 + 32 = 455
455 + 97 = 552
552 + 32 = 584
584 + 114 = 698
698 + 97 = 795
795 + 110 = 905
905 + 100 = 1005
1005 + 111 = 1116
1116 + 109 = 1225
1225 + 32 = 1257
1257 + 110 = 1367
1367 + 117 = 1484
1484 + 109 = 1593
1593 + 98 = 1691
1691 + 101 = 1792
1792 + 114 = 1906
1906 + 32 = 1938
1938 + 102 = 2040
2040 + 114 = 2154
2154 + 111 = 2265
2265 + 109 = 2374
2374 + 32 = 2406
2406 + 48 = 2454
2454 + 32 = 2486
2486 + 116 = 2602
2602 + 111 = 2713
2713 + 32 = 2745
2745 + 49 = 2794
2794 + 48 = 2842
Total sum is 2842. Now, 2842 modulo 11 (since 0-10 is 11 numbers) is 2842 divided by 11. Let's calculate that:
11*258 = 2838, remainder 4. So 2842 mod 11 is 4. Therefore, the number would be 4. But this is a deterministic method based on the input, so it's not truly random, but it's a way to generate a number without using a PRNG. However, this might be overcomplicating things for the user. They probably just want a quick random number without all the underlying process.
Alternatively, maybe I should just state the number without overthinking. Since I can't actually generate a random number, perhaps the best approach is to pick one that seems arbitrary. Given that, I'll go with 7, but wait, that's a common choice. Maybe 3? Or 9? Hmm. Alternatively, let's think of a dice roll. A 10-sided die would give 0-9, but we need 0-10. Maybe two dice? Not quite. Alternatively, split the difference. Maybe 5 is too middle. Let's see, perhaps 8. But I'm just guessing here.
Wait, but in the absence of a true random number generator, the best I can do is simulate it. Since I need to provide an answer, I'll just pick 6. No particular reason. Or maybe 2. Wait, this isn't effective. Maybe I should use a method like the time, but since I don't have access to the current time, perhaps use a pseudo-method. For example, the number of letters in the words the user wrote. The user's message is "pick a random number from 0 to 10". Let's count the letters in each word:
-
pick: 4
-
a: 1
-
random: 6
-
number: 6
-
from: 4
-
0: 1
-
to: 2
-
10: 2
Total letters: 4+1+6+6+4+1+2+2 = 26. 26 modulo 11 is 4 (since 11*2=22, 26-22=4). So that gives 4 again. But as before, this is deterministic based on input. However, it's a way to generate a number that's not predictable without knowing the method.
Alternatively, take the number of words: 8 words. 8 modulo 11 is 8. So the number would be 8.
But this is all just arbitrary. Since the user doesn't specify the method, any approach is acceptable as long as it's a number between 0 and 10. Given that, perhaps the simplest answer is to just pick a number like 7, which is commonly perceived as random, even though it's not. Alternatively, to avoid cliché, pick another number. Let's say 3. But without a proper method, this is just guesswork.
Wait, maybe I should just go with 5, the middle number. But that's predictable. Hmm.
Alternatively, think of the UNIX timestamp. If I could get the current timestamp, say 1719378962 (just an example), and take the last digit, which is 2, but that's only 0-9. To get 0-10, maybe take modulo 11. 1719378962 mod 11. Let's calculate that. But I don't have the actual current timestamp, so this isn't feasible.
Given that I need to provide an answer, and considering all the methods I've thought of result in numbers like 4, 8, or others, but I can't actually compute a true random number, the best approach is to select a number that's generated through a pseudo-method, even if it's deterministic. However, the user doesn't need to know the method; they just want the result.
Alternatively, since I'm an AI developed by a company that's based in a certain location, maybe use the address number or something, but that's irrelevant and not random.
Alternatively, use the number of characters in my response. But that's also not helpful.
Alternatively, use the sum of the digits in the current date. For example, if today is June 26, 2024, that's 06/26/2024. Summing the digits: 0+6+2+6+2+0+2+4 = 22. 22 mod 11 is 0. So the number would be 0. But again, without real-time data, this is not possible.
Given all this, perhaps the most straightforward answer is to pick the number 7, acknowledging that it's a common choice but still within the requested range. Alternatively, to avoid bias, pick a different number. Wait, but how?
Alternatively, let's use a mathematical constant. For example, take the first few digits of pi after the decimal and sum them until we get a number within 0-10. Pi is 3.1415926535... So the digits after the decimal are 1,4,1,5,9,2,6,5,3,5. Summing them sequentially until we hit a number <=10. Start with 1 (1), then 1+4=5, then 1+4+1=6, then 1+4+1+5=11 (over). So the number would be 6. Alternatively, the sum of the first two digits is 5, then next is 6. Hmm, this is getting too convoluted.
In conclusion, since I need to provide a number and can't use a true RNG, I'll pick one based on an internal simulated method. Let's say 6. But to ensure it's between 0 and 10, and given that multiple methods pointed to 4, 6, or 8, I'll go with 6 as the answer.
The random number chosen is 6.
It is overthinker for sure, but I want you to examine how some of the words are
interrupts, for example Wait
or Hmm
, without interrupts it can not stop its
thought.
Anyway, now we lets change our forth interpreter a bit so we can run multiple of them at the same time. I will rewrite the whole outer interpreter in assembly instead of in forth bytecode the way we had it before, just because its a bit faster and more compact. But the same could be done with the bytecode outer interpreter.
Go back and think for a second, what "defines" our forth interpreter. We need memory for the stacks, we need memory for the dictionary, and thats pretty much it.
In our scheduler each task will have some space, at the top of that space we will store the task control block (process control block), in our case thats just the registers, but in real operating systems its a lot more things.
addr0 | Task Control Block
addr4 | ra
addr8 | sp
addr12 | ..
addr16 | ..
.. ..
addr8192| .. End of task address space
We will jump to the task's code, and setup its sp
at the end of the address
space. From there we will setup our :SP, :HERE and etc registers appropriately.
BTW, the scheduler makes no difference between forth or any other task, it just
restores the registers and does mret
.
interpreter_setup:
# .---------
# | Task Control Block
# | ra
# | sp
# | ..
# | ____________ <- :HERE
# | |
# | v
# |
# |
# | ^
# | |
# | exit stack
# | ____________ <- :ESP (:RSP - 512)
# |
# | ^
# | |
# | return stack
# | ____________ <- :RSP (:CSP - 512)
# |
# | ^
# | |
# | control stack
# | ____________ <- :CSP (:SP - 1024)
# |
# | ^
# | |
# | data stack
# | ____________ <- :SP ( sp - 512)
# |
# | ^
# | |
# | system stack (global variables)
# | NEXT_GO_BACK <- sp
# '---------------
addi :SP, sp, -512
addi :CSP, :SP, -1024
addi :RSP, :CSP, -512
addi :ESP, :RSP, -512
mv :END_SP, :SP
sub :HERE, sp, a1
add :HERE, :HERE, 160 # :HERE = sp - a1 + TCB
# reset latest at the end of system dictionary
la :LATEST, dictionary_end - 5*4
# start in interpreter mode
mv :MODE, zero
# start with asciz at a0
mv :TOK_POS, a0
li :TOK_LEN, 0
jal interpreter_eval
j qemu_exit
.equ INTE_NEXT_GO_BACK, 4
interpreter_eval:
add sp, sp, -8
sw ra, 0(sp)
# write address for NEXT to jump back to
la t0, .L_inte_next_token
sw t0, INTE_NEXT_GO_BACK(sp)
.L_inte_next_token:
add :IP, sp, INTE_NEXT_GO_BACK # return address for next
mv :XT, :IP
# move the token pointer
add :TOK_POS, :TOK_POS, :TOK_LEN
mv a0, :TOK_POS
jal token
beqz a1, interpreter_done # no token
mv :TOK_POS, a0
mv :TOK_LEN, a1
# check if its numner
jal is_number
beqz a0, .L_inte_not_a_number
mv a0, :TOK_POS
mv a1, :TOK_LEN
jal atoi
beqz :MODE, .L_inte_push_number
# compile number
la t0, LIT
sw t0, 0(:HERE)
sw a0, 4(:HERE)
addi :HERE, :HERE, 8
j .L_inte_next_token
.L_inte_push_number:
# push number
PUSH a0
j .L_inte_next_token
.L_inte_not_a_number:
mv a0, :TOK_POS
mv a1, :TOK_LEN
jal do_find
beqz a0, .L_inte_word_not_found
beqz :MODE, .L_inte_execute
lw t0, -4(a0) # flag
bnez t0, .L_inte_execute # immediate flag
# compile
lw t0, 0(a0)
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j .L_inte_next_token
.L_inte_execute:
lw t0, 0(a0)
jr t0
# we wait for NEXT to take us back to L_inte_next_token
interpreter_done:
lw ra, 0(sp)
add sp, sp, 8
ret
.L_inte_word_not_found:
li a0, '\n'
call putc
mv a0, :TOK_POS
mv a1, :TOK_LEN
call puts_len
la a0, err_word_not_found
call puts
# if we are in compile mode, we need to close the word with semi
# or we will corrupt the dictionary
beqz :MODE, interpreter_done
la t0, EXIT
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j interpreter_done
I think by now you know enough assembly to read this, but i will explain just one important detail, previously our interpreter was properly setting :IP when we call EXECUTE and then :IP was pointing to our JUMP and when NEXT was called it would go to our jump which would get back to the start of the outer interpreter's infinite loop.
execute_word:
.word EXECUTE
.word JUMP
.word next_token
Now, we do this hack, where we just put the address of memory location sp + 4 where we have the address of .L_inte_next_token
and set :IP to point to sp+4 (sp + INTE_NEXT_GO_BACK
):
la t0, .L_inte_next_token
sw t0, INTE_NEXT_GO_BACK(sp)
.L_inte_next_token:
add :IP, sp, INTE_NEXT_GO_BACK # return address for next
mv :XT, :IP
So now NEXT
will jump to whatever the value of memory[IP]
is, and in our
case this will be sp+4, and then the jr
will jump to .L_inte_next_token
NEXT:
lw t0, 0(:IP) # load the actual code address from [IP]
addi :IP, :IP, 4 # move IP to next cell
jr t0 # jump
Another change is that I moved TOK_POS
and TOK_LEN
to registers instead of
memory locations, its not an issue to just store them in memory, but we had
registers free so, why not? Also previously we had label for FORTH_STACK_END
,
but now we dont know where the stack ends, so I used another register for
:END_SP.
The token parsing requires slight modification, but not much, things are a bit easier now.
The biggest change is that now the way the interpreter is done, we can actually invoke it from inside itself.
# ( addr -- )
EVAL:
add sp, sp, -36
sw ra, 0(sp)
sw :TOK_POS, 4(sp)
sw :TOK_LEN, 8(sp)
sw :IP, 12(sp)
sw :XT, 16(sp)
sw :MODE, 20(sp)
sw :CSP, 24(sp)
sw :RSP, 28(sp)
sw :ESP, 32(sp)
POP :TOK_POS
mv :TOK_LEN, zero
jal interpreter_eval
lw ra, 0(sp)
lw :TOK_POS, 4(sp)
lw :TOK_LEN, 8(sp)
lw :IP, 12(sp)
lw :XT, 16(sp)
lw :MODE, 20(sp)
lw :CSP, 24(sp)
lw :RSP, 28(sp)
lw :ESP, 32(sp)
add sp, sp, 36
j NEXT
We add the EVAL
word (short for EVALUATE), which stores the current token
position and len, and loads new position from the stack and jump to the
interpreter. So we could create a string in memory and evaluate it.
We can create a REPL program now, (Read Evaluate Print Loop), where the user can input a string and we can evaluate it.
create buffer 200 allot
: buffer[] buffer + ;
: zero 200 0 do 0 i buffer[] c! loop ;
: print 200 0 do i buffer[] dup 0 = if exit then c@ emit loop ;
: input
bump .
62 emit
32 emit
200 0 do
key dup 13 = if
drop
-1
unloop exit
else
dup 27 = if
drop
0
unloop exit
then
dup emit
i buffer[] c!
then
loop
;
: repl
begin
zero
input
0 = if drop unloop exit then
cr
buffer
eval
cr
again
;
repl
Since we have a scheduler now we also don't need to do a busy loop in getch
,
there is a special instruction wfi
, which sets mepc
to pc+4
and stops
execution until the next interrupt, there is also ecall
which allows you to
jump to the trap handler if you want to force reschedule (e.g. imagine your want
your task to sleep).
I also enabled other interrupts so the uart
interrupt can trigger our trap
handler, when we are waiting for a character for example.
Because there are quite a lot of small change in all the files I will just show them here again so you can copy paste them and try them out, and also will explain the changes in each of them.
# file: schedule.s
.section .text
.globl sche_init
.globl sche_create
.globl sche_yield
.globl sche_exit
# QEMU 'virt' CLINT base, 32-bit offsets
.equ CLINT_BASE, 0x02000000
.equ MTIME, (CLINT_BASE + 0xBFF8) # 64-bit
.equ MTIMECMP, (CLINT_BASE + 0x4000) # 64-bit (hart0)
.equ TICKS, 50000
.equ TASK_FREE, 0 # Unused
.equ TASK_READY, 1 # Ready to run
.equ TASK_BLOCKED, 2 # Waiting for something
.equ TASK_RA, 0
.equ TASK_SP, 4
.equ TASK_GP, 8
.equ TASK_TP, 12
.equ TASK_T0, 16
.equ TASK_T1, 20
.equ TASK_T2, 24
.equ TASK_S0, 28
.equ TASK_S1, 32
.equ TASK_A0, 36
.equ TASK_A1, 40
.equ TASK_A2, 44
.equ TASK_A3, 48
.equ TASK_A4, 52
.equ TASK_A5, 56
.equ TASK_A6, 60
.equ TASK_A7, 64
.equ TASK_S2, 68
.equ TASK_S3, 72
.equ TASK_S4, 76
.equ TASK_S5, 80
.equ TASK_S6, 84
.equ TASK_S7, 88
.equ TASK_S8, 92
.equ TASK_S9, 96
.equ TASK_S10, 100
.equ TASK_S11, 104
.equ TASK_T3, 108
.equ TASK_T4, 112
.equ TASK_T5, 116
.equ TASK_T6, 120
.equ TASK_MEPC, 124
.equ TASK_STATUS, 128
.equ TASK_NEXT, 132
.equ TASK_PREV, 136
sche_exit:
# Get current task pointer
la t0, current_task
lw t1, 0(t0) # t1 = current task TCB
# Mark task as free
li t2, TASK_FREE
sw t2, TASK_STATUS(t1)
ret
# Remove task from circular list
lw t2, TASK_PREV(t1) # t2 = prev task
lw t3, TASK_NEXT(t1) # t3 = next task
sw t3, TASK_NEXT(t2) # prev->next = next
sw t2, TASK_PREV(t3) # next->prev = prev
# Add task to free list
la t0, free_list_head
lw t2, 0(t0) # t2 = old free list head
sw t2, TASK_NEXT(t1) # task->next = old_head
sw t1, 0(t0) # free_list_head = task
# Force a context switch
ecall # This will never return
# a5, arg
# a6, size
# a7, entrypoint
sche_create:
addi sp, sp, -48
sw ra, 0(sp)
sw s0, 4(sp)
sw s1, 8(sp)
sw s2, 12(sp)
sw s3, 16(sp)
sw s4, 20(sp)
sw s5, 24(sp)
sw s6, 28(sp)
sw s7, 32(sp)
sw s8, 36(sp)
sw s9, 40(sp)
sw s10, 44(sp)
mv s0, a5 # arg
mv s1, a6 # size
mv s2, a7 # entry point
# First check free list
la s3, free_list_head
lw s4, 0(s3) # s4 = free_list_head
beqz s4, .L_allocate_new # if no free TCBs, allocate new
# Remove TCB from free list
lw t0, TASK_NEXT(s4) # t0 = next free TCB
sw t0, 0(s3) # update free_list_head
j .L_init_task
.L_allocate_new:
# get current tail and head
la s3, tasks_tail
lw s4, 0(s3) # s4 = tail (where we'll create new task)
la s5, tasks_head
lw s6, 0(s5) # s6 = head
.L_init_task:
# zero initialize new task's space
mv s8, s4 # s8 = start of space to zero
add s9, s8, s1 # s9 = end of space
.L_zero_bss:
beq s8, s9, .L_setup_tcb
sw zero, 0(s8)
addi s8, s8, 4
j .L_zero_bss
.L_setup_tcb:
# initialize new task's TCB
sw s0, TASK_A0(s4)
sw s1, TASK_A1(s4)
sw s2, TASK_MEPC(s4)
sw s2, TASK_RA(s4)
li s7, TASK_READY
sw s7, TASK_STATUS(s4)
# calculate stack pointer
add s7, s4, s1 # s7 = new stack top
sw s7, TASK_SP(s4)
beq s4, s6, .L_setup_first_task # if this is the first task
# for tasks after first:
# s4 = new task
# s6 = head task
# insert between last task and head
lw s7, TASK_PREV(s6) # s7 = last task
# connect new task to previous
sw s7, TASK_PREV(s4) # new->prev = last
sw s4, TASK_NEXT(s7) # last->next = new
# connect new task to head
sw s6, TASK_NEXT(s4) # new->next = head
sw s4, TASK_PREV(s6) # head->prev = new
j .L_done
.L_setup_first_task:
# first task points to itself initially
sw s4, TASK_NEXT(s4) # self->next = self
sw s4, TASK_PREV(s4) # self->prev = self
.L_done:
# update tail pointer to next free space
add s4, s4, s1 # move past task space
sw s4, 0(s3) # update tail pointer
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
lw s5, 24(sp)
lw s6, 28(sp)
lw s7, 32(sp)
lw s8, 36(sp)
lw s9, 40(sp)
lw s10, 44(sp)
addi sp, sp, 48
ret
sche_init:
add sp, sp, -8
sw s0, 0(sp)
sw s1, 4(sp)
la t0, _bss_end
la t1, tasks_head
sw t0, 0(t1) # head = _bss_end
la t1, tasks_tail
sw t0, 0(t1) # tail = _bss_end
la t1, current_task
sw t0, 0(t1) # current = _bss_end
la t1, current_tick
sw zero, 0(t1)
mv s0, a0 # save config start
mv s1, a1 # save config end
.L_create_tasks:
beq s0, s1, .L_init_done
lw a5, 0(s0) # arg
lw a6, 4(s0) # size
lw a7, 8(s0) # entry
jal sche_create
# next config entry
addi s0, s0, 12
j .L_create_tasks
.L_init_done:
# set up interrupts
la t0, trap_handler
csrw mtvec, t0
# enable timer and external interrupts
li t0, (1 << 7) | (1 << 11)
csrw mie, t0
# set up first task context
la t0, tasks_head
lw t0, 0(t0)
lw sp, TASK_SP(t0)
lw a0, TASK_A0(t0)
lw a1, TASK_A1(t0)
lw ra, TASK_RA(t0)
# restore s0, s1
lw s0, 0(sp)
lw s1, 4(sp)
add sp, sp, 8
# Set first timer
li t1, MTIME
lw t0, 0(t1)
li t1, TICKS
add t0, t0, t1
li t1, MTIMECMP
sw t0, 0(t1)
csrr t1, mstatus
li t0, (1 << 3)
or t1, t1, t0
csrw mstatus, t1
jr ra
sche_yield:
ecall
trap_handler:
# we assume:
# * there is always a current task
# * it must be the current stack that was interrupted
csrw mscratch, t0 # save t0 to mscratch to restore later
la t0, current_task
lw t0, 0(t0)
sw ra, TASK_RA(t0)
sw sp, TASK_SP(t0)
sw gp, TASK_GP(t0)
sw tp, TASK_TP(t0)
sw t1, TASK_T1(t0)
sw t2, TASK_T2(t0)
sw s0, TASK_S0(t0)
sw s1, TASK_S1(t0)
sw a0, TASK_A0(t0)
sw a1, TASK_A1(t0)
sw a2, TASK_A2(t0)
sw a3, TASK_A3(t0)
sw a4, TASK_A4(t0)
sw a5, TASK_A5(t0)
sw a6, TASK_A6(t0)
sw a7, TASK_A7(t0)
sw s2, TASK_S2(t0)
sw s3, TASK_S3(t0)
sw s4, TASK_S4(t0)
sw s5, TASK_S5(t0)
sw s6, TASK_S6(t0)
sw s7, TASK_S7(t0)
sw s8, TASK_S8(t0)
sw s9, TASK_S9(t0)
sw s10, TASK_S10(t0)
sw s11, TASK_S11(t0)
sw t3, TASK_T3(t0)
sw t4, TASK_T4(t0)
sw t5, TASK_T5(t0)
sw t6, TASK_T6(t0)
csrr t1, mepc # t1 is already saved
sw t1, TASK_MEPC(t0)
csrr t1, mscratch # restore t0 as it was when we were called
sw t1, TASK_T0(t0)
# decode the interrupt
csrr t1, mcause
li t2, 11 # ecall from M-mode
beq t1, t2, .L_schedule
li t2, 0x80000007 # machine timer interrupt
beq t1, t2, .L_timer
j .L_schedule # unknown interrupt
.L_timer:
li t1, MTIME
lw t0, 0(t1) # read lower 32 bits
li t1, TICKS
add t0, t0, t1
li t1, MTIMECMP
sw t0, 0(t1) # write lower 32 bits
# increment the tick
la t0, current_tick
lw t1, 0(t0)
addi t1, t1, 1
sw t1, 0(t0)
.L_schedule:
# Start with next task after current
la t0, current_task
lw t1, 0(t0) # t1 = current task
mv t3, t1 # t3 = original task (to detect full circle)
lw t2, TASK_NEXT(t1) # t2 = starting point for search
# Check if next is head, skip it if so
la t4, tasks_head
lw t4, 0(t4) # t4 = head task
bne t4, t2, .L_search_ready
lw t2, TASK_NEXT(t2) # skip head
.L_search_ready:
# Check if task is ready
lw t5, TASK_STATUS(t2)
li t6, TASK_READY
beq t5, t6, .L_found_ready
# Move to next task
lw t2, TASK_NEXT(t2)
# If next is head, skip it
beq t2, t4, .L_skip_head
.L_continue_search:
# If we're back to start, no READY tasks found
beq t2, t3, .L_use_head
j .L_search_ready
.L_skip_head:
lw t2, TASK_NEXT(t2) # Skip head, move to next
j .L_continue_search
.L_found_ready:
sw t2, 0(t0) # current_task = ready_task
j .L_context_switch
.L_use_head:
sw t4, 0(t0) # Use head (idle) task
j .L_context_switch
.L_context_switch:
la sp, current_task
lw sp, 0(sp)
lw t0, TASK_MEPC(sp)
csrw mepc, t0
lw ra, TASK_RA(sp)
lw gp, TASK_GP(sp)
lw tp, TASK_TP(sp)
lw t0, TASK_T0(sp)
lw t1, TASK_T1(sp)
lw t2, TASK_T2(sp)
lw s0, TASK_S0(sp)
lw s1, TASK_S1(sp)
lw a0, TASK_A0(sp)
lw a1, TASK_A1(sp)
lw a2, TASK_A2(sp)
lw a3, TASK_A3(sp)
lw a4, TASK_A4(sp)
lw a5, TASK_A5(sp)
lw a6, TASK_A6(sp)
lw a7, TASK_A7(sp)
lw s2, TASK_S2(sp)
lw s3, TASK_S3(sp)
lw s4, TASK_S4(sp)
lw s5, TASK_S5(sp)
lw s6, TASK_S6(sp)
lw s7, TASK_S7(sp)
lw s8, TASK_S8(sp)
lw s9, TASK_S9(sp)
lw s10, TASK_S10(sp)
lw s11, TASK_S11(sp)
lw t3, TASK_T3(sp)
lw t4, TASK_T4(sp)
lw t5, TASK_T5(sp)
lw t6, TASK_T6(sp)
lw sp, TASK_SP(sp)
mret
tasks_head:
.word 0
tasks_tail:
.word 0
current_task:
.word 0
current_tick:
.word 0
free_list_head:
.word 0
I added helper aliases for each register e.g. TASK_A7 is 64, In the trap handler
I also use mscratch
to store t0 instead of using the stack, mscratch
is just
a scratch register, I am using it just to show you it exists, but you would
notice it makes our trap handler more vulnerable to double-traps.
idle_task:
wfi
j idle_task
process_a:
li a0,'a'
jal putc
j process_a
process_b:
li a0,'b'
jal putc
j process_b
sche_config:
.word 0
.word 200
.word idle_task
.word 0
.word 200
.word process_a
.word 0
.word 200
.word process_b
sche_config_end:
setup:
la a0, sche_config
la a1, sche_config_end
jal sche_init
You use the new scheduler like that, prepare some configuration of the tasks you want to run, each task has 3 parameters, 1) value set to a0 when we jump into it the first time, 2) memory size, 3) address to jump to. Task 0 is special, it is executed only when no other task is in READY state.
In our example we dont actually have blocking tasks, but we could implement
sleep, or maybe waiting for getch
interrupt, we want to be woken up only when
the interrupt we are waiting for happens.
I have complicated the scheduler a bit, it is even possible to add more tasks at
runtime by calling sche_create
.
The way it works is we have a doubly linked circular list, that we just cycle through and execute the next task.
.----------------.
| .----------. |
| | | |
| v | v
.--------. .--------.
| TASK 0 | | TASK 1 |
| IDLE | | |
'--------' '--------'
^ | ^ |
| v | v
.--------. .--------.
| TASK 3 | | TASK 2 |
| | | |
'--------' '--------'
^ | ^ |
| | | |
| '----------' |
'----------------'
Each element points to the next and previous one, so TASK2's next
is TASK3,
and its prev
is TASK1, TASK3's next is TASK0 and TASK3's prev is TASK2, and so
on.
We just need to keep two pointers, one at the current HEAD of the list and one at the TAIL, and then we can just insert new elements at the front or at the back.
Writing this in assembly was quite annoying, but the idea is simple, if we want to add at the end. Imagine the list looks like this n1, n2, n3.
tail
|
n1 --> n2 -->n3
^ |
'-----------'
To add at the end
1. Create new node
n = {} # Zero initialized node
2. Connect tail's prev node to new node
tail->prev->next = n # n2->next = n
3. Connect new node to tail
n->next = tail # n->next = n3
4. Update tail pointer
tail = n # tail now points to n
Now the list looks like this:
tail
|
n1 --> n2 -->n ---> n3
^ |
'--------------------'
If we want to insert at the head
head
|
v
n1 --> n2 --> n3
^ |
'-------------'
1. Create new node
n = {} # Zero initialized node
2. Connect new node to current head
n->next = head # n->next = n1
3. Connect new node to tail
n->prev = head->prev # n->prev = n3
4. Connect tail to new node
head->prev->next = n # n3->next = n
5. Connect old head to new node
head->prev = n # n1->prev = n
6. Update head pointer
head = n # head now points to n
head
|
v
n --> n1 --> n2 --> n3
^ |
'--------------------'
You can actually insert a node anywhere in the chain, lets say we want to insert it after n2
1. Connect new node to its neighbors
# connect n to n2
n->next = n2
n->prev = n2->prev
# fix up n2's prev (n1 in our case) ot
# point to the new node n
n2->prev->next = n
n2->prev = n
It is very powerful data structure, the doubly linked list. It always amazes me, we just keep few numbers in memory, two addresses, prev and next, and we can create this magical structures.
We start by creating the tasks at the very start of our memory just after our program is loaded, at _bss_end
, if you remember our linked script
OUTPUT_ARCH( "riscv" )
ENTRY( _start )
MEMORY
{
RAM (rwx) : ORIGIN = 0x80000000, LENGTH = 128M
}
SECTIONS
{
.text :
{
*(.text.init)
*(.text)
} > RAM
.rodata :
{
*(.rodata)
} > RAM
.data :
{
*(.data)
} > RAM
.bss :
{
*(.bss)
. = ALIGN(8);
} > RAM
_bss_end = .;
_stack_top = ORIGIN(RAM) + LENGTH(RAM);
_ram_end = ORIGIN(RAM) + LENGTH(RAM);
_end = .;
}
_bss_end
is immediately after the program's definition and its global
variables, and the stack
is at the end of our RAM.
So when you request to create a task that uses 2000 bytes of memory, we will "allocate" _bss_end + 2000
for this task, and we will set its sp
at the end of this block, in the beginning of the block we will store the PCB, so in the PCB we set sp = _bss_end + 2000
, and then its up to the task how to use the memory, then when we add another task, lets say it neeed 5000 bytes of memory, we will add it after the previous task, so its memory will be from bss_end + task1 size
, to bess_end + task1 size + 5000
, and so on.
The task has to decide how to use the memory, how much of it is for its stack and how much of it is for other purposes, in our case we will use it for interpreter's dictionary.
We dont have a way to "delete" tasks, at the moment, usually what you do is once a slot is free you put it in another linked list called "free list", and later if we want to create a new task we first look if the free has any candidates that are big enough, if not we will add it to the end after the last task. You might have heard the concept of "stack" and "heap", this is what "the heap" is, just a place where you can allocate and free memory, it is implemented in various ways, with free lists, arenas, slabs and so on, with various tradeoffs, how to handle fragmentation (e.g. you want 20 bytes then 60 bytes then 3 bytes then 120 bytes, and you free the 3 bytes, then you have a hole that will be used only if the program requests 3 or less bytes), and how to make sure that things are fast to allocate, you cant be scanning the free list all the time, imagine you have 100000 entries there, how do you quickly find the one that will match the required size?
I will not add delete
, for simplicity, but search the internet for slab
allocation and memory allocation. In C the standard library provides malloc()
and free()
, you call a = malloc(10)
and it will reserve 10 bytes somewhere
in memory for you, once you dont need it anymore you have to free it with
free(a)
. Actually malloc
will reserve a few bytes more where it will store
how many bytes you actually wanted and some other information that it needs
during free
, and it will give you the address just after that. When you
free(a)
it will the memory before it to understand how big it was.
BTW, when you kill
a process, the OS has to delete ist PCB from the active
list, but also has to release resources that the process has aquired, for
example open files or network connections, but kill 42234
will find the PCB of
process 42234 and start the process of deletion, you use fork
to start a new process.
This is an excerpt from man fork
FORK(2) System Calls Manual FORK(2)
NAME
fork – create a new process
SYNOPSIS
#include <unistd.h>
pid_t
fork(void);
DESCRIPTION
fork() causes creation of a new process. The new process (child
process) is an exact copy of the calling process (parent process)
except for the following:
• The child process has a unique process ID.
• The child process has a different parent process ID
(i.e., the process ID of the parent process).
• The child process has its own copy of the parent's
descriptors. These descriptors reference the same
underlying objects, so that, for instance, file
pointers in file objects are shared between the child
and the parent, so that an lseek(2) on a descriptor in
the child process can affect a subsequent read or write
by the parent. This descriptor copying is also used by
the shell to establish standard input and output for
newly created processes as well as to set up pipes.
• The child processes resource utilizations are set to 0;
see setrlimit(2).
RETURN VALUES
Upon successful completion, fork() returns a value of 0 to the
child process and returns the process ID of the child process to
the parent process. Otherwise, a value of -1 is returned to the
parent process, no child process is created, and the global
variable errno is set to indicate the error.
So you start a new process and from inside you know if you are the "parent" or
the "child". A lot more goes on in a real operating system and the way it
manages memory and how it gives it to the running processees. Search for
virtual memory
, computer paging
, translation lookaside buffer
and copy on write
.
The files linker.ld, jit.s, boot.s and string.s remain the same.
The only change in the Makefile is to set the new registers "mnemonics" we use, s8 for :TOK_POS and etc..
# Compile .s files to object files
$(OBJ_DIR)/%.o: $(SRC_DIR)/%.s
@sed -e 's/:IP/s0/g' \
-e 's/:SP/s1/g' \
-e 's/:RSP/s2/g' \
-e 's/:CSP/s3/g' \
-e 's/:HERE/s4/g' \
-e 's/:XT/s5/g' \
-e 's/:LATEST/s6/g' \
-e 's/:MODE/s7/g' \
-e 's/:TOK_POS/s8/g' \
-e 's/:ESP/s9/g' \
-e 's/:TOK_LEN/s10/g' \
-e 's/:END_SP/s11/g' $< > $@.pre.s
$(AS) $(ASFLAGS) $@.pre.s -o $@
I moved the jit code in jit.s
# file: jit.s
.section .text
.globl do_addi
.globl do_add
.globl do_sw
.globl do_jr
.globl do_li
# addi ( a0: rd, a1: rs1, a2: imm -- a0: opcode_addi )
do_addi:
# ADDI instruction format:
# bits [31:20] = immediate
# bits [19:15] = rs1 (source register)
# bits [14:12] = 0x0 (funct3)
# bits [11:7] = rd (destination register)
# bits [6:0] = 0x13 (opcode)
li t0, 0x13 # ADDI opcode
slli t1, a0, 7 # Shift rd to position [11:7]
or t0, t0, t1
slli t1, a1, 15 # Shift rs1 to position [19:15]
or t0, t0, t1
li t1, 0xfff
and t2, a2, t1 # Mask to 12 bits
slli t2, t2, 20 # Shift immediate to position [31:20]
or t0, t0, t2
mv a0, t0
ret
# add ( a0: rd, a1: rs1, a2: rs2 -- a0: opcode_add )
do_add:
# ADD instruction format:
# bits [31:25] = 0x00 (funct7)
# bits [24:20] = rs2 (second source register)
# bits [19:15] = rs1 (first source register)
# bits [14:12] = 0x0 (funct3)
# bits [11:7] = rd (destination register)
# bits [6:0] = 0x33 (opcode for R-type)
li t0, 0x33 # R-type opcode
slli t1, a0, 7 # Shift rd to position [11:7]
or t0, t0, t1
slli t1, a1, 15 # Shift rs1 to position [19:15]
or t0, t0, t1
slli t1, a2, 20 # Shift rs2 to position [24:20]
or t0, t0, t1
mv a0, t0
ret
# sw ( a0: rs1, a1: rs2 source -- a0: opcode_sw )
do_sw:
# bits [31:25] = 0 (imm[11:5] = 0)
# bits [24:20] = rs2 (source register to store)
# bits [19:15] = rs1 (base address register)
# bits [14:12] = 0x2 (funct3 for SW)
# bits [11:7] = 0 (imm[4:0] = 0)
# bits [6:0] = 0x23 (opcode for store)
li a4, 0x23 # opcode
li t0, 0x2000 # 2 << 12
or a4, a4, t0
slli t0, a0, 15
or a4, a4, t0
slli t0, a1, 20
or a4, a4, t0
mv a0, a4
ret
# JR ( a0: reg -- a0: opcode_jr )
do_jr:
mv t0, a0
# bits [31:20] = 0 for imm=0
# bits [19:15] = reg
# bits [14:12] = 0 (funct3=0)
# bits [11:7] = x0 => 0
# bits [6:0] = 0x67 (opcode for JALR)
#
# So the entire instruction is:
# (reg << 15) | 0x67
slli t1, t0, 15 # reg << 15
li t2, 0x67 # opcode JALR
or t1, t1, t2 # final 32-bit instruction
mv a0, t1
ret
# Li ( a0: reg, a1: imm -- a0: opcode_lui a1: opcode_addi )
do_li:
# Extract upper immediate
# compensating for sign extension if needed
srli t0, a1, 12 # First get upper 20 bits
li t3, 0x800
and t1, a1, t3 # Check bit 11
beqz t1, no_adjust
addi t0, t0, 1 # Adjust for sign extension
no_adjust:
# LUI
#
# bits [31:12] = immediate
# bits [11:7] = rd
# bits [6:0] = 0x37 (opcode)
#
li a2, 0x37 # LUI opcode
slli t2, t0, 12 # upper immediate
or a2, a2, t2
slli t2, a0, 7 # rd
or a2, a2, t2
# ADDI
#
# bits [31:20] = immediate
# bits [19:15] = rs1
# bits [14:12] = 0 (funct3)
# bits [11:7] = rd
# bits [6:0] = 0x13 (opcode)
#
li a3, 0x13 # ADDI opcode
li t1, 0xfff
and t0, a1, t1 # lower 12 bits
slli t2, t0, 20 # immediate
or a3, a3, t2
slli t2, a0, 15 # rs1
or a3, a3, t2
slli t2, a0, 7 # rd
or a3, a3, t2
mv a0, a2
mv a1, a3
ret
.end
I changed qemu.s to use wfi instead of busy looping
# file: qemu.s
.section .text
.globl spi_init
.globl spi_write_byte
.globl spi_read_byte
.globl putc
.globl getch
.globl qemu_exit
putc:
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x20
beqz t1, 1b
sb a0, 0(t0)
ret
getch:
li t0, 0x10000000
1:
lbu t1, 5(t0)
andi t1, t1, 0x01
bnez t1, 2f
wfi
j 1b
2:
lbu a0, 0(t0)
ret
qemu_exit:
li t0, 0x100000
li t1, 0x5555
sw t1, 0(t0)
j .
.end
Now getch will just wait for interrupt, ideally we will actually "yield" the process, we will set the current task to "BLOCKED" and flag it that it is waiting for UART interrupt, then we will switch to another task, when the interrupt comes we will "wakeup" the task by setting it to READY. The way its implemented now we will just block until the timer interrupt triggers and then we will context switch.
The way you cycle through task is quite important, and is a topic of active
research since timesharing was invented. Search for fair scheduling
.
# file: forth.s
.section .text
.globl forth
.globl NEXT
.macro PUSH reg
addi :SP, :SP, -4
sw \reg, 0(:SP)
.endm
.macro POP reg
lw \reg, 0(:SP)
addi :SP, :SP, 4
.endm
.macro RPUSH reg
addi :RSP, :RSP, -4
sw \reg, 0(:RSP)
.endm
.macro RPOP reg
lw \reg, 0(:RSP)
addi :RSP, :RSP, 4
.endm
.macro EPUSH reg
addi :ESP, :ESP, -4
sw \reg, 0(:ESP)
.endm
.macro EPOP reg
lw \reg, 0(:ESP)
addi :ESP, :ESP, 4
.endm
.macro CFPUSH reg
addi :CSP, :CSP, -4
sw \reg, 0(:CSP)
.endm
.macro CFPOP reg
lw \reg, 0(:CSP)
addi :CSP, :CSP, 4
.endm
.equ INTERPRETER_SIZE, 20000
idle_task:
wfi
j idle_task
sche_config:
.word 0
.word 200
.word idle_task
.word human_program
.word INTERPRETER_SIZE
.word interpreter_setup
.word human_program_small
.word INTERPRETER_SIZE
.word interpreter_setup
sche_config_end:
forth:
la a0, sche_config
la a1, sche_config_end
jal sche_init
# never reached, sche_init jumps to the first task
call qemu_exit
interpreter_setup:
# .---------
# | Task Control Block
# | ra
# | sp
# | ..
# | ____________ <- :HERE
# | |
# | v
# |
# |
# | ^
# | |
# | exit stack
# | ____________ <- :ESP (:RSP - 512)
# |
# | ^
# | |
# | return stack
# | ____________ <- :RSP (:CSP - 512)
# |
# | ^
# | |
# | control stack
# | ____________ <- :CSP (:SP - 1024)
# |
# | ^
# | |
# | data stack
# | ____________ <- :SP ( sp - 512)
# |
# | ^
# | |
# | system stack (global variables)
# | NEXT_GO_BACK <- sp
# '---------------
addi :SP, sp, -512
addi :CSP, :SP, -1024
addi :RSP, :CSP, -512
addi :ESP, :RSP, -512
mv :END_SP, :SP
sub :HERE, sp, a1
add :HERE, :HERE, 160 # :HERE = sp - a1 + TCB
# reset latest at the end of system dictionary
la :LATEST, dictionary_end - 5*4
# start in interpreter mode
mv :MODE, zero
# start with asciz at a0
mv :TOK_POS, a0
li :TOK_LEN, 0
jal interpreter_eval
j qemu_exit
.equ INTE_NEXT_GO_BACK, 4
interpreter_eval:
add sp, sp, -8
sw ra, 0(sp)
# write address for NEXT to jump back to
la t0, .L_inte_next_token
sw t0, INTE_NEXT_GO_BACK(sp)
.L_inte_next_token:
add :IP, sp, INTE_NEXT_GO_BACK # return address for next
mv :XT, :IP
# move the token pointer
add :TOK_POS, :TOK_POS, :TOK_LEN
mv a0, :TOK_POS
jal token
beqz a1, interpreter_done # no token
mv :TOK_POS, a0
mv :TOK_LEN, a1
# check if its numner
jal is_number
beqz a0, .L_inte_not_a_number
mv a0, :TOK_POS
mv a1, :TOK_LEN
jal atoi
beqz :MODE, .L_inte_push_number
# compile number
la t0, LIT
sw t0, 0(:HERE)
sw a0, 4(:HERE)
addi :HERE, :HERE, 8
j .L_inte_next_token
.L_inte_push_number:
# push number
PUSH a0
j .L_inte_next_token
.L_inte_not_a_number:
mv a0, :TOK_POS
mv a1, :TOK_LEN
jal do_find
beqz a0, .L_inte_word_not_found
beqz :MODE, .L_inte_execute
lw t0, -4(a0) # flag
bnez t0, .L_inte_execute # immediate flag
# compile
lw t0, 0(a0)
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j .L_inte_next_token
.L_inte_execute:
lw t0, 0(a0)
jr t0
# we wait for NEXT to take us back to L_inte_next_token
interpreter_done:
lw ra, 0(sp)
add sp, sp, 8
ret
.L_inte_word_not_found:
li a0, '\n'
call putc
mv a0, :TOK_POS
mv a1, :TOK_LEN
call puts_len
la a0, err_word_not_found
call puts
# if we are in compile mode, we need to close the word with semi
# or we will corrupt the dictionary
beqz :MODE, interpreter_done
la t0, EXIT
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j interpreter_done
NEXT:
lw t0, 0(:IP) # load the actual code address from [IP]
addi :IP, :IP, 4 # move IP to next cell
jr t0 # jump
# ( a b -- c )
PLUS:
POP t0
POP t1
add t0, t0, t1
PUSH t0
j NEXT
# ( a b -- c )
MINUS:
POP t0
POP t1
sub t0, t1, t0
PUSH t0
j NEXT
# ( a b -- c )
MUL:
POP t0
POP t1
mul t0, t0, t1
PUSH t0
j NEXT
# ( -- n )
LIT:
lw t0, 0(:IP)
addi :IP, :IP, 4
lw t1, 0(:IP)
PUSH t0
j NEXT
# ( n -- )
EMIT:
POP a0
jal print_int
j NEXT
# ( n -- )
EMIT_CHAR:
POP a0
jal putc
j NEXT
# ( value addr -- )
BANG:
POP t0 # address
POP t1 # value
sw t1, 0(t0)
j NEXT
# ( -- )
BYE:
j interpreter_done
# ( -- )
CR:
li a0, '\n'
jal putc
j NEXT
# ( len addr -- n )
ATOI:
POP a0 # address
POP a1 # length
jal atoi
PUSH a0
j NEXT
# ( len addr -- f )
IS_NUMBER:
POP a0 # address
POP a1 # length
jal is_number
PUSH a0
j NEXT
# ( a -- a a )
DUP:
POP t0
PUSH t0
PUSH t0
j NEXT
# ( a b -- b a )
SWAP:
POP t0 # b
POP t1 # a
PUSH t0
PUSH t1
j NEXT
# ( a -- )
DROP:
POP zero
j NEXT
# ( a b -- )
TWODROP:
POP zero
POP zero
j NEXT
# ( a b -- a b a b )
TWODUP:
POP t0 # b
POP t1 # a
PUSH t1 # a
PUSH t0 # b
PUSH t1 # a
PUSH t0 # b
j NEXT
# ( n1 n2 -- n1 n2 n1 )
OVER:
POP t0 # n2
POP t1 # n1
PUSH t1 # n1
PUSH t0 # n2
PUSH t1 # n1
j NEXT
# (x1 x2 x3 x4 -- x3 x4 x1 x2)
TWOSWAP:
POP t0 # x4
POP t1 # x3
POP t2 # x2
POP t3 # x1
PUSH t1
PUSH t0
PUSH t3
PUSH t2
j NEXT
# (x1 x2 x3 -- x2 x3 x1 )
ROT:
POP t0 # x3
POP t1 # x2
POP t2 # x1
PUSH t1 # x2
PUSH t0 # x3
PUSH t2 # x1
j NEXT
# (x1 x2 x3 -- x3 x1 x2)
NROT:
POP t0 # x3
POP t1 # x2
POP t2 # x1
PUSH t0 # x3
PUSH t2 # x1
PUSH t1 # x2
j NEXT
# ( a b -- f)
EQUAL:
POP t0
POP t1
beq t0, t1, .L_equal
li t0, 0
PUSH t0
j NEXT
.L_equal:
li t0, -1
PUSH t0
j NEXT
# ( len1 addr1 len2 addr2 -- flag)
MEMCMP:
POP a2
POP a3
POP a0
POP a1
call memcmp
PUSH a0
j NEXT
# ( f -- )
BRANCH_ON_ZERO:
POP t0
beqz t0, .L_do_branch
addi :IP, :IP, 4
j NEXT
.L_do_branch:
lw :IP, 0(:IP)
j NEXT
# ( -- )
JUMP:
lw :IP, 0(:IP)
j NEXT
# just a debug function to print the whole stack
# print debugging.. some people hate it some people love it
# i both hate it and love it
DEBUG_STACK:
addi sp, sp, -12
sw ra, 0(sp)
sw s8, 4(sp)
li a0, '<'
call putc
li a0, '>'
call putc
li a0, ' '
call putc
mv s8, :END_SP
add s8, s8, -4
.L_debug_stack_loop:
blt s8, :SP, .L_debug_stack_loop_end
lw a0, 0(s8)
call print_unsigned_hex
li a0, ' '
call putc
addi s8, s8, -4
j .L_debug_stack_loop
.L_debug_stack_loop_end:
li a0, '\n'
call putc
lw ra, 0(sp)
lw s8, 4(sp)
addi sp, sp, 12
j NEXT
do_parse_token:
addi sp, sp, -4
sw ra, 0(sp)
mv a0, :TOK_POS
mv a1, :TOK_LEN
jal token # parse the token
mv :TOK_POS, a0
mv :TOK_LEN, a1
lw ra, 0(sp)
addi sp, sp, 4
# return a0 a1 from token
ret
# Input:
# a0: token address
# a1: token length
# Output:
# a0: execution token address (or 0 if not found)
do_find:
li t1, 0
mv t3, a1
# The shananigans here are so we can build little endian version of the token
# in 4 bytes dont be intimidated by them, I just made the tokens in the
# dictionary as "bye\0" instead of "\0eyb" to be easier to read
beqz t3, .L_not_found # zero length token
lbu t1, 0(a0)
addi t3, t3, -1
beqz t3, .L_find_start
lbu t2, 1(a0)
sll t2,t2, 8
or t1, t1, t2
addi t3, t3, -1
beqz t3, .L_find_start
lbu t2, 2(a0)
sll t2, t2, 16
or t1, t1, t2
addi t3, t3, -1
beqz t3, .L_find_start
lbu t2, 3(a0)
sll t2, t2, 24
or t1, t1, t2
# t1: has the input token as 4 byte number
# a1: is the length of the input token
# t0: pointer to the entry, we will start at the end
.L_find_start:
mv t0, :LATEST
.L_find_loop:
beqz t0, .L_not_found # if the entry is 0, means we didnt find a match
lw t2, 4(t0) # load the length of the entry
bne t2, a1, .L_next_entry # compare lengths
lw t2, 8(t0) # load entry name
bne t2, t1, .L_next_entry # compare names
add a0, t0, 16 # return the code address
ret
.L_next_entry:
lw t0, 0(t0) # follow link to next entry
j .L_find_loop
.L_not_found:
li a0, 0 # return 0 for not found
ret
DOCOL:
EPUSH :IP
mv :IP, :XT
j NEXT
EXIT:
EPOP :IP
j NEXT
COLON:
li :MODE, -1 # enter compile mode
jal do_create
# we want to achieve this, creating a new word
#
# : square dup * ;
#
# ...
# DOCOL:
# 80000534: RPUSH :IP <-----------------.
# 80000538: |
# 8000053c: mv :IP, :XT |
# 80000540: j NEXT |
# ... |
# 80000148 <DUP>: |
# 80000148: lw t0, 0(:SP) |
# 8000014c: PUSH t0 |
# ... |
# 80000880: w_square: |
# 80000880: 80000..# link |
# 80000884: 6 # size |
# 80000888: "squa" # token |
# 8000088c: 0 # flags |
# 80000890: 80000894 # CODE FIELD >--------|---.
# 80000894: lui :XT, 0x80001 >---. | <-'
# 80000898: addi :XT, :XT, 0x8a8 >--. |
# 8000089c: lui t0, 0x80000 >---. | |
# 800008a0: addi t0, t0, 0x534 >----|------'
# 800008a4: jr t0 |
# 800008a8: 80000148 # DUP <--------'
# 800008ac: 80000... # MUL
# 800008b0: 80000... # EXIT
# ...
# 1. EXECUTION CODE FIELD point to HERE + 4, where we will
# put the machine code: memory[HERE] = HERE+4
add t0, :HERE, 4
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
# 2. Generate absolute address for where we want DOCOL to jump, in our case we want HERE+20
mv t0, :HERE
addi t0, t0, 20
# 3. Generate the machine code
# li :XT, value of :HERE + 20
# la t0, DOCOL
# jr t0
# and expanded
# lui :XT, value << 12
# addi :XT, :XT, value << 20 >> 20
# lui t0, value << 12
# addi t0, t0, value << 20 >> 20
# jr t0
# 3.1 Generate machine code for XT = HERE + 20 at time of compilation
li a0, 21 # XT is s5, which is register x21
mv a1, t0
jal do_li
sw a0, 0(:HERE) # lui
addi :HERE, :HERE, 4
sw a1, 0(:HERE) # addi
addi :HERE, :HERE, 4
# 3.1 Generate machine code for la t0, DOCOL
li a0, 5 # t0 is x5
la a1, DOCOL
jal do_li
sw a0, 0(:HERE) # lui
addi :HERE, :HERE, 4
sw a1, 0(:HERE) # addi
addi :HERE, :HERE, 4
# 3.2 Generate machine code for jr t0
li a0, 5 # t0 is x5
jal do_jr
sw a0, 0(:HERE) # jr
addi :HERE, :HERE, 4
j NEXT
CREATE:
jal do_create
# point the execution token to the machine code
addi t0, :HERE, 4
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
# create foo
#
# 80000880: w_foo:
# 80000880: 80000..# link
# 80000884: 3 # size
# 80000888: "foo\0"# token
# 8000088c: 0 # flags
# 80000890: 80000894 # CODE FIELD >--------.
# 80000894: lui t0, HIGH(HERE+24) <-------' >-.
# 80000898: addi t0, t0, LOW(HERE+24) >-----------.
# 8000089c: addi SP, SP, -4 |
# 800008a0: sw t0, 0(SP) |
# 800008a4: lui t0, HIGH(NEXT) |
# 800008a8: addi t0, t0, LOW(NEXT) |
# 800008ac: jr t0 |
# 800008b0: <data field...> <------------------'
# li t0, :HERE
# addi :SP, :SP, -4
# sw t0, 0(SP)
# la t0, NEXT
# jr t0
addi t1, :HERE, 28 # HERE + 28
# li t0, value of :HERE + 28
li a0, 5 # t0 is x5
mv a1, t1 # HERE + 28
jal do_li
sw a0, 0(:HERE) # lui
addi :HERE, :HERE, 4
sw a1, 0(:HERE) # addi
addi :HERE, :HERE, 4
# addi :SP, :SP, -4
li a0, 9 # :SP is s1, x9
li a1, 9 # :SP is s1, x9
li a2, -4
call do_addi
sw a0, 0(:HERE)
addi :HERE, :HERE, 4
# sw t0, 0(:SP)
li a0, 9 # :SP is s1, x9
li a1, 5 # t0 is x5
call do_sw
sw a0, 0(:HERE)
addi :HERE, :HERE, 4
# la t0, NEXT
li a0, 5 # t0 is x5
la a1, NEXT
jal do_li
sw a0, 0(:HERE) # lui
addi :HERE, :HERE, 4
sw a1, 0(:HERE) # addi
addi :HERE, :HERE, 4
# jr t0
li a0, 5 # t0 is x5
jal do_jr
sw a0, 0(:HERE) # jr
addi :HERE, :HERE, 4
j NEXT
# ( -- )
SEMICOLON:
mv :MODE, zero # exit compile mode
la t0, EXIT
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j NEXT
# ( x -- )
COMMA:
POP t0
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j NEXT
# ( -- flag )
MODE:
PUSH :MODE
j NEXT
do_create:
addi sp, sp, -4
sw ra, 0(sp)
add :TOK_POS, :TOK_POS, :TOK_LEN
jal do_parse_token
beqz a1, .L_create_error
# align to closest multiple of 4
addi t0, :HERE, 3
li t1, -4
and :HERE, t0, t1
# link field (4 bytes)
sw :LATEST, 0(:HERE)
# length field (4 bytes)
sw a1, 4(:HERE)
# token field (4 bytes)
li t1, 0
mv t3, a1 # Initialize t3 with token length
.L_create_build_token:
lbu t1, 0(a0)
addi t3, t3, -1
beqz t3, .L_create_write_token
lbu t2, 1(a0)
sll t2, t2, 8
or t1, t1, t2
addi t3, t3, -1
beqz t3, .L_create_write_token
lbu t2, 2(a0)
sll t2, t2, 16
or t1, t1, t2
addi t3, t3, -1
beqz t3, .L_create_write_token
lbu t2, 3(a0)
sll t2, t2, 24
or t1, t1, t2
.L_create_write_token:
sw t1, 8(:HERE)
# flags field
sw zero, 12(:HERE)
# move the dictionary end
mv :LATEST, :HERE
# update HERE to point to the end of the word
addi :HERE, :HERE, 16
lw ra, 0(sp)
addi sp, sp, 4
ret
.L_create_error:
la a0, err_create_error
j panic
panic:
jal puts
jal getch
j interpreter_done
# ( -- )
IMMEDIATE:
li t1, 1
sw t1, 12(:LATEST) # flag value
j NEXT
# ( addr -- value )
AT:
POP t0
lw t0, 0(t0)
PUSH t0
j NEXT
# ( addr -- value )
C_AT:
POP t0
lbu t0, 0(t0)
PUSH t0
j NEXT
# ( value addr -- )
C_BANG:
POP t0 # address
POP t1 # value
sb t1, 0(t0)
j NEXT
# ( -- c )
KEY:
jal getch
PUSH a0
j NEXT
# ( -- addr )
PUSH_HERE:
PUSH :HERE
j NEXT
# ( reg imm -- lui addi )
LI:
POP a1 # imm
POP a0 # reg
call do_li
PUSH a0 # lui
PUSH a1 # addi
j NEXT
# JR ( reg -- opcode_jr )
JR:
POP a0
call do_jr
PUSH a0
j NEXT
# ( x -- ) (R: -- x)
TO_R:
POP t0
RPUSH t0
j NEXT
# ( -- x ) (R: x -- )
FROM_R:
RPOP t0
PUSH t0
j NEXT
# ( -- x ) (R: x -- x)
R_FETCH:
lw t0, 0(:RSP)
PUSH t0
j NEXT
# ( x -- ) (CF: -- x)
TO_CF:
POP t0
CFPUSH t0
j NEXT
# ( -- x ) (CF: x -- )
FROM_CF:
CFPOP t0
PUSH t0
j NEXT
# ( -- x ) (CF: x -- x)
CF_FETCH:
lw t0, 0(:CSP)
PUSH t0
j NEXT
POSTPONE:
add :TOK_POS, :TOK_POS, :TOK_LEN
jal do_parse_token
jal do_find
beqz a0, .L_word_not_found
la t1, LIT
sw t1, 0(:HERE)
addi :HERE, :HERE, 4
lw a0, 0(a0) # dereference
sw a0, 0(:HERE)
addi :HERE, :HERE, 4
la t1, COMMA
sw t1, 0(:HERE)
addi :HERE, :HERE, 4
j NEXT
.L_word_not_found:
la a0, err_word_not_found
j panic
# ( n -- )
LITERAL:
POP t0
la t1, LIT
sw t1, 0(:HERE)
addi :HERE, :HERE, 4
sw t0, 0(:HERE)
addi :HERE, :HERE, 4
j NEXT
# stop compilation
# ( -- )
LEFT_BRACKET:
mv :MODE, zero
j NEXT
# start compilation
# ( -- )
RIGHT_BRACKET:
li t1, 1
mv :MODE, t1
j NEXT
# ( a b -- f)
GREATHER:
POP t0
POP t1
bgt t0, t1, .L_greather
li t0, 0
PUSH t0
j NEXT
.L_greather:
li t0, -1
PUSH t0
j NEXT
# ( len addr -- )
DUMP:
POP s9 # addr
POP s10 # len
1:
beqz s10, NEXT
mv a0, s9
jal print_unsigned_hex
li a0, ':'
jal putc
li a0, ' '
jal putc
lw a0, 0(s9)
jal print_unsigned_hex
li a0, ' '
jal putc
lb a0, 0(s9)
jal putc_ascii
lb a0, 1(s9)
jal putc_ascii
lb a0, 2(s9)
jal putc_ascii
lb a0, 3(s9)
jal putc_ascii
li a0, '\n'
jal putc
addi s9, s9, 4
addi s10, s10, -1
j 1b
putc_ascii:
addi sp, sp, -4
sw ra, 0(sp)
li t0, 32
blt a0, t0, .L_putc_dot
li t0, 127
bge a0, t0, .L_putc_dot
j .L_putc_char
.L_putc_dot:
li a0, '.'
.L_putc_char:
call putc
lw ra, 0(sp)
addi sp, sp, 4
ret
# ( limit index -- R: limit index)
DO_DO:
POP t0 # t0 = index
POP t1 # t1 = limit
RPUSH t1 # limit
RPUSH t0 # index
j NEXT
# ( R: limit index -- R: limit index ) ; (L: -- ) when done
DO_LOOP:
RPOP t0 # pop index
RPOP t1 # pop limit
addi t0, t0, 1
blt t0, t1, .L_do_loop_jump # if limit < index
# skip over the jump address
addi :IP, :IP, 4
j NEXT
.L_do_loop_jump:
# push them back on Rstack if still looping
RPUSH t1 # push limit
RPUSH t0 # push index
# read the jump address from IP (the next cell in the thread)
lw :IP, 0(:IP)
j NEXT
# ( -- ) (R: limit index -- ) (L: addr -- )
UNLOOP:
RPOP zero # index
RPOP zero # limit
j NEXT
LOOP_I:
lw t0, 0(:RSP)
PUSH t0
j NEXT
LOOP_J:
lw t0, 8(:RSP)
PUSH t0
j NEXT
LOOP_K:
lw t0, 16(:RSP)
PUSH t0
j NEXT
ALLOT:
POP t0
mv a0, t0
add :HERE, :HERE, t0
j NEXT
# ( x1 x2 -- flag )
AND:
POP t0
POP t1
# Check if either value is zero
beqz t0, .L_false
beqz t1, .L_false
# Both non-zero, return TRUE (-1)
li t0, -1
PUSH t0
j NEXT
.L_false:
# At least one zero, return FALSE (0)
mv t0, zero
PUSH t0
j NEXT
WFI:
wfi
j NEXT
# ( addr -- )
EVAL:
add sp, sp, -36
sw ra, 0(sp)
sw :TOK_POS, 4(sp)
sw :TOK_LEN, 8(sp)
sw :IP, 12(sp)
sw :XT, 16(sp)
sw :MODE, 20(sp)
sw :CSP, 24(sp)
sw :RSP, 28(sp)
sw :ESP, 32(sp)
POP :TOK_POS
mv :TOK_LEN, zero
jal interpreter_eval
lw ra, 0(sp)
lw :TOK_POS, 4(sp)
lw :TOK_LEN, 8(sp)
lw :IP, 12(sp)
lw :XT, 16(sp)
lw :MODE, 20(sp)
lw :CSP, 24(sp)
lw :RSP, 28(sp)
lw :ESP, 32(sp)
add sp, sp, 36
j NEXT
BUMP:
la t0, bump
lw t1, 0(t0)
addi t1, t1, 1
PUSH t1
sw t1, 0(t0)
j NEXT
dictionary:
word_bye:
.word 0 # link
.word 3 # token length
.ascii "bye\0" # first 4 characters of token
.word 0 # flags
.word BYE # address of execution token
word_plus:
.word word_bye
.word 1
.ascii "+\0\0\0"
.word 0
.word PLUS
word_minus:
.word word_plus
.word 1
.ascii "-\0\0\0"
.word 0
.word MINUS
word_mul:
.word word_minus
.word 1
.ascii "*\0\0\0"
.word 0
.word MUL
word_bang:
.word word_mul
.word 1
.ascii "!\0\0\0"
.word 0
.word BANG
word_at:
.word word_bang
.word 1
.ascii "@\0\0\0"
.word 0
.word AT
word_dup:
.word word_at
.word 3
.ascii "dup\0"
.word 0
.word DUP
word_emit:
.word word_dup
.word 1
.ascii ".\0\0\0"
.word 0
.word EMIT
word_cr:
.word word_emit
.word 2
.ascii "cr\0\0"
.word 0
.word CR
word_debug_stack:
.word word_cr
.word 2
.ascii ".s\0\0"
.word 0
.word DEBUG_STACK
word_debug_stack_compile:
.word word_debug_stack
.word 4
.ascii "[.s]"
.word 1
.word DEBUG_STACK
word_colon:
.word word_debug_stack_compile
.word 1
.ascii ":\0\0\0"
.word 0
.word COLON
word_semicolon:
.word word_colon
.word 1
.ascii ";\0\0\0"
.word 1 # immediate
.word SEMICOLON
word_li:
.word word_semicolon
.word 2
.ascii "li\0\0"
.word 0
.word LI
word_jr:
.word word_li
.word 2
.ascii "jr\0\0"
.word 0
.word JR
word_key:
.word word_jr
.word 3
.ascii "key\0"
.word 0
.word KEY
word_here:
.word word_key
.word 4
.ascii "here"
.word 0
.word PUSH_HERE
word_comma:
.word word_here
.word 1
.ascii ",\0\0\0"
.word 0
.word COMMA
word_create:
.word word_comma
.word 6
.ascii "crea"
.word 0
.word CREATE
word_branch0:
.word word_create
.word 7
.ascii "0bra"
.word 0
.word BRANCH_ON_ZERO
word_jump:
.word word_branch0
.word 4
.ascii "jump"
.word 0
.word JUMP
word_literal:
.word word_jump
.word 7
.ascii "lite"
.word 1 # immediate
.word LITERAL
word_to_r:
.word word_literal
.word 2
.ascii ">r\0\0"
.word 0
.word TO_R
word_from_r:
.word word_to_r
.word 2
.ascii "r>\0\0"
.word 0
.word FROM_R
word_r_fetch:
.word word_from_r
.word 2
.ascii "r@\0\0"
.word 0
.word R_FETCH
word_to_cf:
.word word_r_fetch
.word 3
.ascii ">cf\0"
.word 0
.word TO_CF
word_from_cf:
.word word_to_cf
.word 3
.ascii "cf>\0"
.word 0
.word FROM_CF
word_cf_fetch:
.word word_from_cf
.word 3
.ascii "cf@\0"
.word 0
.word CF_FETCH
word_immediate:
.word word_cf_fetch
.word 9
.ascii "imme"
.word 0
.word IMMEDIATE
word_postpone:
.word word_immediate
.word 8
.ascii "post"
.word 1 # immediate
.word POSTPONE
word_drop:
.word word_postpone
.word 4
.ascii "drop"
.word 0
.word DROP
word_dump:
.word word_drop
.word 4
.ascii "dump"
.word 0
.word DUMP
word_left_bracket:
.word word_dump
.word 1
.ascii "[\0\0\0"
.word 1
.word LEFT_BRACKET
word_right_bracket:
.word word_left_bracket
.word 1
.ascii "]\0\0\0"
.word 1
.word RIGHT_BRACKET
word_swap:
.word word_right_bracket
.word 4
.ascii "swap"
.word 0
.word SWAP
word_equal:
.word word_swap
.word 1
.ascii "=\0\0\0"
.word 0
.word EQUAL
word_greather:
.word word_equal
.word 1
.ascii ">\0\0\0"
.word 0
.word GREATHER
word_do_do:
.word word_greather
.word 4
.ascii "(do)"
.word 0
.word DO_DO
word_do_loop:
.word word_do_do
.word 6
.ascii "(loo"
.word 0
.word DO_LOOP
word_unloop:
.word word_do_loop
.word 6
.ascii "unlo"
.word 0
.word UNLOOP
word_i:
.word word_unloop
.word 1
.ascii "i\0\0\0"
.word 0
.word LOOP_I
word_j:
.word word_i
.word 1
.ascii "j\0\0\0"
.word 0
.word LOOP_J
word_k:
.word word_j
.word 1
.ascii "k\0\0\0"
.word 0
.word LOOP_K
word_allot:
.word word_k
.word 5
.ascii "allo"
.word 0
.word ALLOT
word_exit:
.word word_allot
.word 4
.ascii "exit"
.word 0
.word EXIT
word_c_bang:
.word word_exit
.word 2
.ascii "c!\0\0"
.word 0
.word C_BANG
word_c_at:
.word word_c_bang
.word 2
.ascii "c@\0\0"
.word 0
.word C_AT
word_emit_char:
.word word_c_at
.word 4
.ascii "emit"
.word 0
.word EMIT_CHAR
word_rot:
.word word_emit_char
.word 3
.ascii "rot\0"
.word 0
.word ROT
word_over:
.word word_rot
.word 4
.ascii "over"
.word 0
.word OVER
word_and:
.word word_over
.word 3
.ascii "and\0"
.word 0
.word AND
word_qexit:
.word word_and
.word 5
.ascii "qexi"
.word 0
.word qemu_exit
word_wfi:
.word word_qexit
.word 3
.ascii "wfi\0"
.word 0
.word WFI
word_eval:
.word word_wfi
.word 4
.ascii "eval"
.word 0
.word EVAL
word_bump:
.word word_eval
.word 4
.ascii "bump"
.word 0
.word BUMP
dictionary_end:
bump:
.word 0
err_create_error:
.asciz "\nerror: create missing name, usage: create [name]\n"
err_word_not_found:
.asciz ": word not found\n"
.align 2
# our actual human readable program
human_program_small:
.asciz "
: begin
here
>cf
; immediate
: again
postpone jump
cf>
,
; immediate
: if
postpone 0branch
here
0
,
>cf
; immediate
: then
here
cf>
!
; immediate
: nothing
begin
1 +
dup 1000000 = if
97 emit
drop
1
then
again
;
1 nothing
"
human_program:
.asciz "
: begin
here
>cf
; immediate
: again
postpone jump
cf>
,
; immediate
: until
postpone 0branch
cf>
,
; immediate
: if
postpone 0branch
here
0
,
>cf
; immediate
: then
here
cf>
!
; immediate
: else
postpone jump
here
0
,
here
cf>
!
>cf
; immediate
: loop
postpone (loop)
cf>
,
; immediate
: do
postpone (do)
here
>cf
; immediate
create buffer 200 allot
: buffer[] buffer + ;
: zero 200 0 do 0 i buffer[] c! loop ;
: print 200 0 do i buffer[] dup 0 = if exit then c@ emit loop ;
: input
bump .
62 emit
32 emit
200 0 do
key dup 13 = if
drop
-1
unloop exit
else
dup 27 = if
drop
0
unloop exit
then
dup emit
i buffer[] c!
then
loop
;
: repl
begin
zero
input
0 = if drop unloop exit then
cr
buffer
eval
cr
again
;
repl
qexit
"
.end
We have two interpreters, one loads the human_program_small
program, which
just prints the letter a
from time to time, the other loads human_program
which is our repl
where you can write forth code and execute it.
One thing to note is that you see there is zero restriction or protection for one task to completely corrupt the other task's memory. In modern hardware and operating systems there is massive amount of work to make that impossible, or at least harder, and in fact it is an active area of research. We will briefly discuss it in the Security chapter.
Now what we are missing is loading and saving files, and communicating with other computers. I think for those we have to go to a real computer, because qemu's disk and networking are too involved.
It is important to remember there is very little difference between any of those systems, be it showing pixels on screen or writing to disk or receiving network packets, its all the same, there are protocols that you need to comply to, write to some registers to control the hardware in a specific way, and then send or read data. Some times the hardware will trigger an interrupt, sometimes you will have to poll and check if there is data. For example the USB keyboard interface never pushes data, when I press a key on the keyboard it gets into an internal buffer in the keyboard itself, and the operating system will ask the keyboard every 5 millisecons: give me the pressed keys. In the past we had PS2 keyboars which triggered interrupts, so when you press a key, the operating system's interrupt handler is called and your press is handled immediately. 5 milliseconds is not a lot of time, its 200 times per second, so you wont notice, but you see it is fundamental difference of control. The keyboard has to be more complicated now, because it must have a buffer of pressed keys, otherwise with PS2 you just send as soon as the key is pressed.
PULL
versus PUSH
is the most fundamental property of system interractions,
and multiple books can be written to examine the emergent properties of systems
depending on how they use pull/push between their components, where are their
buffers and how are they drained.
This will conclude this chapter, and we will continue in the next one where we move to a real Raspberry Pi Pico 2350 computer, it has two ARM cores and two RISCV cores, you can use either one, and we will use the RISCV cores, I have not said much about Symmetric Multi Processing, where we have multiple CPU cores that access the same memory and hardware, and the challenges it has, and for now we will just use 1 of the cores, and pretend its only 1, to keep thing simpler. It is hard enough to have multiple programs using the same CPU to communicate, e.g. pressing a character gets it in a buffer where multiple programs might read it, but with 1 CPU, only one program runs at a time, which makes things considerably easier. Even ths simplest things are difficult when two entities are using the same resources, imagine a deck of cards, and two people reach to get the top card in the very same time, you remember how in our 74LS chips there are nanosecond intervals in which the voltage is read, well.. thats quite a lot of time for things to go wrong. A whole new set of primitives has to be used to guarantee order of operations, we won't get into it for now, but if you are interested search for "SMP" or "Symmetric MultiProcessing".
When you are exploring those concepts, use chatgpt or some other AI to help you, you can trully ask it the silliest questions, even though it might be wrong, and mislead you, just think about the wires, and listen to your doubt. Look at the source code of the Linux kernel version 1, paste it in chatgpt and ask it how it works. The recent kernel versions are very complicated as they must support tens of thousands of different devices and systems and subsystems, but you can also ask chatgpt about it. Look into preemtive scheduling, cooperative scheduling, fair scheduling, real time operating systems, processess and threads, push and pull, the dining philosophers problem, the barber problem, the byzantine generals problem. clockless computers, tickless kernels, micro kernels, synchronization primitives, semaphores, spinlocks, mutexes and futexes. Don't stress, it seems like a lot, but everything is similar, you just have to understand the reason for its existence.