Who looks outside, dreams; who looks inside, awakes.

-- Carl Jung

Memory

Now that you know how to store 1 bit with a latching circuit, let's look at another configuration that uses 6 transistors to form the infinite loop. It is called the "6T SRAM cell", and it makes it easier to build a huge array of cells and to access the data.

This is how a cell looks:

The picture looks complicated, but the idea is the same as the Flip Flop and SR Latch loops. The circuit guarantees that as long as there is power, it will remember.

In order to read the picture, I will have to explain a bit more about transistors. There are many kinds of transistors, but their purpose is the same: to be an electrically controlled switch. They work by opening or closing a channel through which electrons can flow.

The ones we discussed previously are usually NPN transistors, but for memory we use MOSFETs (Metal-Oxide-Semiconductor Field-Effect Transistors). Anyway, the names are not important; the idea is.

There are two kinds of MOSFETs, NMOS and PMOS. Both have 3 legs, but their legs have different names than those of NPN transistors. (I am not even sure we should call them legs, since we make them so tiny that they are only a few atoms in size. I can't overstate the amount of progress we have made in this area, and I am actually afraid that we will forget how to make them.) Anyway, the legs of PMOS and NMOS transistors are called Gate, Source and Drain.

There are hundreds of videos on YouTube that explain how they work. ElectroBOOM made a video recently as well; please check it out before you continue. It's just 20 minutes or so, and it's really good.

In the memory cell, M2 and M4 are PMOS (you can see they have a small circle on their gate), and M1 and M3 are NMOS.

PMOS:

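It turns ON when its gate voltage is LOWER than its source voltage by more than a small threshold voltage
It turns OFF when its gate voltage is not sufficiently LOWER than its source voltage, for example when the two are equal

NMOS:

It turns ON when its gate voltage is HIGHER than its source voltage by more than a small threshold voltage
It turns OFF when its gate voltage is not sufficiently HIGHER than its source voltage, for example when the two are equal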

Notice that on M5 and M6 (both NMOS), which leg acts as the Source and which as the Drain actually depends on which side has the higher voltage, and that depends on the value stored in the inner loop between M1, M2, M3 and M4.

We will zoom in on M3 and M4:

When the input is LOW: The PMOS transistor (M4) turns ON; The NMOS transistor (M3) turns OFF; The output Q is pulled up to VDD (HIGH).

When the input is HIGH: The PMOS transistor (M4) turns OFF; The NMOS transistor (M3) turns ON; The output Q is pulled down to ground (LOW).

This is just a NOT gate (a CMOS inverter): whatever we have as input, the output is its inverse.
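To make the rule concrete, here is a minimal Python sketch that treats M3 and M4 as ideal switches. It assumes HIGH = 1 (VDD) and LOW = 0 (ground) and ignores the threshold voltage and all analog behavior; the function names are made up for this example.

```python
def pmos_on(gate: int, source: int) -> bool:
    # PMOS conducts when its gate is LOWER than its source
    return gate < source


def nmos_on(gate: int, source: int) -> bool:
    # NMOS conducts when its gate is HIGHER than its source
    return gate > source


def cmos_inverter(inp: int, vdd: int = 1, gnd: int = 0) -> int:
    pull_up = pmos_on(inp, vdd)    # M4: source tied to VDD
    pull_down = nmos_on(inp, gnd)  # M3: source tied to ground
    # For a valid input exactly one of the two switches is closed
    assert pull_up != pull_down
    return vdd if pull_up else gnd


for inp in (0, 1):
    print(f"input {inp} -> Q {cmos_inverter(inp)}")
```

Only one transistor conducts at a time, so the output is always connected either to VDD or to ground, never to both.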

So, let's think about our memory cell in a slightly more simplified way: it is just a loop of NOT gates.

The symbol for a NOT gate, also called an inverter, is a triangle with a circle.

Now, follow the loop: if Q is HIGH, the output of GATE1 is LOW, so Q-bar (the inverse of Q) is LOW; the input to GATE2 is then LOW, so its output is HIGH, which keeps Q HIGH.

If Q is LOW, the output of GATE1 is HIGH, so Q-bar is HIGH; the input to GATE2 is then HIGH, so its output is LOW, which keeps Q LOW.

This is the crux of the memory loop, two CMOS inverters in a loop, or two NOT gates in a loop, same thing.
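A tiny Python sketch of the loop, under the same idealized assumptions (1 = HIGH, 0 = LOW): we send the signal around GATE1 and GATE2 a few times and check that whatever Q we start with is kept.

```python
def not_gate(x: int) -> int:
    return 1 - x


def run_loop(q: int, rounds: int = 5) -> list:
    """Send Q around the two-inverter loop `rounds` times, recording Q each time."""
    history = []
    for _ in range(rounds):
        q_bar = not_gate(q)  # GATE1: Q -> Q-bar
        q = not_gate(q_bar)  # GATE2: Q-bar -> Q, fed back into GATE1
        history.append(q)
    return history


print(run_loop(1))  # [1, 1, 1, 1, 1]
print(run_loop(0))  # [0, 0, 0, 0, 0]
```

Both starting values are stable, which is exactly why the loop can remember one bit.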

Now let's talk about how we are going to read from or write to the inner cell. After all, we want to store many, many bytes of data, and a cell holds only 1 bit, so we have to organize a whole array of cells into a structure that makes it possible to read many of them at the same time.

First, let's check the WL (Word Line). When it is LOW, M5 and M6 are OFF, so nothing happens: we don't touch the inner cell, which is isolated from BL and BL-bar (the bit lines) and keeps storing its value in the infinite loop of NOT gates. Which is quite poetic, BTW: infinite denial stores the bit. Whatever the value was, it stays like that: if Q is 1, Q-bar is 0, and vice versa. As long as VDD exists, this state is maintained.

If we want to read, we must set the Word Line to HIGH. Both BL and BL-bar are 'precharged' to HIGH, meaning they are already HIGH before the Word Line goes HIGH. The moment WL goes HIGH, depending on the value of the inner cell, one of the bit lines will be pulled LOW. If Q is HIGH, BL stays HIGH, and since Q-bar is LOW, BL-bar is pulled LOW. If Q is LOW, BL is pulled LOW, and since Q-bar is HIGH, BL-bar stays HIGH. A special circuit called a sense amplifier detects which of the two lines is dropping.

I won't go into detail about why precharging is needed, as it is beyond the scope of the book, but I encourage you to investigate it.

Writing is very similar to reading, but instead of sensing the change in BL and BL-bar, we set them to the value we want: to write 1 we set BL to HIGH and BL-bar to LOW, and to write 0 we set BL to LOW and BL-bar to HIGH. Once WL is HIGH, the bit is stored in the inner cell.
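The read and write rules can be captured in a small behavioral model. This is a sketch, not a circuit simulation: SramCell and sense_amplifier are hypothetical names, 1 stands for HIGH and 0 for LOW, and the write drivers are assumed to always win against the inner loop.

```python
class SramCell:
    """Behavioral model of one 6T SRAM cell (hypothetical, for illustration)."""

    def __init__(self, q: int = 0):
        self.q = q  # value held by the inner loop

    @property
    def q_bar(self) -> int:
        return 1 - self.q  # the other side of the loop is always the inverse

    def read(self, word_line: int) -> tuple:
        bl, bl_bar = 1, 1  # both bit lines are precharged HIGH
        if word_line:
            # M5 and M6 connect the cell to the bit lines: a line stays HIGH
            # only if the node it is connected to is HIGH.
            bl = min(bl, self.q)
            bl_bar = min(bl_bar, self.q_bar)
        return bl, bl_bar

    def write(self, word_line: int, bl: int, bl_bar: int) -> None:
        assert bl != bl_bar, "write drivers set BL and BL-bar to opposite values"
        if word_line:
            self.q = bl  # the forced bit lines overwrite the inner loop


def sense_amplifier(bl: int, bl_bar: int):
    """Return the bit whose line did not drop, or None if nothing changed."""
    if bl == bl_bar:
        return None
    return 1 if bl > bl_bar else 0


cell = SramCell()
cell.write(word_line=1, bl=1, bl_bar=0)          # write 1
print(sense_amplifier(*cell.read(word_line=1)))  # 1
cell.write(word_line=0, bl=0, bl_bar=1)          # WL is LOW: cell is isolated
print(sense_amplifier(*cell.read(word_line=1)))  # still 1
```

Notice that with the word line LOW, the write is simply ignored, just like in the circuit.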

Don't panic if you don't get all this LOW and HIGH business. Draw the circuit on paper and follow it with a pen, or even better, just take a pen and write on this book. Follow the lines, imagine water flowing through and think about the transistors as valves that turn it on or off.

This is what an array of cells looks like in the real world:

Or as a diagram:

We make a grid of cells with a Row Decoder, a Column Decoder and Sense Amplifiers. The row decoder controls the word lines, and the column decoder the bit lines. Only one word line can be HIGH at a time, while multiple bit line pairs can be active from the column decoder, and by active I mean connected to the sense amplifiers or to the write drivers (circuits that force the state on BL and BL-bar).

On our diagram we have 8 x 8 cells, so in total we have 64 bits of memory. Imagine we want to write the value 0 to the purple inner cell at ROW: 3, COL: 4. We want the row decoder to disable all Word Lines except the one at row 3, and we want the column decoder to enable the write driver at column 4 and set BL to LOW and BL-bar to HIGH on this column. Now if you follow the lines, you will see that since no other word line is enabled, only our purple cell gets set to 0.
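Here is a minimal Python sketch of that write. It assumes an 8 x 8 grid where every cell starts out storing 1 (so the change is visible) and a row decoder that turns a row number into exactly one HIGH word line; grid, row_decoder and write_bit are hypothetical names.

```python
ROWS, COLS = 8, 8
grid = [[1] * COLS for _ in range(ROWS)]  # every cell starts out storing 1


def row_decoder(row: int) -> list:
    # Exactly one word line is HIGH: the one for the selected row
    return [1 if r == row else 0 for r in range(ROWS)]


def write_bit(row: int, col: int, value: int) -> None:
    word_lines = row_decoder(row)
    for r in range(ROWS):
        # The write driver forces column `col`, but only the cell whose
        # word line is HIGH is connected to the bit lines.
        if word_lines[r]:
            grid[r][col] = value


write_bit(3, 4, 0)  # write 0 to the purple cell at ROW: 3, COL: 4
for line in grid:
    print("".join(str(bit) for bit in line))
```

Every other cell in column 4 sees the same bit lines, but its word line is LOW, so it keeps its value.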

We actually want to give the number 3 to the row decoder, which is 0011 in binary, and the number 4 to the column decoder, which is 0100, and they should enable the right lines. (Strictly, 3 bits per decoder are enough for 8 rows and 8 columns, but we use 4 bits each here.) So there are 8 cables going into the memory, and if we set them to LOW LOW HIGH HIGH LOW HIGH LOW LOW, or 0011 0100, the memory selects the purple cell, and we can read its value from the output. This is what a memory address is: it is literally the cell's row and column position. In our case, 00110100 in decimal is 52, so our bit is at address 52.
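The same arithmetic in Python, as a sketch that assumes the layout from the text: the 4 row bits are the high half of the address and the 4 column bits are the low half.

```python
def make_address(row: int, col: int) -> int:
    assert 0 <= row < 16 and 0 <= col < 16  # 4 bits each
    return (row << 4) | col  # row in the high 4 bits, column in the low 4


def split_address(address: int) -> tuple:
    return address >> 4, address & 0b1111


addr = make_address(3, 4)
print(addr, format(addr, "08b"))  # 52 00110100
print(split_address(52))          # (3, 4)
```

Shifting left by 4 is the same as multiplying by 16, so the address is simply row * 16 + column.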

This kind of memory is called RAM, or Random Access Memory, because you can read from and write to any address, in any order. It is also called volatile memory, because once the power goes down, the data disappears.

There are many kinds of RAM. The one we discussed is SRAM, or Static RAM, because as long as there is power, the data is stable. There is also DRAM (Dynamic RAM), which has to be refreshed every few milliseconds to keep its data.

You can see in our example that when we enable a word line, we can actually read or write all the values in that row. That's why the word line is called a word line: a word is the natural unit of data that the processor works with. Different systems have different word sizes; in the past we had systems with 8, 12, 16, 18, 21 .. bit words, and now almost everything is 32 or 64 bits. That is why the C standard does not fix the size of int: it only requires int to be at least 16 bits wide and says that a plain int has the natural size suggested by the machine's architecture.

There are much more complicated organizations, but they are beyond our scope; if you are interested, search for DRAM, NAND flash memory and FRAM.

But the real question is: why would we want to address individual bytes or bits? Do programs need addressable memory? After all, most of the things we do are sequences; this text, for example, is read and written as a sequence of characters. The laws of physics are updated sequentially, in a smooth, continuous flow of communication through bosons; nothing is abrupt. So why would we want to randomly access the purple bit 52, for example?

Let's look at this program:

That which is in locomotion must arrive at the half-way stage 
before it arrives at the goal.

-- Aristotle, Physics VI:9

Let's say we want to travel a distance of 2 meters. Before we get there, we surely must travel 1 meter, and before we get there we must travel half a meter, .. and so on.. before we travel 0.0001 meters we must travel 0.00005 meters..

And so, when we evaluate the program in our head, it seems like nothing should move, because it will infinitely get the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half of the half...

Now imagine we want to follow 10 people, and we have to remember each person's current half so that we can compute its half; we must "look up" the previous value. How would you keep track of all the halves when people complete them at different times?

What about this program:

copy this sentence below

Amazingly the program writes more of itself:

copy this sentence below  
copy this sentence below  
copy this sentence below  
copy this sentence below  
copy this sentence below  
copy this sentence below

In order to do that, its evaluator must know where the sentence ends and where 'below' is.

copy this sentence below, then delete the sentence above

After a few iterations we get:

........................................................  
........................................................  
........................................................  
........................................................  
........................................................  
copy this sentence below, then delete the sentence above
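If we model memory as a list of lines and keep the address of the line being evaluated in a variable, we can see why the evaluator needs to know where 'below' is. This is a hypothetical sketch: 'below' is assumed to be the next address, deleted lines are shown as dots, and memory is kept small.

```python
def run(program: str, steps: int, size: int = 8, delete_above: bool = False) -> list:
    memory = [program] + [""] * (size - 1)
    pc = 0  # address of the sentence being evaluated
    for _ in range(steps):
        below = pc + 1
        if below >= size:
            break  # ran out of memory
        memory[below] = memory[pc]  # "copy this sentence below"
        if delete_above:
            memory[pc] = "." * 20   # "then delete the sentence above"
        pc = below  # the copy is evaluated next
    return memory


for line in run("copy this sentence below, then delete the sentence above",
                steps=5, delete_above=True):
    print(line)
```

With delete_above=False the same function grows the first program, filling memory with copies of itself.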

Look again at this program:

I am what I was plus what I was before I was.
Before I began, I was nothing.
When I began, I was one.

When we execute it, the values "slide" through memory:

0   | 0: Before I began I was nothing
1   | 1: When I began I was one
2   | 1 = 1 + 0 I am what I was plus what I was before I was.
3   | 2 = 1 + 1 I am what I was plus what I was before I was.
4   | 3 = 2 + 1 I am what I was plus what I was before I was.
5   | 5 = 3 + 2 I am what I was plus what I was before I was.
6   | 8 = 5 + 3 I am what I was plus what I was before I was.
7   | 13 = 8 + 5 ...
8   | 21 = 13 + 8 ...
... | ...
50  | 12586269025 = 4807526976 + 7778742049
... | ...
250 | 7896325826131730509282738943634332893686268675876375 = ...
... | ...
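A sketch in Python, where memory is a list and each new address is filled from the two addresses just before it. The name i_am is made up; Python integers grow as needed, so very large values such as the one at address 250 also fit.

```python
def i_am(last: int) -> list:
    memory = [0, 1]  # before I began: nothing; when I began: one
    for addr in range(2, last + 1):
        what_i_was = memory[addr - 1]
        before_i_was = memory[addr - 2]  # CURRENT ADDRESS - 2
        memory.append(what_i_was + before_i_was)
    return memory


memory = i_am(50)
print(memory[:9])  # [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(memory[50])  # 12586269025
```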

You see, "before I was" is just CURRENT ADDRESS - 2. If right now it points to address 1024, then the next time we say "before I was" it points to address 1025 (or 1032, if each number takes 8 bytes of memory), so "before I was" moves as the program is evaluated.

You see how natural it is to be able to refer to the location of information: for example, knowing where 'below' or 'above' is, or knowing where you stored the half of the half, so that you can take its half.

There is a subtle difference between (1), the infinite half of the half for 10 people, and (2), "I am what I was plus what I was before I was."

(1) feels more like a filing cabinet, where you just need to find the value of the previous half and then replace it with the new value. Updates are abrupt: first person 7 passes their half, then person 3, then person 8.
(2) feels more like a river carrying data with it. Things only communicate and interact with their surroundings; one thing leads to the next, and so on. Maybe a better example is the lyrics of a song: for me it is really hard to sing a song from the middle, but I have no trouble singing it from start to finish.

I don't know why, but we seem to think with addressable memory. It is much easier to express our complex ideas by storing information in places and being able to look it up and change it. Since the time of Gilgamesh and Enkidu of Uruk, 4000 years ago, and possibly even before that, the people of Sumer were making lists, storing and indexing information.

This is the list of kings:

In Ur, Mesannepada became king; he ruled for 80 years. Meskiagnun, the son of Mesannepada, became king; he ruled for 36 years. Elulu ruled for 25 years. Balulu ruled for 36 years. 4 kings; they ruled for 171 years. Then Ur was defeated and the kingship was taken to Awan...

Even today, in the modern office, you will see everything indexed in file cabinets and folders with labels. Our TV channels, our houses and our book pages are numbered and addressable; books even have inverted indexes of which information is on which page, and there are directories of which company is at which address. The principle is the same as in the Sumerian king list: which king ruled in which years, and how many years each king ruled.

When you think of ways to track the 10 people's halves, you intuitively imagine all kinds of devices, like boxes or pages, or you can just "remember them". But think for a second: what does "remembering them" mean? It means that when runner number 1 gets to their half, you have to conjure up the previous half, divide it by 2 and then remember the new value. If you build a system with pages, e.g. runner 1 is on page 1, runner 2 on page 2, and so on, then when runner 1 reaches the half, you just open page 1, read the current value, halve it, and write the new value.
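The page system is easy to write down. In this hypothetical sketch a Python list plays the book: page i - 1 holds runner i's current half, and every runner is assumed to start with a 2 meter distance, as in the example above.

```python
pages = [2.0] * 10  # pages[0] is runner 1's page, pages[9] is runner 10's


def runner_reached_half(runner: int) -> float:
    page = runner - 1          # runners are numbered from 1, pages from 0
    current = pages[page]      # open the page and read the current value
    pages[page] = current / 2  # halve it and write the new value back
    return pages[page]


# Runners reach their halves in no particular order
for runner in (7, 3, 8, 7):
    runner_reached_half(runner)

print(pages)  # [2.0, 2.0, 1.0, 2.0, 2.0, 2.0, 0.5, 1.0, 2.0, 2.0]
```

No matter which runner reports next, we jump straight to their page; nothing else has to be read or moved.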

Again, we "think" with addressable memory. Today, programming languages that let you directly change memory and label it with names are vastly more popular than the ones that don't. That, of course, does not make them better or worse, just different.

There are stack computers, for example, that do not have a concept of an address and are just as powerful. Or neural network computers, where the program and its memory are in the interaction strengths between the neurons. In biological or chemical computers, it seems the information is stored and retrieved in potential energy and in the structures that emerge from it. There are also graph computers, quantum computers, and so on.

But for us, human beings, it seems it is easiest to express ourselves by mutating (changing) memory.

OK, now things are going to get crazy: I will show you how powerful addressable memory is, and how we can build very simple universal computers with it.

With just addressable memory, subtraction and an "if", we can build a universal computer. Our computer will be able to do only 1 thing: given 3 numbers A, B, C, it will subtract the value at location A from the value at location B, store the result back in location B, and if the result is less than or equal to zero, jump to location C; if not, continue with the next instruction.

This language is called SUBLEQ (SUBtract and branch if Less than or EQual to zero), and it is possibly the simplest one-instruction language.

This is a pseudocode of what it does:

PC = 0
forever:
   a = memory[PC]
   b = memory[PC + 1]
   c = memory[PC + 2]
   memory[b] = memory[b] - memory[a]
   if memory[b] <= 0:
       PC = c
   else:
       PC += 3   # each instruction takes 3 memory locations

PC means Program Counter. It is just a small piece of memory that tracks exactly where we are in the program and which instruction we should execute, like your finger keeping the book open when you want to remember which page you are on. memory[a] means the value stored at address a, which itself means a particular row and column in the grid of CMOS memory cells. If the memory were a book and our values were whole pages, a would be the page number. If the memory were a street with houses, a would be the house number, and inside the house at a would be the value at this address.

Examine the following program: 7 6 9 8 8 0 3 1 0 8 8 9. It looks a bit scary, but let me rewrite it as a grid; in each cell you see the value, with its address as a subscript.

7₀	6₁	9₂
8₃	8₄	0₅
3₆	1₇	0₈
8₉	8₁₀	9₁₁

When the processor starts, it will load the first instruction and start executing:

Breakdown of the execution:
0: subleq 7, 6, 9
   a = memory[0], which is 7
   b = memory[1], which is 6
   c = memory[2], which is 9
   memory[b] = memory[b] - memory[a]
   if memory[b] <= 0:
      PC = c
   else:
      PC += 3
   in our case, location 6 holds 3 and location 7 holds 1,
   so we store 2 (the result of 3 - 1) at location 6,
   and since 2 is greater than 0, we continue to the
   next instruction, at location 3.

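3: subleq 8, 8, 0
   a = memory[3], which is 8
   b = memory[4], which is 8
   c = memory[5], which is 0
   memory[b] = memory[b] - memory[a]
   if memory[b] <= 0:
      PC = c
   else:
      PC += 3

   you will notice that location 8 holds 0,
   so 0 - 0 is 0, and we jump to the 3rd parameter
   of the instruction, which is 0 (subtracting a location
   from itself always gives 0, so this is an unconditional jump)

   instructions 0 and 3 now repeat: location 6 goes 3, 2, 1, 0,
   and once it reaches 0, instruction 0 jumps to its 3rd parameter, 9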

9: subleq 8, 8, 9
   a = memory[9], which is 8
   b = memory[10], which is 8
   c = memory[11], which is 9
   memory[b] = memory[b] - memory[a]
   if memory[b] <= 0:
      PC = c
   else:
      PC += 3

   and... surprise, 0 - 0 is 0 again, and the jump target is location 9 itself,
   so it will execute this instruction forever

It is a simple counter that counts down from 3 to 0 and then loops at location 9 forever.
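Here is the SUBLEQ pseudocode as runnable Python, applied to the program above. A real SUBLEQ machine keeps going forever, so max_steps is our own addition to stop the simulation; run_subleq is a hypothetical name.

```python
def run_subleq(memory: list, max_steps: int = 20) -> list:
    """Execute SUBLEQ on `memory` (modified in place); return the PCs visited."""
    pc = 0
    visited = []
    for _ in range(max_steps):
        visited.append(pc)
        a, b, c = memory[pc], memory[pc + 1], memory[pc + 2]
        memory[b] = memory[b] - memory[a]  # subtract
        if memory[b] <= 0:                 # branch if less than or equal to 0
            pc = c
        else:
            pc += 3                        # each instruction is 3 cells long
    return visited


program = [7, 6, 9, 8, 8, 0, 3, 1, 0, 8, 8, 9]
print(run_subleq(program, max_steps=8))  # [0, 3, 0, 3, 0, 9, 9, 9]
print(program[6])                        # 0: the counter reached zero
```

The visited addresses show the story from the breakdown: 0 and 3 alternate while the counter is positive, then the machine gets stuck at 9.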

What it can do is only limited by our ability to program it. If we make it big enough, it can simulate the weather on our planet or, some people say, the universe. It is what we now call a universal computer.
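As a small taste of what programming it means, here is a sketch of addition built from three SUBLEQ instructions, using a scratch location Z that holds 0. The memory layout (Z at address 12, the two numbers at 13 and 14) is our own choice for this example, and the interpreter is repeated in compact form so the block runs on its own.

```python
def run_subleq(memory: list, max_steps: int = 20) -> None:
    pc = 0
    for _ in range(max_steps):
        a, b, c = memory[pc], memory[pc + 1], memory[pc + 2]
        memory[b] -= memory[a]
        pc = c if memory[b] <= 0 else pc + 3


Z, A, B = 12, 13, 14  # addresses of the scratch cell and the two numbers
program = [
    A, Z, 3,   # Z = Z - A  -> Z = -A
    Z, B, 6,   # B = B - Z  -> B = B + A
    Z, Z, 9,   # Z = Z - Z  -> Z = 0 again
    Z, Z, 9,   # halt: jump to itself forever
    0,         # address 12: Z
    5,         # address 13: A
    7,         # address 14: B
]
run_subleq(program)
print(program[B])  # 12
```

Each instruction's third number points at the next instruction, so execution continues in order whether or not the branch is taken.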

In the 1930s, Alan Turing described the universal computing machine; today we call it a Turing machine.

...an unlimited memory capacity obtained in the form of an infinite tape marked out into squares, on each of which a symbol could be printed. At any moment there is one symbol in the machine; it is called the scanned symbol. The machine can alter the scanned symbol, and its behavior is in part determined by that symbol, but the symbols on the tape elsewhere do not affect the behavior of the machine. However, the tape can be moved back and forth through the machine, this being one of the elementary operations of the machine. Any symbol on the tape may therefore eventually have an innings. -- Alan Turing 1948

What Turing found is that any machine that has (unlimited) memory and can make choices based on that memory can compute any computable sequence. You see, being able to replace the whole memory at once, or being able to read individual bytes or bits of information, is not important for the theoretical machine. Anything that can simulate the universal Turing machine can compute anything computable; we call this property Turing-completeness. The term "memory" is used a bit loosely here: memory can be obscure (the memory of a neural network is not obvious to us), but there is still memory there.
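Turing's description translates almost word for word into code. In this sketch the tape is a dictionary (so it is unlimited in both directions), the machine sees only the scanned symbol, and a rule table decides what to write, where to move and which state to enter next. The rules, which add 1 to a binary number, are our own example, not Turing's.

```python
def run_turing_machine(tape, rules, state, head, halt="halt", max_steps=1000):
    tape = dict(enumerate(tape))  # unlimited tape: missing squares are blank "_"
    for _ in range(max_steps):
        if state == halt:
            break
        symbol = tape.get(head, "_")               # the scanned symbol
        write, move, state = rules[(state, symbol)]
        tape[head] = write                         # alter the scanned symbol
        head += 1 if move == "R" else -1           # move the tape one square
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, "_") for i in cells).strip("_")


# Add 1 to a binary number: start at the rightmost digit, turn 1s into 0s
# while carrying to the left, then turn the first 0 (or blank) into 1.
INCREMENT = {
    ("carry", "1"): ("0", "L", "carry"),
    ("carry", "0"): ("1", "L", "halt"),
    ("carry", "_"): ("1", "L", "halt"),
}

print(run_turing_machine("1011", INCREMENT, state="carry", head=3))  # 1100
print(run_turing_machine("111", INCREMENT, state="carry", head=2))   # 1000
```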

We design our computers so that we can program them, and that means to be able to express our ideas in their language. Even this primitive SUBLEQ language is much easier for us to program than the simplest chemical computer. Again, possibly due to the way we use our memory, somehow our memory can recall information on demand, when you think of an apple, an apple will appear in your imagination. The same program can be written in infinitely many ways, in different languages, or for different computation machines, even though it might do the same thing, so we have to pick the one that works for us.

You saw what the grid of RAM cells looks like: it is instant to access specific bytes from it. We just have to toggle a switch, and at almost the speed of light we get the data. So addressing is not only natural to us, but also extremely practical to use in our programs.

Alonzo Church, a titan, discovered another universal computer, the lambda calculus, at the same time as Turing. Their machines look nothing alike, yet each can simulate the other. Church discovered that everything that can be computed can be expressed as a transformation of symbols. I won't go into detail, just enough to leave you confused. It does not use memory in the same way: its memory is stored in recursion, and its choices are stored in selection.

Computation is far more general than the machines we have built; don't be confused by the bits and bytes, the ones and zeroes. Everything is the same, but you must be able to talk to the machine to make your program do what you want, so you must understand the machine in order to think like it and find a way to communicate with it.

Humans have a 'theory of mind': I can pretend that I am you and think about what you would do, how you would feel, why you are doing the thing that you are doing. The famous 'Sally-Anne test' probes exactly this: Sally puts her marble in the red box and goes outside. While she's gone, Anne moves the marble to the blue box. When Sally comes back, where will she look for the marble first? You can think about what she would do. Of course, she might surprise you and not look for the marble at all, and if she doesn't, you could think of reasons why: maybe she hid because she hates it and never wants to see it again. This is theory of mind: your ability to think about what another human would do and why they would do it.

Theory of mind is in the fabric of our ability to communicate, interact and build complex societies. That is why human language is so different from machine language. For humans, language is not only a communication mechanism; each symbol produced modifies the writer themself, as well as the reader. What does that mean for a writer who writes for themselves? Human language is ever changing. Its purpose is to express subjective experience, emotion and intention; it has nuance and metaphor, and its meaning emerges from interpretation and introspection. It is ambiguous and contextual by nature: one symbol can mean nothing and everything.

A programming language is very different: it is deterministic. int a = 1 + 1 is completely unambiguous and strict; it is more of an encoded set of instructions than what we mean by "language".

Both human and programming languages have structure, grammar and vocabulary, and this is in fact the formal definition of "a language", but you can see that they are very different in the way their symbols are evaluated, due to the nature of their evaluator. The purpose of a programming language is to let humans express their ideas to the machine. Any universal computer can run any program, but a program for a chemical computer looks very different from a program for a digital computer. Take the program a = 1 + 1: we could compile it into instructions for both computers, but on the chemical computer it might be an incredibly difficult task that takes a year to execute reliably, while on the digital computer it takes about a nanosecond. Our programming languages are bound by the computer that will execute their programs. At the same time, programs can live in some very abstract space: the expression x = x + 1 can work with values of x so large that there are not enough electrons in the whole universe to encode them. But the language must be practical; it must make it as easy as possible for the human to write the program, and for the computer to execute it.

Most programming languages try to ignore what our computers really are, of course for a noble goal: writing complex programs is beyond our abilities. We keep trying to create languages with emergent properties that save us from ourselves. Language designers look at the average programmer and think about how they would use the language: will their programs require more maintenance, will there be more bugs, can you replace the programmer easily, is it productive, is it performant, and so on. Language designers have all kinds of inspirations. Sometimes they forget that the average programmer does not exist. Nothing average exists. If you were to make a chair, the perfect chair for me might be a torture device for you, so the chair designer has to compromise, because they want to sell chairs both to you and to me. And we get an average chair, worse for both of us.

Understanding how the digital computer remembers and how it thinks will help you have a 'theory of mind' when talking to it. This applies to any system you are interacting with; that is what understanding physics and math gives you: the ability to think like the universe. To ask questions: why is it moving, why did it stop? When you save a file on your OneDrive, then open the drive on another computer and the file is gone, why is it gone? How could it be that things are the way they are? How do the pixels on your screen work, or WiFi, or the TV's remote control? You see how well you understand Sally? You can understand anything in the same way if you think like it, examine its parts and the parts' interactions, and empathize with it.

Many give up on understanding. Some confuse it with success: their goal is to get a good job, or to impress their teacher, parents, peers, or even themselves. Others think they are not good enough. Others think they have gained mastery: "there is nothing more to understand", they say.

Fools.

To understand one thing means to understand everything. A hundred lifetimes are not enough.

Be careful; as Jung says, there is only one way, and that is your way.

There is only one way and that is your way; there is only one salvation and that is your salvation. Why are you looking around for help? Do you believe that help will come from outside? What is to come is created in you and from you. Hence look into yourself. Do not compare, do not measure. No other way is like yours. All other ways deceive and tempt you. You must fulfill the way that is in you.

Oh, that all men and all their ways become strange to you! Thus might you find them again within yourself and recognize their ways. But what weakness! What doubt! What fear! You will not bear going your way. You always want to have at least one foot on paths not your own to avoid the great solitude! So that maternal comfort is always with you! So that someone acknowledges you, recognizes you, bestows trust in you, comforts you, encourages you. So that someone pulls you over onto their path, where you stray from yourself and where it is easier for you to set yourself aside. As if you were not yourself! Who should accomplish your deeds? Who should carry your virtues and your vices? You do not come to an end with your life, and the dead will besiege you terribly to live your unlived life. Everything must be fulfilled. Time is of the essence, so why do you want to pile up the lived and let the unlived rot?

-- Carl Jung, Liber Secundus

I have confused you enough, but I will leave you with one more riddle:

I am what I read plus what I write.  
Before I began, I read nothing.  
When I began, I wrote "I am what I read plus what I write."

This program creates itself, defines itself, and its output is itself. How do you think it uses memory?

Going back to the wires, let's have a look at what SRAM actually looks like. This is the HY-6116, a 2048 x 8-bit SRAM chip:

This chip is quite old, from 1986, and it has only 2048 bytes of memory, but we will use it for educational purposes.

When you buy a chip you get a datasheet where you can see its specifications, and how it works.

On the first page of the datasheet you can spot some quite familiar words: the row decoder, the column decoder, and the grid of 128 x 128 cells. The row decoder has 7 wires coming in, from A4 to A10, so it can represent any number from 0 to 127, but strangely the column decoder takes only 4 wires, A0, A1, A2 and A3, so it can represent only 16 columns, from 0 to 15. That gives us 128 * 16 = 2048 locations, but the grid has 128 * 128 = 16384 cells. This is because we always read or write one byte at a time: we are not addressing each bit, but each group of 8 bits, and 2048 * 8 = 16384.
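A sketch of how an 11-bit address could be split for this chip, assuming (as read from the block diagram) that A0-A3 select one of 16 byte-wide columns and A4-A10 select one of 128 rows. The function name is hypothetical, and real chips may arrange the 8 bits of a byte differently inside the grid.

```python
ROWS, BYTE_COLUMNS, BITS = 128, 16, 8


def split_6116_address(address: int) -> tuple:
    assert 0 <= address < ROWS * BYTE_COLUMNS  # 2048 byte locations
    column = address & 0b1111  # A0..A3 -> column decoder (0..15)
    row = address >> 4         # A4..A10 -> row decoder (0..127)
    return row, column


print(ROWS * BYTE_COLUMNS, ROWS * BYTE_COLUMNS * BITS)  # 2048 16384
print(split_6116_address(0))     # (0, 0)
print(split_6116_address(2047))  # (127, 15)
```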

https://pdf.datasheetcatalog.com/datasheets/480/499400_DS.pdf

The 8 IO lines are the input and output for the data. We either read a byte or write a byte using them.

There are a few more important wires: CS, WE and OE. In the datasheet they are drawn with a bar on top, which means "active low": when the pin is connected to ground it is active, and when it has voltage it is inactive.

CS: chip select - when enabled, the chip is active
WE: write enable - tells the chip whether we are reading or writing through the IO lines, so that the column decoder knows whether to connect the sense amplifiers or the write drivers to the IO lines
OE: output enable - for reading, tells the chip WHEN to put the data on the IO lines (putting the data means setting them HIGH or LOW). So, in order to read, we disable WE, and the moment OE becomes active, the chip puts the data on the lines. Once OE is inactive, the sense amplifiers are disconnected from the IO lines.
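The three control pins can be summarized as a small decision function. This is a sketch of the usual 6116-style behavior described above, with 0 meaning a pin is connected to ground (active) and 1 meaning it is inactive; sram_mode is a hypothetical name, and the datasheet's truth table and timing diagrams remain the authority.

```python
def sram_mode(cs: int, we: int, oe: int) -> str:
    """Pins are active low: 0 = active, 1 = inactive."""
    if cs == 1:
        return "not selected: IO lines disconnected"
    if we == 0:
        return "write: IO lines -> selected byte"
    if oe == 0:
        return "read: selected byte -> IO lines"
    return "selected, but outputs disabled: IO lines disconnected"


for cs, we, oe in [(1, 1, 1), (0, 0, 1), (0, 1, 0), (0, 1, 1)]:
    print(cs, we, oe, sram_mode(cs, we, oe))
```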

For our computer we will use a much smaller chip with similar pins. It holds only 16 words of 4 bits each (64 bits in total), but it will work for us.

https://www.alldatasheet.com/datasheet-pdf/download/1132262/FAIRCHILD/74LS189.html

One important thing to notice is that the outputs of this chip are inverted: if we store 1 in a location, the output will be 0, and if we store 0, the output will be 1. This means we will have to use NOT gates to invert the outputs before we can use them.
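A minimal Python sketch of the inverted outputs, assuming the 16 words x 4 bits organization from the 74LS189 datasheet; write, read_raw and not_gates are hypothetical names, and the NOT gates are modeled as a bitwise complement limited to 4 bits.

```python
MASK = 0b1111      # 4 data lines
memory = [0] * 16  # 16 words of 4 bits


def write(address: int, value: int) -> None:
    memory[address] = value & MASK


def read_raw(address: int) -> int:
    # The chip outputs the complement of the stored word
    return ~memory[address] & MASK


def not_gates(bits: int) -> int:
    # One NOT gate per output line restores the original value
    return ~bits & MASK


write(5, 0b1010)
print(format(read_raw(5), "04b"))             # 0101
print(format(not_gates(read_raw(5)), "04b"))  # 1010
```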
