Chapter 31 - Let’s make our own CPU #
This chapter will be split into three parts: Part 1, understanding the internal workings of a CPU. This is effectively just a computer architecture class, and will be by far the largest section. Part 2, making that CPU on an FPGA. Part 3, making that CPU do something.
Part 1: Computer Architecture #
Any sufficiently advanced technology is indistinguishable from magic.
Computers, to a normal user, look like magic. In previous chapters I’ve shown you how to harness this magic through programming, and revealed at least some of its internal workings in Chapter 3, Into The Hardware. But this is all low-level magic, the sort the lowest-level mages can handle. To truly master the craft one must dive deep into the dark arts, study the origin of magic itself, and understand how we breathe life into otherwise inanimate rock. Back in Chapter 14, Semiconductors, I covered the very, very low-level basics of how semiconductors work, then in Chapter 15, Digital Logic, we saw how transistors could be chained to give rise to the fundamental logic gates. Finally, last chapter we looked at the basics of FPGAs to see how they let us make large-scale logic circuits quickly. That all builds to this: making complex logic circuits that actually do useful work. These can be anything from application-specific circuits, like bitcoin miners or hardware encoders and decoders for video codecs, to the generic CPU. For this guide, I’ll focus on the CPU. However, I think this can be relatively boring without further motivation, so, before we actually get to the CPU, let me show you how and why CPUs have evolved.
[TODO] loop memory operations w/ godbolt, cisc/risc, chiplets, core counts, branch prediction, N/S bridge to chipset (memory controller), etc. Moore’s Law ending, voltage minimums, minimum transistor sizes before quantum effects, etc.
Here you can see there are three memory operations (two MOVs and one ADD that have DWORD PTR in them). With 4-byte ints on a 4 GHz CPU, and assuming one loop iteration completes per clock cycle, this works out to $ 3 \times 4\,\text{bytes} \times (4 \times 10^9\,\text{Hz}) = 48\,\text{GB/s} $. This is pretty close to the 55 GB/s that this Linux tool reports that I get, per core, on my system. Fortunately, most modern CPUs have ways of parallelizing these operations internally, having multiple integer units per core as well.
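The back-of-envelope figure above can be reproduced with a short Python sketch. The constant and function names are made up for illustration, and the estimate assumes the loop completes one iteration per clock cycle, which a real core may beat or miss:

```python
# Rough estimate only; every number below is an assumption taken from the text.
MEMORY_OPS_PER_ITERATION = 3  # two MOVs and one ADD with DWORD PTR operands
BYTES_PER_OP = 4              # a DWORD holds one 4-byte int
CLOCK_HZ = 4e9                # 4 GHz core clock
ITERATIONS_PER_CYCLE = 1      # assumed: one loop iteration per clock cycle


def required_bandwidth_gb_per_s(
    ops: int = MEMORY_OPS_PER_ITERATION,
    bytes_per_op: int = BYTES_PER_OP,
    clock_hz: float = CLOCK_HZ,
    iterations_per_cycle: float = ITERATIONS_PER_CYCLE,
) -> float:
    """Memory traffic, in GB/s, needed to keep the loop fed."""
    bytes_per_second = ops * bytes_per_op * clock_hz * iterations_per_cycle
    return bytes_per_second / 1e9


print(f"{required_bandwidth_gb_per_s():.0f} GB/s")  # prints "48 GB/s"
```

Changing `iterations_per_cycle` shows how quickly the demand on memory grows once a core manages more than one iteration per cycle.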
[TODO] newer systems, https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning/, https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem, intel optane
Data-Level, Task-Level, and Instruction-Level Parallelism (Application Parallelism)
Instruction-Level Parallelism (Pipelining), Thread-Level and Request-Level Parallelism (Architectural Parallelism)
Single Instruction Stream, Single Data Stream (SISD)
Single Instruction Stream, Multiple Data Streams (SIMD)
MISD is not used commercially
Multiple Instruction Streams, Multiple Data Streams (MIMD) (tightly and loosely coupled)
Power wall, max silicon frequency, max density before tunneling
Part 2: What Makes a CPU Tick? #
maybe Ben Eater’s breadboard CPU series?
Part 3: Making the CPU #
Part 4: Making It Do Something #
Before we start, I’ll warn you this isn’t easy, but it’s also not as bad as it sounds, and it’s certainly not as bad as it used to be.
The DooM-chip! It will run E1M1 till the end of times (or till power runs out, whichever comes first).
Algorithm is burned into wires, LUTs and flip-flops on an #FPGA: no CPU, no opcodes, no instruction counter.
Running on Altera CycloneV + SDRAM. (1/n) pic.twitter.com/wd7j4JnfWn
Sylvain Lefebvre (@sylefeb), May 8, 2020
MyNOR - “a single board computer that does not have a CPU. Instead, the CPU consists of discrete logic gates from the 74HC series. This computer also has no ALU. Only a single NOR gate is used to perform all computations such as addition, subtraction, AND, OR and XOR.”
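To see why a single NOR gate is enough, here is a minimal Python sketch of NOR universality: every other operation is built only from calls to `nor`. This is not a model of how that board actually sequences its one gate; the function names and the 8-bit width are assumptions chosen for illustration.

```python
def nor(a: int, b: int) -> int:
    """The only primitive gate: 1 when both inputs are 0, otherwise 0."""
    return 1 - (a | b)


# Everything below is composed purely from nor().
def not_(a: int) -> int:
    return nor(a, a)


def or_(a: int, b: int) -> int:
    return not_(nor(a, b))


def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # (a OR b) AND NOT (a AND b), rewritten as a single NOR of two terms
    return nor(nor(a, b), and_(a, b))


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits, returning (sum_bit, carry_out)."""
    partial = xor(a, b)
    return xor(partial, carry_in), or_(and_(a, b), and_(partial, carry_in))


def add(x: int, y: int, bits: int = 8, carry: int = 0) -> int:
    """Ripple-carry addition; the shifts and masks only model wiring."""
    result = 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result  # wraps around modulo 2**bits


def sub(x: int, y: int, bits: int = 8) -> int:
    """Two's complement subtraction: x - y = x + NOT(y) + 1."""
    inverted = 0
    for i in range(bits):
        inverted |= not_((y >> i) & 1) << i
    return add(x, inverted, bits, carry=1)


print(add(100, 55), sub(100, 55))  # prints "155 45"
```

The sketch evaluates many NOR operations per addition. A machine with one physical gate would have to perform them one after another, trading speed for hardware simplicity.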