Comparch

Chapter 37 - Let’s make our own CPU #

How To Make A CPU - A Simple Picture Based Explanation (RobertElder)

This chapter will be split into three parts:

Part 1, understanding the internal workings of a CPU. This is effectively just a computer architecture class, and will be by far the largest section.

Part 2, making that CPU on an FPGA.

Part 3, making that CPU do something.

[TODO] http://www.lighterra.com/papers/modernmicroprocessors/

[TODO] A Density Metric for Semiconductor Technology - past nm for sizing

Part 1: Computer Architecture #

Motivation: #

Any sufficiently advanced technology is indistinguishable from magic.

Arthur C. Clarke’s 3rd law

Computers, to a normal user, look like magic. In previous chapters I’ve shown you how to harness this magic through programming, and revealed at least some of its internal workings in Chapter 3, Into The Hardware. But this is all low-level magic, the sort the lowest-level mages can handle. To truly master the craft, one must dive deep into the dark arts, study the origin of magic itself, and understand how we breathe life into otherwise inanimate rock. Back in Chapter 14, Semiconductors, I covered the very, very low-level basics of how semiconductors work, then in Chapter 15, Digital Logic, we saw how transistors could be chained to give rise to the fundamental logic gates. Finally, last chapter we looked at the basics of FPGAs to see how they let us make large-scale logic circuits quickly. That all builds to this: making complex logic circuits that actually do useful work. These can be anything from application-specific circuits, like bitcoin miners or hardware encoders and decoders for video codecs, to the general-purpose CPU. For this guide, I’ll focus on the CPU. However, I think this can be relatively boring without further motivation, so before we actually get to the CPU, let me show you how and why CPUs have evolved.

[TODO] loop memory operations w/ godbolt, cisc/risc, chiplets, core counts, branch prediction, N/S bridge to chipset (memory controller), etc. Moore’s Law ending, voltage minimums, minimum transistor sizes before quantum effects, etc.

Here you can see there are three memory operations (two MOVs and one ADD that have DWORD PTR in them). With 4-byte ints on a 4 GHz CPU, and assuming the loop body completes once per clock cycle, this works out to $ 3 \times 4\,\text{bytes} \times (4 \times 10^9\,\text{Hz}) = 48\,\text{GB/s} $. This is pretty close to the 55 GB/s that this Linux tool reports that I get, per core, on my system. Fortunately, most modern CPUs have ways of parallelizing these operations internally, and have multiple integer units per core as well.
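The back-of-the-envelope estimate above can be sketched as a few lines of Python. This is only a rough model: the function name and its parameters are made up for illustration, and the `iterations_per_cycle=1.0` default is an assumption (one trip through the loop per clock cycle), not something measured.

```python
def estimated_bandwidth_gb_per_s(mem_ops_per_iteration, bytes_per_op,
                                 clock_hz, iterations_per_cycle=1.0):
    """Rough memory traffic estimate for a tight loop, in GB/s.

    Hypothetical helper: it simply multiplies the number of memory
    operations per loop iteration by the operand size and the clock rate.
    iterations_per_cycle is an assumption, real CPUs may do better or worse.
    """
    bytes_per_second = (mem_ops_per_iteration * bytes_per_op
                        * clock_hz * iterations_per_cycle)
    return bytes_per_second / 1e9  # decimal gigabytes, matching the text


# Three 4-byte memory operations per iteration on a 4 GHz core.
print(estimated_bandwidth_gb_per_s(3, 4, 4e9))
```

Changing `iterations_per_cycle` is an easy way to see how sensitive the estimate is to that one assumption.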

[TODO] newer systems, https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning/, https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem, intel optane

[TODO]

Data-Level, Task-Level, and Instruction-Level Parallelism (Application Parallelism)

Instruction-Level Parallelism (Pipelining), Thread-Level and Request-Level Parallelism (Architectural Parallelism)

Single Instruction Stream, Single Data Stream (SISD)

Single Instruction Stream, Multiple Data Streams (SIMD)

MISD is not used commercially

Multiple Instruction Streams, Multiple Data Streams (MIMD) (tightly and loosely coupled)

In Memory Processing

Neuromorphic Computing

Power wall, max Silicon freq, max density before tunneling

Branch predictor: How many “if”s are too many? Including x86 and M1 benchmarks! (Cloudflare Blog)

Part 2: What Makes a CPU Tick? #

[TODO]

https://computersciencewiki.org/index.php/Architecture_of_the_central_processing_unit_(CPU)#Major_parts_of_a_CPU

Maybe Ben Eater’s breadboard CPU series?

Part 3: Making the CPU #

Part 4: Making It Do Something #

Before we start, I’ll warn you this isn’t easy, but it’s also not as bad as it sounds, and it’s certainly not as bad as it used to be.

[TODO]

FPGA Linux Kernel drivers

An FPGA that only plays Doom

Archive.org backup of the above tweet

My Nor - “a single board computer that does not have a CPU. Instead, the CPU consists of discrete logic gates from the 74HC series. This computer also has no ALU. Only a single NOR gate is used to perform all computations such as addition, subtraction, AND, OR and XOR.”
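To get a feel for why a single NOR gate is enough, here is a minimal Python sketch that builds NOT, OR, AND, XOR, and an adder out of nothing but a `nor` function. It is purely illustrative: all of the function names are made up here, and it is not a model of how the My Nor board actually sequences its one gate.

```python
def nor(a, b):
    """The only primitive: returns 1 when both input bits are 0."""
    return 1 - (a | b)


# Everything below is composed only from nor().
def not_(a):
    return nor(a, a)


def or_(a, b):
    return not_(nor(a, b))


def and_(a, b):
    return nor(not_(a), not_(b))


def xor(a, b):
    # (a OR b) AND NOT (a AND b)
    return and_(or_(a, b), not_(and_(a, b)))


def full_adder(a, b, carry_in):
    """Add three bits, returning (sum_bit, carry_out)."""
    partial = xor(a, b)
    sum_bit = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out


def add(x, y, width=8):
    """Ripple-carry addition of two unsigned integers, one bit at a time.

    The result wraps around modulo 2**width, like a fixed-width register.
    """
    result = 0
    carry = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result


print(add(5, 3))
```

Hardware that has only one physical gate has to reuse it over many steps, so the trade-off is speed for simplicity; the sketch above only shows that the logic itself can be expressed with NOR alone.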

Chapter 29.1 - FPAAs #

ZRNA FPAA

Weird Things #

https://hackaday.com/2020/11/23/a-cpu-less-computer-with-a-single-nor-gate-alu/

