From ab027a48a7608c696ed019fd3fadf8ae9cd2c577 Mon Sep 17 00:00:00 2001 From: crc Date: Thu, 31 Jan 2019 13:08:25 +0000 Subject: [PATCH] check in some short articles on design and reflections on historic retro systems FossilOrigin-Name: eb357b5905393a162430aaaa4f8d4c3803a060ee6715e899efa3d3973a6bdc1e --- .../Notes_on_Metacompilation_and_Assembly.txt | 77 +++++++++++ ...otes_on_Prefixes_as_a_Language_Element.txt | 45 +++++++ ...Notes_on_the_Evolution_of_Ngaro_to_Nga.txt | 72 ++++++++++ doc/papers/Notes_on_the_Kernel_Wordset.txt | 64 +++++++++ .../Retro_11_(2011-2019):_A_Look_Back.md | 79 +++++++++++ doc/papers/The_Path_to_Self_Hosting.md | 126 ++++++++++++++++++ 6 files changed, 463 insertions(+) create mode 100644 doc/papers/Notes_on_Metacompilation_and_Assembly.txt create mode 100644 doc/papers/Notes_on_Prefixes_as_a_Language_Element.txt create mode 100644 doc/papers/Notes_on_the_Evolution_of_Ngaro_to_Nga.txt create mode 100644 doc/papers/Notes_on_the_Kernel_Wordset.txt create mode 100644 doc/papers/Retro_11_(2011-2019):_A_Look_Back.md create mode 100644 doc/papers/The_Path_to_Self_Hosting.md diff --git a/doc/papers/Notes_on_Metacompilation_and_Assembly.txt b/doc/papers/Notes_on_Metacompilation_and_Assembly.txt new file mode 100644 index 0000000..edfcbf0 --- /dev/null +++ b/doc/papers/Notes_on_Metacompilation_and_Assembly.txt @@ -0,0 +1,77 @@ +Retro 10 and 11 were written in themselves using a metacompiler. +I had been fascinated by this idea for a long time and was able +to explore it heavily. While I still find it to be a good idea, +the way I ended up doing it was problematic. + +The biggest issue I faced was that I wanted to do this in one +step, where loading the Retro source would create a new image +in place of the old one, switch to the new one, and then load +the higher level parts of the language over this. In retrospect, +this was a really bad idea. + +My earlier design for Retro was very flexible. I allowed almost +everything to be swapped out or extended at any time. This made +it extremely easy to customize the language and environment, but +made it crucial to keep track of what was in memory and what had +been patched so that the metacompiler wouldn't refer to anything +in the old image during the relocation and control change. It +was far too easy to make a mistake, discover that elements of +the new image were broken, and the have to go and revert many +changes to try to figure out what went wrong. + +This was also complicated by the fact that I built new images +as I worked, and, while a new image could be built from the last +built one, it wasn't always possible to build a new image from +the prior release version. (Actually, it was often worse - I +failed to check in every change as I went, so often even the +prior commits couldn't rebuild the latest images). + +For Retro 12 I wanted to avoid this problem, so I decided to go +back to writing the kernel ("Rx") in assembly. I actually wrote +a Machine Forth dialect to generate the initial assembly, before +eventually hand tuning the final results to its current state. + +I could (and likely will eventually) write the assembler in +Retro, but the current one is in C, and is built as part of the +standard toolchain. + +My VM actually has two assemblers. The older one is Naje. This +was intended to be fairly friendly to work with, and handles +many of the details of packing instructions for the user. Here +is an example of a small program in it: + + :square + dup + mul + ret + :main + lit 35 + lit &square + call + lit &square + call + end + +The other assembler is Muri. This is a far more minimalistic +assembler, but I've actually grown to prefer it. The above +example in Muri would become: + + i liju.... + r main + : square + i dumure.. + : main + i lilica.. + d 35 + r square + i en...... + +In Muri, each instruction is reduced to two characters, and the +bundlings are listed as part of an instruction bundle (lines +starting with `i`). This is less readable if you aren't very +familiar with Nga's assembly and packing rules, but allows a +very quick, efficient way of writing assembly for those who are. + +I eventually rewrote the kernel in the Muri style as it's what +I prefer, and since there's not much need to make changes in it. + diff --git a/doc/papers/Notes_on_Prefixes_as_a_Language_Element.txt b/doc/papers/Notes_on_Prefixes_as_a_Language_Element.txt new file mode 100644 index 0000000..5b91e5c --- /dev/null +++ b/doc/papers/Notes_on_Prefixes_as_a_Language_Element.txt @@ -0,0 +1,45 @@ +A big change in Retro 12 was the elimination of the traditional +parser from the language. This was a sacrifice due to the lack +of an I/O model. Retro has no way to know *how* input is given +to the `interpret` word, or whether anything else will ever be +passed into it. + +And so `interpret` operates only on the current token. The core +language does not track what came before or attempt to guess at +what might come in the future. + +This leads into the prefixes. Retro 11 had a complicated system +for prefixes, with different types of prefixes for words that +parsed ahead (e.g., strings) and words that operated on the +current token (e.g., `@`). Retro 12 eliminates all of these in +favor of just having a single prefix model. + +The first thing `interpret` does is look to see if the first +character in a token matches a `prefix:` word. If it does, it +passes the rest of the token as a string pointer to the prefix +specific handler to deal with. If there is no valid prefix +found, it tries to find it in the dictionary. Assuming that it +finds the words, it passes the `d:xt` field to the handler that +`d:class` points to. Otherwise it calls `err:notfound`. + +This has an important implication: *words can not reliably +have names that start with a prefix character.* + +It also simplifies things. Anything that would normally parse +becomes a prefix handler. So creating a new word? Use the `:` +prefix. Strings? Use `'`. Pointers? Try `&`. And so on. E.g., + + In ANS | In Retro + : foo ... ; | :foo ... ; + ' foo | &foo + : bar ... ['] foo ; | :bar ... &foo ; + s" hello world!" | 'hello_world! + +If you are familiar with ColorForth, prefixes are a similar +idea to colors, but can be defined by the user as normal words. + +After doing this for quite a while I rather like it. I can see +why Chuck Moore eventually went towards ColorForth as using +color (or prefixes in my case) does simplify the implementation +in many ways. + diff --git a/doc/papers/Notes_on_the_Evolution_of_Ngaro_to_Nga.txt b/doc/papers/Notes_on_the_Evolution_of_Ngaro_to_Nga.txt new file mode 100644 index 0000000..97179ed --- /dev/null +++ b/doc/papers/Notes_on_the_Evolution_of_Ngaro_to_Nga.txt @@ -0,0 +1,72 @@ +When I decided to begin work on what became Retro 12, I knew +the process would involve updating Ngaro, the virtual machine +that Retro 10 and 11 ran on. + +Ngaro rose out of an earlier experimental virtual machine I had +written back in 2005-2006. This earlier VM, called Maunga, was +very close to what Ngaro ended up being, though it had a very +different approach to I/O. (All I/O in Maunga was intended to be +memory mapped; Ngaro adopted a port based I/O system). + +Ngaro itself evolved along with Retro, gaining features like +automated skipping of NOPs and a LOOP opcode to help improve +performance. But the I/O model proved to be a problem. When I +created Ngaro, I had the idea that I would always be able to +assume a console/terminal style environment. The assumption was +that all code would be entered via the keyboard (or maybe a +block editor), and that proved to be the fundamental flaw as +time went on. + +As Retro grew it was evident that the model had some serious +problems. Need to load code from a file? The VM and language had +functionality to pretend it was being typed in. Want to run on +something like a browser, Android, or iOS? The VM would need to +be implemented in a way that simulates input being typed into +the VM via a simulated keyboard. And Retro was built around this. +I couldn't change it because of a promise to maintain, as much +as possible, source compatibility for a period of at least five +years. + +When the time came to fix this, I decided at the start to keep +the I/O model separate from the core VM. I also decided that the +core Retro language would provide some means of interpreting +code without requiring an assumption that a traditional terminal +was being used. + +So Nga began. I took the opportunity to simplify the instruction +set to just 26 essential instructions, add support for packing +multiple instructions per memory location (allowing a long due +reduction in memory footprint), and to generally just make a far +simpler design. + +I've been pleased with Nga. On its own it really isn't useful +though. So with Retro I embed it into a larger framework that +adds some basic I/O functionality. The *interfaces* handle the +details of passing tokens into the language and capturing any +output. They are free to do this in whatever model makes most +sense on a given platform. + +So far I've implemented: + + - a scripting interface, reading input from a file and + offering file i/o, gopher, and reading from stdin, and + sending output to stdout. + - an interactive interface, built around ncurses, reading + input from stdin, and displaying output to a scrolling + buffer. + - an iOS interface, built around a text editor, directing + output to a separate interface pane. + - an interactive block editor, using a gopher-based block + data store. Output is displayed to stdout, and input is + done via the blocks being evaluated or by reading from + stdin. + +In all cases, the only common I/O word that has to map to an +exposed instruction is `putc`, to display a single character to +some output device. There is no requirement for a traditional +keyboard input model. + +By doing this I was able to solve the biggest portability issue +with the Retro 10/11 model, and make a much simpler, cleaner +language in the end. + diff --git a/doc/papers/Notes_on_the_Kernel_Wordset.txt b/doc/papers/Notes_on_the_Kernel_Wordset.txt new file mode 100644 index 0000000..9fd4529 --- /dev/null +++ b/doc/papers/Notes_on_the_Kernel_Wordset.txt @@ -0,0 +1,64 @@ +In implementing the Retro 12 kernel (called Rx) I had to decide +on what functionality would be needed. It was important to me +that this be kept clean and minimalistic, as I didn't want to +spend a lot of time changing it as time progressed. It's far +nicer to code at the higher level, where the Retro language is +functional, as opposed to writing more assembly code. + +So what made it in? + +Primitives + +These are words that map directly to Nga instructions. + + dup drop swap call eq? -eq? lt? gt? + fetch store + - * /mod and or + xor shift push pop 0; + +Memory + + fetch-next store-next , s, + +Strings + + s:to-number s:eq? s:length + +Flow Control + + choose if -if repeat again + +Compiler & Interpreter + + Compiler Heap ; [ ] Dictionary + d:link d:class d:xt d:name d:add-header + class:word class:primitive class:data class:macro + prefix:: prefix:# prefix:& prefix:$ + interpret d:lookup err:notfound + +I *could* slightly reduce this. The $ prefix could be defined in +higher level code, and I don't strictly *need* to expose the +`fetch-next` and `store-next` here. But since the are already +implemented as dependencies of the words in the kernel, it would +be a bit wasteful to redefine them later in higher level code. + +With these words the rest of the language can be built up. Note +that the Rx kernel does not provide any I/O words. It's assumed +that the Retro interfaces will add these as best suited for the +systems they run on. + +There is another small bit. All images start with a few key +pointers in fixed offsets of memory. These are: + + | Offset | Contains | + | ------ | --------------------------- | + | 0 | lit call nop nop | + | 1 | Pointer to main entry point | + | 2 | Dictionary | + | 3 | Heap | + | 4 | Retro version identifier | + +An interface can use the dictionary pointer and knowledge of the +dictionary format for a specific Retro version to identify the +location of essential words like `interpret` and `err:notfound` +when implementing the user facing interface. + diff --git a/doc/papers/Retro_11_(2011-2019):_A_Look_Back.md b/doc/papers/Retro_11_(2011-2019):_A_Look_Back.md new file mode 100644 index 0000000..0f3ef6d --- /dev/null +++ b/doc/papers/Retro_11_(2011-2019):_A_Look_Back.md @@ -0,0 +1,79 @@ +# Retro 11 (2011 - 2019): A Look Back + +So it's now been about five years since the last release of +Retro 11. While I still see some people obtaining and using it, +I've moved on to the twelth generation of Retro. It's time for +me to finally retire Retro 11. + +As I prepare to do so, I thought I'd take a brief look back. + +Retro 11 began life in 2011. It grew out of Retro 10, which +was the first version of Retro to not be written in x86 assembly +language. For R10 and R11, I wrote a portable virtual machine +(with numerous implementations) and the Forth dialect was kept +in an image file which ran on the VM. + +Retro 10 worked, but was always a bit too sloppy and changed +drastically between releases. The major goal of Retro 11 was +to provide a stable base for a five year period. In retrospect, +this was mostly achieved. Code from earlier releases normally +needed only minor adjustments to run on later releases, though +newer releases added significantly to the language. + +There were seven releases. + +- Release 11.0: 2011, July +- Release 11.1: 2011, November +- Release 11.2: 2012, January +- Release 11.3: 2012, March +- Release 11.4: 2012, July +- Release 11.5: 2013, March +- Release 11.6: 2014, August + +Development was fast until 11.4. This was the point at which +I had to slow down due to RSI problems. It was also the point +which I started experiencing some problems with the metacompiler +(as discussed previously). + +Retro 11 was flexible. All colon definitions were setup as hooks, +allowing new functionality to be layered in easily. This allowed +the later releases to add things like vocabularies, search order, +tab completion, and keyboard remapping. This all came at a cost +though: later things could use the hooks to alter behavior of +existing words, so it was necessary to use a lot of caution to +ensure that the layers didn't break the earlier code. + +The biggest issue was the I/O model. Retro 11 and the Ngaro VM +assumed the existence of a console environment. All input was +required to be input at the keyboard, and all output was to be +shown on screen. This caused some problems. Including code from +a file required some tricks, temporarily rewriting the keyboard +input function to read from the file. It also became a major +issue when I wrote the iOS version. The need to simulate the +keyboard and console complicated everything and I had to spend +a considerable amount of effort to deal with battery performance +resulting from the I/O polling and wait states. + +But on the whole it worked well. I used Retro 11.6 until I +started work on Retro 12 in late 2016, and continued running +some tools written in R11 until the first quarter of last year. + +The final image file was 23,137 cells (92,548 bytes). This was +bloated by keeping some documentation (stack comments and short +descriptions) in the image, which started in 11.4. This contained +269 words. + +I used Retro 11 for a wide variety of tasks. A small selection +of things that were written includes: + +- a pastebin +- front end to ii (irc client) +- small explorations of interactive fiction +- irc log viewer +- tool to create html from templates +- tool to automate creation of an SVCD from a set of photos +- tools to generate reports from data sets for my employer + +In the end, I'm happy with how Retro 11 turned out. I made some +mistakes in embracing too much complexity, but despite this it +was a successful system for many years. diff --git a/doc/papers/The_Path_to_Self_Hosting.md b/doc/papers/The_Path_to_Self_Hosting.md new file mode 100644 index 0000000..19647ce --- /dev/null +++ b/doc/papers/The_Path_to_Self_Hosting.md @@ -0,0 +1,126 @@ +# The Path to Self Hosting + +Retro is an image based Forth system running on a lightweight +virtual machine. This is the story of how that image is made. + +The first Retro to use an image based approach was Retro 10. +The earliest images were built using a compiler written in +Toka, an earlier experimental stack language I had written. +It didn't take long to want to drop the dependency on Toka, +so I rewrote the image compiler in Retro and then began +development at a faster pace. + +Retro 11 was built using the last Retro 10 image and an +evolved version of the metacompiler. This worked well, but +I eventually found it to be problematic. + +One of the issueo sI faced was the inability to make a new +image from the prior stable release. Since I develop and +test changes incrementally, I reached a point where the +current metacompiler and image required each other. This +wasn't a fatal flaw, but it was annoying. + +Perhaps more critical was the fragility of the system. In +R11 small mistakes could result in a corrupt image. The test +suite helped identify some of these, but there were a few +times I was forced to dig back through the version control +history to recover a working image. + +The fragile nature was amplified by some design decisions. +In R11, after the initial kernel was built, it would be +moved to memory address 0, then control would jump into the +new kernel to finish building the higher level parts. + +Handling this was a tricky task. In R11 almost everything +could be revectored, so the metacompiler had to ensure that +it didn't rely on anything in the old image during the move. +This caused a large number of issues over R11's life. + +So on to Retro 12. I decided that this would be different. +First, the kernel would be assembly, with an external tool +to generate the core image. The kernel is in `Rx.md` and the +assembler is `Muri`. To load the standard library, I wrote a +second tool, `retro-extend`. This separation has allowed me +many fewer headaches as I can make changes more easily and +rebuild from scratch when necessary. + +But I miss self-hosting. So last fall I decided to resolve +this. And today I'm pleased to say that it is now done. + +There are a few parts to this. + +**Unu**. I use a Markdown variation with fenced code blocks. +The tool I wrote in C to extract these is called `unu`. For +a self hosting Retro, I rewrote this as a combinator that +reads in a file and runs another word against each line in the +file. So I could display the code block contents by doing: + + 'filename [ s:put nl ] unu + +This made it easier to implement the other tools. + +**Muri**. This is my assembler. It's minimalistic, fast, and +works really well for my purposes. Retro includes a runtime +version of this (using `as{`, `}as`, `i`, `d`, and `r`), so +all I needed for this was to write a few words to parse the +lines and run the corresponding runtime words. As with the C +version, this is a two pass assembler. + +Muri generates a new `ngaImage` with the kernel. To create a +full image I needed a way to load in the standard library and +I/O extensions. + +This is handled by **retro-extend**. This is where it gets +more complex. I implemented the Nga virtual machine in Retro +to allow this to run the new image in isolation from the +host image. The new ngaImage is loaded, the interpreter is +located, and each token is passed to the interpreter. Once +done, the new image is written to disk. + +So at this point I'm pleased to say that I can now develop +Retro using only an existing copy of Retro (VM+image) and +tools (unu, muri, retro-extend, and a line oriented text +editor) written in Retro. + +This project has delivered some additional side benefits. +During the testing I was able to use it to identify a few +bugs in the I/O extensions, and the Nga-in-Retro will replace +the older attempt at this in the debugger, allowing a safer +testing environment. + +What issues remain? + +The extend process is *slow*. On my main development server +(Linode 1024, OpenBSD 6.4, 64-bit) it takes a bit over five +minutes to complete loading the standard library, and a few +additional depending on the I/O drivers selected. + +Most of the performance issues come from running Nga-in-Retro +to isolate the new image from the host one. It'd be possible +to do something a bit more clever (e.g., running a Retro +instance using the new image via a subprocess and piping in +the source, or doing relocations of the data), but this is +less error prone and will work on all systems that I plan to +support (including, with a few minor adjustments, the native +hardware versions [assuming the existance of mass storage]). + +Sources: + +**Unu** + +http://forth.works/c8820f85e0c52d32c7f9f64c28f435c0 + +gopher://forth.works/0/c8820f85e0c52d32c7f9f64c28f435c0 + +**Muri** + +http://forth.works/09d6c4f3f8ab484a31107dca780058e3 + +gopher://forth.works/0/09d6c4f3f8ab484a31107dca780058e3 + +**retro-extend** + +http://forth.works/c812416f397af11db58e97388a3238f2 + +gopher://forth.works/0/c812416f397af11db58e97388a3238f2 +