retroforth/literate/Rx.md at 4df377b9f2872f36288422ab8c44118386564fcb

crc 4df377b9f2 formatting improvements

FossilOrigin-Name: 706ad99e6ebc6ff023e9db8fc0641c55bbf1c9c62d9aa0820c27d680117ca0b4

2018-02-12 23:48:40 +00:00

27 KiB

Raw Blame History

____  _   _
|| \\ \\ //
||_//  )x(
|| \\ // \\ 2018.2
a minimalist forth for nga

Rx (retro experimental) is a minimal Forth implementation for the Nga virtual machine. Like Nga this is intended to be used within a larger supporting framework adding I/O and other desired functionality. Various example interface layers are included.

General Notes on the Source

Rx is developed using a literate tool called unu. This allows easy extraction of fenced code blocks into a separate file for later compilation. Developing in a literate approach is beneficial as it makes it easier for me to keep documentation current and lets me approach the code in a more structured manner.

This source is written in Muri, an assembler for Nga.

In the Beginning...

All code built with the Nga toolchain starts with a jump to the main entry point. With cell packing, this takes two cells. We can take advantage of this knowledge to place a couple of variables at the start so they can be easily identified and interfaced with external tools. This is important as Nga allows for a variety of I/O models to be implemented and I don't want to tie Rx into any one specific model.

Here's the initial memory map:

Offset	Contains	Notes
0	lit call nop nop	Compiled by Naje
1	Pointer to main entry point	Compiled by Naje
2	Dictionary
3	Heap

Naje, the Nga assembler, compiles the initial instructions automatically. Muri does not, so provide this here.

i liju....
d -1

The two variables need to be declared next, so:

: Dictionary
r 9999

: Heap
d 1536

: Version
d 201802

Both of these are pointers. Dictionary points to the most recent dictionary entry. (See the Dictionary section at the end of this file.) Heap points to the next free memory address. This is hard coded to an address beyond the end of the Rx kernel. It'll be fine tuned as development progresses. See the Interpreter & Compiler section for more on this.

Nga Instruction Set

The core Nga instruction set consists of 27 instructions. Rx begins by assigning each to a separate function. These are not intended for direct use; in Rx the compiler will fetch the opcode values to use from these functions when compiling. Some of them will also be wrapped in normal functions later.

: _nop
d 0
i re......
: _lit
d 1
i re......
: _dup
d 2
i re......
: _drop
d 3
i re......
: _swap
d 4
i re......
: _push
d 5
i re......
: _pop
d 6
i re......
: _jump
d 7
i re......
: _call
d 8
i re......
: _ccall
d 9
i re......
: _ret
d 10
i re......
: _eq
d 11
i re......
: _neq
d 12
i re......
: _lt
d 13
i re......
: _gt
d 14
i re......
: _fetch
d 15
i re......
: _store
d 16
i re......
: _add
d 17
i re......
: _sub
d 18
i re......
: _mul
d 19
i re......
: _divmod
d 20
i re......
: _and
d 21
i re......
: _or
d 22
i re......
: _xor
d 23
i re......
: _shift
d 24
i re......
: _zret
d 25
i re......
: _end
d 26
i re......

Nga also allows for multiple instructions to be packed into a single memory location (called a cell). Rx doesn't take advantage of this yet, with the exception of calls. Since calls take a value from the stack, a typical call (in Muri assembly) would look like:

i lica....
r bye

Without packing this takes three cells: one for the lit, one for the address, and one for the call. Packing drops it to two since the lit/call combination can be fit into a single cell. We define the opcode for this here so that the compiler can take advantage of the space savings.

: _packedcall
d 2049
i re......

Likewise, I define a packed jump for use with quotations. This saves several hundred cells (and thus fetch/decode cycles) when loading the standard library.

: _packedjump
d 1793
i re......

Stack Shufflers

These add additional operations on the stack elements that'll keep later code much more readable. The dup-pair is the same as over over, but I inline the machine code as it's smaller and faster in this case.

: over
i puduposw
i re......

: dup-pair
i puduposw
i puduposw
i re......

Memory

The basic memory accesses are handled via fetch and store. These two functions provide slightly easier access to linear sequences of data.

fetch-next takes an address and fetches the stored value. It returns the next address and the stored value.

: fetch-next
i duliadsw
d 1
i fere....

store-next takes a value and an address. It stores the value to the address and returns the next address.

: store-next
i duliadpu
d 1
i stpore..

Conditionals

The Rx kernel provides three conditional forms:

flag true-pointer false-pointer choose
flag true-pointer if
flag false-pointer -if

Implement choose, a conditional combinator which will execute one of two functions, depending on the state of a flag. We take advantage of a little hack here. Store the pointers into a jump table with two fields, and use the flag as the index. Default to the false entry, since a true flag is -1.

: choice:true
d 0

: choice:false
d 0

: choose
i listlist
r choice:false
r choice:true
i liadfeca
r choice:false
i re......

Next the two if forms. Note that I allow -if to fall through into if. This saves two cells of memory.

: -if
i pulieqpo
d 0
: if
i cc......
i re......

Strings

The kernel needs two basic string operations for dictionary searches: obtaining the length and comparing for equality.

Strings in Rx are zero terminated. This is a bit less elegant than counted strings, but the implementation is quick and easy.

First up, string length. The process here is trivial:

Make a copy of the starting point
Fetch each character, comparing to zero
- If zero, break the loop
- Otherwise discard and repeat
When done subtract the original address from the current one
Then subtract one (to account for the zero terminator)

: count
i lica....
r fetch-next
i zr......
i drliju..
r count

: s:length
i dulica..
r count
i lisuswsu
d 1
i re......

String comparisons are harder.

dup fetch push n:inc swap dup fetch push n:inc pop dup pop -eq? [ drop-pair drop #0 pop pop drop drop ] [ 0; drop s:eq? pop pop drop drop ] choose drop-pair #-1 ;

: mismatch
i drdrdrli
d 0
i popodrdr
i re......

: matched
i zr......
i drlica..
r s:eq
i popodrdr
i re......

: s:eq
i dufepuli
d 1
i adswdufe
i puliadpo
d 1
i duponeli
r mismatch
i lilica..
r matched
r choose
i drdrlire
d -1

Interpreter & Compiler

Compiler Core

The heart of the compiler is comma which stores a value into memory and increments a variable (Heap) pointing to the next free address. here is a helper function that returns the address stored in Heap.

: comma
i lifelica
r Heap
r store-next
i listre..
r Heap

With these we can add a couple of additional forms. comma:opcode is used to compile VM instructions into the current defintion. This is where those functions starting with an underscore come into play. Each wraps a single instruction. Using this we can avoid hard coding the opcodes.

This performs a jump to the comma word instead of using a call/ret to save a cell and slightly improve performance.

: comma:opcode
i feliju..
r comma

comma:string is used to compile a string into the current definition. As with comma:opcode, this uses a jump to eliminate the final tail call.

: ($)
i lica....
r fetch-next
i zr......
i lica....
r comma
i liju....
r ($)

: comma:string
i lica....
r ($)
i drliliju
d 0
r comma

With the core functions above it's now possible to setup a few more things that make compilation at runtime more practical.

First, a variable indicating whether we should compile or run a function. This will be used by the word classes.

: Compiler
d 0

: t-;
i lilica..
r _ret
r comma:opcode
i lilistre
d 0
r Compiler

Word Classes

Rx is built over the concept of word classes. Word classes are a way to group related words, based on their compilation and execution behaviors. A special word, called a class handler, is defined to handle an execution token passed to it on the stack. The compiler uses a variable named class to set the default class when compiling a word. We'll take a closer look at class later. Rx provides several classes with differing behaviors:

class:data provides for dealing with data structures.

interpret	compile
leave value on stack	compile value into definition

: class:data
i lifezr..
r Compiler
i drlilica
r _lit
r comma:opcode
i liju....
r comma

class:word handles most functions.

interpret	compile
call a function	compile a call to a function

: class:word:interpret
i ju......

: class:word:compile
i lilica..
r _packedcall
r comma:opcode
i liju....
r comma

: class:word
i lifelili
r Compiler
r class:word:compile
r class:word:interpret
i liju....
r choose

class:primitive is a special class handler for functions that correspond to Nga instructions.

interpret	compile
call the function	compile the instruction into the definition

: class:primitive
i lifelili
r Compiler
r comma:opcode
r class:word:interpret
i liju....
r choose

class:macro is the class handler for compiler macros. These are functions that always get called. They can be used to extend the language in interesting ways.

interpret	compile
call the function	call the function

: class:macro
i ju......

The class mechanism is not limited to these classes. You can write custom classes at any time. On entry the custom handler should take the XT passed on the stack and do something with it. Generally the handler should also check the Compiler state to determine what to do in either interpretation or compilation.

Dictionary

Rx has a single dictionary consisting of a linked list of headers. The current form of a header is shown in the chart below. Pay special attention to the accessors. Each of these words corresponds to a field in the dictionary header. When dealing with dictionary headers, it is recommended that you use the accessors to access the fields since it is expected that the exact structure of the header will change over time.

field	holds	accessor
link	link to the previous entry, 0 if last entry	d:link
xt	link to start of the function	d:xt
class	link to the class handler function	d:class
name	zero terminated string	d:name

The initial dictionary is constructed at the end of this file. It'll take a form like this:

: 0000
d 0
r _dup
r class:primitive
s dup

: 0001
r 0000
r _drop
r class:primitive
s drop

: 0002
r 0001
r _swap
r class:primitive
s swap

Each entry starts with a pointer to the prior entry (with a pointer to zero marking the first entry in the dictionary), a pointer to the start of the function, a pointer to the class handler, and a nul terminated string indicating the name exposed to the Rx interpreter.

Rx will store the pointer to the most recent entry in a variable called Dictionary. For simplicity, we just assign the last entry an arbitrary label of 9999. This is set at the start of the source. (See In the Beginning...)

Rx provides accessor functions for each field. Since the number of fields (or their ordering) may change over time, using these reduces the number of places where field offsets are hard coded.

: d:link
i re......

: d:xt
i liadre..
d 1

: d:class
i liadre..
d 2

: d:name
i liadre..
d 3

A traditional Forth has create to make a new dictionary entry pointing to the next free location in Heap. Rx has newentry which serves as a slightly more flexible base. You provide a string for the name, a pointer to the class handler, and a pointer to the start of the function. Rx does the rest.

: newentry
i lifepuli
r Heap
r Dictionary
i felica..
r comma
i lica....
r comma
i lica....
r comma
i lica....
r comma:string
i polistre
r Dictionary

Rx doesn't provide a traditional create as it's designed to avoid assuming a normal input stream and prefers to take its data from the stack.

Dictionary Search

: Which
d 0

: Needle
d 0

: found
i listlire
r Which
r _nop

: find
i lilistli
d 0
r Which
r Dictionary
i fe......

: find_next
i zr......
i dulica..
r d:name
i lifelica
r Needle
r s:eq
i licc....
r found
i feliju..
r find_next

: d:lookup
i listlica
r Needle
r find
i lifere..
r Which

Number Conversion

This code converts a zero terminated string into a number. The approach is very simple:

Store an internal multiplier value (-1 for negative, 1 for positive)
Clear an internal accumulator value
Loop:
- Fetch the accumulator value
- Multiply by 10
- For each character, convert to a numeric value and add to the accumulator
- Store the updated accumulator
When done, take the accumulator value and the modifier and multiply them to get the final result

At this time Rx only supports decimal numbers.

: next
i lica....
r fetch-next
i zr......
i lisuswpu
d 48
i swlimuad
d 10
i poliju..
r next

: check
i dufelieq
d 45
i zr......
i drswdrli
d -1
i swliadre
d 1

: s:to-number
i liswlica
d 1
r check
i liswlica
d 0
r next
i drmure..

Token Processing

An input token has a form like:

<prefix-char>string

Rx will check the first character to see if it matches a known prefix. If it does, it will pass the string (without the token) to the prefix handler. If not, it will attempt to find the token in the dictionary.

Prefixes are handled by functions with specific naming conventions. A prefix name should be:

prefix:<prefix-char>

Where is the character for the prefix. These should be compiler macros (using the class:macro class) and watch the compiler state to decide how to deal with the token. To find a prefix, Rx stores the prefix character into a string named prefixed. It then searches for this string in the dictionary. If found, it sets an internal variable (prefix:handler) to the dictionary entry for the handler function. If not found, prefix:handler is set to zero. The check, done by prefix?, also returns a flag.

: prefix:no
d 32
d 0
: prefix:handler
d 0
: prefixed
s prefix:_
: prefix:prepare
i feliliad
r prefixed
d 7
i stre....
: prefix:has-token?
i dulica..
r s:length
i lieqzr..
d 1
i drdrlire
r prefix:no
: prefix?
i lica....
r prefix:has-token?
i lica....
r prefix:prepare
i lilica..
r prefixed
r d:lookup
i dulistli
r prefix:handler
d 0
i nere....

Rx uses prefixes for important bits of functionality including parsing numbers (prefix with #), obtaining pointers (prefix with &), and starting new functions (using the : prefix).

I use jump for tail call eliminations here.

prefix	used for	example
#	numbers	#100
$	ASCII characters	$e
&	pointers	&swap
:	definitions	:foo
(	Comments	(n-)

: prefix:(
i drre....

: prefix:#
i lica....
r s:to-number
i liju....
r class:data

: prefix:$
i feliju..
r class:data

: prefix::
i lilifeli
r class:word
r Heap
r newentry
i ca......
i lifelife
r Heap
r Dictionary
i lica....
r d:xt
i stlilist
d -1
r Compiler
i re......

: prefix:&
i lica....
r d:lookup
i lica....
r d:xt
i feliju..
r class:data

Quotations

Quotations are anonymous, nestable blocks of code. Rx uses them for control structures and some aspects of data flow. A quotation takes a form like:

[ #1 #2 ]
#12 [ square #144 eq? [ #123 ] [ #456 ] choose ] call

Begin a quotation with [ and end it with ].

: t-[
i lifeliad
r Heap
d 2
i lifelili
r Compiler
d -1
r Compiler
i stlilica
r _packedjump
r comma:opcode
i lifelili
r Heap
d 0
r comma
i ca......
i lifere..
r Heap

: t-]
i lilica..
r _ret
r comma:opcode
i lifeswli
r Heap
r _lit
i lica....
r comma:opcode
i lica....
r comma
i swstlist
r Compiler
i lifezr..
r Compiler
i drdrre..

Lightweight Control Structures

Rx provides a couple of functions for simple flow control apart from using quotations. These are repeat, again, and 0;. An example of using them:

:s:length dup [ repeat fetch-next 0; drop again ] call swap - #1 - ;

These can only be used within a definition or quotation. If you need to use them interactively, wrap them in a quote and call it.

: repeat
i lifere..
r Heap

: again
i lilica..
r _lit
r comma:opcode
i lica....
r comma
i liliju..
r _jump
r comma:opcode

: t-0;
i lifezr..
r Compiler
i drliliju
r _zret
r comma:opcode

: t-push
i lifezr..
r Compiler
i drliliju
r _push
r comma:opcode

: t-pop
i lifezr..
r Compiler
i drliliju
r _pop
r comma:opcode

Interpreter

The interpreter is what processes input. What it does is:

Take a string
See if the first character has a prefix handler
- Yes: pass the rest of the string to the prefix handler for processing
- No: lookup in the dictionary
  - Found: pass xt of word to the class handler for processing
  - Not found: report error via err:notfound

First up, the handler for dealing with words that are not found. This is defined here as a jump to the handler for the Nga NOP instruction. It is intended that this be hooked into and changed.

As an example, in Rx code, assuming an I/O interface with some support for strings and output:

[ $? putc space 'word not found' puts ]
&err:notfound #1 + store

: err:notfound
i liju....
r _nop

call:dt takes a dictionary token and pushes the contents of the d:xt field to the stack. It then calls the class handler stored in d:class.

: call:dt
i dulica..
r d:xt
i feswlica
r d:class
i feju....

: input:source
d 0

: interpret:prefix
i lifezr..
r prefix:handler
i lifeliad
r input:source
d 1
i swliju..
r call:dt
: interpret:word
i lifeliju
r Which
r call:dt

: interpret:noprefix
i lifelica
r input:source
r d:lookup
i linelili
d 0
r interpret:word
r err:notfound
i liju....
r choose

: interpret
i dulistli
r input:source
r prefix?
i ca......
i lililiju
r interpret:prefix
r interpret:noprefix
r choose

The Initial Dictionary

The dictionary is a linked list. This sets up the initial dictionary. Maintenance of this bit is annoying, but it generally shouldn't be necessary to change this unless you are adding new functions to the Rx kernel.

: 0000
d 0
r _dup
r class:primitive
s dup
: 0001
r 0000
r _drop
r class:primitive
s drop
: 0002
r 0001
r _swap
r class:primitive
s swap
: 0003
r 0002
r _call
r class:primitive
s call
: 0004
r 0003
r _eq
r class:primitive
s eq?
: 0005
r 0004
r _neq
r class:primitive
s -eq?
: 0006
r 0005
r _lt
r class:primitive
s lt?
: 0007
r 0006
r _gt
r class:primitive
s gt?
: 0008
r 0007
r _fetch
r class:primitive
s fetch
: 0009
r 0008
r _store
r class:primitive
s store
: 0010
r 0009
r _add
r class:primitive
s +
: 0011
r 0010
r _sub
r class:primitive
s -
: 0012
r 0011
r _mul
r class:primitive
s *
: 0013
r 0012
r _divmod
r class:primitive
s /mod
: 0014
r 0013
r _and
r class:primitive
s and
: 0015
r 0014
r _or
r class:primitive
s or
: 0016
r 0015
r _xor
r class:primitive
s xor
: 0017
r 0016
r _shift
r class:primitive
s shift
: 0018
r 0017
r t-push
r class:macro
s push
: 0019
r 0018
r t-pop
r class:macro
s pop
: 0020
r 0019
r t-0;
r class:macro
s 0;
: 0021
r 0020
r fetch-next
r class:word
s fetch-next
: 0022
r 0021
r store-next
r class:word
s store-next
: 0023
r 0022
r s:to-number
r class:word
s s:to-number
: 0024
r 0023
r s:eq
r class:word
s s:eq?
: 0025
r 0024
r s:length
r class:word
s s:length
: 0026
r 0025
r choose
r class:word
s choose
: 0027
r 0026
r if
r class:word
s if
: 0028
r 0027
r -if
r class:word
s -if
: 0029
r 0028
r prefix:(
r class:macro
s prefix:(
: 0030
r 0029
r Compiler
r class:data
s Compiler
: 0031
r 0030
r Heap
r class:data
s Heap
: 0032
r 0031
r comma
r class:word
s ,
: 0033
r 0032
r comma:string
r class:word
s s,
: 0034
r 0033
r t-;
r class:macro
s ;
: 0035
r 0034
r t-[
r class:macro
s [
: 0036
r 0035
r t-]
r class:macro
s ]
: 0037
r 0036
r Dictionary
r class:data
s Dictionary
: 0038
r 0037
r d:link
r class:word
s d:link
: 0039
r 0038
r d:xt
r class:word
s d:xt
: 0040
r 0039
r d:class
r class:word
s d:class
: 0041
r 0040
r d:name
r class:word
s d:name
: 0042
r 0041
r class:word
r class:word
s class:word
: 0043
r 0042
r class:macro
r class:word
s class:macro
: 0044
r 0043
r class:data
r class:word
s class:data
: 0045
r 0044
r newentry
r class:word
s d:add-header
: 0046
r 0045
r prefix:#
r class:macro
s prefix:#
: 0047
r 0046
r prefix::
r class:macro
s prefix::
: 0048
r 0047
r prefix:&
r class:macro
s prefix:&
: 0049
r 0048
r prefix:$
r class:macro
s prefix:$
: 0050
r 0049
r repeat
r class:macro
s repeat
: 0051
r 0050
r again
r class:macro
s again
: 0052
r 0051
r interpret
r class:word
s interpret
: 0053
r 0052
r d:lookup
r class:word
s d:lookup
: 0054
r 0053
r class:primitive
r class:word
s class:primitive
: 0055
r 0054
r Version
r class:data
s Version
: 9999
r 0055
r err:notfound
r class:word
s err:notfound

Appendix: Words, Stack Effects, and Usage

Word	Stack	Notes
dup	n-nn	Duplicate the top item on the stack
drop	nx-n	Discard the top item on the stack
swap	nx-xn	Switch the top two items on the stack
call	p-	Call a function (via pointer)
eq?	nn-f	Compare two values for equality
-eq?	nn-f	Compare two values for inequality
lt?	nn-f	Compare two values for less than
gt?	nn-f	Compare two values for greater than
fetch	p-n	Fetch a value stored at the pointer
store	np-	Store a value into the address at pointer
+	nn-n	Add two numbers
-	nn-n	Subtract two numbers
*	nn-n	Multiply two numbers
/mod	nn-mq	Divide two numbers, return quotient and remainder
and	nn-n	Perform bitwise AND operation
or	nn-n	Perform bitwise OR operation
xor	nn-n	Perform bitwise XOR operation
shift	nn-n	Perform bitwise shift
fetch-next	a-an	Fetch a value and return next address
store-next	na-a	Store a value to address and return next address
push	n-	Move value from data stack to address stack
pop	-n	Move value from address stack to data stack
0;	n-n OR n-	Exit word (and `drop`) if TOS is zero
s:to-number	s-n	Convert a string to a number
s:eq?	ss-f	Compare two strings for equality
s:length	s-n	Return length of string
choose	fpp-?	Execute p1 if f is -1, or p2 if f is 0
if	fp-?	Execute p if flag f is true (-1)
-if	fp-?	Execute p if flag f is false (0)
Compiler	-p	Variable; holds compiler state
Heap	-p	Variable; points to next free memory address
,	n-	Compile a value into memory at `here`
s,	s-	Compile a string into memory at `here`
;	-	End compilation and compile a return instruction
[	-	Begin a quotation
]	-	End a quotation
Dictionary	-p	Variable; points to most recent header
d:link	p-p	Given a DT, return the address of the link field
d:xt	p-p	Given a DT, return the address of the xt field
d:class	p-p	Given a DT, return the address of the class field
d:name	p-p	Given a DT, return the address of the name field
class:word	p-	Class handler for standard functions
class:primitive	p-	Class handler for Nga primitives
class:macro	p-	Class handler for immediate functions
class:data	p-	Class handler for data
d:add-header	saa-	Add an item to the dictionary
prefix:#	s-	# prefix for numbers
prefix::	s-	: prefix for definitions
prefix:&	s-	& prefix for pointers
prefix:$	s-	$ prefix for ASCII characters
repeat	-a	Start an unconditional loop
again	a-	End an unconditional loop
interpret	s-?	Evaluate a token
d:lookup	s-p	Given a string, return the DT (or 0 if undefined)
err:notfound	-	Handler for token not found errors

Legalities

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

My thanks go out to Michal J Wallace, Luke Parrish, JGL, Marc Simpson, Oleksandr Kozachuk, Jay Skeer, Greg Copeland, Aleksej Saushev, Foucist, Erturk Kocalar, Kenneth Keating, Ashley Feniello, Peter Salvi, Christian Kellermann, Jorge Acereda, Remy Moueza, John M Harrison, and Todd Thomas. All of these great people helped in the development of Retro 10 & 11, without which Rx wouldn't have been possible.

27 KiB Raw Blame History