2018-12-28 05:58:29 +01:00
_
__ _ _ _| |_ ___ _ __ ___ _ _
/ _` | | | | __/ _ \| '_ \/ __| | | |
| (_| | |_| | || (_) | |_) \__ | |_| |
\__,_|\__,_|\__\___/| .__/|___/\__, |
|_| |___/
2017-10-16 18:09:39 +02:00
2018-12-28 05:58:29 +01:00
Autopsy is a set of debugging tools for RETRO.
2017-10-16 18:09:39 +02:00
2018-12-28 05:58:29 +01:00
# Background
2017-10-16 18:09:39 +02:00
2018-12-28 05:40:04 +01:00
RETRO runs on a virtual machine called Nga. The instruction set is MISC inspired, consisting of just 30 instructions:
2017-10-16 18:09:39 +02:00
2018-12-28 05:58:29 +01:00
0 nop 10 return 20 divmod
1 lit <v> 11 eq 21 and
2 dup 12 neq 22 or
3 drop 13 lt 23 xor
4 swap 14 gt 24 shift
5 push 15 fetch 25 zret
6 pop 16 store 26 end
7 jump 17 add 27 ienumerate
8 call 18 subtract 28 iquery
9 ccall 19 multiply 29 iinvoke
The first two characters of each instruction name are sufficient to identify the instruction.
2017-10-16 18:09:39 +02:00
2018-12-28 05:58:29 +01:00
Nga exposes memory as an array of 32-bit signed integers. Each memory location can store four instructions. The assembler expects the instructions to be named using their two character identifiers. E.g.,
2017-10-16 18:09:39 +02:00
2018-04-12 18:19:32 +02:00
'lica.... i
#100 d
2017-10-16 18:09:39 +02:00
2018-12-28 05:58:29 +01:00
# Disassembly
I use '..' for 'no(p)' and then construct a string with all of these. This will be used to resolve names. The ?? at the end will be used for unidentified instructions.
2018-04-12 18:19:32 +02:00
~~~
2018-12-28 05:22:19 +01:00
'..lidudrswpupojucaccreeqneltgtfestadsumudianorxoshzrenieiqii??
2018-04-12 18:19:32 +02:00
'INST s:const
~~~
2017-10-16 18:09:39 +02:00
2018-12-28 05:58:29 +01:00
Since instructions are packed, I need to unpack them before I can run or display the individual instructions. I implement `unpack` for this.
2017-10-16 18:09:39 +02:00
2018-04-12 18:19:32 +02:00
~~~
2017-10-16 18:09:39 +02:00
{{
:mask #255 and ;
:next #8 shift ;
---reveal---
:unpack (n-dcba)
dup mask swap next
dup mask swap next
dup mask swap next
2018-04-12 18:19:32 +02:00
'abcd 'dcba reorder ;
2017-10-16 18:09:39 +02:00
}}
2018-04-12 18:19:32 +02:00
~~~
Now it's possible to write words to display instruction bundles. The formats are kept simple. For a bundle with `lit / lit / add / lit`, this will display either the opcodes (`1,1,17,1`) or a string with the abbreviations (`liliadli`).
~~~
:name-for (n-cc)
2018-12-28 05:22:19 +01:00
#30 n:min #2 * &INST + fetch-next swap fetch swap ;
2018-04-12 18:19:32 +02:00
:display:bundle<raw> (n-)
2018-05-07 18:36:37 +02:00
unpack '%n,%n,%n,%n s:format s:put ;
2018-04-12 18:19:32 +02:00
:display:bundle<named> (n-)
2018-05-07 18:36:37 +02:00
unpack #4 [ name-for c:put c:put ] times ;
2018-04-12 18:19:32 +02:00
~~~
2018-12-28 05:58:29 +01:00
So now I'm ready to write an actual disassembler. I'll provide an output setup like this:
2018-04-12 18:19:32 +02:00
2018-12-28 05:58:29 +01:00
(address) 'instructionbundle i
(address) #value d (possibly_`reference`)
2018-04-12 18:19:32 +02:00
If the value corresponds to a word in the `Dictionary`, the disassembler will display a message indicating the possible name that corresponds to the value.
To begin, I'll add a variable to track the number of `li` instructions. (These require special handling as they push a value in the following cells to the stack).
~~~
'LitCount var
~~~
I then wrap `name-for` with a simple check that increments `LitCount` as needed.
~~~
:name-for<counting-li> (n-cc)
dup #1 eq? [ &LitCount v:inc ] if name-for ;
~~~
To actually display a bundle, I need to decide on what it is. So I have a `validate` word to look at each instruction and make sure all are actual instructions.
~~~
:valid? (n-f)
unpack
2018-12-28 05:22:19 +01:00
[ #0 #29 n:between? ] bi@ and
[ [ #0 #29 n:between? ] bi@ and ] dip and ;
2018-04-12 18:19:32 +02:00
~~~
With this and the `LitCount`, I can determine how to render a bundle.
I split out each type (instruction, reference/raw, and data) into a separate handler.
~~~
:render-inst (n-)
2018-05-07 18:36:37 +02:00
$' c:put unpack #4 [ name-for<counting-li> c:put c:put ] times sp $i c:put ;
2018-04-12 18:19:32 +02:00
:render-data (n-)
2018-05-07 18:36:37 +02:00
$# c:put n:to-string s:put sp $d c:put ;
2018-04-12 18:19:32 +02:00
:render-ref (n-)
dup d:lookup-xt n:-zero?
2018-12-28 05:40:04 +01:00
[ dup render-data
d:lookup-xt d:name '\t\t(possibly\_`%s`) s:format s:put ]
2018-04-12 18:19:32 +02:00
[ render-data ] choose ;
~~~
Then I use these and my `valid?` checker to implement a single word to
render the packed cell in a meaningful manner.
~~~
2017-10-16 18:09:39 +02:00
:render-packed (n-)
2018-04-12 18:19:32 +02:00
@LitCount n:zero?
[ dup valid?
[ render-inst ]
[ render-ref ] choose ]
[ render-ref &LitCount v:dec ] choose ;
~~~
And now to tie it all together:
~~~
'TryToIdentifyWords var
2017-10-16 18:09:39 +02:00
:disassemble (an-)
2018-12-28 05:22:19 +01:00
#0 !LitCount
2018-04-12 18:19:32 +02:00
[
@TryToIdentifyWords
[ dup d:lookup-xt n:-zero?
2018-05-07 18:36:37 +02:00
[ dup d:lookup-xt d:name nl s:put nl ] if ] if
2018-04-12 18:19:32 +02:00
fetch-next
2018-12-28 05:40:04 +01:00
over $( c:put n:put $) c:put sp (address)
2018-04-12 18:19:32 +02:00
render-packed nl (inst_or_data)
2017-10-16 18:09:39 +02:00
] times drop ;
2018-04-12 18:19:32 +02:00
~~~
2018-12-28 05:58:29 +01:00
# Execution Trace and Single Stepper
2018-04-12 18:19:32 +02:00
Ok, now on to the fun bit: execution trace and single stepping through a word.
This entails writing an implementation of Nga in RETRO. So to start, setup space for the data and address ("return") stacks, as well as variables for the stack pointers and instruction pointer.
~~~
2018-12-28 05:58:29 +01:00
'DataStack d:create #128 allot
'ReturnStack d:create #768 allot
2018-04-12 18:19:32 +02:00
'SP var
'RP var
'IP var
~~~
Next, helpers to push values from the real stacks to the simulated ones. The stack pointer will point to the next available cell, not the actual top element.
2017-10-25 14:06:01 +02:00
~~~
2019-02-15 19:51:35 +01:00
:>s (n-) &DataStack @SP + store &SP v:inc ;
:s> (-n) &SP v:dec &DataStack @SP + fetch ;
:>r (n-) &ReturnStack @RP + store &RP v:inc ;
:r> (-n) &RP v:dec &ReturnStack @RP + fetch ;
2018-04-12 18:19:32 +02:00
~~~
One more helper, `[IP]` will return the value in memory at the location `IP` points to.
~~~
:[IP] @IP fetch ;
~~~
Now for the instructions. Taking a cue from the C implementation, I have a separate word for each instruction and then a jump table of addresses that point to these.
~~~
2019-02-15 19:51:35 +01:00
:i:no ;
:i:li &IP v:inc [IP] >s ;
:i:du s> dup >s >s ;
:i:dr s> drop ;
:i:sw s> s> (swap >s >s ;
:i:pu s> >r ;
:i:po r> >s ;
:i:ju s> n:dec !IP ;
:i:ca @IP >r i:ju ;
:i:cc s> s> nl dump-stack nl [ >s i:ca ] [ drop ] choose ;
:i:re r> !IP ;
:i:eq s> s> eq? >s ;
:i:ne s> s> -eq? >s ;
:i:lt s> s> swap lt? >s ;
:i:gt s> s> swap gt? >s ;
:i:fe s> fetch >s ;
:i:st s> s> swap store ;
:i:ad s> s> + >s ;
:i:su s> s> swap - >s ;
:i:mu s> s> * >s ;
:i:di s> s> swap /mod swap >s >s ;
:i:an s> s> and >s ;
:i:or s> s> or >s ;
:i:xo s> s> xor >s ;
:i:sh s> s> swap shift >s ;
:i:zr s> dup n:zero? [ drop i:re ] [ >s ] choose ;
:i:en #0 !RP ;
:i:ie #1 >s ;
:i:iq #0 dup >s >s ;
:i:ii s> s> nip c:put ;
2018-04-12 18:19:32 +02:00
~~~
With the instructions defined, populate the jump table. The order is crucial as the opcode number will be the index into this table.
~~~
'Instructions d:create
2018-12-28 05:58:29 +01:00
&i:no , &i:li , &i:du , &i:dr , &i:sw , &i:pu ,
&i:po , &i:ju , &i:ca , &i:cc , &i:re , &i:eq ,
&i:ne , &i:lt , &i:gt , &i:fe , &i:st , &i:ad ,
&i:su , &i:mu , &i:di , &i:an , &i:or , &i:xo ,
&i:sh , &i:zr , &i:en , &i:ie , &i:iq , &i:ii ,
2018-04-12 18:19:32 +02:00
~~~
With the populated table of instructions, implementing a `process-single-opcode` is easy. This will check the instruction to make sure it's valid, then call the corresponding handler in the instruction table. If not valid, this will report an error.
~~~
:process-single-opcode (n-)
2018-12-28 05:28:39 +01:00
dup #0 #29 n:between?
2018-04-12 18:19:32 +02:00
[ &Instructions + fetch call ]
2018-05-07 18:36:37 +02:00
[ 'Invalid_Instruction:_%n_! s:format s:put nl ] choose ;
2018-04-12 18:19:32 +02:00
~~~
Next is to unpack an instruction bundle and process each instruction.
~~~
:process-packed-opcode (n-)
unpack
process-single-opcode
process-single-opcode
process-single-opcode
process-single-opcode ;
~~~
So the guts of the Nga-in-RETRO are done at this point. Now to implement a method of stepping through execution of a word.
This will display output indicating state. It'll provide:
- current memory location
- values in instruction bundle
- stack depths
- data stack before execution
- data stack after exection
E.g.,
IP:13966 SP:3 RP:1
[19,0,0,0] - mu......
Stack: 9 3 3 -> 9 9
So helpers for displaying things:
~~~
:display-status
2018-05-07 18:36:37 +02:00
@RP @SP @IP 'IP:%n_SP:%n_RP:%n\n s:format s:put
[IP] [ unpack ] sip '__%n_->_[%n,%n,%n,%n]_->_ s:format s:put
[IP] unpack #4 [ name-for<counting-li> c:put c:put ] times nl ;
2018-04-12 18:19:32 +02:00
:display-data-stack
2018-05-07 18:36:37 +02:00
#0 @SP [ &DataStack over + fetch n:put sp n:inc ] times drop ;
2018-04-12 18:19:32 +02:00
:display-return-stack
2018-05-07 18:36:37 +02:00
#0 @RP [ &ReturnStack over + fetch n:put sp n:inc ] times drop ;
2018-04-12 18:19:32 +02:00
~~~
And then using the display helpers and instruction processor, a single stepper. (This also updates a `Steps` counter)
~~~
'Steps var
:step (-)
2019-02-15 19:51:35 +01:00
@IP d:lookup-xt n:-zero? [ @IP d:lookup-xt d:name nl tab s:put nl ] if
2018-04-12 18:19:32 +02:00
display-status
2019-02-15 19:51:35 +01:00
@IP n:inc fetch sp sp n:put nl
@IP #2 + fetch sp sp n:put nl
2018-05-07 18:36:37 +02:00
'__Stack:_ s:put display-data-stack '_->_ s:put
2018-04-12 18:19:32 +02:00
[IP] process-packed-opcode &IP v:inc
display-data-stack nl nl
&Steps v:inc ;
2019-02-15 19:51:35 +01:00
:astep
[IP] process-packed-opcode &IP v:inc
&Steps v:inc ;
2018-04-12 18:19:32 +02:00
~~~
And then wrap it with `times` to run multiple steps.
~~~
:steps (n-)
&step times ;
~~~
Then on to the tracer. This will `step` through execution until the word returns. I use a similar approach to how I handle this in the interface layers for RETRO (word execution ends when the address stack depth reaches zero).
The `trace` will empty the step counter and display the number of steps used.
~~~
:trace (a-)
#0 !Steps
2019-02-15 19:51:35 +01:00
!IP #0 >r
[ step @RP n:zero? @IP n:negative? or ] until
nl @Steps '%n_steps_taken\n s:format s:put ;
2018-04-12 18:19:32 +02:00
~~~
2018-12-28 05:58:29 +01:00
# Tests
2018-04-12 18:19:32 +02:00
2019-02-15 19:51:35 +01:00
```
2018-04-12 18:19:32 +02:00
:test
as{ 'liliaddu i #22 d #33 d }as
#3 #4 gt? [ #1 ] if ;
#0 #100 disassemble
2018-05-07 18:36:37 +02:00
nl '-------------------------- s:put nl
2018-04-12 18:19:32 +02:00
&TryToIdentifyWords v:on
#0 #100 disassemble
2019-02-15 19:51:35 +01:00
```