2021-09-10 19:55:07 +02:00
# UTF-8 Characters

UTF-8 allows for characters to be one to four bytes long. Since Retro
is 32-bit internally, all characters can fit into a single entry on
the stack. These words will be used to pack and unpack the character
values.

~~~
:uc:pack (????n-c) ;
:uc:unpack (c-????n) ;
~~~
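
As a rough illustration of the idea (this is not the definition of
uc:pack, which is still a stub above), the two bytes of λ (U+03BB,
encoded as 206 187) can be combined into a single cell with shift and
or, then split apart again. The byte order used here, with the first
byte in the higher bits, is an assumption; the final words may lay the
bytes out differently.

```
(assumption:_first_byte_goes_in_the_higher_bits)
#206 #-8 shift #187 or n:put nl
#52923 #8 shift n:put nl
#52923 #255 and n:put nl
```

With this layout the three lines should display 52923, 206, and 187.
A negative count passed to shift moves the bits to the left.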
2021-08-26 14:25:19 +02:00
# UTF-8 Strings

Strings in Retro have been C-style, null-terminated sequences of ASCII
characters. I'm seeking to change this, as I'd like to support Unicode
(UTF-8) and to merge much of the string and array handling code.

This will be an ongoing process.

Temporary sigil. A token prefixed with " is converted from a string to
an array.

~~~
:sigil:" (-a) a:from-string class:data ; immediate
~~~

Return the length of a string, either in UTF-8 characters (us:length)
or in bytes (us:length/bytes). The character count skips continuation
bytes, which have the bit pattern 10xxxxxx.

~~~
:us:length (a-n) #0 swap [ #192 and #128 -eq? + ] a:for-each n:abs ;
:us:length/bytes (a-n) a:length ;
~~~
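
To see how the count works, here is a quick sketch that assumes the
definitions above are loaded. The array holds the bytes of λ (206 187)
followed by A (65). The array literal stands in for a string and is
purely illustrative.

```
(illustrative_byte_array_standing_in_for_a_string)
{ #206 #187 #65 } us:length n:put nl
{ #206 #187 #65 } us:length/bytes n:put nl
```

Only 187 matches the continuation pattern (187 and 192 gives 128), so
the first line should display 2 and the second 3.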
Fetch a character from a string.

~~~
:us:fetch (an-c) ;
~~~

Store a character into a string.

~~~
:us:store (can-) ;
~~~

Tests.

```
"((V⍳V)=⍳⍴V)/V←,V us:length n:put nl
"((V⍳V)=⍳⍴V)/V←,V us:length/bytes n:put nl
```