retroforth/future/utf8.retro

# UTF-8 Characters
UTF-8 encodes each character in one to four bytes. Since Retro is
32-bit internally, any character fits into a single entry on the
stack. These words will be used to pack and unpack the character
values.
~~~
:uc:pack (????n-c) ;
:uc:unpack (c-????n) ;
~~~
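One possible reading of the (????n-c) stack comment is up to four bytes
followed by a count, folded into one cell. The sketch below assumes that
reading and places the first byte in the lowest eight bits. The name
uc:pack/sketch and the byte order are illustrative assumptions, not the
author's design.
~~~
:uc:pack/sketch (????n-c) n:dec [ #256 * + ] times ;
~~~
As a usage example, this packs the three bytes of ← (226 134 144):
```
#226 #134 #144 #3 uc:pack/sketch n:put nl
```
With the first byte lowest, an unpack word could recover the length from
the low eight bits alone, since the lead byte determines how many bytes
follow. A four-byte character fills all 32 bits, including the sign bit,
so the packed value may be negative.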
# UTF-8 Strings
Strings in Retro have been C-style null-terminated sequences of ASCII
characters. I'm seeking to change this, as I'd like to support Unicode
(UTF-8) and to merge much of the string and array handling code.
This will be an ongoing process.
A temporary sigil, ", which converts the token that follows into an
array.
~~~
:sigil:" (-a) a:from-string class:data ; immediate
~~~
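Return the length of a string, either in UTF-8 characters (us:length)
or in bytes (us:length/bytes). The character count skips continuation
bytes, those of the form 10xxxxxx.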
~~~
:us:length (a-n) #0 swap [ #192 and #128 -eq? + ] a:for-each n:abs ;
:us:length/bytes (a-n) a:length ;
~~~
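The count in us:length relies on UTF-8 continuation bytes having the bit
pattern 10xxxxxx: masking a byte with 192 (binary 11000000) yields 128
only for those bytes. -eq? returns -1 for every byte that is not a
continuation byte, so the running sum is the negated character count,
which n:abs turns positive. A minimal sketch of that test as a standalone
word follows. The name uc:continuation? is hypothetical and does not
appear in the original source.
~~~
:uc:continuation? (c-f) #192 and #128 eq? ;
~~~
For example, 144 (the last byte of ←) is a continuation byte and 40 (an
ASCII parenthesis) is not, so the first line displays -1 (TRUE) and the
second displays 0 (FALSE):
```
#144 uc:continuation? n:put nl
#40 uc:continuation? n:put nl
```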
Fetch a character from a string.
~~~
:us:fetch (an-c) ;
~~~
Store a character into a string.
~~~
:us:store (can-) ;
~~~
Tests.
```
"((VV)=V)/V←,V us:length n:put nl
"((VV)=V)/V←,V us:length/bytes n:put nl
```