add s:tokenize-on-string to stdlib

FossilOrigin-Name: fbbc60112e0011614b639c24a061eca41d3924b93f9a8e02c0685243e888c3a9
2017-11-13 13:05:46 +00:00 · 2017-11-13 13:05:46 +00:00 · 9e7c27fcc7
commit 9e7c27fcc7
parent 990066ac08
5 changed files with 34 additions and 2 deletions
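For reference, the behaviour this commit documents for `s:tokenize-on-string` can be modelled outside Retro. The sketch below is a Python model, not part of the commit. `tokenize_on_string` is a hypothetical name, and the sketch assumes three things: each token ends where the separator string begins, the next token starts after the *whole* separator, and whatever follows the last match becomes the final token. Retro returns a set (a count-prefixed array of string pointers); a Python list stands in for it here.

```python
from typing import List


def tokenize_on_string(s: str, needle: str) -> List[str]:
    """Hypothetical model: split s on every occurrence of needle."""
    if not needle:
        # An empty separator would match everywhere; reject it explicitly.
        raise ValueError("separator must not be empty")
    tokens: List[str] = []
    # Like the Retro loop: while the remaining text still contains the
    # separator, keep the part before the first match and continue
    # after the full separator.
    while needle in s:
        before, _, s = s.partition(needle)
        tokens.append(before)
    tokens.append(s)  # remainder after the last separator
    return tokens
```

Under these assumptions the model agrees with Python's own `str.split(needle)`, including empty tokens when separators are adjacent or sit at either end of the input.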
--- a/doc/Glossary.txt
+++ b/doc/Glossary.txt
@@ -3433,6 +3433,18 @@ Class Handler: class:word | Namespace: {n/a} | Interface Layer: {n/a}

 ----------------------------------------------------------------

+s:tokenize-on-string
+
+  Data:  ss-a
+  Addr:  -
+  Float: -
+
+Takes a string (s1) and a substring (s2) to use as a separator. It splits s1 on each occurrence of s2 and returns a set containing pointers to each of the resulting substrings.
+
+Class Handler: class:word | Namespace: {n/a} | Interface Layer: {n/a}
+
+----------------------------------------------------------------
+
 s:trim

  Data:  s-s
--- a/interfaces/image.c
+++ b/interfaces/image.c
--- a/literate/RetroForth.md
+++ b/literate/RetroForth.md
@@ -925,6 +925,25 @@ pointers to each of them.
 }}
 ~~~

+`s:tokenize-on-string` is like `s:tokenize`, but uses a string, rather than a single character, as the separator.
+
+~~~
+{{
+  'Tokens var
+  'Needle var
+  :-match? (s-sf) dup @Needle s:contains-string? ;
+  :save-token (s-s) @Needle s:split-on-string s:keep buffer:add @Needle s:length + ;
+  :tokens-to-set (-a) here @Tokens buffer:size dup , [ fetch-next ,  ] times drop ;
+---reveal---
+  :s:tokenize-on-string (ss-a)
+    [ s:keep !Needle here #8192 + !Tokens
+      @Tokens buffer:set
+      [ repeat -match? 0; drop save-token again ] call s:keep buffer:add
+      tokens-to-set ] buffer:preserve ;
+}}
+~~~
+
+
 Ok, This is a bit of a hack, but very useful at times.

 Assume you have a bunch of values:
--- a/BIN
+++ b/BIN
--- a/words.tsv
+++ b/words.tsv
@@ -277,6 +277,7 @@ s:to-lower	s-s	-	-	Convert uppercase ASCII characters in a string to lowercase.
 s:to-number	s-n	-	-	Convert a string to a number.			class:word	{n/a}	{n/a}	s	all	
 s:to-upper	s-s	-	-	Convert lowercase ASCII characters in a string to uppercase.			class:word	{n/a}	{n/a}	s	all	
 s:tokenize	sc-a	-	-	Takes a string and a character to use as a separator. It splits the string into a set of substrings and returns a set containing pointers to each of them.			class:word	{n/a}	{n/a}	{n/a}	{n/a}	
+s:tokenize-on-string	ss-a	-	-	Takes a string (s1) and a substring (s2) to use as a separator. It splits s1 on each occurrence of s2 and returns a set containing pointers to each of the resulting substrings.			class:word	{n/a}	{n/a}	{n/a}	{n/a}	
 s:trim	s-s	-	-	Trim leading and trailing whitespace from a string.			class:word	{n/a}	{n/a}	s	all	
 s:trim-left	s-s	-	-	Trim leading whitespace from a string.			class:word	{n/a}	{n/a}	s	all	
 s:trim-right	s-s	-	-	Trim trailing whitespace from a string.			class:word	{n/a}	{n/a}	s	all