## Saturday, June 18, 2016

### String operations and bioinformatics

Strings make it possible to generalize the concept of sets. In BigZ a set is a nested structure of sets and lists, such as

{ 123 ( 234 { 345 456 { 567 678 } } ) } cr zet.
{123,(234,{345,456,{567,678}})} ok

and the only lack of generality concerns the atomic elements, which must be non-negative single numbers. But virtually anything can be written as a string, which in turn can be interpreted as a list of character codes:

s" {Hello world!,How are you?}" >str stringset>zet cr zet.
{(72,101,108,108,111,32,119,111,114,108,100,33),(72,111,119,32,97,114,101,32,121,111,117,63)} ok

In this way big integers, Gaussian integers, etc. can also be elements of sets.
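As a rough illustration of this encoding, here is a small Python sketch (Python is used only for illustration; this is not BigZ code). The helper name `stringset_to_codes` is hypothetical, and the parser assumes a flat set with no nested braces and no commas inside the elements:

```python
def stringset_to_codes(text):
    """Turn '{a,b}' into a set of tuples of character codes."""
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("expected a string of the form {item,item,...}")
    body = inner[1:-1]
    if body == "":
        return set()
    # Each comma-separated item becomes one tuple of character codes.
    return {tuple(ord(ch) for ch in item) for item in body.split(",")}


codes = stringset_to_codes("{Hello world!,How are you?}")
for item in sorted(codes):
    print(item)
```

The two tuples printed are the same code lists that `stringset>zet` shows in the session above.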

A convenient way to handle strings in Forth is a string stack, which in this implementation consists of two stacks: one holding the arrays of ASCII characters and one holding the addresses of those arrays.
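To make the two-stack layout concrete, here is a minimal Python sketch (again only an illustration, not the BigZ source). The class `StringStack` and its method names are hypothetical, and the mapping to BigZ words given in the comments is an assumption based on the stack comments in this post:

```python
class StringStack:
    """Two-stack layout: one buffer of characters, one stack of start offsets."""

    def __init__(self):
        self.chars = bytearray()  # character data of all stacked strings
        self.starts = []          # start offset of each string in chars

    def push(self, text):
        # Assumed role of >str: append the characters, remember where they start.
        self.starts.append(len(self.chars))
        self.chars += text.encode("ascii")

    def pop(self):
        # Assumed role of str>: remove the top string and return it.
        start = self.starts.pop()
        text = self.chars[start:].decode("ascii")
        del self.chars[start:]
        return text

    def top(self):
        # Assumed role of str@: read the top string without removing it.
        return self.chars[self.starts[-1]:].decode("ascii")

    def dup(self):
        # Assumed role of sdup.
        self.push(self.top())

    def swap(self):
        # Assumed role of sswap.
        b = self.pop()
        a = self.pop()
        self.push(b)
        self.push(a)

    def concat(self):
        # Assumed role of s&: the two top strings are adjacent in the buffer,
        # so joining them only means forgetting the boundary between them.
        if len(self.starts) < 2:
            raise IndexError("concat needs two strings on the stack")
        self.starts.pop()


st = StringStack()
st.push("Hello ")
st.push("world")
st.concat()
print(st.pop())  # Hello world
```

One appeal of this layout is that concatenation moves no characters at all; only the offset stack changes.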

>str \ ad n -- string    Push a string on the stack
str> \ string -- ad n    Pop a string from the stack
str@ \ string -- string | -- ad n

sempty \ string -- string | -- flag
```
.str \ --  Print the string stack without changing it
str. \ str --  Print and drop the topmost string
```

The words sdup, sdrop, sover, snip, sswap, srot, stuck and spick perform the usual stack operations on the string stack, and there is also

```
soover \ str1 str2 str3 -- str1 str2 str3 str1
```

A shorter way to enter strings from the command line is

```
s Hello world"
```

However, in definitions one must use

```
s" Hello world" >str
```

Some words for string manipulation:

```
s& \ s1 s2 -- s1&s2   Concatenation
sleft \ s1 -- s2 | n --  Keep only the n leftmost characters
sright \ s1 -- s2 | n --  The same for the n rightmost characters
ssplit \ s -- s' s" | n --  Split the string after the nth character
sanalyze \ s1 s2 -- s1 s3 s1 s4 / s2 | -- flag
         \ Split s2 if s1 is a part of s2; if the flag is true then s2=s3&s1&s4
substring \ s1 s2 -- s1 s2 | -- flag
sreplace \ s1 s2 s3 -- s4    Replace s2 with s1 in s3
scomp \ s1 s2 -- | -- n    -1:s1>s2, +1:s1<s2, 0:s1=s2
snull \ -- emptystring
schr& \ s -- s' | ch --   Concatenate ch to the top string
slen= \ s1 s2 -- | -- flag   Test if same length
strail \ s -- s'  Remove trailing spaces
>capital \ ch -- ch'  Change common (lower-case) letter to capital
>common \ ch -- ch'  The opposite
capital \ ch -- flag  Test if capital letter
common \ ch -- flag  Test if common letter
slower \ s -- s'  Change to lower case in string
supper \ s -- s'  Change to upper case in string
str>ud \ s -- s' | -- ud flag   Unsigned double from string
str>d \ s -- s' | -- d flag     Double from string
snobl \ s -- s'      Remove all blanks
sjustabc \ s -- s'   Remove all characters but English letters
alphabet \ s -- s'   Gives the alphabet of the string
zet>stringset \ set -- string
stringset>zet \ string -- set
sunion \ str1 str2 -- str3
sintersection \ str1 str2 -- str3
sdiff \ str1 str2 -- str3
```
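The last three words treat a string of the form `{item,item,...}` as a set. As a hedged illustration of what union, intersection and difference mean for such strings, here is a Python sketch (not BigZ code; `parse_stringset` and `format_stringset` are hypothetical helpers, and the sketch assumes flat sets without nested braces). It uses the same colour sets as the BigZ session in this post, but prints the elements in alphabetical order, which is not the order BigZ produces:

```python
def parse_stringset(text):
    """Split '{a,b,c}' into a Python set of strings."""
    body = text.strip()[1:-1]
    return set(body.split(",")) if body else set()


def format_stringset(items):
    """Write a set of strings back as '{a,b,c}', sorted for stable output."""
    return "{" + ",".join(sorted(items)) + "}"


a = parse_stringset("{brown,red,orange,yellow,green}")
b = parse_stringset("{blue,violet,brown,black}")

print(format_stringset(a | b))  # {black,blue,brown,green,orange,red,violet,yellow}
print(format_stringset(a & b))  # {brown}
print(format_stringset(a - b))  # {green,orange,red,yellow}
```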
```
s {brown,red,orange,yellow,green}"  ok
s {blue,violet,brown,black}"  ok
sunion str. {black,brown,violet,blue,green,yellow,orange,red} ok
```

```
hamming \ s1 s2 -- s1 s2 | n   The Hamming distance
editdistance \ s1 s2 -- s1 s2 | n   The Levenshtein distance
```
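For readers unfamiliar with the two distances, here is a short Python sketch of their textbook definitions (an illustration only; it is not how BigZ implements `hamming` and `editdistance`, and the function names are hypothetical). The Hamming distance counts the positions at which two strings of equal length differ; the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other:

```python
def hamming(s1, s2):
    """Number of positions where two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s1, s2))


def edit_distance(s1, s2):
    """Levenshtein distance by row-by-row dynamic programming."""
    # previous[j] is the distance between the processed prefix of s1
    # and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        current = [i]
        for j, b in enumerate(s2, start=1):
            cost = 0 if a == b else 1
            current.append(min(previous[j] + 1,          # delete a
                               current[j - 1] + 1,       # insert b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(hamming("GATTACA", "GACTATA"))       # 2
print(edit_distance("kitten", "sitting"))  # 3
```

Whether the BigZ word `hamming` requires strings of equal length is not stated in this post; the sketch simply rejects unequal lengths.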
This code is now included in the BigZ code.