.. include:: ./global.rst

#######
Strings
#######

Strings are the usual ``"``-delimited sequences of "characters", in
this case, Unicode code points.

The usual escape sequences are available: ``\n``, ``\t``, etc., as
well as ``\u...`` (up to four hex digits) and ``\U...`` (up to eight
hex digits).  There is an additional ``\x..`` (up to two hex digits),
which is only useful for constructing non-UTF-8 pathnames.

Multi-line strings are perfectly reasonable.

.. code-block:: idio
   :caption: :file:`strings.idio`

   s1 := "hello\nworld"
   s2 := "hello
   world"

   printf "Does <<%s>> equal? <<%s>>? %s\n" s1 s2 (equal? s1 s2)

   ;; enter one directly (source files are UTF-8)
   s1 = "ħello"

   ;; or using an escape sequence
   s2 = "\u0127ello"

   printf "Does <<%s>> equal? <<%s>>? %s\n" s1 s2 (equal? s1 s2)

   ;; Note that you only need to pass as many hex digits as necessary
   ;; to distinguish the code point, so long as the following
   ;; characters are not possible hex digits.
   ;;
   ;; We can't say \u127ello as the "up to four digits" will consume
   ;; 127e which is ቾ (U+127E ETHIOPIC SYLLABLE CO).
   ;;
   ;; In the next example U+0050 LATIN CAPITAL LETTER P can be reduced
   ;; to \u50 because the next code point is l (U+006C LATIN SMALL
   ;; LETTER L) which is not a hex digit.
   s1 = "\u50lay time!"

   ;; unicode/describe works for strings too, which is helpful when
   ;; the visual representation is confusing.  For example, this is
   ;; not é (U+00E9 LATIN SMALL LETTER E WITH ACUTE)
   unicode/describe "é"

.. code-block:: console

   $ idio strings
   Does <<hello
   world>> equal? <<hello
   world>>? #t
   Does <<ħello>> equal? <<ħello>>? #t
   0065;;Ll;;;;;;;;;;0045;;0045 # Letter Lowercase Alphabetic ASCII_Hex_Digit Uppercase=0045 Titlecase=0045
   0301;;Mn;;;;;;;;;;;; # Mark Extend

String Indexing
===============

You can index strings with integers, and each access returns a Unicode
code point.

You can extract a substring starting at an index position
:samp:`{p0}` up to, but excluding, a second index position (which
defaults to the end of the string).

.. code-block:: idio
   :caption: :file:`string-access.idio`

   s1 := "ħello"

   ;; s1.0 is slightly slower
   printf "first code point is %s (or %s)\n" (string-ref s1 0) s1.0
   printf "last code point is %s\n" s1.-1

   slen := string-length s1
   ss1 := substring s1 1 (slen - 1)
   printf "s1 from 1 up to %d is %s\n" (slen - 1) ss1
   printf "ss1 is %d code points\n" (string-length ss1)

   ;; you can loop over the code points in strings
   for c in (substring s1 0 3) {
     write c        ; output a reader-friendly format
     (newline)
   }

   printf "the first l of %s is at index %d\n" s1 (string-index s1 #\l)
   printf "the last l of %s is at index %d\n" s1 (string-rindex s1 #\l)

.. code-block:: console

   $ idio string-access
   first code point is ħ (or ħ)
   last code point is o
   s1 from 1 up to 4 is ell
   ss1 is 3 code points
   #U+0127
   #\e
   #\l
   the first l of ħello is at index 2
   the last l of ħello is at index 3

Split/Join/Trim
===============

When splitting strings, :lname:`Idio` defaults to shell-like
behaviour: multiple adjacent delimiters are consumed together, in the
spirit of splitting a line of text into fields or words.

You can, of course, be more exacting than that.

.. code-block:: idio
   :caption: :file:`string-parts.idio`

   ;; two SPACEs at start and end and a TAB in the middle
   s1 := "  hello	world  "

   words := split-string s1 " \t"
   printf "words are %s\n" words

   ;; fields is similar but uses IFS as the delimiter (which defaults
   ;; to the usual " \t\n") and returns an array whose first element
   ;; is the original string
   printf "fields are %s\n" (fields s1)

   printf "joined up words are %s\n" (join-string "-+-" words)

   printf "right-stripped string is '%s'\n" (strip-string s1 " ")
   printf "double-stripped string is '%s'\n" (strip-string s1 " " 'both)

.. code-block:: console

   $ idio string-parts
   words are ("hello" "world")
   fields are #[ "  hello	world  " "hello" "world" ]
   joined up words are hello-+-world
   right-stripped string is '  hello	world'
   double-stripped string is 'hello	world'

********************
Interpolated Strings
********************

:lname:`Idio` strings are not interpolated (rather like the shell's
single-quoted strings) but we frequently need interpolated strings.
Here we can go one better and allow the expansion not just of a
variable but of an arbitrary expression.

Interpolated strings are written as ``#S{...${expr}...}`` where
everything between the outermost matching ``{`` and ``}`` is scanned
for instances of the *interpolation sigil*, ``$``.  Following each
sigil, a matching ``{`` and ``}`` pair is read in and the expression
therein is evaluated.  The result is converted to a string (if
required) and replaces the ``${...}`` expression.  The literal text
around each interpolation is added to the result unchanged.

.. code-block:: idio
   :caption: :file:`interpolated-string.idio`

   printf #S{Your SHELL is ${SHELL} (${string-length SHELL} code points)\n}

.. code-block:: console

   $ idio interpolated-string
   Your SHELL is /bin/bash (9 code points)

Interpolated strings are used very heavily in code generation.

.. include:: ./commit.rst