JSubstitute replaces escaped characters like \a and variables of the
form $name in a JString. "name" can be either a literal, in which case
we store the value and perform the replacement, or it can be a regular
expression, in which case we call the virtual function GetValue(). An
example of the latter case is [+-]?[0-9]+, which is used in regular
expression replace patterns to denote submatches.
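The two kinds of names can be sketched as follows. This is a minimal illustration, not JX code: `VariableResolver`, `DefineLiteral()`, `Resolve()` and `SubmatchResolver` are hypothetical names, and only the idea of a virtual `GetValue()` comes from the text. Literal names are looked up in a stored table. Any other name that fully matches the pattern is handed to `GetValue()`, which a subclass overrides. The submatch subclass also assumes that index 0 is the whole match and that negative numbers count back from the last submatch; the text does not say how JX numbers them.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch, not the JX API.
class VariableResolver {
public:
    virtual ~VariableResolver() = default;

    // A literal variable: its value is stored up front.
    void DefineLiteral(const std::string& name, const std::string& value) {
        literals_[name] = value;
    }

    // Stored literals win; otherwise a name that fully matches the pattern
    // is passed to GetValue(). Anything else is undefined (nullopt).
    std::optional<std::string> Resolve(const std::string& name) const {
        if (auto it = literals_.find(name); it != literals_.end()) {
            return it->second;
        }
        if (std::regex_match(name, pattern_)) {
            return GetValue(name);
        }
        return std::nullopt;
    }

protected:
    explicit VariableResolver(const std::string& pattern) : pattern_(pattern) {}

    // Computes the value of a pattern-matched name on demand.
    virtual std::optional<std::string> GetValue(const std::string& name) const = 0;

private:
    std::unordered_map<std::string, std::string> literals_;
    std::regex pattern_;
};

// Submatch references such as $1, $+2 or $-1 (assumed numbering: 0 is the
// whole match, negative indices count back from the end).
class SubmatchResolver : public VariableResolver {
public:
    explicit SubmatchResolver(std::vector<std::string> submatches)
        : VariableResolver("[+-]?[0-9]+"), submatches_(std::move(submatches)) {}

protected:
    std::optional<std::string> GetValue(const std::string& name) const override {
        long n = 0;
        try {
            n = std::stol(name);              // the pattern guarantees digits
        } catch (const std::out_of_range&) {
            return std::nullopt;              // index too large to be valid
        }
        const long count = static_cast<long>(submatches_.size());
        const long index = n < 0 ? count + n : n;
        if (index < 0 || index >= count) {
            return std::nullopt;
        }
        return submatches_[static_cast<std::size_t>(index)];
    }

private:
    std::vector<std::string> submatches_;
};
```

With submatches `{"abc", "a", "c"}`, the name `1` resolves to `"a"` through `GetValue()`, while a name defined with `DefineLiteral()` never reaches the pattern check.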
By default, C escapes are not expanded, since this is most convenient for
patterns specified in source code, where the compiler has already
expanded them. For user-specified patterns in interactive programs, it
may be better to add these escapes so that non-printing characters can
be entered conveniently.
void JSubstitute::SetCEscapes()
Adds entries to the table corresponding to the non-numeric escapes specific to C. (To get exactly C's escape set, you also need to call SetNonprintingEscapes and SetWhitespaceEscapes, then remove the '\e' escape.) The escapes and their values are:
\\ \
\' '
\" "
\? ?
The numeric values are naturally those chosen by the compiler for those escape sequences; this means they vary not only by character set but also by system.
Note that ANSI C does not define what happens when a backslash precedes a character other than one of the character escapes or an octal or hex code, so you have to choose the behavior you want with SetIgnoreUnrecognized().
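A table-driven escape expander makes the two choices for unrecognized escapes concrete. This is a sketch under assumptions: `EscapeTable`, `AddCEscapes()`, `ExpandEscapes()` and the `keepUnknown` flag are hypothetical names, and the flag only illustrates the two behaviors (keep the backslash or drop it); it is not meant to reproduce the exact semantics of IgnoreUnrecognized.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical escape table: the character after '\' maps to its replacement.
using EscapeTable = std::map<char, std::string>;

// The four C-specific, non-numeric escapes listed above.
void AddCEscapes(EscapeTable& table) {
    table['\\'] = "\\";
    table['\''] = "'";
    table['"'] = "\"";
    table['?'] = "?";
}

// Replaces escapes found in the table. For an unrecognized escape, the
// hypothetical keepUnknown flag either keeps the backslash ("\q" stays "\q")
// or drops it ("\q" becomes "q"). A lone trailing '\' is removed.
std::string ExpandEscapes(const std::string& s, const EscapeTable& table,
                          bool keepUnknown) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '\\') {
            out += s[i];
            continue;
        }
        if (i + 1 == s.size()) {
            break;                       // trailing backslash
        }
        const char c = s[++i];           // the escaped character
        if (auto it = table.find(c); it != table.end()) {
            out += it->second;
        } else {
            if (keepUnknown) {
                out += '\\';
            }
            out += c;
        }
    }
    return out;
}
```

Because the escaped character is consumed together with its backslash, `\\` becomes a single backslash and is never reinterpreted as the start of another escape.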
void JSubstitute::SetNonprintingEscapes()
Adds entries to the table corresponding to standard escapes for certain non-printing characters. The escapes and their values are:
\a bell
\b backspace
\e escape
The numeric values for \a and \b are those chosen by the compiler for those escape sequences; \e is not a C escape (though it appears in Perl) and has the value 1B hex.
Note that ANSI C does not define what happens when a backslash precedes a character other than these or an octal or hex code, so the choice is yours. If you want other backslashed characters to represent themselves (so that the backslash is effectively removed), call SetIgnoreUnrecognized() first. If you want no changes to be made other than those listed here, call SetIgnoreUnrecognized(false).
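A short sketch of the non-printing entries, assuming the same kind of hypothetical char-to-string table (`EscapeTable`, `AddNonprintingEscapes()` and `ExpandKnownOnly()` are illustrative names, not JX functions). It shows that \a and \b reuse the compiler's values while \e must be spelled out as 0x1B, and it models the "no other changes" configuration by copying unrecognized escapes through untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

using EscapeTable = std::map<char, std::string>;

// \a and \b reuse the compiler's own C escapes; C has no \e, so ESC is
// written as its hex value 0x1B.
void AddNonprintingEscapes(EscapeTable& table) {
    table['a'] = std::string(1, '\a');
    table['b'] = std::string(1, '\b');
    table['e'] = std::string(1, '\x1B');
}

// Replaces only the escapes in the table; unrecognized escapes such as "\q"
// are copied unchanged, backslash included. A lone trailing '\' is dropped.
std::string ExpandKnownOnly(const std::string& s, const EscapeTable& table) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\') {
            if (i + 1 == s.size()) {
                break;
            }
            const char c = s[++i];       // consume the escaped character
            if (auto it = table.find(c); it != table.end()) {
                out += it->second;
            } else {
                out += '\\';
                out += c;
            }
            continue;
        }
        out += s[i];
    }
    return out;
}
```

For example, the user-typed text `\e[0m` becomes the ESC byte followed by `[0m`, the usual terminal reset sequence.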
void JSubstitute::SetRegexExtensions()
Adds entries to the table corresponding to escapes useful as shorthands in defining regular expressions with JRegex. The escapes and their values are:
\d a digit, [0-9]
\D a non-digit
\w a word character, [a-zA-Z0-9_]
\W a non-word character
\s a whitespace character, [ \f\n\r\t\v]
\S a non-whitespace character
\< an anchor just before a word (between \W and \w)
\> an anchor just after a word (between \w and \W)
These escapes behave as atoms so they can be quantified normally and will not affect parenthesis numbering (this last requirement is why certain popular shorthands will not be added until Spencer's regexes acquire non-capturing parentheses so they can be defined atomically).
Note: these are normally most useful when the behavior for unrecognized escapes is to leave them alone. This is true whenever the string will be passed to another object that does further backslash escape processing, such as JRegex (where "[" begins a character class while "\[" inserts a literal "[").
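The following sketch shows why the shorthands behave as atoms and why other escapes should be left alone. `ExpandRegexShorthands()` is a hypothetical helper, and the bracket expressions it emits are assumptions chosen to match the descriptions in the table, not necessarily the exact strings JX uses. \< and \> are omitted because their expansion depends on the regex engine. Each shorthand becomes a single bracket expression, so a following quantifier applies to it as a whole and no parentheses are introduced. Every other escape, such as `\[`, is copied through for the regex engine to interpret.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch: expand regex shorthands into bracket expressions.
std::string ExpandRegexShorthands(const std::string& pattern) {
    static const std::map<char, std::string> shorthand = {
        {'d', "[0-9]"},        {'D', "[^0-9]"},
        {'w', "[a-zA-Z0-9_]"}, {'W', "[^a-zA-Z0-9_]"},
        {'s', "[ \f\n\r\t\v]"}, {'S', "[^ \f\n\r\t\v]"},
    };
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '\\' && i + 1 < pattern.size()) {
            const char c = pattern[i + 1];
            if (auto it = shorthand.find(c); it != shorthand.end()) {
                out += it->second;   // one bracket expression: a single atom
            } else {
                out += '\\';         // leave e.g. "\[" or "\\" for the engine
                out += c;
            }
            ++i;                     // the escaped character is consumed
            continue;
        }
        out += pattern[i];
    }
    return out;
}
```

Consuming the character after an unrecognized escape matters: in `\\d` the backslash is escaped, so the `d` must stay a literal letter rather than become a digit class.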
void JSubstitute::SetWhitespaceEscapes()
Adds entries to the table corresponding to the standard (in C, Perl, and other unixy things anyway) codes for whitespace. The escapes and their values are:
\f form feed
\n newline
\r carriage return
\t horizontal tab
\v vertical tab
The numeric values are naturally those chosen by the compiler for those escape sequences; this means they vary not only by character set but also by system.
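The note under SetCEscapes says that C's exact escape set is the union of the C, non-printing and whitespace groups minus \e. The sketch below checks that arithmetic with a hypothetical char-to-string table. The function names loosely echo the JX setters but are illustrative stand-ins, not the library's implementation. Each value is taken from the compiler's own escape sequence, which is why the numeric values follow the system's character set.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical escape table: character after '\' -> replacement text.
using EscapeTable = std::map<char, std::string>;

void AddCEscapes(EscapeTable& t) {
    t['\\'] = "\\"; t['\''] = "'"; t['"'] = "\""; t['?'] = "?";
}

void AddNonprintingEscapes(EscapeTable& t) {
    t['a'] = "\a"; t['b'] = "\b"; t['e'] = "\x1B";
}

// Values come from the compiler's own C escapes.
void AddWhitespaceEscapes(EscapeTable& t) {
    t['f'] = "\f"; t['n'] = "\n"; t['r'] = "\r"; t['t'] = "\t"; t['v'] = "\v";
}

// C's simple escape set: all three groups, minus the Perl-only \e.
EscapeTable MakeCEscapeTable() {
    EscapeTable t;
    AddCEscapes(t);            // 4 entries
    AddNonprintingEscapes(t);  // 3 entries
    AddWhitespaceEscapes(t);   // 5 entries
    t.erase('e');
    assert(t.size() == 11);    // C's 11 simple escape sequences
    return t;
}
```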
void JSubstitute::Substitute(JString* s) const
Scans the given JString for backslashes and dollar signs.
If the backslash is followed by a character that has a value, that value is substituted for the backslash plus character. Otherwise, the backslash is removed if IgnoreUnrecognized is not set.
If ControlEscapes is set and the backslash is followed by 'c', then the next character is converted to a control character if it is between 'A' and '_'. Otherwise, the '\c' is removed if IgnoreUnrecognized is not set. '@' is excluded because it would produce the NUL character (^@, value 0), which is the C string terminator.
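The control-character rule reduces to subtracting '@' (0x40), as the sketch below shows. `ControlChar()` is a hypothetical helper, and the arithmetic assumes an ASCII-compatible character set, where 'A' through '_' are the contiguous codes 0x41 to 0x5F. The result is 0x01 to 0x1F, so `\c[` yields ESC (0x1B). '@' is rejected because it would map to 0x00.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper (assumes ASCII): the control character for "\cX".
// Only 'A'..'_' are accepted; '@' would yield NUL and is rejected.
std::optional<char> ControlChar(char c) {
    if (c < 'A' || c > '_') {
        return std::nullopt;
    }
    return static_cast<char>(c - '@');   // 'A' -> 0x01, '_' -> 0x1F
}
```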
If PureEscapeEngine is not set and a $ is found, the value of the longest matching variable name is used to replace the $ and the variable name. If nothing matches, the $ is removed.
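The longest-match rule for '$' can be sketched with a plain map of names to values. `ExpandVariables()` is a hypothetical helper and handles only literal names, not pattern-matched ones. At each '$' it tries every defined name against the following text and keeps the longest one that matches, so with both `h` and `home` defined, `$home` uses `home`. If nothing matches, only the '$' is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch of the '$' rule for literal variable names.
std::string ExpandVariables(const std::string& s,
                            const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '$') {
            out += s[i++];
            continue;
        }
        ++i;  // skip '$'
        const std::string* best = nullptr;
        std::size_t bestLen = 0;
        for (const auto& [name, value] : vars) {
            // compare() checks whether 'name' appears in full at position i
            if (name.size() > bestLen && s.compare(i, name.size(), name) == 0) {
                best = &value;
                bestLen = name.size();
            }
        }
        if (best != nullptr) {   // otherwise the '$' alone is removed
            out += *best;
            i += bestLen;
        }
    }
    return out;
}
```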
If a special character ('\', '\c' if ControlEscapes, '$' if not PureEscapeEngine) is found at the end of the string, it is removed.
To avoid infinite loops, substituted values are not re-scanned, so backslashes and dollars in value strings are left untouched.
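Putting the rules together, a one-pass substituter might look like the sketch below. `MiniSubstitute` and its fields are hypothetical stand-ins for the table, ControlEscapes, PureEscapeEngine and the unrecognized-escape setting. The `keepUnknownEscapes` polarity is chosen for illustration only. Replacements are appended to a separate output string and the scan index always moves past the consumed input, so a value containing `$x` or a backslash is never rescanned. That is the property that prevents infinite loops.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical one-pass substituter following the rules described above;
// the struct and its fields are illustrative and are not JX names.
struct MiniSubstitute {
    std::map<char, std::string> escapes;           // escaped char -> text
    std::map<std::string, std::string> variables;  // variable name -> text
    bool keepUnknownEscapes = false;  // true: "\q" stays "\q"; false: "q"
    bool controlEscapes = false;      // enables "\cX"
    bool pureEscapeEngine = false;    // true: '$' is an ordinary character

    std::string Apply(const std::string& s) const {
        std::string out;  // separate output, so values are never rescanned
        std::size_t i = 0;
        while (i < s.size()) {
            const char c = s[i];
            if (c == '\\') {
                if (i + 1 == s.size()) {
                    break;  // trailing '\' is removed
                }
                const char e = s[i + 1];
                if (controlEscapes && e == 'c') {
                    if (i + 2 == s.size()) {
                        break;  // trailing "\c" is removed
                    }
                    const char x = s[i + 2];
                    if (x >= 'A' && x <= '_') {
                        out += static_cast<char>(x - '@');  // ASCII control
                        i += 3;
                    } else {
                        if (keepUnknownEscapes) {
                            out += "\\c";
                        }
                        i += 2;  // x is processed normally on the next pass
                    }
                } else if (auto it = escapes.find(e); it != escapes.end()) {
                    out += it->second;
                    i += 2;
                } else {
                    if (keepUnknownEscapes) {
                        out += '\\';
                    }
                    out += e;
                    i += 2;
                }
            } else if (c == '$' && !pureEscapeEngine) {
                ++i;  // skip '$'; an unmatched '$' is simply dropped
                const std::string* best = nullptr;
                std::size_t bestLen = 0;
                for (const auto& [name, value] : variables) {
                    if (name.size() > bestLen &&
                        s.compare(i, name.size(), name) == 0) {
                        best = &value;
                        bestLen = name.size();
                    }
                }
                if (best != nullptr) {
                    out += *best;
                    i += bestLen;
                }
            } else {
                out += c;
                ++i;
            }
        }
        return out;
    }
};
```

For instance, if the variable `x` has the value `$x\n`, then `[$x]` becomes `[$x\n]` with the value copied verbatim. A rescanning design would instead expand `$x` forever.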