AO currently supports regular expressions in its Utility activity, but it doesn't really exploit the power of these expressions. At present, this utility can:
- Evaluate a single regular expression to true or false, placing the result into a context item.
- Perform a simple substitution of matched characters with a fixed string, placing the new string into a context item.
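To make the two existing operations concrete, here is a rough Python sketch. It assumes a plain dict stands in for AO's context items, and the function names `evaluate` and `substitute` are hypothetical illustrations, not AO's API.

```python
import re

def evaluate(pattern, text):
    """Operation 1: True if the pattern matches anywhere in text, else False."""
    return re.search(pattern, text) is not None

def substitute(pattern, fixed, text):
    """Operation 2: replace every match with a fixed string.
    A function is passed as the replacement so `fixed` is inserted literally,
    with no back-reference processing."""
    return re.sub(pattern, lambda _m: fixed, text)

# A plain dict stands in for AO's context items (assumption).
context = {}
context["hasNumber"] = evaluate(r"\d+", "order 42")
context["masked"] = substitute(r"\d+", "N", "order 42 of 50")
print(context)  # {'hasNumber': True, 'masked': 'order N of N'}
```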
I propose that we extend this utility in the following ways:
- Allow more than one operation per Utility activity, as the Assign activity already does. This would allow us to perform multiple regex operations without needing a separate activity for each one.
- Support insertion of context items into the output string of a regular expression.
- Allow the use of capture groups to extract information into an XML document.
The first item is, I think, self-explanatory. The second would allow us to replace a series of characters in the input with a variable string from a context item.
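The first two proposals can be sketched together in Python: one "activity" holds an ordered list of operations, and a replacement string may reference a context item. The `[[name]]` placeholder syntax, the `run_utility` function and the operation tuple layout are all hypothetical; AO defines none of them today.

```python
import re

# Hypothetical placeholder syntax for a context item inside a replacement,
# e.g. "[[newHost]]". This is an assumption, not existing AO syntax.
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")

def resolve(template, context):
    """Replace each [[name]] placeholder with the current value of context[name]."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)

def run_utility(operations, context):
    """Apply several (input_item, pattern, template, output_item) operations in
    order, the way one Assign activity can hold several assignments."""
    for input_item, pattern, template, output_item in operations:
        replacement = resolve(template, context)
        # A function is used so the resolved value is inserted literally,
        # even if it happens to contain backslashes.
        context[output_item] = re.sub(pattern, lambda _m: replacement, context[input_item])
    return context

context = {"url": "http://oldhost:8080/ao", "newHost": "prodhost", "newPort": "9090"}
run_utility(
    [
        ("url", r"oldhost", "[[newHost]]", "url"),
        ("url", r":\d+/", ":[[newPort]]/", "url"),
    ],
    context,
)
print(context["url"])  # http://prodhost:9090/ao
```

Because the operations run in order and each writes back into the context, a later operation can read the output of an earlier one, which is what lets a single activity replace a chain of activities.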
The third item is, for me, the one that adds the most functionality. As an example, I will use the following input string:
To capture the three items of information here into context items, I could use three separate substitutions like so:
Search: .*username=(\w+).* Replace: $1
Search: .*password=(\w+).* Replace: $1
Search: .*role=(\w+).* Replace: $1
Each of these could place the relevant item into a context item, but this requires three Utility activities, which is hardly concise.
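The three-substitution approach looks like this in Python. The input string is a hypothetical one, reconstructed only from the values in the XML output further down. Python writes the `$1` back-reference as `\1`. Each pattern is wrapped in `.*` so the entire input is replaced by the captured group; without that wrapping, a substitution would replace only the matched portion and leave the rest of the string behind.

```python
import re

# Hypothetical input, reconstructed from the values in the proposed XML output.
text = "username=myUser&password=myPassword&role=BLAdmins"

# One substitution per value, each equivalent to a separate Utility activity.
context = {
    "username": re.sub(r".*username=(\w+).*", r"\1", text),
    "password": re.sub(r".*password=(\w+).*", r"\1", text),
    "role": re.sub(r".*role=(\w+).*", r"\1", text),
}
print(context)
# {'username': 'myUser', 'password': 'myPassword', 'role': 'BLAdmins'}
```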
Another option is to concatenate with a separator like so:
Search: .*username=(\w+)&password=(\w+)&role=(\w+).* Replace: $1,$2,$3
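For the same hypothetical input, the concatenating substitution and the tokenize step it still requires can be sketched in Python as follows (`\1,\2,\3` is Python's spelling of `$1,$2,$3`):

```python
import re

# Hypothetical input, reconstructed from the values in the proposed XML output.
text = "username=myUser&password=myPassword&role=BLAdmins"

# Step 1: a single substitution rewrites the whole input as "$1,$2,$3".
joined = re.sub(r".*username=(\w+)&password=(\w+)&role=(\w+).*", r"\1,\2,\3", text)
print(joined)  # myUser,myPassword,BLAdmins

# Step 2: a separate tokenize step. Splitting on "," is safe here only
# because \w+ can never capture a comma.
username, password, role = joined.split(",")
```

The separator has to be a character the capture groups can never contain; with a looser pattern such as `(.+)` a value containing a comma would split into the wrong number of tokens.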
Then I'd need to call a tokenize function to split these for me. What we currently lack, and what would make these regexes far more useful for data retrieval, is an extract function that would take this input:
And would produce output like this:
<matches>
  <match group="1">myUser</match>
  <match group="2">myPassword</match>
  <match group="3">BLAdmins</match>
</matches>
That requires only a single regex, extracts all three items of information, and delivers them in a tidy XML format that I can then process as desired.
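A minimal Python sketch of the proposed extract operation, assuming one match per input and the `<matches>`/`<match group="n">` layout from the proposal. The `extract` function name and the input string are hypothetical. Because `re.search` scans the whole string, the leading and trailing `.*` needed for substitution are unnecessary here.

```python
import re
import xml.etree.ElementTree as ET

def extract(pattern, text):
    """Hypothetical extract operation: run one regex and return each capture
    group as a <match group="n"> element under a <matches> root."""
    root = ET.Element("matches")
    m = re.search(pattern, text)
    if m is None:
        return root  # no match: an empty <matches/> element
    for n, value in enumerate(m.groups(), start=1):
        child = ET.SubElement(root, "match", group=str(n))
        child.text = value  # None (group did not participate) gives an empty element
    return root

# Hypothetical input, reconstructed from the values in the proposed XML output.
text = "username=myUser&password=myPassword&role=BLAdmins"
xml = ET.tostring(extract(r"username=(\w+)&password=(\w+)&role=(\w+)", text), encoding="unicode")
print(xml)
```

Emitting one `<match>` element per group keeps the output shape independent of how many groups the pattern has, so downstream XPath such as `match[@group='2']` works for any regex.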