RegexVar.jl

A macro to fill variables straight from the string
Updated Last
9 Years Ago
Started In
March 2013

Regex var

This library provides @regex_var, which fills variables straight from a string: you write each variable next to the part of the regular expression it should match. It uses the regular expressions Julia provides (PCRE).

Usage (note: the throwing behavior is currently poorly tested):

@regex_var input_string "regex string with $var, $(var::SomeType) and $(var::"regex string")" body
@regex_var input_string "regex string" i_happen_on_mismatch() body

There is also @regex_case, which runs the body of the first (fully) matching regex string.

@regex_case input_string begin
    if "a thing:.+" 
        do_a()
    end
    "b thing:.+" : do_b()
end
@regex_case input_string { "a thing:.+" : do_a(), "b thing" : do_b() }
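
The "first fully matching clause wins" behavior can be sketched with a plain function. This is only an illustration, not the macro's implementation: first_full_match and the pattern => action pairs are hypothetical, and wrapping each pattern in ^(?:...)$ is an assumption about what "fully matching" means.

```julia
# Hypothetical helper, not part of RegexVar.jl: run the action of the first
# clause whose pattern matches the whole input.
function first_full_match(input::AbstractString, clauses)
    for (pattern, action) in clauses
        # Assumption: "fully matching" means the pattern spans the entire input.
        full = Regex("^(?:" * pattern * ")\$")
        if occursin(full, input)
            return action()
        end
    end
    return nothing  # no clause matched
end

clauses = ["a thing:.+" => () -> :did_a,
           "b thing:.+" => () -> :did_b]

result = first_full_match("b thing:42", clauses)
```

Clauses are tried in order, so an earlier, more general pattern would shadow a later one.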

Regex string structure

To use the macros effectively, you need to look at which arguments actually enter the macro body. If you look at :"abc$(var_a)def$(var_b)", you see that the Expr structure does not simply contain a string. The regex string argument is basically an expr(:macrocall, {:str, interesting_array...}). Everything except the interesting_array is unimportant (but it is asserted to be there).
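
To see what a macro receives, you can quote an interpolated string and inspect it. The sketch below runs on Julia 1.x, where the quoted string parses to an Expr with head :string rather than the macrocall form described above (which dates from the Julia version this package was written for). The variable names are only for illustration.

```julia
# Quote an interpolated string instead of evaluating it, so the parts stay visible.
ex = :("abc$(var_a)def$(var_b)")

println(ex.head)   # the head of the expression
println(ex.args)   # literal pieces as Strings, interpolations as Symbols

# A typed or regex-annotated variable arrives as a `::` expression.
annotated = :("$(x::Integer) and $(y::"[a-z]+")")
typed = annotated.args[1]   # the x::Integer part
regexed = annotated.args[3] # the y::"[a-z]+" part
```

The literal pieces and the interpolated parts alternate in args, which is the "interesting array" a macro has to walk.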

Each entry has a matcher. Starting from the front, you match the first entry; the search for the next entry then starts from the beginning of that match. Starting from the beginning rather than the end means a match can be very general and still be terminated by the next match.
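
A minimal sketch of this "search from the beginning of the previous match" idea, for the simplest case of one variable followed by one more matcher. It is not the package's code: capture_before is a hypothetical helper, the integer pattern is an assumed stand-in for the Integer matcher, and cutting the variable off where the next match begins is an assumption chosen to agree with the "123q" cases listed under "More edge-case examples".

```julia
# Hypothetical helper, not the package's code: match a variable, then let the
# next matcher cut it short.
function capture_before(input::AbstractString, var_regex::Regex, next_regex::Regex)
    m = match(var_regex, input)
    m === nothing && return nothing
    # The next search starts at the *beginning* of the variable's match,
    # so the next matcher may land inside it.
    nxt = match(next_regex, input, m.offset)
    nxt === nothing && return nothing
    # Assumption: the variable keeps only the text before the next match begins.
    return input[m.offset:prevind(input, nxt.offset)]
end

integer = r"[+-]?[0-9]+"   # assumed stand-in for the Integer matcher

first_case  = capture_before("123q", integer, r"3")   # "3" is found inside "123"
second_case = capture_before("123q", integer, r".")   # "." already matches the "1"
```

In the second case the next matcher succeeds at the very first character, which leaves nothing for the variable.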

Going through the different elements we have:

Strings: they're just matched as mentioned.

Symbols -$a- are completely general matches; the variable is set to whatever lies between the end of the previous match and the beginning of the next.

Symbols with regex -$(a::"regex")- are filled with whatever matches, constrained by where the next match begins. $(a::".+") is equivalent to $a.

Symbols with types -$(a::SomeType)- use the regular expression returned from matcher_of_type; the result is then parsed into a value with parse_thing(Type,String). Both are meant to be extended by users.

Symbols whose name ends in !: if the name is not one of the built-in shorthands below, it takes the regular expression of the corresponding type without the !. The built-in shorthands correspond to these regular expressions:

Number sequence     a::n!   "[0-9]+"
Lowercase letters   a::l!   "[a-z]+"
Letters             a::L!   "([a-z]|[A-Z])+"
Whitespace          a::w!   "[ \t\n]+"
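
The shorthands can be tried out directly against Julia's regex engine. In this sketch the SHORTHANDS table and the matches_fully helper are hypothetical names used for illustration; only the four patterns are taken from the list above.

```julia
# The four shorthands from the list above, keyed by their symbol.
const SHORTHANDS = Dict(
    Symbol("n!") => "[0-9]+",
    Symbol("l!") => "[a-z]+",
    Symbol("L!") => "([a-z]|[A-Z])+",
    Symbol("w!") => "[ \t\n]+",
)

# Hypothetical helper: does `text` consist entirely of one shorthand match?
matches_fully(shorthand::Symbol, text::AbstractString) =
    occursin(Regex("^(?:" * SHORTHANDS[shorthand] * ")\$"), text)

digits_ok  = matches_fully(Symbol("n!"), "2013")
letters_ok = matches_fully(Symbol("L!"), "Kitten")
lower_ok   = matches_fully(Symbol("l!"), "Kitten")  # capital K is outside [a-z]
```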

There are also 'settings' that some of these symbols can change:

TODO: these don't currently work.

$now!,$n!: Makes only the next regular expression search start from the end of the previous match.

$always_now!,$an!: Makes all subsequent searches start from the end of the previous match.

$not_now!,$nn!: Disables the above.

Convention on matcher_of_type

The convention on matcher_of_type is that the regex for a type should be 'minimal'. For instance, none of the number matchers allow whitespace. matcher_of_type(Uint)=="[0-9]+" also does not allow a + in front, whereas Integer allows a sign, because it might be negative. Similarly, Float requires a dot, whereas Number doesn't.
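
A standalone sketch of what 'minimal' matchers and a user extension might look like. These are not the definitions shipped with the package: only the unsigned pattern is quoted from the text, the Integer and float patterns are assumptions, the Percent type is hypothetical, and the type names use current Julia spellings (UInt, AbstractFloat).

```julia
# Standalone illustration; these are NOT the definitions shipped with RegexVar.jl.
# Only the UInt pattern is quoted from the text; the others are assumed.
matcher_of_type(::Type{UInt}) = "[0-9]+"
matcher_of_type(::Type{<:Integer}) = "[+-]?[0-9]+"                 # assumed: optional sign
matcher_of_type(::Type{<:AbstractFloat}) = "[+-]?[0-9]*\\.[0-9]+"  # assumed: dot required

# Parse the matched text into a value of the requested type.
parse_thing(::Type{T}, str::AbstractString) where {T<:Number} = parse(T, str)

# Extending both functions for a user type (hypothetical example type).
struct Percent
    value::Int
end
matcher_of_type(::Type{Percent}) = "[0-9]+%"
parse_thing(::Type{Percent}, str::AbstractString) = Percent(parse(Int, chop(str)))

# Helper for trying a matcher against a whole string.
full(tp, s) = occursin(Regex("^(?:" * matcher_of_type(tp) * ")\$"), s)
```

Keeping each matcher narrow means that allowing whitespace or a leading sign is a decision made in the regex string at the call site, not hidden in the type.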

In the future I might add a way to indicate 'allow whitespace around it' or 'widen what is allowed'.

Examples

Note: perhaps these are a bit arbitrary.

@regex_var input "The kittens name is $now!$(name::L!) he is $age weeks old." (name,age)
@regex_var input "$a,$b,$c" (a,b,c) #Comma separated thing.
@regex_var input ".+$(x::Number)" x #Get the first number.
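
For comparison, here is roughly what the comma-separated example would look like written by hand with Julia's built-in regex support. This is not what the macro expands to; using lazy .+? groups to imitate "general match, ended by the next one" is an assumption, and the input string is made up.

```julia
# Hand-written counterpart of "$a,$b,$c": three named groups split on commas.
# Assumption: lazy `.+?` groups approximate "general match, ended by the next one".
m = match(r"^(?<a>.+?),(?<b>.+?),(?<c>.+)$", "x,y,z")

# Pull the named captures out into ordinary variables.
a, b, c = m[:a], m[:b], m[:c]
```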

More edge-case examples

@regex_var "123q" "$(x::Integer)3" x #returns 12
@regex_var "123q" "$(x::Integer)." x #Invalid input; 1 matches . so parser gets empty string!
#Following returns 123,"12"; $n! asks to read _just_ the next one as its own match.
@regex_var "123q123" "$n!$(x::Integer).$(y)3" x,y 

Reference

matcher_of_type(tp)

Returns a regular expression string corresponding to the given type or symbol. Symbols can have built-in regular expressions (such as n!, l!, L! and w!).

Definitions should be minimal in what they allow to match.

parse_thing(tp::Type, str::String)

Tries to parse the string with the type as a guide.

@regex_var input regex_with_variables throw_out body

Parses the input string using the regular expressions and fills the variables named in regex_with_variables, which are then available in the body. On a mismatch, throw_out is run; it should return or throw() to 'exit'. throw_out can also be omitted, in which case a mismatch always raises an error.

For the nature of the regular expressions, read the readme.

@regex_case input clauses

Clauses may be given in begin ... end or as {clause,clause,clause}; a single clause looks like

if "regex_thing"
    then...
end #Or
"regex" : then_do()

The regular expressions themselves work just like in @regex_var.

License: MIT

Written by Jasper den Ouden (entirely, currently) and released under the MIT license; see doc/mit.txt.

TODO

  • Get $now! to work.
  • A 'match' should mean a complete match; if throwing is enabled, should it throw when the last match doesn't reach the end?
  • More test coverage.
  • A behavior changer that widens the allowed range, for instance automatically allowing whitespace around a match?
  • parse_thing on streams. (Regex until the matcher doesn't match anymore.)