Lerche (German for Lark) is a partial port of the Lark grammar processor from Python to Julia. Lark grammars should work unchanged in Lerche.
Installation: at the Julia REPL,
using Pkg; Pkg.add("Lerche")
See also 'Notes for Lark users' below.
Lerche reads Lark EBNF grammars to produce a parser. This parser, when
provided with text conforming to the grammar, produces a parse
tree. This tree can be visited and transformed using "rules". A rule is
a function named after the production whose arguments it should be called on, and
the first argument of a rule is an object which is a subtype of Transformer or Visitor.
Given an EBNF grammar, Lerche can be used to parse text into your data structure as follows:
- Define one or more subtypes of Transformer or Visitor, instances of which will be passed as the first argument to the appropriate rule. The instance can also be used to hold information during transformation if you wish, in which case it must have a concrete type.
- Define visit_tokens(t::MyNewType) = false if you will not be processing token values. This is about 25% faster than leaving the default of true.
- For every production in your grammar that you wish to process, write a rule with the same name as the production.
- Prefix the rule with the @rule macro if the second argument is an array containing all of the arguments to the grammar production.
- Prefix the rule with the @inline_rule macro if the second and following arguments each refer to one argument in the grammar production.
- For every token that you wish to process, define an identically named method as for rules, but precede it with the @terminal macro instead of @rule.
If your grammar is in the variable mygrammar, your text to be parsed and transformed is in the variable mytext, and your Transformer subtype is MyTransformer, the following commands will produce a data structure from the text:
p = Lark(mygrammar,parser="lalr",lexer="contextual") #create parser
t = Lerche.parse(p,mytext) #Create parse tree
x = Lerche.transform(MyTransformer(),t) #transform parse tree
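As a minimal end-to-end sketch of the steps above, the following parses a bracketed list of numbers into a Julia vector. The grammar, the type name ListToVector, and the rule names start and number are assumptions made up for illustration; only Lark, Lerche.parse, Lerche.transform, @rule, and @inline_rule are taken from the text above.

```julia
using Lerche

# Illustrative grammar (an assumption, not part of Lerche): a bracketed number list.
list_grammar = raw"""
    start  : "[" number ("," number)* "]"
    number : SIGNED_NUMBER

    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS
"""

# Transformer subtype; it holds no state, so an empty struct is enough.
struct ListToVector <: Transformer end

# Inline rule: the single child of `number` arrives as its own argument.
@inline_rule number(t::ListToVector, n) = Base.parse(Float64, n)

# Ordinary rule: all children of `start` arrive together in one array.
@rule start(t::ListToVector, items) = collect(Float64, items)

p = Lark(list_grammar, parser="lalr", lexer="contextual")  # create parser
tree = Lerche.parse(p, "[1, 2.5, -3]")                     # create parse tree
x = Lerche.transform(ListToVector(), tree)                 # transform parse tree
```

Here x should be a Vector{Float64} holding the three parsed values. Base.parse is written out in full to keep it distinct from Lerche.parse.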
For a real-world example of usage, see this file.
If you are publishing work where Lerche has been useful, please consider citing the Lerche paper.
Please raise any issues or problems with using Lerche in the GitHub issue tracker.
Contributions of all types are welcome. Examples include:
- Improvements to processing speed
- Improved documentation
- Links to projects using Lerche
- Commenting and triaging issues
The most straightforward way to make a contribution is to fork the repository, make your changes, and create a pull request.
Notes for Lark users
Please read the Lark documentation. When converting from Lark programs written in Python to Lerche programs written in Julia, the changes outlined below are necessary.
- All Transformer and Visitor classes become subtypes of Transformer/Visitor
- All class method calls become Julia method calls with an instance of the type as the first argument
- Transformation or visitor rules should be preceded by the @rule macro. Inline rules use the @inline_rule macro and token processing methods use @terminal.
- The first argument of transformer and visitor rules is a variable of the desired transformer/visitor type.
- Any grammars containing backslash-double quote sequences need to be fixed (see below).
- Any grammars containing backslash-x to denote a byte value need to be fixed (see below).
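To make the conversion concrete, here is a hedged sketch of a small Lark transformer (shown in comments) next to one possible Lerche translation. The grammar and the names PointTransformer, point, and start are illustrative assumptions, and the @terminal method is written in the same form as a rule, as the list above describes.

```julia
using Lerche

# Python/Lark original, shown for comparison only:
#
#   class PointTransformer(Transformer):
#       def NUMBER(self, tok):        # token callback
#           return float(tok)
#       @v_args(inline=True)
#       def point(self, x, y):        # inline rule
#           return (x, y)
#       def start(self, points):      # ordinary rule
#           return list(points)

# Illustrative grammar (an assumption): a sequence of "(x, y)" pairs.
point_grammar = raw"""
    start : point+
    point : "(" NUMBER "," NUMBER ")"

    %import common.NUMBER
    %import common.WS
    %ignore WS
"""

# The Python class becomes a subtype of Transformer.
struct PointTransformer <: Transformer end

# `self` becomes an explicit first argument of the transformer type.
@terminal NUMBER(t::PointTransformer, tok) = Base.parse(Float64, tok)
@inline_rule point(t::PointTransformer, x, y) = (x, y)
@rule start(t::PointTransformer, points) = collect(points)

p = Lark(point_grammar, parser="lalr", lexer="contextual")
tree = Lerche.parse(p, "(1, 2) (3.5, 4)")
result = Lerche.transform(PointTransformer(), tree)
```

The methods are defined outside the struct, which is the usual Julia replacement for Python class methods.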
Inconsistencies with Lark
- Earley and CYK grammars are not implemented.
- Dynamic lexer is not implemented.
- All errors with messages attached must be at the bottom of the
exception type hierarchy, as these are the only types that can have
contents. Thus an UnexpectedInput exception must become e.g. an
UnexpectedCharacter exception if a message is included.
- The PuppetParser invoked when there is a parse error is not yet functional.
- There may be issues with correctly interpreting import paths to find imported grammars: please raise an issue if this happens.
- No choice of Tree structure or of bytes versus strings is available, as these options make no sense for Julia.
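The note above about exception types follows from Julia's type system, in which abstract types cannot have fields. The sketch below uses hypothetical type names (MyParseError, MyUnexpectedInput, MyUnexpectedCharacter) rather than Lerche's own definitions to show why a message has to live on a concrete type at the bottom of the hierarchy.

```julia
# Hypothetical hierarchy for illustration; these are not Lerche's own types.
abstract type MyParseError <: Exception end
abstract type MyUnexpectedInput <: MyParseError end  # abstract: cannot hold a message

# Only a concrete type at the bottom of the hierarchy can have fields.
struct MyUnexpectedCharacter <: MyUnexpectedInput
    message::String
end

# Toy "parser" that rejects any text containing a question mark.
function try_parse(text)
    try
        occursin('?', text) && throw(MyUnexpectedCharacter("unexpected '?' in input"))
        return "ok"
    catch e
        # Matching on the abstract supertype still catches the concrete error.
        e isa MyUnexpectedInput || rethrow()
        return e.message
    end
end
```

Code that catches errors can still test against the abstract supertype, so only the throwing side needs to pick a concrete type.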
Implementation notes and hints
Lerche is currently based on Lark 0.11.1. The priority has been on
maintaining fidelity with Lark. For example, global regex flags,
which are integers in Lark, are still integers in Lerche, which means
you will need to look their values up. This may be changed to a more
Julian approach in future.
The @rule and @inline_rule macros define methods of the Lerche function
transformer_func. Julia multiple dispatch is used to select the
appropriate method at runtime.
@terminal similarly defines methods for token processing.
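The following is a simplified illustration of how multiple dispatch can select a method by rule name at runtime. The names demo_func, apply_rule, and DemoTransformer are hypothetical, and the Val-based signature is an assumption for the sake of the example, not Lerche's actual internal API.

```julia
# Hypothetical stand-in for a transformer type.
struct DemoTransformer end

# Fallback used when no rule has been defined for a production:
# the children are passed through unchanged.
demo_func(t, ::Val, children) = children

# What a rule for a production named `total` could conceptually expand to.
demo_func(t::DemoTransformer, ::Val{:total}, children) = sum(children)

# The rule name is only known at runtime, so it is wrapped in a Val
# and dispatch picks the most specific matching method.
apply_rule(t, name::Symbol, children) = demo_func(t, Val(name), children)
```

Calling apply_rule with :total reaches the specialised method, while any other name falls through to the generic fallback.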
Parsing a large (500K) file suggests Lerche is about 3 times faster
than Lark with CPython for parsing. Parser generation is much slower as no
optimisation techniques have been applied (yet). Calculating and
storing your grammar in a Julia const variable at the top level
of your package will allow it to be precompiled and thus avoid
grammar re-analysis each time your package is loaded.
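A minimal sketch of this pattern is shown below. It reads "storing your grammar" as storing the constructed Lark parser object, which is an assumption; the module name MyGrammarPkg, the grammar, and the helper parse_words are likewise made up for illustration.

```julia
module MyGrammarPkg  # hypothetical package name

using Lerche

# Illustrative grammar (an assumption): one or more whitespace-separated words.
const WORD_GRAMMAR = raw"""
    start : WORD+

    %import common.WORD
    %import common.WS
    %ignore WS
"""

# Built once at the top level, when the module is defined, rather than on
# every call; in a real package this is the point where precompilation can help.
const WORD_PARSER = Lark(WORD_GRAMMAR, parser="lalr", lexer="contextual")

# Callers reuse the stored parser instead of re-analysing the grammar.
parse_words(text) = Lerche.parse(WORD_PARSER, text)

end # module

tree = MyGrammarPkg.parse_words("hello world")
```

Constructing the parser inside parse_words instead would repeat the slow grammar analysis on every call.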