CSTParser.jl

A concrete syntax tree parser for Julia
Started in January 2017

CSTParser

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

A parser for Julia, built on Tokenize.jl, that aims to extend the built-in parser by providing additional meta information alongside the resulting AST.

Installation and Usage

using Pkg
Pkg.add("CSTParser")
using CSTParser
CSTParser.parse("x = y + 123")

Documentation: Dev

Structure

A CSTParser.EXPR is broadly equivalent to a Base.Expr in structure. The key difference is a set of additional fields that store, for each expression:

  • trivia tokens such as punctuation or keywords that are not stored as part of the AST but are needed for the CST representation;
  • the span measurements for an expression;
  • the textual representation of the token, needed only for certain tokens such as identifiers (symbols), operators and literals;
  • the parent expression, if present; and
  • any other meta information (this field is untyped and is used within CSTParser to hold errors).
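
As a rough illustration of the fields listed above, the sketch below parses a small expression and inspects the result. The field names it touches (head, span, fullspan, val) are assumptions based on recent CSTParser releases rather than something this README spells out; the fieldnames call shows what the installed version actually defines.

```julia
using CSTParser

x = CSTParser.parse("x = y + 123")

# List the fields that the installed version defines on EXPR.
println(fieldnames(CSTParser.EXPR))

# Assumed field names: check them against the output above before relying on them.
println(typeof(x.head))             # the head may be a Symbol or another EXPR
println((x.span, x.fullspan))       # span measurements for the whole expression
println(x.val === nothing)          # text is only stored for certain tokens
```

Printing fieldnames first keeps the rest of the example honest: if a field has been renamed in a given release, the mismatch is visible immediately.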

All .head values used in Expr are also used in EXPR. Unlike in the AST, tokens (terminal expressions with no child expressions) are stored as EXPR, and additional head values distinguish between the different kinds of token. The possible head values include:

:IDENTIFIER
:NONSTDIDENTIFIER (e.g. var"id")
:OPERATOR

# Punctuation
:COMMA
:LPAREN
:RPAREN
:LSQUARE
:RSQUARE
:LBRACE
:RBRACE
:ATSIGN
:DOT

# Keywords
:ABSTRACT
:BAREMODULE
:BEGIN
:BREAK
:CATCH
:CONST
:CONTINUE
:DO
:ELSE
:ELSEIF
:END
:EXPORT
:FINALLY
:FOR
:FUNCTION
:GLOBAL
:IF
:IMPORT
:LET
:LOCAL
:MACRO
:MODULE
:MUTABLE
:NEW
:OUTER
:PRIMITIVE
:QUOTE
:RETURN
:STRUCT
:TRY
:TYPE
:USING
:WHILE

# Literals
:INTEGER
:BININT (0b0)
:HEXINT (0x0)
:OCTINT (0o0)
:FLOAT
:STRING
:TRIPLESTRING
:CHAR
:CMD
:TRIPLECMD
:NOTHING 
:TRUE
:FALSE

The ordering of .args members matches that in Base.Expr, and members of .trivia are stored in the order in which they appear in the source text.
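
To see how .args and .trivia sit side by side, a small recursive walk can print both. This is a minimal sketch: show_tree is a hypothetical helper, not part of CSTParser, and it assumes the fields are named head, val, args and trivia, with args and trivia possibly set to nothing.

```julia
using CSTParser

# Hypothetical helper (not part of CSTParser): print each node's head and,
# when the node stores text, that text. Trivia children are prefixed with "~".
function show_tree(x::CSTParser.EXPR, depth = 0, marker = " ")
    # Assumption: a head may itself be an EXPR, in which case its own head is shown.
    head = x.head isa CSTParser.EXPR ? x.head.head : x.head
    text = x.val === nothing ? "" : " " * repr(x.val)
    println("  "^depth, marker, head, text)
    if x.args !== nothing
        foreach(c -> show_tree(c, depth + 1, " "), x.args)
    end
    if x.trivia !== nothing
        foreach(c -> show_tree(c, depth + 1, "~"), x.trivia)
    end
end

show_tree(CSTParser.parse("f(a, b)"))
```

For an input like f(a, b), one would expect the identifiers to show up among the args and the parentheses and comma among the trivia, though the exact heads printed depend on the installed version.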