XML.jl
Read and write XML in pure Julia.
Introduction
This package offers fast data structures for reading and writing XML files with a consistent interface:
Node
/LazyNode
Interface:
nodetype(node) → XML.NodeType (an enum type)
tag(node) → String or Nothing
attributes(node) → Dict{String, String} or Nothing
value(node) → String or Nothing
children(node) → Vector{typeof(node)}
is_simple(node) → Bool (whether node is simple .e.g. <tag>item</tag>)
simplevalue(node) → e.g. "item" from <tag>item</tag>)
LazyNode
Extended Interface for depth(node) → Int
next(node) → typeof(node)
prev(node) → typeof(node)
parent(node) → typeof(node)
Quickstart
using XML
filename = joinpath(dirname(pathof(XML)), "..", "test", "books.xml")
doc = read(filename, Node)
children(doc)
# 2-Element Vector{Node}:
# Node Declaration <?xml version="1.0"?>
# Node Element <catalog> (12 children)
doc[end] # The root node
# Node Element <catalog> (12 children)
doc[end][2] # Second child of root
# Node Element <book id="bk102"> (6 children)
Data Structures that Represent XML Nodes
NodeType
Preliminary: - Each item in an XML DOM is classified by its
NodeType
. - Every
XML.jl
struct defines anodetype(x)
method that returns itsNodeType
.
NodeType | XML Representation | Node Constructor |
---|---|---|
Document |
An entire document | Document(children...) |
DTD |
<!DOCTYPE ...> |
DTD(...) |
Declaration |
<?xml attributes... ?> |
Declaration(; attrs...) |
ProcessingInstruction |
<?tag attributes... ?> |
ProcessingInstruction(tag; attrs...) |
Comment |
<!-- text --> |
Comment(text) |
CData |
<![CData[text]]> |
CData(text) |
Element |
<tag attributes... > children... </NAME> |
Element(tag, children...; attrs...) |
Text |
the text part of <tag>text</tag> |
Text(text) |
Node
: Probably What You're Looking For
read
-ing aNode
loads the entire XML DOM in memory.- This is what you would use to build an XML document programmatically.
- See the table above for convenience constructors.
Node
s have some additional methods that aid in construction/mutation:
# Add a child:
push!(parent::Node, child::Node)
# Replace a child:
parent[2] = child
# Add/change an attribute:
node["key"] = value
node["key"]
Node
is an immutable type. However, you can easily create a copy with one or more field values changed by using theNode(::Node; kw...)
constructor wherekw
are the fields you want to change. For example:
node = XML.Element("tag", XML.Text("child"))
simplevalue(node)
# "child"
node2 = Node(node, children=XML.Text("changed"))
simplevalue(node2)
# "changed"
XML.LazyNode
: For Fast Iteration through an XML File
A lazy data structure that just keeps track of the position in the raw data (Vector{UInt8}
) to read from.
- You can iterate over a
LazyNode
to "read" through an XML file:
doc = read(filename, LazyNode)
foreach(println, doc)
# LazyNode Declaration <?xml version="1.0"?>
# LazyNode Element <catalog>
# LazyNode Element <book id="bk101">
# LazyNode Element <author>
# LazyNode Text "Gambardella, Matthew"
# LazyNode Element <title>
# ⋮
Reading
# Reading from file:
read(filename, Node)
read(filename, LazyNode)
# Parsing from string:
parse(Node, str)
parse(LazyNode, str)
Writing
XML.write(filename::String, node) # write to file
XML.write(io::IO, node) # write to stream
XML.write(node) # String
Performance
- XML.jl performs comparatively to EzXML.jl, which wraps the C library libxml2.
- See the
benchmarks/suite.jl
for the code to produce these results. - The following output was generated in a Julia session with the following
versioninfo
:
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.5.0)
CPU: 10 × Apple M1 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
Reading an XML File
XML.LazyNode 0.012084
XML.Node ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 888.367
EzXML.readxml ■■■■■■ 200.009
XMLDict.xml_dict ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1350.63
Writing an XML File
Write: XML ■■■■■■■■■■■■■■■■■■■■■■ 244.261
Write: EzXML ■■■■■■■■■■ 106.953
Lazily Iterating over Each Node
LazyNode ■■■■■■■■■■■■■■■■ 55.1
EzXML.StreamReader ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 142.515
Collecting All Names/Tags in an XML File
XML.LazyNode ■■■■■■■■■■■■■■■■■■■■■■■■■■ 152.298
EzXML.StreamReader ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 165.21
EzXML.readxml ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 239.197
Possible Gotchas
- XML.jl doesn't automatically escape special characters (
<
,>
,&
,"
, and'
) for you. However, we provide utility functions for doing the conversions back and forth:XML.escape(::String)
andXML.unescape(::String)
XML.escape!(::Node)
andXML.unescape!(::Node)
.