Note: This package is now deprecated in favor of https://github.com/quinnj/Strings.jl (see #3)
Large-scale text processing often requires many changes to large string objects. Immutable strings make this inefficient, while working on byte arrays directly gives up the convenient string methods. This package provides mutable ASCII and UTF8 string types whose data can be modified through the familiar string methods.
- MutableASCIIString:
immutable MutableASCIIString <: DirectIndexString
- MutableUTF8String:
immutable MutableUTF8String <: String
- MutableString:
typealias MutableString Union(MutableASCIIString, MutableUTF8String)
All methods on immutable strings can also be applied to a MutableString. In addition, the methods below modify MutableString objects in place:
- uppercase!(s::MutableString): In-place uppercase conversion
- lowercase!(s::MutableString): In-place lowercase conversion
- ucfirst!(s::MutableString): Convert the first letter to uppercase in place
- lcfirst!(s::MutableString): Convert the first letter to lowercase in place
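To illustrate what an in-place case conversion amounts to, here is a minimal sketch in current Julia that works on a plain `Vector{UInt8}` rather than on this package's types. The helper name `ascii_uppercase!` is hypothetical and this is not the package's implementation. It only shows the byte buffer being modified without building a new string.

```julia
# Hypothetical helper (not part of this package): uppercase the ASCII
# letters of a byte buffer in place, leaving all other bytes untouched.
function ascii_uppercase!(buf::Vector{UInt8})
    for i in eachindex(buf)
        b = buf[i]
        if UInt8('a') <= b <= UInt8('z')
            buf[i] = b - 0x20   # 'a'..'z' sit 0x20 above 'A'..'Z' in ASCII
        end
    end
    return buf
end

buf = Vector{UInt8}("hello, world")
ascii_uppercase!(buf)

# Copy before converting, so that buf stays usable afterwards.
upper = String(copy(buf))
```

The loop writes into the existing buffer, which is the kind of work the methods above are described as doing on a MutableString.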
The usual search methods on the String type also apply to MutableStrings.
replace!(s::MutableString, pattern, repl::Union(ByteString,Char,Function), limit::Integer=0)
The above method replaces, in place, up to limit occurrences of pattern with repl. If limit is zero, all occurrences are replaced.
As with search, the pattern argument may be a single character, a vector or a set of characters, a string, or a regular expression.
If repl is a ByteString, it replaces the matching region. If it is a Char, it replaces each character of the matching region. If repl is a function, it must accept a SubString representing the matching region and return either a Char or a ByteString to be used as the replacement.
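The three forms of repl can be illustrated with the non-mutating `replace` from Base in current Julia. This is only an analogy and not this package's `replace!`. Base takes a `pattern => replacement` pair and a `count` keyword instead of limit (omit `count` to replace every match), and the per-character Char behavior described above is emulated here with a function.

```julia
s = "id 123 and id 4567"
pat = r"\d+"

# repl as a string: each matching region is replaced by the string.
a = replace(s, pat => "N")

# repl as a Char applied to every character of the matching region,
# emulated with a function that repeats the Char.
b = replace(s, pat => (m -> repeat('#', length(m))))

# repl as a function of the matched text.
c = replace(s, pat => (m -> string(length(m))))

# Limiting the number of replacements (Base calls this `count`).
d = replace(s, pat => "N"; count=1)
```

Here `a` is "id N and id N", `b` is "id ### and id ####", `c` is "id 3 and id 4", and `d` is "id N and id 4567". Unlike these calls, `replace!` is described as modifying its first argument.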
setindex!(s::MutableString, x, i0::Real)
setindex!(s::MutableString, r::ByteString, I::Range1{T<:Real})
setindex!(s::MutableString, c::Char, I::Range1{T<:Real})
reverse!(s::MutableString)
map!(f, s::MutableString)
Parts of a mutable string can be modified through indexed assignment:
s[10] = 'A'
s[12:14] = "ABC"
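As a rough picture of what such assignments do underneath, the following sketch in current Julia performs the same kind of writes on a plain `Vector{UInt8}`. It is an illustration under that assumption, not this package's code, and the sample text is made up.

```julia
buf = Vector{UInt8}("hello world")

# Overwrite one byte, analogous to s[1] = 'H'.
buf[1] = UInt8('H')

# Overwrite a range with a replacement of the same length,
# analogous to s[7:11] = "WORLD".
buf[7:11] = codeunits("WORLD")

# Copy before converting, so that buf stays usable afterwards.
result = String(copy(buf))
```

After these two writes `result` is "Hello WORLD", and the buffer has the same length as before.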
- Most operations on a MutableString are faster than those on an immutable String.
- Replacing segments of a mutable string with replacements of a different length is slower than recreating the entire string.
- MutableStrings are always more memory efficient than immutable Strings.
function | ASCIIString time | ASCIIString bytes | MutableASCIIString time | MutableASCIIString bytes |
---|---|---|---|---|
case conversion | 0.00499 | 700080 | 0.00476 | 0 |
reverse | 0.0105 | 711384 | 0.0010 | 0 |
regex search and blank out matches | 0.00679 | 917000 | 0.00295 | 64 |
regex search and delete matches | 0.02495 | 6144072 | 1.01742 | 292768 |
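The source does not explain why deleting matches is the slow case. One plausible contributor is that a replacement of a different length forces the bytes after it to move. The sketch below, in current Julia on a plain `Vector{UInt8}`, contrasts the two kinds of write. It is an assumption-laden illustration and not a reproduction of the benchmark.

```julia
buf = Vector{UInt8}("abc-def")

# Same-length replacement: the byte is overwritten where it is.
buf[4:4] = codeunits("+")

# Different-length replacement: splice! has to shift every byte after
# the replaced range, and the buffer grows by two bytes.
splice!(buf, 4:4, codeunits("<->"))

# Copy before converting, so that buf stays usable afterwards.
result = String(copy(buf))
```

Here `result` is "abc<->def". When many matches are resized one after another, the tail of the buffer is moved once per match, which can cost more than building a new string in a single pass.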
- Significant code has been duplicated from Julia base to specialize the MutableString methods. A proper type-reorganization would eliminate this.
- The hash method on MutableString behaves similarly to that on String. This can cause surprises when a MutableString is used as a key in a collection and then modified, since its hash changes with its contents.
- Since UTF8 has variable character byte lengths, MutableUTF8String also allows replacing segments of the string with replacements of arbitrary length, e.g. s[10] = "ABC". This is inconsistent with the behavior of MutableASCIIString, and remains to be debated.