This package provides the basic types and methods for dealing with character sets, encodings, and character set encodings (CSEs).
Currently, there are the following types:

- `CodeUnitTypes`: a `Union` of the three code unit types (`UInt8`, `UInt16`, `UInt32`), for convenience
- `CharSet`: a struct type, parameterized by the name of the character set and the type needed to represent a code point
- `Encoding`: a struct type, parameterized by the name of the encoding
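
A rough sketch of the idea behind such parameterized struct types: empty structs whose type parameters carry all the information, which can then be recovered by dispatch. The names `SketchCodeUnits`, `CharSetSketch`, `EncodingSketch` and the accessor functions are hypothetical stand-ins, not the package's actual definitions.

```julia
# Hypothetical stand-ins for CodeUnitTypes, CharSet and Encoding;
# the package's real definitions may differ.
const SketchCodeUnits = Union{UInt8, UInt16, UInt32}

# Empty (singleton) structs: all information lives in the type parameters.
struct CharSetSketch{Name, CodePoint} end
struct EncodingSketch{Name} end

const ASCIISketch   = CharSetSketch{:ASCII, UInt8}
const UTF32Sketch   = CharSetSketch{:UTF32, UInt32}
const UTF8EncSketch = EncodingSketch{:UTF8}

# The parameters are recovered by dispatching on the type.
charset_name(::Type{CharSetSketch{N, C}}) where {N, C} = N
codepoint_type(::Type{CharSetSketch{N, C}}) where {N, C} = C
encoding_name(::Type{EncodingSketch{N}}) where {N} = N

println(charset_name(ASCIISketch))      # ASCII
println(codepoint_type(UTF32Sketch))    # UInt32
println(UInt16 <: SketchCodeUnits)      # true
```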

The built-in character sets are:

- `Binary`: for storing non-textual data as a sequence of bytes, 0-0xff
- `ASCII`: ASCII (Unicode subset, 0-0x7f)
- `Latin`: Latin-1 (ISO-8859-1) (Unicode subset, 0-0xff)
- `UCS2`: UCS-2 (Unicode subset, 0-0xd7ff, 0xe000-0xffff; BMP only, no surrogates)
- `UTF32`: UTF-32 (full Unicode, 0-0xd7ff, 0xe000-0x10ffff)
- `UniPlus`: unvalidated Unicode (i.e. like `String`, can contain invalid code points)
- `Text1`: unknown 1-byte character set
- `Text2`: unknown 2-byte character set
- `Text4`: unknown 4-byte character set
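
The ranges listed above translate directly into code point checks. A minimal sketch using only Base; the helper functions are hypothetical and not part of the package.

```julia
# Hypothetical helpers mirroring the code point ranges of the built-in
# Unicode-subset character sets; not part of the package.
is_surrogate(cp::Integer) = 0xd800 <= cp <= 0xdfff

is_ascii_cp(cp::Integer) = 0 <= cp <= 0x7f                           # ASCII
is_latin_cp(cp::Integer) = 0 <= cp <= 0xff                           # Latin
is_ucs2_cp(cp::Integer)  = 0 <= cp <= 0xffff && !is_surrogate(cp)    # UCS2
is_utf32_cp(cp::Integer) = 0 <= cp <= 0x10ffff && !is_surrogate(cp)  # UTF32

cp = codepoint('€')          # U+20AC
println(is_latin_cp(cp))     # false
println(is_ucs2_cp(cp))      # true
```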
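
The built-in encodings are:

- `UTF8Encoding`
- `Native1Byte`, `Native2Byte`, `Native4Byte`, `NativeUTF16`
- `Swapped4Byte`, `Swapped2Byte`, `SwappedUTF16`
- `LE2`, `BE2`, `LE4`, `BE4`
- `UTF16LE`, `UTF16BE`
- `2Byte`, `4Byte`, `UTF16`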
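
The native/swapped and `LE`/`BE` encoding names suggest how multi-byte code units are laid out in memory. The sketch below assumes that reading (an interpretation of the names, not taken from the package) and uses Base's `bswap`, `htol` and `hton` to show both byte orders of one 2-byte code unit; `le_bytes` and `be_bytes` are hypothetical helpers.

```julia
# Interpretation only: "swapped" order reverses the bytes of each code unit.
cu = 0x20ac                       # UInt16 code unit for '€'
println(bswap(cu) == 0xac20)      # true

# Hypothetical helpers: the bytes of a 2-byte code unit as stored in
# little-endian (LE2-style) and big-endian (BE2-style) data.
# htol/hton convert from host order, so the result does not depend on the host.
le_bytes(x::UInt16) = collect(reinterpret(UInt8, [htol(x)]))
be_bytes(x::UInt16) = collect(reinterpret(UInt8, [hton(x)]))

println(le_bytes(cu))   # UInt8[0xac, 0x20]
println(be_bytes(cu))   # UInt8[0x20, 0xac]
```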

The built-in character set encodings are:

- `BinaryCSE`, `Text1CSE`, `ASCIICSE`, `LatinCSE`
- `Text2CSE`, `UCS2CSE`
- `Text4CSE`, `UTF32CSE`
- `UTF8CSE`: `UTF32` character set, all valid, using `UTF8Encoding`; conforms to the Unicode standard, i.e. no overlong encodings, surrogates, or invalid bytes
- `RawUTF8CSE`: `UniPlus` character set, not validated, using `UTF8Encoding`; may contain invalid sequences, overlong encodings, encoded surrogates, and characters up to `0x7fffffff`
- `UTF16CSE`: `UTF32` character set, all valid, using `UTF16Encoding` (native byte order); conforms to the Unicode standard, i.e. no out-of-order or isolated surrogates
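
The validity rules for `UTF8CSE` and `UTF16CSE` can be checked in a few lines. In this sketch the UTF-8 side uses Base's `isvalid(String, bytes)`, and the UTF-16 side is a hypothetical hand-written `valid_utf16` helper, not the package's validator.

```julia
# UTF-8: Base's isvalid(String, bytes) rejects overlong forms, encoded
# surrogates and invalid bytes, corresponding to the UTF8CSE rules.
println(isvalid(String, UInt8[0x41, 0xe2, 0x82, 0xac]))  # true  ("A€")
println(isvalid(String, UInt8[0xc0, 0x80]))              # false (overlong NUL)
println(isvalid(String, UInt8[0xed, 0xa0, 0x80]))        # false (encoded U+D800)

# UTF-16: hypothetical checker; every high surrogate must be immediately
# followed by a low surrogate, and no low surrogate may stand alone.
function valid_utf16(units::Vector{UInt16})
    i, n = 1, length(units)
    while i <= n
        u = units[i]
        if 0xd800 <= u <= 0xdbff                      # high (leading) surrogate
            (i < n && 0xdc00 <= units[i + 1] <= 0xdfff) || return false
            i += 2                                     # consume the pair
        elseif 0xdc00 <= u <= 0xdfff                   # low surrogate with no high one
            return false
        else
            i += 1                                     # ordinary BMP code unit
        end
    end
    return true
end

println(valid_utf16(transcode(UInt16, "A😀")))   # true
println(valid_utf16(UInt16[0xde00, 0xd83d]))     # false (out of order)
```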

The following CSEs indicate more about the contents of a string:

- `_LatinCSE`: has at least one character > 0x7f, all <= 0xff
- `_UCS2CSE`: has at least one character > 0xff, all <= 0xffff
- `_UTF32CSE`: has at least one non-BMP character
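
These three classes depend only on a string's largest code point. A minimal sketch, assuming a valid Unicode string; `content_class` is a hypothetical helper, and the returned symbols are only labels named after the CSEs (`:ASCIICSE` for all-ASCII text is an assumption).

```julia
# Hypothetical helper: classify a valid Unicode string by its largest code point,
# using labels named after the CSEs described above.
function content_class(s::AbstractString)
    maxcp = UInt32(0)
    for c in s
        maxcp = max(maxcp, codepoint(c))   # input assumed to be valid Unicode
    end
    maxcp <= 0x7f   && return :ASCIICSE    # assumption: label for all-ASCII text
    maxcp <= 0xff   && return :_LatinCSE   # at least one char > 0x7f, all <= 0xff
    maxcp <= 0xffff && return :_UCS2CSE    # at least one char > 0xff, all <= 0xffff
    return :_UTF32CSE                      # at least one non-BMP character
end

println(content_class("café"))   # _LatinCSE
println(content_class("€5"))     # _UCS2CSE
```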

The `cse` function returns the character set encoding for a string type or a string. It returns `RawUTF8CSE` as a fallback for `AbstractString` (i.e. the same as for `String`).
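
A fallback like the one for `AbstractString` is naturally expressed with a method on types plus an instance method that forwards to it. A sketch with hypothetical names (`FallbackCSE`, `demo_cse`, `MyUTF16Str`); the package's real method signatures may differ.

```julia
# Hypothetical CSE type and two instances of it.
struct FallbackCSE{CS, Enc} end
const RawUTF8Sketch  = FallbackCSE{:UniPlus, :UTF8}
const UTF16CSESketch = FallbackCSE{:UTF32, :UTF16}

# Type-level fallback: every AbstractString (String included) gets the raw UTF-8 CSE.
demo_cse(::Type{<:AbstractString}) = RawUTF8Sketch

# A hypothetical string type that overrides the fallback.
# (Minimal on purpose: not a complete AbstractString implementation.)
struct MyUTF16Str <: AbstractString
    units::Vector{UInt16}
end
demo_cse(::Type{MyUTF16Str}) = UTF16CSESketch

# Instance-level method forwards to the type-level one.
demo_cse(s::AbstractString) = demo_cse(typeof(s))

println(demo_cse("abc") === RawUTF8Sketch)                      # true
println(demo_cse(MyUTF16Str(UInt16[0x41])) === UTF16CSESketch)  # true
```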

The `charset` function returns the character set for a string type, string, character type, or character. The `encoding` function returns the encoding for a type or string. The `codeunit` function returns the code unit type used for a character set encoding.
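
If a CSE type carries its character set and encoding as type parameters, accessors like these become simple projections. The sketch below is hypothetical throughout: the type and accessor names are invented, and it assumes the code unit type is attached to the encoding.

```julia
# Hypothetical types; the package's real layout may differ.
struct CharSetP{Name} end
struct EncodingP{Name, CodeUnit} end      # assumption: code unit lives on the encoding
struct CSEP{CS, Enc} end

const UTF16CSEP = CSEP{CharSetP{:UTF32}, EncodingP{:UTF16, UInt16}}

# Accessors pull the parameters back out by dispatch.
charset_of(::Type{CSEP{CS, E}}) where {CS, E} = CS
encoding_of(::Type{CSEP{CS, E}}) where {CS, E} = E
codeunit_of(::Type{CSEP{CS, EncodingP{N, CU}}}) where {CS, N, CU} = CU

println(charset_of(UTF16CSEP))    # CharSetP{:UTF32}
println(codeunit_of(UTF16CSEP))   # UInt16
```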

The `cs"..."` string macro creates a `CharSet` type with the given name, and the `enc"..."` string macro creates an `Encoding` type with the given name. The `@cse(cs, enc)` macro creates a character set encoding from the given character set and encoding.
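
In Julia a string macro used as `cs"..."` is defined as `macro cs_str`. The sketch below shows one way such macros could build parameterized types; to avoid implying these are the package's definitions, it uses hypothetical names (`xcs"..."`, `xenc"..."`, `@xcse`) and hypothetical types.

```julia
# Hypothetical types standing in for CharSet, Encoding and a CSE.
struct MacroCharSet{Name} end
struct MacroEncoding{Name} end
struct MacroCSE{CS, Enc} end

# xcs"UTF32" expands, at macro-expansion time, to the type MacroCharSet{:UTF32}.
macro xcs_str(name)
    MacroCharSet{Symbol(name)}
end

# xenc"UTF16" expands to MacroEncoding{:UTF16}.
macro xenc_str(name)
    MacroEncoding{Symbol(name)}
end

# @xcse(cs, enc) builds a CSE type from a charset type and an encoding type,
# evaluating both arguments in the caller's scope.
macro xcse(cs, enc)
    :(MacroCSE{$(esc(cs)), $(esc(enc))})
end

const CS32 = xcs"UTF32"
const E16  = xenc"UTF16"
println(@xcse(CS32, E16) === MacroCSE{CS32, E16})   # true
```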

The package also exports the `Bool` constants `BIG_ENDIAN` and `LITTLE_ENDIAN`, which indicate the byte order of the host machine.
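
Flags like these can be derived from Base's `ENDIAN_BOM` constant (0x04030201 on little-endian hosts, 0x01020304 on big-endian hosts). A sketch with the hypothetical names `HOST_BIG_ENDIAN` and `HOST_LITTLE_ENDIAN`; the package's own definitions may be written differently.

```julia
# Hypothetical equivalents of BIG_ENDIAN / LITTLE_ENDIAN, derived from Base.ENDIAN_BOM.
const HOST_BIG_ENDIAN    = ENDIAN_BOM == 0x01020304
const HOST_LITTLE_ENDIAN = ENDIAN_BOM == 0x04030201

# Example use: convert a host-order 2-byte code unit to little-endian order,
# swapping only on big-endian hosts (same effect as Base's htol).
to_le(x::UInt16) = HOST_LITTLE_ENDIAN ? x : bswap(x)

println(HOST_BIG_ENDIAN != HOST_LITTLE_ENDIAN)   # true
println(to_le(0x1234) == htol(0x1234))           # true
```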