TEXT: Numerous tools for text processing
This package is a Julia implementation of:
- Text classification based on BoW models (e.g. topic/language id)
- Language ID (training and processing) based on word and character n-grams
- Lewis's SMART stop list for English
- tf-idf/tf-llr text feature normalization
- n-gram feature extractors
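As a rough illustration of the n-gram extraction and tf-idf normalization listed above, here is a minimal, self-contained sketch. It assumes a recent Julia release, and the names `char_ngrams`, `word_ngrams`, `count_features`, and `tfidf` are hypothetical helpers written for this example; they are not this package's API (see test/runtests.jl for the real interface).

```julia
# Hypothetical helpers for illustration only; not the Text.jl API.

# Overlapping character n-grams of a string.
function char_ngrams(text::AbstractString, n::Int)
    chars = collect(text)  # index by character, safe for multi-byte text
    return [String(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

# Overlapping word n-grams (lowercased, whitespace-tokenized, space-joined).
function word_ngrams(text::AbstractString, n::Int)
    words = split(lowercase(text))
    return [join(words[i:i+n-1], " ") for i in 1:(length(words) - n + 1)]
end

# Sparse bag-of-features counts for a single document.
function count_features(features)
    counts = Dict{String,Int}()
    for f in features
        counts[f] = get(counts, f, 0) + 1
    end
    return counts
end

# tf-idf weighting over a collection of count dictionaries:
#   tf  = count / total feature count in the document
#   idf = log(N / number of documents containing the feature)
function tfidf(docs::AbstractVector)
    N = length(docs)
    df = Dict{String,Int}()
    for d in docs, f in keys(d)
        df[f] = get(df, f, 0) + 1
    end
    return map(docs) do d
        total = sum(values(d))
        Dict{String,Float64}(f => (c / total) * log(N / df[f]) for (f, c) in d)
    end
end

texts = ["the cat sat", "the dog sat", "a bird flew"]
docs = [count_features(word_ngrams(t, 1)) for t in texts]
weights = tfidf(docs)
```

In this toy corpus, "cat" occurs in one document and "the" in two, so "cat" receives the larger weight in the first document. The same `tfidf` function accepts counts built from `char_ngrams`, which is the kind of feature typically used for language ID.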
Dependencies:
- Stage: needed for logging and memoization (Note: requires manual install)
- Ollam: online learning modules (Note: requires manual install)
- Devectorize: macro-based devectorization
- DataStructures: for DefaultDict
- Iterators: for iterator helper functions
This is an experimental package that is not currently registered in the Julia central repository. You can install it, along with the two dependencies that require manual installation, via:
    Pkg.clone("https://github.com/saltpork/Stage.jl")
    Pkg.clone("https://github.com/mit-nlp/Ollam.jl")
    Pkg.clone("https://github.com/mit-nlp/Text.jl")
See test/runtests.jl for detailed usage.
This package was created for the DARPA XDATA and Memex programs and is released under the Apache v2 License.