WikiText.jl

Julia interface to the WikiText dataset.
Author: dellison
Popularity: 1 star
Last updated: 5 years ago
Started in: July 2018

About

WikiText.jl provides an interface to the WikiText Long Term Dependency Language Modeling dataset.

Usage

WikiText exports the following 4 types, corresponding to the 4 available datasets:

  • WikiText2
  • WikiText103
  • WikiText2Raw
  • WikiText103Raw

WikiText also exports the following 3 functions, each of which returns the path to the corresponding split of a dataset:

  • trainfile
  • validationfile
  • testfile

The datasets are downloaded and unzipped automatically (with your approval) the first time you access them, courtesy of DataDeps.jl.
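
For non-interactive use (scripts, CI), the approval prompt can get in the way. DataDeps.jl documents an environment variable, DATADEPS_ALWAYS_ACCEPT, that bypasses the confirmation. A minimal sketch, assuming it is set before the first access to a dataset:

```julia
# Accept DataDeps.jl download prompts automatically.
# Set this before the first call that triggers a download
# (e.g. before calling trainfile on a corpus for the first time).
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
```

Setting the variable this way only affects the current Julia session. Exporting it in the shell has the same effect across sessions.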

pkg> add WikiText
julia> using WikiText
julia> corpus = WikiText2v1()
julia> trainfile(corpus)
"/path/to/wiki.train.tokens"
julia> validationfile(corpus)
"/path/to/wiki.valid.tokens"
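
Because these functions return plain file paths, the files can be read with ordinary Julia I/O. The following is a minimal sketch of counting whitespace-separated tokens in such a file. Note that count_tokens is a hypothetical helper written for illustration, not part of WikiText.jl, and it assumes the file holds whitespace-separated tokens.

```julia
# Hypothetical helper (not part of WikiText.jl): count how often each
# whitespace-separated token occurs in the file at `path`.
function count_tokens(path::AbstractString)
    counts = Dict{String,Int}()
    for line in eachline(path)
        for token in split(line)          # split on any whitespace
            key = String(token)           # SubString -> String for the Dict key
            counts[key] = get(counts, key, 0) + 1
        end
    end
    return counts
end
```

With a corpus loaded as in the session above, this could be called as count_tokens(trainfile(corpus)). The same function works on the validation and test files.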
