NGram.jl

An implementation of the n-gram language model in Julia.
Started in November 2013; last updated 5 years ago.

NGram

Linear interpolation

This implementation uses linear interpolation to build the model. A plain trigram model, for example, estimates the probability of a word given the two preceding words directly from counts:

`p("book" | "the", "green") = count("the green book") / count("the green")`

This estimate has some limitations:

• A trigram model needs a much larger training corpus than a bigram or unigram model.
• count(trigram) is often zero, so a plausible word sequence that never appeared in training gets probability 0.
• Bigram and unigram models are easier to train but capture less context.
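The count-based estimate and its zero-count problem are easiest to see in a small Julia sketch. It does not use NGram.jl. It assumes texts are split on whitespace and that n-grams never cross text boundaries. `ngram_counts` and `mle` are hypothetical helper names and are not part of the package.

```julia
# Hypothetical helper: count every n-gram of length 1..n in a corpus.
# N-grams are stored as space-joined strings, e.g. "the green".
function ngram_counts(texts::Vector{String}, n::Int)
    counts = Dict{String,Int}()
    for text in texts
        words = split(text)
        for k in 1:n, i in 1:(length(words) - k + 1)
            gram = join(words[i:i+k-1], " ")
            counts[gram] = get(counts, gram, 0) + 1
        end
    end
    return counts
end

# Maximum-likelihood estimate: p(word | history) = count(history word) / count(history).
# Returns 0.0 when the history itself was never seen (the ratio is undefined then).
function mle(counts::Dict{String,Int}, history::String, word::String)
    denom = get(counts, history, 0)
    denom == 0 && return 0.0
    return get(counts, history * " " * word, 0) / denom
end

texts = String["the green book", "my blue book", "his green house", "book"]
counts = ngram_counts(texts, 3)

println(mle(counts, "the green", "book"))  # seen trigram: 1.0
println(mle(counts, "his green", "book"))  # unseen trigram: 0.0
```

In this corpus "the green book" appears once, so its estimate is 1.0. "his green book" never occurs, so it gets 0.0 even though it is a reasonable phrase.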

The idea is then to combine the trigram estimate with the bigram and unigram estimates. More generally, an n-gram estimate is combined with the (n-1)-gram, ..., bigram and unigram estimates. Here is an example for a trigram model:

```
p("book" | "the", "green") = a * count("the green book") / count("the green")
                           + b * count("green book") / count("green")
                           + c * count("book") / count()
where
count() = total number of words in the corpus
a + b + c = 1
a >= 0
b >= 0
c >= 0

# For example: a = b = c = 1 / 3
```
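The interpolated estimate can be sketched for any n as shown below. This is a minimal illustration based on the standard definition of linear interpolation, not on NGram.jl's actual code. Each weight multiplies the maximum-likelihood estimate of the same word given a progressively shorter history. `ngram_counts` and `interpolated` are hypothetical names. The caller supplies the weights here; in practice they are commonly tuned on held-out data.

```julia
# Hypothetical helper: count every n-gram of length 1..n, keyed by space-joined words.
function ngram_counts(texts::Vector{String}, n::Int)
    counts = Dict{String,Int}()
    for text in texts
        words = split(text)
        for k in 1:n, i in 1:(length(words) - k + 1)
            gram = join(words[i:i+k-1], " ")
            counts[gram] = get(counts, gram, 0) + 1
        end
    end
    return counts
end

# Linearly interpolated p(word | history):
#   weights[1] * p_MLE(word | full history) + ... + weights[end] * p_MLE(word)
# Weights must be non-negative, sum to 1, and have one entry per order.
function interpolated(counts::Dict{String,Int}, total::Int, history::Vector{String},
                      word::String, weights::Vector{Float64})
    @assert length(weights) == length(history) + 1
    @assert all(>=(0), weights) && isapprox(sum(weights), 1.0)
    p = 0.0
    for (j, w) in enumerate(weights)
        ctx = history[j:end]  # drops the oldest j-1 words; empty for the last weight
        if isempty(ctx)
            # Unigram term: count(word) / total number of words
            p += w * get(counts, word, 0) / total
        else
            h = join(ctx, " ")
            denom = get(counts, h, 0)
            # A never-seen history contributes nothing (its MLE is undefined)
            if denom > 0
                p += w * get(counts, h * " " * word, 0) / denom
            end
        end
    end
    return p
end

texts = String["the green book", "my blue book", "his green house", "book"]
counts = ngram_counts(texts, 3)
total = sum(length(split(t)) for t in texts)  # 10 words
weights = fill(1 / 3, 3)                      # a = b = c = 1/3

println(interpolated(counts, total, ["the", "green"], "book", weights))   # ≈ 0.6
println(interpolated(counts, total, ["the", "green"], "house", weights))  # ≈ 0.2
```

With a = b = c = 1/3 on this corpus, p(book | the, green) = (1 + 1/2 + 3/10) / 3 = 0.6, up to floating-point rounding. The unseen trigram "the green house" now gets (0 + 1/2 + 1/10) / 3 = 0.2 instead of 0.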

Example

```julia
using NGram

texts = String["the green book", "my blue book", "his green house", "book"]

# Train a trigram model on the documents
model = NGramModel(texts, 3)

# Query the model: p(book | the, green)
model["the green book"]
```