StopWords.jl

A Julia package containing a collection of stop words for multiple languages.
Author: guo-yong-zhi
Popularity: 1 star
Last updated: 7 months ago
Started: September 2023

Stop words are words, such as "the" or "and", that are filtered out before or after natural language text is processed because they carry little meaning on their own; together they form a negative dictionary. This Julia package contains a collection of stop words for multiple languages, taken from stopwords-iso: https://github.com/stopwords-iso/stopwords-iso. Currently, the package supports 57 languages, identified by their ISO 639-3 codes:

afr ara ben bre bul cat ces dan deu ell eng epo est eus fas fin fra gle glg guj hau hbs heb hin hun hye ita jpn kor kur lat lav lit mar msa nld nor pol por ron rus slk slv som sot spa swa swe tgl tha tur ukr urd vie yor zho zul

Installation

import Pkg; Pkg.add("StopWords")

Usage

The stopwords variable is the only symbol exported by this package. It can be regarded as a lazy dictionary that maps languages to sets of stop words. You can look up a language by its English name, its ISO 639-3 code, or its ISO 639-1 code. For example, the English stop words are available as stopwords["eng"], stopwords["en"], or stopwords["English"].

julia> using StopWords
julia> stopwords["eng"]
Set{String} with 1298 elements:
  "nu"
  "youd"
  "whoever"
  "shouldn"
  "null"
  "everywhere"
  ⋮
julia> stopwords["eng"] === stopwords["en"] === stopwords["English"]
true
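A typical use is removing stop words from a piece of text before further analysis. The sketch below assumes a very simple tokenizer (lowercase the text, then split on runs of non-letter characters); the helper remove_stopwords is hypothetical and not part of StopWords.jl, and only the stopwords lookup comes from the package.

```julia
using StopWords

# Hypothetical helper, not part of StopWords.jl: lowercase the text, split it
# on runs of non-letter characters, and drop every token found in the
# stop-word set of the requested language.
function remove_stopwords(text::AbstractString; lang::AbstractString = "eng")
    stops = stopwords[lang]  # Set{String} of stop words for `lang`
    tokens = String.(split(lowercase(text), r"[^\p{L}]+"; keepempty = false))
    return filter(t -> !(t in stops), tokens)
end

words = remove_stopwords("The cat and the dog are in the garden.")
println(words)
```

Because the stop words come back as a Set{String}, each membership test is a hash lookup, so filtering stays cheap even for the larger lists.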

You can also get the stop words for several languages at once by passing a vector or tuple of languages; the result is the union of the individual sets.

julia> stopwords[["eng", "fra"]]
Set{String} with 1922 elements:
  "nu"
  "youd"
  "ont"
  "pfut"
  "whoever"
  "shouldn"
  "enfin"
  "tac"
  ⋮
julia> stopwords[["eng", "fra"]] === stopwords[("eng", "fra")] == stopwords["eng"] ∪ stopwords["fra"]
true
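Since the multi-language lookup is a set union, words that occur in both lists are stored only once. The short sketch below checks this with inclusion-exclusion and then uses the combined set to filter a mixed English/French token list; the token list is made up for illustration.

```julia
using StopWords

eng = stopwords["eng"]
fra = stopwords["fra"]
both = stopwords[["eng", "fra"]]  # union of the English and French sets

# Shared words are counted once in the union, so
# |eng ∪ fra| = |eng| + |fra| - |eng ∩ fra|.
shared = intersect(eng, fra)
println(length(both), " = ", length(eng), " + ", length(fra), " - ", length(shared))

# Filter an illustrative mixed-language token list with the combined set.
tokens = ["le", "chat", "and", "the", "chien"]
kept = filter(t -> !(t in both), tokens)
```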

You can also get the stop words for all languages at once.

julia> stopwords[:] === stopwords[] === stopwords[StopWords.supported_languages()]
true
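To see how the languages compare, you can count the stop words per language. This sketch assumes that the elements returned by StopWords.supported_languages() are sortable language keys that stopwords accepts one at a time (the package documents indexing with the whole set, not with individual elements). Note that it requests every language, so every list is loaded.

```julia
using StopWords

# Assumption: each element of supported_languages() is a valid single key.
langs = sort!(collect(StopWords.supported_languages()))
counts = Dict(lang => length(stopwords[lang]) for lang in langs)

for lang in langs
    println(rpad(lang, 6), counts[lang])
end

# The all-language set is a union, so it can be no larger than the sum of
# the per-language sizes and no smaller than the largest single list.
total_unique = length(stopwords[:])
```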

The StopWords.supported_languages() function returns the set of all languages the package currently supports. To check whether a specific language is supported, use the haskey function. To check several languages at once, pass a vector to haskey; it returns true only if every language in it is supported.

julia> haskey(stopwords, "eng")
true
julia> haskey(stopwords, ["English", "fra"])
true
julia> haskey(stopwords, ["English", "foo"])
false
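haskey is handy for guarding lookups when the language comes from user input. In the sketch below, stopwords_or_default is a hypothetical helper (not part of the package) that falls back to English for unsupported languages, and the requested list is illustrative.

```julia
using StopWords

# Hypothetical helper, not part of StopWords.jl: use the requested language
# when it is supported, otherwise fall back to a default language.
function stopwords_or_default(lang; default = "eng")
    return haskey(stopwords, lang) ? stopwords[lang] : stopwords[default]
end

# Keep only the requested languages that the package actually supports,
# then look up their combined stop words.
requested = ["English", "foo", "fra"]
supported = filter(l -> haskey(stopwords, l), requested)  # ["English", "fra"]
combined = stopwords[supported]
```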

Required Packages

No packages found.