SimString.jl

Native Julia implementation of CPMerge (SimString) algorithm

Author PyDataBlog


Popularity: 6 Stars

Updated Last: 2 Years Ago

Started In: October 2021

SimString

A native Julia implementation of the CPMerge algorithm, which is designed for approximate string matching. This package is particularly useful for natural language processing tasks that require retrieving strings/texts from very large corpora (large amounts of text). Currently, the package supports both character-based and word-based N-gram feature generation, and there are plans to open the package up to custom user-defined feature generation methods.
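To make the two feature types concrete, here is a minimal sketch of character and word N-gram generation in plain Julia. The names `char_ngrams` and `word_ngrams` are hypothetical helpers written for illustration; they are not part of SimString's API, and details the package may handle (such as padding or duplicate N-grams) are left out.

```julia
# Hypothetical helpers for illustration only; not SimString's API.

# Character N-grams: every window of n consecutive characters.
# Collecting into a Vector{Char} first keeps indexing safe for Unicode input.
function char_ngrams(s::AbstractString, n::Integer)
    chars = collect(s)
    length(chars) < n && return String[]
    return [join(chars[i:i+n-1]) for i in 1:length(chars)-n+1]
end

# Word N-grams: every window of n consecutive whitespace-separated tokens.
function word_ngrams(s::AbstractString, n::Integer)
    words = split(s)
    length(words) < n && return String[]
    return [join(words[i:i+n-1], " ") for i in 1:length(words)-n+1]
end

char_ngrams("naïve", 2)                         # ["na", "aï", "ïv", "ve"]
word_ngrams("approximate string matching", 2)   # ["approximate string", "string matching"]
```

Working on `collect(s)` rather than byte indices is what lets the sketch handle non-ASCII characters such as `ï` without indexing errors.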

Features

Fast algorithm for string matching
100% exact retrieval
Support for Unicode
Support for building databases directly from text files
Mecab-based tokenizer support
Support for persistent databases like MongoDB

Supported String Similarity Measures

Dice coefficient
Jaccard coefficient
Cosine coefficient
Overlap coefficient
Exact match
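For reference, the set-based definitions of these measures can be sketched as follows, assuming each string has already been turned into a set of features (for example its N-grams). The function names are illustrative and are not taken from SimString; this is not how the package computes them internally.

```julia
# Illustrative definitions over feature sets X and Y; not SimString's API.
# Both sets are assumed to be non-empty (empty sets would give NaN).
dice(X, Y)    = 2 * length(intersect(X, Y)) / (length(X) + length(Y))
jaccard(X, Y) = length(intersect(X, Y)) / length(union(X, Y))
cosine(X, Y)  = length(intersect(X, Y)) / sqrt(length(X) * length(Y))
overlap(X, Y) = length(intersect(X, Y)) / min(length(X), length(Y))
exact(X, Y)   = X == Y

X = Set(["ab", "bc", "cd"])
Y = Set(["bc", "cd", "de"])

jaccard(X, Y)   # 0.5 (2 shared features out of 4 distinct ones)
exact(X, Y)     # false
```

All four coefficients range from 0 (no shared features) to 1, so a search threshold can be interpreted the same way whichever measure is chosen.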

Installation

You can install the latest stable version of this package from the Julia registries by running the command below.

NB: Don't forget to invoke Julia's package manager with ] first.

pkg> add SimString

The few (and selected) brave ones can grab the current experimental features by adding the main branch to their development environment after invoking the package manager with ]:

pkg> add SimString#main

You are good to go with bleeding edge features and breakages!

To revert to a stable version, you can simply run:

pkg> free SimString
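The same three steps can also be scripted through Julia's Pkg API instead of the interactive `pkg>` prompt, which may be convenient in setup scripts. This is a sketch using standard Pkg functions; it needs network access, and in practice you would run only the line you need.

```julia
using Pkg

Pkg.add("SimString")                     # latest registered (stable) release
Pkg.add(name="SimString", rev="main")    # track the main branch (experimental)
Pkg.free("SimString")                    # stop tracking the branch and return to registered versions
```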

Required Packages

Adapt
BinDeps
CircularArrays
Compat
DataStructures
OffsetArrays
OrderedCollections
Requires
SHA
URIParser
Wakame

Used By Packages

No packages found.
