Artificial Benchmark for Community Detection (ABCD) - A Fast Random Graph Model with Community Structure
Bogumił Kamiński, Paweł Prałat, François Théberge
The package does not export any functions or types. Its public API consists of the following:
ABCDGraphGenerator.ABCDParams: type holding information about sampled degrees, sampled cluster sizes and required mode of ABCD graph generation
ABCDGraphGenerator.gen_graph: ABCD graph generator that uses an ABCDParams value as input
ABCDGraphGenerator.sample_degrees: sample degrees of vertices following a power law
ABCDGraphGenerator.sample_communities: sample community sizes following a power law
ABCDGraphGenerator.get_ev: get the expected value of a truncated discrete power law distribution
ABCDGraphGenerator.find_v_min: find the lower truncation of a truncated discrete power law distribution, given its expected value and upper truncation
The reason for this split of functionality is that generating vertex degrees and community sizes is fast, while generating the final graph is the most expensive step.
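Both get_ev and find_v_min work with a truncated discrete power law. The Python sketch below illustrates the idea; it is not the package's Julia code. It assumes P(k) is proportional to k^(-τ) on the integers v_min..v_max, and the argument order of both functions is also an assumption.

With τ₁ = 3, d_min = 5, d_max = 50 this formula gives about 8.33. With τ₂ = 2, c_min = 50, c_max = 1000 it gives about 156.56. Both values are in line with the expected values logged by the CLI sessions later in this guide. The find_v_min shown here is a hypothetical closest-match scan.

```python
# Illustrative Python sketch (not the package's Julia code) of a truncated
# discrete power law: P(k) is proportional to k**(-tau) for k = v_min..v_max.


def get_ev(tau, v_min, v_max):
    """Expected value of the truncated discrete power law on [v_min, v_max]."""
    support = range(v_min, v_max + 1)
    weights = [k ** (-tau) for k in support]
    # E[K] = sum(k * w_k) / sum(w_k); the normalising constant cancels out.
    return sum(k * w for k, w in zip(support, weights)) / sum(weights)


def find_v_min(tau, v_max, target_ev):
    """Hypothetical stand-in: integer v_min whose expected value is closest to target_ev.

    A plain scan over 1..v_max keeps the sketch short; since the expected value
    increases with v_min, a bisection would scale better for large v_max.
    """
    return min(range(1, v_max + 1),
               key=lambda v: abs(get_ev(tau, v, v_max) - target_ev))


print(get_ev(3, 5, 50))       # degree parameters from example_config.toml
print(get_ev(2, 50, 1000))    # community-size parameters from example_config.toml
print(find_v_min(3, 50, 8.3))
```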
Using ABCDGraphGenerator.jl from R and Python
The functions provided in the package can be directly called from R and Python.
Instructions on how to integrate Julia seamlessly into an R session are given in the JuliaCall package documentation.
An interface to call Julia directly from Python is provided by the PyJulia package.
Command Line Interface
The utils/ folder contains command line interface utilities aimed at users who want to use the package without calling the API directly.
To run these utilities, Julia version 1.0 or later must be installed on the computer.
It contains the following files:
install.jl: installs all required packages
abcd_sampler.jl: generates an ABCD graph following a configuration file
deg_sampler.jl: samples degrees of vertices in the graph
com_sampler.jl: samples community sizes in the graph
graph_sampler.jl: samples edges and community assignments in the graph
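deg_sampler.jl and com_sampler.jl cover the cheap sampling steps. The Python sketch below illustrates what these steps involve by drawing degrees and community sizes from truncated power laws. It is a simplified, hypothetical procedure, not the one the package uses:

- It ignores the max_iter parameter.
- It fixes an odd degree sum by adjusting a single vertex.
- It trims any overshoot of the community sizes from the largest communities, so that the sizes sum exactly to n.

Both outputs are sorted in descending order, matching the file descriptions at the end of this guide.

```python
import random

# Hypothetical, simplified samplers in the spirit of deg_sampler.jl and
# com_sampler.jl; the package's own procedure (including max_iter) differs.


def sample_power_law(rng, tau, v_min, v_max, size):
    """Draw `size` integers with P(k) proportional to k**(-tau) on [v_min, v_max]."""
    support = range(v_min, v_max + 1)
    return rng.choices(support, weights=[k ** (-tau) for k in support], k=size)


def sample_degrees(rng, tau1, d_min, d_max, n):
    """n degrees in [d_min, d_max] (assumes d_min < d_max) with an even sum, descending."""
    degrees = sample_power_law(rng, tau1, d_min, d_max, n)
    if sum(degrees) % 2 == 1:
        # A graph's degree sum equals twice its edge count, so it must be even;
        # nudge one smallest degree up (or down if it is already d_max).
        i = degrees.index(min(degrees))
        degrees[i] += 1 if degrees[i] < d_max else -1
    return sorted(degrees, reverse=True)


def sample_communities(rng, tau2, c_min, c_max, n):
    """Community sizes in [c_min, c_max] summing exactly to n, in descending order."""
    sizes = []
    while sum(sizes) < n:
        sizes += sample_power_law(rng, tau2, c_min, c_max, 1)
    overshoot = sum(sizes) - n
    if overshoot > sum(s - c_min for s in sizes):
        raise ValueError("cannot trim the sampled sizes to n; try another seed")
    sizes.sort(reverse=True)
    for i in range(len(sizes)):
        # Shave the overshoot off the largest communities, never below c_min.
        cut = min(overshoot, sizes[i] - c_min)
        sizes[i] -= cut
        overshoot -= cut
    return sorted(sizes, reverse=True)


rng = random.Random(42)
degrees = sample_degrees(rng, 3, 5, 50, 10_000)
sizes = sample_communities(rng, 2, 50, 1000, 10_000)
print(sum(degrees) / len(degrees), len(sizes))
```

Python's random module does not reproduce Julia's random stream, so a seed of 42 here will not give the same numbers as the CLI.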
The main file intended to be used is abcd_sampler.jl.
Here is an example configuration file, named example_config.toml, used in this guide:
seed = "42"                   # RNG seed, use "" for no seeding
n = "10000"                   # number of vertices in graph
t1 = "3"                      # power-law exponent for degree distribution
d_min = "5"                   # minimum degree
d_max = "50"                  # maximum degree
d_max_iter = "1000"           # maximum number of iterations for sampling degrees
t2 = "2"                      # power-law exponent for cluster size distribution
c_min = "50"                  # minimum cluster size
c_max = "1000"                # maximum cluster size
c_max_iter = "1000"           # maximum number of iterations for sampling cluster sizes
# Exactly one of xi and mu must be passed as Float64. Also if xi is provided islocal must be set to false or omitted.
xi = "0.2"                    # fraction of edges to fall in background graph
#mu = "0.2"                   # mixing parameter
islocal = "false"             # if "true" mixing parameter is restricted to local cluster, otherwise it is global
isCL = "false"                # if "false" use configuration model, if "true" use Chung-Lu
degreefile = "deg.dat"        # name of file to generate that contains vertex degrees
communitysizesfile = "cs.dat" # name of file to generate that contains community sizes
communityfile = "com.dat"     # name of file to generate that contains assignments of vertices to communities
networkfile = "edge.dat"      # name of file to generate that contains edges of the generated graph
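The comments in the configuration file encode a few consistency rules. A minimal Python sketch can check them before a long run. It assumes the file has already been parsed into a dict of strings, for example with tomllib.loads on Python 3.11 or later.

The function name check_config is hypothetical. The d_min ≤ d_max and c_min ≤ c_max checks are added assumptions, not rules stated in the file.

```python
# Hypothetical validator for the rules stated in the comments of
# example_config.toml; values are strings, exactly as they appear in the file.


def check_config(cfg):
    """Return a list of problems found in a parsed ABCD configuration (empty if none)."""
    problems = []
    has_xi, has_mu = "xi" in cfg, "mu" in cfg
    if has_xi == has_mu:
        problems.append("exactly one of xi and mu must be given")
    for key in ("xi", "mu"):
        if key in cfg:
            try:
                float(cfg[key])
            except ValueError:
                problems.append(f"{key} must parse as a floating-point number")
    # From the file: if xi is provided, islocal must be "false" or omitted.
    if has_xi and cfg.get("islocal", "false") != "false":
        problems.append('islocal must be "false" or omitted when xi is given')
    # Assumed sanity checks, not stated in the configuration file itself.
    if int(cfg["d_min"]) > int(cfg["d_max"]):
        problems.append("d_min must not exceed d_max")
    if int(cfg["c_min"]) > int(cfg["c_max"]):
        problems.append("c_min must not exceed c_max")
    return problems
```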
This file passes all parameters required to generate an ABCD graph and store it on disk. Here is the output of an example session using the CLI in ABCD-generation mode with the above file:
$ julia abcd_sampler.jl example_config.toml
[ Info: Usage: julia abcd_sampler.jl config_filename
[ Info: For the syntax of config_filename see example_config.toml file
[ Info: Expected value of degree: 8.327743727955891
[ Info: Expected value of community size: 156.5613820733916
$ shasum -a 256 edge.dat #sha256sum edge.dat on Linux
1cf38c513db5890938b04a0e0e8059d32271ee3a96792a6992558f727c5b6ed8 edge.dat
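The shasum step checks that the run reproduced the reference output. Where shasum or sha256sum is unavailable, a small helper based on Python's standard hashlib module computes the same hex digest. The function name is illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, the same value `shasum -a 256` prints."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large edge lists need not fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, compare sha256_of_file("edge.dat") with "1cf38c513db5890938b04a0e0e8059d32271ee3a96792a6992558f727c5b6ed8". A match can only be expected with the same seed and, presumably, the same Julia and package versions as in the session above.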
After the program terminates, four files, deg.dat, cs.dat, com.dat and edge.dat, are created in the working directory.
The deg_sampler.jl, com_sampler.jl and graph_sampler.jl files are provided mainly to facilitate comparisons with the LFR algorithm.
Here is an output from an example session using CLI in the LFR-comparison mode:
$ julia install.jl
    Updating registry at `~\.julia\registries\General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
    Updating git-repo `https://github.com/bkamins/ABCDGraphGenerator.jl`
    Updating git-repo `https://github.com/bkamins/ABCDGraphGenerator.jl`
   Resolving package versions...
    Updating `~\.julia\environments\v1.3\Project.toml`
  [4c9194b5] ~ ABCDGraphGenerator v0.1.0 #master (https://github.com/bkamins/ABCDGraphGenerator.jl)
    Updating `~\.julia\environments\v1.3\Manifest.toml`
  [4c9194b5] ~ ABCDGraphGenerator v0.1.0 #master (https://github.com/bkamins/ABCDGraphGenerator.jl)
$ julia deg_sampler.jl degrees.dat 3 5 50 10000 1000 42
[ Info: Usage: julia deg_sampler.jl filename τ₁ d_min d_max n max_iter [seed]
[ Info: Example: julia deg_sampler.jl degrees.dat 3 5 50 10000 1000 42
[ Info: Expected value of degree: 8.327743727955891
$ shasum -a 256 degrees.dat #sha256sum degrees.dat on Linux
10f8a9528c8f4560040c63c1431f9b0ddeb7d3c9cb426f9b943c1099a8185c94 degrees.dat
$ julia com_sampler.jl community_sizes.dat 2 50 1000 10000 1000 42
[ Info: Usage: julia com_sampler.jl filename τ₂ c_min c_max n max_iter [seed]
[ Info: Example: julia com_sampler.jl community_sizes.dat 2 50 1000 10000 1000 42
[ Info: Expected value of community size: 156.5613820733916
$ shasum -a 256 community_sizes.dat #sha256sum community_sizes.dat on Linux
d03bccc03937b620e6db4ba661781e49c1e40dcfb46c04355a9804edb49cfc86 community_sizes.dat
$ julia graph_sampler.jl network.dat community.dat degrees.dat community_sizes.dat xi 0.2 true false 42
[ Info: Usage: julia graph_sampler.jl networkfile communityfile degreefile communitysizesfile mu|xi fraction isCL islocal [seed]
[ Info: Example: julia graph_sampler.jl network.dat community.dat degrees.dat community_sizes.dat xi 0.2 true false 42
$ shasum -a 256 network.dat #sha256sum network.dat on Linux
fbbc415fcc08c60f3370b646b019f108a924a43986f84ddb092255c7caa868f3 network.dat
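When the LFR-comparison utilities are driven from a Python script, the arguments must follow the usage line printed by graph_sampler.jl. A hypothetical helper that assembles them in that order is sketched below.

The helper names are illustrative. The xi/islocal check mirrors the comment in example_config.toml. Actually running the command assumes julia is on the PATH and the script is started from the folder that contains graph_sampler.jl.

```python
import subprocess

# Hypothetical helpers; the argument order follows the usage line printed by
# graph_sampler.jl: networkfile communityfile degreefile communitysizesfile
# mu|xi fraction isCL islocal [seed]


def graph_sampler_argv(networkfile, communityfile, degreefile, communitysizesfile,
                       mode, fraction, is_cl, is_local, seed=None):
    """Build the command line for graph_sampler.jl as a list of strings."""
    if mode not in ("mu", "xi"):
        raise ValueError("mode must be 'mu' or 'xi'")
    if mode == "xi" and is_local:
        # Mirrors the rule stated in example_config.toml.
        raise ValueError("islocal must be false when xi is used")
    argv = ["julia", "graph_sampler.jl", networkfile, communityfile, degreefile,
            communitysizesfile, mode, str(fraction),
            str(bool(is_cl)).lower(), str(bool(is_local)).lower()]
    if seed is not None:
        argv.append(str(seed))
    return argv


def run_graph_sampler(*args, **kwargs):
    """Run the CLI; assumes `julia` is on PATH and graph_sampler.jl is in the cwd."""
    subprocess.run(graph_sampler_argv(*args, **kwargs), check=True)
```

Calling graph_sampler_argv("network.dat", "community.dat", "degrees.dat", "community_sizes.dat", "xi", 0.2, True, False, 42) rebuilds the graph_sampler.jl command from the session above.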
After running the commands in the session above, you will have the following files in your working directory (all data is 1-based):
degrees.dat: a sequence of vertex degrees (in descending order)
community_sizes.dat: a sequence of cluster sizes (in descending order)
community.dat: a sequence of vertex number-community number pairs
network.dat: a sequence of generated edges, stored as pairs of vertices and sorted lexicographically (in increasing order)
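Because the output files are plain sequences of integers or integer pairs, they are easy to post-process. The Python sketch below reads the contents of network.dat and community.dat and reports three things:

- the number of edges,
- the community sizes,
- the fraction of edges whose endpoints lie in different communities.

The sketch assumes each line holds one whitespace-separated pair, and the function names are hypothetical. The inter-community fraction is only a quick sanity check and should not be read as an exact estimate of the xi or mu parameter used for generation.

```python
from collections import Counter


def read_pairs(text):
    """Parse whitespace-separated integer pairs, one pair per non-empty line."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            a, b = line.split()
            pairs.append((int(a), int(b)))
    return pairs


def summarize(network_text, community_text):
    """Edge count, community sizes and share of edges crossing communities."""
    edges = read_pairs(network_text)
    membership = dict(read_pairs(community_text))  # vertex -> community (1-based)
    inter = sum(1 for u, v in edges if membership[u] != membership[v])
    return {
        "edges": len(edges),
        "inter_fraction": inter / len(edges),
        "community_sizes": Counter(membership.values()),
    }
```

Typical use is to read both files with open(...).read() and pass the two strings to summarize.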