SyntheticDatasets.jl is a Julia library of functions for generating synthetic datasets.
The package can be installed with the Julia package manager.
From the Julia REPL, type ] to enter the Pkg REPL mode and run:
pkg> add SyntheticDatasets
Or, equivalently, via the Pkg API:
julia> import Pkg; Pkg.add("SyntheticDatasets")
A set of Pluto notebooks and code samples demonstrating the project's current functionality is available in the examples folder.
Here are a few examples that show the package's capabilities:
using StatsPlots, SyntheticDatasets
blobs = SyntheticDatasets.make_blobs(   n_samples = 1000, 
                                        n_features = 2,
                                        centers = [-1 1; -0.5 0.5], 
                                        cluster_std = 0.25,
                                        center_box = (-2.0, 2.0), 
                                        shuffle = true,
                                        random_state = nothing);
@df blobs scatter(:feature_1, :feature_2, group = :label, title = "Blobs")
gauss = SyntheticDatasets.make_gaussian_quantiles(  mean = [10,1], 
                                                    cov = 2.0,
                                                    n_samples = 1000, 
                                                    n_features = 2,
                                                    n_classes = 3, 
                                                    shuffle = true,
                                                    random_state = 2);
@df gauss scatter(:feature_1, :feature_2, group = :label, title = "Gaussian Quantiles")
spirals = SyntheticDatasets.make_twospirals(n_samples = 2000, 
                                            start_degrees = 90,
                                            total_degrees = 570, 
                                            noise = 0.1);
@df spirals scatter(:feature_1, :feature_2, group = :label, title = "Two Spirals")
kernel = SyntheticDatasets.make_halfkernel( n_samples = 1000, 
                                            minx = -20,
                                            r1 = 20, 
                                            r2 = 35,
                                            noise = 3.0, 
                                            ratio = 0.6);
@df kernel scatter(:feature_1, :feature_2, group = :label, title = "Half Kernel")
Some of the package's functions are interfaces to the dataset generators of ScikitLearn.
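The plotting calls above refer to the columns `:feature_1`, `:feature_2` and `:label`, which suggests that each generator returns a table with one column per feature plus a label column. The following is a minimal sketch for inspecting that result before plotting. It assumes the return value is a `DataFrame` from DataFrames.jl (an assumption, not stated above) and reuses the `make_blobs` arguments from the first example with a smaller sample size.

```julia
using SyntheticDatasets, DataFrames

# Same keyword arguments as the blobs example above, with fewer samples.
blobs = SyntheticDatasets.make_blobs(n_samples = 100,
                                     n_features = 2,
                                     centers = [-1 1; -0.5 0.5],
                                     cluster_std = 0.25,
                                     center_box = (-2.0, 2.0),
                                     shuffle = true,
                                     random_state = nothing)

# Assumption: `blobs` is a DataFrame, so the DataFrames.jl accessors apply.
println(names(blobs))       # column names used by the @df plotting macro
println(nrow(blobs))        # number of generated samples
println(first(blobs, 5))    # preview of the first five rows
```

If the result turns out to be a different table type, the same check can be done through whatever accessors that type provides.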
List of package datasets:
| Dataset | Title | Reference | 
|---|---|---|
| make_blobs | Generate isotropic Gaussian blobs for clustering. | link | 
| make_moons | Make two interleaving half circles. | link | 
| make_s_curve | Generate an S curve dataset. | link | 
| make_regression | Generate a random regression problem. | link | 
| make_classification | Generate a random n-class classification problem. | link | 
| make_friedman1 | Generate the “Friedman #1” regression problem. | link | 
| make_friedman2 | Generate the “Friedman #2” regression problem. | link | 
| make_friedman3 | Generate the “Friedman #3” regression problem. | link | 
| make_circles | Make a large circle containing a smaller circle in 2D. | link | 
| make_low_rank_matrix | Generate a mostly low rank matrix with bell-shaped singular values. | link | 
| make_swiss_roll | Generate a swiss roll dataset. | link | 
| make_hastie_10_2 | Generate data for binary classification used in Hastie et al. | link | 
| make_gaussian_quantiles | Generate isotropic Gaussian and label samples by quantile. | link | 
Disclaimer: SyntheticDatasets.jl borrows code and documentation from the scikit-learn datasets module, but it is not an official part of that project. It is licensed under the MIT License.
| Dataset | Title | Reference | 
|---|---|---|
| make_twospirals | Generate a two-spirals dataset. | link | 
| make_halfkernel | Generate a two half-kernel dataset. | link | 
| make_outlier | Generate an outlier dataset. | link | 
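
To give a feel for what a generator such as `make_twospirals` produces, here is a rough, self-contained sketch of one common way to build a two-spirals dataset. It is not the package's implementation: the helper name `sketch_twospirals`, the `rng` argument, and the exact formula are assumptions made for illustration. Only the parameter names `n_samples`, `start_degrees`, `total_degrees` and `noise` are taken from the example earlier in this document.

```julia
using Random

# Hypothetical helper for illustration only; not part of SyntheticDatasets.jl.
function sketch_twospirals(; n_samples = 2000, start_degrees = 90,
                             total_degrees = 570, noise = 0.1,
                             rng = Random.default_rng())
    n1 = n_samples ÷ 2
    n2 = n_samples - n1
    start = deg2rad(start_degrees)
    total = deg2rad(total_degrees)

    # One spiral arm: the radius grows with the angle, and `s` (+1 or -1)
    # mirrors the arm through the origin to obtain the second class.
    function arm(n, s)
        # sqrt biases angles outward so points do not crowd near the center.
        θ = start .+ sqrt.(rand(rng, n)) .* total
        x = s .* θ .* cos.(θ) .+ noise .* randn(rng, n)
        y = s .* θ .* sin.(θ) .+ noise .* randn(rng, n)
        return hcat(x, y)
    end

    features = vcat(arm(n1, 1.0), arm(n2, -1.0))   # n_samples × 2 matrix
    labels = vcat(zeros(Int, n1), ones(Int, n2))   # class 0, then class 1
    return features, labels
end

features, labels = sketch_twospirals(n_samples = 200, rng = MersenneTwister(1))
println(size(features))
println(count(==(0), labels), " / ", count(==(1), labels))
```

The sketch returns a plain matrix and a label vector rather than a table, which keeps it free of dependencies beyond the standard library.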
