## Simpsons.jl

Check data for a Simpson's statistical paradox.

Author: wherrera10

Started in May 2021; last updated 1 year ago.

A Julia module for checking data for Simpson's statistical paradox.

### Usage

```julia
using Simpsons

has_simpsons_paradox(df, cause, effect, factor; continuous_threshold = 5, verbose = true)
```

Returns `true` if the `cause` and `effect` columns of the DataFrame `df`, grouped by `factor`, exhibit Simpson's paradox. A continuous `factor` (one with `continuous_threshold` or more distinct levels) is first grouped into a binary factor to avoid creating too many clusters. If `verbose` is true, prints the directions of the regression slopes for the overall data and for each group.
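The slope-reversal test described above can be sketched with plain vectors and the `Statistics` standard library. This is a hypothetical illustration, not the package's code: `ols_slope`, `binarize`, and `slope_reversal` are names invented here, the median split stands in for whatever binary grouping Simpsons.jl actually applies to a continuous factor, and we assume the strict criterion that every subgroup slope must have the opposite sign to the pooled slope.

```julia
using Statistics

# Hypothetical helpers for illustration; these are not part of the Simpsons.jl API.

# Least-squares slope of y regressed on x.
ols_slope(x, y) = cov(x, y) / var(x)

# Collapse a factor with `threshold` or more distinct levels into two groups.
# Assumption: a median split (Bool labels), standing in for the package's grouping.
function binarize(z; threshold = 5)
    length(unique(z)) < threshold && return z
    return z .> median(z)
end

# Assumed strict criterion: the pooled slope is nonzero and every subgroup
# slope has the opposite sign. A subgroup with fewer than two distinct x
# values gives a NaN slope and therefore never counts as reversed.
function slope_reversal(x, y, z; threshold = 5, verbose = false)
    g = binarize(z; threshold = threshold)
    pooled = sign(ols_slope(x, y))
    groups = unique(g)
    signs = [sign(ols_slope(x[g .== k], y[g .== k])) for k in groups]
    verbose && println("pooled slope sign: ", pooled, ", subgroup slope signs: ", signs)
    return pooled != 0 && all(==(-pooled), signs)
end
```

Passing the cause itself as `z` exercises the continuous branch: with six distinct levels, it is split at its median into a low and a high group before the slopes are compared.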

```julia
simpsons_analysis(df, cause_column, effect_column; verbose = true, show_plots = true)
```

Analyzes the dataframe `df`, treating `cause_column` as the cause and `effect_column` as the effect. Outputs the results, including any Simpson's-paradox-type reversals of first-degree (linear) regression slopes found in subgroups. Plots are shown if `show_plots` is true (the default).
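The first-degree (straight-line) fits behind such a report can be sketched with Julia's `\` least-squares solve. This is a hypothetical sketch rather than the package's implementation: `linear_fit` and `slope_report` are invented names, and the code simply fits an intercept and slope overall and within each subgroup, then lists the subgroups whose slope sign is reversed relative to the overall fit.

```julia
using LinearAlgebra

# Hypothetical helpers for illustration; not part of the Simpsons.jl API.

# Fit y ≈ a + b*x by least squares; `\` on a tall dense matrix solves via QR.
function linear_fit(x, y)
    A = hcat(ones(length(x)), x)   # design matrix for a first-degree polynomial
    a, b = A \ y
    return (intercept = a, slope = b)
end

# Fit overall and per subgroup of `z`; report the subgroups whose slope sign
# is opposite to the (nonzero) overall slope sign.
function slope_report(x, y, z)
    overall = linear_fit(x, y)
    fits = Dict(k => linear_fit(x[z .== k], y[z .== k]) for k in unique(z))
    reversed = [k for (k, f) in fits if sign(f.slope) == -sign(overall.slope) != 0]
    return overall, fits, reversed
end
```

Because `fits` is a `Dict`, the order of `reversed` is not guaranteed; sort it if a stable listing is needed.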

```julia
make_paradox(nsubgroups = 3, N = 8192)
```

Returns a dataframe of `N` rows of random data in three columns, `:x` (cause), `:y` (effect), and `:z` (cofactor), that exhibits Simpson's paradox.
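One common way to construct such data is to give each subgroup a negative within-group slope while placing the subgroup centers along a rising diagonal, so the pooled trend is positive. The sketch below uses only the `Random` and `Statistics` standard libraries and returns plain vectors instead of a DataFrame. `paradox_data`, the spacing of 4 between centers, and the noise level are assumptions for illustration, not how `make_paradox` is actually implemented.

```julia
using Random, Statistics

# Hypothetical generator (not the package's make_paradox): within each subgroup k,
# y falls with slope -1 around the center (4k, 4k); the centers rise with k,
# so the pooled regression slope of y on x is positive.
function paradox_data(nsubgroups = 3, N = 8192; rng = MersenneTwister(42))
    x, y, z = Float64[], Float64[], Int[]
    per = N ÷ nsubgroups
    for k in 1:nsubgroups
        n = k == nsubgroups ? N - per * (nsubgroups - 1) : per   # last group takes the remainder
        c = 4.0 * k                                   # subgroup center on the diagonal
        xs = c .+ randn(rng, n)
        ys = c .- (xs .- c) .+ 0.3 .* randn(rng, n)   # within-group slope of -1
        append!(x, xs); append!(y, ys); append!(z, fill(k, n))
    end
    return x, y, z
end
```

With the defaults, `cov(x, y) / var(x)` is positive for the pooled data, while the same quantity computed within each value of `z` is negative: the between-group spread of the centers outweighs the within-group downward trend.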

```julia
plot_clusters(df, cause, effect)
```

Plots, as subplots, k-means clusterings of the dataframe `df`, with `cause` and `effect` on the axes and points color-coded by cluster. The k-means analysis uses all fields of the dataframe, with cluster counts from 2 to 5.
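For readers unfamiliar with k-means, the sketch below implements Lloyd's algorithm using only the standard library and can be run for each of the cluster counts 2 to 5 mentioned above. It does not reproduce whatever clustering routine Simpsons.jl calls: `kmeans_labels` is a hypothetical name, and the deterministic farthest-point (maximin) seeding and the iteration cap are choices made here for illustration.

```julia
using Statistics

# Hypothetical sketch of Lloyd's k-means; not the routine Simpsons.jl uses.
# X is d × n (one column per observation); returns a label in 1:k for each column.
function kmeans_labels(X::AbstractMatrix{<:Real}, k::Int; iters = 100)
    d, n = size(X)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= number of observations"))
    # Farthest-point (maximin) seeding: start at the first observation, then
    # repeatedly add the observation farthest from all centers chosen so far.
    centers = zeros(d, k)
    centers[:, 1] = X[:, 1]
    for c in 2:k
        dist = [minimum(sum(abs2, X[:, j] .- centers[:, m]) for m in 1:c-1) for j in 1:n]
        centers[:, c] = X[:, argmax(dist)]
    end
    labels = zeros(Int, n)
    for _ in 1:iters
        # Assignment step: nearest center by squared Euclidean distance.
        newlabels = [argmin([sum(abs2, X[:, j] .- centers[:, c]) for c in 1:k]) for j in 1:n]
        newlabels == labels && break              # assignments stopped changing
        labels = newlabels
        # Update step: move each nonempty cluster's center to its members' mean.
        for c in 1:k
            members = findall(==(c), labels)
            isempty(members) || (centers[:, c] = vec(mean(X[:, members]; dims = 2)))
        end
    end
    return labels
end
```

A dataframe row corresponds to one column of `X` here, so a numeric table would be transposed first. The labels for `k = 2:5` could then color four scatter subplots of cause against effect.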

```julia
plot_kmeans_by_factor(df, cause_column, effect_column, factor_column)
```

Plots the dataframe `df` with the cause on the X axis and the effect on the Y axis, color-coded by a k-means clustering of `factor_column` into 2 clusters.
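Clustering a single factor column into two groups is a one-dimensional problem, and there the k-means optimum can be found exactly: after sorting, the optimal clusters are contiguous, so it suffices to try every split point. The sketch below assumes a numeric factor. `two_means_1d` is a hypothetical name, and Simpsons.jl may well use an iterative k-means routine instead.

```julia
using Statistics

# Hypothetical sketch (not the Simpsons.jl API): exact two-cluster k-means for a
# numeric vector. Sorted 1-D clusters are contiguous, so every split is tried and
# the one with the smallest total within-cluster sum of squares is kept.
# This simple version is O(n^2); prefix sums would make it O(n log n) overall.
function two_means_1d(z::AbstractVector{<:Real})
    n = length(z)
    n < 2 && throw(ArgumentError("need at least two values"))
    s = sort(z)
    sse(v) = sum(abs2, v .- mean(v))
    best = argmin([sse(s[1:i]) + sse(s[i+1:end]) for i in 1:n-1])
    cut = (s[best] + s[best + 1]) / 2        # boundary between the two clusters
    return [v > cut ? 2 : 1 for v in z]      # 1 = low cluster, 2 = high cluster
end
```

The returned labels, in the original row order, can color a scatter plot of cause against effect.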

### Examples

```julia
using Simpsons

# Create a dataframe with cause :x, effect :y, and cofactor :z columns
dfp = make_paradox()

# Test for a Simpson's paradox, where the direction of the regression of :y on :x
#    reverses when the data is split by factor :z.
has_simpsons_paradox(dfp, :x, :y, :z)  # true with this data

# Analyze, with plots of the data clustering.
# To see the plots, run this in the REPL so the plot windows are not closed prematurely.
simpsons_analysis(dfp, :x, :y)
```

### Installation

Install the package with the package manager (press `]` to enter `pkg>` mode):

```
(v1) pkg> add Simpsons
```
