Simpsons.jl

Check data for a Simpson's statistical paradox
Author: wherrera10
Popularity: 1 star
Last updated: 1 year ago
Started in: May 2021

Julia module to check data for a Simpson's statistical paradox

Usage

using Simpsons

has_simpsons_paradox(df, cause, effect, factor; continuous_threshold = 5, verbose = true)

Returns true if the cause and effect columns of the DataFrame df, when grouped by factor, exhibit Simpson's paradox. A continuous factor (one with continuous_threshold or more discrete levels) is grouped into a binary factor to avoid too many clusters. If verbose is true, prints the regression slope directions for the overall data and for the groups.
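The slope-reversal idea described above can be sketched in plain Julia. This is not the package's implementation: slope and slopes_reverse are hypothetical helpers, and the rule used here (every subgroup slope has the opposite sign to the pooled slope) is one assumed reading of "exhibits Simpson's paradox".

```julia
using Statistics  # standard library: cov, var

# Least-squares slope of y regressed on x.
slope(x, y) = cov(x, y) / var(x)

# Hypothetical helper, not part of Simpsons.jl: true when every within-group
# slope has the opposite sign to the slope of the pooled data.
function slopes_reverse(x, y, groups)
    overall = sign(slope(x, y))
    by_group = [sign(slope(x[groups .== g], y[groups .== g])) for g in unique(groups)]
    return overall != 0 && all(s -> s == -overall, by_group)
end

# Two groups: y falls with x inside each group, but the second group sits
# higher on both axes, so the pooled trend rises.
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [3.0, 2.0, 1.0, 8.0, 7.0, 6.0]
g = [1, 1, 1, 2, 2, 2]

println("pooled slope: ", slope(x, y))
println("reversal in every group: ", slopes_reverse(x, y, g))
```

The pooled slope is positive while each group has slope -1, which is the pattern has_simpsons_paradox is documented to detect.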


simpsons_analysis(df, cause_column, effect_column; verbose = true, show_plots = true)

Analyze the dataframe df, assuming the cause is in cause_column and the effect is in effect_column. Outputs data including any Simpson's paradox type first-degree slope reversals found in subgroups. Plots are shown if show_plots is true (the default).


make_paradox(nsubgroups = 3, N = 8192)

Returns a dataframe containing N rows of random data in three columns, :x (cause), :y (effect), and :z (cofactor), that exhibits Simpson's paradox.
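As an illustration of how such data can arise, here is a minimal sketch of a generator. It is an assumption for demonstration only, not how make_paradox works: toy_paradox, its parameters, and the chosen spreads are all hypothetical. Each subgroup is centred further up and to the right, while y falls with x inside the subgroup.

```julia
using Random, Statistics  # standard libraries

# Hypothetical generator, not the package's make_paradox: subgroup k is
# centred at (3k, 3k) and has a within-group slope close to -1.
function toy_paradox(nsubgroups = 3, n_per_group = 200; rng = MersenneTwister(1))
    x = Float64[]; y = Float64[]; z = Int[]
    for k in 1:nsubgroups
        dx = 0.3 .* randn(rng, n_per_group)       # spread of x within the subgroup
        noise = 0.05 .* randn(rng, n_per_group)   # small noise on y
        append!(x, 3k .+ dx)
        append!(y, 3k .- dx .+ noise)             # y falls as x rises within the group
        append!(z, fill(k, n_per_group))
    end
    return (x = x, y = y, z = z)   # NamedTuple of equal-length columns
end

t = toy_paradox()
println("pooled slope: ", cov(t.x, t.y) / var(t.x))
for k in unique(t.z)
    idx = t.z .== k
    println("subgroup ", k, " slope: ", cov(t.x[idx], t.y[idx]) / var(t.x[idx]))
end
```

The pooled slope comes out positive because the spacing between subgroup centres dominates the within-group spread, while each subgroup slope is negative.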


plot_clusters(df, cause, effect)

Plot, with subplots, the clustering of the dataframe df, with cause and effect as the plot axes and points color coded by cluster. The clustering uses kmeans analysis on all fields of the dataframe, with 2 to 5 as the cluster numbers.


plot_kmeans_by_factor(df, cause_column, effect_column, factor_column)

Plot the dataframe df with cause_column as X and effect_column as Y, with factor_column used for kmeans clustering into 2 clusters on the plot.
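To make the idea of clustering a single factor column into 2 clusters concrete, here is a minimal sketch of one-dimensional two-means (Lloyd's algorithm with k = 2) in plain Julia. two_means_labels is a hypothetical helper written for this illustration; the package's own clustering code may differ.

```julia
using Statistics  # standard library: mean

# Hypothetical helper, not part of Simpsons.jl: Lloyd's algorithm for k = 2 on
# one numeric column. Returns labels 1 (lower cluster) or 2 (upper cluster).
function two_means_labels(v::AbstractVector{<:Real}; maxiter = 100)
    lo, hi = float(minimum(v)), float(maximum(v))   # initial centres
    lo == hi && return ones(Int, length(v))          # constant column: one cluster
    labels = ones(Int, length(v))
    for _ in 1:maxiter
        # Assign each value to the nearer centre (ties go to the lower one).
        new_labels = [abs(val - lo) <= abs(val - hi) ? 1 : 2 for val in v]
        # Move each centre to the mean of its members.
        lo = mean(v[new_labels .== 1])
        hi = mean(v[new_labels .== 2])
        new_labels == labels && break                # assignments stable: converged
        labels = new_labels
    end
    return labels
end

z = [0.1, 0.3, 0.2, 5.0, 5.2, 4.9]
println(two_means_labels(z))   # [1, 1, 1, 2, 2, 2]
```

Labels of this kind could then serve as a binary grouping variable when comparing regression slopes between the two clusters.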


Examples

using Simpsons

# Create a dataframe with cause :x, effect :y, and cofactor :z columns
dfp = make_paradox()

# Test for Simpson's paradox, where the direction of the regression of :y on :x
#    reverses if the data is split by factor :z.
has_simpsons_paradox(dfp, :x, :y, :z)  # true with this data

# Analyze with plots made of data clustering. 
# To see the plots, run in the REPL so the plot display does not close prematurely.
simpsons_analysis(dfp, :x, :y)



Installation

Install the package using the package manager (press ] at the julia> prompt to enter pkg> mode):

(v1) pkg> add Simpsons
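The same installation can also be done from a script or notebook through Julia's standard Pkg API, which is convenient when no interactive pkg> prompt is available. This sketch needs network access to the package registry.

```julia
using Pkg             # Julia's built-in package manager API
Pkg.add("Simpsons")   # equivalent to `add Simpsons` at the pkg> prompt

using Simpsons        # load the package once it is installed
```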
