This is a dimensionality reduction algorithm whose goal is to maintain interpretability, i.e. we directly eliminate variables that don't seem to add any predictive power from potential models. This is accomplished by using decision trees to approximate a function between two variables. It is a modified version of the Predictive Power Score, inspired by Florian Wetschoreck's article [1].
We'll start with a set of observations, which can be further split into a set of features.
Decision trees are universal function approximators [2][3], which basically means we can split a two-dimensional subset of our data into different bins chosen by minimizing a cost function. In this case, the bin boundaries are chosen so as to minimize the error the tree model makes when making predictions. Splitting the data into bins amounts to constructing a function, but we need to understand how well this function does compared to a more naive model of prediction: taking the median of the target.
So we have two different models: a "smart" model (the decision tree) and a "naive" model (always predicting the median of the target).
We can compare how well the "smart" model does relative to the "naive" model by looking at the ratio of $\text{MAE}_{\text{smart}}$ to $\text{MAE}_{\text{naive}}$, which is defined as $r = \text{MAE}_{\text{smart}} / \text{MAE}_{\text{naive}}$.
As the smart model does better, this ratio becomes smaller; as the smart model does only as well as or worse than the naive model, the ratio becomes larger. Up to this point, this is pretty much just the Predictive Power Score. If our smart model is doing better than the naive model, then we have at least established that constructing a function between the two variables is worthwhile.
There are a number of properties it would be nice for our score to have when judging how well one variable predicts another.
The main thing we can do is to use a Gaussian to map the ratio to a bounded score.
The main advantage of doing this, besides bounding our score between
which implies that
| Low Score | Intermediate | High Score |
| --- | --- | --- |
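The exact Gaussian mapping is not spelled out above, so the following sketch assumes an illustrative form, $e^{-r^2}$, which sends a ratio of 0 to a score of 1 and large ratios toward 0 (this particular form is an assumption, not necessarily the author's choice):

```python
import numpy as np

def gaussian_score(r):
    """Map the MAE ratio r >= 0 into (0, 1]: r = 0 -> 1, large r -> near 0.
    The exp(-r**2) form is an illustrative assumption."""
    return float(np.exp(-np.asarray(r) ** 2))

print(gaussian_score(0.0))  # 1.0   : smart model is perfect
print(gaussian_score(1.0))  # ~0.37 : no better than the median baseline
print(gaussian_score(3.0))  # ~1e-4 : much worse than naive
```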
$$r = \frac{\text{MAE}_{\text{smart}}}{\text{MAE}_{\text{naive}}}$$
[1] Wetschoreck, Florian (Apr 23, 2020). "RIP correlation. Introducing the Predictive Power Score." https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598

[2] Mathonline. "The Simple Function Approximation Theorem." http://mathonline.wikidot.com/the-simple-function-approximation-theorem

[3] kenndanielso blog. "Universal Function Approximation." https://kenndanielso.github.io/mlrefined/blog_posts/12_Nonlinear_intro/12_5_Universal_approximation.html