SHAP Feature Analysis Code Challenge

Key Information

The challenge is finished.



1st place - $800

2nd place - $400

3rd place - $200

Challenge Overview

We currently have four working algorithm-based forecast models, and potentially a fifth. All five were derived from the same datasets, each using a different modeling technique.


Different approaches to Feature Analysis were tried on the various models, with mixed results and limited insight or explainability. The purpose of this exercise is threefold:

  • Interpretation Model: Complete model-agnostic Feature Analysis to compare and contrast the Feature Importance of the various models. The results will be used to generate Business Insight to improve confidence in the model(s) through explainability.

  • Next Steps: Guide the next phases of forecast model development by validating which models should form the basis for further iterations and by steering the sourcing of potentially valuable datasets to improve accuracy in subsequent phases.

  • Benchmark: Create a benchmark from which the project can build and against which progress and refinement can be tracked over the course of the project.


In this challenge, we have 5 target variables for each Product.

  1. Gross Adds

  2. Churn

  3. ARPU

  4. Closing Base

  5. Revenue


Note: Derived Feature vs. Original Variable 

Note that, in each model, the "derived features" may differ from the "original variables". For example, you may see a feature such as "the previous/delta value of variable X". We would like the importance analysis at the "original variable" level rather than the "derived feature" level, leading to conclusions such as "variable X is critically impactful (positively or negatively) to variable Y".
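One possible way to move from the "derived feature" level to the "original variable" level is to sum the SHAP values of all derived features that come from the same variable, row by row. This relies on the additivity of SHAP values (per-row attributions sum to the prediction minus the base value), so the grouped totals still add up to the same quantity. The sketch below is only an illustration under that assumption: the feature names, the `feature_to_variable` mapping, and the SHAP numbers are hypothetical, and in practice the SHAP matrix would come from an explainer such as the `shap` package.

```python
from collections import defaultdict


def aggregate_to_variables(shap_rows, feature_names, feature_to_variable):
    """Sum per-feature SHAP values into per-variable SHAP values, row by row."""
    variables = sorted({feature_to_variable[name] for name in feature_names})
    aggregated = []
    for row in shap_rows:
        totals = defaultdict(float)
        for name, value in zip(feature_names, row):
            totals[feature_to_variable[name]] += value
        aggregated.append([totals[v] for v in variables])
    return variables, aggregated


def mean_abs_ranking(variables, aggregated):
    """Rank variables by mean absolute (aggregated) SHAP value, largest first."""
    n_rows = len(aggregated)
    scores = {
        v: sum(abs(row[j]) for row in aggregated) / n_rows
        for j, v in enumerate(variables)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: three derived features trace back to "price".
feature_names = ["price", "price_lag1", "price_delta", "promo"]
feature_to_variable = {
    "price": "price",
    "price_lag1": "price",
    "price_delta": "price",
    "promo": "promo",
}
# Hypothetical SHAP values: one row per prediction, one column per feature.
shap_rows = [
    [0.5, -0.25, 0.25, 0.125],
    [-0.5, 0.25, -0.25, -0.375],
]

variables, aggregated = aggregate_to_variables(
    shap_rows, feature_names, feature_to_variable
)
ranking = mean_abs_ranking(variables, aggregated)
print(ranking)
```

Summing signed values before taking the absolute value lets opposing contributions of lagged and delta features cancel, so the result reflects the net effect of the variable. Whether that is the desired behavior is a design choice that should be stated in the report.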


Challenge Goals


In this challenge, we require you to do the following things.

  • For all models and target variables listed above, complete a SHAP analysis to a sufficient depth of variables, including standard "scatter" plots, a mean absolute Shapley ranking, and partial dependence plots (PDPs) for key variables. Extra value will be attributed to PDPs that bring out significant second-order impacts.

  • Provide an explanation of the key variables, particularly if the dataset has been significantly transformed during the modeling process.

  • Propose and complete an alternative model-agnostic Feature Importance Analysis if you judge that it yields a better interpretation model.

  • SHAP analysis and plots are required. Examples of the expected SHAP plots are listed below.

Example SHAP Summary Plot


Example SHAP Feature Importance Plot


Example SHAP Dependence plots - please include for all Target variables


Example SHAP Dependence plot - with second order impact


Example SHAP explanation force plots - please create these for all predicted values for the 5 Target variables and 3 products.
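The partial dependence requested in the goals above, including the second-order case, can be sketched without reference to any particular model. The code below is a minimal illustration that assumes each model can be exposed as a function mapping a list of rows to a list of predictions. `toy_predict` and the data are hypothetical stand-ins, and a library implementation (for example the one in scikit-learn) would normally be preferred for real plots.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one column is forced to each grid value."""
    curve = []
    for value in grid:
        modified = []
        for row in rows:
            new_row = list(row)
            new_row[feature_index] = value
            modified.append(new_row)
        preds = predict(modified)
        curve.append(sum(preds) / len(preds))
    return curve


def partial_dependence_2d(predict, rows, i, j, grid_i, grid_j):
    """Two-way partial dependence: force columns i and j jointly."""
    surface = []
    for value_i in grid_i:
        line = []
        for value_j in grid_j:
            modified = []
            for row in rows:
                new_row = list(row)
                new_row[i], new_row[j] = value_i, value_j
                modified.append(new_row)
            preds = predict(modified)
            line.append(sum(preds) / len(preds))
        surface.append(line)
    return surface


def toy_predict(rows):
    """Hypothetical black-box model with an interaction between columns 0 and 1."""
    return [2.0 * r[0] + r[0] * r[1] for r in rows]


rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0]]

curve = partial_dependence(toy_predict, rows, 0, [0.0, 1.0, 2.0])
surface = partial_dependence_2d(toy_predict, rows, 0, 1, [0.0, 1.0], [0.0, 2.0])
print(curve)
print(surface)
```

In the two-way surface, the effect of raising column 0 from 0 to 1 changes with the value of column 1. That kind of interaction is the second-order impact a one-way curve averages away.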


Final Submission Guidelines


Your submission should contain one codebase that works for all 4 submissions. In other words, your feature analysis should plug into each of the 4 submissions and generate the required analysis and plots.
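One way to keep a single codebase pluggable is to make the analysis depend only on a prediction callable rather than on any model internals. The sketch below illustrates this with permutation importance, one candidate for an alternative model-agnostic importance measure. The function names, the toy model `model_a`, and the data are hypothetical, and the mean absolute error metric is an assumption rather than something the challenge prescribes.

```python
import random


def mean_abs_error(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=5, seed=0):
    """Average increase in error after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y_true, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = []
            for row, value in zip(rows, column):
                new_row = list(row)
                new_row[j] = value
                shuffled.append(new_row)
            error = mean_abs_error(y_true, predict(shuffled))
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def model_a(rows):
    """Hypothetical model: uses column 0 and ignores column 1."""
    return [3.0 * r[0] for r in rows]


rows = [[float(i), float(10 - i)] for i in range(6)]
y_true = model_a(rows)  # toy targets, so the baseline error is zero

scores = permutation_importance(model_a, rows, y_true)
print(scores)
```

Because only `predict` is model-specific, the same function could be called once per model and per target variable, with a thin adapter wrapping each model's own prediction interface.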


Please document your solution carefully and make sure it is robust enough that we can run it seamlessly on an unseen but similar input file. A README file explaining how to run and use your code should be included.


A report summarizing your ideas and code at a high level is also required. Please also include the example results you got based on the provided dataset.

Judging Criteria

  • Insightful (50%)

    • Your feature analysis must provide insightful results to the business analysis.

    • Your feature analysis must be at the “variable” level, instead of the “feature” level.

  • Seamless (30%)

    • The code of your feature analysis must be easy to use.

    • The code must be robust enough to deal with some unseen but similar input.

  • Clear (20%)

    • Your report and README files must be clear and easy to follow.

    • Figures and charts are encouraged to improve readability. 

Reliability Rating and Bonus

For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


Final Review:

Community Review Board


User Sign-Off


Review Scorecard