CFO Forecasting - Mobile Market - Sandesh Brand 2 - Refinement

Key Information

The challenge is finished.

Challenge Overview

Challenge Objective


The objective of this challenge is to generate time-series forecasts with the highest possible accuracy for the 5 financial variables outlined below.


The accuracy of the forecast must at least improve on the Threshold targets quoted for each variable; these targets are based on the results of a previous challenge.


The model should be tailored to a 12-month forecast horizon but must be extendable beyond this period.


The accuracy of a prediction will be evaluated using MAPE (Mean Absolute Percentage Error) on the privatised data set over a period of 6 months, and on the maximum Absolute Percentage Error in any one month over the same period.
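The two evaluation metrics can be sketched as follows (the function names and the 6-month example values are illustrative, not part of the challenge data):

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error over the evaluation window."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)))

def max_ape(actual, predicted):
    """Worst single-month Absolute Percentage Error over the window."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.max(np.abs(actual - predicted) / np.abs(actual)))

# 6-month evaluation window (illustrative values only)
actual = [100.0, 102.0, 98.0, 105.0, 110.0, 107.0]
forecast = [101.0, 100.0, 99.0, 104.0, 108.0, 111.0]
print(mape(actual, forecast))     # mean APE across the 6 months
print(max_ape(actual, forecast))  # largest single-month APE
```

A forecast therefore has to keep both the average error and the single worst month under control.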



A mobile network, referred to here as Sandesh, has been creating high-quality, high-accuracy forecasts for a number of key financial metrics.


The first challenge in this process was held in February 2020, and a complete set of time-series forecasts was developed.  A selection of these now needs further improvement to achieve the required accuracy over the evaluation period: neither the MAPE nor the prediction trends were sufficiently close to the actuals.


The five target variables are financial metrics:

  • (Mobile Data) Revenue - Panther - the revenue generated by the subscriber base per month for the ‘airtime and data’ service for product Panther.  The initial prediction failed to anticipate the flattening performance from Apr ‘19.
    The training data set has now been extended forward to Aug ‘19.

  • Average Revenue per User (ARPU) - Panther - the average revenue paid by a subscriber per month for their ‘airtime and data allowance’ service.
    The initial prediction failed to model the steady decline from Apr ‘19, potentially starting as early as May ‘18.
    The training data set has now been extended forward to Aug ‘19.

  • Leavers (aka Churn) - Leopard and Panther - the number of subscribers per product who terminated service with the brand during that month. 
    The initial predictions failed to anticipate the steady climb in volumes from Apr ‘19 onwards.
    The data set has been extended to Aug ‘19.  Please see the ‘Switching - New regulation’ section for further insight.

  • Gross adds – Panther - the number of new subscribers to Panther joining the brand during a month. 
    A completely new data set for this variable has now been provided, invalidating the previous predictions.


Additional data sets are included to provide a range of independent variables that may prove valuable in forecasting the 5 target variables.  A complete list is included with the data set, but a selected number are described here:

  • Brand Net Promoter Score (Brand NPS). A measure of customer satisfaction with the product based on customers’ willingness to recommend the product to their friends and colleagues.

  • Leavers - Number of customers about to leave each month.  This is the number of customers in the final month of their contract.  These customers will therefore be legally allowed to leave the following month, or upgrade their contract without punitive charges.  This metric is a strong indicator of ‘Leavers’ volumes - please see Business insight section.
    This metric is closely linked to Gross Adds volumes from 24 months earlier.

  • Out of Contract %.  This is a measure of the proportion of the customer base that is no longer within its contract period.  These customers are legally allowed to leave without cost, and will therefore have a clear link to ‘Leavers’.  For historic reasons, this measure should only be considered from April 2016 onwards.


Business Insight


Switching - New regulation

In July 2019 the regulations changed, making it easier for customers to leave (churn).  This change in market conditions needs to be modelled into the predictions.

  1. The underlying Churn performance has increased dramatically across both products and needs to be reflected in any predictions. 

  2. In June and July, the actuals were distorted by the introduction of the regulations.  June performance is believed to have been reduced by a temporary drop in demand, and July increased as a result of the pent-up demand.  These two months can be treated as outliers; any treatment needs to be clearly documented, and reproducible as part of the model when used on the real data set.
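One possible documented, reproducible treatment is to mask the two distorted months and fill them by linear interpolation before fitting.  This is a minimal sketch with illustrative values, not the prescribed method:

```python
import numpy as np
import pandas as pd

# Illustrative monthly Leavers series (values are made up).
s = pd.Series(
    [1000.0, 1020.0, 1015.0, 1030.0, 1045.0, 700.0, 1600.0, 1080.0],
    index=pd.period_range("2019-01", "2019-08", freq="M"),
)

# Jun 2019 was suppressed and Jul 2019 inflated by the regulation change,
# so mask both months and interpolate between the surrounding actuals.
cleaned = s.copy()
cleaned[pd.Period("2019-06", freq="M")] = np.nan
cleaned[pd.Period("2019-07", freq="M")] = np.nan
cleaned = cleaned.interpolate()
```

Whatever treatment is chosen (interpolation, intervention dummies, etc.), it must be applied by the code itself so it reproduces on the real data set.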


Training data has therefore been provided until Aug 2019.  



Factors impacting Leavers 

It is important to note that a number of secondary factors will be contributing to the Churn performance.  These factors, if included, may improve accuracy. In addition to the change in switching regulations, the following factors may cause change in Churn

  • Out of Contract %.  Describes the number of customers eligible to leave.

  • Leavers - the number of customers about to churn per month.  Describes the number of customers about to come out of contract, driving an increase in ‘Out of Contract %’.

  • Gross Adds.  Volumes from two years previously for Leopard and one year previously for Panther closely define the ‘number of customers about to churn per month’, as these customers come to the end of their contract period.

  • Upgrades - volume of existing customers changing their contract.
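The lagged relationship described above can be turned into model features by shifting Gross Adds by the typical contract length.  A sketch, with made-up values and illustrative column names:

```python
import pandas as pd

# Illustrative Gross Adds history (values are made up).
gross_adds = pd.Series(
    range(100, 140),
    index=pd.period_range("2016-01", "2019-04", freq="M"),
    name="gross_adds",
)

# Lag by the contract length to approximate 'customers about to churn':
# 24 months for Leopard (2-year contracts), 12 months for Panther (1-year).
features = pd.DataFrame({
    "about_to_churn_leopard": gross_adds.shift(24),
    "about_to_churn_panther": gross_adds.shift(12),
})
```

The early months are NaN because no acquisition history exists that far back; they would be dropped or imputed before training.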


Financial Year modeling:


Sandesh reports its financial year from April to March.  This may contribute to seasonality based on the financial year and its quarters (ending Jun, Sep, Dec, and Mar), rather than the calendar year.
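Fiscal-calendar features can expose this seasonality to a model.  A sketch (the feature names are illustrative):

```python
import pandas as pd

# Map calendar months onto the April–March financial year so seasonal
# terms can follow fiscal quarters (ending Jun, Sep, Dec, Mar).
months = pd.period_range("2018-04", "2019-08", freq="M")
df = pd.DataFrame({"period": months})
cal_month = df["period"].dt.month
df["fiscal_month"] = (cal_month - 4) % 12 + 1             # Apr=1 ... Mar=12
df["fiscal_quarter"] = (df["fiscal_month"] - 1) // 3 + 1  # Q1=Apr–Jun, ...
df["is_quarter_end"] = cal_month.isin([6, 9, 12, 3])
```

These columns can then feed seasonal dummies or a seasonal component aligned to the fiscal year instead of the calendar year.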


Anonymised and Privatised data set:


‘Z-score’ is used to privatise the real data.


For all variables, the following formula is used to privatise the data:

            zi = (xi – μ) / σ


where zi = z-score of the ith value for the given variable

            xi  = actual value

            μ = mean of the given variable

            σ = standard deviation for the given variable
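In code, the transform above looks like this (the sample values are illustrative; the real μ and σ per variable are not disclosed, so all modelling happens directly on the z-scores):

```python
import numpy as np

# Illustrative raw values for one variable.
x = np.array([120.0, 135.0, 128.0, 140.0, 132.0])

# Privatisation: z-score each variable with its own mean and std dev.
mu, sigma = x.mean(), x.std()
z = (x - mu) / sigma

# The transform would be invertible if mu and sigma were known,
# but they are withheld for the privatised challenge data.
recovered = z * sigma + mu
```

Note that z-scored series are centred on zero and can be negative, which matters for percentage-based error metrics.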


Targets and Thresholds


The performance of the models on privatised data is not directly correlated with performance on real data.  Please use the provided foundation model - 351686 - as a benchmark.


Your submission will be judged on two criteria.

  1. Minimizing error (MAPE).

  2. Achieving the Thresholds and Targets designated in the tables above.


It is recommended to optimise the models to minimise RMSE rather than MAPE, as the privatisation method used (see the ‘Anonymised and Privatised data set’ section above) can distort the error analysis.
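A likely reason for this recommendation: z-scored ground truth values cluster around zero, and a single near-zero actual makes a percentage error explode even for a tiny absolute miss, while RMSE is unaffected.  An illustrative sketch:

```python
import numpy as np

def mape(gt, pred):
    return float(np.mean(np.abs(gt - pred) / np.abs(gt)))

def rmse(gt, pred):
    return float(np.sqrt(np.mean((gt - pred) ** 2)))

# Z-scored series (made-up values); every absolute miss is 0.1.
gt = np.array([0.8, -1.1, 0.02, 1.3])
pred = np.array([0.7, -1.0, 0.12, 1.2])

print(rmse(gt, pred))  # 0.1 for every point, so RMSE stays small
print(mape(gt, pred))  # dominated by the near-zero month: 0.1/0.02 = 5.0
```

Optimising RMSE during training while reporting MAPE for evaluation avoids letting one near-zero month dominate the fit.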


The details will be outlined in the Quantitative Scoring section below.


Final Submission Guidelines

Submission Format

Your submission must include the following items:

  • The filled test data. We will evaluate the results quantitatively (See below)

    • Please use Time Period and the Generic Keys as the column names. 

    • The values in the Time Period column are of the form 2019-08.

    • The values in each Generic Key column are the predicted values, i.e., floating-point numbers.

    • The final spreadsheet has an N×(M+1) shape, where N is the number of time periods and M is the number of variables to predict in this challenge; the “+1” is for the Time Period column.

  • A report about your model, including data analysis, model details, local cross validation results, and variable importance. 

  • Deployment instructions describing how to install the required libraries and how to run the code.
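The expected N×(M+1) layout can be sketched as follows; the Generic Key column names here are placeholders, as the real keys come from the provided data set:

```python
import pandas as pd

# 12 forecast months following the Aug 2019 training cut-off.
periods = pd.period_range("2019-09", "2020-08", freq="M").astype(str)

# One "Time Period" column plus one column per Generic Key;
# zeros stand in for the model's predicted floats.
submission = pd.DataFrame({
    "Time Period": periods,     # values like "2019-09"
    "GenericKey1": [0.0] * 12,  # predicted values (floats)
    "GenericKey2": [0.0] * 12,
})
submission.to_csv("submission.csv", index=False)
```

Here N = 12 and M = 2, giving the 12×3 shape described above.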

Expected in Submission

  1. Working Python code which works on the different sets of data in the same format

  2. Report with a clear explanation of all the steps taken to solve the challenge (refer to the “Challenge Details” section) and of how to run the code

  3. No hardcoding (e.g., column names, possible values of each column, ...) is allowed in the code. We will run the code on different datasets

  4. All models in one code with clear inline comments 

  5. Flexibility to extend the code to forecast for additional months

Quantitative Scoring

Given two values, one ground truth value (gt) and one predicted value (pred), we define the relative error as:


    MAPE(gt, pred) = |gt - pred| / |gt|


We then compute the raw_score(gt, pred) as


    raw_score(gt, pred) = max{ 0, 1 - MAPE(gt, pred) }


That is, if the relative error exceeds 100%, you will receive a zero score for that point.


The final MAPE score for each variable is computed based on the average of raw_score, and then multiplied by 100.


Final score = 100 * average( raw_score(gt, pred) )
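The scoring above can be sketched directly from these definitions.  The absolute value in the denominator is an assumption here, to keep the relative error non-negative on the z-scored (possibly negative) ground truth:

```python
import numpy as np

def raw_score(gt, pred):
    """Per-point score: 1 - relative error, floored at zero."""
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rel_err = np.abs(gt - pred) / np.abs(gt)
    return np.maximum(0.0, 1.0 - rel_err)

def final_score(gt, pred):
    """Average raw_score scaled to 0-100 for one variable."""
    return 100.0 * float(np.mean(raw_score(gt, pred)))

# Illustrative values: the third point's error exceeds 100%, so it scores 0.
print(final_score([100, 200, 50], [110, 190, 200]))
```

With these inputs the per-point scores are 0.9, 0.95, and 0.0, giving a final score of about 61.7.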


MAPE scores will be 50% of the total scoring.


You will also receive a score between 0 and 1 for the thresholds and targets that you achieve.  Each threshold is worth 0.033 points and each target is worth 0.05 points.  If you achieve the target for a particular variable you also receive the threshold points, so you earn 0.083 points for that variable.  Your points for all the variables are added together.

Judging Criteria

Your solution will be evaluated in a hybrid quantitative and qualitative way. 

  • Effectiveness (80%)

    • We will evaluate your forecasts by comparing them to the ground truth data. Please check the “Quantitative Scoring” section for details.

    • The smaller the MAPE, the better. 

    • Please review the targets and thresholds above as these will be included in the scoring.

  • Clarity (10%)

    • The model is clearly described, with reasonable justifications about the choice.

  • Reproducibility (10%)

    • The results must be reproducible. We understand that there might be some randomness for ML models, but please try your best to keep the results the same or at least similar across different runs.


Reliability Rating and Bonus

For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


Final Review:

Community Review Board


User Sign-Off


Review Scorecard