Key Information

Register
Submit
The challenge is finished.

Challenge Overview

Prize

1st place - $2000

2nd place - $1000

3rd place - $500

Challenge Background

In the client’s Group, all Business Units (BUs) track their cash flow (CF), so the Group Finance team can monitor the overall Cash Flow (CF) performance. Due to the span between service delivery and payment to vendors, the Group faces the monthly challenge to correctly assess the impact of accounts payable (AP) on Cash Flow (CF). Can we develop (a set of) cash flow forecast models to identify the Group Cash Flow impact from Accounts Payable stemming from vendors in key countries? 

 

Some Useful Definitions:

Accounts payable (standard definition) are amounts due to vendors or suppliers for goods or services received that have not yet been paid for. The sum of all outstanding amounts owed to vendors is shown as the accounts payable balance on the company's balance sheet. 

Spend amount is the amount that client owe to vendors in local currency. 

Cost is the cost incurred by the client.

Challenge Data

The data for this challenge sources from 1) Internal Financial Reporting System - Account Payable, Cost; and 2) Procurement data (Spend Value), which are monthly data. We will have the training period ends in July 2019. And the forecast period begins from Aug 2019 and ends in Dec 2019.

 

There are around a total 224 reporting units for approx. 68 countries in the client’s portfolio. There are multiple reporting units for a particular country. Each country uses their local currency in the data. We have provided the file Key_Countries_RU.csv. It has following information:

  • Reporting Units for Particular Country

  • Country

This project is focused on forecasts for five key countries.

  • USA 

  • China 

  • Germany

  • Switzerland

  • Ireland

 

We have provided 3 files:

  • anonymized_train_data.csv: This is the major training data you should use. Each country will have 55 data points in the training set (Jan 2015 to July 2019) for Account Payable as well as Cost. 

  • anonymized_aggregated_data.csv: This is some related data that might be helpful to your predictions.

  • test_sheet_to_fill.csv: This is the test sheet you should fill in your projected cash flow.

 

The anonymized_train_data.csv has the following columns. The only difference is that the Accounts Payable and Cost columns in the test sheet are all empty. 

  • Country: Country (All 5 Key Countries data is present)

  • Date: Month_Year

  • Accounts Payable: Accounts Payable for any business events or movements

  • Cost: Cost incurred 

For different countries, the Account Payable and Cost are following different units. The aforementioned five key countries have different currencies so each country should have a separate unit. Specifically, USD (USA) / CNY (Yuan) / Euro (Germany/Ireland) / Switzerland (CHF). There is no standardized unit for all countries. 

 

The anonymized_aggregated_data.csv contains procurement related data/information:

  • NH_REPUN (Reporting Unit Within a Country): We can have more than one Reporting Units for each country

  • NH_ACCDOC (Account Number): It has been replaced by the md5 hashing value. 

  • NH_NETDUE (Net Due Date): The last date time until when the payments needs to be done to the vendor

  • NH_VCLRDT (Vendor Clearing Document Date): The actual date time when the payment was done to the vendor

  • 0CALDAY (Posting Date): The date time from when the services/product from the vendor was used/purchased. This is the date of record posted on the SAP platform.

  • NH_VENDOR__TEXT (Vendor Names): It has been replaced by the md5 hashing value.

  • CURRENCY002 (The Local Currency of Spend Value)

  • 00O2SPADWAPXIIOCRYG5A9QEI (Spend Value)

This file has complete procurement data with all the reporting units. However not all of them are required for forecasting - only five key countries/Reporting Unit data need to be filtered by looking at Key_Countries_RU.csv. To map the reporting units (Reporting Units for Particular Country) from Key_Countries_RU.csv to anonymized_aggregated_data.csv file, you can extract the 2nd to 6th characters of Column1 (Reporting Unit Within a country).

Task Details

In this Code challenge, you are asked to build models for cash flow forecasts on the 5 key countries mentioned earlier. It is preferred to have one model for all countries, and it is also acceptable to have 5 models for 5 countries. Cash Flow is the target variable. Cash flow is defined as the difference between the Accounts payable. For example, Cash Flow of Feb 2019 = AP of Feb 2019 - AP of Jan 2019.

 

We have prepared a test_sheet_to_fill.csv that your model must fill. Please rename it as “test_sheet.csv” when you make the submission. For different countries, the prediction values should follow their local currency units. You are free to build model on Account Payable or on Cash Flow but the final submission will have to be on Cash Flow.

 

Our client already has a baseline model. The focus of new model should be (1) a better solution in terms of error value; (2) any new data comes in model should be efficient - this will be kind of value addition; (3) one common model for five key countries is ideal solution however customer is also open to have individual model for each key country. Participants should use data wisely to get better insights - create features from data - spot peculiar anomalies in the data - see if any additional driver impacting cash flow. 

Scoring Function

We are going to use the evaluation metric wMAD (weighted mean absolute deviation). It is a weighted mean of the absolute error |Y_i - \hat{Y_i}|, where the weights are also the absolute errors, i.e., \sum{|Y_i - \hat{Y_i}| * |Y_i - \hat{Y_i}|} / \sum{|Y_i - \hat{Y_i}|}. Please check “wMAD Example.xlsx” for an example.

 

During evaluation, we will convert the local currencies to one single currency, e.g., USD, using a common exchange rate. The evaluation will be conducted at two levels: 

  1. Country level. We will evaluate “Cash Flow” for each country as a separate time-series. And then, we average their wMAD scores.

  2. Global level. We will first aggregate the “Cash Flow” of different countries and treat it as an aggregated time series. And then, we compute the wMAD scores.

The final score will be a weighted sum of 40% * country-level wMAD and 60% * global-level wMAD. More details could be found in the scorer.py shared in the forum. 

 

To make it more specific, we will go through the following steps during evaluation.

  1. We will take the actual values of the testing time frame (Aug’19 to Dec’19) and convert them into USD and the values for all the countries. This way we get aggregated values for all months (Aug’19 to Dec’19).

  2. Similarly, we will do the same above process for the predicted values

  3. We will calculate the wMAD values as shared earlier (Actual vs Predicted)

  4. Finally, we will rank the contestants mostly based on the wMAD score

Dataset Anonymization

‘Z-score’ is used to anonymize the real data. A variable can be uniquely identified by the triplet <Account-Mapping, Version, Market-Mapping>. For all the variables, following is the formula used to privatise the data:

 

            zi = (xi – μ) / σ

 

where zi = z-score of the ith value for the given variable

            xi  = actual value

            μ = mean of the given variable

            σ = standard deviation for the given variable

 

Other account and vendor names are replaced by their md5 values.

 
 


Final Submission Guidelines

Submission Format

Your submission must include the following items

  • The filled “test_sheet.csv” file.

  • A report about your model, including data analysis, model details, local train/test splits, k-fold cross validation results, and variable importance.

  • A deployment instruction about how to install required libs and how to run.

Expected in Submission

  1. Working Python code (could be Jupyter-Notebooks) which works on the different sets of data in the same format

  2. Report with clear explanation of all the steps taken to solve the challenge and how to run the code

  3. No hardcoding (e.g., hyper-parameters in your model must be adaptive to the training set) in the code is allowed. We will run the code on some different datasets

  4. All models in one code with clear inline comments

  5. Flexibility to extend the code to forecast for additional months

Judging Criteria

Your solution will be evaluated in a hybrid of quantitative and qualitative way.

  • Effectiveness (80%)

    • We will evaluate your forecasts by comparing it to the ground truth data.

    • The smaller the wMAD scores are, the better.

  • Clarity (10%)

    • The model is clearly described, with reasonable justifications about the choice.

  • Reproducibility (10%)

    • The results must be reproducible. We understand that there might be some randomness for ML models, but please try your best to keep the results the same or at least similar across different runs.

ELIGIBLE EVENTS:

2021 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

SHARE:

ID: 30150909