Topcoder Challenge | Topcoder Community

Challenge Overview

Challenge Details

Welcome to the Beacon - Cash Flow Forecast Collaboration challenge! This is a new type of Topcoder data science challenge, called Beacon, whose detailed rules are explained below. In short, we ask for community feedback on the client's problem, on the feasibility of solving it through a Topcoder competition, and on what the parameters of that competition should be to engage the community and achieve the outcomes expected by the client.

Challenge Background

In the client’s Group, all Business Units (BUs) track their cash flow (CF) so that the Group Finance team can monitor overall CF performance. Because of the time span between service delivery and payment to vendors, the Group faces a monthly challenge in correctly assessing the impact of accounts payable (AP) on CF. Can we develop a cash flow forecast model to identify the Group CF impact of AP stemming from vendors in key countries? Data drift and model drift are major problems in these types of models. The customer is looking for solutions to define, measure, control, and predict drift in data and models. Drift happens over time, so it is a key challenge that needs constant evaluation and modification.

Some Useful Definitions:

Accounts payable (standard definition) are amounts due to vendors or suppliers for goods or services received that have not yet been paid for. The sum of all outstanding amounts owed to vendors is shown as the accounts payable balance on the company's balance sheet.

Spend amount is the amount that the client owes to vendors, in local currency.

Cost is the cost incurred by the client.

Data

The data to be shared comes from two sources: 1) the Foreign Credit Reporting System (FCRS), which provides Accounts Payable and Cost; and 2) procurement data (Spend Value). Both are monthly data with a timeframe from Jan 2015 to July 2019.

The customer has provided us with the file Key_Countries_RU.csv. It contains the following information:

Reporting Units for Particular Country
Country

There are multiple reporting units for a particular country.

There are about 224 reporting units in total across approximately 68 countries in the client’s portfolio. Each country's data is in its local currency. This project focuses on forecasts for five key countries:

USA
China
Germany
Switzerland
Ireland

The data given in the spreadsheet “anonymized_detailing_finance_data.csv” has the following columns:

Country: Country (data for all 5 key countries is present)
Date: Month_Year (data from Jan 2015 to Mar 2020 is given for each key country)
Accounts Payable: Accounts Payable for any business events or movements
Cost: Cost incurred

Each country has 63 data points (Jan 2015 to Mar 2020) for Accounts Payable as well as for Cost. Accounts Payable and Cost are reported in different units for different countries: the five key countries have different currencies, so each country has its own unit. Specifically: USD (USA), CNY (China), EUR (Germany and Ireland), and CHF (Switzerland). There is no standardized unit across countries.
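As a quick sanity check on the count above, the months from Jan 2015 through Mar 2020 can be enumerated. This is only a sketch; the helper name month_range is illustrative and does not come from the challenge materials.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Return (year, month) pairs from start to end, both inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:
            # Roll over from December to January of the next year.
            year, month = year + 1, 1
    return months


points = month_range(2015, 1, 2020, 3)
print(len(points))  # 63: five full years (60 months) plus Jan-Mar 2020
```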

There is one more spreadsheet (masked_cleaned_aggregated_data.csv) with procurement-related data:

Column1 (Reporting Unit Within a Country): There can be more than one Reporting Unit for each country
Column2 (Account Number)
Column3 (Net Due Date): The latest date and time by which the payment to the vendor must be made
Column4 (Vendor Clearing Document Date): The actual date and time when the payment was made to the vendor
Column5 (Posting Date): The date and time from which the services/products from the vendor were used/purchased. This is the date on which the record was posted on the SAP platform.
Column6 (Vendor Names)
Column7 (The Local Currency of Spend Value)
Column8 (Spend Value)

This file contains the complete procurement data for all reporting units. However, not all of it is required for forecasting: only the data for the reporting units of the five key countries is needed, and it can be filtered by looking at the file Key_Countries_RU.csv.

To map the reporting units from the Key_Countries_RU.csv file (Reporting Units for Particular Country) to the cleaned_aggregated_data.csv file, extract the 2nd to 6th characters of Column1 (Reporting Unit Within a Country).
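The mapping rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the column names (reporting_unit, country, reporting_unit_within_country, spend_value) and all sample codes and values are made up, and the real files may use different headers and code formats.

```python
import csv
import io

# Hypothetical stand-ins for the two files; real headers and codes may differ.
KEY_COUNTRIES_RU = """reporting_unit,country
10001,USA
20002,Germany
"""

PROCUREMENT = """reporting_unit_within_country,spend_value
A10001X,1500.0
B20002Y,320.5
C99999Z,75.0
"""


def load_key_units(text):
    """Map reporting unit code -> country from the key-countries data."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["reporting_unit"]: row["country"] for row in reader}


def filter_key_rows(text, key_units):
    """Keep procurement rows whose Column1 code belongs to a key reporting unit."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        # 2nd to 6th character (1-based) is the slice [1:6] in Python.
        code = row["reporting_unit_within_country"][1:6]
        if code in key_units:
            kept.append({**row, "country": key_units[code]})
    return kept


key_units = load_key_units(KEY_COUNTRIES_RU)
rows = filter_key_rows(PROCUREMENT, key_units)
print([(r["reporting_unit_within_country"], r["country"]) for r in rows])
```

With the made-up rows above, the third procurement row is dropped because its extracted code does not appear among the key reporting units.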

*For illustration purposes, we have given a sample of 10 rows from each data file.

Task Detail

The client’s team has developed many models, ranging from econometric time series models to machine learning models. The champion model, which shows the best value of the evaluation metric wMAD (weighted mean absolute deviation), has now been chosen for production. wMAD is a weighted mean of the absolute errors |Y_i - \hat{Y}_i|, where the weights are the absolute errors themselves, i.e., \sum_i (|Y_i - \hat{Y}_i| * |Y_i - \hat{Y}_i|) / \sum_i |Y_i - \hat{Y}_i|. Please check “wMAD Example.xlsx” for an example.
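The wMAD formula above can be written as a short function. This is a sketch based only on the formula as stated; the sample numbers are made up and are not taken from “wMAD Example.xlsx”, and returning 0.0 for a perfect forecast (where the denominator is zero) is an assumption.

```python
def wmad(actual, forecast):
    """Weighted mean absolute deviation, with the absolute errors as weights."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    errors = [abs(y - y_hat) for y, y_hat in zip(actual, forecast)]
    total = sum(errors)
    if total == 0:
        # Assumption: a perfect forecast is scored as 0 rather than dividing by zero.
        return 0.0
    return sum(e * e for e in errors) / total


# Made-up values: absolute errors are 10, 20 and 0,
# so wMAD = (10*10 + 20*20 + 0*0) / (10 + 20 + 0) = 500 / 30.
print(wmad([100, 200, 300], [110, 180, 300]))
```

Because each error is weighted by itself, large misses dominate the score: the result is always between the mean absolute error and the largest absolute error.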

Baseline performance numbers for these models vary across countries and across cycles. These numbers are relative and cannot really be compared between countries. (E.g., for Switzerland, as the size of the reporting unit is bigger, a deviation of 5M is good enough, whereas for the US it may be 2M and for other countries it may be 500K.)

In this Beacon challenge, you are asked to think about the following questions, try to come up with potential solutions, and provide reasoning (either theoretical or empirical) for choosing the best solution.

Data Drift / Model Drift: How can we define, measure, control, and predict data drift and model drift effectively? What different approaches can be taken to solve this important problem, so that we can clearly identify data drift and model drift in the production environment when we industrialize? This is a key focus area of our customer.
Apart from wMAD, what are other possible evaluation metrics? Could any of them be more helpful than wMAD? Could any of them provide additional insights?
If we are asked to anonymize the dataset, what kinds of methods would you consider, which one do you prefer, and why?
We have different countries in the dataset. Should we build a unified model or separate models? If a unified model, how should the country information be incorporated? What are the pros and cons?
If we are asked to design the hold-out test, do you have any suggestions on approach and method?
How should we analyze the other important drivers/factors?
What other data elements could help solve these types of problems more effectively?
What are the candidate models for this problem? In general, what are their pros and cons?
Is there any detailed illustration of your approach in the public domain (along with dataset, method, analysis, results, etc.)? If so, could you please provide a reference?
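As one possible starting point for the drift question above, the Population Stability Index (PSI) is a commonly used measure of how far a recent sample of a feature has moved from a reference sample. The sketch below is an illustration only, not the client's method: the equal-width binning, the bin count, the eps smoothing value, and all sample numbers are assumptions.

```python
import math


def psi(reference, recent, n_bins=5, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / n_bins or 1.0

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref, rec = shares(reference), shares(recent)
    return sum((b - a) * math.log(b / a) for a, b in zip(ref, rec))


# Made-up monthly values for illustration only.
reference = [100, 110, 120, 130, 140, 150]
recent = [150, 160, 170, 180, 190, 200]
print(psi(reference, reference))  # identical samples give 0.0
print(psi(reference, recent))     # a shifted sample gives a positive value
```

Each term (b - a) * log(b / a) is non-negative, so PSI is zero only when the binned shares match and grows as the two distributions diverge. Any alert threshold would need to be chosen for the data at hand.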

Final Submission Guidelines

Beacon Challenge Rules

Once registered for the challenge, look into the challenge forum. A number of discussion threads are open there with questions about the feasibility of solving the client’s problem with the data they have, and of meeting their expectations. If deemed feasible, the intention is to prepare and run the main competition as a “first-to-finish” data science competition: the first solution to achieve the set performance threshold will win. As a part of the present Beacon challenge, we want to discuss what the best way to benchmark solution performance will be, and what the winning threshold should be.

To participate in this Beacon competition, you just provide your thoughts in the forum threads and participate in the discussion there. You are encouraged to upvote or downvote the ideas of other participants (while keeping in mind the Topcoder Code of Conduct). As the discussion progresses, the copilot will draft, iteratively elaborate, and share with you the challenge details for the main competition, with the idea that, working together, we can count on your feedback and further improve them. This iterative work will continue until the discussion and preparation converge on the final rules of the main challenge, or the project is deemed infeasible.

To reward your participation in this Beacon competition, the total prize pool of $2000 will be distributed by the copilot among the most active and useful contributions to the discussion, based both on your up- and down-votes in the forum and on the subjective judgement of the copilot. Please keep in mind that by participating in this Beacon competition you not only get the chance to earn a prize right away, but also contribute substantially to the future main challenge, which is beneficial for the entire data science segment of the Topcoder community, and for you personally if you decide to take part in the main challenge.

[Beacon] Cash Flow Forecast Collaboration Challenge

Learn

Review style

Final Review

Approval

ID: 30145099