Challenge Overview

As of 2018, Wind Turbine Generators (WTGs) are the second largest source of renewable energy worldwide (after hydropower), and also the cleanest source of energy by overall environmental impact. To ensure the further widespread adoption of WTGs, which helps keep our planet a green and pleasant home for all of us, efficient and timely maintenance of WTGs is very important: WTG sites are often found in isolated locations, and throughout their entire service life the turbines are subjected to tough environmental conditions, with wind, rain, and sun continuously damaging the machinery. By taking part in this match, you will not only work on an important real-world problem and have a chance to win good prizes, but also contribute to the well-being of our planet.

The customer's renewable energy business has three WTG sites, and the customer would like to improve the overall productivity of its wind turbines. At present, the customer's Asset Maintenance Strategy is centered around calendar-based maintenance and fixing failures after they have already occurred. The customer now plans to improve overall productivity and reduce the number of failures by adopting predictive maintenance models. The idea is to predict a failure before it occurs and take corrective action.

This match is rated, and TCO19 eligible. Everybody who overcomes the prize threshold in the system test will get TCO19 points according to these rules.

Prizes: $8,000 / $6,000 / $4,000 / $2,000 / $1,000 / $800 / $600 / $400 / $200

In addition, bonuses of $1,000 / $750 / $250 are offered to the top submissions implemented in Python 3.5+, subject to the conditions specified in the Submission Format section. It is important that you read the Prize Eligibility expectations for the final delivery and SAVE all your analysis work.


WTGs are equipped with a Supervisory Control and Data Acquisition (SCADA) system, which consists of various sensors installed on each WTG to provide real-time streams of telemetry data describing the up-to-date status of the different WTG sub-systems and components.

From maintenance reports, filed by engineers after scheduled routine equipment checks, as well as after repairs done following unexpected equipment failures, we have historic data on past technical issues.

In this marathon match you need to predict future technical failures and issues based on the incoming stream of telemetry data.

You should also calculate the RUL (Remaining Useful Life) from the current model run date and the future failure dates, for turbines, systems, and sub-systems, down to the sub-component level. The model will run daily, and the RUL should be calculated on a daily basis.
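As an illustration of the RUL calculation, here is a minimal sketch that takes the RUL to be the predicted failure timestamp minus the model run time. The function name remaining_useful_life and the sample timestamps are made up for this example, and the unit in which the RUL should finally be reported is an assumption to confirm with the organizers.

```python
from datetime import datetime

# Timestamp format used in ground-truth.csv (YYYY-MM-DD HH:MM:SS).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def remaining_useful_life(run_time, predicted_failure_time):
    """Return the RUL as a timedelta: predicted failure time minus run time.

    Hypothetical helper for illustration; both arguments are strings in
    TIMESTAMP_FORMAT.
    """
    run = datetime.strptime(run_time, TIMESTAMP_FORMAT)
    failure = datetime.strptime(predicted_failure_time, TIMESTAMP_FORMAT)
    if failure <= run:
        raise ValueError("predicted failure must be after the model run time")
    return failure - run


# Made-up example: model run on 1 March, failure predicted for 11 March noon.
rul = remaining_useful_life("2015-03-01 00:00:00", "2015-03-11 12:00:00")
print(rul.days)                      # 10
print(rul.total_seconds() / 3600.0)  # 252.0
```

Since the model runs daily, the same function would be re-evaluated with each day's run time, so the RUL of an unchanged prediction shrinks by one day per run.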

Training Data and Competitor Pack

In the Competitor Pack, provided in the match forum (~900 MB download), you will find the training and supplementary data, along with the local tester.

The training dataset (located inside the tester/data/training folder) includes the following files:

  • input.csv - 2013-15 SCADA data for customer’s WTGs;

  • DataDescription.xlsx - data dictionary for SCADA data, explaining the meaning of each dataset column;

  • ground-truth.csv - the list of failure events in 2013-15 you are expected to predict. Each event record consists of date & time (in YYYY-MM-DD HH:MM:SS format), turbine ID, name of the turbine system, subsystem, component, and sub-component where the failure occurred. Notice that some events may be attributed at subsystem or component level only, thus they have empty values in the last columns;

  • valid-failures.csv - each entry in this file lists a valid combination of system, subsystem, component, and sub-component names that can be present in the ground truth and your forecasts. Any other combination will be considered incorrect by our tester and will lead to failure of your submission.

    BEWARE: All these names are case-sensitive and are not trimmed by the tester; i.e. in a few cases a whitespace character follows a name, and that whitespace must not be trimmed in your outputs for the entry to be considered valid.
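To make the case-sensitivity and no-trimming rule concrete, here is a minimal sketch of loading the valid combinations verbatim and checking a forecast against them. It assumes that valid-failures.csv has a header row and four columns in the order system, subsystem, component, sub-component; the function names and the sample rows are made up for illustration.

```python
import csv
import io


def load_valid_failures(csv_file, has_header=True):
    """Read (system, subsystem, component, sub-component) tuples verbatim."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    # Deliberately no strip(): trailing whitespace is part of a valid name.
    # Short rows are padded with empty strings up to four fields.
    return {tuple((row + [""] * 4)[:4]) for row in reader if row}


def is_valid_failure(valid, system, subsystem, component, sub_component):
    """Exact, case-sensitive membership test."""
    return (system, subsystem, component, sub_component) in valid


# Made-up sample data; note the trailing space in "Gearbox ".
sample = io.StringIO(
    "system,subsystem,component,sub-component\n"
    "Drive Train,Gearbox ,Bearing,\n"
)
valid = load_valid_failures(sample)
print(is_valid_failure(valid, "Drive Train", "Gearbox ", "Bearing", ""))  # True
print(is_valid_failure(valid, "Drive Train", "Gearbox", "Bearing", ""))   # False
print(is_valid_failure(valid, "drive train", "Gearbox ", "Bearing", ""))  # False
```

When reading the real file, open it with open(path, newline="") so that the csv module sees the fields exactly as stored.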

The supplementary_data folder contains additional datasets related to the problem, which the client feels are valuable for solving the problem.

The POC_winners folder contains two winning solutions from the related Abnormal Behavior Detection PoC challenge. In that challenge we asked competitors to come up with solutions relying on unsupervised learning to detect / predict anomalies in the SCADA data. We wonder whether these solutions might be used to enhance short-term prediction of failures that have few or no occurrences in the training data. You may freely use these submissions, or any insights from them, in your solution if that helps you achieve better scores.

The local tester (you will find its detailed usage instructions inside its tester/) will benchmark your solution, provided in the form of Docker container sources. For the benchmark, it will iteratively feed the input.csv data into your solution in consecutive one-day chunks; after providing each chunk, it will fetch the failure forecast from your solution, and then score it according to the criteria explained below.
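For quick experiments outside the tester, the one-day chunking can be emulated roughly as follows. This is only a sketch, not the tester's actual implementation: it assumes that input.csv has a header row, that rows are sorted by time, and that the timestamp sits in the first column in YYYY-MM-DD HH:MM:SS format. The function name daily_chunks, the column layout, and the sample rows are all made up.

```python
import csv
import io
from itertools import groupby


def daily_chunks(csv_file, timestamp_column=0):
    """Yield (day, header, rows) for each run of rows sharing a calendar day."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # The first 10 characters of "YYYY-MM-DD HH:MM:SS" identify the day.
    # groupby only merges adjacent rows, hence the sorted-input assumption.
    for day, rows in groupby(reader, key=lambda r: r[timestamp_column][:10]):
        yield day, header, list(rows)


# Made-up sample with hypothetical columns.
sample = io.StringIO(
    "timestamp,turbine,value\n"
    "2013-01-01 00:00:00,T1,1\n"
    "2013-01-01 00:10:00,T1,2\n"
    "2013-01-02 00:00:00,T1,3\n"
)
chunks = list(daily_chunks(sample))
for day, header, rows in chunks:
    print(day, len(rows))
# 2013-01-01 2
# 2013-01-02 1
```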

Submission Format

As mentioned above, in this match you should prepare a dockerized solution, which may use any technologies available under permissive open-source licenses. The client's language of choice is Python 3.5. If you can develop your solution in Python, and have no strong reasons to use a different language, we strongly encourage you to use Python 3.5 or above. We will pay $1,000 / $750 / $250 bonuses for solutions written in Python 3.5 (or newer) that are either within the Top-3 winning places, or outside of them but still score no less than 90% of the winner's score. If you are new to Docker and need related help, ask in the dedicated thread in the match forum. Also, have a look at the sample dummy solution included in the competitor pack.

Once the Docker container with your solution is built, it must provide the command solution, which takes a single command line argument: the working directory path. When solution /path/to/workdir is executed inside your container, it should read the chunk.csv file from that working directory (it will have the same format as input.csv, but contain just a single day's worth of data), process it, and write its forecast of future failures into the output.csv file (in the same format as ground-truth.csv) in the same working directory, overwriting this file on each call. A few additional important points:
  • The tester will keep your container alive between invocations of solution, thus any data/state of your model inside the container persists between the invocations. The content of the working directory is also persistent, with the exception of chunk.csv, which will be overwritten by the tester between the invocations, and output.csv, which will be overwritten by your solution;

  • Any record you output in the forecast must have: a date and time after the last timestamp in the latest chunk.csv; a turbine ID that was already present in the chunk.csv files passed to your solution so far; and a combination of system, sub-system, component, and sub-component values that is present in valid-failures.csv. If any of these conditions is violated at any moment, the benchmarking of your solution will be interrupted at that point.
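The read-process-write cycle described above can be sketched as follows. This is a minimal illustration, not the sample dummy solution from the competitor pack: the state file name state.json, the helper names, and the placeholder predict function are all hypothetical. Whether output.csv needs a header row is not stated here and should be checked against ground-truth.csv.

```python
import csv
import json
import os

STATE_FILE = "state.json"  # hypothetical name, not defined by the match


def load_state(workdir):
    """Restore state saved by an earlier invocation, if any."""
    path = os.path.join(workdir, STATE_FILE)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"chunks_seen": 0}


def save_state(workdir, state):
    with open(os.path.join(workdir, STATE_FILE), "w") as f:
        json.dump(state, f)


def predict(chunk_rows, state):
    """Placeholder model that forecasts no failures at all."""
    return []


def run_once(workdir):
    """Process workdir/chunk.csv and overwrite workdir/output.csv."""
    state = load_state(workdir)
    with open(os.path.join(workdir, "chunk.csv"), newline="") as f:
        chunk_rows = list(csv.reader(f))
    forecast = predict(chunk_rows, state)
    state["chunks_seen"] += 1
    save_state(workdir, state)
    # Mode "w" truncates the file, so each call replaces the old forecast.
    with open(os.path.join(workdir, "output.csv"), "w", newline="") as f:
        csv.writer(f).writerows(forecast)
    return state
```

A solution executable would call run_once(sys.argv[1]). The state is kept in a file because each invocation of solution is assumed to be a separate process, so only files inside the container or the working directory survive between calls.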

To submit your solution for online provisional testing, ZIP the sources of your dockerized solution so that your Dockerfile is present in the root of the ZIP archive (i.e. when your archive is opened, the Dockerfile should be right there at the top level, not in a nested folder), and submit that ZIP to the Topcoder system. The online tester works the same way as the local one. The online scorer limits the maximum runtime of your solution to 6 hours. If the benchmark has not completed by this time, the remaining part of the test will be skipped, and your final score will be based on the part of the score accumulated before the limit. No strict limit on submission frequency is implemented as of now, but we encourage you not to abuse this: if we see that somebody submits too often, i.e. makes multiple new submissions before getting the results of the previously submitted ones, we reserve the right to warn you, remove some of your submissions from the scoring queue, and set a limit on submission frequency for everyone.
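One way to make sure the Dockerfile ends up at the top level of the archive is to store every file under a path relative to the solution folder itself. The sketch below does that with the standard zipfile module; the function name zip_solution and the folder layout are made up for illustration.

```python
import os
import zipfile


def zip_solution(source_dir, zip_path):
    """Zip source_dir so that its files sit at the archive root.

    Returns True if "Dockerfile" is a top-level entry of the result.
    zip_path should be outside source_dir so the archive does not
    include itself.
    """
    if not os.path.isfile(os.path.join(source_dir, "Dockerfile")):
        raise FileNotFoundError("no Dockerfile directly inside " + source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # Store paths relative to source_dir, not to its parent.
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    with zipfile.ZipFile(zip_path) as archive:
        return "Dockerfile" in archive.namelist()
```

A common mistake this avoids is zipping the parent folder, which would place the Dockerfile in a nested folder inside the archive.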

Score Metrics

At the time ti, i = 0 ... N - 1 (consider ti as the latest timestamp in the last input chunk received before a forecast) for each failure type m = 0 ... M - 1 (by failure type we understand any possible combination of a valid turbine ID and system, subsystem, component, sub-component values), we define positive values  and  so that:

  •  is the time of the soonest failure of type m after the time ti, which is known from the ground truth data, provided that it happened before the end of ground truth dataset T, i.e. ; otherwise, if no future failures of type m are present in the ground truth, we define .

  • , similarly, is the time of the soonest failure of type m after the time ti, predicted by your solution when it was invoked after being provided with the ith chunk of input data. If you predicted no future failure of type m at that invocation, or you have predicted it at the time , then we define  instead.

We define the score component sim as:

  •  (i.e. when your prediction does not contradict the ground truth as far as it is known);

  •  otherwise. I.e., if your prediction diverges from the ground truth, then this score component will be between 0.0 and 1.0, based on the relative error in the predicted failure timestamp.

The overall score S is calculated as:

  •  (i.e. we average the partial scores over all invocations of your solution, and all possible failure types at each invocation). As mentioned before, there is a runtime limit for benchmarking of your solution. If hit, summation over i is interrupted at that point, and the score is calculated further from the sum accumulated so far.

  • . This formula just monotonically maps ssum into the 0.0 - 100.0 range, so that scores ssum > 0.90 correspond to S within the range [33.0, 100.0]. This is necessary because failures in WTGs are rare events compared to the normal functioning of the equipment, so an optimistic solution that always predicts no failures gets ssum ~ 0.96 (as it is, naturally, the correct prediction most of the time). Thus, we use this logarithmic transformation to stretch the important range of ssum over a significant part of the 0.0 - 100.0 range.

With this formula, the optimistic solution that always predicts no failures scores S ~ 35.0 (33.16 on the entire training dataset; 38.96 on the provisional dataset; at the moment of writing we are still finalizing the preparation of the final test dataset, and will have its score later). The prize and TCO19 points eligibility threshold will be the score of the optimistic solution +10%.

Prize Eligibility

To be eligible for monetary prizes, besides landing in a prize-eligible position after the final testing, competitors will have to:

  • Within seven days after the announcement of the final results, provide a complete write-up of their approach and solution, including: the solution (re-)training procedure; the evaluation (in Jupyter notebooks or equivalent code notebooks) for feature selection, model selection, and model fine-tuning; the justification behind the data tag selection; other modelling techniques tried, and why the current approach was chosen; and the dataset that needs to be used for every model run;

  • Share the cross-validation results obtained as part of modelling, along with any explicit assumptions about the ground truth / input data, any data pre-processing / data filters applied, and clear documentation for all blocks of code;

  • Answer, as part of the write-up, any additional predefined questions, which we will share upfront;

  • Make the winning model output the RUL, i.e. add an additional column to the forecasts containing the Remaining Useful Life for each failure event (the time difference between the predicted failure timestamp and the forecast moment);

  • If requested, re-train their solutions on the complete dataset (including the data used for provisional and final scoring).