As of 2018, Wind Turbine Generators (WTGs) are the second-largest source of renewable energy worldwide (after hydropower), and also the cleanest source of energy in terms of overall environmental impact. To support the further widespread adoption of WTGs, which helps keep our planet a green and pleasant home for all of us, efficient and timely maintenance of WTGs is very important. WTG sites are often in isolated locations, and throughout their service life the turbines are subjected to harsh environmental conditions: wind, rain, and sun continuously damage the machinery.
This proof-of-concept (PoC) challenge is part of WTG Predictive Asset Management (PAM), a large data science project that Topcoder is proud to be part of. By taking part in this competition, you not only get a chance to work on an important real-world problem and win good prizes, but you also contribute to the future well-being of our planet.
The customer's renewable energy business has 3 wind sites and would like to improve the overall productivity of its wind turbines. At present, the customer's asset maintenance strategy is centered on calendar-based maintenance and on fixing failures after they have already occurred. The customer now plans to improve overall productivity and reduce the number of failures by adopting predictive maintenance models. The idea is to predict failures before they occur and take corrective action.
WTGs are equipped with a Supervisory Control and Data Acquisition (SCADA) system, which consists of various sensors installed on each WTG that provide real-time streams of data describing the up-to-date status of different WTG subsystems and components. When a WTG component malfunctions, SCADA can trigger an alarm and shut down the WTG until the maintenance crew arrives at the site and makes the necessary repairs. This results in WTG downtime, and the goal is to use machine learning or other data science methods to forecast future WTG breakdowns based on current and historic SCADA data and maintenance reports. Such forecasts would allow necessary maintenance to be performed proactively, while keeping its frequency and associated costs as low as possible.
The focus of this PoC is to explore the ability of unsupervised machine learning to detect anomalous behavior of WTGs at the system, component, and sub-component levels. The competitor pack provides historic SCADA data corresponding to normal operation of a set of WTGs of the same model, located at the same site. You should come up with an algorithm that learns normal operation patterns from the provided data. Then, when fed iteratively with newer data from similar sensors, it should recognize when the SCADA data suggest that an abnormal situation is developing, and report it as early as possible.
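As a very small illustration of the kind of unsupervised model described above, the sketch below learns per-column statistics from normal data only. It then flags columns of a new record that drift far from those statistics. The class name `ZScoreDetector` and the threshold of 4 are hypothetical choices. A per-column z-score baseline is a swapped-in technique, not the challenge's reference method. It ignores correlations between columns, operating regime (for example wind speed), and per-WTG differences, all of which a real solution would likely need to model.

```python
from __future__ import annotations

import numpy as np


class ZScoreDetector:
    """Hypothetical per-column baseline: learns the mean and standard deviation
    of each SCADA column from normal data and flags large deviations."""

    def __init__(self, threshold: float = 4.0) -> None:
        self.threshold = threshold  # assumed cut-off; tune on held-out normal data
        self.mean: np.ndarray | None = None
        self.std: np.ndarray | None = None

    def fit(self, normal: np.ndarray) -> ZScoreDetector:
        # normal: (n_records, n_columns); NaN marks missing readings
        self.mean = np.nanmean(normal, axis=0)
        std = np.nanstd(normal, axis=0)
        # Constant (or all-NaN) columns get unit scale so the division below is safe
        self.std = np.where(std > 0, std, 1.0)
        return self

    def detect(self, record: np.ndarray) -> tuple[int, list[int]]:
        # Returns (0 or 1, 0-based indices of the columns that look abnormal)
        z = np.abs((record - self.mean) / self.std)
        # NaN z-scores (missing reading or all-NaN column) count as "no evidence"
        flagged = np.flatnonzero(np.nan_to_num(z, nan=0.0) > self.threshold)
        return int(flagged.size > 0), flagged.tolist()


rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=2.0, size=(1000, 3))  # synthetic "normal" data
detector = ZScoreDetector().fit(train)
print(detector.detect(np.array([50.0, 51.0, 80.0])))  # (1, [2])
```

The `(flag, indices)` pair maps directly onto the per-record output the tester expects: a 0/1 flag followed by the indices of the offending columns.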
The competitor pack (you'll find it in the challenge forum) includes pre-filtered SCADA data for an array of 23 WTGs of the same model, located at the same site. The dataset consists of ~164k records and 241 columns of data in CSV format. The meaning of each column is specified in an additional legend file included in the pack. The data were filtered so that the behavior of the WTG was normal within at least one day of every record: the WTG was not down for maintenance, and no automatic alarm or warning occurred within a day of the record. Naturally, this does not fully guarantee that none of these data hint at abnormal behavior that the current WTG setup did not detect.
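If you re-train on different input data, you may need to reproduce this filtering yourself. The sketch below shows one way to do it. It assumes a hypothetical per-WTG list of event timestamps (maintenance downtimes, alarms, warnings), sorted in ascending order. It also reads "within a day" as a symmetric ±24 h window around each record. The organizers' exact procedure is not published, so `filter_normal`, `is_far_from_events`, and the window interpretation are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

WINDOW = timedelta(days=1)  # assumed: +/- 24 h around each record


def is_far_from_events(ts: datetime, events: list, window: timedelta = WINDOW) -> bool:
    """True if no event in the ascending list `events` lies within `window` of `ts`."""
    # Index of the first event not earlier than ts - window
    i = bisect_left(events, ts - window)
    # Either no such event exists, or the nearest one is later than ts + window
    return i == len(events) or events[i] > ts + window


def filter_normal(records: list, events_by_wtg: dict, window: timedelta = WINDOW) -> list:
    """Keep (wtg_id, timestamp) records whose WTG had no logged event within `window`."""
    kept = []
    for wtg_id, ts in records:
        events = events_by_wtg.get(wtg_id, [])
        if is_far_from_events(ts, events, window):
            kept.append((wtg_id, ts))
    return kept


t0 = datetime(2017, 1, 1)
events = {"WTG01": [t0 + timedelta(hours=30)]}  # hypothetical alarm log
records = [("WTG01", t0), ("WTG01", t0 + timedelta(hours=12)), ("WTG02", t0)]
print(filter_normal(records, events))  # the 12 h record is dropped (alarm 18 h later)
```

Binary search keeps each lookup at O(log n) per record, which matters for a dataset of this size.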
We have tried not to include examples of abnormal behavior in the training data, as the primary purpose of this PoC is to explore whether it is possible to detect rare anomalies for which not enough training data exist.
The competitor pack also includes the harness we will use to test your solutions, and a sample dummy solution that simply showcases the program interface we want your solution to implement.
You will create and submit the source code for a Docker container, which we will build ourselves. The container must expose a single ./solution command that takes, as its only command-line argument, the path of a working directory inside the container. The tester will start your container and then call the ./solution command iteratively. Before each call, it will place a chunk of SCADA data, about 100 records, in the file chunk.csv inside the specified working directory. Your code should write a file called result to the working directory, in which, for each record of the input chunk, you write a comma-separated list of numbers. The first number is 0 if that record corresponds to normal behavior of that WTG, or 1 if it shows abnormal behavior. The remaining numbers in the row should be the 0-based indices of all SCADA columns that demonstrate the abnormal behavior in this case.
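The interface above can be sketched as a small Python entry point that reads chunk.csv and writes one result line per record. This is a hypothetical sketch, not the sample dummy solution from the competitor pack. It assumes chunk.csv has a header row and writes result lines in input order. It uses a placeholder `score_record` that always reports normal behavior; in a real solution, this is where a detector trained on normal SCADA data would be called. The Dockerfile that makes this file available as ./solution is not shown.

```python
#!/usr/bin/env python3
"""Hypothetical ./solution entry point: reads chunk.csv, writes result."""
from __future__ import annotations

import csv
import os
import sys


def format_line(is_abnormal: int, column_indices: list[int]) -> str:
    # First number: 0 (normal) or 1 (abnormal), then 0-based column indices
    return ",".join(str(v) for v in [is_abnormal, *column_indices])


def score_record(header: list[str], row: list[str]) -> tuple[int, list[int]]:
    # Placeholder model: always reports normal behavior.
    return 0, []


def run(workdir: str) -> None:
    with open(os.path.join(workdir, "chunk.csv"), newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # assumed header row; empty file -> no records
        lines = [format_line(*score_record(header, row)) for row in reader]
    with open(os.path.join(workdir, "result"), "w") as f:
        # One line per input record, in input order
        f.writelines(line + "\n" for line in lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Invoked by the tester as: ./solution <working-directory>
    run(sys.argv[1])
```

For example, a record flagged as abnormal because of columns 3 and 17 produces the line `1,3,17`, and a normal record produces `0`.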
A few things to consider:
- The records provided in the input chunks will follow in chronological order, with 10-minute intervals between consecutive records for the same WTG. However, records for different WTGs will be mixed together, and there may be gaps in the data. Please use assertions in your code to help detect input data that do not follow your algorithm's assumptions.
- We'll test your solutions on data from different years, in which some SCADA columns absent from the training data may be present, while some SCADA columns present in the training data may be missing. Make sure your solution handles this correctly, and use assertions to help detect any related problems.
- The tester keeps your container alive between calls to the ./solution command, so any data can be kept persistently inside the container between calls. You can also use the mounted working directory to store additional data.
- You may use any technology stack, as long as it lets you create a solution that behaves as specified above, is covered by a permissive license, and is free to use. If you are unsure whether a technology you want to use satisfies this requirement, please check with us in the challenge forum, or privately via the Contact Manager option in the Online Review system.
- Along with the source code of your solution, your submission should also include a write-up explaining your solution, including how to re-train it on different input data.
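The assertions suggested in the first two points above could look roughly like the sketch below. It is a hypothetical helper, not part of the provided harness. The class `ChunkValidator`, the 80% column-overlap tolerance, and the rule that a gap is a whole number of 10-minute steps are all assumptions. Timestamp parsing is left out because the name of the timestamp column is not assumed here. Also, the per-WTG state is held in memory, so a real solution would need to save it to disk between ./solution calls.

```python
from __future__ import annotations

import math
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)


class ChunkValidator:
    """Hypothetical input checks for chronological order and column schema."""

    def __init__(self, train_columns: list[str], min_overlap: float = 0.8) -> None:
        self.train_columns = train_columns
        self.min_overlap = min_overlap  # assumed tolerance for missing columns
        # In-memory state; a real solution would persist it between ./solution calls
        self.last_seen: dict[str, datetime] = {}
        self.last_ts: datetime | None = None

    def check_order(self, wtg_id: str, ts: datetime) -> None:
        # Records arrive in chronological order across all WTGs
        assert self.last_ts is None or ts >= self.last_ts, f"out of order: {ts}"
        self.last_ts = ts
        prev = self.last_seen.get(wtg_id)
        if prev is not None:
            delta = ts - prev
            # 10 min apart, or a gap assumed to be a whole number of 10-min steps
            assert delta > timedelta(0) and delta % STEP == timedelta(0), (
                f"{wtg_id}: unexpected interval {delta}"
            )
        self.last_seen[wtg_id] = ts

    def align(self, header: list[str], row: list[str]) -> list[float]:
        """Reorder a chunk row to the training column order; missing columns -> NaN."""
        assert len(row) == len(header), "row length does not match header"
        position = {name: i for i, name in enumerate(header)}
        shared = [c for c in self.train_columns if c in position]
        assert len(shared) >= self.min_overlap * len(self.train_columns), (
            f"only {len(shared)} of {len(self.train_columns)} training columns present"
        )
        values = []
        for name in self.train_columns:
            i = position.get(name)
            raw = row[i] if i is not None else ""
            # Empty cells are assumed to mean "missing"; extra columns are ignored
            values.append(float(raw) if raw != "" else math.nan)
        return values
```

Mapping by column name rather than by position keeps the model's input layout stable when a different year's data add or drop columns.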
Submissions to this challenge will be scored semi-subjectively. We will build the submitted Docker containers and run them using the tester on a different set of real SCADA data for the same WTGs and site, but from a different year. Then we will analyse the aggregated output of your solution, comparing its assessment of normal and abnormal WTG behavior against the ground truth data we have: the log of WTG maintenance downtimes, system alarms, and warnings. We will look for solutions that (i) recognize abnormal behavior of WTGs, especially rare failures and alarms; (ii) generate fewer false positive indications of abnormal behavior; and (iii) perform fast enough for real-time use with the real systems. We will determine the exact criteria during the review, and we'll share the details of our review along with the challenge results.
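Since the exact review criteria are not fixed yet, the following is only a hypothetical self-check you might run on held-out data. It does not reproduce the official scoring. It compares per-record 0/1 flags against 0/1 labels that you would derive yourself, for example by marking records shortly followed by a logged downtime, alarm, or warning. The function name `self_check` is illustrative. Record-level matching is a simplification: because the goal is early warning, you may prefer to count a flag as a hit when an event follows within some horizon.

```python
from __future__ import annotations


def self_check(predicted: list[int], actual: list[int]) -> dict[str, float]:
    """Detection rate and false positive rate of 0/1 predictions vs 0/1 labels."""
    assert len(predicted) == len(actual), "prediction/label length mismatch"
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    positives = sum(actual)
    negatives = len(actual) - positives
    return {
        # Share of abnormal records that were flagged (NaN if there are none)
        "detection_rate": tp / positives if positives else float("nan"),
        # Share of normal records that were wrongly flagged (NaN if there are none)
        "false_positive_rate": fp / negatives if negatives else float("nan"),
    }


print(self_check([1, 0, 1, 0], [1, 1, 0, 0]))
```

Tracking both numbers matters because the review explicitly weighs recognizing rare failures against generating fewer false positives.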