The challenge is finished.

Challenge Overview

Problem Statement


Yaw Alignment Marathon Challenge

Prize Distribution

1st place - $17,000

2nd place - $7,000

3rd place - $4,000

4th place - $2,000

5th place - $1,000


We are looking for solutions that can compute the yaw misalignment angle of wind turbines from analysis of Supervisory Control and Data Acquisition (SCADA) data. The submitted solutions will be validated using yaw misalignment values obtained from another data source. We hope to receive wonderful solutions. Good luck!

Important note: to avoid overfitting to the provisional test, we will only provide the leaderboard after Nov 12, 2018 EDT.

Requirements to Win a Prize

In order to receive a prize, you must do the following:

  1. Achieve a score in the top 6, according to system test results. The score must be higher than the baseline result (i.e., 200,000). See the "Data Description and Scoring" section below.
  2. We will run an additional round of system testing, which we refer to as the VM-sys test. The top 6 contestants according to the system test will be asked to submit their code and a step-by-step deployment guide within 48 hours. We will evaluate them on an AWS EC2 instance (p2.xlarge if a GPU is required, otherwise c5.2xlarge). The main purpose is to rule out look-ahead bias caused by using any future data.
    • The code must be in Python or R.
    • You should make sure the licenses of all dependent libraries are commercially friendly.
    • You must provide a bash script that takes two arguments as input: (1) the name of a folder that contains the SCADA data. The folder will hold several csv files containing all data before a certain date. The csv files cover different time periods and are named after their starting dates (YYYY-MM-DD). Currently, they are divided on a yearly basis. If you would like them divided at a finer granularity (e.g., 3 months or 6 months), please state this in your submission. We will try to accommodate your request (or you can include a script that generates them). (2) the filename of the output txt file for the predictions. It has 3*N lines, where N is the number of turbines. For every turbine, the first line is the turbine name; the second line must be next week's single-value prediction; and the third line contains 1008 values separated by spaces, representing the 10-min values for the following week. The turbines must be sorted in alphabetical order.
    • The whole script must be finished within 1 hour.
  3. The contest winners are determined based on the VM-sys test's results, i.e., a re-ranking of the top 6.
    • If your VM-sys test score is above 200,000 but below 600,000, you will receive 30% of the corresponding prize money.
    • If you achieve 600,000 or more, you will receive full prize money according to your position.
  4. Within 7 days of the announcement of the contest winners, submit a complete report of at least 2 pages that (1) outlines your final algorithm and (2) explains the logic behind, and the steps of, your approach. Additionally, you should bundle your code and write a deployment guide so that we can run it easily on the Azure Databricks platform. The output file should follow the format described in the output file section below.
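
The 3*N-line prediction file described in requirement 2 could be produced along the following lines. This is only a minimal sketch: the function names, the input dictionary layout, and the 4-decimal number formatting are assumptions made for illustration, and Python's default string ordering is assumed to be an acceptable reading of "alphabetical order".

```python
def format_predictions(predictions):
    """Render predictions as three lines per turbine, sorted by turbine name.

    `predictions` (hypothetical layout) maps a turbine name to a tuple of
    (weekly single value, list of 1008 ten-minute values).
    """
    lines = []
    for name in sorted(predictions):  # assumption: plain string sort
        weekly, series = predictions[name]
        if len(series) != 1008:
            raise ValueError(f"{name}: expected 1008 values, got {len(series)}")
        lines.append(name)                                   # line 1: turbine name
        lines.append(f"{weekly:.4f}")                        # line 2: weekly value
        lines.append(" ".join(f"{v:.4f}" for v in series))   # line 3: 1008 values
    return "\n".join(lines) + "\n"


def write_predictions(path, predictions):
    """Write the formatted predictions to the output txt file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(format_predictions(predictions))
```

A bash wrapper would then pass the data folder and the output filename through to a script that calls `write_predictions`.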

If you place in the top 6 but fail to do all of the above, you will not receive a prize. The prize will instead be awarded to the contestant with the next best performance who completed the submission requirements above.

Note that your submission will be disqualified if you use validation data to train your model.


The business objective is to optimize energy generation at each WTG (Wind Turbine Generator) by dynamic calculation of yaw misalignment angle. We seek a predictive model to calculate yaw misalignment angles in 10 minute intervals for the following 7 days using only historical SCADA, as well as to provide an optimal yaw misalignment correction value for the 7-day period.

In wind turbines, correct alignment of the nacelle to the main wind direction is necessary for optimal power generation and thus for maximizing the annual energy production. Considerable evidence and analysis suggest that a 4 degree deviation of the nacelle with respect to the true wind direction results in an AEP (Annual Energy Production) loss of 1%. Ideally, the yaw misalignment angle should be 0 degrees, but we will allow solutions that produce correction values within 2 degrees of the true wind direction.

A power curve derived from an operating wind turbine describes the relationship between wind speed at hub height and the power actually generated. Power curves help in energy assessment and performance monitoring of wind turbines. If the wind vector is perpendicular to the rotor plane, the turbine performs optimally, but large inflow angles relative to the rotor plane, caused by yaw misalignment, lead to lower performance. The figure below depicts a typical measured power curve, before and after yaw misalignment correction.

The improvement in generated power through yaw correction based on the dynamic yaw misalignment values given by your submitted algorithm/model will be measured using such power curves.

Key Data challenge: We have SCADA data that covers a period of 5+ years but the other data source only characterizes a few turbines for a much shorter time period. Hence we would like to use SCADA data to build and train models that can be used to predict the yaw misalignment for any other turbine independently.

Related Topcoder Challenge: Previously, we had launched a related ideation challenge. The winning solution provided an accurate model that used a small subset of data, but still relied on other data sources. You can find that submission in an attachment after you register for the Marathon challenge.


In this Marathon challenge, your submitted algorithm/model needs to produce yaw misalignment predictions at 10-minute intervals for the following 7 days, i.e., a total of 1008 values. Your system should also derive a single optimum value from these 1008 values for the next 7 days. Specifically, at a specific time T, you are given SCADA data before time T + 7 days for training. You need to estimate the yaw misalignment values for the time period [T, T + 7 days] at 10-minute intervals, i.e., [T, T + 10 mins], [T + 10 mins, T + 20 mins], ..., [T + (10 mins * 1007), T + 7 days]. Then, you will need to aggregate these 1008 values into a single value.
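
As a rough illustration of the interval layout, the sketch below enumerates the 1008 ten-minute windows starting at T and collapses a list of 1008 predictions into one weekly value. The statement does not prescribe an aggregation rule, so the median used here is purely an assumption, as are the function names.

```python
from datetime import datetime, timedelta
from statistics import median

STEP = timedelta(minutes=10)
N_INTERVALS = 7 * 24 * 6  # 1008 ten-minute intervals in 7 days


def forecast_intervals(t_start):
    """Return the 1008 (start, end) pairs covering [T, T + 7 days]."""
    return [
        (t_start + i * STEP, t_start + (i + 1) * STEP)
        for i in range(N_INTERVALS)
    ]


def weekly_value(values):
    """Aggregate 1008 interval predictions into a single weekly value.

    Assumption: the median is used only as an example of a robust
    aggregate; the challenge leaves the choice to the contestant.
    """
    if len(values) != N_INTERVALS:
        raise ValueError(f"expected {N_INTERVALS} values, got {len(values)}")
    return median(values)
```

For example, `forecast_intervals(datetime(2018, 5, 19))` starts with the window from 00:00 to 00:10 on 19 May 2018 and ends with a window that closes at 00:00 on 26 May 2018.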

At WTG Unit level

  • Using 10-min aggregated historical SCADA data, the yaw misalignment angle A (as shown in Fig 1) needs to be calculated for the time period [T, T + 7 days].

    (A + A') tends toward θE, where

    • A = Yaw misalignment angle
    • A' = Wind direction measured by wind vane recorded by SCADA (Relative Wind Direction)
    • θE = True wind direction with respect to the nacelle (Total Yaw Error)

  • A' is used by auto yawing mechanisms that control wind turbine alignment to wind direction as measured by the wind vane.
  • The effect of the wake is to be incorporated. Since the turbines extract energy from the wind, the wind leaving the turbine blades has lower kinetic energy than the free wind passing by the blades. The model should consider the effect of wake in the yaw misalignment calculation.

Data Description

SCADA data of all turbines is being provided to the community. We will provide a few data points from the other data source that we mentioned above so that you can perform your own validation. The URL for downloading the data will be provided in the discussion forum.

There are multiple columns in both types of data. We also provide codebooks in spreadsheet form that explain the meaning of the column names. The YMA (in degrees) column in the other data is the yaw misalignment you are going to estimate.

Important Domain Knowledge on Ground Truth

Below are a few possible approaches to identifying the target variable. Please note that these are only some of the many approaches that can be used to solve the business problem. Contestants are at complete liberty to use any SCADA tag as a target variable, as long as doing so results in an algorithm/model that produces the most accurate result.

  1. In the absence of a target variable (yaw misalignment) in the SCADA data, we can tentatively assume that, within each wind speed bin, the top 10% of available records sorted in descending order of Active Power have minimum yaw misalignment. We can take the RWD (Relative Wind Direction) of these records as the ground truth (i.e., the yaw misalignment).

    A detailed analysis of this is shared as a separate PDF document, and we tested the approach using sample data.

    RWD in SCADA is the Relative Wind Direction w.r.t. Nacelle Position.

    Pmax explanation

    • θE is the Total Yaw error and not the Yaw misalignment (denoted by angle A in the diagram). The difference between the two is as follows:
      • θE (Total Yaw error) = Angle between the Nacelle Position (NP) and the True Wind Direction (TWD).
      • A (Yaw Offset) = Angle between the True Wind Direction and the Relative Wind Direction

    P → Pmax only when θE → 0;

    By definition, θE is the angle between NP and TWD, and when θE → 0, NP = TWD. A is caused by wind vane error, so it won't be 0 in figure 2 (right side).

    • The total yaw misalignment gets partially corrected by the OEM's auto-yawing mechanism. The remaining error is due to the inaccuracy of the wind direction measurement by the wind vane and is termed the Yaw Offset, which we need to correct.
    • No data point in SCADA gives the wind vane error (Yaw Offset) or the true wind direction. That's why, as a probable solution, we thought of the following approach:
      • When θE (Total Yaw error) → 0, the inaccuracy of the wind vane measurement is still there and needs to be captured
      • When θE (Total Yaw error) → 0, Nacelle Position = True Wind Direction
      • We can capture the Relative Wind Direction (RWD) for these records and train the model using this RWD as the target variable
      • In fig. 2, the Total Yaw error is 0; however, the Yaw Offset is not 0
  2. A similar approach using the SCADA Pmax value was brought to our attention through the ideation challenge. However, we have not created or tested the ground truth using that approach, as it encourages the use of different bin sizes of wind speed and yaw angle to arrive at the bucket with the best generated power.

You are free to take either approach using SCADA as Target variable, provided it meets the validation and success criteria.
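
Approach 1 above (top 10% of records by Active Power within each wind speed bin) might be sketched as follows. The record keys `wind_speed`, `active_power` and `rwd` are hypothetical names, not the actual SCADA tags from the codebook, and the 0.5 m/s bin width is an arbitrary assumption.

```python
import math
from collections import defaultdict


def top_power_rwd(records, bin_width=0.5, top_fraction=0.10):
    """Return the RWD values of the highest-power records in each wind speed bin.

    `records` is a list of dicts with hypothetical keys
    "wind_speed", "active_power" and "rwd".
    """
    # Group records by wind speed bin index.
    bins = defaultdict(list)
    for rec in records:
        bins[math.floor(rec["wind_speed"] / bin_width)].append(rec)

    selected = []
    for key in sorted(bins):
        # Sort each bin in descending order of active power.
        ranked = sorted(bins[key], key=lambda r: r["active_power"], reverse=True)
        # Keep the top fraction, but at least one record per bin.
        keep = max(1, int(top_fraction * len(ranked) + 1e-9))
        selected.extend(r["rwd"] for r in ranked[:keep])
    return selected
```

The selected RWD values could then serve as a proxy target when training on SCADA data alone; whether that proxy is accurate enough is exactly what the validation data is meant to check.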

Wake effect: Lat and Long details are given separately for your reference, which can be used to understand the distance between the turbines.
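
One way to turn the Lat and Long details into inter-turbine distances is the haversine formula, sketched below together with the initial bearing from one turbine to another (which may help judge whether a neighbour sits upwind for a given wind direction). The spherical Earth radius and the function names are assumptions made for illustration; the statement does not prescribe a wake model.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean spherical Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
```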

Model Success Criteria

  1. We will evaluate your model using SCADA data only: the input will be SCADA data ONLY, and the output will be compared against validation data.
  2. The predictions produced by your system need to be within 2 degrees, on either side, of our weekly validation data.
  3. The model must take the wake effect into consideration.


In your submission, you are asked to submit a link to a CSV file containing the weekly single-value estimation for all turbines in the following time periods:

  • 19/05/2018 to 25/05/2018
  • 26/05/2018 to 01/06/2018
  • 02/06/2018 to 08/06/2018
  • 09/06/2018 to 15/06/2018
  • 24/08/2018 to 30/08/2018
  • 28/08/2018 to 03/09/2018

The CSV file's header should be "Turbine", "Date Range", and "Weekly Estimation YAW Error". For example,

        Turbine,Date Range,Weekly Estimation YAW Error
        B01,19/05/2018 to 25/05/2018,1.23
        B01,26/05/2018 to 01/06/2018,-3.21
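
A file in this layout can be generated with Python's standard `csv` module. The sketch below is one possible way to do it; the function name and the two-decimal formatting are assumptions chosen to match the example rows.

```python
import csv
import io

HEADER = ["Turbine", "Date Range", "Weekly Estimation YAW Error"]


def render_submission(rows):
    """Render (turbine, date_range, estimate) tuples as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for turbine, date_range, estimate in rows:
        # Assumption: two decimals, as in the example rows above.
        writer.writerow([turbine, date_range, f"{estimate:.2f}"])
    return buf.getvalue()
```

Writing the returned text to a `.csv` file and uploading it gives the file that the submitted link should point to.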

In the provisional test, we will evaluate the first 2 time periods. In the system test, we will evaluate the remaining 4 time periods. In the VM-sys test, we will re-evaluate those 4 time periods.

The link must be downloadable. One option is to use Dropbox. Once you have copied the Dropbox link, change "dl=0" to "dl=1" at the end of the link.
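
The link adjustment can also be done programmatically. The helper below is a small sketch using only the standard library; the function name is hypothetical and the URL in the usage note is a made-up placeholder, not a real file.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def direct_download_url(url):
    """Return the URL with its `dl` query parameter forced to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"  # replaces dl=0, or adds dl=1 if it was missing
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For instance, a placeholder link ending in `submission.csv?dl=0` would come back ending in `submission.csv?dl=1`, with any other query parameters left in place.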


For each test case, we will call predict exactly once. Based on the turbines for which we have ground truth labels, we calculate the Mean Squared Error (MSE). Across all test cases, we first calculate the average of the MSEs and then take the square root of that average. The square root is denoted as RMSE.

The final score will be max(0, (5 - RMSE) / 5 * 1,000,000). That is, when RMSE is 5 or greater, you will receive a final score of zero.

[Required for Winning Solution only] Output File

The details can be found here.



Method signature: String getURL()
(be sure your method is public)


-This match is rated.
-The only allowed programming languages are Python and R.
-The use of external data and pre-trained models is allowed, as long as they meet the license requirements and are free and unrestricted for commercial use.
-Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself, possible solution techniques or related to data analysis.
-You must train the model using only the SCADA data from before the date for which we want to make predictions. You can hardcode some hyperparameters, but not parameters of the model itself.


Returns: "Seed: 1"

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.