Challenge Overview
Problem Statement  
Yaw Alignment Marathon Challenge

Prize Distribution
1st place  $17,000

Introduction
We are looking for solutions that can compute the yaw misalignment angle of wind turbines from analysis of Supervisory Control and Data Acquisition (SCADA) data. The submitted solutions will be validated using yaw misalignment values obtained from another data source. We hope to receive wonderful solutions. Good luck!

Important note: to avoid overfitting to the provisional test, we will only provide the leaderboard after Nov 12, 2018 EDT.

Requirements to Win a Prize
In order to receive a prize, you must do the following:
If you place in the top 6 but fail to do all of the above, you will not receive a prize; it will be awarded to the contestant with the next best performance who completed the submission requirements above. Note that your submission will be disqualified if you use validation data to train your model.

Background
The business objective is to optimize energy generation at each WTG (Wind Turbine Generator) by dynamic calculation of the yaw misalignment angle. We seek a predictive model that calculates yaw misalignment angles in 10-minute intervals for the following 7 days using only historical SCADA data, and that provides an optimal yaw misalignment correction value for the 7-day period.

In wind turbines, correct alignment of the nacelle to the main wind direction is necessary for optimal power generation and thus for maximizing the annual energy production. Much evidence and analysis suggests that a 4 degree deviation of the nacelle from the true wind direction would result in an AEP (Annual Energy Production) loss of 1%. Ideally, the yaw misalignment angle should be 0 degrees, but we will allow solutions that produce correction values within 2 degrees of the true wind direction.

A power curve derived from an operating wind turbine describes the relationship between its output power and the wind speed at hub height. Power curves help in energy assessment and performance monitoring of wind turbines. If the wind vector is perpendicular to the rotor area, the turbine performs optimally, but large inflow angles relative to the plane of the rotor, caused by yaw misalignment, lead to lower performance. The figure below depicts a typical measured power curve before and after yaw misalignment correction.
The improvement in generated power through yaw correction, based on the dynamic yaw misalignment values given by your submitted algorithm/model, will be measured using such power curves.

Key data challenge: We have SCADA data covering a period of 5+ years, but the other data source only characterizes a few turbines for a much shorter time period. Hence we would like to use SCADA data to build and train models that can predict the yaw misalignment for any other turbine independently.

Related Topcoder challenge: Previously, we launched a related ideation challenge. The winning solution provided an accurate model that used a small subset of data but still relied on other data sources. You can find that submission in an attachment after you register for the Marathon challenge.

Objective
In this Marathon challenge, your submitted algorithm/model needs to produce yaw misalignment predictions in 10-minute intervals for the following 7 days, i.e. a total of 1008 values (7 days × 144 intervals per day). Your system should also generate a single optimum value from these 1008 values for the next 7 days.

Specifically, at a specific time T, you are given SCADA data before time T + 7 days for training. You need to estimate the yaw misalignment values for the time period [T, T + 7 days] at 10-minute intervals, i.e., [T, T + 10 mins], [T + 10 mins, T + 20 mins], ..., [T + (10 mins * 1007), T + 7 days]. Then you need to aggregate these 1008 values into a single value.

At WTG Unit level
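The interval layout in the Objective can be sketched in Python as follows. This is only an illustration under stated assumptions, not the organizers' reference code: the names `forecast_intervals` and `weekly_estimate` are hypothetical, and the median is an assumed aggregation, since the statement does not prescribe how the 1008 values should be combined into one.

```python
from datetime import datetime, timedelta
from statistics import median

STEP = timedelta(minutes=10)   # length of one prediction interval
HORIZON = timedelta(days=7)    # total forecast horizon
N_INTERVALS = HORIZON // STEP  # timedelta // timedelta yields an int: 1008


def forecast_intervals(t):
    """Return the consecutive 10-minute (start, end) windows covering [t, t + 7 days]."""
    return [(t + i * STEP, t + (i + 1) * STEP) for i in range(N_INTERVALS)]


def weekly_estimate(values):
    """Collapse the per-interval predictions into a single weekly value.

    The median is an assumption made for this sketch; any aggregation
    that scores well against the ground truth could be used instead.
    """
    if len(values) != N_INTERVALS:
        raise ValueError(f"expected {N_INTERVALS} values, got {len(values)}")
    return median(values)


if __name__ == "__main__":
    windows = forecast_intervals(datetime(2018, 5, 19))
    print(len(windows))              # 1008
    print(windows[0], windows[-1])   # first and last 10-minute window
```

The last window starts at T + (10 mins * 1007) and ends exactly at T + 7 days, matching the interval list in the statement.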
Data Description
SCADA data for all turbines is being provided to the community. We will also provide a few data points from the other data source mentioned above so that you can perform your own validation. The URL for downloading the data will be provided in the discussion forum.

There are multiple columns in both types of data. We also provide codebooks in spreadsheet form that explain the meaning of the column names. The YMA (in degrees) column in the other data is the yaw misalignment you are going to estimate.

Important Domain Knowledge on Ground Truth
Below are a few possible approaches to identifying the target variable. Please note that these are only some of the many approaches that can be used to solve the business problem. Contestants are at complete liberty to use any SCADA tag as the target variable, as long as doing so results in an algorithm/model that produces the most accurate result.
You are free to take either approach using a SCADA tag as the target variable, provided it meets the validation and success criteria.

Wake effect: Latitude and longitude details are given separately for your reference and can be used to understand the distance between the turbines.

Model Success Criteria
Implementation
In your submission, you are asked to submit a link to a CSV file containing the weekly single-value estimation for all turbines in the following time periods:
The CSV file's header should be "Turbine", "Date Range", and "Weekly Estimation YAW Error". For example:

Turbine,Date Range,Weekly Estimation YAW Error
B01,19/05/2018 to 25/05/2018,1.23
B01,26/05/2018 to 01/06/2018,3.21
...

In the provisional test, we will evaluate the first 2 time periods. In the system test, we will evaluate the remaining 4 time periods. In the VMsys test, we will reevaluate the remaining 4 time periods.

The link must be downloadable. One option is to use Dropbox: once you have copied the Dropbox link, change "dl=0" to "dl=1" at the end of the link.

Scoring
For each test case, we will call predict exactly once. Based on the turbines for which we have ground truth labels, we calculate the Mean Square Error (MSE). Across all test cases, we first calculate the average of the MSEs and then take its square root, denoted RMSE. The final score is max(0, (5 - RMSE) / 5 * 1,000,000). That is, when RMSE is greater than 5, you will receive a final score of zero.

[Required for Winning Solution only] Output File
The details can be found here.
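The submission format and scoring rule above can be checked locally with a short Python sketch. The helper names (`submission_csv`, `direct_download_link`, `final_score`) are hypothetical and not part of any official tooling, and the URL in the usage lines is a placeholder; the actual scorer may differ in details the statement does not spell out.

```python
import csv
import io
import math

HEADER = ["Turbine", "Date Range", "Weekly Estimation YAW Error"]


def submission_csv(rows):
    """Render (turbine, date_range, estimate) tuples as submission CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for turbine, date_range, estimate in rows:
        writer.writerow([turbine, date_range, estimate])
    return buf.getvalue()


def direct_download_link(url):
    """Turn a shared link ending in 'dl=0' into one ending in 'dl=1'."""
    return url[:-1] + "1" if url.endswith("dl=0") else url


def final_score(mse_per_test_case):
    """Average the per-test-case MSEs, take the square root, then scale."""
    if not mse_per_test_case:
        raise ValueError("at least one test case is required")
    rmse = math.sqrt(sum(mse_per_test_case) / len(mse_per_test_case))
    return max(0.0, (5 - rmse) / 5 * 1_000_000)


if __name__ == "__main__":
    print(submission_csv([("B01", "19/05/2018 to 25/05/2018", 1.23)]))
    print(direct_download_link("https://example.com/submission.csv?dl=0"))
    print(final_score([1.0, 1.0]))  # RMSE of 1 gives roughly 800,000
    print(final_score([36.0]))      # RMSE of 6 exceeds 5, so the score is 0
```

Note that the MSEs are averaged before the square root is taken, so the result is not the same as averaging per-test-case RMSEs.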
Notes  
  This match is rated.  
  The only allowed programming languages are Python and R.  
  The use of external data and pretrained models is allowed, as long as they meet the license requirements and are free and unrestricted for commercial use.  
  Use the match forum to ask general questions or report problems, but please do not post comments or questions that reveal information about the problem itself, possible solution techniques, or data analysis.  
  You must train the model on, and only on, the SCADA data before the date for which we want to make predictions. You may hardcode hyperparameters, but not the parameters of the model itself.  
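One way to guard against training on data from the prediction period is to filter records by timestamp before fitting. The sketch below is an assumption-laden illustration: the function name `training_rows` and the `"timestamp"` key are hypothetical, and the real SCADA column names are defined in the provided codebooks.

```python
from datetime import datetime


def training_rows(rows, cutoff):
    """Keep only records timestamped strictly before the cutoff.

    `rows` is an iterable of dicts holding a datetime under the
    hypothetical key "timestamp"; `cutoff` is the datetime from which
    predictions are required.
    """
    return [row for row in rows if row["timestamp"] < cutoff]


if __name__ == "__main__":
    sample = [
        {"timestamp": datetime(2018, 5, 18, 23, 50), "power_kw": 910.0},
        {"timestamp": datetime(2018, 5, 19, 0, 0), "power_kw": 925.0},
    ]
    # Only the first record precedes the cutoff and is kept for training.
    print(training_rows(sample, datetime(2018, 5, 19)))
```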

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.