Challenge Overview
Remark
This sprint is part of the second phase of the Rodeo II Challenge, which follows the previous marathon match. A large part of the problem statement is the same, but there are also some major changes. You do not need to have taken part in the previous marathon match to take part in this sprint.
Introduction
Water managers need more skillful information on weather and climate conditions to use water resources efficiently and to reduce the impact of hydrologic variations. Examples of hydrologic variations include the onset of drought or a wet weather extreme. A lack of skillful sub-seasonal information limits water managers’ ability to prepare for shifts in hydrologic regimes and can pose major threats to their ability to manage this valuable resource.
The challenge of sub-seasonal forecasting encompasses lead times of 15 to 45 days into the future, which lie between those of weather forecasting (i.e., up to 15 days, where initial ocean and atmospheric conditions matter most) and seasonal to longer-lead climate forecasting (i.e., beyond 45 days, where slowly varying earth system conditions matter most, such as sea surface temperatures, soil moisture, and snow pack).
The Rodeo II Challenge series is a continuation of the Sub-Seasonal Climate Forecast Rodeo I contest. In December 2016, the US Bureau of Reclamation launched Rodeo I. The primary component of Rodeo I was a year of submitting forecasts every 2 weeks in real time. Teams were ranked on their performance over the year and needed to outperform two benchmark forecasts from NOAA. Additional prize eligibility requirements included a method documentation summary, 11-year hind-casts, and code testing.
The current challenge is the second step in a series of contests that builds on the results of Rodeo I. This document is the problem specification of 4 × 26 (in total, 104) recurring data science challenges (sprints) that aim to create good quality predictive algorithms on weather data over the next full year.
There will be 4 independent tracks (the same as in the marathon match); in each track you will solve a specific task. Each track will be split into 26 sprints, with each sprint lasting 2 weeks. There will be separate prizes for each sprint. There will also be quarterly and overall (annual) bonus prizes. To be eligible for the quarterly and annual prizes, you must outperform both the winning solution for that category from Rodeo I and a sub-seasonal forecast from NOAA.
Prize Structure
The total sum of all prizes is $720,000!
- Each sprint, each track (26 × 4; no need to beat the NOAA or Rodeo I score):
  - 1st $500
  - 2nd $350
  - 3rd $250
  - 4th $175
  - 5th $100
- Each quarterly bonus (4 × 4; awarded to competitors whose score calculated over the quarter is higher than both the NOAA and Rodeo I scores for the same period):
  - 1st $6,000
  - 2nd $4,500
  - 3rd $3,000
  - 4th $2,000
  - 5th $1,000
  - 6th $250
  - 7th $250
  - 8th $250
  - 9th $250
  - 10th $250
- Overall bonus (each track; awarded to competitors whose score calculated over the 52 weeks is higher than both the NOAA and Rodeo I scores for the same period):
  - 1st $25,000
  - 2nd $17,500
  - 3rd $10,000
  - 4th $7,500
  - 5th $5,000
  - 6th $2,500
  - 7th $2,000
  - 8th $1,500
  - 9th $1,250
  - 10th $1,000
Any bonus funds that are not awarded will be reallocated to later prize purses: 50% will go to the next quarter, and the other 50% will go towards the overall prize. Any funds not awarded during the overall bonus evaluation (when fewer than 10 competitors beat the NOAA and Rodeo I scores) will be redistributed to the winning positions.
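As a quick arithmetic check, the prize tables above can be summed to confirm the stated total. The variable names below are only illustrative; the amounts and multipliers are taken directly from the lists in this section.

```python
# Per-placement prizes, copied from the prize lists in this section.
sprint = [500, 350, 250, 175, 100]
quarterly = [6000, 4500, 3000, 2000, 1000, 250, 250, 250, 250, 250]
overall = [25000, 17500, 10000, 7500, 5000, 2500, 2000, 1500, 1250, 1000]

sprint_total = sum(sprint) * 26 * 4       # 26 sprints in each of 4 tracks
quarterly_total = sum(quarterly) * 4 * 4  # 4 quarters in each of 4 tracks
overall_total = sum(overall) * 4          # one overall bonus per track

total = sprint_total + quarterly_total + overall_total
print(sprint_total, quarterly_total, overall_total, total)
```

The three components come to $143,000, $284,000, and $293,000, which add up to the $720,000 total.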
Task Overview
In the 4 tracks your task is to predict the following variables:
- "temp34": 14-day average of temperature using a forecast outlook of 15-28 days (weeks 3-4),
- "prec34": 14-day total precipitation using a forecast outlook of 15-28 days (weeks 3-4),
- "temp56": 14-day average of temperature using a forecast outlook of 29-42 days (weeks 5-6),
- "prec56": 14-day total precipitation using a forecast outlook of 29-42 days (weeks 5-6).
Technically, these 4 tasks are organized as 4 × 26 = 104 different contests, and you are free to participate in any number of them. The 4 tracks are identical in most aspects (data, challenge specification, contest forum, schedule, etc.), but each has its own leader board and set of prizes. The 26 contests in the same track also have individual leader boards and prizes, but to be eligible for the quarterly and overall bonuses, you should take part in all the corresponding contests. This contest is about the "temp34" task, period #5.
The quality of your algorithm will be judged by how closely the predicted weather data matches the actual, measured values. See Scoring for details.
Schedule of Sprints (the current sprint is period #5 of the “temp34” track; all prediction ranges start at 00:00 UTC)
Period    Submission Deadline   “temp34”/“prec34”   “temp56”/“prec56”   Quarter
#                               Prediction Range    Prediction Range
1         2019-10-15 00:00 UTC  Oct 29 - Nov 11     Nov 12 - Nov 25     Q1
2         2019-10-29 00:00 UTC  Nov 12 - Nov 25     Nov 26 - Dec 9      Q1
3         2019-11-12 00:00 UTC  Nov 26 - Dec 9      Dec 10 - Dec 23     Q1
4         2019-11-26 00:00 UTC  Dec 10 - Dec 23     Dec 24 - Jan 6      Q1
5         2019-12-10 00:00 UTC  Dec 24 - Jan 6      Jan 7 - Jan 20      Q1
6         2019-12-24 00:00 UTC  Jan 7 - Jan 20      Jan 21 - Feb 3      Q1
7         2020-01-07 00:00 UTC  Jan 21 - Feb 3      Feb 4 - Feb 17      Q1
Q1 Bonus Evaluation
8         2020-01-21 00:00 UTC  Feb 4 - Feb 17      Feb 18 - Mar 2      Q2
9         2020-02-04 00:00 UTC  Feb 18 - Mar 2      Mar 3 - Mar 16      Q2
10        2020-02-18 00:00 UTC  Mar 3 - Mar 16      Mar 17 - Mar 30     Q2
11        2020-03-03 00:00 UTC  Mar 17 - Mar 30     Mar 31 - Apr 13     Q2
12        2020-03-17 00:00 UTC  Mar 31 - Apr 13     Apr 14 - Apr 27     Q2
13        2020-03-31 00:00 UTC  Apr 14 - Apr 27     Apr 28 - May 11     Q2
Q2 Bonus Evaluation
14        2020-04-14 00:00 UTC  Apr 28 - May 11     May 12 - May 25     Q3
15        2020-04-28 00:00 UTC  May 12 - May 25     May 26 - Jun 8      Q3
16        2020-05-12 00:00 UTC  May 26 - Jun 8      Jun 9 - Jun 22      Q3
17        2020-05-26 00:00 UTC  Jun 9 - Jun 22      Jun 23 - Jul 6      Q3
18        2020-06-09 00:00 UTC  Jun 23 - Jul 6      Jul 7 - Jul 20      Q3
19        2020-06-23 00:00 UTC  Jul 7 - Jul 20      Jul 21 - Aug 3      Q3
Q3 Bonus Evaluation
20        2020-07-07 00:00 UTC  Jul 21 - Aug 3      Aug 4 - Aug 17      Q4
21        2020-07-21 00:00 UTC  Aug 4 - Aug 17      Aug 18 - Aug 31     Q4
22        2020-08-04 00:00 UTC  Aug 18 - Aug 31     Sep 1 - Sep 14      Q4
23        2020-08-18 00:00 UTC  Sep 1 - Sep 14      Sep 15 - Sep 28     Q4
24        2020-09-01 00:00 UTC  Sep 15 - Sep 28     Sep 29 - Oct 12     Q4
25        2020-09-15 00:00 UTC  Sep 29 - Oct 12     Oct 13 - Oct 26     Q4
26        2020-09-29 00:00 UTC  Oct 13 - Oct 26     Oct 27 - Nov 9      Q4
Q4 and Final (Overall) Bonus Evaluation
Note that since we will evaluate the solutions on the live weather data, the results will be announced only after the corresponding prediction range elapses.
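The schedule follows a regular pattern: deadlines are 14 days apart, the weeks 3-4 range starts 14 days after the deadline, and the weeks 5-6 range starts 28 days after it. The sketch below reproduces the dates from the schedule; the function name `sprint_dates` is hypothetical and not part of any contest tooling.

```python
from datetime import date, timedelta

FIRST_DEADLINE = date(2019, 10, 15)  # submission deadline of period 1

def sprint_dates(period):
    """Return (deadline, weeks 3-4 range, weeks 5-6 range) for a period in 1..26.

    Each range is a (first day, last day) pair, both inclusive.
    """
    if not 1 <= period <= 26:
        raise ValueError("period must be between 1 and 26")
    deadline = FIRST_DEADLINE + timedelta(days=14 * (period - 1))
    weeks34 = (deadline + timedelta(days=14), deadline + timedelta(days=27))
    weeks56 = (deadline + timedelta(days=28), deadline + timedelta(days=41))
    return deadline, weeks34, weeks56

deadline, weeks34, weeks56 = sprint_dates(5)
print(deadline, weeks34, weeks56)
```

For period 5 this gives a deadline of 2019-12-10, a weeks 3-4 range of 2019-12-24 to 2020-01-06, and a weeks 5-6 range of 2020-01-07 to 2020-01-20.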
Input Data
There is no official training data set. You are free to use any data. For example, you can use the Subseasonal Rodeo data set, from which the data files used in the previous marathon match were created. The data set is described in detail in this publication; it also gives pointers to the original sources of the data, where further information on each data file can be found. Note especially section A.2 (page 10), which lists several sources, some of which are updated daily with actual measurements.
Ground Truth Data
We will use the following sources to generate ground truth for scoring:
- Temperature: ftp://ftp.cpc.ncep.noaa.gov/precip/PEOPLE/wd52ws/global_temp/
- Precipitation: ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/
These sources contain daily temperature and precipitation values over the entire globe at a resolution of 0.5° × 0.5°. The global field starts from the 0.5° × 0.5° lat/lon grid box centered at (lat, lon) = (−89.75, 0.25), going from west to east first, and then south to north. Together, there are 720 × 360 = 259,200 grid boxes.
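The ordering described above (west to east first, then south to north) implies a simple mapping between a flat position in the field and the center of the corresponding grid box. The following is a minimal sketch of that mapping; the function names are hypothetical, and 0-based indexing is an assumption made for convenience.

```python
N_LON = 720  # grid boxes per latitude row (360° / 0.5°)
N_LAT = 360  # latitude rows (180° / 0.5°)

def index_to_latlon(i):
    """Map a 0-based flat index to the (lat, lon) center of its grid box."""
    if not 0 <= i < N_LAT * N_LON:
        raise ValueError("index out of range")
    row, col = divmod(i, N_LON)  # rows run south to north, columns west to east
    return -89.75 + 0.5 * row, 0.25 + 0.5 * col

def latlon_to_index(lat, lon):
    """Inverse mapping for a (lat, lon) that is exactly a grid box center."""
    row = round((lat + 89.75) / 0.5)
    col = round((lon - 0.25) / 0.5)
    return row * N_LON + col

print(index_to_latlon(0))                # first box, south-west corner of the field
print(index_to_latlon(N_LAT * N_LON - 1))  # last box, north-east corner
```

The first index maps to (−89.75, 0.25) and the last one to (89.75, 359.75).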
The first source (for temperature) contains one file per year, named CPC_GLOBAL_T_V0.x_0.5deg.lnx.<year>.gz. After extracting, you get the file CPC_GLOBAL_T_V0.x_0.5deg.lnx.<year>. This is a raw binary file of size <number of days> × 4 × 259,200 × 4 bytes.
For a regular year, the size is 365 × 4 × 259,200 × 4 = 1,513,728,000 bytes, and for a leap year, the size is 366 × 4 × 259,200 × 4 = 1,517,875,200 bytes.
The file contains single-precision floating-point values (i.e., 4 bytes per value) with little-endian ordering. Each block of 4 × 259,200 × 4 = 4,147,200 consecutive bytes (reading the blocks from the beginning of the file) corresponds to a single day of the year. For each day, the block can be split into 4 sub-blocks of size 259,200 × 4 = 1,036,800 bytes. Each of these 4 sub-blocks contains the following 720 × 360 = 259,200 values (in the grid order described above for the global field):
- 1st sub-block: tmax values, daily maximum temperature in °C
- 2nd sub-block: nmax values, not used for ground truth calculation
- 3rd sub-block: tmin values, daily minimum temperature in °C
- 4th sub-block: nmin values, not used for ground truth calculation
Missing values are represented as -999.
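The layout above can be read with NumPy roughly as follows. This is a sketch under stated assumptions, not the official tooling: the function name `read_temperature_day` is hypothetical, the day of year is taken as 1-based, and replacing -999 with NaN is a convenience choice rather than something the specification prescribes.

```python
import numpy as np

N_CELLS = 720 * 360            # 259,200 grid boxes
VALUES_PER_DAY = 4 * N_CELLS   # tmax, nmax, tmin, nmin sub-blocks
MISSING = -999.0

def read_temperature_day(path, day_of_year):
    """Return (tmax, tmin) as 360 x 720 arrays for a 1-based day of the year."""
    offset = (day_of_year - 1) * VALUES_PER_DAY * 4  # byte offset of the day's block
    block = np.fromfile(path, dtype="<f4", count=VALUES_PER_DAY, offset=offset)
    if block.size != VALUES_PER_DAY:
        raise ValueError("file is too short for the requested day")
    # Rows run south to north, columns west to east.
    sub = block.reshape(4, 360, 720)
    tmax = np.where(sub[0] == MISSING, np.nan, sub[0])  # 1st sub-block
    tmin = np.where(sub[2] == MISSING, np.nan, sub[2])  # 3rd sub-block
    return tmax, tmin
```

A call such as `read_temperature_day("CPC_GLOBAL_T_V0.x_0.5deg.lnx.2019", 251)` would return the two fields for the 251st day, assuming the extracted file is in the working directory.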
For the current year, the file is updated daily and is not compressed (no .gz extension). For example, on Sep 10, 2019 at 8:00 GMT, which was the 253rd day of the year, the file CPC_GLOBAL_T_V0.x_0.5deg.lnx.2019 contained regular values only in the first 251 blocks (there is some time lag between when the measurements are recorded and when the file is updated). The other blocks were filled with -999.
The second source (for precipitation) contains a subfolder for each year. For example, the data for 2019 is in the RT/2019/ folder. There is a separate file for each day with the name PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.<YYYYMMDD>.RT, where <YYYYMMDD> is the respective date. This is a raw binary file of size 2 × 259,200 × 4 = 2,073,600 bytes.
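Because the current-year file is padded with -999 for days that have not been filled in yet, it can be useful to check how many daily blocks already hold data. The sketch below counts the leading blocks whose tmax sub-block has at least one non-missing value. That criterion is an assumption (a regular day may still contain missing values for individual grid boxes), and the function name is hypothetical.

```python
import os
import numpy as np

N_CELLS = 720 * 360
DAY_BYTES = 4 * N_CELLS * 4  # 4 sub-blocks of 4-byte values per day

def count_available_days(path):
    """Count leading daily blocks whose tmax sub-block is not entirely -999."""
    n_days = os.path.getsize(path) // DAY_BYTES
    available = 0
    for day in range(n_days):
        # Read only the first sub-block (tmax) of this day's block.
        tmax = np.fromfile(path, dtype="<f4", count=N_CELLS, offset=day * DAY_BYTES)
        if np.all(tmax == -999.0):
            break
        available += 1
    return available
```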
The file contains single-precision floating-point values (i.e., 4 bytes per value) with little-endian ordering. The file can be split into 2 sub-blocks of size 259,200 × 4 = 1,036,800 bytes. Each of these 2 sub-blocks contains the following 720 × 360 = 259,200 values (ordered in the same way as for temperature):
- 1st sub-block: prec, daily precipitation in 0.1 mm units (e.g., the value 123.45 means 12.345 mm of daily precipitation)
- 2nd sub-block: number of gauges available in the grid box; not used for ground truth calculation
Also here, missing values are represented as -999.
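A daily precipitation file with this layout can be read along the following lines. This is a sketch only: the function name is hypothetical, the values are kept in the file's 0.1 mm units, and mapping -999 to NaN is a convenience assumption.

```python
import numpy as np

N_CELLS = 720 * 360  # 259,200 grid boxes

def read_precipitation_day(path):
    """Return daily precipitation (in 0.1 mm units) as a 360 x 720 array."""
    raw = np.fromfile(path, dtype="<f4")
    if raw.size != 2 * N_CELLS:
        raise ValueError("unexpected file size for a daily precipitation file")
    prec = raw[:N_CELLS].reshape(360, 720)  # 1st sub-block; the 2nd holds gauge counts
    return np.where(prec == -999.0, np.nan, prec)
```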
The lat/lon resolution of the source grid is 0.5° × 0.5°, while the target contest resolution is 1° × 1°. Therefore, the interpolation described in the mentioned publication (section A.1, page 10) is performed for all 3 relevant variables (tmax, tmin, prec). For example, when calculating values for the target point (lat, lon) = (40, 253), we take the average of the values at points (39.75, 252.75), (39.75, 253.25), (40.25, 252.75), (40.25, 253.25), each weighted by the cosine of its latitude (with the latitude expressed in radians).
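The weighted average in the example above can be sketched as follows for a single integer-degree target point. Several things here are assumptions rather than part of the specification: the function name, the 360 × 720 array layout (rows south to north starting at −89.75, columns west to east starting at 0.25), the wrap-around at longitude 0, and the fact that missing values are not treated specially (a NaN in any of the four inputs propagates to the result). The publication remains the reference for the exact procedure.

```python
import math
import numpy as np

def interpolate_to_one_degree(field, lat, lon):
    """Cosine-latitude weighted mean of the four 0.5° boxes around (lat, lon).

    `field` is a 360 x 720 array; `lat` and `lon` are integer degrees,
    with -89 <= lat <= 89 and 0 <= lon <= 359.
    """
    r0 = 2 * lat + 179         # row whose center is at lat - 0.25
    c0 = (2 * lon - 1) % 720   # column whose center is at lon - 0.25
    c1 = (2 * lon) % 720       # column whose center is at lon + 0.25
    total = 0.0
    weight_sum = 0.0
    for r in (r0, r0 + 1):
        weight = math.cos(math.radians(-89.75 + 0.5 * r))
        for c in (c0, c1):
            total += weight * field[r, c]
            weight_sum += weight
    return total / weight_sum

demo = np.full((360, 720), 7.0)
print(interpolate_to_one_degree(demo, 40, 253))
```

For (40, 253) the rows used are those centered at 39.75 and 40.25, and the columns those centered at 252.75 and 253.25, matching the four points listed above.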
Finally, to obtain the ground truth values:
- For “temp34” and “temp56”, we calculate tmaxAvg and tminAvg as the average of the 14 tmax and tmin values, respectively. Then we take (tmaxAvg + tminAvg)/2.
- For “prec34” and “prec56”, we calculate precSum as the sum of the 14 prec values. Then we take precSum/10 (the final division by 10 is the conversion of units from 0.1mm to 1mm).
We provide a tool to generate ground truth values.
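For reference, the two final formulas can be written out as below. This sketch is not the provided tool; the function names are hypothetical, and the inputs are assumed to be 14 daily values (or 14 daily arrays of identical shape) that have already been interpolated to the target grid.

```python
import numpy as np

def temperature_ground_truth(tmax, tmin):
    """(tmaxAvg + tminAvg) / 2 over 14 daily tmax and tmin values in °C."""
    tmax = np.asarray(tmax, dtype=float)
    tmin = np.asarray(tmin, dtype=float)
    if tmax.shape[0] != 14 or tmin.shape[0] != 14:
        raise ValueError("expected 14 daily values")
    return (tmax.mean(axis=0) + tmin.mean(axis=0)) / 2

def precipitation_ground_truth(prec):
    """precSum / 10: 14-day total converted from 0.1 mm units to mm."""
    prec = np.asarray(prec, dtype=float)
    if prec.shape[0] != 14:
        raise ValueError("expected 14 daily values")
    return prec.sum(axis=0) / 10.0

print(temperature_ground_truth([10.0] * 14, [0.0] * 14))
print(precipitation_ground_truth([123.45] * 14))
```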