The Environmental Protection Agency (EPA) has asked TopCoder to develop an algorithm that predicts the occurrence of cyanobacterial blooms in U.S. lakes. We will do so by running a Marathon Match. Your task in this contest is to write a problem statement for the Marathon Match based on the available data sources.
The EPA is a U.S. federal government agency devoted to safeguarding the environment. One of the EPA's major concerns is the proliferation of cyanobacterial harmful algal blooms (cyanoHABs) in the nation's lakes. The following resources provide information on what cyanoHABs are and how they threaten the environment.
The TopCoder project on cyanoHABs aims to develop an algorithm that will be deployed in an Android app with mapping and data visualization capabilities. The app will inform local and federal policy makers about locations where bloom events are likely to occur, allowing them to concentrate their efforts in those areas.
The EPA has provided us with two sets of cyanobacterial data spanning the time period from February 2009 to April 2012. One data set is synthetic and the other is empirical.
The synthetic data set consists of MERIS-derived estimates of cyanobacterial concentration, which we will refer to as the MERIS estimates. These data are provided as a sequence of image files covering three regions of the United States:
- New England
These images were derived from satellite photographs by applying an experimental formula to estimate the concentration of cyanobacteria in each 300-by-300-meter area of the covered region. A sample image for each region is attached to this contest specification.
The empirical data set consists of onsite measurements of cyanobacterial concentration, which we will call the field measurements. This is a time series of measurements taken at various locations within the same time span and the same regions covered by the MERIS estimates. Its temporal and spatial coverage is very sparse. However, the field measurements are valuable because they are the only empirical readings that we can use to validate the MERIS estimates.
In addition to the cyanobacterial data, we have several other data sets covering the same regions:
- Weather data: daily readings of temperature, air pressure, and other meteorological measurements
- National Land Cover 2006: a one-time survey of land usage (residential, industrial, agricultural)
- CropScape 2009-2012: annual surveys of what crops were cultivated in agricultural areas
The weather data is quite coarse: each cell covers one degree of latitude by one degree of longitude. The National Land Cover and CropScape data sets have a high resolution, equal to that of the MERIS estimates. Agricultural data is important because fertilizer runoff is the principal contributor to cyanobacterial growth.
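To make the resolution mismatch concrete, here is a minimal sketch that assigns a point (for example, the centre of a MERIS pixel) to the weather cell containing it. It assumes the one-degree cells are aligned to whole-degree boundaries, which the data description does not confirm; `weather_cell` and `group_by_cell` are hypothetical helper names, and the coordinates are made up for illustration.

```python
import math
from collections import defaultdict


def weather_cell(lat: float, lon: float) -> tuple[int, int]:
    """Return the (lat, lon) index of the 1-degree weather cell containing a point.

    Assumption: cells are aligned to whole degrees, so cell (44, -74) spans
    latitudes [44, 45) and longitudes [-74, -73).
    """
    return math.floor(lat), math.floor(lon)


def group_by_cell(
    points: list[tuple[float, float]],
) -> dict[tuple[int, int], list[tuple[float, float]]]:
    """Group (lat, lon) points, e.g. MERIS pixel centres, by their weather cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[weather_cell(lat, lon)].append((lat, lon))
    return dict(cells)


# Hypothetical pixel centres; the first two fall in the same weather cell.
pixels = [(44.53, -73.28), (44.10, -73.90), (43.20, -73.28)]
print(group_by_cell(pixels))
```

Because one degree of latitude is roughly 111 km, a single weather cell spans several hundred 300 m MERIS pixels in the north-south direction, so every MERIS pixel in a cell would share the same weather readings.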
The EPA has defined four levels of cyanobacterial concentration:
- Low: 10,000 to 109,999 cyanobacterial cells per milliliter
- Medium: 110,000 to 299,999 cells / mL
- High: 300,000 to 999,999 cells / mL
- Very High: 1,000,000 cells / mL or higher
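The four levels amount to a threshold lookup on cell density. The sketch below is illustrative only: `concentration_level` is a hypothetical name, and it assumes a real-valued estimate is assigned to the highest level whose lower bound it meets, so a value between the listed integer ranges (for example 109,999.5) falls into the higher level. Readings below 10,000 cells/mL have no defined level and return `None`.

```python
from typing import Optional

# Lower bounds (cells/mL) of the EPA levels, checked from highest to lowest.
LEVELS = [
    (1_000_000, "Very High"),
    (300_000, "High"),
    (110_000, "Medium"),
    (10_000, "Low"),
]


def concentration_level(cells_per_ml: float) -> Optional[str]:
    """Return the EPA level name for a concentration, or None if below 'Low'."""
    for lower_bound, name in LEVELS:
        if cells_per_ml >= lower_bound:
            return name
    return None  # below 10,000 cells/mL: no level defined


print(concentration_level(250_000))  # Medium
```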
The EPA's goals are to predict the following events at intervals of 7, 14, and 28 days into the future:
According to the EPA's calculations, the Low and Very High readings in the MERIS estimates are accurate, while the readings at the intermediate levels (Medium and High) are not.
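When framing prediction targets, it may help to check which forecast horizons can actually be verified against the available data. This sketch computes the target dates for the 7-, 14-, and 28-day horizons and keeps only those inside the data period. The exact first and last days of that period are not stated, so `DATA_START` and `DATA_END` are assumptions, and `target_dates` and `verifiable_horizons` are hypothetical helper names.

```python
from datetime import date, timedelta

# Forecast horizons named in the EPA's goals.
HORIZONS_DAYS = (7, 14, 28)

# Assumed bounds of the data period (February 2009 to April 2012);
# the exact first and last days are not given in the data description.
DATA_START = date(2009, 2, 1)
DATA_END = date(2012, 4, 30)


def target_dates(issue_date: date) -> dict[int, date]:
    """Map each horizon (in days) to the date the forecast refers to."""
    return {h: issue_date + timedelta(days=h) for h in HORIZONS_DAYS}


def verifiable_horizons(issue_date: date) -> list[int]:
    """Return the horizons whose target date falls inside the assumed data period."""
    return [
        h for h, target in target_dates(issue_date).items()
        if DATA_START <= target <= DATA_END
    ]


print(verifiable_horizons(date(2012, 4, 10)))  # [7, 14]
```

A forecast issued on 10 April 2012, for instance, can be verified at the 7- and 14-day horizons but not at 28 days, because its 28-day target (8 May 2012) falls after the assumed end of the data.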
What to submit
In this contest, we are looking for detailed ideas on how to conduct this Marathon Match from different angles. This contest is NOT focused solely on the creation of problem statements.
Please submit a document containing one or more Marathon Match contest ideas and problem statements. You must describe the input, output, and scoring formula to be used in the match. Also describe in detail how you would conduct this Marathon Match and what setup it will require. List all the components that you think need to be built and all the data preparation that needs to be done to conduct a successful match.
If you are submitting several different ideas, please label them A, B, C, and so on.
In addition to writing prospective problem statements, you may add a section describing the difficulties you encountered in developing your ideas and how you tried to address them. You may explain your decisions, offer alternative choices, and suggest further ways to improve the match.