The EPA is a U.S. federal government agency devoted to safeguarding the environment. One of the EPA's great concerns is the proliferation of cyanobacterial harmful algal blooms (cyanoHABs) in the nation's lakes. The following resources provide information on what cyanoHABs are and how they threaten the environment.
The TopCoder project on cyanoHABs aims to develop an algorithm that will be deployed in an Android app with mapping and data visualization capabilities. The app will inform local and federal policy makers about locations where bloom events are likely to occur, allowing them to concentrate their efforts in those areas.
The EPA has provided us with two sets of cyanobacterial data spanning the time period from February 2009 to April 2012. One data set is synthetic and the other is empirical.
The synthetic data set is the MERIS-derived estimates of cyanobacterial concentration. We will refer to these as the MERIS estimates. This data is provided as a sequence of image files covering three regions of the United States:
- Florida
- New England
- Ohio
These images were derived from satellite photographs by applying an experimental formula to estimate the concentration of cyanobacteria in each 300-by-300-meter area of the covered region. A sample image for each region is attached to this contest specification.
The empirical data set consists of onsite measurements of cyanobacterial concentration, which we will call the field measurements. This is a time series of field measurements taken at various locations within the same time span and the same regions covered by the MERIS estimates. The temporal and spatial coverage is very sparse. However, the field measurements are valuable because they are the only empirical readings that we can use to confirm the MERIS estimates.
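Confirming a MERIS estimate against a field measurement requires finding the pixel that contains the measurement site. The following is a minimal sketch of that lookup, not the EPA's actual procedure. It assumes a north-up raster whose coordinates are projected in meters and whose top-left corner is known; the names `pixel_for_point`, `estimate_at`, `origin_x`, and `origin_y` are hypothetical, and the projection of the real images is not stated in this specification.

```python
import math

# From the text: each pixel covers a 300-by-300-meter area.
PIXEL_SIZE_M = 300.0


def pixel_for_point(x, y, origin_x, origin_y, pixel_size=PIXEL_SIZE_M):
    """Return (row, col) of the pixel containing the projected point (x, y).

    Assumption: origin_x and origin_y are the coordinates, in meters, of the
    top-left corner of a north-up raster, so rows grow as y decreases.
    """
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)
    return row, col


def estimate_at(grid, x, y, origin_x, origin_y, pixel_size=PIXEL_SIZE_M):
    """Return the estimate stored in grid at (x, y), or None if outside it.

    grid is a list of rows, each a list of per-pixel estimates.
    """
    row, col = pixel_for_point(x, y, origin_x, origin_y, pixel_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
        return grid[row][col]
    return None
```

A field measurement taken at a site whose lookup returns `None` lies outside the image and cannot be compared with any estimate from that image.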
Contest Requirements: Ancillary Data
The data sources above provide the temporal and spatial information needed to predict cyanobacteria counts at a future point in time. However, such a prediction ignores the biological factors that drive bloom formation, so a prediction based on these data alone would be of limited use.
To add biological relevance, we would like to use "ancillary" data sets that supply feature values for the biological factors affecting cyanobacterial blooms.
These data sets will provide features related to the factors that affect cyanobacteria counts, such as temperature, light, and groundwater level. More details about cyanobacteria can be obtained at:
In particular, the Cause, Prevention and Mitigation tab in the center of the page describes the factors that contribute to cyanobacterial blooms. According to research in this field to date, the major contributing factors are:
1. Water Temperature
2. Light Exposure
3. Trophic status of aquatic system
Currently we have 6-7 data sources from which such data can be obtained. In this contest, we want to work with the following data sets:
1. Climate data: http://www.ncdc.noaa.gov/
2. Surface water, ground water, water quality and water use: http://waterdata.usgs.gov/nwis
3. Water Temperature, pH and Light Exposure: http://www.waterqualitydata.us/
4. EPA’s National Lakes Assessment: http://water.epa.gov/type/lakes/NLA_data.cfm
5. Precipitation and Radiation: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html
We need your help to collect data from these sources according to the following requirements and to submit the collected data in a single zip file:
1. The images must be in GeoTIFF format wherever possible. If GeoTIFF images are not available for a data set, please state that clearly and propose a way to convert the data into GeoTIFF.
2. The images must cover the regions of Florida, New England and Ohio. We do not want any other regions. Please focus on these three.
3. We need images for the 2009-2012 time frame. If this time frame is not available, please say so and submit images from the latest available time frame.
4. Any time granularity is fine: the images may be available daily, weekly, monthly, or just once per year.
PLEASE MAKE SURE IMAGES ARE ENCODED WITH GEOSPATIAL INFORMATION (for example, a thermal data set must have a thermal value for each coordinate encoded in the image).
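One way to check that a collected image carries geospatial information before adding it to the zip file is to look for the georeferencing tags that the GeoTIFF specification defines. The sketch below uses only the Python standard library and reads only the first image directory of a classic TIFF file; BigTIFF files are not handled. The function names `geotiff_tags` and `is_georeferenced` are hypothetical, and the rule used here (a coordinate system directory plus either a transformation matrix or a tiepoint with a pixel scale) is an assumption about what counts as sufficient, not a contest requirement.

```python
import struct

# Georeferencing tag numbers defined by the GeoTIFF specification.
GEO_TAGS = {
    33550: "ModelPixelScaleTag",
    33922: "ModelTiepointTag",
    34264: "ModelTransformationTag",
    34735: "GeoKeyDirectoryTag",
}


def geotiff_tags(data):
    """Return the names of GeoTIFF tags in the first IFD of a classic TIFF."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    # The first two bytes select little-endian ("II") or big-endian ("MM").
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (n_entries,) = struct.unpack_from(order + "H", data, ifd_offset)
    found = set()
    for i in range(n_entries):
        # Each IFD entry is 12 bytes; the tag number is its first 2 bytes.
        (tag,) = struct.unpack_from(order + "H", data, ifd_offset + 2 + 12 * i)
        if tag in GEO_TAGS:
            found.add(GEO_TAGS[tag])
    return found


def is_georeferenced(data):
    """Assumed rule: CRS keys plus a way to map pixels to coordinates."""
    tags = geotiff_tags(data)
    has_crs = "GeoKeyDirectoryTag" in tags
    has_mapping = (
        "ModelTransformationTag" in tags
        or {"ModelPixelScaleTag", "ModelTiepointTag"} <= tags
    )
    return has_crs and has_mapping
```

To check a file on disk, read it in binary mode and pass its bytes to `is_georeferenced`. A plain TIFF without these tags returns `False` and would need to be converted or documented as described in requirement 1.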