EPA PM - Image Data Collection Content Creation Contest Part 1







    The challenge is finished.

    Challenge Overview

    Project background


    The EPA is a U.S. federal government agency devoted to safeguarding the environment. One of the EPA's major concerns is the proliferation of cyanobacterial harmful algal blooms (cyanoHABs) in the nation's lakes. The following resources provide information on what cyanoHABs are and how they threaten the environment.


    The TopCoder project on cyanoHABs aims to develop an algorithm that will be deployed in an Android app with mapping and data visualization capabilities. The app will inform local and federal policy makers about locations where bloom events are likely to occur, allowing them to concentrate their efforts in those areas.

    Data sources

    The EPA has provided us with two sets of cyanobacterial data spanning the time period from February 2009 to April 2012. One data set is synthetic and the other is empirical.

    The synthetic data set is the MERIS-derived estimates of cyanobacterial concentration. We will refer to these as the MERIS estimates. This data is provided as a sequence of image files covering three regions of the United States:

    • New England
    • Ohio
    • Florida

    These images were derived from satellite photographs by applying an experimental formula to estimate the concentration of cyanobacteria in each 300-by-300-meter area of the covered region. A sample image for each region is attached to this contest specification.

    The empirical data set consists of onsite measurements of cyanobacterial concentration, which we will call the field measurements. This is a time series of field measurements taken at various locations within the same time span and the same regions covered by the MERIS estimates. The temporal and spatial coverage is very sparse. However, the field measurements are valuable because they are the only empirical readings that we can use to confirm the MERIS estimates.

    Contest Requirements: Ancillary Data

    The data sources above provide the temporal and spatial information needed to predict cyanobacteria counts at a future point in time. However, such a prediction ignores the biological factors that drive bloom formation, so a prediction based on this data alone would be of limited use.

    To provide biological relevance, we would like to use "ancillary" data sets that supply feature values for the biological factors affecting cyanobacterial blooms.

    These data sets provide features related to the factors that affect cyanobacteria counts, such as temperature, light and groundwater level. More details about cyanobacteria can be obtained at:


    In particular, the Cause, Prevention and Mitigation tab in the center of that page describes the factors that promote cyanobacterial blooms. According to the research done in this field so far, the major features that contribute to cyanobacterial blooms are:

    1.) Water Temperature
    2.) Light Exposure
    3.) Trophic status of aquatic system

    Currently we have 6-7 data sources from which such data can be obtained. In this contest we want to work on the following three:

    1. CropScape data: http://nassgeodata.gmu.edu/CropScape/
    2. Thermal data (Landsat images): http://landsat.usgs.gov/landsat8.php
    3. National Land Cover Data: http://www.mrlc.gov/nlcd2006.php

    We need your help to collect data from these sources according to the following requirements and to submit the collected data in a single zip file.

    Please note: We will provide sample images in the forums as a reference for the data format we are looking for.

    1. The images must be in GeoTIFF format wherever possible. If GeoTIFF images are not available for a data set, please clearly state that and propose ideas for converting the data to GeoTIFF.

    2. The images must cover the regions of Florida, New England and Ohio. We do not want any other regions; please focus on these three.

    3. We need images for the 2009-2012 time frame. If this time frame is not available, please describe that and submit the images from the latest available time frame.

    4. Any time granularity is fine, i.e. it is OK if the images are available daily, weekly, monthly or just once per year.
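    Requirements 3 and 4 amount to a simple selection rule, sketched below in Python. This is only an illustration: the function name `select_years` and the use of whole years as the unit are assumptions, not part of the contest specification.

```python
def select_years(available, start=2009, end=2012):
    """Pick the acquisition years to submit (hypothetical helper).

    Returns every available year inside [start, end]. If none falls in
    that window, falls back to the single latest available year, which
    should then be explained in the submission notes.
    """
    years = set(available)
    in_range = sorted(y for y in years if start <= y <= end)
    if in_range:
        return in_range
    return [max(years)] if years else []


if __name__ == "__main__":
    print(select_years([2006, 2010, 2011, 2013]))
    print(select_years([2001, 2006]))
```

    A data set published only once (for example a single land cover release) would fall through to the fallback branch, which is the case requirement 3 asks submitters to describe.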

    PLEASE MAKE SURE IMAGES ARE ENCODED WITH GEOSPATIAL INFORMATION (for example, the thermal data set must have a thermal value encoded for each coordinate in the image).
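    One quick way to check that a downloaded file carries geospatial information is to look for the GeoTIFF tags in its TIFF header. The sketch below uses only the Python standard library and reads the first image directory of a classic (non-BigTIFF) file. The function names are hypothetical, and the check is a heuristic: it confirms that the georeferencing tags are present, not that their values are correct. A full geospatial library would give a more thorough inspection.

```python
import struct

# Tag IDs defined by the GeoTIFF specification.
MODEL_PIXEL_SCALE = 33550
MODEL_TIEPOINT = 33922
MODEL_TRANSFORMATION = 34264
GEO_KEY_DIRECTORY = 34735


def tiff_tags(data: bytes) -> set:
    """Return the tag IDs found in the first IFD of a classic TIFF."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = set()
    for i in range(count):
        # Each IFD entry is 12 bytes; the tag ID is its first 2 bytes.
        (tag,) = struct.unpack_from(order + "H", data, ifd_offset + 2 + 12 * i)
        tags.add(tag)
    return tags


def looks_georeferenced(data: bytes) -> bool:
    """Heuristic: GeoKeys plus either a transform or tiepoint and scale."""
    tags = tiff_tags(data)
    placed = MODEL_TRANSFORMATION in tags or (
        MODEL_PIXEL_SCALE in tags and MODEL_TIEPOINT in tags
    )
    return GEO_KEY_DIRECTORY in tags and placed
```

    To use it on a file, read the bytes first, for example `looks_georeferenced(open(path, "rb").read())`, where `path` is whatever image you are about to submit.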

    Tip for Success: Ask early in the forums when in doubt.

    Final Submission Guidelines

    Submission Format:

    1. Please submit a zip file containing the images inside one folder per region.

    2. Please do not submit extra area around the edges of a region, so that we do not have to slice the images later. Please be very precise in providing the images.

    3. Name the images so that we know how and when each one was taken.

    4. Please specify the spatial and temporal details of the images in a README, i.e. time frame, granularity of the time frame, bounding box coordinates, etc.
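    The submission layout above could be assembled with a short Python script like the sketch below. The folder names, the `source_region_date.tif` naming pattern and the `README.txt` file name are assumptions chosen for illustration; the contest only requires one folder per region, descriptive image names and a README.

```python
import io
import zipfile

# Assumed folder names for the three required regions.
REGIONS = ("Florida", "NewEngland", "Ohio")


def image_name(source: str, region: str, date: str) -> str:
    """Encode how (source) and when (date) an image was taken."""
    return f"{source}_{region}_{date}.tif"


def build_submission(images, readme_text, out):
    """Write a zip with README.txt and one folder per region.

    images: iterable of (source, region, date, payload_bytes) tuples.
    out: a file path or a writable binary file object.
    """
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.txt", readme_text)
        for source, region, date, payload in images:
            if region not in REGIONS:
                raise ValueError(f"unexpected region: {region}")
            zf.writestr(f"{region}/{image_name(source, region, date)}", payload)


if __name__ == "__main__":
    buf = io.BytesIO()
    build_submission(
        [("NLCD", "Ohio", "2006", b"placeholder bytes")],
        "Time frame, granularity and bounding boxes go here.",
        buf,
    )
    print(zipfile.ZipFile(buf).namelist())
```

    Rejecting unknown region names at packaging time is a cheap guard against accidentally shipping tiles outside the three requested regions.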

    Review Guidelines:

    1. There is NO MILESTONE phase for this contest.

    2. There will be a screening phase, but the final review will be done by the clients.





    Reliability Rating and Bonus

    For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


    Final Review:

    Community Review Board


    User Sign-Off