1. Project Overview
The U.S. Environmental Protection Agency (EPA) is a federal government agency devoted to safeguarding the environment. One of the EPA's great concerns is the proliferation of cyanobacterial harmful algal blooms (cyanoHABs) in the nation's lakes. The following resources provide information on what cyanoHABs are and how they threaten the environment.
The TopCoder project on cyanoHABs aims to develop an algorithm that will be deployed in an Android app with mapping and data visualization capabilities. The app will inform local and federal policy makers about locations where bloom events are likely to occur, allowing them to concentrate their efforts in those areas.
2. Contest Overview
Welcome to the first assembly in the series of five EPA Android App Architecture Contests. The contests will run one after another, each building on the output of the previous one. This is the best time to jump in and set yourself up for the series of five consecutive contests while working towards a complete system design for the EPA Android App.
In this contest, we are looking for you to develop the module assembly for the data management module of the EPA Android App. Please use all the available information in the system architecture and the application requirements specification to develop this module. We will also provide access to the admin website, which has already been developed and will interact with this module. Finally, we will provide the standardization component, the post-processing tools and the data management module architecture.
EPA Android App Data Management Module: This module will be responsible for all data-processing tasks. A lot of data is available in different formats, mostly geo-spatial, so it must be preprocessed before being sent either to the front end or to the back-end module for algorithm analysis. All validation, conversion and standardization of data will be done in this module. Among its other parts, the module will contain a component used for data standardization; we have provided the design document for that component. This module will perform many data-processing and I/O-intensive tasks. Please give very high priority to the module's performance, as it will process large amounts of data.
Some key points to consider while developing this assembly:
- The data management module can be logically divided into three parts: data validation, data standardization and data post-processing.
- Please follow the architecture fully and match your application to the design as closely as possible.
- The newest version of the data standardization component is provided in the forums. We also need a couple of updates to this component, which are described in the forums. Please note the "MultiDataStandartizationUtil" class, which is not mentioned in the component specification. You need to include that class in your submission too.
- Post-processing will be done on the data that comes out of the data standardization component mentioned above. All data that comes out of the standardization component will be in csv format, and after post-processing it will remain in the same file format (csv). The tools that need to be embedded in the module for post-processing are provided in the forums. (Please ask in the forums for any clarification related to these tools.)
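The three logical parts listed above (validation, standardization, post-processing) can be pictured as stages applied in order, with csv-style rows flowing between them. The following is only a minimal sketch under stated assumptions, not the provided architecture or the actual standardization component: the class `DataPipelineSketch`, the `Stage` interface, and the rules inside each stage (drop rows with blank cells, trim whitespace, drop duplicate rows) are all hypothetical placeholders. The code avoids syntax newer than Java 6.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the three logical parts; all names and rules are illustrative. */
final class DataPipelineSketch {

    /** One logical part of the module: rows in, rows out. */
    interface Stage {
        List<String[]> apply(List<String[]> rows);
    }

    /** Validation (assumed rule): keep only rows with no null or blank cell. */
    static final class Validation implements Stage {
        public List<String[]> apply(List<String[]> rows) {
            List<String[]> out = new ArrayList<String[]>();
            for (String[] row : rows) {
                boolean valid = row.length > 0;
                for (String cell : row) {
                    if (cell == null || cell.trim().length() == 0) {
                        valid = false;
                        break;
                    }
                }
                if (valid) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Standardization (assumed rule): trim whitespace around every cell. */
    static final class Standardization implements Stage {
        public List<String[]> apply(List<String[]> rows) {
            List<String[]> out = new ArrayList<String[]>();
            for (String[] row : rows) {
                String[] copy = new String[row.length];
                for (int i = 0; i < row.length; i++) {
                    copy[i] = row[i] == null ? "" : row[i].trim();
                }
                out.add(copy);
            }
            return out;
        }
    }

    /** Post-processing (assumed rule): drop rows that repeat an earlier row. */
    static final class PostProcessing implements Stage {
        public List<String[]> apply(List<String[]> rows) {
            Set<String> seen = new HashSet<String>();
            List<String[]> out = new ArrayList<String[]>();
            for (String[] row : rows) {
                if (seen.add(toCsvLine(row))) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Runs validation, then standardization, then post-processing. */
    public static List<String[]> run(List<String[]> rows) {
        Stage[] stages = { new Validation(), new Standardization(), new PostProcessing() };
        List<String[]> current = rows;
        for (Stage stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Joins cells with commas; assumes cells contain no commas or quotes. */
    static String toCsvLine(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(row[i]);
        }
        return sb.toString();
    }

    /** Renders all rows as csv text, one row per line, no trailing newline. */
    public static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(toCsvLine(rows.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] { " lake1 ", "4.2" });
        rows.add(new String[] { "lake2", "" });
        System.out.println(toCsv(run(rows)));
    }
}
```

This sketch holds all rows in memory for brevity. Given the data volumes mentioned in this specification, a real implementation would more likely stream rows from and to files; the stage boundaries would stay the same.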
3. Data Details
We have two types of input data:
- Image data: GeoTIFF images that contain geo-encoded information.
- Text data: Excel files containing data (both xls and xlsx formats) and dbf files containing metadata.
The volume of data is very large.
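As an illustration of how incoming files might be routed by input type, the sketch below classifies a file name into one of the categories listed above. It is a hypothetical helper, not part of the provided architecture: the class `InputTypeSketch`, the `InputType` enum, and the choice to detect types by file extension (including treating both `.tif` and `.tiff` as GeoTIFF) are assumptions made for this example.

```java
import java.util.Locale;

/** Hypothetical helper; extension-based detection is an assumption of this sketch. */
final class InputTypeSketch {

    /** The input categories described in the data details, plus a fallback. */
    enum InputType { GEOTIFF_IMAGE, EXCEL_DATA, DBF_METADATA, UNKNOWN }

    /** Classifies a file name by its extension, ignoring case. */
    public static InputType classify(String fileName) {
        if (fileName == null) {
            return InputType.UNKNOWN;
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".tif") || name.endsWith(".tiff")) {
            return InputType.GEOTIFF_IMAGE;
        }
        if (name.endsWith(".xls") || name.endsWith(".xlsx")) {
            return InputType.EXCEL_DATA;
        }
        if (name.endsWith(".dbf")) {
            return InputType.DBF_METADATA;
        }
        return InputType.UNKNOWN;
    }

    public static void main(String[] args) {
        // Example file names are made up for illustration.
        System.out.println(classify("lake_sample.tif"));
        System.out.println(classify("readings.xlsx"));
    }
}
```

An extension check is cheap but does not prove a file is well formed; actual validation of file contents would still belong in the validation part of the module.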
4. Technology Overview
- Java 6
- Amazon EC2 Server
- Spring 3.2.8: http://www.springsource.org/
- MySQL 5.7: http://dev.mysql.com/
- OpenJPA 2.3: http://openjpa.apache.org/
- Log4j 1.2.17: http://logging.apache.org/log4j/
- Velocity 1.7: http://velocity.apache.org/
- GDAL: http://www.gdal.org/
- GDAL Java: http://trac.osgeo.org/gdal/wiki/GdalOgrInJava
- Apache Commons Net 3.3: http://commons.apache.org/proper/commons-net/
- Apache Commons IO 2.4: http://commons.apache.org/proper/commons-io/
- Quartz Scheduler 2.2.1: http://quartz-scheduler.org/
Please note that developers are not allowed to use any component from the TC Catalog for this contest.
5. Process Flow and Storage Considerations
Please follow the Architecture Design completely for process flows and storage.
6. Resources Provided
The following resources have been provided in the forums. You will be able to access them after registration:
1.) Data Management Module Architecture
2.) EPA Admin Website Prototype
3.) EPA Android App Data Standardization Component
4.) Data PostProcessing Tools
5.) System Design Specification
6.) System Architecture TCUML
7.) Application Requirements Specification and Use Case TCUML
Please note: In case of discrepancies, the System Architecture and ARS take precedence over all other resources, as they are the most up-to-date. Please follow them, and if there is still confusion, please ask in the forums.