1. Project Overview
The EPA is a U.S. federal government agency devoted to safeguarding the environment. One of the EPA's major concerns is the proliferation of cyanobacterial harmful algal blooms (cyanoHABs) in the nation's lakes. The following resources provide information on what cyanoHABs are and how they threaten the environment.
The TopCoder project on cyanoHABs aims to develop an algorithm that will be deployed in an Android app with mapping and data visualization capabilities. The app will inform local and federal policy makers about locations where bloom events are likely to occur, allowing them to concentrate their efforts in those areas.
2. Contest Overview
Welcome to the first contest in a series of four EPA Android App Architecture Contests. The contests will run one after another, each building on the output of the previous one. This is the best time to jump in and set yourself up for all four consecutive contests while working towards a complete system design for the EPA Android App.
In this contest, we are looking for you to design the module architecture for the data management module of the EPA Android App. We want you to use all the available information, in the form of wireframes, system architecture and application requirements specification, to develop this module architecture. We will also provide access to the admin website, which has already been developed and which will interact with this module. Finally, we will provide the component design for a component that must be embedded in the data management module.
The following describes the functionality of the data management module:
EPA Android App Data Management Module: This module will be responsible for all data-processing tasks. A lot of data is available in different formats, mostly geospatial, so it must be preprocessed before being sent either to the front-end or to the back-end module for algorithm analysis. All validation, conversion and standardization of data will be done in this module. Among other things, this module will contain a component used for data standardization; we have provided the design document for that component. This module will perform many data-processing and I/O-intensive tasks.
Some key points to consider while designing architecture for this module:
- It needs to accommodate the component that will be used for data standardization.
- As mentioned above, this module will receive data in different formats (mostly tif and xls files) and convert them into csv files based on some constraints. These constraints are handled by the component, which will be embedded as a part of this module.
- This module will also validate the data it receives and check whether that data is useful. For example, one cannot get any geospatial information from jpg images. The images must be in GeoTiff format to be useful for data extraction.
- This module will receive data only from the admin website. It can either poll a shared folder where the admin website has already stored the data, or get the data directly from the admin website. Storing the data in a shared folder is the preferable option.
- This module will also convert heavy tif images into lightweight jpg images after the component has extracted data from them. These converted lightweight images will then be sent to the back-end, which will push them to the front-end for display.
- This module must also be performance conscious, specifically regarding I/O performance.
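As an illustration of the validation point above, the following is a minimal sketch of a cheap first-pass check that rejects jpg uploads before any expensive processing. It only inspects the file signature; it does not prove that a tif file actually carries geospatial tags, which would still need to be confirmed with a library such as GDAL. The function names `sniff_image_type` and `is_geotiff_candidate` are hypothetical and not part of any provided design.

```python
# Byte signatures at the start of a file (assumption: header sniffing is an
# acceptable first-pass filter; it is not a full GeoTiff validation).
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")      # classic TIFF, little/big endian
BIGTIFF_MAGICS = (b"II+\x00", b"MM\x00+")   # BigTIFF, little/big endian
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_image_type(header: bytes) -> str:
    """Return 'tiff', 'jpeg' or 'unknown' from the first bytes of a file."""
    if header[:4] in TIFF_MAGICS or header[:4] in BIGTIFF_MAGICS:
        return "tiff"
    if header[:3] == JPEG_MAGIC:
        return "jpeg"
    return "unknown"


def is_geotiff_candidate(path: str) -> bool:
    """True if the file at `path` is at least a TIFF container.

    Only four bytes are read, so the check costs almost no I/O.
    """
    with open(path, "rb") as f:
        return sniff_image_type(f.read(4)) == "tiff"
```

Reading only the header keeps this step consistent with the I/O performance requirement: files that cannot possibly be GeoTiff are rejected without being loaded.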
3. Data Details
We have two types of input data:
- Image Data: GeoTiff images which have geo-encoded information
- Text Data: Excel files containing data (both xls and xlsx formats) and dbf files containing metadata.
The data volume is very large. The images will always be GeoTiff. The data management module will contain the component that reads all this data and converts it into a single canonical structure. The component design document provides information about this process.
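One way to route the input types listed above is by file extension, as in the sketch below. The category labels, the `INPUT_KINDS` table and the function names are illustrative assumptions; the canonical structure itself is defined by the component design document and is not reproduced here.

```python
from pathlib import PurePath
from typing import Dict, Iterable, List

# Hypothetical mapping from file extension to the input categories named in
# this section. Anything not listed is treated as unsupported.
INPUT_KINDS = {
    ".tif": "image",
    ".tiff": "image",
    ".xls": "text",
    ".xlsx": "text",
    ".dbf": "metadata",
}


def classify_input(filename: str) -> str:
    """Return 'image', 'text', 'metadata' or 'unsupported' for a file name."""
    suffix = PurePath(filename).suffix.lower()
    return INPUT_KINDS.get(suffix, "unsupported")


def group_inputs(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names by category so each batch can be handled together."""
    groups: Dict[str, List[str]] = {}
    for name in filenames:
        groups.setdefault(classify_input(name), []).append(name)
    return groups
```

For example, `group_inputs(["lake.tif", "samples.xlsx", "photo.jpg"])` puts `photo.jpg` in the `unsupported` group, so it can be reported rather than passed to the component.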
4. Technology Overview
1. The Android App front-end has been developed for Android 4.1-4.4 using Android Developer Tools. It also uses Google Maps Android API v2, AChartEngine 1.1.0 and ProgressWheel.
2. The Data Standardization component (to be embedded in data management module) is developed in Java.
3. The Admin Website is a CMS developed using Django 1.6.1, Python 2.7.x and PIL 1.1.7.
4. We are open to the use of any technology for the data management module as long as it does not impose OS-specific requirements.
5. We plan to host the data management module and the admin website on the same server, e.g. an Amazon EC2 server.
6. Open source software resources are welcome, but they must have third-party support services available. Please ask in the forum when in doubt.
7. You are allowed to use any open source DB or data storage mechanism for storing data at various stages.
8. Image data is read using the GDAL library. You are welcome to suggest or use other libraries, but remember that the component for data conversion uses this library.
9. Please note that the developers are not allowed to use any component from TC Catalog for this contest.
4. Process Flow and Storage Considerations
- We will use three main storage modules: Temporary Storage, Shared Storage and Back-End Storage.
- The admin website will upload all kinds of data, and that data will be stored in Temporary storage. Once the admin website finishes uploading data, it will send a wake-up call to the data management module (DMM) to process the newly uploaded data. This ensures that data processing does not start while data is still uploading, which would lead to dirty reads and corrupt processing.
- Once the DMM receives the wake-up call, it will read data from Temporary storage and start data validation and data conversion.
- For the data files that fail validation, the DMM will send an email to the admin listing all of those files.
- For the data files that are validated and processed correctly, the converted output will be stored in Back-End Storage, which will be used by the back-end module (to be developed in the next contest).
- Also, the original data files whose conversion was successful should be moved to Shared Storage and removed from Temporary storage. This ensures that Temporary storage always holds only new data yet to be processed.
Please Note: Both the process flow and the storage details are just high-level suggestions from our side. Architects are welcome and encouraged to propose any changes to this process flow that they deem necessary. Architects are also allowed to use any medium for data storage, such as a DB or files. Please open a discussion in the forum to clarify or confirm your ideas when in doubt.
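The suggested flow can be sketched roughly as follows. This is only one possible reading of the steps above, not a prescribed design. It assumes the three storages are plain directories, that validation and conversion are supplied as callables (in practice the conversion would delegate to the data standardization component), and that files failing validation are simply left in Temporary storage. The names `process_uploads` and `build_failure_report` are hypothetical.

```python
import shutil
from email.message import EmailMessage
from pathlib import Path
from typing import Callable, List


def process_uploads(temp_dir: Path, shared_dir: Path, backend_dir: Path,
                    validate: Callable[[Path], bool],
                    convert: Callable[[Path, Path], None]) -> List[str]:
    """Run one pass over Temporary storage after a wake-up call.

    Returns the names of the files that failed validation. A conversion
    error propagates to the caller and leaves that file in Temporary storage.
    """
    failed: List[str] = []
    # Snapshot the listing first so moving files does not disturb iteration.
    for src in sorted(p for p in temp_dir.iterdir() if p.is_file()):
        if not validate(src):
            failed.append(src.name)
            continue
        # Converted output goes to Back-End Storage (csv is an assumption
        # taken from the conversion requirement of this module).
        convert(src, backend_dir / (src.stem + ".csv"))
        # Only successfully converted originals move to Shared Storage.
        shutil.move(str(src), str(shared_dir / src.name))
    return failed


def build_failure_report(failed: List[str], admin_address: str) -> EmailMessage:
    """Build (but do not send) the email listing files that failed validation."""
    msg = EmailMessage()
    msg["To"] = admin_address
    msg["Subject"] = "EPA DMM: %d file(s) failed validation" % len(failed)
    msg.set_content("\n".join(failed))
    return msg
```

Sending the message (for example via `smtplib`) and the wake-up mechanism itself are deliberately left out, since both depend on choices the architecture is expected to make.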
5. Resources Provided
The following resources have been provided in the forums. You will be able to access them after registration:
1.) Conceptualization Document
2.) EPA Admin Website Prototype
3.) EPA Android App Data Standardization Design Document
4.) System Design Specification
5.) System Architecture TCUML
6.) Application Requirements Specification and Use Case TCUML
Please Note: In case of discrepancies, the System Architecture and ARS take precedence over all other resources, as they are the most up to date. Please follow them, and if there is still confusion, please ask in the forums.