Key Information

The challenge is finished.

Challenge Overview

Background

The Bill & Melinda Gates Foundation (BMGF) Healthy Birth, Growth, and Development (HBGD) program addresses the dual problem of growth faltering/stunting and poor neurocognitive development, including contributing factors such as fetal growth restriction and preterm birth. The HBGD program is creating a unified strategy for integrated interventions to answer complex questions about (1) life cycle, (2) pathophysiology, (3) interventions, and (4) scaling intervention delivery. The HBGDki Open Innovation platform was developed to mobilize the global “unusual suspects” data science community to better understand how to improve neurocognitive and physical health for children worldwide. The data science contests aim to develop predictive models and tools that quantify the geographic, regional, cultural, socioeconomic, and nutritional trends that contribute to poor neurocognitive and physical growth outcomes in children. The tools and scripts developed in this challenge will support the data analysis efforts of the HBGDki Open Innovation initiative.

Description

The Gates Foundation wants to develop capabilities that help its SAS programmers become more productive. In this challenge stream, we’re going to develop an application that can dynamically read source data from SAS binary files and external data files, and generate SAS scripts to read and transform those files. The first step in this process is to generate some SAS scripts to serve as models for samples and validation purposes.

In this challenge, you’ll be validating SAS data files using a metadata file (e.g. ex01_DDF.csv) which describes the SAS import and transformation process of the data files in question. Here are the requirements:

1. You are required to implement a Java command line application. The application will receive as a parameter the path to the configuration file (a properties file). The required configuration values are described below. You may add other configuration values, if needed.
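As an illustration of requirement 1, here is a minimal sketch of loading the configuration with java.util.Properties. The class name ValidatorConfig and the property keys (sas.data.path, metadata.csv.path, output.log.path) are hypothetical; the challenge does not prescribe them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Holds the configuration values read from the properties file. */
final class ValidatorConfig {
    // Hypothetical property keys, chosen for this sketch only.
    static final String DATA_PATH_KEY = "sas.data.path";
    static final String METADATA_PATH_KEY = "metadata.csv.path";
    static final String LOG_PATH_KEY = "output.log.path";

    final Path dataPath;      // folder of sas7bdat files, or a single file
    final Path metadataPath;  // CSV metadata file
    final Path logPath;       // output log file

    private ValidatorConfig(Path dataPath, Path metadataPath, Path logPath) {
        this.dataPath = dataPath;
        this.metadataPath = metadataPath;
        this.logPath = logPath;
    }

    /** Loads the configuration from the file passed on the command line. */
    static ValidatorConfig load(Path file) {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the configuration from any reader (convenient for tests). */
    static ValidatorConfig load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ValidatorConfig(
                required(props, DATA_PATH_KEY),
                required(props, METADATA_PATH_KEY),
                required(props, LOG_PATH_KEY));
    }

    /** Returns the value as a path, or fails if the key is absent or blank. */
    private static Path required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required configuration value: " + key);
        }
        return Paths.get(value.trim());
    }
}
```

With this shape, main(..) would only need to pass its first argument to ValidatorConfig.load(Paths.get(args[0])) and hand the result to the rest of the application.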

2. The application receives as input (configurable):
- the path to the folder containing the SAS data files (sas7bdat files). If the path points to a file instead, then only that file is validated.
- the path to the metadata file (CSV file). The 1st column of the CSV file holds the name of the SAS data file to validate (without extension), the 2nd column holds the name of the SAS data column to validate, and the 3rd column holds the expected data type.
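The three-column layout of the metadata file could be parsed roughly as follows. This is a sketch under stated assumptions: fields contain no quoted commas, a header row may or may not be present (controlled by a flag), and the names ExpectedColumn and MetadataCsvParser are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One row of the metadata CSV: data file, column name, expected type. */
final class ExpectedColumn {
    final String dataFile;
    final String name;
    final String type;

    ExpectedColumn(String dataFile, String name, String type) {
        this.dataFile = dataFile;
        this.name = name;
        this.type = type;
    }
}

final class MetadataCsvParser {
    /**
     * Groups the expected columns by SAS data file name, in file order.
     * Assumes plain comma-separated fields without quoting.
     */
    static Map<String, List<ExpectedColumn>> parse(List<String> lines, boolean hasHeader) {
        Map<String, List<ExpectedColumn>> byFile = new LinkedHashMap<>();
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            if (cells.length < 3) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + " has fewer than 3 columns: " + line);
            }
            ExpectedColumn column =
                    new ExpectedColumn(cells[0].trim(), cells[1].trim(), cells[2].trim());
            byFile.computeIfAbsent(column.dataFile, k -> new ArrayList<>()).add(column);
        }
        return byFile;
    }
}
```

The lines can come from Files.readAllLines on the configured metadata path. Any columns beyond the third are ignored here.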

3. The application will retrieve the metadata of each SAS data file and validate it by checking that the columns listed in the CSV metadata file are present in the data file and have the expected type. The SAS data files expose these metadata properties for each column: label, name, length, type, format, informat.
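The check in requirement 3 can be sketched as a comparison between two maps of column name to type, one from the CSV and one from the data file. The class name ColumnValidator and the message wording are hypothetical. The sketch compares names and types case-insensitively, which is an assumption about how the CSV spells them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ColumnValidator {
    /**
     * Compares expected columns (name -> type, from the CSV) with the actual
     * columns (name -> type, from the SAS data file) and returns one message
     * per failed column. An empty list means the data file passed.
     */
    static List<String> validate(Map<String, String> expected, Map<String, String> actual) {
        // Case-insensitive lookup of actual columns (assumption of this sketch).
        Map<String, String> actualByName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        actualByName.putAll(actual);

        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String name = entry.getKey();
            String expectedType = entry.getValue();
            if (!actualByName.containsKey(name)) {
                errors.add("missing column: " + name);
                continue;
            }
            String actualType = actualByName.get(name);
            if (!expectedType.equalsIgnoreCase(actualType)) {
                errors.add("wrong type for column " + name
                        + ": expected " + expectedType + ", found " + actualType);
            }
        }
        return errors;
    }
}
```

Keeping the validator free of I/O means it can be unit tested without any SAS installation; only the code that fills the actual map needs access to the data files.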

4. The application will produce an output log file, which may contain success messages for all SAS data files, or validation error messages for each column that doesn't pass validation (missing column, wrong type), grouped by data file. There should also be error messages if the data files are missing.
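One possible shape for the log described in requirement 4 is shown below. The line format ("[OK]", "[FAILED]") and the class name ValidationLog are hypothetical; the challenge only asks that messages be grouped by data file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ValidationLog {
    /**
     * Turns per-file error lists into log lines, grouped by data file.
     * An empty error list produces a single success line for that file.
     */
    static List<String> format(Map<String, List<String>> errorsByFile) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : errorsByFile.entrySet()) {
            String file = entry.getKey();
            List<String> errors = entry.getValue();
            if (errors.isEmpty()) {
                lines.add("[OK] " + file + ": all columns passed validation");
            } else {
                lines.add("[FAILED] " + file + ": " + errors.size() + " error(s)");
                for (String error : errors) {
                    lines.add("  - " + error);
                }
            }
        }
        return lines;
    }

    /** Writes the formatted lines to the configured log file. */
    static void write(Path logFile, Map<String, List<String>> errorsByFile) {
        try {
            Files.write(logFile, format(errorsByFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A missing data file can be reported through the same structure by recording a single error such as "data file not found" under that file's name.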

The Java application should be well designed. It should only read the command line arguments in the main(..) method, then use proper object-oriented programming to implement the functionality. It should also be well documented.

In the future, we want the Java application to be able to process SAS data files and execute SAS scripts. Because of this, we would like to use the SAS tools as much as possible. For this task, we would like to see it working with the SAS JDBC Driver (with configurable properties).
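If the driver in use supports the standard java.sql.DatabaseMetaData interface, column names and types could be read as sketched below. This uses only standard JDBC calls. Whether the SAS JDBC Driver returns complete results from getColumns, and how a sas7bdat file maps to a table name, are assumptions to verify against the driver documentation. The labels "num" and "char" and the class name JdbcColumnReader are hypothetical.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

final class JdbcColumnReader {
    /**
     * Maps a java.sql.Types code to a type label. SAS data sets store only
     * numeric and character variables, so everything that is not a character
     * type is treated as numeric here (an assumption of this sketch).
     */
    static String sasTypeOf(int jdbcType) {
        switch (jdbcType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return "char";
            default:
                return "num";
        }
    }

    /** Reads column name -> type label for one table via standard JDBC metadata. */
    static Map<String, String> readColumns(Connection connection, String tableName)
            throws SQLException {
        Map<String, String> columns = new LinkedHashMap<>();
        DatabaseMetaData metaData = connection.getMetaData();
        // null catalog and schema: do not narrow the search by them.
        try (ResultSet rs = metaData.getColumns(null, null, tableName, "%")) {
            while (rs.next()) {
                columns.put(rs.getString("COLUMN_NAME"), sasTypeOf(rs.getInt("DATA_TYPE")));
            }
        }
        return columns;
    }
}
```

The connection itself would be opened with DriverManager.getConnection using a URL and properties taken from the configuration file, so that the driver settings stay configurable.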



Final Submission Guidelines

- A Java application which works upon the data provided and generates the required output.
- A Deployment Guide.
- The data and metadata files can be found in the Code Document forums attached to this challenge.
- SAS has a functional University Edition which can be used.

ELIGIBLE EVENTS:

2017 TopCoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30055388