Key Information

Register
Submit
The challenge is finished.

Challenge Overview

We want you to speed up a bioinformatics algorithm currently implemented in R. Using the existing database, the current implementation takes many hours to run.  Our ultimate goal is to decrease the overall computation time. To that end, we want you to convert the R code to C++ and resolve any package dependencies. You may use any open source libraries that are available under a license that allows their use in proprietary software packages free of charge. Examples that qualify are LGPL or BSD-like licenses. Libraries with GNU GPLv3 licenses are NOT acceptable.  If in doubt about a particular license, please contact us. Optimized open source C++ libraries that are easy to install on Mac and Linux are highly encouraged.

Algorithm Introduction

The algorithm is used for assessing the statistical significance of regulator (non-evidence) nodes in a Bayesian network constructed from bio-molecular interactions.  In brief the algorithm uses a network of biological interactions and a directed graph is created with upstream nodes as “regulators” and downstream nodes as genes (evidence nodes). Gene expression data is matched against this Bayesian network to identify the “context” of the experiment and a Gibbs sampler is run to sample the joint distribution of the upstream regulators.  Finally the marginal probabilities are computed for each regulator and the results are processed and filtered to only include significant probabilities.

Requirements

The correctness of the output of the algorithm in C++ will be judged by following criteria, comparing to the original R implementation:.

  1. The same hypotheses must come up as significant in both versions.

  2. The marginal probabilities may not be identical, but should not differ by more than 1e-2.

We will also check the convergence of the algorithm as follows:

  1. In additional simulation runs, the known simulated regulator hypothesis must come up with high probability.

  2. We will inspect fluctuations in marginal probabilities to be minimal towards the end of the algorithm.

Multi-threading techniques are allowed. However, you need to provide a parameter to control the number of threads and we will evaluate submissions using only 4 threads. It is worth noting that running independent parallel chains of the Gibbs samples is not the way to go about speeding this up. Even if you run 100 chains independently each has to converge before averaging. You may get a way by reducing the number of samples a bit, but not by too much.

Finally, please note that we will use the final code on internal, proprietary networks that will vary in size and complexity. You should not attempt speed optimizations that strongly rely on the sparsity or size of the provided example networks.

Challenge Inputs

You will be provided the R code, a background paper, and some test cases.

Testing and Prize

We will evaluate your submissions based on speed and correctness using different datasets. We will test submissions and report their speed in the forum. The first correct submission with at least 10 times speedup wins $300 as a milestone prize. In the end of the contest, the fastest accurate submission wins the prize of $1200. We will break ties (if the speedups are quite close) using their submission times. In a tie, the earlier submission wins.  Final submissions will be validated on an internal run environment and machine configuration (state configurations) and a significantly larger test data set to validate the algorithm.

Testing Environment

Final testing will be done on an Intel(R) Xeon(R) CPU X5660 @ 2.80GHz (12 cores total) running Red Hat Enterprise Linux Server release 6.4 (Santiago).


Final Submission Guidelines

Please post your questions in our forum.

You should submit all your codes together with necessary documents to briefly introduce your solution and tell us how to run your code. A Makefile is required in your submission to make testing easy, as well as a step-by-step deployment of the open source libraries you have included in your code.

 

ELIGIBLE EVENTS:

2016 TopCoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

SHARE:

ID: 30051879