Join, learn, and compete for $23,000 in prizes!
CMap has generated the world’s largest gene expression dataset to date. A key factor in enabling this work is the practice of measuring the expression of two genes using the same physical material, which dramatically reduces the cost and increases the throughput of data generation. In this third challenge in the CMap series, we seek to improve the speed and accuracy of ‘dpeak’, the algorithm that deconvolutes the composite expression signal into two values and associates each value with the appropriate gene. These improvements will enable CMap to produce higher-quality data more efficiently.
To learn more about CMap, visit clue.io.
For the purpose of this challenge, the core CMap technology can be described as follows.
In a single experiment, CMap makes 488 measurements. Each measurement produces an intensity histogram (a vector of integers) that characterizes the expression of two distinct genes in the sample (for a total of 488 x 2 = 976 genes). In the ideal case, each histogram consists of two peaks (see Figure), each corresponding to a single gene. The genes are mixed in a 2:1 ratio, so the areas under the peaks are also in a 2:1 ratio, which allows us to associate each peak with a specific gene: the larger peak belongs to the gene mixed at the higher proportion. The median position of each peak corresponds to that gene’s expression level, and this is the value you need to determine in this challenge.
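To make the task concrete, here is a minimal sketch of one naive way to split a measurement into two peaks and read off their medians. It is not the ‘dpeak’ algorithm; it swaps in a simple one-dimensional 2-means clustering purely for illustration. The function name `split_two_peaks` is hypothetical, and the input is assumed to be a list of raw intensity readings rather than binned counts.

```python
from statistics import median


def split_two_peaks(values, iterations=50):
    """Illustrative only: cluster 1-D intensity readings into two groups
    with 2-means, then return (median of larger group, median of smaller
    group). The larger group is taken to be the gene mixed at the higher
    proportion, following the 2:1 area rule described above."""
    if len(values) < 2:
        raise ValueError("need at least two readings")

    # Start the two cluster centres at the extremes of the data.
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        # All readings identical: no second peak can be distinguished.
        return lo, hi

    for _ in range(iterations):
        # Assign each reading to the nearer centre via a midpoint threshold.
        threshold = (lo + hi) / 2.0
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]

        # Recompute the centres as cluster means; stop once they settle.
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi

    # The cluster with more readings stands in for the peak with larger area.
    if len(low) >= len(high):
        return median(low), median(high)
    return median(high), median(low)


# Hypothetical readings: six near 1000 and three near 3000 (a 2:1 split).
readings = [980, 990, 1000, 1010, 1020, 1000, 2990, 3000, 3010]
major, minor = split_two_peaks(readings)
print(major, minor)
```

A sketch like this only behaves well in the ideal case of two clearly separated peaks. Overlapping peaks, noisy data, or cluster sizes that stray far from 2:1 would need a more careful approach.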