Precision Medicine - Datasets
Description
The fields of genetics and bioinformatics are producing troves of data with enormous value to human health. Topcoder has made the datasets associated with our work in this space accessible so that our community can gain experience with the processes and challenges of working with this type of data.
Dataset Links
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - 1000 Genomes Project Subsampling Data](http://images.ctfassets.net/b5f1djy59z3a/4OjM9Y70zoDr4vjTa58I9D/e47c36176b203a879ea103343ff305ec/Double-Helix-Model-Science-Minor-Groove-Dna-694798.jpg)
1000 Genomes Project Subsampling Data
Topcoder used a random sampling of data from the 1000 Genomes Project to create a genotype matrix.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Genotype Annotation Data](http://images.ctfassets.net/b5f1djy59z3a/ke5DBStXVXcNjQQk8YMbA/7368f4169dd34b69b14658dc637afde4/Deoxyribonucleic-Acid-Dna-Symbol-Dns-Genetics-1500071.jpg)
Genotype Annotation Data
Annotation data for the genotype matrix, showing 8 population types.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Offline GWAS Tester Tool](http://images.ctfassets.net/b5f1djy59z3a/4h0DRsraRJkh5dN2FoaZH2/01ce8b50d6e8a286817633d1976db4ab/Microbiology-Biology-Gene-Dna-Analysis-Medicine-163466.jpg)
Offline GWAS Tester Tool
The offline tester can run your solution locally and calculate its raw score. We also provide the Java source code of the offline tester. You may modify the code in any way to better suit your needs.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Connectivity Map: Landmark Gene Training Data](http://images.ctfassets.net/b5f1djy59z3a/3ycHck6KolmWzX1unCE3So/05e8fa27c15c78b0dfe0e1ad2cc2e3ef/1599px-DNA_sequencing.jpg)
Connectivity Map: Landmark Gene Training Data
100,000 samples of 12,320 genes. The first 970 rows are the landmark genes and the last 11,350 are the non-landmark genes.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Connectivity Map: Landmark Gene Testing Data](http://images.ctfassets.net/b5f1djy59z3a/18astuGSapNLJ4Uwk05Thr/511584af5e42f164e2c61707f2e88046/6946913993_68b1498c28_b.jpg)
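For the Landmark Gene Training Data, the row layout described above (970 landmark rows followed by 11,350 non-landmark rows) can be separated with a simple slice. This is a minimal sketch only: it assumes the matrix has already been loaded into memory as a genes-by-samples list of rows, and the function name `split_landmark_rows` and the toy matrix are hypothetical, not part of the published dataset or its tooling.

```python
from typing import List, Sequence, Tuple

# Row counts taken from the dataset description above.
N_LANDMARK = 970          # first rows: landmark genes
N_NON_LANDMARK = 11_350   # remaining rows: non-landmark genes

Row = Sequence[float]


def split_landmark_rows(
    matrix: Sequence[Row], n_landmark: int = N_LANDMARK
) -> Tuple[List[Row], List[Row]]:
    """Split a genes-by-samples matrix into (landmark, non_landmark) row blocks.

    Hypothetical helper: assumes rows are ordered landmark genes first,
    as the dataset description states.
    """
    if len(matrix) < n_landmark:
        raise ValueError(
            f"expected at least {n_landmark} rows, got {len(matrix)}"
        )
    return list(matrix[:n_landmark]), list(matrix[n_landmark:])


# Toy stand-in for the real file: 5 genes x 2 samples, first 2 rows "landmark".
toy = [[float(g), float(g) + 0.5] for g in range(5)]
landmark, non_landmark = split_landmark_rows(toy, n_landmark=2)
```

With the real data, the landmark block would typically serve as model input and the non-landmark block as the target to infer; how the file is parsed into rows is left open here because the page does not specify the file format.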
Connectivity Map: Landmark Gene Testing Data
This testing data contains 1,650 samples with the landmark genes measured by L1000.
History of Our Work
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - GWAS Speedup](http://images.ctfassets.net/b5f1djy59z3a/30AAoKYPlDhLaeprCu6LLi/1f9751dfb05f13ff5ca9722bfe388236/7410774572_622cabc913_b.jpg)
GWAS Speedup
Modeling associations between markers and phenotypes can be complex because of the need to take into account confounding covariates and model non-quantitative traits (e.g., case vs. control status). This contest focused on speeding up the logistic regression modeling that is the most computationally demanding component of many GWAS analyses.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - CMAP Generation and Analysis of Gene Expression Signatures](http://images.ctfassets.net/b5f1djy59z3a/1bGLsHbKkADTXKvRFQ6gzp/cb1d2c6955711df63786817ccf79bcb9/figure2.png)
CMAP Generation and Analysis of Gene Expression Signatures
The goal of this contest was to maximize the accuracy of the inferred gene expression values while minimizing the number of measured gene expressions. The results will further expand horizons for computational biologists and scientists who seek to find drugs that cure diseases.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - CMAP 2 - Acceleration of CMAP Algorithm](http://images.ctfassets.net/b5f1djy59z3a/7vnZqsz0bUraEAHUEYInb1/4a41c911046f65c634e9ee4f33a5b8f3/graph.png)
CMAP 2 - Acceleration of CMAP Algorithm
The second challenge in the 2016 Connectivity Map series asked competitors to speed up the existing CMAP algorithm.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - CMAP3 - DPeak Challenge](http://images.ctfassets.net/b5f1djy59z3a/54Rb5e5QOH1xLfF5b1e3B9/04bbeee9a92c1cb13b94836e850f754c/deconvolution_contest_Fig2_Final_revised.png)
CMAP 3 - DPeak Challenge
In this challenge we sought to improve the speed and accuracy of ‘dpeak’, the algorithm that deconvolutes the composite expression signal into two values and associates them with the appropriate genes. These improvements will enable CMap to produce higher quality data more efficiently.
Blogs and Discussions
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Topcoder Customer Stories - GWAS Algorithm Optimization](http://images.ctfassets.net/b5f1djy59z3a/2edZFouQtbYxB7KdTQamod/2d77682ba9b149c804fc930499c8895b/Genome-Wide-Association-hero.jpg)
Topcoder Customer Stories - GWAS Algorithm Optimization
A pharma company’s GWAS analysis solution had proven to be accurate, yet it hampered research because of the long run time of each experiment. The company wanted to speed up the logistic regression modeling that determines which markers explain specific phenotypes, the most computationally demanding component of many GWAS analyses.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Topcoder Customer Stories - DNA Sequencing Algorithm](http://images.ctfassets.net/b5f1djy59z3a/7fyBWjNzm51h1B3XxqFBLu/6b3d63ec5723e7bf11e938d508c2fabe/Harvard-Case-Study.jpg)
Topcoder Customer Stories - DNA Sequencing Algorithm
Renowned for its innovation in medical research and genomics, Harvard Medical School wanted to speed up standard DNA sequencing, which is essential for making precise, high-throughput readouts of the immune system. A full-time employee had worked for a year to optimize an algorithm that calculates the distance between DNA strings, but the school wanted to see whether more data scientists working on the problem could deliver even better results.
![[DS ANALYTICS - DATASETS PAGE] - {Precision Medicine} - Oxford GigaScience](http://images.ctfassets.net/b5f1djy59z3a/4oqDFZs7hoekp3vtg2Wkkr/17ddb951825c88446a3b452d1acf7efd/Screen_Shot_2019-04-09_at_11.23.20_AM.png)