ToxCast Prediction Challenge

Welcome to the EPA ToxCast Challenge. In this challenge, participants will develop an algorithm based on data provided by the EPA to help predict a chemical’s “systemic lowest effect level” from a traditional animal toxicity study.


The goal of this challenge is to develop a model based on data provided by the EPA to quantitatively predict a chemical’s systemic lowest effect level (LEL) in a traditional animal toxicity study. It is expected that the model will additionally produce a quantitative estimate of the uncertainty around the central estimate.

People are exposed to many man-made chemicals throughout their lives. These include food ingredients and additives, pesticides, cosmetics, medicines, cleaners, solvents, etc. Historically, a series of standard animal studies has been used to evaluate whether a chemical can cause a range of different adverse effects and at what dose these effects occur. The term “systemic toxicity” is often used because the effects can occur in different organ systems such as the liver, kidney, lungs, or reproductive system. The systemic Lowest Effect Level, or LEL, is the lowest dose that shows adverse effects in these animal toxicity tests. Regulators then conservatively adjust the LEL in different ways to derive a value that the Agency can use to set exposure limits that are expected to be tolerated by the majority of the population.

Ideally, every chemical to which we are exposed would have a well-defined LEL. However, the full battery of animal studies required to estimate the LEL costs millions of dollars and takes many months to complete. As a result, thousands of chemicals lack the data needed to estimate an LEL. To help fill this gap, the EPA has screened nearly 2,000 chemicals across a battery of more than 700 biochemical and cell-based in vitro assays to identify which proteins, pathways, and cellular processes these chemicals interact with and at what concentration they interact. The goal of this challenge is to develop an algorithm using data from high-throughput in vitro assays, chemical properties, or chemical structural descriptors to quantitatively predict a chemical’s systemic LEL. Chemicals causing toxicity through inhibition of acetylcholinesterase (a common mechanism for neurotoxicity) are excluded from the challenge since they may skew the LEL values and can be identified in other ways.

topcoder is also collaborating with InnoCentive for this challenge. Prior to the algorithm challenge, both topcoder and InnoCentive will be running idea generation challenges that will provide chemical structure libraries, combinations of the high-throughput in vitro assays, and the underlying scientific rationale that can be leveraged by the topcoder Community to help guide the algorithms. To participate in the InnoCentive challenges, please click here.

To find out more about ToxCast, please visit the Chemical Data Challenge page on

High Level Requirements

We need to create a model or set of models to predict a chemical’s systemic LEL. Solutions to the challenge will be scored on the following criteria:
  1. Strength of prediction as measured against the validation data set
  2. The fraction of the 1,800 included chemicals the method(s) are able to accurately score
  3. The scientific supportability of the performance of the prediction methods.
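The first two criteria can be illustrated with a minimal Python sketch. The challenge text does not specify the actual scoring formula, so everything here is an assumption for illustration: prediction strength is taken as root-mean-square error on a log10 dose scale, coverage is taken as the share of chemicals that receive any prediction, and the chemical names and LEL values are made up.

```python
import math

def coverage(predictions, all_chemicals):
    """Fraction of chemicals for which a prediction was produced (assumed metric)."""
    scored = [c for c in all_chemicals if predictions.get(c) is not None]
    return len(scored) / len(all_chemicals)

def rmse_log10(predictions, observed):
    """RMSE between predicted and observed LELs on a log10 scale (assumed metric).

    Only chemicals that have both an observed LEL and a prediction are compared.
    """
    common = [c for c in observed if predictions.get(c) is not None]
    if not common:
        raise ValueError("no chemicals with both a prediction and an observed LEL")
    squared = [
        (math.log10(predictions[c]) - math.log10(observed[c])) ** 2
        for c in common
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values in mg/kg/day; these are not real ToxCast data.
observed = {"chem_A": 10.0, "chem_B": 100.0, "chem_C": 1.0}
predictions = {"chem_A": 100.0, "chem_B": 100.0, "chem_C": None}

cov = coverage(predictions, list(observed))
err = rmse_log10(predictions, observed)
print(f"coverage = {cov:.2f}, log10 RMSE = {err:.2f}")
```

Working on a log scale treats a tenfold overprediction and a tenfold underprediction as equally large errors, which is one common choice for dose data that spans several orders of magnitude.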
The prediction method must be embodied in a set of algorithms and a software system. The software system may consist of a set of programs. The algorithms and programs must be well described and must be available to EPA for direct use in verifying the results.
    • The solution must produce an estimate of an LEL plus a confidence interval.
    • The input to the solution can include any or all of the following: chemical structure, chemical properties, in vitro assay data, and reverse toxicokinetics (RTK) data. The RTK data allow extrapolation from a concentration in an in vitro assay to an administered dose.
    • A solution could rely on having a near complete experimental data set, or it could be purely computational and only need the chemical structure.
    • Chemicals causing toxicity through inhibition of acetylcholinesterase (a common mechanism for neurotoxicity) are excluded from the challenge.
    • The solution should include cross-validation analysis.
    • The solution must provide predictions for a set of “validation” chemicals. These will be ones for which the complete input data set is available, but for which the EPA has held back the LEL data.
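As a rough illustration of the requirements above (an LEL estimate, an interval around it, and a cross-validation analysis), here is a minimal Python sketch. It is not the EPA's or any contestant's method. It assumes a single hypothetical numeric descriptor per chemical, a straight-line model of log10 LEL, a percentile bootstrap for the interval, and synthetic data. The bootstrap interval here reflects uncertainty in the fitted line only, which is narrower than a full prediction interval.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # degenerate sample: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def predict_with_interval(xs, ys, x_new, n_boot=500, alpha=0.05, seed=0):
    """Central estimate at x_new plus a percentile-bootstrap interval."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    center = a + b * x_new
    n = len(xs)
    boots = []
    for _ in range(n_boot):
        # Resample chemicals with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        a_b, b_b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        boots.append(a_b + b_b * x_new)
    boots.sort()
    lo = boots[int((alpha / 2) * (n_boot - 1))]
    hi = boots[int((1 - alpha / 2) * (n_boot - 1))]
    return center, lo, hi

def k_fold_rmse(xs, ys, k=5, seed=0):
    """RMSE of held-out predictions under k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_errors = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((a + b * xs[i] - ys[i]) ** 2 for i in fold)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Synthetic data: x is a hypothetical descriptor, y is log10(LEL in mg/kg/day).
data_rng = random.Random(42)
xs = [data_rng.uniform(-2, 2) for _ in range(60)]
ys = [1.0 + 0.8 * x + data_rng.gauss(0, 0.3) for x in xs]

center, lo, hi = predict_with_interval(xs, ys, x_new=0.5)
cv_rmse = k_fold_rmse(xs, ys)
print(f"log10 LEL estimate {center:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
print(f"5-fold cross-validated RMSE: {cv_rmse:.2f}")
```

Under these assumptions, `10 ** center` converts the estimate back to a dose in mg/kg/day. A real submission would replace the one-variable line with a model over assay, structure, and RTK inputs, but the same resampling and held-out evaluation pattern applies.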

Challenge Restrictions

If you have a signed research partnership agreement with EPA that gave you access to pre-published ToxCast Phase II data (including assay summary activity files, assay description files, effect and endpoint data files from animal toxicity studies, concentration response data files, and chemical library and structure files), you may be restricted from participating in some activities. Please take care to review the participation rules listed for each contest.

Contest Overview

Are you ready to compete?  Please click on a contest below to learn more!

Name | Contest Type | Start | End
EPA ToxCast – Generating Chemical Structural Descriptors Idea Generation Contest | Conceptualization / Idea Generation | 12/12/2013 | 12/28/2013
EPA ToxCast – Describing High-Throughput Screening Assays Idea Generation Contest | Conceptualization / Idea Generation | 12/29/2013 | 1/12/2014