Developing an Algorithm to Help the CDC Improve Workplace Safety

“The high-quality submissions achieved nearly 90% accuracy, which surpassed the 87% accuracy goal achieved during the internal competition.”

NIOSH Science Blog


The National Institute for Occupational Safety and Health (NIOSH) is responsible for conducting research to reduce worker injuries and illnesses in the United States. One of NIOSH's core activities is analyzing occupational injury surveillance data, which allows it to understand injury trends and prioritize research and prevention efforts. Much of this critical data comes in the form of "injury narratives": open-ended, free-text descriptions, drawn from employer reports, medical records, or workers' compensation claims, of how an employee was injured (e.g., "tripped over a chain").

To turn these records into data that can be analyzed, NIOSH assigns each record a standard numeric code representing the event that most likely caused the injury or illness. Because this coding is done by hand, it consumes many staff hours and relies on manual, highly redundant workflows. A report by the National Academies of Sciences, Engineering, and Medicine urged NIOSH to adopt machine learning and AI-enabled solutions to help automate parts of its surveillance systems.


With machine learning text classification, algorithms can be trained to "read" these injury narratives, so that records in these surveillance systems can be coded in a fraction of the time. NIOSH, part of the Centers for Disease Control and Prevention (CDC), worked with NASA and Topcoder to develop a Natural Language Processing (NLP) algorithm that automatically classifies these narratives. Before the Topcoder challenge, team members at the CDC had built an algorithm with 87% accuracy, a 6% improvement over the baseline.
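To make the idea of a classifier "reading" narratives more concrete, here is a minimal sketch of one classic text-classification technique, multinomial naive Bayes with add-one smoothing, written in plain Python. It is not the CDC team's algorithm or any Topcoder submission; this case study does not describe those. The training narratives and the category labels (FALL, CUT, CAUGHT) are made up for illustration and stand in for NIOSH's numeric event codes, and the names tokenize and NaiveBayesCoder are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical injury-narrative coder: multinomial naive Bayes
    with add-one (Laplace) smoothing over bag-of-words counts."""

    def fit(self, narratives, codes):
        self.code_counts = Counter(codes)        # how often each code appears
        self.word_counts = defaultdict(Counter)  # per-code word frequencies
        self.vocab = set()
        for text, code in zip(narratives, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        self.total = len(codes)
        return self

    def predict(self, text):
        """Return the code with the highest log posterior for this narrative."""
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            counts = self.word_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.total)  # log prior P(code)
            for tok in tokens:
                # Smoothed log likelihood P(word | code).
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Made-up training narratives with hypothetical labels (not real NIOSH codes).
train = [
    ("tripped over a chain on the floor", "FALL"),
    ("slipped on wet floor and fell", "FALL"),
    ("fell from ladder while painting", "FALL"),
    ("cut hand with box cutter", "CUT"),
    ("laceration from knife while slicing", "CUT"),
    ("hand caught in conveyor belt", "CAUGHT"),
    ("finger caught in machine rollers", "CAUGHT"),
]
model = NaiveBayesCoder().fit([t for t, _ in train], [c for _, c in train])
print(model.predict("worker tripped and fell on the floor"))  # FALL
```

A real coding system would likely train on far larger sets of hand-coded narratives and use richer features or models, but the workflow is the same: learn from records that were already coded manually, then predict codes for new ones.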

In October 2019, NIOSH and NASA teamed up to run a challenge that drew on Topcoder's large Data Science & Machine Learning Community. Submissions came in from more than 26 countries, and the winning algorithm achieved nearly 90% accuracy.

“Everyone involved benefits from faster processing times, reduced operational overhead and decreased coding errors. Process improvements will be instrumental in efforts to enhance overall worker safety and health.”

Mike Morris, CEO, Topcoder


NIOSH and the CDC took a cost- and labor-intensive process, manually coding occupational injury surveillance data, and improved it significantly. Through their contract with Topcoder, they made coding more efficient while reducing coding errors, improving coding consistency, and making data available sooner to researchers within and outside of NIOSH. With faster processing, timelier data are available to identify and justify the targeted efforts needed to prevent work-related injuries.