November 25, 2020
How the CDC Uses Crowdsourcing and Machine Learning to Improve Public Health
Welcome back to Uprisor, our series of technology conversations centered on the future of work. In this episode, Carlos Siordia and Stephen Bertke of the Centers for Disease Control and Prevention (CDC) join Topcoder VP Clinton Bonner to discuss their philosophy, their work, and why they choose to take problems to the crowd. Carlos is a socio-spatial epidemiologist who uses machine learning and crowdsourcing to explore data, and Stephen is a mathematical statistician.
Enjoy the conversation and check out the highlights below.
The CenterS for Disease Control
Rather than a single agency, the CDC is a collection of many different institutions, each with its own niche in public health. The CDC’s mission is to conduct research into public health issues and to detect and respond to emerging health threats. Stephen works for the National Institute for Occupational Safety and Health (NIOSH), which collects information used to study ways to improve people’s lives at work and to prevent workplace-caused injuries and illness. As a socio-spatial epidemiologist, Carlos seeks to understand the ways in which social, political, cultural, and economic circumstances influence our chances for a healthy life.
“We put science and advanced technology into action to prevent disease.” —Carlos Siordia
Science belongs to everyone
What made the CDC open to using crowdsourcing and on-demand talent? They were responding to recommendations from the National Academy of Sciences, which also recommended investing more time and resources in machine learning technologies. According to Stephen, “The idea of increasing your sample size, of getting more hands involved and more people involved with different backgrounds, viewpoints, and experiences, just really made sense.” Carlos adds, “I’ve always been, and I think Steve and many of the other colleagues on our team, motivated by the idea that science belongs to everyone.”
Developing an algorithm to help improve workplace safety
Carlos and Stephen partnered with Topcoder on a project to improve the efficiency and accuracy of an automated coding algorithm for injury data. One of NIOSH’s core activities is analyzing occupational injury surveillance data, which allows the institute to understand trends and prioritize research and prevention efforts. A key part of this data comes from “injury narratives”: text-heavy, open-ended records from employers, medical records, or workers’ compensation reports. Through machine learning text classification, algorithms can “read” these injury narratives and code the data in a fraction of the time.
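To make the idea of “reading” a narrative a little more concrete, here is a minimal sketch of text classification using a small naive Bayes model written in plain Python. It is only an illustration: the narratives, the category labels (`fall`, `cut`, `strain`), and the function names are all made up for this example, and nothing here reflects the actual algorithm, data, or coding scheme used by NIOSH or the Topcoder community.

```python
import math
import re
from collections import Counter, defaultdict

# Toy, invented narratives and labels (hypothetical; not NIOSH data or codes).
TRAINING_DATA = [
    ("worker slipped on wet floor and fell", "fall"),
    ("employee fell from ladder while painting", "fall"),
    ("tripped over cable and fell down stairs", "fall"),
    ("cut finger with box knife while opening carton", "cut"),
    ("lacerated hand on sharp metal edge", "cut"),
    ("knife slipped and cut left thumb", "cut"),
    ("strained lower back lifting heavy boxes", "strain"),
    ("pulled shoulder muscle lifting patient", "strain"),
    ("back pain after lifting pallet", "strain"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Start from the prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids taking log(0).
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(classify("fell off a ladder", model))
```

A production system would be trained on far more records and would likely use a more capable model, but the basic shape is the same: learn from narratives that people have already coded, then assign codes to new ones automatically.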
Carlos went on to explain how vital the problem definition phase of a data science challenge is. “I think you should spend about 99% of your time on it,” he says. “If you can figure out what the problem is that you are trying to solve, then almost everything else will fall in line.” Thank you to Stephen and Carlos for taking the time to offer their insights to the Uprisor audience.
“We improved our coding system from about low 80s to almost 90% accuracy. This may sound small, but when you’re talking about tens or hundreds of thousands of claims, each percentage results in a significant savings in time.” —Stephen Bertke