Coverage Optimization - Variability Reduction Challenge









    Challenge Overview

    Challenge Objectives

    • Develop a tool to clean up, parametrize and group benefits data

    • Create simple reports in Excel

    Project Background

    • Our client, Morel, wants to optimize their product offerings by consolidating their current product set.

    • In this challenge, we’ll build a tool that cleans the raw benefits data, parametrizes and groups related benefits and outputs simple Excel reports. This effectively reduces the variability of the benefits data

    • A parallel challenge is working on a different subset of the requirements - creating a benefits hierarchy and a consolidated product set.

    • Future challenges will integrate the outputs of these two challenges, improve reporting and create the final product set recommendations



    Our client offers a variety of insurance products to large organizations who opt for the one that best suits their needs. This is often achieved through customization of product details and these unique products are then administered and managed. This has resulted in the following challenges:

    • Variation of processes across the organization

    • Increased complexity in providing customer service

    • Reduced ability to self-serve

    • An inefficient process of claims payment and pricing models

    • Difficulty in forecasting and providing optionality to customers


    The goal of Coverage Optimization is to:

    • Analyze the variability of parent products with reference to the unique child products formed due to benefits customizations. This analysis will be used to assist Morel in the definition of a target, consolidated product set that can represent a standard set of offerings with ‘configurable’ options

    • Recommend options for this envisioned consolidated product set along with justifications. This will include impact analysis based on utilization and current state product variability

    • Suggest a hierarchical and categorical benefit design with measurable reduction in variability of benefit language, resulting in standardization of offerings and an optimal benefit package

    • Increase simplicity of standard benefit options while suggesting configurability options for benefits

    • Offer an applicable algorithm/process that is easily repeatable on similarly structured but different data and that:

      • Identifies new answers/benefits and aligns them to the suggested benefits appropriately, or

      • Signals the need for a new standard benefit option

    Data Description

    The available dataset is not very large (~70MB). You will have access to the entire data set. Check the sample and metadata file available in the forums for a complete definition of all data fields.


    Here is a definition of some terms that will help with understanding the provided data:

    • Benefit is coverage for various health care services

    • Coverage code is a unique identifier for a set of benefits provided as a group of services with actual start/end dates for the coverage

    • Product consists of a set of coverage codes (and hence benefits) and is used internally to align coverage codes to internal rules and procedures.


    Benefits data is the core data set for this challenge. Here is how this data is generated. Benefits configuration is arranged in a question/answer format on a website. Each benefit has a hard-coded question and several types of available answers to complete the question. For example, a question might be “Is this a High Deductible Health Plan?” and the user might have a choice of 2 check boxes, radio buttons, or a drop down with Yes/No toggles. The answer then becomes the statement combination of the Q&A, leaving “No, this is not a high deductible health plan.” As another example, the question might be “The out of pocket maximum is:” and the user enters “$3000”, leaving the answer to be “The OPM is $3000.” Finally, the question might be “Enter additional comments here”, in which case the user might enter free-form text and that free-form text becomes the answer. These sets of answers are rolled up to form all of the benefits for a specific coverage code.
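    Because the answer text combines a fixed statement with a user-entered value, one way to parametrize it is to replace the value with a placeholder and keep the value separately. The following is only a minimal sketch: the function name `parametrize`, the placeholder names and the regular expression are assumptions made for illustration, not something defined by the challenge or the ideation document.

```python
import re

# Matches dollar amounts ("$3000", "$1,500.50") and percentages ("20%").
# This pattern is an illustrative assumption; real data may need more cases.
_PARAM_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?%")


def parametrize(answer):
    """Split an answer into a template and the list of extracted values."""
    values = []

    def _replace(match):
        text = match.group(0)
        values.append(text)
        # Percentages and dollar amounts get different placeholders.
        return "{PCT}" if text.endswith("%") else "{AMOUNT}"

    template = _PARAM_PATTERN.sub(_replace, answer.strip())
    return template, values


template, values = parametrize("The OPM is $3000.")
print(template, values)
```

    With this approach, answers that differ only in the entered amount share one template, which is what allows them to be grouped later.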


    The actual data set contains flat data records that:

    • List the benefits (the answer column) and question identifiers (sequence_id)

    • Connect benefits to coverage codes

    • Connect coverage codes to internal products

    • List the start/end dates for the coverage

    And also these useful columns:

    • Type_of_tag - information about the type of field presented in the software - radio button, checkbox, text input, dropdown

    • Value flag - populated only for records where the type of tag is text. It denotes whether the user entry field is a text field (any free-form text allowed) or a numeric field, in which case the answer may contain some text automatically generated by the software, but the user can only enter a numeric value.

    • Top50_flag - denotes that the coverage code belongs to a very important client
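    To make the record structure concrete, here is a rough sketch of indexing the flat records by coverage code and by product using only the standard library. The column names `coverage_code` and `product`, the CSV layout and the sample rows are assumptions for illustration; the metadata file in the forums defines the real field names.

```python
import csv
import io
from collections import defaultdict


def index_records(lines):
    """Index flat benefit records by coverage code and by product."""
    benefits_by_coverage = defaultdict(list)
    coverages_by_product = defaultdict(set)
    for row in csv.DictReader(lines):
        code = row["coverage_code"]  # hypothetical column name
        benefits_by_coverage[code].append((row["sequence_id"], row["answer"]))
        coverages_by_product[row["product"]].add(code)  # hypothetical column
    return dict(benefits_by_coverage), dict(coverages_by_product)


# Made-up sample rows, for illustration only.
SAMPLE = io.StringIO(
    "product,coverage_code,sequence_id,answer\n"
    "P1,C100,1,The OPM is $3000.\n"
    "P1,C100,2,\"No, this is not a high deductible health plan.\"\n"
    "P1,C200,1,The OPM is $5000.\n"
)
benefits, coverages = index_records(SAMPLE)
print(benefits["C100"])
print(coverages["P1"])
```

    The same function accepts an open file object, since `csv.DictReader` works on any iterable of lines.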

    Technology Stack

    • Python

    • Excel


    Code access


    We’re starting a new codebase, so you should create the project structure.

    The winning submission of the ideation challenge is available in the forums. It contains the details of what we’re trying to build in this project. You should read that document before continuing with the individual requirements section.


    Individual requirements


    In this challenge we will focus on grouping the individual benefits (answer column) within each benefit_class and answer_tag.
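    As a rough illustration of grouping within each benefit_class and answer_tag, the sketch below keys every record by its class, its tag and a normalized form of the answer. The normalization rule (lowercasing, collapsing whitespace, masking numbers) and the function names are assumptions; the ideation document describes the intended methods in more detail.

```python
import re
from collections import defaultdict


def normalize(answer):
    """Lowercase, collapse whitespace and mask numeric values."""
    text = re.sub(r"\s+", " ", answer.strip().lower())
    return re.sub(r"\d+(?:,\d{3})*(?:\.\d+)?", "<num>", text)


def group_answers(records):
    """Group raw answers by (benefit_class, answer_tag, normalized answer)."""
    groups = defaultdict(set)
    for rec in records:
        key = (rec["benefit_class"], rec["answer_tag"], normalize(rec["answer"]))
        groups[key].add(rec["answer"])
    return dict(groups)
```

    Keeping benefit_class and answer_tag in the key means that answers with identical wording in different classes or tags are never merged.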


    Output of this challenge is a Python tool (CLI) that:

    • Reads the benefits data file

    • Cleans the answer data, parametrizes the answers for each benefit class and answer tag and groups the benefits as described in the ideation challenge document

    • Creates the output reports for each benefit class
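    A possible skeleton for such a CLI, using `argparse` from the standard library, is sketched below. The program name, the `--output-dir` option and its default are hypothetical choices, and the processing steps are left as a placeholder.

```python
import argparse


def build_parser():
    """Build the command-line parser (names and options are illustrative)."""
    parser = argparse.ArgumentParser(
        prog="benefits-grouper",  # hypothetical tool name
        description="Clean, parametrize and group benefits data.",
    )
    parser.add_argument("input_file", help="path to the benefits data file")
    parser.add_argument(
        "--output-dir",
        default="reports",
        help="directory for the per-benefit-class reports",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: read, clean, parametrize, group and report would go here.
    print(f"Reading {args.input_file}; reports go to {args.output_dir}")
    return args
```

    In a real tool, `main()` would typically be called from an `if __name__ == "__main__":` guard or registered as a console entry point.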

    The main requirement here is parametrizing and grouping the answers - and you are NOT limited to the methods described in the ideation document (there are quite a lot of ideas and suggestions in the document, but some cases of benefits that could be grouped are still missed). It is up to you to improve the parametrization and grouping of the benefits as you see fit, and this is the requirement that will earn you the most points during review.

    There is no objective score/metric for benefits grouping that we can use here for review so the reviews will be manual and based on the output for each of the benefit classes and suggested grouping.

    In addition to the reports mentioned in the ideation document, the tool should print global statistics: the total number of unique answers across all tags and the total number of unique answers after grouping. That said, don’t try to achieve the lowest number of benefits after grouping with artificial improvements that rely exclusively on human decision (e.g. hardcoded benefit texts).
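    The two global counts could be computed along these lines. This is only a sketch: `global_statistics` and the `group_of` callable (any function mapping an answer to its group identifier) are hypothetical names, not part of the specification.

```python
def global_statistics(answers, group_of):
    """Count unique answers before grouping and unique groups after it."""
    unique_before = set(answers)
    unique_after = {group_of(answer) for answer in unique_before}
    return {
        "unique_answers": len(unique_before),
        "unique_groups": len(unique_after),
    }
```

    Passing the grouping function in as an argument keeps the statistics independent of whichever parametrization method is used.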

    Pay special attention to the possibilities of grouping benefits in the “Additional Information” benefit class - this is the class with the highest variability in the answers, and the ideation document does not provide many details on parametrizing these answers.
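    One possible starting point for free-form text, offered here as an assumption rather than a method from the ideation document, is to cluster near-duplicate answers by string similarity. The sketch below uses `difflib.SequenceMatcher` from the standard library with a greedy first-match strategy; the function name and the 0.85 threshold are arbitrary illustrative choices.

```python
from difflib import SequenceMatcher


def _canonical(text):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def cluster_free_text(answers, threshold=0.85):
    """Greedily assign each answer to the first sufficiently similar cluster.

    The first member of each cluster acts as its representative.
    """
    clusters = []
    for answer in answers:
        key = _canonical(answer)
        for cluster in clusters:
            representative = _canonical(cluster[0])
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                cluster.append(answer)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([answer])
    return clusters
```

    This comparison is quadratic in the number of distinct answers, so on larger inputs it would likely need to be combined with a cheaper pre-grouping step such as exact matching on normalized text.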


    Create a README file with details on how to deploy and verify the tool. Unit testing is out of scope. Code style will not be a major factor, but make sure your code follows the PEP-8 guidelines and is split into modules - don’t put everything into one giant module.


    Review will be a combination of internal Topcoder and client reviews.

    What To Submit


    Submit the full source code

    Submit the build/verification documentation

    Submit a short demo video and sample outputs of the tool

    Final Submission Guidelines

    See above

    Reliability Rating and Bonus

    For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


    Final Review:

    Community Review Board


    User Sign-Off

