Coverage Optimization - Frequency Analysis Challenge









    Challenge Overview

    Challenge Objectives

    • Develop a tool to merge benefit classes and coverage codes

    • Create simple reports in Excel


    Project Background

    • Our client wants to optimize their product offerings by consolidating their current product set.

    • In this challenge we’ll build a tool to merge related coverage codes and suggest a coverage code hierarchy to create a consolidated product set.

    • A parallel challenge is working on a different subset of the requirements: building a tool that cleans the raw benefits data, parametrizes and groups related benefits, and outputs simple Excel reports. This effectively reduces the variability of the benefits data.

    • Future challenges will integrate the outputs of these two challenges, improve reporting, and create the final product set recommendations.



    Our client offers a variety of insurance products to large organizations who opt for the one that best suits their needs. This is often achieved through customization of product details and these unique products are then administered and managed. This has resulted in the following challenges:

    • Variation of processes across the organization

    • Increased complexity in providing customer service

    • Reduced ability to self-serve

    • An inefficient process of claims payment and pricing models

    • Difficulty in forecasting and providing optionality to customers


    The goal of Coverage Optimization is to:

    • Analyze the variability of parent products with reference to the unique child products formed due to benefits customizations. This analysis will be used to assist Morel in the definition of a target, consolidated product set that can represent a standard set of offerings with ‘configurable’ options

    • Recommend options for this envisioned consolidated product set along with justifications. This will include impact analysis based on utilization and current state product variability

    • Suggest a hierarchical and categorical benefit design with measurable reduction in variability of benefit language, resulting in standardization of offerings and an optimal benefit package

    • Increase simplicity of standard benefit options while suggesting configurability options for benefits

    • Offer an applicable algorithm/process that is easily repeatable on similarly structured but different data and that either:

      • Identifies new answers/benefits and aligns them to the suggested benefits appropriately, or

      • Signals the need for a new standard benefit option
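
    The last goal above (align new answers to suggested benefits, or signal that a new standard option is needed) can be illustrated with a minimal sketch. It assumes plain string similarity from Python's standard difflib module is good enough; the function names, the 0.8 threshold and the sample answers are hypothetical and not part of the challenge data.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def align_answer(answer, standard_benefits, threshold=0.8):
    """Return (best_match, score) for a new answer.

    best_match is None when no standard benefit reaches the threshold,
    which signals that a new standard benefit option may be needed.
    The threshold value is an assumption and would need tuning.
    """
    best_match, best_score = None, 0.0
    for candidate in standard_benefits:
        score = SequenceMatcher(
            None, normalize(answer), normalize(candidate)).ratio()
        if score > best_score:
            best_match, best_score = candidate, score
    if best_score >= threshold:
        return best_match, best_score
    return None, best_score


# Hypothetical standard benefits and new answers, for illustration only.
standard = [
    "The out of pocket maximum is $3000.",
    "No, this is not a high deductible health plan.",
]
match, score = align_answer("The out of pocket maximum is $3500.", standard)
new_match, new_score = align_answer(
    "Acupuncture is covered up to 12 visits.", standard)
```

    Here the first answer aligns to the existing out of pocket maximum benefit, while the second has no close match and would be flagged as a candidate for a new standard option.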

    Data Description

    The available dataset is not very large (~70MB). You will have access to the entire data set. Check the sample and metadata file available in the forums for a complete definition of all data fields.


    Here is a definition of some terms that will help with understanding the provided data:

    • Benefit is coverage for various health care services

    • Coverage code is a unique identifier for a set of benefits provided as a group of services with actual start/end dates for the coverage

    • Product consists of a set of coverage codes (and hence benefits) and is used internally to align coverage codes to internal rules and procedures.


    Benefits data is the core data set for this challenge. Here is how this data is generated. Benefits configuration is arranged in a question/answer format on a website. Each benefit has a hard-coded question and several types of available answers to complete it. For example, a question might be “Is this a High Deductible Health Plan?” and the user might have a choice of two check boxes, radio buttons, or a drop-down with Yes/No options. The answer then becomes the combined statement of the Q&A, for example “No, this is not a high deductible health plan.” As another example, the question might be “The out of pocket maximum is:” and the user enters “$3000”, making the answer “The OPM is $3000.” Finally, the prompt might be “Enter additional comments here”, in which case the user enters free-form text and that text becomes the answer. These sets of answers are rolled up to form all of the benefits for a specific coverage code.


    The actual data set contains flat data records that:

    • List the benefits (the answer column) and question identifiers (sequence_id)

    • Connect benefits to coverage codes

    • Connect coverage codes to internal products

    • Give the start/end dates for the coverage

    And also these useful columns:

    • Type_of_tag - information about the type of field presented in the software - radio button, checkbox, text input, dropdown

    • Value flag - populated only for records where the type of tag is text. It denotes whether the user entry field is a free-form text field (any text allowed) or a numeric field, meaning the answer may contain some text that is automatically generated by the software, but the user can only enter a numeric value.

    • Top50_flag - This just denotes that the coverage code is for a very important client
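
    As a rough illustration of how the flat records described above could be turned into one benefit set per coverage code, here is a minimal sketch using only the standard library. The sequence_id and answer columns come from the description above; the coverage_code and product column names and the sample rows are assumptions, so check the metadata file for the real field names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file has more columns and records.
SAMPLE = """coverage_code,product,sequence_id,answer
C1,P1,10,"No, this is not a high deductible health plan."
C1,P1,20,The OPM is $3000.
C2,P1,10,"No, this is not a high deductible health plan."
C2,P1,20,The OPM is $5000.
"""


def load_benefits(handle):
    """Map each coverage code to its set of (sequence_id, answer) pairs."""
    benefits = defaultdict(set)
    for row in csv.DictReader(handle):
        key = (row["sequence_id"], row["answer"].strip())
        benefits[row["coverage_code"]].add(key)
    return dict(benefits)


benefits_by_code = load_benefits(io.StringIO(SAMPLE))
```

    Representing each coverage code as a set of (question, answer) pairs makes later comparisons between codes simple set operations.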


    Technology Stack

    • Python

    • Excel


    Code access


    We’re starting a new codebase, so you should create the project structure.

    The winning submission of the ideation challenge is available in the forums. It contains the details of what we’re trying to build in this project. You should read that document before continuing with the individual requirements section (its Part 1 section describes the goal of this challenge).


    Individual requirements


    In this challenge we will focus on analyzing the coverage codes, suggesting how to reorganize them to create a more unified product set, and how to create a hierarchy of the coverage codes, benefit classes and products.


    Output of this challenge is a Python tool (CLI) that:

    • Reads the benefits data file

    • Cleans the benefits data and analyzes the set of coverage codes

    • Creates the output report with suggestions on how to reorganize the coverage codes and details of the suggested coverage codes hierarchy
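
    One possible shape for the command-line entry point is sketched below with argparse. The argument names, defaults and the idea of a similarity threshold are assumptions for illustration; the challenge does not prescribe a CLI layout.

```python
import argparse


def build_parser():
    """Build the argument parser for a hypothetical CLI layout."""
    parser = argparse.ArgumentParser(
        description="Suggest a consolidated coverage code hierarchy.")
    parser.add_argument(
        "benefits_file", help="path to the benefits data file")
    parser.add_argument(
        "--exposure-file", help="path to the coverage code usage data")
    parser.add_argument(
        "--output", default="report.xlsx", help="path of the Excel report")
    parser.add_argument(
        "--threshold", type=float, default=0.6,
        help="minimum similarity for proposing a merge (assumed option)")
    return parser


def main(argv=None):
    """Parse arguments; the pipeline stages would be called from here."""
    args = build_parser().parse_args(argv)
    # Placeholder for: read benefits -> clean and analyze -> write report.
    return args
```

    Calling main(["benefits.csv", "--threshold", "0.7"]) returns the parsed arguments, which keeps the parsing logic easy to verify separately from the analysis modules.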


    The main requirement here is reorganizing the coverage codes to create a more unified product set, and you are NOT limited to the methods described in the ideation document (see the second ideation document for more ideas on grouping the coverage codes and creating a hierarchy). It is up to you to improve the method for creating the new groups and hierarchy as you see fit, and this is the requirement that will earn you the most points during review.

    There is no objective score/metric for benefits grouping that we can use for review, so the reviews will be manual and based on the output of your algorithm.

    In addition to the above-mentioned report, output a benefits data set in the same format as the input, but using your newly proposed coverage code grouping and hierarchy.

    For example, if your tool finds that coverage codes C1 and C2 have 60% of their benefits in common and you suggest creating a new coverage code CBasic that has those common benefits, plus CNew1 and CNew2 that have the distinct benefits, then the updated benefits data will contain CBasic, CNew1 and CNew2 and not C1 and C2 (this is just an example; your proposed hierarchy and grouping can be entirely different or more complex).
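
    The C1/C2 example can be expressed with plain set operations. The sketch below reads "60% of common benefits" as shared benefits divided by all distinct benefits across both codes (the Jaccard index), which is one possible interpretation; the benefit identifiers b1..b5 are placeholders.

```python
def overlap_ratio(a, b):
    """Shared benefits divided by all distinct benefits (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def split_common(benefits_by_code):
    """Split coverage codes into a shared base set plus per-code remainders."""
    sets = list(benefits_by_code.values())
    common = set.intersection(*sets) if sets else set()
    remainders = {
        code: items - common for code, items in benefits_by_code.items()}
    return common, remainders


# Placeholder benefit identifiers, for illustration only.
c1 = {"b1", "b2", "b3", "b4"}
c2 = {"b1", "b2", "b3", "b5"}

ratio = overlap_ratio(c1, c2)  # 3 shared out of 5 distinct benefits
cbasic, rest = split_common({"C1": c1, "C2": c2})
hierarchy = {"CBasic": cbasic, "CNew1": rest["C1"], "CNew2": rest["C2"]}
```

    In the rewritten benefits data, records for C1 and C2 would then be replaced by records for the three codes in hierarchy.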

    When analyzing the coverage codes for similarity, the main data you should use is the answer tag and answer columns, as well as the coverage code usage data (the TC_MBRSHP_EXPOSURE_DATA file). Note that we’re not expecting a perfect grouping of the coverage codes in this challenge, since the benefits data is not clean and has a lot of variability. A parallel challenge is trying to standardize the benefits (answer column); once that is implemented it should improve the performance of this algorithm (this will be handled in a follow-up challenge).
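
    One simple way to combine benefit similarity with usage data is sketched below: list the pairs of coverage codes whose similarity meets a threshold and propose keeping the code with the higher exposure. This is only an illustrative baseline, not the method from the ideation documents; the exposure numbers, the threshold and the structure of the usage data are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared items divided by all distinct items across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def merge_candidates(benefits_by_code, exposure, threshold=0.5):
    """Return (keep, merge, similarity) tuples for similar code pairs.

    The code with the higher exposure is proposed as the one to keep.
    Results are sorted by similarity, then combined exposure, descending.
    """
    results = []
    for x, y in combinations(sorted(benefits_by_code), 2):
        sim = jaccard(benefits_by_code[x], benefits_by_code[y])
        if sim >= threshold:
            if exposure.get(x, 0) >= exposure.get(y, 0):
                keep, merge = x, y
            else:
                keep, merge = y, x
            results.append((keep, merge, sim))
    results.sort(
        key=lambda r: (r[2], exposure.get(r[0], 0) + exposure.get(r[1], 0)),
        reverse=True)
    return results


# Placeholder benefits and made-up exposure counts, for illustration only.
benefits = {
    "C1": {"b1", "b2", "b3"},
    "C2": {"b1", "b2", "b4"},
    "C3": {"b7", "b8"},
}
exposure = {"C1": 120, "C2": 4500, "C3": 900}
pairs = merge_candidates(benefits, exposure)
```

    With this sample, only C1 and C2 are similar enough to be paired, and C2 is proposed as the code to keep because it has more exposure.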


    Create a README file with details on how to deploy and verify the tool. Unit testing is out of scope. Code style will not be a major factor, but make sure your code follows the PEP-8 guidelines and is split into modules - don’t put everything into one giant module.


    What To Submit


    Submit the full source code

    Submit the build/verification documentation

    Submit a short demo video and sample outputs of the tool

    Final Submission Guidelines

    See above

    Reliability Rating and Bonus

    For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


    Final Review:

    Community Review Board


    User Sign-Off


    Review Scorecard