The challenge is finished.

Challenge Overview

Our client wants to optimize their product offerings by consolidating their current product set. Your job is to analyze the product and product usage statistics to show how to identify a better product set.

 

The data comprises products and benefits data (clients buy products that have a set of benefits). Other data sources are summary claims data (monthly statistics of benefits usage) and exposure data (monthly statistics on potential benefits usage).

 

The ideal outcome of this challenge is a paper containing an analysis of the input data and recommendations on how to create the consolidated product set.

NOTE: Checkpoint prizes are available! If you submit by Sunday 04/21 23:59 EDT, you can win one of four $100 checkpoint prizes and, more importantly, get early feedback on your submission before working on the final submission. A checkpoint submission must contain at least an overview of the data analysis and an abstract of further work on the data. The more details you include, the more useful the feedback will be.
 

Background

Our client offers a variety of insurance products to large organizations, which opt for the one that best suits their needs. This is often achieved by customizing product details, and the resulting unique products are then administered and managed. This has resulted in the following challenges:

  • Variation of processes across the organization

  • Increased complexity in providing customer service

  • Reduced ability to self-serve

  • An inefficient process of claims payment and pricing models

  • Difficulty in forecasting and providing optionality to customers

 

The goal of Coverage Optimization is to:

  • Analyze the variability of parent products with reference to the unique child products formed due to benefits customizations. This analysis will be used to assist the client in the definition of a target, consolidated product set that can represent a standard set of offerings with ‘configurable’ options.

  • Recommend options for this envisioned consolidated product set along with justifications. This will include impact analysis based on utilization and current state product variability.

  • Suggest a hierarchical and categorical benefit design with measurable reduction in variability of benefit language, resulting in standardization of offerings and an optimal benefit package

  • Increase simplicity of standard benefit options while suggesting configurability options for benefits

  • Offer an applicable algorithm/process that is easily repeatable on similarly structured but different data and that either

    • Identifies new answers/benefits and aligns them to the suggested benefits appropriately, or

    • Signals the need for a new standard benefit option
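
The last goal (align a new answer to an existing standard option, or signal that a new option is needed) can be illustrated with a minimal sketch. It is only one possible approach and not a prescribed method: the function names, the use of character-level similarity from Python's difflib, and the 0.8 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, mask numeric values, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\$?\d[\d,]*(?:\.\d+)?%?", "<num>", text)  # "$4,500" -> "<num>"
    text = re.sub(r"[^a-z<>\s]", " ", text)
    return " ".join(text.split())


def align(answer, standard_options, threshold=0.8):
    """Return (best_option, score); best_option is None when nothing is close enough.

    The threshold is a hypothetical tuning parameter, not a value from the challenge.
    """
    norm = normalize(answer)
    best, best_score = None, 0.0
    for option in standard_options:
        score = SequenceMatcher(None, norm, normalize(option)).ratio()
        if score > best_score:
            best, best_score = option, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score  # signals the need for a new standard option


standard = [
    "The out of pocket maximum is $3000.",
    "No, this is not a high deductible health plan.",
]
print(align("The Out of Pocket Maximum is $4,500", standard))
print(align("Acupuncture is covered up to 20 visits per year", standard))
```

An answer that falls below the threshold would be queued for review as a candidate new standard option rather than forced into an existing one.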

 

Data Description

The available dataset is modestly large: a couple of GB for all the data sources. You will have access to the entire data set. Check the sample and metadata file available in the forums for a complete definition of all data fields.

There are three data sources:

  1. Benefits data

  2. Summary Claims data

  3. Exposure data

 

Here are definitions of some terms that will help with understanding the provided data:

  • Benefit is coverage for various health care services

  • Claim is the actual usage of the benefit (e.g., MRI, X-ray scan, etc.)

  • Coverage code is a unique identifier for a set of benefits provided as a group of services with actual start/end dates for the coverage

  • Product consists of a set of coverage codes (and hence benefits) and is used internally to align coverage codes to internal rules and procedures.

 

Benefits data is the core data set for this challenge. Here is how this data is generated. Benefits configuration is arranged in a question/answer format on a website. Each benefit has a hard-coded question and several types of available answers to complete the question. For example, a question might be “Is this a High Deductible Health Plan?” and the user might have a choice of two check boxes, radio buttons, or a drop-down with Yes/No toggles. The answer then becomes the statement that combines the question and the response, giving “No, this is not a high deductible health plan.” As another example, the question might be “The out of pocket maximum is:” and the user enters “$3000”, making the answer “The OPM is $3000.” Finally, the prompt might be “Enter additional comments here”, in which case the user enters free-form text and that text becomes the answer. These sets of answers are rolled up to form all of the benefits for a specific coverage code.

 

The actual data set contains flat data records that:

  • List the benefits (the answer column) and question identifiers (sequence_id)

  • Connect benefits to coverage codes

  • Connect coverage codes to internal products

  • Give the start/end dates for the coverage

And also these useful columns:

  • Type_of_tag - information about the type of field presented in the software - radio button, checkbox, text input, dropdown

  • Value flag - this is only populated for records where the type of tag is text. It denotes whether the user entry field is a free-form text field (any text allowed) or a numeric field, in which the answer may contain some text that is automatically generated by the software but the user can only enter a numeric value.

  • Top50_flag - This just denotes that the coverage code is for a very important client

 

Exposure data contains statistics on exposure duration (e.g., if a coverage contract lasts for 1 month, the exposure for one plan member is 1). It connects to the benefit file using coverage codes. It is a count of membership for each coverage code, and so indicates the number of people utilizing benefits under that coverage code. It should be used to investigate whether the popularity of a coverage code might signal the popularity of the benefits within it, or to weight the importance of a specific plan design because it affects a greater number of people.
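
As a rough illustration of exposure weighting, the sketch below computes, for each question, the share of total exposure behind each answer. The input shapes (tuples of coverage code, sequence_id and answer, plus a dict of exposure per coverage code) and the choice to treat a coverage code with no exposure record as zero weight are assumptions; the real column names are defined in the metadata file.

```python
from collections import defaultdict


def exposure_weighted_share(benefit_rows, exposure_by_code):
    """Share of exposure behind each answer, per question.

    benefit_rows: iterable of (coverage_code, sequence_id, answer) tuples.
    exposure_by_code: dict mapping coverage_code -> total exposure.
    Coverage codes missing from exposure_by_code get weight 0 (an assumption).
    Returns {sequence_id: {answer: share}}.
    """
    totals = defaultdict(float)
    weights = defaultdict(lambda: defaultdict(float))
    for code, seq_id, answer in benefit_rows:
        w = exposure_by_code.get(code, 0.0)
        weights[seq_id][answer] += w
        totals[seq_id] += w
    return {
        seq_id: {
            answer: (w / totals[seq_id] if totals[seq_id] else 0.0)
            for answer, w in answers.items()
        }
        for seq_id, answers in weights.items()
    }


rows = [("C1", 10, "Yes"), ("C2", 10, "No"), ("C3", 10, "No")]
exposure = {"C1": 900.0, "C2": 50.0, "C3": 50.0}
print(exposure_weighted_share(rows, exposure))
```

In this toy input, "No" appears under two of the three coverage codes but accounts for only a tenth of the exposure, which is the kind of difference a plain row count would hide.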

 

Summary claims data contains monthly statistics about the claims made under different coverage codes (this is the largest data source: the actual usage of the product set). For each month/year and each coverage code there are simple statistics on total claims made, a description of where the provider services were performed, diagnosis code, etc. It connects to the benefit file on coverage codes. This data should be useful in assessing which types of services are most popular under different coverage codes, to see whether there is a connection between specific benefit language and product design, or whether certain benefit designs need very high priority because they are used often.



 

Tips about the data:

  • There are empty values, in some cases in important fields. Contestants should explain how they deal with such data.

  • Benefits are arranged in a hierarchical format.  The hierarchy goes: acct_class->segment->prod_name
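
One way to start quantifying variability along this hierarchy is to count, for each parent product and question, how many child products exist and how many distinct answers they use. The sketch below assumes each record is a dict keyed by the column names mentioned in this description (acct_class, segment, prod_name, sequence_id, answer); that record shape, and the decision to skip empty answers, are assumptions for illustration only.

```python
from collections import defaultdict


def answer_variability(rows):
    """Count child products and distinct answers per (acct_class, segment, sequence_id).

    rows: iterable of dicts with keys acct_class, segment, prod_name,
    sequence_id, answer. Empty or missing answers are skipped here;
    how to treat them properly is a modelling decision.
    """
    products = defaultdict(set)
    answers = defaultdict(set)
    for r in rows:
        key = (r["acct_class"], r["segment"], r["sequence_id"])
        products[key].add(r["prod_name"])
        answer = (r.get("answer") or "").strip().lower()
        if answer:
            answers[key].add(answer)
    return {key: (len(products[key]), len(answers[key])) for key in products}


rows = [
    {"acct_class": "A", "segment": "S1", "prod_name": "P1", "sequence_id": 7, "answer": "Deductible is waived"},
    {"acct_class": "A", "segment": "S1", "prod_name": "P2", "sequence_id": 7, "answer": "deductible is waived "},
    {"acct_class": "A", "segment": "S1", "prod_name": "P3", "sequence_id": 7, "answer": "Deductible applies"},
    {"acct_class": "A", "segment": "S1", "prod_name": "P4", "sequence_id": 7, "answer": None},
]
print(answer_variability(rows))
```

Questions where the number of distinct answers approaches the number of child products are the most customized and are natural first candidates for consolidation.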

Task Detail

In this challenge, the goal is to efficiently analyze the data, engineer the necessary features (if needed), and build an algorithm/process to make recommendations for the new product set.

 

The main idea is to use the benefits data (the primary data source), together with the exposure and summary claims data, to analyze the variability in the parent products (acct_class, segment) across their child products (prod_name), and to use that information to recommend a simplified product set. The main goal is to reduce variability in the benefit data and to create a simple, specific list of benefit offerings that are categorized logically in a hierarchical manner. The answers on the benefit sheet should be arranged in such a way that syntax differences are removed while the underlying purpose of each statement is kept intact.
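
To make "removing syntax differences while keeping the purpose" concrete, one simple option is to reduce each answer to a template in which numeric values are replaced by a placeholder, and to collect the values seen for each template. The sketch below is an illustration under that assumption: the function names and the number pattern are hypothetical, and real answers will need richer handling (dates, ranges, misspellings).

```python
import re
from collections import defaultdict

# Matches amounts such as "$3000", "5,000" or "20%" (an illustrative pattern).
NUMBER = re.compile(r"\$?(\d[\d,]*(?:\.\d+)?)%?")


def to_template(answer):
    """Split an answer into a normalized template and its numeric values."""
    values = [float(m.replace(",", "")) for m in NUMBER.findall(answer)]
    template = NUMBER.sub("<num>", answer.lower())
    template = re.sub(r"[^\w<>\s]", " ", template)  # drop punctuation
    return " ".join(template.split()), values


def group_by_template(answers):
    """Map each template to the sorted distinct numeric values observed for it."""
    groups = defaultdict(set)
    for answer in answers:
        template, values = to_template(answer)
        groups[template].update(values)
    return {template: sorted(vals) for template, vals in groups.items()}


answers = [
    "The OPM is $3000.",
    "the OPM is $5,000",
    "The  OPM is $3000",
    "No, this is not a high deductible health plan.",
]
for template, values in group_by_template(answers).items():
    print(template, values)
```

Each template then corresponds to one candidate standard benefit statement, and the list of observed values suggests the range a configurable option would need to cover.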

Here are a few starting areas that you could explore, but it’s definitely expected you will expand the list:

  • Answer data - this one is obvious - since answers are the actual benefits, reducing the variability here will make the benefit offerings more consistent

  • Type of tag - Knowing whether a user who is coding benefits is using a forced decision with standard language (radio button [R]) or a free-form text field (Text [T]) indicates the amount of variation to expect within that field, and might help assess whether there are not enough radio button options, so that people default to text. Additionally, these fields might be used to recommend the type of field in the output of the algorithm. For example, we might have a categorical section in the solution that is called Deductible Waived, and the algorithm suggests a drop-down with the available options of Wigs, Chiro, Chemo

  • Top 50 flag - It might be an important consideration when conceiving language to default to answer/benefit language that is more in line with our top 50 accounts, or to make sure that their common concerns are addressed and included in the final results.

  • Exposure data - check if there is the possibility that the popularity of a coverage code might signal popularity of the benefits within, or weight the importance of a specific plan design because it has greater impact to a greater number of people
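
As a small example of the type-of-tag idea above, the sketch below computes, per question, the share of records entered as free-form text. It assumes rows of (sequence_id, type_of_tag) pairs and that text fields are coded "T" as in the bullet above; the exact tag codes should be confirmed against the metadata file.

```python
from collections import Counter


def free_text_share(rows):
    """Share of records per question whose type of tag is free-form text.

    rows: iterable of (sequence_id, type_of_tag) pairs.
    Assumes text fields are coded "T"; missing tags count as non-text.
    """
    total = Counter()
    text = Counter()
    for seq_id, tag in rows:
        total[seq_id] += 1
        if (tag or "").strip().upper() == "T":
            text[seq_id] += 1
    return {seq_id: text[seq_id] / total[seq_id] for seq_id in total}


rows = [(1, "R"), (1, "T"), (1, "T"), (1, "T"), (2, "R"), (2, "R")]
print(free_text_share(rows))
```

Questions with a high free-text share are the ones where the existing fixed options may be insufficient, so their text answers are a good place to look for new standard options.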

 

The expected final deliverables are short white papers with data analysis (e.g., tables and figures to illustrate the analysis and recommendations), PoC code (optional), and example evaluation results and the potential benefits that would result from your recommendations. We are not looking to build the entire analysis, reports and recommendations. What is expected in this challenge is the data analysis and a detailed approach description that can be used in future challenges to actually build the system.

 

Submission Contents

A document with details of the data analysis and the proposed algorithm, and/or a proof of concept solution, pseudo-code, or any documentation or previous research papers that help illustrate your proposal.

 

The final submission should be a report, more like a technical paper. It should include, but is not limited to, the following contents. The client will judge the feasibility and the quality of your proposed approach.

  1. Title: Title of your idea

  2. Abstract / Description: High level overview / statement of your idea

    • Outline of your proposed approach

    • Outline of the approaches that you have considered and their pros and cons

    • Justification of your final choice

  3. Details:

    • Detailed description. You must provide details of each step and details of how it should be implemented

      • Description of the entire mechanism

      • The advantage of your idea - why it could be better than others

      • If your idea includes some theory or known papers:

        • Reason why you chose it

        • Details on how it will be used

        • Reference to the papers of the theory

      • Reasoning behind the feasibility of your idea

  4. Appendix (optional):

    • Bibliography, references to papers, etc.

Format

  • The document describing your ideas should be a minimum of 3 pages, in PDF / Word format.

  • It should be written in English.

  • Using charts, diagrams, and tables to explain your ideas comprehensively is encouraged.

Judging Criteria

You will be judged on the quality of your ideas, the quality of your description of the ideas, and how much benefit they can provide to the client. The winner will be chosen based on the most logical and convincing reasoning as to how and why the idea presented will meet the objective. Note that this contest will be judged subjectively by the client and Topcoder.

Submission Guideline

You can submit at most TWO solutions, but we encourage you to put your best solution, with as much detail as possible, into a single submission.

Supplementary materials

You will be able to download them from the Google Drive link posted in the forum.



Final Submission Guidelines

See above

ELIGIBLE EVENTS:

Topcoder Open 2019

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30088354