Su et al. introduced GATE, a Graph-Attention Augmented Temporal Neural Network for Medication Recommendation: https://ieeexplore.ieee.org/document/9134772
The task of this challenge is to read the paper and independently implement GATE, using the dataset given in this challenge.
However, instead of predicting a prescription match, you must predict diagnoses, and do so at the earliest possible visit. For example, suppose a patient in the training set visits twice and is diagnosed with a new condition at the second visit. Your algorithm's goal is to predict that diagnosis at the first visit. For this reason, your solution will be tested on multi-admission data only (about 6,300 cases).
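Below is a minimal sketch of how earliest-visit targets could be built. It assumes each patient's history has already been grouped into chronologically ordered visits, each represented as a set of codes. The function name and the `is_diagnosis` predicate are hypothetical. Because `diag_drug_code` mixes diagnosis and drug codes, the way to tell them apart depends on the coding scheme.

```python
from typing import Callable, List, Set, Tuple


def earliest_visit_targets(
    visits: List[Set[str]],
    is_diagnosis: Callable[[str], bool] = lambda code: True,  # hypothetical filter
) -> List[Tuple[Set[str], Set[str]]]:
    """Build (history, target) pairs for every visit except the last.

    history: all codes observed up to and including visit i.
    target:  diagnosis codes first observed at any later visit, so a
             diagnosis first seen at visit k is already a target at visit 1.
    """
    examples: List[Tuple[Set[str], Set[str]]] = []
    seen: Set[str] = set()
    for i, codes in enumerate(visits[:-1]):
        seen |= codes
        future = set().union(*visits[i + 1:])
        new_diagnoses = {c for c in future - seen if is_diagnosis(c)}
        examples.append((set(seen), new_diagnoses))
    return examples
```

A single-visit patient yields no examples. This is consistent with evaluating on multi-admission data only. The example built from visit 1 is the "earliest possible visit" case described above.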
The purpose of this Code Sprint
The purpose of this Sprint is to build a good "first-pass" implementation of this model. The best solution is a solution that is stable, complete, well-documented, and explainable. In particular, the stakeholders are interested in studying the correlations that will be modeled by the graph.
The provided zipped data file (file name: diag_med_Table_1_1_gnn) contains the following columns:
- personid: patient id
- diag_drug_code: diagnosis or drug code
- diag_drug_date: date when the diag_drug_code occurs
- target: 0 = patient alive, 1 = patient died by suicide
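These columns describe a long-format event table. The following is a minimal sketch of turning that table into per-patient visit sequences. It makes three assumptions, none of which the data description states: one visit is the set of codes sharing a personid and diag_drug_date, dates are ISO `YYYY-MM-DD` strings, and the rows have already been read into dicts (for example with `csv.DictReader`, if the extracted file is CSV). The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]
Visit = Tuple[date, Set[str]]


def group_into_visits(rows: Iterable[Row]) -> Dict[str, Tuple[List[Visit], int]]:
    """Map each personid to (chronologically sorted visits, target)."""
    by_patient: Dict[str, Dict[date, Set[str]]] = defaultdict(lambda: defaultdict(set))
    targets: Dict[str, int] = {}
    for row in rows:
        pid = row["personid"]
        day = date.fromisoformat(row["diag_drug_date"])  # assumed ISO format
        by_patient[pid][day].add(row["diag_drug_code"])  # same day -> same visit
        targets[pid] = int(row["target"])
    # Dates are unique keys, so sorting the (date, codes) items orders by date only.
    return {pid: (sorted(visits.items()), targets[pid]) for pid, visits in by_patient.items()}
```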
Inclusion/Exclusion Criteria for Data:
- For patients whose target == 1, remove the last week of data before death_date and any data after death_date.
- For patients whose target == 1, the most recent encounter must fall within 12 months before death_date. For patients whose target == 0, it must fall within 12 months before the last diagnosis. Remove patients who do not meet this criterion.
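Below is a sketch of applying both criteria to one patient. Visits are (date, codes) pairs sorted by date. Several parts are assumptions rather than requirements:

- `death_date` is not among the listed columns, so it is passed in.
- 12 months is approximated as 365 days.
- Visits on or after death_date minus 7 days are dropped. Whether that boundary day counts as part of the "last week" should be confirmed with the stakeholders.
- `reference_date` stands in for "the last diagnosis" for target == 0 patients.

All names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Set, Tuple

Visit = Tuple[date, Set[str]]


def apply_criteria(visits: List[Visit], target: int, death_date: Optional[date],
                   reference_date: date, window_days: int = 365) -> Optional[List[Visit]]:
    """Return the filtered visits, or None if the patient is excluded."""
    if target == 1:
        if death_date is None:
            raise ValueError("target == 1 requires a death_date")
        # Drop the last week before death and everything after it (assumed boundary).
        cutoff = death_date - timedelta(days=7)
        visits = [(d, codes) for d, codes in visits if d < cutoff]
        anchor = death_date
    else:
        anchor = reference_date  # assumed meaning of "the last diagnosis"
    if not visits:
        return None
    # Most recent remaining encounter must lie within the window before the anchor.
    last_encounter = max(d for d, _ in visits)
    if anchor - last_encounter > timedelta(days=window_days):
        return None
    return visits
```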
Input Variable for Model:
Output/Target Variable of Model:
- This is to be determined by the participants; it does not need to be the target column in the data file.
From the given research paper (https://ieeexplore.ieee.org/document/9134772), use equation (6) instead of equation (7) to compute the attention coefficients, so that these coefficients are trainable.
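Equation (6) is not reproduced in this brief, so its exact form should be checked against the paper. The sketch below assumes it has the standard graph-attention (GAT) form: e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalised with a softmax over each node's neighbours. Under that form the coefficients depend on the learnable W and a, which is what makes them trainable.

This NumPy sketch computes only the forward pass. In a real model, W and a would be framework parameters (e.g. PyTorch tensors with gradients) updated by backpropagation. The function names and the 0.2 LeakyReLU slope are assumptions.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    # Identity for positive inputs, small linear slope for negative ones.
    return np.where(x > 0, x, slope * x)


def gat_attention(h: np.ndarray, adj: np.ndarray, W: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Attention coefficients alpha[i, j] under the assumed GAT-style equation.

    h:   (N, F) node features, e.g. embeddings of medical codes
    adj: (N, N) 0/1 adjacency; must include self-loops so every row has a neighbour
    W:   (F, F') trainable projection; a: (2F',) trainable attention vector
    """
    z = h @ W                                   # (N, F') projected features W h_i
    f_out = z.shape[1]
    src = z @ a[:f_out]                         # a_1^T W h_i, one value per node i
    dst = z @ a[f_out:]                         # a_2^T W h_j, one value per node j
    e = leaky_relu(src[:, None] + dst[None, :]) # a^T [W h_i || W h_j] for all pairs
    e = np.where(adj > 0, e, -np.inf)           # keep neighbours only
    e = e - e.max(axis=1, keepdims=True)        # numerical stability for exp
    alpha = np.exp(e)                           # exp(-inf) == 0 for non-neighbours
    return alpha / alpha.sum(axis=1, keepdims=True)  # row-wise softmax
```

Splitting a into two halves is equivalent to the concatenation in the formula, and it avoids building an (N, N, 2F') tensor.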
- Checkpoint Submission - 2 weeks
- Final Submission - 4 weeks
- We would like a midpoint check two weeks after the start of the submission phase (checkpoint submission deadline: 20th Mar 12:00 EDT).
- We are also offering an additional $100 bonus to any checkpoint submitter who wins a final placement.
- A predictive model with performance metrics.
- Identify the 10 medical codes most correlated with suicidal death, using the attention coefficients calculated by equation (6) in the paper.
- Visualize them in graphs like those in Figure 7 of the paper. The central node is suicidal death, and the 10 most correlated nodes are connected both to the central node and to each other. Edge width represents the correlation intensity.
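The sketch below extracts the top 10 codes and a Figure 7-style graph from a learned attention matrix. It rests on these assumptions:

- `alpha[i, j]` holds the attention from node i to node j.
- The suicidal-death outcome is itself a node, at index `center`.
- Attention is symmetrised as (alpha + alpha.T) / 2 to get an undirected correlation strength.
- The graph is emitted as Graphviz DOT text, with each edge's `penwidth` scaled relative to the strongest edge.

The function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def _top_indices(sym: np.ndarray, center: int, k: int) -> List[int]:
    """Indices of the k nodes most strongly linked to `center`."""
    scores = sym[center].astype(float).copy()
    scores[center] = -np.inf                      # never select the centre itself
    k = min(k, len(scores) - 1)
    return [int(j) for j in np.argsort(-scores)[:k]]


def top_k_codes(alpha: np.ndarray, names: List[str], center: int,
                k: int = 10) -> List[Tuple[str, float]]:
    sym = (alpha + alpha.T) / 2.0                 # undirected correlation strength
    return [(names[j], float(sym[center, j])) for j in _top_indices(sym, center, k)]


def to_dot(alpha: np.ndarray, names: List[str], center: int,
           k: int = 10, max_width: float = 8.0) -> str:
    sym = (alpha + alpha.T) / 2.0
    nodes = [center] + _top_indices(sym, center, k)
    # Every selected node is linked to the centre and to every other selected node.
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    w_max = max((sym[a, b] for a, b in pairs), default=0.0) or 1.0
    lines = ["graph attention {"]
    for a, b in pairs:
        width = max_width * sym[a, b] / w_max     # edge width ~ correlation intensity
        lines.append(f'  "{names[a]}" -- "{names[b]}" [penwidth={width:.2f}];')
    lines.append("}")
    return "\n".join(lines)
```

The DOT text can be saved to a file and rendered with Graphviz, for example `dot -Tpng graph.dot -o graph.png`.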
Final Submission Guidelines
- Professional implementation of the GATE algorithm described in the paper as modified above
- Documentation - it must be thorough and complete, and articulated in a README file
- A competent 3rd party can build/run your code on the first try
- Adaptable - the model will consume a different dataset than the one used to build it, so submissions with clear instructions for changing the dataset and retraining the model will be ranked higher
- Performance - roughly match the performance described in the paper
- Submit a zip file
- You may use Python, Java, or C++. If you wish to use another language please request permission first.
- There is no need to transform the drug coding.
- Provide a README that describes how to build, run, and retrain your project.
- Your code must build (if relevant) and run. The copilot will make only one attempt to compile (if relevant) and run your code. It is in your best interest to provide excellent documentation.
- If the copilot cannot get your submission to run with the supporting material provided, your solution may be disqualified.
- Extra consideration will be given to submissions that can include a GUI to visualize the graph results.
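The model must be retrainable on a different dataset. One way to support this is to keep every dataset-specific name in a single configuration object that the README documents. The sketch below does this with a hypothetical command-line interface. The flag names and `DatasetConfig` are illustrative, and the defaults match the columns of the provided file.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetConfig:
    data_path: str
    person_col: str = "personid"
    code_col: str = "diag_drug_code"
    date_col: str = "diag_drug_date"
    target_col: str = "target"


def parse_config(argv: Optional[List[str]] = None) -> DatasetConfig:
    parser = argparse.ArgumentParser(
        description="Retrain on a new dataset (hypothetical CLI sketch).")
    parser.add_argument("--data", required=True, help="path to the extracted data file")
    # Column-name overrides let a dataset with a different schema be used unchanged.
    parser.add_argument("--person-col", default="personid")
    parser.add_argument("--code-col", default="diag_drug_code")
    parser.add_argument("--date-col", default="diag_drug_date")
    parser.add_argument("--target-col", default="target")
    args = parser.parse_args(argv)  # None -> read from sys.argv
    return DatasetConfig(args.data, args.person_col, args.code_col,
                         args.date_col, args.target_col)
```

For example, `python train.py --data new_data.csv --code-col icd_code` would retrain on a file whose code column has a different name; `train.py` is a hypothetical entry point.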