Stunting (shortness for age) affects more than one in four children worldwide. Wasting (being underweight) and stunting in early childhood are associated with lethargy, reduced levels of play, an increased risk of early death, a higher burden of disease, compromised physical capacities, and diminished cognitive development. Stunting and wasting in the first two years of life have been shown to be associated with lower school attainment and reduced economic productivity, which can reduce the productivity of an entire generation. Furthermore, stunting between 12 and 36 months is linked to poor cognitive performance and/or lower school grades in middle childhood, and both height and head circumference at 2 years were shown to be inversely associated with educational attainment.
The ability to predict, at birth and early in childhood, whether a child is on an appropriate growth trajectory will help initiate preventive or therapeutic interventions leading to good cognitive growth and development outcomes as determined by school performance and a thriving child- and adulthood.
Our goal is to determine a combination of early measures that would be a good predictor for recumbent length (length of the child measured while the child is lying down, cm), weight (kg), and head circumference (cm). In pursuit of this goal, we have collected time series measurements of child growth, and family trait data (mother's age, mother's height, number of previous pregnancies, breast-feeding practices, and father's height). We would like you to use this data to predict a child's weight, recumbent length, and head circumference in the attached dataset where values have been censored.
You may download the learning data set from here. The format for the data in the data set is a csv with details provided below:
| Column | Variable | Type | Label/Description | DV for prediction |
|---|---|---|---|---|
| 1 | UID | int | Unique child ID | No |
| 2 | AGEDAYS | float | Age since birth at examination (days) | No |
| 3 | GAGEDAYS | float | Gestational age at examination (days) | No |
| 4 | SEX | int | Sex, 1 = Male, 2 = Female | No |
| 5 | MUACCM | float | Mid upper-arm circumference (cm) | No |
| 6 | SFTMM | float | Skinfold thickness (mm) | No |
| 7 | BFED | int | Child breast fed at time of visit | No |
| 8 | WEAN | int | Child being weaned at time of visit | No |
| 9 | GAGEBRTH | float | Gestational age at birth in days | No |
| 10 | MAGE | float | Maternal age at examination (years) | No |
| 11 | MHTCM | float | Maternal height (cm) | No |
| 12 | MPARITY | int | Maternal parity | No |
| 13 | FHTCM | float | Father's height (cm) | No |
| 14 | WTKG | float | Weight (kg) | Yes |
| 15 | LENCM | float | Recumbent length (cm) | Yes |
| 16 | HCIRCM | float | Head circumference (cm) | Yes |
Each child is designated by a unique ID (UID, column 1) and is measured at several time points during early childhood growth (with the time variable provided as age since birth in days [column 2] and age since conception in days [column 3]). The value "." in any cell implies that the value has not been measured and is therefore not available.
An example of measurements for a single child is given below:
UID AGEDAYS GAGEDAYS SEX MUACCM SFTMM BFED WEAN GAGEBRTH MAGE MHTCM MPARITY FHTCM WTKG LENCM HCIRCM
550 -1.356576074 -1.274154148 2 . . . . 1.614604045 0.112627355 -0.127853527 4 . -1.983209108 -1.682735629 -2.248066477
550 -0.865922259 -0.783913419 2 1.497903433 0.894389325 1 0 1.614604045 0.138831421 -0.127853527 4 . -0.179345397 -0.495300141 -0.56945552
550 -0.622766386 -0.540962261 2 1.776302755 2.593698371 . . 1.614604045 0.159518841 -0.127853527 4 . 0.18942477 -0.050011833 -0.223859146
550 -0.014876703 0.066415633 2 1.915502416 3.040884962 1 1 1.614604045 0.211926973 -0.127853527 4 . 0.731472486 0.533810615 0.31922087
550 0.22827917 0.309366791 2 1.358703772 2.593698371 1 1 1.614604045 0.233993555 -0.127853527 4 . 0.642612205 0.73171653 0.516704512
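For each prediction (w_i, l_i and c_i) at a time point where at least one of the DV values is missing, the error from the true weight, recumbent length and head circumference will be measured as the squared Mahalanobis distance,

$$e_i = (p_i - t_i)^{T} \, S^{-1} \, (p_i - t_i),$$

where $p_i = (w_i, l_i, c_i)^{T}$ is the prediction, $t_i$ is the vector of the corresponding true values, and $S^{-1}$ is the inverse of the sample covariance matrix calculated on the complete dataset.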
The nine entries of inverseS (the matrix $S^{-1}$), read row by row, form a symmetric 3 x 3 matrix:

$$S^{-1} = \begin{pmatrix} 11.90869495 & -7.523165469 & -4.11222794 \\ -7.523165469 & 13.5665806 & -4.742982596 \\ -4.11222794 & -4.742982596 & 8.669060303 \end{pmatrix}$$
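The error measure above can be sketched in a few lines of Python. This is only an illustration, not the official scorer: the function name `squared_mahalanobis` is hypothetical, and the code assumes that the rows and columns of the matrix are ordered as (weight, length, head circumference), which the statement does not spell out.

```python
# Inverse sample covariance matrix from the statement, row by row.
# Assumption: rows/columns are ordered (weight, length, head circumference).
INVERSE_S = [
    [11.90869495, -7.523165469, -4.11222794],
    [-7.523165469, 13.5665806, -4.742982596],
    [-4.11222794, -4.742982596, 8.669060303],
]


def squared_mahalanobis(predicted, actual, inverse_s=INVERSE_S):
    """Return d^T * S^-1 * d, where d = predicted - actual (3-vectors)."""
    d = [p - a for p, a in zip(predicted, actual)]
    # Expand the quadratic form as a double sum over the 3 x 3 matrix.
    return sum(d[j] * inverse_s[j][k] * d[k] for j in range(3) for k in range(3))
```

Because the off-diagonal entries are negative, errors that point in the same direction for all three measurements are penalized less than errors of the same size in opposing directions.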
Scores will be calculated as a generalized R^2 measure of fit, as follows. The total sum of errors for the submission will be calculated as SSE = SUM(ei).
A baseline sum of squared errors will be calculated in the same way, over the same time points (those where at least one of the DV values is missing), by predicting the sample means of w, l and c in the current training set:
SSE0 = SUM(e0i)
Then the submission score will be Score = 1000000 * MAX(1 - SSE/SSE0, 0).
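As a rough sketch of the scoring formula, assuming the per-prediction errors ei and the baseline errors e0i have already been computed (the function name `submission_score` is hypothetical and not part of the contest code):

```python
def submission_score(errors, baseline_errors):
    """Score = 1000000 * max(1 - SSE/SSE0, 0).

    errors: per-prediction errors e_i of the submission.
    baseline_errors: errors e0_i obtained by predicting the training means.
    Assumes the baseline sum SSE0 is non-zero.
    """
    sse = sum(errors)
    sse0 = sum(baseline_errors)
    return 1000000 * max(1 - sse / sse0, 0)
```

A submission that is no better than predicting the training means scores 0, and a perfect submission scores 1000000.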
In trainingData, each string holds the record of one measurement and has 16 comma-separated tokens, in the same order as the columns described in the table above. As before, missing values for non-DV variables are presented as "." strings. You can assume that in trainingData all DV values are present. The format of testingData is almost the same as that of trainingData. The only difference is that some of the DV values are also replaced by "." strings, and your task is to predict them. The replacement proceeds as follows:
N = number of time points for an ID
X = random between 0 and N/2 inclusive
Y = random between X and N inclusive
foreach time point W(1..N) for an ID
    if W <= X then all three DV values present
    else if W <= Y then 'c' is replaced by "."
    else all three DV values are replaced by "."
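The replacement procedure can be reproduced locally, for example to build a validation set from trainingData. The following Python sketch is one reading of the pseudocode, not the official generator: the function name `censor_child` is hypothetical, N/2 is assumed to be integer division, and the random draws are assumed to be uniform.

```python
import random


def censor_child(records, rng=random):
    """Censor DV values for one child.

    records: list of 16-token lists for a single ID, ordered by AGEDAYS.
    Returns new lists; the input is left unchanged.
    """
    n = len(records)
    x = rng.randint(0, n // 2)  # assumption: N/2 means integer division
    y = rng.randint(x, n)
    censored = []
    for w, rec in enumerate(records, start=1):
        rec = list(rec)  # copy so the caller's data is not modified
        if w <= x:
            pass  # all three DV values stay present
        elif w <= y:
            rec[15] = "."  # only head circumference ('c') is hidden
        else:
            rec[13:16] = [".", ".", "."]  # weight, length and 'c' are hidden
        censored.append(rec)
    return censored
```

Under this reading, each child has a fully observed prefix of at most half of its time points, followed by time points missing only head circumference, followed by time points missing all three DVs.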
Records with the same ID are consecutive and ordered by AGEDAYS (time point). The returned strings should contain the corresponding predictions for weight, recumbent length and head circumference of the child, in this particular order, comma-separated, one string for each time point, in the same order as in testingData. The length of the returned array equals the number of measurements.
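To make the input and output formats concrete, here is a minimal Python sketch that parses the records and returns the training-set means for every time point, which is the baseline the score is measured against. The function names `parse_record` and `predict_means` are hypothetical and do not represent the interface required by the contest.

```python
def parse_record(line):
    """Split one comma-separated record into 16 values; "." becomes None."""
    return [None if t.strip() == "." else float(t) for t in line.split(",")]


def predict_means(training_data, testing_data):
    """Predict the training means of WTKG, LENCM, HCIRCM for every test row.

    Relies on the statement's guarantee that all DV values are present
    in training_data. Returns one "w,l,c" string per test record.
    """
    rows = [parse_record(line) for line in training_data]
    # Columns 14-16 of the table are indices 13-15 after splitting.
    means = [sum(r[k] for r in rows) / len(rows) for k in (13, 14, 15)]
    answer = ",".join(str(m) for m in means)
    return [answer for _ in testing_data]
```

By construction this baseline yields SSE = SSE0 and therefore a score of 0, so any useful model has to improve on it, for instance by using the DV values that remain visible at a child's earlier time points.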
NOTE: All data values are normalized between -6 and 6 as part of data obfuscation requirements.
Notes on Data Set Generation
- The time limit is 5 minutes. The memory limit is 2048 megabytes.
- The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.
- Code snippets for calculating the score and generating test cases.
- There are 10 example test cases and 100 full submission (provisional) test cases.
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.