Stunting (i.e., shortness for age) affects more than one in four children worldwide. Wasting (i.e., low weight for height) and stunting in early childhood are associated with lethargy, reduced levels of play, an increased risk of early death, a higher burden of disease, compromised physical capacities, and diminished cognitive development. Stunting and wasting in the first two years of life have been shown to be associated with lower school attainment and reduced economic productivity, effects that can reduce the productivity of an entire generation. Furthermore, stunting between 12 and 36 months has been linked to poor cognitive performance and/or lower school grades in middle childhood, and both height and head circumference at 2 years have been shown to be inversely associated with educational attainment.
In this challenge, we explore the link between cognitive development and child stuntedness. While stuntedness is known to correlate with poor cognitive development, we are interested in finding out whether this is reversible. Are there children who were born stunted but nevertheless overcome their slow rate of cognitive development later in life? Furthermore, do children born stunted who overcome their small size (perhaps through adequate nutrition) also show increased cognitive ability? Are there external factors (either inherited from the parents or environmental) that contribute positively to this recovery?
In pursuit of the above, we have collected time series measurements of child growth and family trait data (e.g., mother's age, mother's height, number of previous pregnancies, breast-feeding practices, and father's height). You will need to use this data to predict a child's IQ measured at 7 years of age in the attached dataset, which contains censored values. To test our hypotheses, we would like to predict IQ in 3 scenarios:
You may download the learning data set here.
Data Set Description
Col#  Name      Type   Description/Notes
1     subjid    int    Subject ID (in ascending order; not all values necessarily exist)
2     agedays   int    Age of child in days (Day 1 = day of birth)
3     wtkg      float  Weight (kg)
4     htcm      float  Standing height (cm)
5     lencm     int    Recumbent length (cm)
6     bmi       float  Body Mass Index (kg/m^2)
7     waz       float  Weight for age Z-score (per WHO algorithm)
8     haz       float  Height for age Z-score (per WHO algorithm)
9     whz       float  Weight for height Z-score (per WHO algorithm)
10    baz       float  BMI for age Z-score (per WHO algorithm)
11    siteid    int    Investigation site ID (several values in the range 5-82)
12    sexn      int    Sex of the child (1 = Male, 2 = Female)
13    feedingn  int    Breast feeding category (1 = Exclusively breast fed, 2 = Exclusively formula fed, 3 = Mixture breast/formula fed, 90 = Unknown)
14    gagebrth  int    Gestational age at birth in days
15    birthwt   int    Birth weight (grams)
16    birthlen  int    Birth length (cm)
17    apgar1    int    APGAR score at 1 minute post birth
18    apgar5    int    APGAR score at 5 minutes post birth
19    mage      int    Maternal age at birth of child (years)
20    demo1n    int    Maternal demographic variable 1 (nominal value 1 or 2)
21    mmaritn   int    Mother's marital status (1 = Married, 2 = Common law, 3 = Separated, 4 = Divorced, 5 = Widowed, 6 = Single)
22    mcignum   int    Mother's # of cigarettes per day during pregnancy
23    parity    int    Maternal parity (# of previous live births at the time of this child's birth)
24    gravida   int    Maternal gravidity (# of previous times pregnant)
25    meducyrs  int    Mother's education level (years)
26    demo2n    int    Maternal demographic variable 2 (nominal value 1, 2, 3, 4 or 5)
27    geniq     int    IQ measured at age 7 (* variable to predict)
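Because the data holds several measurements per child, a natural first step is to group rows by subjid and order them by agedays. The following is a minimal sketch only. It assumes each row is a comma-separated string whose fields follow the column order listed above; the delimiter, the COLUMNS list, and the group_by_subject helper are illustrative assumptions, not part of the problem statement. Values are kept as raw strings because the statement does not say how missing or censored values are encoded.

```python
from collections import defaultdict

# Column names in the order given in the data set description.
COLUMNS = [
    "subjid", "agedays", "wtkg", "htcm", "lencm", "bmi", "waz", "haz",
    "whz", "baz", "siteid", "sexn", "feedingn", "gagebrth", "birthwt",
    "birthlen", "apgar1", "apgar5", "mage", "demo1n", "mmaritn",
    "mcignum", "parity", "gravida", "meducyrs", "demo2n", "geniq",
]


def group_by_subject(rows):
    """Group rows by subject ID, each group sorted by age in days.

    ASSUMPTION: rows are comma-separated strings in COLUMNS order.
    Rows with fewer fields (for example, no geniq) simply yield
    records with fewer keys.
    """
    subjects = defaultdict(list)
    for row in rows:
        values = [v.strip() for v in row.split(",")]
        record = dict(zip(COLUMNS, values))  # raw strings, no conversion
        subjects[int(record["subjid"])].append(record)
    for records in subjects.values():
        records.sort(key=lambda r: int(r["agedays"]))
    return dict(subjects)
```

Each child's record list can then be turned into features such as the first and last haz value, with type conversion done once the encoding of missing values is known.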
Code and Scoring
Your code will be given String[] training and String[] testing, and must return a double[] containing one value (the predicted IQ) for each ID present in the test data. The test data will be provided in order by ID, and your return values must be in that same order.
Your code will also be given ints testType and scenario, indicating the type of test and which of the three scenarios is being tested.
Your code will be scored by calculating the SSE (sum of squared errors) of your predictions. A baseline value, SSE0, will also be calculated in the same way, using the average of all IQ values in the training set as the prediction for every ID. Your score for a test case will then be given by:
Score = 1000000 * MAX(0, 1 - SSE/SSE0)
Your overall score will be the average score across all test cases.
Notes on Data Set Generation
Notes on Time Limits
Because different test types deal with different volumes of data, the time limits will also differ. Example tests are limited to 360s (6 minutes), provisional tests to 540s (9 minutes) and system tests to 900s (15 minutes). The testType parameter will be 0, 1, or 2, to indicate Example, Provisional, or System test, respectively, so that your code can take timing into account. Similarly, the scenario parameter is also 0, 1, or 2, referring to the three scenarios listed above.
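One way to take timing into account is to derive a working budget from testType and stop optional work (for example, extra training iterations) once it runs out. This is a sketch under assumptions: the 30-second safety margin and the helper names are illustrative choices, not values from the problem statement.

```python
import time

# Limits in seconds as stated above, keyed by testType
# (0 = Example, 1 = Provisional, 2 = System).
TIME_LIMITS_SECONDS = {0: 360, 1: 540, 2: 900}


def time_budget(test_type, safety_margin_seconds=30):
    """Seconds available after reserving a safety margin (assumed value)."""
    return TIME_LIMITS_SECONDS[test_type] - safety_margin_seconds


def make_deadline_check(test_type, safety_margin_seconds=30):
    """Return a function that reports whether time still remains."""
    deadline = time.monotonic() + time_budget(test_type, safety_margin_seconds)
    return lambda: time.monotonic() < deadline
```

A solution could call make_deadline_check(testType) once at the start and test the returned function inside its training loop.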
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.