Topcoder is working with a group of researchers organized by the University of Chicago who are competing to understand a series of simulated environments. In the Power World program, we are looking for predictive models that better evaluate how various events, such as an election, affect social systems and actors, and how wealth is distributed and how it changes in the data.
Some initial datasets are provided and described below. The RunDataTable describes the demographic information and policy-related voting information of each individual, and the RelationshipDataTable describes the social relations between these individuals.
This Initial Data Package contains data collected from five isolated societies: one main society and four neighboring societies. Each of these societies has essentially the same causal factors for individual, group, and societal behaviors despite some minor cultural differences.
The Predict Questions ask about the main society only. Data collected from this society is reported in data table files without any appended society number, e.g. RunDataTable-Census.tsv, RelationshipDataTable-GSS.tsv, etc. Entities such as agents and groups in this society have unique names that also have no appended numbers, e.g. agent0, agent1, agent2, etc. and group0, group1, etc.
Data collected from the neighboring societies is reported in data tables with the society number attached, e.g. RunDataTable-Census-Society-2.tsv, RelationshipDataTable-GSS-Society-2.tsv, etc. Entities such as agents and groups in these societies also have their society number appended to their names to distinguish them from entities in other societies. For example, agents in Society 2 are named agent0-2, agent1-2, agent2-2, etc. and groups in Society 2 are named group0-2, group1-2, etc.
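The naming conventions above can be applied mechanically when loading data. The sketch below is a minimal illustration, assuming file and entity names follow exactly the patterns shown in the examples; `parse_table_filename` and `parse_entity` are hypothetical helper names, and the main society is represented by `None` rather than by a number, since the text does not assign it one.

```python
import re
from typing import Optional, Tuple

# Assumed patterns, inferred only from the examples in the text above:
#   RunDataTable-Census.tsv                 -> main society (no number)
#   RelationshipDataTable-GSS-Society-2.tsv -> Society 2
_TABLE_RE = re.compile(r"^([A-Za-z]+)-(.+?)(?:-Society-(\d+))?\.tsv$")
#   agent0, group1     -> main society
#   agent0-2, group1-2 -> Society 2
_ENTITY_RE = re.compile(r"^(agent|group)(\d+)(?:-(\d+))?$")


def parse_table_filename(filename: str) -> Tuple[str, str, Optional[int]]:
    """Split a data table filename into (table, survey, society).

    society is None for the main society, whose files carry no number.
    """
    m = _TABLE_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized table filename: {filename!r}")
    table, survey, society = m.groups()
    return table, survey, int(society) if society else None


def parse_entity(name: str) -> Tuple[str, int, Optional[int]]:
    """Split an entity name into (kind, index, society)."""
    m = _ENTITY_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized entity name: {name!r}")
    kind, index, society = m.groups()
    return kind, int(index), int(society) if society else None
```

For example, `parse_table_filename("RelationshipDataTable-GSS-Society-2.tsv")` yields `("RelationshipDataTable", "GSS", 2)` and `parse_entity("agent10")` yields `("agent", 10, None)`. Secondary research-request files may use other naming schemes and would raise `ValueError` here.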
There are no interactions between entities that are in different societies. For example, agents in the main society do not know about or interact with agents in Society 2. Similarly, agents in Society 3 do not know about and cannot join or interact with groups in Society 4, etc.
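Because entities never interact across societies, a cheap sanity check after loading is to confirm that no relationship or membership record pairs entities from different societies. A minimal sketch, assuming the records have already been reduced to `(name, name)` pairs; `society_of` and `cross_society_pairs` are hypothetical helpers, and no column names of the actual tables are assumed.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Entity names such as agent7 (main society) or group3-4 (Society 4).
_NAME_RE = re.compile(r"^[A-Za-z]+\d+(?:-(\d+))?$")


def society_of(name: str) -> Optional[int]:
    """Return the society number in a name, or None for the main society."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized entity name: {name!r}")
    return int(m.group(1)) if m.group(1) else None


def cross_society_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List the pairs whose two entities belong to different societies.

    Under the rules above this should always be empty, so a non-empty
    result points to a loading or merging bug.
    """
    return [(a, b) for a, b in pairs if society_of(a) != society_of(b)]
```

The same check works for agent-agent relationships and agent-group memberships, since both kinds of names carry the society suffix in the same position.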
Additional secondary data has been provided that refers to specific research requests (RR). This secondary data may help certain models perform more accurately. It is located in the "Secondary Research Data" folder of the data pack. A requests document in the forum can also be used, along with the research request ID (RR-XXXX), to find the individual data that was collected and described as part of each request.
NOTE: The secondary data includes the original data for the Phase 2 Explain task. Between every phase and every challenge, the world changes. The same causal model is used during Explain, Predict, and Prescribe, but some specifics about each island (society) are different. For example, agent10 or group5 in the Explain world is not the same individual or group as agent10 or group5 in the Predict world.
The full dataset can be downloaded here.
Final Predictive Goals: Given the data from all of the societies, we need to build models that predict the future status of actors and groups. You may want to build separate models for different questions, but please note that these predictions are closely related to each other, which is why they are included in the same challenge.
These are the predictions that we are going to create models for:
1. Of the agents who voted in the last election (Day 350 in RunDataTable-GSS.tsv), which agents will NOT vote in the next election?
2. How many members will Group 1 have on Day 495?
3. How many group contests will Group 1 win over the next 20 days (from Day 475 through Day 495)?
4. What percentage of voters will vote yes on the MaxHappiness policy, at the global level, in the election on Day 980?
5. How much wealth will agent 11 have on Day 1010, 1050, 1100, 1200, and 1300?
6. Of the agents who voted in the last election, which agents will NOT vote in the next election?
7. Currently (Day 1000), Agent 13 is in Groups 1, 7, and 9. What would Agent 13’s happiness be on Day 1010, 1050, 1100, 1200, and 1300 if it left all of these groups on Day 1001, and no agent could join or quit any groups for the next 300 days? (Express happiness on a 0-1 scale.)
8. How many members will Group 1 have on Day 1200?
9. How many group contests will Group 0 win over the next 300 days (from Day 1001 through 1300)? Provide a 95% confidence interval for the number of contest wins.
10. If the GroupOne policy were active in all locations and globally for the next 300 days (from Day 1001 through 1300), what would the average relationship strength between Group 1 members be on Day 1010, 1050, 1100, 1200, and 1300? (Express relationship strength on a 0-1 scale.)
11. What percentage of voters will vote yes on the MaxHappiness policy, at the global level, in the election on Day 1070? Provide a 95% confidence interval for the percentage of yes votes.
12. What will be the social network density on Day 1300? (Use the definition of network density as the number of edges divided by the number of possible edges.) Provide a 95% confidence interval for the network density.
13. If, in addition to the interactions that would normally occur, every agent initiated an individual interaction with another agent chosen at random every day from Day 1000 through Day 1300, what would the average happiness of the society be on Days 1010, 1050, 1100, 1200, and 1300? (Express happiness on a 0-1 scale.)
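Question 12 fixes network density as the number of edges divided by the number of possible edges, and Questions 9, 11, and 12 ask for 95% confidence intervals. A minimal sketch of both, under stated assumptions: relationships are treated as undirected by default (so n agents allow n(n-1)/2 edges; `directed=True` uses n(n-1)), self-loops and duplicate records are ignored, and the samples passed to the hypothetical `percentile_interval` helper come from your own repeated stochastic model runs or bootstrap resamples, which the data package does not supply. The percentile interval is one common choice, not a method prescribed by the challenge.

```python
import math
from typing import Iterable, Sequence, Tuple


def network_density(nodes: Iterable[str], edges: Iterable[Tuple[str, str]],
                    directed: bool = False) -> float:
    """Edges divided by possible edges (self-loops and duplicates ignored)."""
    node_set = set(nodes)
    n = len(node_set)
    if n < 2:
        return 0.0
    # Keep only edges between two distinct nodes that are in node_set.
    valid = [(a, b) for a, b in edges if a != b and a in node_set and b in node_set]
    if directed:
        unique = set(valid)
        possible = n * (n - 1)
    else:
        # An undirected edge is the same regardless of endpoint order.
        unique = {frozenset(e) for e in valid}
        possible = n * (n - 1) // 2
    return len(unique) / possible


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def percentile_interval(samples: Iterable[float],
                        level: float = 0.95) -> Tuple[float, float]:
    """Central `level` interval of a set of simulated predictions."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    vals = sorted(samples)
    if not vals:
        raise ValueError("need at least one sample")
    alpha = (1.0 - level) / 2.0
    return _quantile(vals, alpha), _quantile(vals, 1.0 - alpha)
```

With four agents and the undirected relationships agent0-agent1 and agent1-agent2, the density is 2 / 6 ≈ 0.33. Whether the RelationshipDataTable should be read as directed is a modeling decision that should be documented in the submission.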
Goal of This Challenge:
You are asked to build models to make predictions to the corresponding questions. Your solution will be judged based on the novelty as well as the performance on the given data.
It is recommended that you build a method to generate the model and create the prediction for each individual question above, so that we can easily review your submission with the expanded data set.
A few of the given questions will be objectively scored according to answers that are known to be correct, and others will be scored subjectively.
For all predictions, please clearly document how each model is trained and how it is used to create the predictions. The reviewers should be able to easily extend your code to an expanded dataset. Do not leave anything to be assumed, no matter how trivial. This will be part of the review at the end of the challenge, so the more information you provide, and the better your documentation is, the better your chances of winning will be.
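One way to provide a separate, rerunnable method per question is a small registry that maps each question number to a train-and-predict function. This is a hypothetical layout, not a required structure: `QUESTIONS`, `question`, `run`, and `main` are illustrative names, and each registered function is assumed to read everything it needs from the data directory it is given.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical registry: question number -> function(data_dir) -> prediction.
QUESTIONS: Dict[int, Callable[[Path], Any]] = {}


def question(number: int):
    """Decorator that registers the train-and-predict function for one question."""
    def register(fn: Callable[[Path], Any]) -> Callable[[Path], Any]:
        if number in QUESTIONS:
            raise ValueError(f"question {number} already registered")
        QUESTIONS[number] = fn
        return fn
    return register


def run(data_dir: Path, numbers: Optional[Iterable[int]] = None) -> Dict[int, Any]:
    """Retrain and predict the selected questions (all by default) on data_dir."""
    selected = sorted(QUESTIONS) if numbers is None else list(numbers)
    return {n: QUESTIONS[n](data_dir) for n in selected}


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Regenerate all predictions.")
    parser.add_argument("data_dir", type=Path, help="folder containing the .tsv tables")
    parser.add_argument("--questions", type=int, nargs="*", help="subset to run")
    args = parser.parse_args(argv)
    # default=str lets non-JSON results (e.g. numpy values) print readably.
    print(json.dumps(run(args.data_dir, args.questions), indent=2, default=str))


if __name__ == "__main__":
    main()
```

A submission would then decorate one function per question, e.g. `@question(12)`, and a reviewer could regenerate every answer against an expanded dataset by pointing the script at a new data folder.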
The dataset can be downloaded here or from the forum.
Each University of Chicago team can request additional information from the virtual world simulation teams, beyond what is initially provided, through a “Research Request” process. Data files or folders that are denoted with “RR” are the output of this process. In the Code Document forum you’ll find a link to a Research Request document containing the original requests submitted by the University of Chicago researchers, which can provide some context. Each request has to include a plausible collection methodology (e.g. surveys or instruments that can collect data). Additional data may be provided over the course of this challenge's submission period. You are encouraged to include this input in your analysis.