Causal Graph for Disaster World

Key Information

The challenge is finished.

Challenge Overview

Detailed requirements

Topcoder is working with a group of researchers who are competing to understand a series of simulated environments. There are four of these environments: Urban World, Conflict World, Power World, and Disaster World. Each “world” is defined by a set of data, but the full range of variables used to develop the simulated worlds isn’t disclosed to the working groups in advance. The research teams have to propose and request useful data and then justify how it would be collected in the target environment.

In this challenge, we are asking the Topcoder community to build a causal graph that will reveal relationships in the ground truth for the simulated disaster world.

Ground truth is a graphical representation of the causal structure driving the simulation. The ground truth should incorporate processes at three levels of analysis:

  • The actor level: the process that actors in the simulation use to make decisions, including processes for integrating information (for example, from the environment, the actor’s own characteristics and states, and interactions with other actors).

  • The group level: interactions between actors; explicit and implicit group formation; group decision making and interaction; and other simulated activity that involves multiple actors.

  • The system level: the environment and its influences on and from actors and groups.


Nevertheless, the level of a process need not be specified, and all processes should be incorporated into a single submitted ground truth graph comprising nodes (processes) and unweighted, unlabeled directed links (causation). Multi-level processes in the ground truth should reference simulation variables that cause other processes (also variables) within the simulated world. These variables may be visible and named in the data, or they may be hidden but discoverable. Note that causal links between nodes must be statistically identifiable from the particular run of the simulation, even if you do not yet have the data required to identify them.
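
The graph structure described above can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical and do not come from the Disaster World data, and validate_graph is a helper invented here. Nodes are plain names, and each link is an ordered (cause, effect) pair that carries no weight and no label.

```python
# Hypothetical variable names for illustration; not taken from the data set.
nodes = {"hurricane_category", "actor_risk_perception", "evacuation_decision"}

# Each link is an ordered (cause, effect) pair: directed, unweighted, unlabeled.
links = {
    ("hurricane_category", "actor_risk_perception"),
    ("actor_risk_perception", "evacuation_decision"),
}


def validate_graph(nodes, links):
    """Return a list of problems; an empty list means the graph is well formed."""
    problems = []
    for link in links:
        if len(link) != 2:
            problems.append(f"link {link!r} is not a (cause, effect) pair")
            continue
        for endpoint in link:
            if endpoint not in nodes:
                problems.append(f"link {link!r} uses unknown node {endpoint!r}")
    return problems


print(validate_graph(nodes, links))  # prints []
```

Keeping links as bare pairs makes it hard to attach a weight or label by accident, which matches the requirement that links express causation only.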

The purpose of the Ground Truth program is to create artificial but socially plausible simulations that have known causal ground truth to validate the accuracy and robustness of social science modeling methods.


Guidelines for Creating a Simulation’s Ground Truth

  1. Ground truth represents the causal structure of the simulation.

    1. Nodes represent simulation variables.

    2. Links represent causal relationships between simulation variables.

  2. Nodes should be aggregated where possible. If multiple simulation variables represent similar concepts and have the same causal structure (i.e., the same causal influences), then those variables can be represented as a single node. For example, if multiple agents use the same decision-making structure, even if they are parameterized differently, they should be represented as a single node.

  3. Links represent causal relationships that appear in the simulation equations or algorithms. For example, if variable A is used in the equation/algorithm for calculating variable B, then the ground truth should include a link from node A to node B.

  4. Note: links do not represent correlation. If a relationship between two nodes is correlative and not causal, it should not be represented by a link.

  5. Relationships between entities should be represented as simply as possible. For example, the ground truth does not need to represent the entire influence network used in the simulation; instead, links can represent types of causal relationships between generic actors.

  6. Equations and parameterization are not represented in the ground truth. The ground truth diagram is only meant to specify causal relationships.

  7. The ground truth should describe the causal influences that determine:

    • How actors in the simulation make decisions.

    • How actors in the simulation interact with each other.

    • How actors in the simulation interact with their environments.

    • Any environmental factors that influence each other within the simulation.
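
Guidelines 2 and 3 above can be sketched together in Python. The sketch assumes two hypothetical inputs that you would have to build yourself: inputs_of, which lists the variables read by each variable's update rule, and node_of, which records your own judgment about which variables represent the same concept. Neither the names nor the example variables come from the Disaster World data.

```python
from collections import defaultdict


def build_ground_truth(inputs_of, node_of):
    """Map variable-level dependencies to aggregated nodes and links.

    inputs_of: variable -> list of variables used to compute it.
    node_of:   variable -> name of the node that represents it.
    Every variable that appears as an input must also be a key of both dicts.
    """
    # Causes of each variable, expressed as aggregated node names.
    parents = {t: {node_of[s] for s in srcs} for t, srcs in inputs_of.items()}

    # Guideline 2: variables merged into one node must share the same causes.
    members = defaultdict(list)
    for variable in inputs_of:
        members[node_of[variable]].append(variable)
    for node, variables in members.items():
        if any(parents[v] != parents[variables[0]] for v in variables[1:]):
            raise ValueError(f"{node!r} merges variables with different causes")

    # Guideline 3: if A is used to compute B, add a link from A to B.
    links = {(p, node_of[t]) for t in inputs_of for p in parents[t]}
    return set(members), links


# Hypothetical example: two agents that share one decision-making structure.
inputs_of = {
    "wind_speed": [],
    "agent_1_risk": ["wind_speed"],
    "agent_2_risk": ["wind_speed"],
    "agent_1_evacuates": ["agent_1_risk"],
    "agent_2_evacuates": ["agent_2_risk"],
}
node_of = {
    "wind_speed": "wind_speed",
    "agent_1_risk": "actor_risk",
    "agent_2_risk": "actor_risk",
    "agent_1_evacuates": "actor_evacuates",
    "agent_2_evacuates": "actor_evacuates",
}

nodes, links = build_ground_truth(inputs_of, node_of)
print(sorted(nodes))
print(sorted(links))
```

The five variables collapse to three nodes and two links. The check on shared causes catches the case where two variables were grouped by name even though different inputs drive them.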

We’ll be attaching the following documents:

  1. Challenge Description

  2. Initial Phase 2 Data set of Disaster World - includes a Simulation Description folder which outlines the variables contained in the data set.

  3. Disaster World Research Requests - outlines additional variables/data that have been requested by the Disaster World Team. 

  4. The previous phase’s GraphML files and images.  This shows the basic output format.

  5. Some sample Python code to format graphml output and render the images.
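
For orientation, a GraphML file for a directed graph can be produced with Python's standard library alone. This is a generic sketch, not the attached sample code: the function name to_graphml is invented here, and the previous phase's files remain the reference for any attributes the expected output format requires.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, links):
    """Serialize node names and (cause, effect) links to a GraphML string."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root, f"{{{GRAPHML_NS}}}graph", {"id": "G", "edgedefault": "directed"}
    )
    for name in sorted(nodes):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": name})
    for cause, effect in sorted(links):
        ET.SubElement(
            graph, f"{{{GRAPHML_NS}}}edge", {"source": cause, "target": effect}
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-node graph; the names are placeholders.
print(to_graphml({"cause_variable", "effect_variable"},
                 {("cause_variable", "effect_variable")}))
```

Sorting the nodes and links keeps the output stable between runs, which makes successive versions of the file easier to compare.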


Final goal: Develop a causal graph for the Phase 2 Explain data set that reveals the underlying causal relationships in the simulated disaster world data.


Evaluation Criteria:

1. Completeness: Please include all of the provided variables in your analysis. Please provide a single consolidated GraphML file that covers all three levels.

2.  Quality of your documented analysis.  Please tell us how you generated your causal graph.  What analysis did you perform to justify the structure you’ve generated?
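
One way to check the completeness criterion before submitting is to compare the provided variable names against the variables your graph accounts for. The sketch below assumes you keep a dictionary, here called node_of, from each data variable to the graph node that represents it. That dictionary, the function name, and the example column names are all hypothetical.

```python
def missing_variables(data_columns, node_of):
    """Return the provided variables that no graph node accounts for, sorted."""
    return sorted(set(data_columns) - set(node_of))


# Hypothetical column names standing in for the provided data set.
data_columns = ["wind_speed", "agent_risk", "shelter_capacity"]
node_of = {"wind_speed": "wind_speed", "agent_risk": "actor_risk"}

print(missing_variables(data_columns, node_of))  # prints ['shelter_capacity']
```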


Final Submission Guidelines

Please submit the following:

  1. A description.txt file that describes the variables in your graph, i.e., its nodes and edges

  2. An analysis document which describes the analysis you performed to generate your graph and your approach.  Output from a Jupyter Notebook is also acceptable but please include sufficient text content to explain your analytical methods and conclusions.

  3. A rendered graph image which can either be a JPG file or PNG file.

  4. A GraphML file
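
To keep description.txt consistent with the GraphML file, both can be generated from the same node and link sets. The layout below is an assumption, since the challenge does not prescribe a format for description.txt, and describe_graph is a name invented for this sketch. A real description would also explain what each variable means.

```python
def describe_graph(nodes, links):
    """Return a plain-text listing of the nodes and directed edges."""
    lines = [f"Nodes ({len(nodes)}):"]
    lines += [f"  {name}" for name in sorted(nodes)]
    lines.append(f"Edges ({len(links)}):")
    lines += [f"  {cause} -> {effect}" for cause, effect in sorted(links)]
    return "\n".join(lines) + "\n"


# Hypothetical two-node graph; the names are placeholders.
print(describe_graph({"cause_variable", "effect_variable"},
                     {("cause_variable", "effect_variable")}))
```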

Reliability Rating and Bonus

For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


Final Review:

Community Review Board


User Sign-Off


Review Scorecard