The United States government routinely performs radiological search deployments to look for illicit nuclear materials (such as highly enriched uranium and weapons grade plutonium) in a given area. Deployments can be intelligence driven, in support of law enforcement, or tied to planned events such as the Super Bowl, presidential inaugurations, or political conventions. In a typical deployment, radiation detection systems carried by human operators or mounted on vehicles move in a clearing pattern through the search area. Search teams rely on radiation detection algorithms, running on these systems in real time, to alert them to the presence of an illicit threat source. The detection and identification of sources is complicated by large variations in the natural radiation background throughout a given search area and by the potential presence of localized non-threat sources, such as patients undergoing treatment with medical isotopes. As a result, detection algorithms must be carefully balanced between missing real sources and reporting too many false alarms. The purpose of this competition is to investigate new methodologies for detecting and identifying nuclear materials in a simplified search mission.
For more information on the search mission, see https://www.energy.gov/nnsa/nuclear-incident-response.
For this competition, we have used Monte Carlo particle transport models (see scale.ornl.gov) to create thousands of data files resembling typical radiological search data collected on urban streets in a mid-sized U.S. city. Each data file – a run – simulates the pulse trains of gamma-ray detection events (amount of energy deposited in the detector at a given time) received by a standard thallium-doped sodium iodide – NaI(Tl) – detector, specifically a 2”×4”×16” NaI(Tl) detector, driving along several city street blocks in a search vehicle. Gamma-rays produced from radioactive decay have a characteristic energy which can be used to identify the source isotope (https://en.wikipedia.org/wiki/Gamma_ray). All of the runs contain data on radiation-detector interaction events from the natural background sources (roadway, sidewalks, and buildings), and some of the runs also contain data arising from non-natural extraneous radiation sources.
The event data created for this competition is derived from a simplified radiation transport model – a street without parked cars, pedestrians, or other “clutter”, a constant search vehicle speed with no stoplights, and no vehicles surrounding the search vehicle. In fact, the search vehicle itself is not even in the model – the detector is moving down the street by itself, 1 meter off the ground. The energy resolution is 7.5% at 661 kilo-electronvolts (keV). This simple model provides a starting point for comparing detection algorithms at their most basic level.
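The quoted energy resolution can be turned into an approximate peak width, which may help when choosing energy bin sizes. The following is a minimal sketch, assuming the 7.5% figure means full width at half maximum (FWHM) divided by peak energy and that peaks are roughly Gaussian; neither assumption is stated in this problem description, and the function name is hypothetical.

```python
import math

def fwhm_and_sigma(energy_kev, resolution):
    """Return (FWHM, Gaussian sigma) in keV for a peak at energy_kev.

    Assumption: resolution = FWHM / energy, and the peak shape is Gaussian.
    """
    fwhm = resolution * energy_kev
    # For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return fwhm, sigma

fwhm, sigma = fwhm_and_sigma(661.0, 0.075)
print(f"FWHM = {fwhm:.1f} keV, sigma = {sigma:.1f} keV")
```

Under these assumptions, a 661 keV peak is about 50 keV wide at half maximum. How the resolution varies at other energies is not specified here.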
The runs are separated into two sets: a training set for which a file with the correct answers (source type, time at which the detector was closest to the source) is also provided, and a test set for which you, the competitor, will populate and submit an answer file for online scoring. For each run in the test set, you’ll use your detection algorithm on the event data to determine
whether there is a non-natural extraneous source along the path (the detection component), and if so,
what type of source it is (identification) and
at what point the detector was closest to it during the run (location — which in this competition will be reported in seconds from the start of the run).
Data for Download
testing.tar.gz - Test set — 15,840 runs (6.8 GB)
training.tar.gz - Training set — 9,700 runs (7.4 GB)
trainingAnswers.csv - Training set answers file (128 kB)
submittedAnswers.csv - Blank test set answers file (155 kB)
sourceInfo.zip - Energy spectra for sources 1-5 (290 kB) – see discussion below
For the competition we have generated data from thousands of runs that mimic what would be acquired by a 2”×4”×16” NaI(Tl) detector moving down a simplified street in a mid-sized U.S. city. Each run has been designed so that no source is located within the first 30 seconds of measurements, though the first 30 seconds could include events associated with gamma rays arising from an extraneous source located farther along the street.
In a real urban search scenario, you would be able to see the general layout of the street as the detector moves along. A notional diagram is shown in the figure, though the exact street geometry for each run will be different (and unspecified). Buildings made of different materials (red, beige, and dark gray) occupy most of the space along the main street, with a few asphalt parking areas (white), and grass-covered park areas (green). The main street (also white) has four travel lanes, two in each direction, and several cross streets. Travel lanes are each ten feet wide and there is at least ten feet of sidewalk (gray) between the outer travel lane and the building facades. Detector paths are all within one of the four main traffic lanes – with no lane switching and no turning onto the cross streets.
The set of data for a given run mimics what a detector would see in real time: a list of pairs, each comprising the time at which a gamma ray interacted within the radiation detector and the corresponding energy the gamma ray deposited in the detector. Times are in microseconds (µsec, millionths of a second) since the last event, and the deposited energy is in units of kilo-electronvolts (keV). The time recorded for the first event in each run will be zero. Your algorithm can bin the data into any time and energy structure that you find beneficial.
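One possible way to bin the event list is sketched below. It accumulates the inter-event times into elapsed time and counts events per (time bin, energy bin). The function name and the default bin widths (1 second, 10 keV) are illustrative assumptions, not values prescribed by the competition.

```python
from collections import Counter
from itertools import accumulate

def bin_events(deltas_us, energies_kev, time_bin_s=1.0, energy_bin_kev=10.0):
    """Count events per (time bin, energy bin).

    deltas_us: time since the previous event, in microseconds (first is 0).
    energies_kev: deposited energy of each event, in keV.
    Bin widths are illustrative defaults, not values from the competition.
    Returns a Counter keyed by (time_bin_index, energy_bin_index).
    """
    counts = Counter()
    # accumulate() turns "time since last event" into "time since run start".
    for t_us, e_kev in zip(accumulate(deltas_us), energies_kev):
        t_idx = int(t_us // (time_bin_s * 1_000_000))
        e_idx = int(e_kev // energy_bin_kev)
        counts[(t_idx, e_idx)] += 1
    return counts

counts = bin_events([0, 600000, 600000], [69.0, 154.7, 55.5])
print(sorted(counts.items()))
```

A sparse counter keeps the sketch dependency-free; a dense array would be the more usual choice once bin ranges are fixed.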
For each run, several characteristics are fixed a priori:
The lane and direction of travel.
The speed of travel, a constant between 1 and 13.4 meters per second.
The street geometry (e.g., what buildings are present where).
The type of source (see below for the list of source types and their SourceID labels).
The strength of the source.
Whether the source is shielded or not.
The location of the source along the street.
Although these 7 characteristics are not revealed in the test set, they do impact the data recorded in each file. For instance, fewer events are recorded if the detector is traveling at a faster speed, and more events are recorded if the source has a higher strength.
Note that the runs in the training set and the runs in the test set are all drawn from the same 7-dimensional input space based on the 7 characteristics above. However, the emphasis of each set varies. Recall that the test set runs are divided into those used to calculate your provisional score while the competition is running (42%) and those used to calculate your final score at the end of the competition (58%). The training runs span the full input space relatively evenly, while the final score test runs emphasize the more challenging regions of the input space. The provisional score test runs lie somewhere in between in their emphasis.
If a source is present in a run, it will be one of six types:
SourceID Source Type
1 HEU: Highly enriched uranium
2 WGPu: Weapons grade plutonium
3 131I: Iodine, a medical isotope
4 60Co: Cobalt, an industrial isotope
5 99mTc: Technetium, a medical isotope
6 A combination of 99mTc and HEU
Enter your prediction of the SourceID in your answers file for each run. If no source is present in a run, the SourceID field for that run in your answers file should be entered as 0. For all runs, there will be no source located within the first 30 seconds of measurements.
Energy spectra for each source type are shown below for a significant quantity of source types 1 and 2 and 1 microcurie (μCi) for source types 3-5. In each figure, the solid curve shows the unshielded spectrum while the dashed curve shows the spectrum with 1 cm of lead shielding. The plots show sources 1 meter away from the center of the detector in a vacuum.
The data that generated these figures are included in the data for download in the file sourceInfo.zip.
Files for this competition are standard ASCII and comma delimited (CSV format). They include run files that contain the radiation detection event data for each run and answers files that contain the source type and source time for all the runs in either the training or the test set.
The events for each run (one instance of a detector traveling down a street) are contained in a run file named with a unique run ID number. Each run file contains a list of radiation detection events, one event per row. Each row gives the time in microseconds since the previous recorded event and the energy deposited by that event in keV. For the first event in each run file, the time since the last event is recorded as 0.
Example Run File:
In this example, the first row records the first event, which had a deposited energy of 69 keV. The second row describes the second event, which occurred 985 µsec after the first with a deposited energy of 154.7 keV. The third row records an event that occurred 757 µsec after the second event (985 +757 = 1742 µsec since the beginning of the run) with a deposited energy of 55.5 keV. The event descriptions continue row-by-row until the end of the run.
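The row-by-row description above can be sketched in code as follows. This is a minimal reader, assuming each row is "time since previous event in microseconds, energy in keV" with no header row; the sample rows are rebuilt from the values quoted in the text, and the function name is hypothetical.

```python
import csv
import io

def read_run(lines):
    """Parse run-file rows into (seconds_since_start, energy_kev) tuples.

    Assumption: each row is "<microseconds since previous event>,<keV>"
    with no header row.
    """
    events = []
    elapsed_us = 0.0
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        elapsed_us += float(row[0])  # running sum gives time since run start
        events.append((elapsed_us / 1_000_000, float(row[1])))
    return events

# Rows rebuilt from the values quoted in the text above.
sample = io.StringIO("0,69.0\n985,154.7\n757,55.5\n")
for t, e in read_run(sample):
    print(f"{t:.6f} s  {e} keV")
```

The third event lands at 0.001742 s, matching the 1742 µsec worked out in the text. For a real run, pass an open file object instead of the in-memory sample.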
For the training set, the answers file contains the correct answer for each run file in CSV format. Each row of the training set answers file gives an identifier for the run (RunID), an identifier for the source type (SourceID), and the time in seconds, reported to the nearest 1/100 of a second, at which the detector was closest to the source (SourceTime), sometimes referred to as the time of closest approach. For runs without an extraneous source (background only), SourceID and SourceTime are 0 and 0.00, respectively.
Example Training Set Answers File:
In the example training set answers file, the first row is for run number 100001, which, like runs 100002 and 100003, had no extraneous source present (0 for SourceID and 0.00 for SourceTime), while the fourth row after the header is for run number 100004, which had source 1 (HEU) present at a location that was nearest to the detector 65.58 seconds after the start of that run.
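Loading the answers file into a lookup table might look like the sketch below. It assumes the header row uses the field names given in the text (RunID, SourceID, SourceTime); the sample rows are rebuilt from the description above, and the function name is hypothetical.

```python
import csv
import io

def load_answers(lines):
    """Map RunID -> (SourceID, SourceTime in seconds).

    Assumption: the header row uses the field names given in the text
    (RunID, SourceID, SourceTime).
    """
    answers = {}
    for row in csv.DictReader(lines):
        answers[int(row["RunID"])] = (int(row["SourceID"]), float(row["SourceTime"]))
    return answers

# Rows rebuilt from the description of the example answers file.
sample = io.StringIO(
    "RunID,SourceID,SourceTime\n"
    "100001,0,0.00\n"
    "100002,0,0.00\n"
    "100003,0,0.00\n"
    "100004,1,65.58\n"
)
answers = load_answers(sample)
print(answers[100004])  # (1, 65.58)
```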
Making a Submission
This competition uses the result submission style, i.e., you will run your solution locally using the provided files as input, and produce a file that contains your answer. Note that only your first 300 submissions can be considered for final rankings for prizes. If you make more than 300 submissions, the later submissions will be scored as part of the provisional leaderboard while the competition is running but will not appear in the final rankings.
Your output must be a CSV file with the same format as the training set answers file above. The name of your submission file must be solution.csv. You will fill in the SourceID and SourceTime fields of the answers file for each run in the test set and submit the entire file for scoring. If any entry for any run is left blank or is inadmissible (e.g., a SourceID of “7” or a negative SourceTime), no score will be generated when that submission is processed. Likewise, no score will be generated unless your answers are sorted by RunID. You should zip your solution.csv file before submitting it to Topcoder.
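The submission requirements above (sorted by RunID, no blank entries, SourceID between 0 and 6, non-negative SourceTime) can be checked before upload. The sketch below is one way to do that; the function name is hypothetical, and the header names are assumed to match the training set answers file.

```python
import csv
import io

VALID_SOURCE_IDS = range(0, 7)  # 0 = no source, 1-6 = the six source types

def write_solution(predictions, out):
    """Write predictions as CSV rows sorted by RunID.

    predictions: dict mapping RunID -> (SourceID, SourceTime in seconds).
    out: a writable text file object.
    Raises ValueError on entries the problem statement calls inadmissible.
    Assumption: header names match the training set answers file.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["RunID", "SourceID", "SourceTime"])
    for run_id in sorted(predictions):
        source_id, source_time = predictions[run_id]
        if source_id not in VALID_SOURCE_IDS:
            raise ValueError(f"run {run_id}: SourceID {source_id} not in 0-6")
        if source_time < 0:
            raise ValueError(f"run {run_id}: negative SourceTime")
        if source_id == 0:
            source_time = 0.0  # background-only runs use 0.00, as in training
        writer.writerow([run_id, source_id, f"{source_time:.2f}"])

buf = io.StringIO()
write_solution({100002: (0, 0.0), 100001: (1, 65.58)}, buf)
print(buf.getvalue())
```

To produce the actual file, open solution.csv for writing with newline="" and pass the file object as `out`.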
Keep in mind that your complete code that generates the results will be verified at the end of the contest if you achieve a score in the top 10, as described later in the “Requirements to Win a Prize” section. That is, participants will be required to provide fully automated executable software to allow for independent verification of software performance and the metric quality of the output data.
Scores range from 0 (worst) to 100 (best) and reflect your success on each of the three components: detection, identification, and location. Remember that only your first 300 submissions can be considered for final rankings for prizes. If you make more than 300 submissions, the later submissions will be scored as part of the provisional leaderboard while the competition is running but will not appear in the final rankings.
Let B be the base score and p be the unit of possible penalty (arbitrary scale). Your score is set to B points and then it is modified according to these rules:
- For each run that contains a source:
  - Detection: If you incorrectly say there’s no source present (false negative), you lose 2p points.
  - Identification and location: If you correctly say there’s a source present and the distance D between the predicted location and the correct location is less than the standoff (which is different for each run):
    - Identification: If you correctly identify the SourceID, you earn p points.
    - Location: You earn points according to how far away you are from the correct location, up to a maximum of p points, following this formula:
      “points earned” = p · cos((π / 2) · (D / standoff))
  - Location: If you correctly say there’s a source present but get the location wrong (i.e., not within the given standoff), you lose 2p points (the same as a false negative).
- For each run that does not contain a source:
  - Detection: If you incorrectly say there’s a source present (false positive), you lose 2p points. Otherwise (true negative), you earn 6p points.
The values of B and p are selected so that the minimum possible score is 0 and the maximum possible score is 100.
Since it is critical for a working detector not to have too many false positives, runs containing no sources contribute more to your score than runs with a source.
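The scoring rules can be sketched in code as follows. This is one reading of the rules, not the official scorer. The values of B and p are not given in the text; they are derived here from the stated 0 to 100 range. With N_s source runs and N_b background runs, the worst case is -2p per run and the best case is 2p per source run plus 6p per background run, so p = 100 / (4·N_s + 8·N_b) and B = 2p·(N_s + N_b). Function names are hypothetical, and D and the standoff are assumed to be supplied in the same units.

```python
import math

def run_points(true_id, pred_id, distance=None, standoff=None):
    """Points for one run, in units of p, under one reading of the rules.

    true_id / pred_id: SourceID, with 0 meaning no source.
    distance, standoff: D and the run's standoff, in the same units
    (only needed when both IDs are non-zero).
    """
    if true_id == 0:
        return 6.0 if pred_id == 0 else -2.0   # true negative / false positive
    if pred_id == 0:
        return -2.0                            # false negative
    if distance >= standoff:
        return -2.0                            # source found, wrong location
    points = math.cos((math.pi / 2) * (distance / standoff))  # location
    if pred_id == true_id:
        points += 1.0                          # identification
    return points

def total_score(per_run_points, n_source, n_background):
    """Scale summed points to 0-100 using the derived (assumed) B and p."""
    p = 100.0 / (4 * n_source + 8 * n_background)
    base = 2 * p * (n_source + n_background)
    return base + p * sum(per_run_points)

# One source run answered perfectly plus one true negative gives the maximum.
print(total_score([run_points(1, 1, 0.0, 10.0), run_points(0, 0)], 1, 1))
```

Note that, under this reading, a detection within the standoff but with the wrong SourceID still earns the location points.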
The provisional leaderboard rankings will be based on your scores for approximately 42% of the runs in the test set. This will be updated each time a competitor submits an answers file during the competition.
The final rankings for prizes will be based on the remaining 58% of runs. The scoring from these runs is not revealed to the competitor until the end of the competition, and so the final rankings may be different from what has been shown on the public leaderboard.
An offline scorer is available that you can use to test your solution locally. It calculates detailed scores based on your output file and a file containing ground truth annotations.
REQUIREMENTS TO WIN A PRIZE AND OTHER RULES
Only your final submission (or your 300th submission if you make more than 300 submissions) will be considered for final scoring.
In order to receive a cash prize if your final score is within the top 10, you must provide:
A dockerized version of your algorithm, within 7 days after the announcement of results of system testing, along with any assets/materials necessary to deploy, use and train it. The technical details on how to dockerize the solution are described in a separate document. The code in your container should produce the same results as your final submission csv.
A write-up explaining the training methods used and the theory behind your approach.
Your code, excluding training, should execute within 12 hours on an m4.10xlarge or p3.2xlarge AWS machine and produce the final submission csv.
There is no restriction on code language or platform provided that Topcoder can build, deploy, and execute your dockerized code.
The following conditions and restrictions apply to the competition:
Federal employees acting within the scope of their employment are not eligible to participate.
Federal employees acting outside the scope of their employment should consult their ethics advisor before participating in the competition.
People who downloaded the data provided in the earlier version of this competition, hosted at https://datacompetitions.lbl.gov/, are not eligible to participate.
Contractors receiving Government funding for directly related work, such as national laboratory employees, may participate in this competition but must forgo monetary prizes unless they are acting outside the scope of their employment, as in the second condition above. Competitors will still be publicly recognized based on their performance on Topcoder’s online leaderboard. Throughout the competition, Topcoder’s online leaderboard will display your rankings and accomplishments, giving you various opportunities to have your work viewed and appreciated by stakeholders from industry, government, and academic communities.
Only one Topcoder account is permitted per competitor; you cannot make submissions from multiple accounts.
Teaming is allowed. Topcoder members are permitted to form teams for this competition. If you want to compete as a team, please complete the teaming form. After forming a team, Topcoder members of the same team are permitted to collaborate with other members of their team. To form a team, a Topcoder member may recruit other Topcoder members, and register the team by completing this Topcoder Teaming Form. Each team must declare a Captain. All participants in a team must be registered Topcoder members in good standing. All participants in a team must individually register for this Competition and accept its Terms and Conditions prior to joining the team. Team Captains must apportion prize distribution percentages for each teammate on the Teaming Form. The sum of all prize portions must equal 100%. The minimum permitted size of a team is 1 member, the maximum permitted team size is 5 members. Only team Captains may submit a solution to the Competition. Notwithstanding Topcoder rules and conditions to the contrary, solutions submitted by any Topcoder member who is a member of a team on this challenge but is not the Captain of the team are not permitted, are ineligible for award, may be deleted, and may be grounds for dismissal of the entire team from the challenge. The deadline for forming teams is 11:59pm ET on the 21st day following the start date of each scoring period. Topcoder will prepare a Teaming Agreement for each team that has completed the Topcoder Teaming Form, and distribute it to each member of the team. Teaming Agreements must be electronically signed by each team member to be considered valid. All Teaming Agreements are void, unless electronically signed by all team members by 11:59pm ET of the 28th day following the start date of each scoring period. Any Teaming Agreement received after this period is void. Teaming Agreements may not be changed in any way after signature.
OWNERSHIP AND RIGHTS
You retain any and all rights to ownership of your submissions submitted to Topcoder for this competition. Topcoder and the Department of Energy of the United States of America will not gain ownership of this material. However, by submitting any submission or any other material to Topcoder, you hereby grant a perpetual, royalty-free, irrevocable, non-exclusive right and license to Topcoder and the Department of Energy of the United States of America, to use, reproduce and publish such documents, materials or source code for commercial and/or non-commercial use.
You agree that if Topcoder is unable because of your unavailability, or for any other reason, to secure your signature to apply for or to pursue any application for any United States or foreign patents, mask work, copyright or trademark registrations covering the assignments to Topcoder above, then you hereby irrevocably designate and appoint Topcoder and its duly authorized officers and agents as your agent and attorney in fact, to act for and in your behalf and stead to execute and file any such applications and to do all other lawfully permitted acts to further the prosecution and issuance of patents, copyright, mask work and trademark registrations thereon with the same legal force and effect as if executed by your authorized agent.
Nothing in this Agreement shall be construed as granting you any right or license under any intellectual property right of Topcoder (including any rights Topcoder may have in any patents, copyrights, trademarks, service marks or any trade secrets), by implication, estoppel or otherwise, except as expressly set forth herein.