PINS Explorer: Extracting VI Ionogram Parameters


Challenge Overview

Thanks to everyone for participating in the PINS Explorer challenge. The PINS Master challenge is now open for registration here.

Introduction

Despite starting nearly 100 kilometers above the surface of the Earth, the ionosphere plays an active role in our day-to-day lives by affecting High-Frequency (HF) Radio propagation. International air traffic controllers, oceanographers using surface wave radars, the space launch community, and many others are all affected by the electron density distribution in the ionosphere. The ionosphere lets you hear distant AM radio stations in your car, but it can also affect the quality of long-range air traffic control communications.

Modeling the impacts of the ionosphere on HF Radio can be a significant challenge. Installing and operating ionospheric bottomside sounding systems, called ionosondes, requires a large amount of electricity, human resources, and the construction of an entire infrastructure of high-profile antennas. However, passively receiving a characterized or non-characterized sounder transmission is considerably more convenient. It requires a fraction of the power and resources, and utilizes lower-profile equipment that can be installed temporarily.

The IARPA Passive Ionospheric Non-Characterized Sounding (PINS) Challenge is an open innovation competition that asks Solvers to develop an algorithm that characterizes, monitors, and models ionospheric variation effects on high frequency emissions. The PINS Challenge invites Solvers from around the world to develop innovative solutions that can lead to a greater understanding of the ionosphere and the effects it has on our technology.

Solvers are challenged to characterize the ionosphere with selected digitized radio-frequency (RF) spectrum recordings from sounder receiver data, but not any transmitter data. The PINS Challenge takes place in two stages: Explorer and Master. Part I, the Explorer challenge, specifies the sounder signals to be detected and characterized. Part II, the Master challenge, will add new data sets along the way. This document is the problem specification of Part I, the Explorer challenge.

Task overview

In these challenges your task is to process In-phase/Quadrature voltages (known as I/Q signals, see the References section at the end of this document for more information) measured by a broadband HF antenna that is connected to a set of Software Defined Radios (SDR). The SDRs were time-synchronized with Global Positioning System (GPS) timing. All PINS I/Q recordings were made at several locations in the United States. Within these recordings are many unique signals with a wide variety of modulations and signal strengths, and the recorded signals propagated between sites by ground wave and/or skywave.
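For readers less familiar with I/Q data: the two channels are conventionally interpreted as the real and imaginary parts of a complex baseband signal, s(t) = I(t) + j*Q(t), and the recorded RF signal corresponds (in idealized form) to Re{ s(t) * exp(j*2*pi*f_c*t) }, where f_c is the recorder's center frequency (7 MHz for the full-bandwidth PINS recordings, as noted under Input files).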

The input data set also contains a large volume of synthetic data that presents both simple and complex signal environments for algorithm development, testing, troubleshooting, and, in part, for measuring the performance of your algorithm. The training data sets have a good distribution of 'easy' and 'hard' signal environments; using them, the noise and interference environments can be gradually escalated in order to improve the performance of your algorithm.

In the current challenge (Explorer Challenge), you are asked to calculate ionospheric parameters from samples of I/Q data collected in vertical incidence (VI) measurements. The parameters of interest are presented below in Figure 1.

Figure 1:  VI Ionogram Parameters

Extracting these ionogram parameters first requires you to recover the VI sounder's parameters from the given I/Q data. For linear sweep sounding you will need to determine the sounder's sweep rate and start time. For pulsed soundings the derived parameters also include, among other variables, the number of pulse repeats per frequency and the inter-pulse period. See [1] and [2] for a detailed list of the sounder parameters you will need to work with and for the range and possible values of each sounder parameter.
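As an illustration only (not a required method), the sketch below shows how the linear-sweep timing parameters fit together: a reference chirp built from an assumed start time, start frequency, and sweep rate is mixed against the received I/Q samples, after which a propagation delay of T seconds appears as a beat tone near sweep_rate * T Hz. All numeric values are placeholders; the sample rate and center frequency are those of the full-bandwidth recordings described under Input files.

    import numpy as np

    # Placeholder sounder parameters -- real values come from
    # sounding-params-linear.txt, or must be estimated from the data.
    fs = 10e6           # sample rate of the full-bandwidth recordings, Hz
    f_center = 7e6      # recorder center frequency, Hz
    t_start = 12.3      # sweep start time within the file, s
    f_start = 2e6       # sweep start frequency (absolute RF), Hz
    sweep_rate = 100e3  # sweep rate, Hz per second

    def reference_chirp(n_samples, offset_samples):
        """Conjugate reference chirp at baseband, for dechirping.

        The instantaneous RF frequency of the sweep is
            f(t) = f_start + sweep_rate * (t - t_start),
        so the baseband phase is the integral of f(t) - f_center.
        """
        t = (offset_samples + np.arange(n_samples)) / fs
        tau = t - t_start
        phase = 2 * np.pi * ((f_start - f_center) * tau + 0.5 * sweep_rate * tau ** 2)
        return np.exp(-1j * phase)

    # Usage sketch: multiply a block of received baseband samples `iq`
    # (starting at sample index k0) by the reference chirp; an echo delayed
    # by T seconds becomes a tone near sweep_rate * T Hz in the spectrum.
    # dechirped = iq * reference_chirp(len(iq), k0)
    # spectrum = np.fft.fftshift(np.fft.fft(dechirped * np.hanning(len(iq))))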

For each test case (i.e. a continuous sample of raw HF I/Q data) you are given descriptions of the soundings (e.g. whether it originates from a continuous linear sweep or from a coded pulse sounding). Some test cases contain real (i.e. observed, measured) data, while others contain synthetic data. Both observed and synthetic data present a varying amount of noise and interference.

To reduce the data download and storage requirements, beyond the observed and synthetic full bandwidth data sets the training data also contains narrow-band step recordings (called 'lite' data from now on) such as the one shown in Figure 2. Such recordings will not be used to evaluate your algorithms; they are disclosed in the hope that you will find them useful for training during development. See [1] for details on this type of data.

Figure 2:  Sounder sweep segmented in a series of stepped narrow band recordings

For each test case your algorithm must calculate the following basic ionospheric parameters:

  • h’F2 and foF2 ([8], page 54)

  • Determine presence of E layer and fmaxE ([8], page 53)
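For orientation: on a VI ionogram the virtual height is conventionally derived from the round-trip echo delay under a free-space propagation assumption, h' = c * tau / 2, where tau is the delay between transmission and reception of the vertically incident echo and c is the speed of light. The critical frequencies (foF2, fmaxE) correspond to the highest frequencies at which the respective layer still returns a vertical-incidence echo; see [8] for the formal definitions.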

The quality of your algorithm will be judged by how closely these extracted parameters match those determined by domain experts. See Scoring for details.

Input files

Training and provisional test data can be downloaded from this AWS S3 bucket:

Bucket name: pins-data-public
Access key ID: AKIAT52LNUDUORF65C5T
Secret access key: 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

(If you are unfamiliar with AWS S3 technology then this document can help you get started. Note that the document was prepared for a different contest, but most of its content, like how to set up an AWS account or what tools you can use, is relevant in this challenge as well.)

Note that the total size of training and provisional testing data is huge: ~1328GB. We recommend that before downloading the full data set you familiarize yourself with the data by downloading a small subset, e.g. a single full bandwidth IQ file (~8GB).

The /training folder of the pins-data-public bucket has the following structure:

- list.txt: contains metadata about each full bandwidth IQ data file in CSV format. Each line of list.txt describes one IQ file. The lines are formatted as follows:

file-id,data-type,sounder-type,fmaxE,foF2,h'F2

where

  • file-id is the unique identifier of the test case.

  • data-type is either observed or synthetic.

  • sounder-type is either linear or pulsed.

  • fmaxE is the critical frequency of the E layer measured in MHz. If the E layer is not present it contains the value nan.

  • foF2 is the critical frequency of the F2 layer measured in MHz. In a very small number of cases when the value can not be determined it contains nan.

  • h'F2 is the virtual height of the F2 layer measured in km. In a very small number of cases when the value can not be determined it contains nan.

- <file-id>.bin: raw IQ data. See [1] for details on the format and content of the file. Two important pieces of metadata: 1) each full bandwidth raw IQ file has a sampling rate of 10 MHz; 2) each full bandwidth raw IQ file has a center frequency of 7 MHz, a value inherent to the recorder and what the antenna signals are mixed against. (A minimal reading sketch appears after this folder listing.)

- <file-id>.png: contains the ionogram generated from the corresponding raw IQ data.

- sounding-params-linear.txt and sounding-params-pulsed.txt: contain sounder parameters of the measurements, for the linear sweep and pulsed test cases, respectively. These are two CSV files, containing one line of information for each sample. The format of the line depends on the sounder-type:

  • for linear soundings (sounding-params-linear.txt) the line contains these fields, in this order:

    • file-id,

    • start time: the time when the linear sweep starts, measured in seconds from the start of the file,

    • start frequency: initial frequency of the sounder, measured in Hz,

    • sweep rate: speed of the frequency sweep, in Hz per second,

    • end frequency: final frequency of the sounder, measured in Hz.

 
  • for pulsed soundings (sounding-params-pulsed.txt) the line contains these fields, in this order:

    • file-id,

    • start time: the time when the first pulse starts, measured in seconds from the start of the file,

    • inter pulse period: time difference between the start of two consecutive pulses, measured in seconds,

    • number of pulses per frequency,

    • start frequency: initial frequency of the sounder, measured in Hz,

    • frequency step: frequency step size, in Hz,

    • end frequency: final frequency of the sounder, measured in Hz,

    • polarization: one of {O, O/X},

    • phase shifting enabled: one of {true, false}.

 

- /training/lite folder: contains the narrow band step recorded training data. There are approximately 5000 synthetic samples for both linear and pulsed soundings, packaged in two tar.gz files. (Note that the files are ~50GB each and will expand to ~110GB.) Note that the file name of the narrow band .bin files also contains metadata about the sample: the center frequency and sampling frequency; see [1] for details. The four text files in this folder contain, in a self-explanatory format, the ionogram parameters and sounder parameters used to generate these synthetic samples.
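As a starting point for data exploration, the sketch below parses list.txt and loads a slice of one full-bandwidth .bin file. The assumption that samples are stored as interleaved 16-bit signed I/Q pairs with no header is for illustration only; consult [1] for the authoritative byte layout and scaling.

    import csv
    import numpy as np

    def read_list(path="training/list.txt"):
        """Parse list.txt into a list of dicts (field names as in this spec)."""
        fields = ["file_id", "data_type", "sounder_type", "fmaxE", "foF2", "hF2"]
        rows = []
        with open(path, newline="") as f:
            for rec in csv.reader(f):
                if len(rec) == len(fields):
                    rows.append(dict(zip(fields, rec)))
        return rows

    def read_iq_slice(path, start_sample, n_samples):
        """Load a slice of raw I/Q data as complex64 samples.

        Assumes interleaved 16-bit signed I/Q pairs and no file header
        (an illustrative guess only -- see [1] for the real format).
        """
        bytes_per_pair = 4  # 2 bytes I + 2 bytes Q
        with open(path, "rb") as f:
            f.seek(start_sample * bytes_per_pair)
            raw = np.fromfile(f, dtype=np.int16, count=2 * n_samples)
        return raw.astype(np.float32).view(np.complex64)  # 10 MHz rate, 7 MHz center

    # Example: the first 0.1 s (1,000,000 samples at 10 MHz) of one training file.
    # meta = read_list()
    # iq = read_iq_slice("training/train-000.bin", start_sample=0, n_samples=1_000_000)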

 

The /testing folder has the following structure:

- list.txt: contains one piece of metadata about each full bandwidth IQ data file in CSV format: whether the sample's sounder type is linear or pulsed.

- <file-id>.bin: raw IQ data. Each such file corresponds to one test case; your task is to extract the required ionogram parameters from each of these IQ files.

Output file

The extracted fmaxE, foF2 and h'F2 values must be listed in a single CSV file. This file should contain all the required ionogram parameters corresponding to all raw IQ files in the test set found in the AWS bucket referenced above. The file must be named solution.csv and have the following format:

file-id,fmaxE,foF2,h'F2

where

  • file-id is the unique identifier of the test case,

  • fmaxE is the critical frequency of the E layer measured in MHz. If your algorithm detects that the E layer is not present, then the value nan must be used.

  • foF2 is the critical frequency of the F2 layer measured in MHz,

  • h'F2 is the virtual height of the F2 layer measured in km.

Your solution file may or may not include the above header line. The rest of the lines should specify the extracted parameters, one test case per line.

Sample lines:

test-001,3.1,6.575,247.8
test-002,nan,4.475,262.0
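As a minimal sketch, assuming your per-file results are already held in a dictionary (all names and values below are hypothetical, taken from the sample lines above), the required file can be written like this:

    import csv
    import math

    # Hypothetical results: file-id -> (fmaxE [MHz] or None, foF2 [MHz], h'F2 [km])
    results = {
        "test-001": (3.1, 6.575, 247.8),
        "test-002": (None, 4.475, 262.0),  # no E layer detected
    }

    def fmt(value):
        """Format one parameter, writing 'nan' when it is absent."""
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return "nan"
        return str(value)

    with open("solution.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file-id", "fmaxE", "foF2", "h'F2"])  # header line is optional
        for file_id, (fmaxE, foF2, hF2) in results.items():
            writer.writerow([file_id, fmt(fmaxE), fmt(foF2), fmt(hF2)])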

 

Submission format

This match uses a combination of the "submit data" and "submit code" submission styles. The required format of the submission package is specified in a submission template document. The current document gives only requirements that are additional to, or override, those listed in the template.

  • You must not submit more often than 3 times a day. The submission platform does not enforce this limitation; it is your responsibility to comply with it. Not observing this rule may lead to disqualification.

  • An exception to the above rule: if your submission scores 0, then you may make a new submission after a delay of 1 hour.

  • To speed up the final testing process the contest admins may decide not to build and run the dockerized version of each contestant's submission. It is guaranteed however that if there are N main prizes then at least the top 2*N ranked submissions (based on the provisional leader board at the end of the submission phase) will be final tested.
       

Scoring

During scoring, your solution.csv file (as contained in your submission file during provisional testing, or generated by your docker container during final testing) will be matched against the expected ground truth data using the following algorithm.

If your solution is invalid (e.g. if the tester tool can't successfully parse its content), you will receive a score of 0.

Otherwise your score for a test case is calculated as follows:

raw_score = sc_E + sc_fE + sc_fF2 + sc_hF2, where

  • sc_E = 1000 if you correctly guessed the presence of the E layer, that is, the expected and extracted fmaxE values are either both nan or both different from nan. Otherwise sc_E = 0.

  • sc_fE = 0 if either the expected or extracted fmaxE value is nan. Otherwise sc_fE = max(0, 1000 - diff), where diff is the absolute difference between the expected and extracted fmaxE values in kHz.

  • sc_fF2 = max(0, 1000 - diff), where diff is the absolute difference between the expected and extracted foF2 values in kHz. In a very small number of cases the expected value of foF2 is unknown; in this case sc_fF2 = 1000, regardless of the value you extracted.

  • sc_hF2 = max(0, 1000 - diff), where diff is the absolute difference between the expected and extracted h'F2 values in km. In a very small number of cases the expected value of h'F2 is unknown; in this case sc_hF2 = 1000, regardless of the value you extracted.

Then score = raw_score / max_score, where max_score = 3000 if there is no E layer present, 4000 otherwise.

Finally, your overall score is calculated as 100 * the average of the test case scores.
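For reference, a best-effort reimplementation of the per-test-case rule above is sketched below (the official tester may differ in details; in particular, the sketch assumes diff means an absolute difference):

    import math

    def is_nan(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    def test_case_score(expected, extracted):
        """Score one test case as described above.

        Both arguments are (fmaxE_MHz, foF2_MHz, hF2_km) tuples; use nan
        (or None) for an absent E layer or an unknown expected value.
        """
        exp_fE, exp_fF2, exp_hF2 = expected
        ext_fE, ext_fF2, ext_hF2 = extracted

        # sc_E: E-layer presence guessed correctly (both nan or both non-nan).
        sc_E = 1000 if is_nan(exp_fE) == is_nan(ext_fE) else 0

        # sc_fE: fmaxE accuracy, with diff in kHz (values are given in MHz).
        if is_nan(exp_fE) or is_nan(ext_fE):
            sc_fE = 0
        else:
            sc_fE = max(0, 1000 - abs(exp_fE - ext_fE) * 1000)

        # sc_fF2: foF2 accuracy in kHz; an unknown expected value scores 1000.
        sc_fF2 = 1000 if is_nan(exp_fF2) else max(0, 1000 - abs(exp_fF2 - ext_fF2) * 1000)

        # sc_hF2: h'F2 accuracy in km; an unknown expected value scores 1000.
        sc_hF2 = 1000 if is_nan(exp_hF2) else max(0, 1000 - abs(exp_hF2 - ext_hF2))

        raw_score = sc_E + sc_fE + sc_fF2 + sc_hF2
        max_score = 3000 if is_nan(exp_fE) else 4000
        return raw_score / max_score

    # The overall score is 100 * the average of test_case_score over all test cases.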

Note that there is a minimum quality requirement for prize winning solutions which is not directly related to the score calculated as described above. See the "Final prizes" section later.

Final testing

The details of the final testing workflow and the requirements for the /code folder of your submission are also specified in the submission template document. The current document gives only requirements or pieces of information that are additional to, or override, those given in the template. You may ignore this section until you start to prepare your system for final testing.

  • The allowed time limit for the train.sh script is not specified at the launch date of the contest. It will be determined after the end of the submission phase, based on discussions with the top few ranked solvers.

  • The training data within your docker container will look like this:
        data/
          training/
               list.txt
               sounding-params-linear.txt
               sounding-params-pulsed.txt
               train-000.bin
               train-000.png
               ... other .bin and .png files

  • The allowed time limit for the test.sh script is 24 hours. The testing data folder contains data in the same structure as is available to you during the coding phase. The final testing data will be similar in size and content to the provisional testing data.

  • Testing data within your docker container will look like this:
        data/
          testing/
               list.txt

               test-000.bin
               test-001.bin
               ... other .bin files

  • Hardware specification. Your docker image will be built and run on a Linux AWS instance with this configuration: m4.xlarge.  Please see here for the details of this instance type.

General Notes

  • This match is not rated.

  • Teaming is allowed. Topcoder members are permitted to form teams for this competition. If you want to compete as a team, please complete the teaming form. After forming a team, Topcoder members of the same team are permitted to collaborate with other members of their team. To form a team, a Topcoder member may recruit other Topcoder members, and register the team by completing this Topcoder Teaming Form. Each team must declare a Captain. All participants in a team must be registered Topcoder members in good standing. All participants in a team must individually register for this Competition and accept its Terms and Conditions prior to joining the team. Team Captains must apportion prize distribution percentages for each teammate on the Teaming Form. The sum of all prize portions must equal 100%. The minimum permitted size of a team is 1 member, and the maximum permitted team size is 5 members. Only team Captains may submit a solution to the Competition. Notwithstanding Topcoder rules and conditions to the contrary, solutions submitted by any Topcoder member who is a member of a team on this challenge but is not the Captain of the team are not permitted, are ineligible for award, may be deleted, and may be grounds for dismissal of the entire team from the challenge. The deadline for forming teams is 11:59pm ET on the 21st day following the start date of each scoring period. Topcoder will prepare a Teaming Agreement for each team that has completed the Topcoder Teaming Form, and distribute it to each member of the team. Teaming Agreements must be electronically signed by each team member to be considered valid. All Teaming Agreements are void, unless electronically signed by all team members by 11:59pm ET of the 28th day following the start date of each scoring period. Any Teaming Agreement received after this period is void. Teaming Agreements may not be changed in any way after signature.

  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.

  • In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may also use open source languages and libraries, with the restrictions listed in the next section below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see “Requirements to Win a Prize” section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.    

  • You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client. If your solution includes licensed elements (software, data, programming language, etc) make sure that all such elements are covered by licenses that explicitly allow commercial use.

  • If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission. Include your licenses in a folder labeled “Licenses”. Within the same folder, include a text file labeled “README” that explains the purpose of each licensed software package as it is used in your solution.    

  • External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:

    • The external data and pre-trained models are unencumbered with legal restrictions that conflict with its use in the competition.

    • The data source or data used to train the pre-trained models is defined in the submission description.

Final prizes

In order to receive a final prize, you must do all the following:

  • Achieve a score in the top five according to final system test results. See the "Scoring" and "Final testing" sections above.

  • Satisfy the following minimum quality requirements:

    • Averaged over the final test cases, your extracted h'F2 and foF2 values must be within 20% of the expected values.

    • Averaged over the final test cases where the E layer is present, your extracted fmaxE values must be within 20% of the expected values. For cases where you do not detect the presence of the E layer, the sample average fmaxE value will be used in this calculation.

  • Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind it and the steps of their approach. You will receive a template that helps you create your final report.

  • If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

PINS Challenge References

[1] Dao, Eugene (2019, March): Data Description and Target Sounder Signals

[2] Dao, Eugene (2019, March): HF Sounder Signal Processing

[3] https://nvlpubs.nist.gov/nistpubs/Legacy/MONO/nbsmonograph80.pdf

[4] http://www.cnofs.org/Handbook_of_Geophysics_1985/Chptr10.pdf

[5] https://www.ngdc.noaa.gov/stp/iono/Dynasonde/

[6] https://github.com/MITHaystack

[7] https://www.iarpa.gov/index.php/research-programs/hfgeo

Additional References

Solvers can be successful solely using the preceding references, but for those seeking more in-depth knowledge of RF signal propagation and ionospheric characterization the following references may be of interest.

[8] Leo F. McNamara (1991) The Ionosphere: Communications, Surveillance, and Direction Finding (Orbit: A Foundation Series), Krieger Publishing Company, Malabar FL

[9] Goodman, J. M. (1991), HF Communications: Science and Technology, Van Nostrand Reinhold, New York, NY

Additional Rules and Conditions

There are a number of additional rules and conditions associated with this competition.  Please review the PINS Challenge Rules Document for supplementary information about Payment Terms, Intellectual Property Agreements, Eligibility Requirements, Warranties, and Limitations of Liability.