# USSOCOM HF Geolocation Algorithm Challenge

## Key Information

Register
Submit
The challenge is finished.

## Challenge Overview

PRIZE DISTRIBUTION

1st:     \$11,000.00
2nd:    \$7,000.00
3rd:     \$5,000.00
4th:    \$2,500.00
5th:    \$2,000.00

SUMMARY

A flying object receives signal from a radio static emitter located on the ground. We are looking for solutions that, based on this signal, can compute the location of the emitter.

BACKGROUND

When the position of the receiver changes, the observed frequency on the receiver side does not match the actual frequency of the signal. This is well known as the Doppler effect. The relationship between the observed frequency by the receiver, freceived, and the frequency of the original signal, femitted, is expressed by the formula

Here, v denotes the relative speed of how fast the receiver and the emitter are approaching each other and c is the velocity of the signal.

In this competition, the receiver is an aircraft, the emitter is static on the ground, and the signal is high frequency radio. Hence, c is the speed of light and v is the ground speed vG of the aircraft multiplied by the cosine of the angle λ which is formed by the two rays in the 3D space: one heading from the aircraft to the emitter, the other heading in the direction of the current aircraft movement (see figure).

When the frequency femitted is known to the receiver, the only unknown in the equation

is the angle λ. Therefore it should be possible to calculate the value of λ at any given time during the flight and to estimate the emitter’s position from several values of λ calculated during a short period of time.

However, we need to take into account the following:

• The value of femitted is known, but the receiver cannot be tuned to the exact frequency (it may be off by ~100Hz or even more, due to the resolution of the tuner). Therefore, femitted must be considered as unknown variable (more precisely, it is known only to a certain precision, and the uncertainty is bigger than the expected Doppler shift).

• Some of the emitters may not be up all the time. They may be few seconds on - few seconds off. The algorithm might need logic to throw out data during the periods when the emitter is off.

• The value of freceived is not given explicitly. We need to determine/estimate this value from I/Q data (see below). Depending on the signal properties, this may be difficult, especially in cases when the signal bandwidth is too wide.

• We only have the information about the aircraft’s height above ground level (AGL). The information about the emitter’s metres above sea level (MASL) is not known, nor any information about the surface shape. (But the surface is supposed to be flat enough, so that we can assume the MASL of the emitter is very similar to the MASL of the ground directly under the aircraft.)

• If the aircraft is moving along a line (without changing the course), then using the change in Doppler shift one cannot in principle (because of the symmetry) deduce in which half-space (left or right) the emitter is located.

• Apart from Doppler shift, one may consider to use also the signal strength, which should increase with decreasing distance. The change of the signal strength may be useful information.

EXAMPLE

The following animation shows the flight path of the aircraft, emitter location and evolution of the frequency spectrum. It corresponds to the files “3.log” and “3.raw” from the training set (see download links in the dedicated section “Data and Additional Resources”). Below is the last frame of the video.

Trajectory of the aircraft is plotted on the left side (blue curve). Green circle denotes the aircraft, red color is the emitter location. (The coordinate system is selected randomly.)

On the right side, you can observe the frequency spectre evolution. It was calculated by applying the fast Fourier transform to the 1 second long chunks of I/Q data (note that taking different chunk sizes may lead to better results, this is just an example). Only a specific interval of frequencies close to the detected peak is displayed (horizontal axis). The peak is located at around 14400 Hz. Since the signal from the emitter is sampled at the rate of 16000 samples per second, all the obtained frequency values should be considered modulo 16000 Hz. Hence, 14400 Hz corresponds to −1600 Hz. The displayed values (vertical axis) are the logarithms of the absolute values of the numbers from the sequence obtained by the fast Fourier transform.

For this case, the receiver center frequency is 134,476,574 Hz (see “3.log”). The source frequency is 134,475,360 Hz (see “emitters.csv”), it is smaller by 1214 Hz. Therefore, the frequency peak is expected to be oscillating around −1214 Hz, depending on Doppler effect. However, because of the issue with the resolution of the tuner discussed above, the peak is located even more to the left (off by another approx. 400 Hz). The vertical dotted line represents the expected position of the peak calculated according to the Doppler effect theory and shifted by −390 Hz. It is visible in the animation that the peak follows the red line.

OBJECTIVE

For each test case, you will receive in 5 second chunks of the signal from the static emitter in the form of I/Q data for a specific period of time and the corresponding GPS log of the flying object. Your task is to return the estimated location of the emitter after each data chunk, gradually improving the location estimation with each new chunk. You are requested to return also a flag “solved” or “unsolved”, based on your confidence that the estimated location is within the radius of 2 km from the ground truth location. The distance between the aircraft and the emitter is guaranteed to be less than 150 km.

INPUT

At the beginning, your algorithm will receive the value of femitted as an integer value (in Hz).

• chunk of I/Q data from a period of 5 seconds,

• flight telemetry corresponding to the same period.

In this contest, the signal from the emitter is sampled at the rate of 16,000 samples per second and is represented in the form of I/Q data. Each sample of I/Q data is a complex number represented as a pair of 16-bit signed integers. You will receive the signal in chunks of 5 × 16,000 = 80,000 samples in form of a raw binary file of size 2 × 2 × 80,000 = 320,000 bytes. Each block of size 4 bytes represents one sample: the first 2 bytes (16 bits) represent the real part of the sample, the next 2 bytes represent the imaginary part.

You will also receive the log file corresponding to the raw file. It is a text file containing 28 × 10 × 5 = 1,400 lines. Each block of 28 lines corresponds to 0.1 seconds of I/Q data. The most important is the 17th line containing the text "Center Frequency: [value]".This is the value to which the receiver frequency is set. It is constant for each test case, so you only need to parse it at the beginning. This value is close to femitted. (If possible, it would be set exactly to femitted.) After you transform the I/Q signal into frequency spectrum (the standard way to do it is to apply fast Fourier transform), the frequency peak should be located at a frequency equal to the difference between femitted and the receiver frequency, up to corrections due to the Doppler shift and due to the inaccuracy described above in the “Background” section. You can explore also the other values in the log files (see the data released for training) and use them if you find it useful. E.g., the 14th, 15th and 16th line contain information about the time at which the data was recorded.

The information about the flight path will be given to your algorithm via a comma-delimited text file with the header line and with the values in the same format as in the file released for training. Each line corresponds to a single flight path position. The data was recorded roughly 4 times per second, but some values was updated only once per second (that is, at a specific time, the value is the same in all lines). The description of the columns follows:

• DTG(OCU): Date-time group. String "30162244Z Jan19" should be interpreted as January 30th, 2019, 16:22:44.

• source: This is always "GPS". You can ignore this field.

• status: This is always "A". You can ignore this field.

• time(nav-source): Time. String "162244.00" should be interpreted as 16:22:44.

• latitude(dms): Latitude. String "N030:40:50.1234" represents 30° 40′ 50.1234′′ northern hemisphere.

• longitude(dms): Longitude. String "W085:40:50.1234" represents 85° 40′ 50.1234′′ western hemisphere.

• mgrs: Originally refers to Military Grid Reference System, but in this contest, this field is irrelevant - use the previous two columns for location identification.

• msl(m): Metres above mean sea level.

• agl(m): Metres above ground level.

• speed(kts): Aircraft speed in knots.

• pitch(deg): Pitch in degrees.

• roll(deg): Roll in degrees.

Note that there are two different columns with time (the 1st and the 4th). They may differ by a few seconds. For syncing with I/Q data, you should use "time(nav-source)".

OUTPUT

After each chunk (before receiving the next chunk), your algorithm should output the estimated location of the emitter (latitude and longitude) and the flag "solved" or "unsolved".

SCORING

There will N test cases generated from the data, numbered i = 1, 2, ..., N. The test case #i will cover Ti seconds of I/Q data, which will be split into chunks of length 5 seconds. Ti will vary in the range from 15 sec. to 500 sec.

To each test case #i, the score will be assigned following these rules:

• If the algorithm reports "unsolved" at the end, the score will be 0.

• If the algorithm reports "solved" at the end and the location prediction reported after the last chunk of data is within 2 km of the correct location, denote by tmin the minimum time in seconds such that the location prediction was within 2 km of the correct location for every time t >= tmin. The score will be

1 − (tmin / Ti).

• If the algorithm reports "solved" at the end, but the location prediction reported after the last chunk of data is NOT within 2 km of the correct location, the score will be

0.1 × (2 / error − 1),

where error is the distance (in km) between the predicted and the correct location.

We will measure the time your algorithm consumes. Initially, we set the variable total_lag to 0 seconds. Then we will measure the time your algorithm spends, separately for each chunk. Whenever the time spent on a chunk – t_chunk – is more than 5 seconds, we increase the variable total_lag by (t_chunk − 5) seconds. The final value of total_lag must be less than 15 seconds, otherwise you receive for the test case the score of the worst solution, that is, −0.1.

Note that for the scoring it is irrelevant which flag ("solved" or "unsolved") your solution reports except for the one reported after the last chunk. However, in advance you do not know which chunk is the last.

The distance between your predicted location and the correct location will be calculated using haversine formula (with Earth radius 6,371 km).

The total score will be the average of single test case scores. The higher the score, the better. For leaderboard purposes, this total score will be linearly transformed into range [0, 100], where 0 represents the worst solution (which always reports "solved" with error = infinity) and 100 represents the perfect solution (which always reports "solved" and always predicts within 2 km of the correct location).

N will be 20 for provisional and at least 100 for final tests.

IMPLEMENTATION DETAILS

The allowed programming languages are Java and C++ only. You will submit a dockerized version of your algorithm. The scoring system will build a Docker container out of it, and will run it for each test case and communicate with it via files in the shared folder "/workdir/chunks/". The communications works this way:

• Receiving a string means that you should regularly check for the existence of the file "/workdir/chunks/semaphore_in[#R]", where [#R] is the number initially set to 0 and is increased by 1 with each received string. After you make sure that this file exists, you can access the file "/workdir/chunks/message_in[#R]", which will contain a line with the received string.
• Sending a string means that you should (in this specific order):
• Write a line with the string into the file "/workdir/chunks/message_out[#S]", where [#S] is the number initially set to 0 and is increased by 1 with each sent string.

• Create the file "/workdir/chunks/semaphore_out[#S]" (it can be empty).

The provided dockerized examples (link below) contain code with methods send_line() and receive_line(). You can use and adjust it as needed.

The following demonstrates the required workflow of your algorithm:

1. Receive a string with three comma delimited items:

• The first item is an integer (the value of femitter in Hz).
• The second item is a floating point number (the actual value of total_lag; see the scoring section).

• The third item is a string denoted by STR.

2. If STR == "end", send any string and exit (end of the test case). Otherwise:
• You can access files STR.raw (binary I/Q data), STR.log (I/Q data info) and STR.csv (flight path). The string will include also the path. (Example: When STR == "/workdir/chunks/chunk0", the three files are "/workdir/chunks/chunk0.raw", "/workdir/chunks/chunk0.log" and "/workdir/chunks/chunk0.csv").

• Calculate the estimated location of the emitter.

• Send a string with three comma delimited items:

• The first item is a floating point number (predicted latitude).

• The second item is a floating point number (predicted longitude).

• The third item is the string "solved" or "unsolved".

• Go to step 1.

Note that the format you should use for latitude and longitude is different from the format used in the input file with the flight path. E.g., N030:30:00.0000 should be returned as 30.5 and W085:20:00.0000 should be returned as -85.333333 (negative numbers for western hemisphere).

For a given test case, the value of femitter (the first item of the received string) will be always the same.

You should submit ZIP file which contains Dockerfile and everything else needed to build the Docker image and create a container out of it. When we start the container, your solution should start execution immediately. These are the steps which will be performed by the testing harness:

• Extract ZIP file.

• Build Docker image based on the submitted Dockerfile.

• Create a container based on the image.

• For each test case, start the container, communicate with it as above, stop the container.

• Destroy the container and the image.

You are not allowed to share any information between different test cases in the container. You can use the folder "/workdir/chunks/" only for the communication with the tester. We may change the flow so that we will create a separate container for each test case.

The container will be created on a Linux virtual machine. Your algorithm can use 4 cores and at most 1GB of RAM.

We provide dockerized examples (link below) of the solution which always returns "unsolved", both for Java and C++. We encourage you to use these examples as a starting point to build your solution. These example solutions are created so that they can be used online as well as locally on your machine without Docker, with the provided tester (link below).

The data for this contest was collected by a real aircraft, using real emitters. There are various kinds of emitters with different signal quality. For some, the frequency peak is clearly visible, in other cases it is more difficult to identify it. We are aware that it is hard to solve the problem for some of the emitters, while there are emitters for which the location can be reasonably estimated in a reasonable time. We encourage you to explore the training set to get familiar with various kinds of signals.

• navData.csv - Flight path data from which the training set test cases can be generated.

• IQ.zip - I/Q data (raw files and log files) from which the training set test cases can be generated.

• emitters.csv - Frequencies of the emitters and ground truth locations used in training set.

• tester_v2.zip - Tool for generating training set test cases and offline testing tool (added visualization mode).

• solutionExample.zip - Dockerized examples of a solution which always returns “unsolved”, both C++ and Java version.
• IQExample.zip - The file “3.raw” converted to human-readable format. It is a sequence of 35,184,000 complex numbers (2199 seconds, 16000 samples per second).

• frequenciesExample.zip - Evolution of the frequency spectrum for the files “3.log” and “3.raw” from the training set (used also in the example video above); it was calculated applying the fast Fourier transform to 1-second long chunks (16000 samples) of I/Q data. It is a 16000 × 2199 matrix. If we denote by IQt the vector of 16000 complex samples sampled at time t seconds from the start, then the t-th column of the matrix is log(abs(fft(IQt))), where fft() is the fast Fourier transform and log(abs()) is applied to each coordinate of the transformed vector.

TEST CASE GENERATION

The data is split into training, provisional and final set. There is no time overlap between these sets. You can use training set for testing your solution locally. During the submission phase, the leaderboard will show your score calculated on the test cases generated from the provisional set. The final ranking (published after the submission phase ends) will be based on the test cases generated from the final set.

When a test case is being generated for a given set (training, provisional or final), the following is performed:

1. We randomly select one line from the list in emitters.csv (for each set, this list is different).

2. We randomly (uniformly) select length of the test case from the interval [15 sec., 500 sec.]

3. If the length of the interval covered by the line selected in Step 1 is at least the length selected in Step 2, we randomly select sub-interval of that length. Otherwise we go to Step 2.

REQUIREMENTS TO WIN A PRIZE

Only your final submission will be considered for final scoring.

Note that not all the requirements will be checked automatically by the testing harness (e. g., we will not check that your container uses only C++ or Java). We may need to check it only for the top 5 solutions. In order to receive a prize, your solution must fulfil the requirements described in “Implementation details” section and you must do the following:

• Achieve a score in the top 5, according to final test results calculated using the final set.

• Your score must be higher than the score of a solution which always return "unsolved".

• Provide a full source code of your solution, and any related assets.

• Java 8/7/6 compatible libraries. Java 9 and newer libraries are NOT supported.

• C++ Visual Studio 2008 or 2015 compliant libraries. SDK/API should avoid STL (Standard Template Library) data types in headers due to potential version mismatch.

• LGPL license dependencies must be clearly listed. GPL licenses require legal approval.

• Within 7 days from the announcement of the contest winners, submit a complete 2-page (minimum) report that: (1) outlines your final algorithm and (2) explains the logic behind and steps of your approach.

If you place in the top 5, but fail to do all of the above, then you will not receive a prize, which will be awarded to the contestant with the next best performance who completed the submission requirements above.

## Reliability Rating and Bonus

For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.