SpaceNet 5 Challenge

Key Information

The challenge is finished.

Challenge Overview

The SpaceNet Challenge, round 5


Prize Distribution
Prize                 USD
1st               $20,000
2nd               $10,000
3rd                $7,500
4th                $5,000
5th                $2,500
Top Graduate       $2,500
Top Undergraduate  $2,500
Total Prizes      $50,000


Challenge Overview

Determining optimal routing paths in near real-time is at the heart of many humanitarian, civil, military, and commercial challenges. This statement is as true today as it was two years ago when the SpaceNet Partners announced SpaceNet Challenge 3, focused on road network detection and routing (Challenge Results). In a disaster response scenario, for example, pre-existing foundational maps are often rendered useless by debris, flooding, or other obstructions. Satellite or aerial imagery often provides the first large-scale data available in such scenarios, making it an attractive source for mapping. This challenge seeks to build upon the advances of SpaceNet 3 by challenging competitors to automatically extract road networks and routing information from satellite imagery, along with travel time estimates along all roadways, thereby permitting true optimal routing.

Your task will be to output a detailed graph structure with edges corresponding to roadways and nodes corresponding to intersections and end points, with estimates for route travel times on all detected edges.

SpaceNet will be open sourcing new data sets for the following cities: Moscow, Russia; Mumbai, India; and San Juan, Puerto Rico. For the first time in SpaceNet history, the final submissions will be tested on a mystery city data set that will be revealed and open sourced at the end of the Challenge!

Building on the results of SpaceNet 3, this challenge will use a modified version of the Average Path Length Similarity (APLS) metric that is tuned to compare travel times between nodes of interest. CosmiQ Works’ blog, The DownLinQ, provides additional information, including an APLS metric overview and a detailed description of SpaceNet 5 motivation and structure.

This problem specification largely overlaps with that of SpaceNet Challenge 3; however, there are important differences that contestants who participated in SpaceNet 3 should note carefully. The most important difference is that you must now report estimated travel times for the extracted road network, and scoring is based on travel time instead of edge length.
Important note: Scoring is not enabled at contest launch time. You can access the training and testing data and all contest resources, such as the visualizer tool, but you cannot make submissions yet. Please watch the contest forum for updates.

Input files

Satellite images

Four types of images are available for the target areas. (Note: these image types were named differently in SpaceNet 3.)

 
  1. PAN: panchromatic (single channel, 16-bit grayscale, ~30 cm resolution)

  2. MS: 8-band multispectral (8 channels, 8*16-bit, ~1.2 m resolution).

  3. PS-RGB: pan-sharpened version of Red-Green-Blue bands from the multispectral product, with dynamic range adjustment (DRA) applied (3 channels, 3*8-bit, ~30 cm resolution). This is formed by using the PAN image to interpolate 3 bands of the MS dataset to increase the resolution of the Red, Green and Blue bands.

  4. PS-MS: pan-sharpened version of MS (8 channels, 8*16-bit, ~30 cm resolution)

The images were collected by the DigitalGlobe WorldView-3 satellite over Moscow, Mumbai, San Juan, and a fourth (undisclosed) city. The training data contains more than 2300 image chips, each covering approximately 400m x 400m on the ground. WorldView-3 is sensitive to light across a wide range of wavelengths; see the specification of the frequency bands here. This extended range of spectral bands allows WorldView-3 imagery to be used to classify the material being imaged.

Images are provided in GeoTiff format. Since most image viewer applications are not able to display 16-bit images and 8-band images, we provide a visualizer tool that you can use to look at the data. See the Visualizer description for details.
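
If you prefer to inspect the data programmatically rather than through the visualizer, the GeoTiffs can be read with standard geospatial libraries. Below is a minimal sketch, assuming rasterio and numpy are installed (the file name is hypothetical):

    # Sketch: load a 16-bit multi-band GeoTIFF and stretch one band to 8-bit
    # for display. The file name below is hypothetical.
    import numpy as np
    import rasterio

    with rasterio.open("SN5_roads_train_AOI_7_Moscow_PS-MS_chip0.tif") as src:
        img = src.read()   # shape: (bands, height, width), dtype uint16
        print(src.count, src.width, src.height, src.dtypes[0])

    # Percentile stretch so standard 8-bit viewers can display the band.
    band = img[0].astype(np.float32)
    lo, hi = np.percentile(band, (2, 98))
    band8 = np.clip((band - lo) / max(hi - lo, 1e-6) * 255, 0, 255).astype(np.uint8)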

Road networks

The location, shape and estimated travel time of known roads are referred to as ‘ground truth’ in this document. These data are described in CSV files using the following format:

ImageId,WKT_Pix,length_m,travel_time_s

img1,"LINESTRING (1159.4 1006.1, 940 1011.9, 661.7 1002, 365.8 980.8)",41.492,3.094

img1,"LINESTRING (661.7 1002, 670.5 1058.5)",96.208,7.174

img999,LINESTRING EMPTY,0,0

 
  • ImageId is a (case sensitive) string that uniquely identifies the image.

  • WKT_Pix specifies the points and edges that represent a road network in Well Known Text format. Only the LINESTRING object type of the WKT standard is supported. The coordinate values represent pixels; the origin of the coordinate system is at the top left corner of the image, the first coordinate is the x value (positive is right), and the second coordinate is the y value (positive is down). The coordinates refer to the PAN images. Note that because of the different resolutions, the coordinates of the same object on the MS images are different.

  • length_m specifies the length of this road segment in meters. This value is ignored if WKT_Pix is LINESTRING EMPTY.

  • travel_time_s specifies the estimated travel time for this road segment in seconds. This value is ignored if WKT_Pix is LINESTRING EMPTY.
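
For reference, these CSV files can be parsed with standard libraries. A minimal sketch, assuming pandas and shapely are available (the file name is hypothetical):

    # Sketch: parse a ground truth CSV and read the LINESTRING geometries.
    import pandas as pd
    from shapely import wkt

    df = pd.read_csv("train_AOI_7_Moscow_geojson_roads_speed_wkt_weighted.csv")
    for _, row in df.iterrows():
        geom = wkt.loads(row["WKT_Pix"])   # a LineString, possibly empty
        if geom.is_empty:
            continue                       # LINESTRING EMPTY: no roads on this image
        coords = list(geom.coords)         # [(x, y), ...] in PAN pixel coordinates
        print(row["ImageId"], len(coords), row["travel_time_s"])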

 

Notes on road network definition

  • A LineString is a sequence of points given as {X Y} tuples. Each LineString contains at least 2 points with the exception of the special LINESTRING EMPTY construct, which is used for images showing no roads at all. Each consecutive pair of points in a LineString represents a straight road section between two points.

  • The points that represent road junctions are explicitly listed as elements of a LineString, and they occur more than once, either in multiple LineStrings or within the same LineString. An example is the point (661.7 1002) in the sample above, which is present in both LineStrings for img1. If a LineString crosses itself, or two LineStrings cross each other, without the location of the crossing being explicitly listed as a point, then this does NOT define a junction; it represents one road running above the other.

  • In this challenge all roads are undirected.

  • There are multiple ways of describing the same road network. For example this network
      A---B---C
      |   |   |
      D---E---F
    can be described with a single LineString (showing node labels instead of coordinates for better readability):
      LINESTRING (B, C, F, E, B, A, D, E)
    or equivalently by a set of two linestrings:
      LINESTRING (A, B, E, D, A)
      LINESTRING (B, C, F, E)
    or in several other ways.
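
Because junctions are defined purely by repeated points, a graph representation can be built directly by using the coordinate tuples as node keys. A minimal sketch, assuming networkx and the shapely parsing shown earlier (not an official contest tool):

    # Sketch: build an undirected graph from parsed LineStrings. Repeated
    # (x, y) tuples merge into a single node, which is exactly how junctions
    # are defined above; crossings without a shared listed point stay separate.
    import networkx as nx

    def add_linestrings(graph, linestrings):
        for ls in linestrings:
            pts = list(ls.coords)
            for a, b in zip(pts, pts[1:]):
                graph.add_edge(a, b)   # nodes are (x, y) tuples

    g = nx.Graph()
    # add_linestrings(g, parsed_linestrings)   # parsed_linestrings: hypothetical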

Notes on data 
  • Many tiles contain regions for which no image and no road network data are available. Such regions are shown in black in the visualizer tool. There are even tiles where the ratio of such black regions is close to 100%. Your algorithm should handle such cases of missing data.

  • The ground truth data was created manually and is of high quality. Nevertheless, as in all real life problems, it may contain errors. Also, you may annotate roads differently than the annotators did.

Downloads

Input files are available for download from the spacenet-dataset AWS bucket. A separate guide is available that details the process of obtaining the data. See also this page for a description of the SpaceNet AWS data and download instructions.

Note that the same bucket holds data for the previous SpaceNet challenges as well; you need only a subset of the bucket content now: download only the files in the spacenet/SN5_roads/tarballs directory (~69 GB compressed).

Within this directory:

  • SN5_roads_train_AOI_<n>_<city>.tar.gz is the training set that belongs to a given city. It contains imagery in the 4 formats described above and also contains the road networks in a CSV file; see train_AOI_<n>_<city>_geojson_roads_speed_wkt_weighted.csv.

  • SN5_roads_test_public_AOI_<n>_<city>.tar.gz is the testing set that belongs to a given city. It contains imagery but does not contain the road network.

Note that individual files or smaller subsets are available from the spacenet/SN5_roads/train/ or spacenet/SN5_roads/test_public/ folders. Use these if you want to get familiar with the data without having to download the full sets.

Notes on image IDs and image names
  • The format of an imageId is
    AOI_<n>_<city>_chip<i>

  • The <n>_<city> part can take one of the following values: 7_Moscow, 8_Mumbai, 9_San_Juan, 10_XXX. (XXX will be replaced later by a real city name.)

  • <i> is a 0-based integer that is unique for an image within a set that belongs to a city.

  • All imageIds are case sensitive.

 
  • The format of an image name is
    <prefix>_AOI_<n>_<city>_<type>_chip<i>.tif

  • The <n>_<city> and <i> parts are as defined above for image IDs.

  • <type> is one of: MS, PAN, PS-MS, PS-RGB. Images of different types are organized into folders named after the 4 values listed here.

  • <prefix> is a string that is the same for images belonging to the same test set, an example is SN5_roads_test_public. Note that this prefix is NOT part of the image ID.

 

Output file

Your output must be a CSV file in a format identical to that of the road network definition files described previously.

ImageId,WKT_Pix,length_m,travel_time_s

Your output file may or may not include the above header line. The rest of the lines should specify the road networks your algorithm extracted. 

The required fields are:
  • ImageId is a (case sensitive) string that uniquely identifies the image. 

  • WKT_Pix specifies the points and edges that represent the road network you found. The format is exactly the same as given above in the Input files section. Note that the coordinates must be given in the scale of the PAN images: if you find a road junction at (40, 20) on the PAN image and at (10, 5) on the corresponding MS image, then your output file should list the (40 20) coordinate pair in one or more of the shape definition LineStrings.

  • The length_m value is not used in scoring; it is given only so that the input and output files have the same format. You can use 0 or any other value.

  • travel_time_s specifies the estimated travel time for this road segment in seconds.
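
As a sanity check on the format, a solution file can be written with the csv module from the Python standard library, which quotes the comma-containing LINESTRING fields automatically. A minimal sketch with illustrative values only:

    # Sketch: write solution.csv rows. Coordinates must be in PAN pixel
    # scale; length_m is ignored by the scorer, so 0 is fine.
    import csv

    rows = [
        ("img1", "LINESTRING (40.0 20.0, 95.5 31.2)", 0, 4.2),   # illustrative
        ("img2", "LINESTRING EMPTY", 0, 0),                      # no roads found
    ]
    with open("solution.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["ImageId", "WKT_Pix", "length_m", "travel_time_s"])
        w.writerows(rows)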

 

Constraints

  • A single file must contain road network definitions for ALL images in the test set.

  • The file may (and typically does) contain multiple lines for the same ImageId.

  • If you found no roads on an image then you must use the LINESTRING EMPTY construct. In this case your file must not contain other lines for the same ImageId. The values for length_m and travel_time_s are ignored in this case.

  • The same road section (defined as consecutive points in a LineString) must not be listed more than once for the same image. That is, points P1 and P2 must appear next to each other (in any order) in the LineStrings only once for the same image.

  • All road sections must have non-zero length, i.e. consecutive points in the LineStrings must be different.
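
A quick self-check for the last two constraints might look like the following sketch (this is not the official validator; it covers only duplicate and zero-length segments):

    # Sketch: check the duplicate-segment and zero-length constraints for
    # one image. A segment is a consecutive point pair; order within the
    # pair does not matter.
    def check_segments(linestrings):
        seen = set()
        for pts in linestrings:              # each entry: [(x, y), ...]
            for a, b in zip(pts, pts[1:]):
                if a == b:
                    raise ValueError(f"zero-length segment at {a}")
                key = frozenset((a, b))
                if key in seen:
                    raise ValueError(f"duplicate segment {a}-{b}")
                seen.add(key)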

 

Submission format

This match uses a combination of the "submit data" and "submit code" submission styles. The required format of the submission package is specified in a submission template document. This document gives only requirements that are additional to, or override, the requirements listed in the template.

  • You must not submit more often than 3 times a day. The submission platform does not enforce this limitation; it is your responsibility to comply with it. Not observing this rule may lead to disqualification.

  • An exception to the above rule: if your submission scores 0, then you may make a new submission after a delay of 1 hour.

  • The /solution folder of the submission package must contain the solution.csv file, which should be formatted as specified above in the Output file section and must list all extracted roads from all images in the test set. 

Scoring

During scoring, your solution.csv file (as contained in your submission file during provisional testing, or generated by your docker container during final testing) will be matched against the expected ground truth data using the following algorithm.

If your solution is invalid (e.g. if the tester tool can't successfully parse its content), you will receive a score of 0.

If your submission is valid, your solution will be scored using the APLS algorithm. The main steps of the algorithm are described below, but there are many finer details that are not given here. For a definitive and detailed algorithm, see the source code of the visualizer tool or the APLS GitHub repository.

For each test tile a tile-level route difference score is calculated as follows.
  1. Let T be the ground truth graph representing the road network on the tile, and P be your predicted graph for the same tile. These graphs are created from the LineString-based representations by treating each point present in the LineString as a graph node and treating each line segment (consecutive pair of points in the LineString) as an edge between the two nodes. Note that these graphs differ from the graph concept used in mathematics because a) the nodes represent physical locations and b) the edges represent more than the presence of a connection between nodes: they have shape (a sequence of straight line segments).

  2. If both graphs are empty (contain no edges) then the score is 1 (the best possible tile-level score). If only one of the graphs is empty then the score is 0 (the worst possible tile-level score). Otherwise all the following steps are executed.

  3. Both graphs are simplified and smoothed.

    1. Simplification means that nodes with 2 neighbours are removed. If node B is connected only to nodes A and C then B is removed and a new edge between A and C is created in a way that the shape of the route between A and C remains the same as it was before this step. After this step is repeated for all 2-connected nodes, only road junctions and road endpoints will remain as nodes, but the original shape of the network (the set of line segments) is kept unchanged.
      (A special case is a cycle: a subgraph which contains only 2-connected nodes and no other connections. Here we keep all of the nodes even if they are 2-connected.)

    2. Smoothing means that extra nodes are inserted into the graph to make sure that no edges are longer than 50 meters. For example if the length of an edge between A and B is 120 meters (this is not the Euclidean distance, this is measured along the path between the two end points, in the most extreme case A and B may be the same), and we don’t want to allow edges longer than 50 meters, then 2 extra nodes will be inserted: at 40 and 80 meters from node A.

  4. The APLS score between two graphs G1 and G2 is calculated as follows.

    1. Create a copy of graph G2, call this G2’. Inject all nodes of G1 into G2’. This means that for a node n in G1 we find the closest point in G2’ and create a node n’ there. (Note that such an n’ may not correspond to an existing node in G2’; it may be created on an edge.) The Euclidean distance between n and n’ must be smaller than 4 meters; otherwise, no n’ will be injected for this given n.

    2. Aggregate a total path difference score by listing all valid routes (p1->p2) in G1 and checking what happens with the corresponding routes in G2' (p1'->p2'). A route from p1 to p2 is valid if there exists a path between them in G1. (A path can be a direct connection between the two nodes or a longer sequence of edges that starts from p1 and ends at p2.)

      1. If there is no matching p1' in G2’ then add a difference score of 1 for every valid route in G1 starting from p1.

      2. If there is a matching p1' but no matching p2' then also add a difference score of 1.

      3. If there are matching p1' and p2' but there is no route between them in G2' then also add a difference score of 1.

      4. If there are matching p1' and p2' and they are connected in G2' then add a difference score calculated as
        min(1, abs(t - t’) / t),
        where t and t’ are the travel times along the paths connecting the two points in G1 and G2’, respectively. Note that this is the single step in scoring that differs from SpaceNet 3: previously, path lengths were used in this step instead of travel times.

    3. Let avgDiff be the total difference score divided by the number of valid (p1->p2) routes in G1.

    4. The APLS score between G1 and G2 is 1 - avgDiff.

  5. Calculate the APLS score between T and P.

  6. Repeat the APLS score calculation in the other direction: from P to T.

  7. The tile-level score is taken as the harmonic mean of the two scores calculated in the two previous steps.
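
The bookkeeping of steps 4-7 can be summarized in a few lines. The following is a simplified sketch only: node injection, snapping, simplification, and smoothing are all omitted, and the definitive implementation is the one in the APLS repository:

    # Simplified sketch of the per-route difference and the symmetric
    # tile-level score. t is the travel time of a route in G1; t_prime is
    # the travel time of the matching route in G2', or None if the route
    # (or either endpoint) has no match.
    def route_diff(t, t_prime):
        if t_prime is None:
            return 1.0
        return min(1.0, abs(t - t_prime) / t)

    def apls(diffs):                 # diffs: one value per valid route in G1
        return 1.0 - sum(diffs) / len(diffs)

    def tile_score(apls_t_to_p, apls_p_to_t):
        # harmonic mean of the two directional APLS scores
        if apls_t_to_p + apls_p_to_t == 0:
            return 0.0
        return 2 * apls_t_to_p * apls_p_to_t / (apls_t_to_p + apls_p_to_t)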

 

Finally, tile-level scores are averaged for each city, and globalAvgAPLS is taken as the weighted average of the city-level scores as follows:

  • For provisional scoring (online leaderboard during the submission phase): 20% each from Moscow and Mumbai, and 60% from San Juan.

  • For final scoring: 60% from the mystery city, 20% from San Juan, and 10% each from Moscow and Mumbai.

 

Your overall score is calculated as 100 * globalAvgAPLS.
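
As a worked example of the final-scoring weights (the city-level APLS values below are illustrative only):

    # Sketch: final-scoring weights applied to illustrative city averages.
    city_apls = {"mystery": 0.58, "San_Juan": 0.62, "Moscow": 0.66, "Mumbai": 0.60}
    weights   = {"mystery": 0.60, "San_Juan": 0.20, "Moscow": 0.10, "Mumbai": 0.10}

    globalAvgAPLS = sum(city_apls[c] * weights[c] for c in weights)
    overall_score = 100 * globalAvgAPLS     # ~59.8 for these values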

Notes on scoring
  • All length measurements are done in pixel space, using a constant pixel size of 0.31 meters. For example the maximum edge length of 50 meters means a 50/0.31 (~161.3) pixel distance.

  • The scorer treats points closer to each other than 0.1 pixels as equal. Placing points very close to each other is not recommended, as the connectivity of the graph may change nondeterministically in this case.

 

Final testing

The details of the final testing workflow and the requirements for the /code folder of your submission are also specified in the submission template document. This document gives only requirements or pieces of information that are additional to, or override, those given in the template. You may ignore this section until you start to prepare your system for final testing.

  • The signature of the train script is slightly different than given in the template, as it takes multiple folders as parameters:
    train.sh [<data_folder>]...
    The supplied <data_folder> parameters point to folders holding raw SpaceNet satellite data in the same structure as is available to you during the coding phase. The number of such folders will be between 1 and 4, inclusive.

  • The allowed time limit for the train.sh script is 8 GPU-days (2 days on a p3.8xlarge) [these values are subject to slight change in the early part of the competition].  You may assume that each data folder path will be under /data.

  • A sample call to your training script (single line) follows. Note that folder names are for example only; you should not assume that the exact same folders will be used in testing.
    ./train.sh /data/train/AOI_7_Moscow_Train /data/train/AOI_8_Mumbai_Train
    In this case you can assume that the training data looks like this:
      data/
        train/
          AOI_7_Moscow_Train/
            geojson/
            MS/
            ...
          AOI_8_Mumbai_Train/
            ...
     

  • The signature of the test script:
    test.sh [<data_folder>]... <output_file>
    The testing data folders contain similar data in the same structure as is available for you during the coding phase. The number of such folders will be between 1 and 4, inclusive. You may assume that each data folder path will be under /data.

  • The allowed time limit for the test.sh script is 12 GPU-hours (3 hours on a p3.8xlarge) [these values are subject to slight change in the early part of the competition] when executed on the full provisional test set (the same one you used for submissions during the contest).

  • A sample call to your test script (single line) follows. Again, folder names are for example only; you should not assume that the exact same folders will be used in testing.
    ./test.sh /data/test/AOI_7_Moscow_Test_Public /data/test/AOI_10_XXX_Test_Public /wdata/my_output.csv

 
  • To speed up the final testing process the contest admins may decide not to build and run the dockerized version of each contestant's submission. It is guaranteed however that if there are N main prizes then at least the top 2*N ranked submissions (based on the provisional leader board at the end of the submission phase) will be final tested.

  • Hardware specification. Your docker image will be built, and the test.sh and train.sh scripts will be run, on a p3.8xlarge Linux AWS instance. Please see here for the details of this instance type.

 

Additional Resources

 
  • A visualizer is available that you can use to test your solution locally. It displays satellite images, your extracted road networks and the expected ground truth. It also calculates detailed scores so it serves as an offline tester.

  • Further details on the scoring metric and also a Python implementation can be found here.

 

General Notes

 
  • This match is NOT rated.

  • Teaming is allowed. Topcoder members are permitted to form teams for this competition. After forming a team, Topcoder members of the same team are permitted to collaborate with other members of their team. To form a team, a Topcoder member may recruit other Topcoder members, and register the team by completing this Topcoder Teaming Form. Each team must declare a Captain. All participants in a team must be registered Topcoder members in good standing. All participants in a team must individually register for this Competition and accept its Terms and Conditions prior to joining the team. Team Captains must apportion prize distribution percentages for each teammate on the Teaming Form. The sum of all prize portions must equal 100%. The minimum permitted size of a team is 1 member; the maximum permitted team size is 5 members. Only team Captains may submit a solution to the Competition. Topcoder members participating in a team will not receive a rating for this Competition. Notwithstanding Topcoder rules and conditions to the contrary, solutions submitted by any Topcoder member who is a member of a team on this challenge but is not the Captain of the team are not permitted, are ineligible for award, may be deleted, and may be grounds for dismissal of the entire team from the challenge. The deadline for forming teams is 11:59pm ET on the 21st day following the date that Registration & Submission opens as shown on the Challenge Details page. Topcoder will prepare a Teaming Agreement for each team that has completed the Topcoder Teaming Form, and distribute it to each member of the team. Teaming Agreements must be electronically signed by each team member to be considered valid. All Teaming Agreements are void, unless electronically signed by all team members by 11:59pm ET of the 28th day following the date that Registration & Submission opens as shown on the Challenge Details page. Any Teaming Agreement received after this period is void. Teaming Agreements may not be changed in any way after signature.
    The registered teams will be listed in the contest forum thread titled “Registered Teams”.

  • Organizations such as companies may compete as one competitor if they are registered as a team and follow all Topcoder rules.

  • Relinquish - Topcoder is allowing registered competitors or teams to "relinquish". Relinquishing means the member will compete, and we will score their solution, but they will not be eligible for a prize. Once a person or team relinquishes, we post their name to a forum thread labeled "Relinquished Competitors". Relinquishers must submit their implementation code and methods to maintain leaderboard status.

  • In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may also use open source languages and libraries, with the restrictions listed in the next section below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see “Requirements to Win a Prize” section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.

  • You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client.

  • If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission. Include your licenses in a folder labeled “Licenses”. Within the same folder, include a text file labeled “README” that explains the purpose of each licensed software package as it is used in your solution.

  • External data sets and pre-trained networks are allowed for use in the competition provided the following are satisfied:

    • The external data and pre-trained network dataset are unencumbered with legal restrictions that conflict with its use in the competition.

    • The data source or data used to train the pre-trained network is defined in the submission description.

    • The external data source must be declared in the competition forum in the first 45 days of the competition to be eligible in a final solution. References and instructions on how to obtain the data are valid declarations (for instance, in the case of license restrictions). If you want to use a certain external data source, post a question in the forum thread titled “Requested Data Sources”. Contest stakeholders will verify the request, and if the use of the data source is approved then it will be listed in the forum thread titled “Approved Data Sources”.

    • Using OpenStreetMap or similar street-level data sources that explicitly describe the targeted test cities is not allowed.

  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about possible solution techniques.

 

Award details and requirements to win a prize

Final prizes

In order to receive a final prize, you must do all the following:

Achieve a score in the top five of the average score of all cities, according to the final system test results. See the "Scoring" section above.

Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind their approach and the steps it takes. You will receive a template that helps you create your final report.

If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Top undergraduate / top graduate award

The highest scoring undergraduate university student (someone actively pursuing a Bachelor’s degree), or team of such students is awarded $2,500. The same applies to the highest scoring graduate student (someone pursuing a Master’s or doctoral degree) or team of such students.

Both prizes are based on the rankings after the final tests. The top undergraduate / graduate prize and the final prizes are not exclusive; the same contestant / team may win a final prize and also one of these two awards. For teams to be eligible for one of these awards, all team members must be eligible for the same award.
 

Eligibility

To be eligible to win the Graduate and Undergraduate prizes, individuals must provide proof of enrollment in an accredited degree program prior to the end of the challenge.

Employees of In-Q-Tel, Maxar Technologies (DigitalGlobe, Radiant Solutions, SSL, and MDA), Amazon Web Services (AWS), Intel, TopCoder, and Capella Space are not allowed to participate in the contest.