
Challenge Overview

Problem Statement

    Robot Detection and Tracking of Objects in Industrial Settings


Prize Distribution

The best 3 performers in this contest according to system test results will receive the following prizes:

  1. 1st place: $3000
  2. 2nd place: $2000
  3. 3rd place: $1000



This problem builds upon the in-progress RobotVisionTracker match by using a larger dataset that covers a wider variety of potential scenarios, which should lead to better overall results. The training data made available for the first contest is included in this match, and can and should be used here. Contestants are welcome and encouraged to participate in both matches, and may reuse some or all of the same code for each as they see fit. (Note, however, that because this match adds data not present in the original match, there is no guarantee that the additional training data provided here will be useful for the original match.)

Firm A is building a next-generation robotics platform that will change the game in field service operations, including asset inspection and repair. Firm A has defined a host of high-value use cases and applications across industry that will help field engineers and other industrial workers be more productive and, more importantly, perform their jobs safely.

As one example of a high-value use case, the company would like a robot to detect and track a freight railcar brake release handle, the object of interest (OOI), so that the robot can grasp the handle.

Your task is to develop an algorithm that can detect and track the OOI in video frames. The OOI is typically made of 0.5-inch round steel rod, bent to form a handle. Examples of the varieties of the OOI appear below; the point marked in blue is the point to be identified if the OOI is present in a frame. More details follow.

Your algorithm will receive the following input:

  • Random samples of contiguous frames of videos shot in stereo ("Training Data"). Some frames will contain the OOI and others will not; each sample has 10 frames.
  • The Training Data consists of video frames, all shot at 640x480 resolution with the same camera.
  • Stereo camera calibration was performed with OpenCV. More information on camera calibration can be found here. The left and right camera calibration parameters can be downloaded here and here.
  • If the OOI appears in the sample, it will be marked in every frame as a point (x,y) according to the following convention, which defines the "ground truth" of the OOI's presence and location. As the convention shows, the OOI is marked in 3 scenarios:
    • When the OOI is in direct line of sight.
    • When it is occluded by something in front of it.
    • When the brake lever itself is occluding the point.

Please read the convention PDF linked above.

The Training Data can be downloaded here and here.



Your task is to implement training and testing methods, whose signatures are detailed below.

int[] imageDataLeft and int[] imageDataRight contain the unsigned 24-bit image data. The data of each pixel is a single number calculated as 2^16 * Red + 2^8 * Green + Blue, where Red, Green and Blue are the 8-bit RGB components of the pixel. Each image is 640 by 480 pixels. If x is the column and y is the row of a pixel, the pixel value is found at index [x+y*640] of the image data array.
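The packed pixel layout above can be unpacked with shifts and masks. The following is a minimal sketch; the class name `PixelFormat` and the grayscale helper (using the common BT.601 luma weights) are illustrative choices, not part of the required interface.

```java
/**
 * Sketch: decoding the packed pixel format (2^16*Red + 2^8*Green + Blue).
 * Assumes each image array has 640 * 480 entries laid out as [x + y * 640].
 * Class and helper names are hypothetical.
 */
final class PixelFormat {
    static final int WIDTH = 640;
    static final int HEIGHT = 480;

    private PixelFormat() {}

    /** Packed value of the pixel at column x, row y. */
    static int pixelAt(int[] imageData, int x, int y) {
        return imageData[x + y * WIDTH];
    }

    static int red(int packed)   { return (packed >> 16) & 0xFF; }
    static int green(int packed) { return (packed >> 8) & 0xFF; }
    static int blue(int packed)  { return packed & 0xFF; }

    /** Integer grayscale using BT.601 luma weights (an illustrative choice, not required). */
    static int gray(int packed) {
        return (299 * red(packed) + 587 * green(packed) + 114 * blue(packed)) / 1000;
    }

    /** Converts a whole frame to grayscale, keeping the same [x + y * 640] layout. */
    static int[] toGray(int[] imageData) {
        int[] out = new int[WIDTH * HEIGHT];
        for (int i = 0; i < out.length; i++) {
            out[i] = gray(imageData[i]);
        }
        return out;
    }
}
```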

First, your training method will be called multiple times; use it to train your algorithm on the supplied training data. The data for each frame of multiple videos is passed to your method sequentially, and all available frames of each training video are passed. The number of frames may differ from video to video. The ground-truth location of the OOI in each frame is also provided in leftX, leftY, rightX and rightY; a negative value indicates that no OOI location is marked in that frame. If your training method returns the value 1, no more data will be passed to your algorithm and the testing phase will begin.

Once all training images have been supplied, doneTraining() will be called. This will signal that your solution should do any further processing based on the full set of training data.
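The training protocol can be sketched as follows. This is a minimal sketch, not a reference solution: the frame cap `maxFrames` and the statistic it gathers (the mean offset between the right-image and left-image ground-truth points) are hypothetical choices used only to show how labels arrive, how returning 1 ends training, and where `doneTraining()` fits. The statement does not define what `doneTraining()` should return; the sketch returns 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the training / doneTraining protocol; the statistics gathered are hypothetical. */
class TrainingSketch {
    private final int maxFrames;              // hypothetical cap before asking to stop
    private final List<int[]> labels = new ArrayList<>();
    private int framesSeen = 0;
    private double meanDx = 0, meanDy = 0;    // mean (rightX - leftX), (rightY - leftY)

    TrainingSketch(int maxFrames) { this.maxFrames = maxFrames; }

    public int training(int videoIndex, int frameIndex, int[] imageDataLeft, int[] imageDataRight,
                        int leftX, int leftY, int rightX, int rightY) {
        framesSeen++;
        // Negative coordinates mean no ground-truth point; keep frames labelled in both images.
        if (leftX >= 0 && leftY >= 0 && rightX >= 0 && rightY >= 0) {
            labels.add(new int[] {leftX, leftY, rightX, rightY});
        }
        // Returning 1 stops the flow of training data; any other value continues it.
        return framesSeen >= maxFrames ? 1 : 0;
    }

    public int doneTraining() {
        if (!labels.isEmpty()) {
            double sx = 0, sy = 0;
            for (int[] p : labels) {
                sx += p[2] - p[0];
                sy += p[3] - p[1];
            }
            meanDx = sx / labels.size();
            meanDy = sy / labels.size();
        }
        return 0; // meaning of this value is not specified in the statement
    }

    double meanDx() { return meanDx; }
    double meanDy() { return meanDy; }
    int positives() { return labels.size(); }
}
```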

Finally, your testing method will be called 50 times. The first 10 calls receive contiguous frames from one video, the next 10 calls receive contiguous frames from a second video, and so on. The array you return must contain exactly 4 values; returning any point outside the bounds of an image indicates that you did not detect the OOI in that image. The four elements of the returned array are:

  • leftX - estimated x-coordinate for the point in the left image
  • leftY - estimated y-coordinate for the point in the left image
  • rightX - estimated x-coordinate for the point in the right image
  • rightY - estimated y-coordinate for the point in the right image
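One way to build the return value is sketched below. It assumes the elements are returned in the order listed above and that "inside the image" means 0 <= x < 640 and 0 <= y < 480. Passing `null` for a point and emitting (-1, -1) are conventions of this hypothetical helper; any out-of-bounds point would signal "not detected" equally well.

```java
/** Sketch: packaging a testing() result as {leftX, leftY, rightX, rightY}. */
final class TestingResult {
    static final int WIDTH = 640, HEIGHT = 480;

    private TestingResult() {}

    static boolean inBounds(int x, int y) {
        return x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT;
    }

    /**
     * leftPoint / rightPoint are {x, y} or null. A missing or out-of-bounds point is
     * written as (-1, -1), which lies outside the image and so reads as "not detected".
     */
    static int[] build(int[] leftPoint, int[] rightPoint) {
        int[] out = {-1, -1, -1, -1};
        if (leftPoint != null && inBounds(leftPoint[0], leftPoint[1])) {
            out[0] = leftPoint[0];
            out[1] = leftPoint[1];
        }
        if (rightPoint != null && inBounds(rightPoint[0], rightPoint[1])) {
            out[2] = rightPoint[0];
            out[3] = rightPoint[1];
        }
        return out;
    }
}
```

Detection in the left and right images is reported independently, so a solution can return a left-image point while marking the right image as "not detected".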

The videos used for testing as well as the starting frame within the video will be selected randomly, so it is possible to have repetitions or intersection of frames during the testing phase.
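Because the same video (and even overlapping frames) can be selected more than once, a tracker may be safer keying its per-sequence state on the call count, which is fixed at 10 calls per sequence, rather than on `videoIndex` alone. A minimal sketch under that assumption, with a hypothetical class name:

```java
/**
 * Sketch: tracking which 10-frame test sequence the current testing() call belongs to.
 * Relies only on the call count, because videos and frames may repeat across sequences.
 */
final class SequenceCounter {
    static final int FRAMES_PER_SEQUENCE = 10;
    private int calls = 0;

    /** Call once per testing() call; returns true for the first frame of each sequence. */
    boolean onTestingCall() {
        boolean first = calls % FRAMES_PER_SEQUENCE == 0;
        calls++;
        return first;
    }

    /** Zero-based index of the sequence containing the most recent call (valid after one call). */
    int sequenceIndex() {
        return (calls - 1) / FRAMES_PER_SEQUENCE;
    }
}
```

A solution would reset its tracking state whenever `onTestingCall()` returns true.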


Testing and Scoring

There is 1 example test, there are 5 provisional tests, and there are at least 10 system tests.

177 videos have been split into three sets: 34, 68 and 75. The first 34 videos are available for download for local testing and example testing.

  • Example tests: 25 (out of the set of 34) videos used for training, 9 for testing.
  • Provisional tests: 34 videos used for training, 68 for testing.
  • System tests: 34 videos used for training, 75 for testing.

Your algorithm's performance will be quantified as follows.

xr, yr: True x and y-coordinates for the OOI in the image (in units of pixels)

xe, ye: Estimated x and y-coordinates for the OOI in the image (in units of pixels)

dr = sqrt( (xe - xr)*(xe - xr) + (ye - yr)*(ye - yr) )

leftR[DIST] = percentage of left image frames whose dr <= DIST pixels

rightR[DIST] = percentage of right image frames whose dr <= DIST pixels

Note: If the OOI is not present in a frame, the detection is counted as correct if your algorithm correctly reports that the OOI is not in the frame.
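The distance dr and the per-image measure R[DIST] can be sketched as follows for one camera. This sketch returns R as a fraction in [0, 1] (multiply by 100 for a percentage). It assumes that a negative ground-truth coordinate means the OOI is absent, and that an out-of-bounds estimate is treated as "not detected" and therefore scored as wrong whenever the OOI is present; the statement does not spell out that last case. Class and method names are hypothetical.

```java
/** Sketch of dr and of R[DIST] for one camera; returns a fraction in [0, 1]. */
final class Accuracy {
    static final int WIDTH = 640, HEIGHT = 480;

    private Accuracy() {}

    static boolean inBounds(int x, int y) {
        return x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT;
    }

    /** Euclidean distance between estimate (xe, ye) and truth (xr, yr). */
    static double dr(double xe, double ye, double xr, double yr) {
        return Math.sqrt((xe - xr) * (xe - xr) + (ye - yr) * (ye - yr));
    }

    /**
     * truth[i] and est[i] are {x, y} for frame i. Negative truth = OOI absent (assumption);
     * out-of-bounds estimate = "not detected" (assumption).
     */
    static double r(int[][] truth, int[][] est, double dist) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            boolean present = truth[i][0] >= 0 && truth[i][1] >= 0;
            boolean detected = inBounds(est[i][0], est[i][1]);
            if (!present) {
                if (!detected) correct++; // correctly reported absence
            } else if (detected && dr(est[i][0], est[i][1], truth[i][0], truth[i][1]) <= dist) {
                correct++;
            }
        }
        return truth.length == 0 ? 0.0 : (double) correct / truth.length;
    }
}
```

For example, with four frames where one estimate is exactly 10 pixels off, one absence is correctly reported, one estimate is 50 pixels off and one absent frame gets a false detection, R[10] is 0.5 and R[50] is 0.75.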

T = total CPU processing time for all testing frames in seconds

AccuracyScore = 10000.0 * (50.0*(leftR[10]+rightR[10]) + 35.0*(leftR[20]+rightR[20]) + 15.0*(leftR[50]+rightR[50]))

TimeMultiplier = 1.0 (if T <= 3.33), 1.3536 - 0.2939 * Ln(T) (if 3.33 < T <= 100.0), 0.0 (if T > 100.0)

Score = AccuracyScore * (1.0 + TimeMultiplier)
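The three formulas above translate directly into code. The statement calls leftR and rightR percentages; this sketch assumes they enter AccuracyScore as fractions in [0, 1], since with values up to 100 the 700000 prize threshold would correspond to well under 1% accuracy. Treat that scaling as an assumption. Class and method names are hypothetical.

```java
/** Sketch of AccuracyScore, TimeMultiplier and Score; R values assumed to be fractions in [0, 1]. */
final class ScoreSketch {
    private ScoreSketch() {}

    /** Each array is {leftR, rightR} at the given DIST. */
    static double accuracyScore(double[] r10, double[] r20, double[] r50) {
        return 10000.0 * (50.0 * (r10[0] + r10[1])
                + 35.0 * (r20[0] + r20[1])
                + 15.0 * (r50[0] + r50[1]));
    }

    /** t = total CPU time for all testing frames, in seconds. Math.log is the natural log. */
    static double timeMultiplier(double t) {
        if (t <= 3.33) return 1.0;
        if (t <= 100.0) return 1.3536 - 0.2939 * Math.log(t);
        return 0.0;
    }

    static double score(double accuracyScore, double t) {
        return accuracyScore * (1.0 + timeMultiplier(t));
    }
}
```

The logarithmic branch is continuous at both ends: it evaluates to about 1.0 at T = 3.33 and about 0.0 at T = 100. Under the fraction assumption, a perfect result with T <= 3.33 scores 2,000,000 * 2 = 4,000,000.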

You can see these scores for example test cases when you make example test submissions. If your solution fails to produce a proper return value, your score for this test case will be 0.

The overall score on a set of test cases is the arithmetic average of the scores on the individual test cases in the set. The match standings display overall scores on provisional tests for all competitors who have made at least 1 full test submission. The winners are the competitors with the highest overall scores on system tests.

An offline tester/visualizer tool is available.


Minimum Score Criteria

To be eligible for a prize, your submission needs to attain a minimum score of 700000 in System Testing.


Special rules and conditions

  • The allowed programming languages are C++, Java, C# and VB.
  • Be sure to see the official rules for details about open source library usage.
  • In order to receive the prize money, you will need to fully document your code and explain your algorithm. If any parameters were obtained from the training data set, you will also need to provide the program used to generate these parameters. There is no restriction on the programming language used to generate these training parameters. Note that this documentation should not be submitted during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly to collect it.
  • You may use any external (outside of this competition) source of data to train your solution.


Method signature: int training(int videoIndex, int frameIndex, int[] imageDataLeft, int[] imageDataRight, int leftX, int leftY, int rightX, int rightY)
Parameters: int, int, int[], int[], int, int, int, int

Method signature: int doneTraining()
Parameters: none

Method signature: int[] testing(int videoIndex, int frameIndex, int[] imageDataLeft, int[] imageDataRight)
Parameters: int, int, int[], int[]

(be sure your methods are public)
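A minimal skeleton with the three required public methods is sketched below. The class name is hypothetical (the statement does not give one), and the baseline simply reports "not detected" in both images for every test frame.

```java
/**
 * Minimal skeleton with the three required public methods.
 * The class name is hypothetical; use whatever class name the match requires.
 */
class SolutionSkeleton {
    public int training(int videoIndex, int frameIndex, int[] imageDataLeft, int[] imageDataRight,
                        int leftX, int leftY, int rightX, int rightY) {
        return 0; // any value other than 1 keeps training data coming
    }

    public int doneTraining() {
        return 0; // meaning of this value is not specified in the statement
    }

    public int[] testing(int videoIndex, int frameIndex, int[] imageDataLeft, int[] imageDataRight) {
        // (-1, -1) lies outside both images, i.e. "OOI not detected" in left and right.
        return new int[] {-1, -1, -1, -1};
    }
}
```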


-The match forum is located here. Please check it regularly because some important clarifications and/or updates may be posted there. You can click "Watch Forum" if you would like to receive automatic notifications about all posted messages to your email.
-The time limit is 60 minutes per test case for training and 3 minutes for testing; the memory limit is 6144 MB.
-There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit code of a size close to that or larger).
-The compilation time limit is 60 seconds. You can find information about compilers that we use, compilation options and processing server specifications here.
-Please be aware that this match includes some client-specific terms and conditions (which were listed at time of registration). For reference, a full copy of those additional terms can be found here.



This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.