Challenge Overview

Problem Statement

    

Prize Distribution

Prize            USD
1st              $8,000
2nd              $5,000
3rd              $3,500
4th              $2,000
5th              $1,500
Total Prizes     $20,000

To be eligible for cash prizes, you must score at least 800,000.

If any contestant scores at least 900,000, then the total prize pool increases to $35,000. On top of the main prizes listed above, the extra $15,000 will be split among those in the top 5 who scored at least 900,000, in the same proportions as the main prizes.

In addition to the money prizes, the highest scoring contestant will receive a free trip to the 2018 Topcoder Open in Dallas, TX, USA. There is no score threshold to beat to receive this special award.

Contestants must score at least 500,000 to earn TCO19 points. Learn more here.

Summary

In this challenge, competitors are tasked with detecting workers in videos captured in industrial settings and telling whether they wear certain kinds of personal protective equipment (PPE). Your task will be to analyze such videos and report how many workers are present, where they are, and what kind of PPE they wear or use. Your reported detections will be compared to manually created ground truth data, and the quality of your solution will be judged by a combination of F-score metrics for each type of PPE. See Scoring for details.

Input Files

Video files

The videos are available for download from an AWS bucket. Connection details are shared in the forum of a separate challenge; you need to register for that challenge in order to access the data. Altogether we have roughly 2000 minutes of video, distributed into training / provisional testing / final testing sets using a 60% / 20% / 20% split. Notes on the videos:

  • They are of variable length, size and image quality.
  • The total size of the videos is large (~18 GB), so we recommend downloading a subset first to get a feel for the task, and downloading the whole set only if you are seriously considering working on it. The /training/sample-data.zip file contains 5 videos that you can use for experimenting.
  • Some videos feature real-life industrial scenes; others are simulated scenes in which people follow prescribed scripts to demonstrate the presence or usage of certain PPE.
  • Some videos (mostly the simulated ones) contain infrequent camera movements or changes in zoom. Your algorithms must be able to handle these.
  • The video files have descriptive names but you should not rely on the names when processing the files. In final testing the names will be replaced with random strings.

Important note: The videos must not be used for anything other than the purposes of this challenge. You must not share them further, and you must delete them from all your devices when the contest is over.

Annotation files

The location of people and the PPE they wear or use is referred to as 'ground truth' in this document. These data are given in TXT files; their format and content are best understood by reading the Labeling Guide that was used by the people who created the hand-made annotations (labels). Please study this guide carefully to fully understand the ground truth data. The following is just a list of the most important points.

  • There is one .txt file for each training video (.mov or .avi), with the same name and a different extension.
  • TAB-separated fields, giving time-stamped events, one per line.
  • Events are: workers enter or leave the scene, they move, they change the PPE they wear or use.
  • The approximate location of people is given. Note that this differs from the required output of the current challenge, where exact locations must be reported (see the Output File section).
  • When people enter the scene, the list of PPE they wear is specified. The PPE in the scope of the contest are the following:
    • Protective hard hat. This is either present, or missing, or the annotator could not tell.
    • Protective ear muffs. As above.
    • Protective safety harness. As above.
    • Lanyard. A lanyard is a rope (or pair of ropes) with two hooks at the end that the workers use to fasten themselves securely to firmly fixed objects while climbing or working at a height. This is either missing, or present but not used, or being used, or the annotator could not tell. This image shows a harness (blue bounding box) and lanyard (yellow bounding box) in use.

The ground truth data was created manually and is of high quality. Nevertheless, as in all real-life problems, it may contain errors. You may also annotate scenes differently than the annotators did.

Annotation files are available in the AWS bucket as a single .zip file, see /training/annotations.zip. In case you don't want to download the full set, use /training/sample-data.zip, which also contains the annotations of the 5 sample videos.

Output File

Your output must be a single TXT file that contains sections; each section corresponds to one video in the test set (the /testing folder of the AWS bucket). A section consists of a line that contains the file name (header line), followed by several lines that contain TAB-separated data corresponding to time-stamped detections (data lines). The exact format of a section is the following:

<FILE_NAME>
<TIME><TAB>person:<BOX>[<TAB>hat:<BOX>][<TAB>earmuff:<BOX>][<TAB>harness:<BOX>][<TAB>lanyard:<BOX>]
... // more data lines

where

  • [x] means that x is optional.
  • <FILE_NAME> is the name of the video file, e.g. "video 1 2018-03-25.avi" (without quotes, but keeping the spaces, and keeping the file extension).
  • <TIME> is a time stamp in hh:mm:ss or mm:ss format, e.g. 01:05:59 or 12:30.
  • <TAB> is a single TAB (\t) character.
  • <BOX> represents a bounding rectangle given by 4 integer numbers separated by commas. The numbers are the x and y coordinates of the top left corner of the rectangle and its width and height, respectively. All numbers represent percentages of the video width and height; these are NOT pixel values. All 4 values must be in the 0..100 range. The top left corner of the frame is 0,0, top right is 100,0, bottom right is 100,100. (A pixel-to-percent conversion sketch follows this list.)
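As a quick illustration of the percent-based coordinate system, the sketch below converts a detector's pixel-space box into the required percent-of-frame values. It is only a sketch: the class and method names are hypothetical, and only the 0..100 convention itself comes from this specification.

// Illustrative helper: convert a pixel-space box (x, y, width, height) into the
// percent-of-frame coordinates required by the output format. Names are hypothetical.
public final class BoxFormat {
    public static String toPercentBox(int px, int py, int pw, int ph,
                                      int frameWidth, int frameHeight) {
        int x = Math.round(100f * px / frameWidth);
        int y = Math.round(100f * py / frameHeight);
        int w = Math.round(100f * pw / frameWidth);
        int h = Math.round(100f * ph / frameHeight);
        // Clamp to the allowed 0..100 range to guard against rounding overshoot.
        x = Math.max(0, Math.min(100, x));
        y = Math.max(0, Math.min(100, y));
        w = Math.max(0, Math.min(100, w));
        h = Math.max(0, Math.min(100, h));
        return x + "," + y + "," + w + "," + h;
    }
}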

A data line represents a single person who is present on the video at the given time stamp and optionally wears or uses any of the 4 known PPE types.

  • The person:<BOX> part should always be present in a data line, <BOX> is the bounding rectangle of the visible part of the full body of the person.
  • The hat:<BOX> part is optional, use it only if you detect that the person wears a hard hat. <BOX> is the bounding rectangle of the visible part of the hat.
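  • The earmuff:<BOX> part is optional, use it only if you detect that the person wears ear muffs. <BOX> is the bounding rectangle of the visible part of the ear muffs.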
  • The harness:<BOX> part is optional, use it only if you detect that the person wears a safety harness. <BOX> is the bounding rectangle of the visible part of the harness.
  • The lanyard:<BOX> part is optional, use it only if you detect that the person wears a lanyard attached to the harness. <BOX> is the bounding rectangle of the visible part of the lanyard. Use a single box that covers both ropes and hooks if there are two of them.

The file should contain a data line for each second of the video when you detect that people are present. (Note that this differs from the way the annotation files are formatted: in the annotation files, events are added only when something changes.) Add multiple lines with the same time stamp if more than one person is present in view at the same time.
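To make the per-second, per-person line format concrete, here is a minimal sketch of how a section could be assembled. The Map-based lookup of detections and all names are hypothetical placeholders for your own pipeline; only the header line, the TAB separation and the one-line-per-person-per-second rule come from this specification.

import java.util.List;
import java.util.Map;

// Minimal sketch of assembling one output section.
public class SectionWriter {
    // Format a time stamp as mm:ss, or hh:mm:ss once the video exceeds an hour.
    static String timeStamp(int seconds) {
        int h = seconds / 3600, m = (seconds % 3600) / 60, s = seconds % 60;
        return h > 0 ? String.format("%02d:%02d:%02d", h, m, s)
                     : String.format("%02d:%02d", m, s);
    }

    static void writeSection(StringBuilder out, String fileName, int videoSeconds,
                             Map<Integer, List<String[]>> detectionsPerSecond) {
        out.append(fileName).append('\n');                 // header line with the video file name
        for (int t = 0; t < videoSeconds; t++) {
            List<String[]> persons = detectionsPerSecond.get(t);
            if (persons == null) continue;                 // nobody detected at this second
            for (String[] parts : persons) {
                // parts[0] = "person:..." box, remaining entries are optional PPE parts
                out.append(timeStamp(t)).append('\t')
                   .append(String.join("\t", parts)).append('\n');
            }
        }
    }
}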

Here are some examples using a hypothetical, 5-second-long test video. (For better readability of this document, spaces are used instead of TABs, but in your submission you must use TABs.)

One person present, wears a hat and no other PPE, doesn't move.

  test-video-1.mov
  00:00  person:50,30,20,40  hat:60,30,5,3
  00:01  person:50,30,20,40  hat:60,30,5,3
  00:02  person:50,30,20,40  hat:60,30,5,3
  00:03  person:50,30,20,40  hat:60,30,5,3
  00:04  person:50,30,20,40  hat:60,30,5,3

The same as above, but the worker walks left and leaves the scene after 00:03. Note the missing line for 00:04.

  test-video-1.mov
  00:00  person:50,30,20,40  hat:60,30,5,3
  00:01  person:30,30,20,40  hat:40,30,5,3
  00:02  person:20,30,20,40  hat:30,30,5,3
  00:03  person:10,30,20,40  hat:20,30,5,3

Two workers wearing hats walk from left to right. The one to the right also has earmuffs. They walk off the scene at the end of the video. They exit from view at different times.

  test-video-1.mov
  00:00  person:50,30,20,40    hat:60,30,5,3
  00:00  person:60,30,20,40    hat:70,30,5,3  earmuff:70,35,4,4
  00:01  person:70,30,20,40    hat:80,30,5,3
  00:01  person:80,30,20,40    hat:90,30,5,3  earmuff:80,35,4,4
  00:02  person:90,30,20,40    hat:95,30,5,3

One worker has a harness, and at 00:03 you detect that he also has a lanyard.

  test-video-1.mov
  00:00  person:50,30,20,40   harness:50,40,20,20
  00:01  person:50,30,20,40   harness:50,40,20,20
  00:02  person:50,30,20,40   harness:50,40,20,20
  00:03  person:50,30,20,40   harness:50,40,20,20 lanyard:50,30,30,50
  00:04  person:50,30,20,40   harness:50,40,20,20 lanyard:50,30,30,50

Notes and special cases:

  • In the current challenge lanyards are always attached to a harness, so if you detect a lanyard you can be sure that a harness is also present. Whether the person actually uses the lanyard for anchoring (attaching it to a fixed object) is irrelevant in the current contest.
  • Don't include detections for people who are completely blocked from view (e.g. somebody goes behind a large object) even if they are not exiting from view at the side of the image.
  • You need to implement some tracking of the person's state so that you can tell whether certain PPE are present even if they are currently not visible. E.g. you detected a person wearing a hat who is climbing up a ladder. At some point his head will not be visible; however, you should still report that he has a hat, because there is no evidence that this has changed. (Use an empty BOX for the hat location in this case, e.g. "50,0,0,0"; see the state-tracking sketch after this list.)
  • You may create an empty section in your output file for a certain video if you found no workers in it. Simply add the section header line, followed by no data lines, meaning the next line should be a header (that is, file name) of a different video or the end of the file.
  • A sample submission file that scores non-zero on the sample-data set is also available in the /training/sample-data.zip package.
  • The bounding boxes for the 4 equipment types are used only for optional visualization purposes and, although they must be present in your output, are not used for scoring. The bounding boxes of the detected persons, however, are important: they are used when matching labeled and detected people.
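The following is a minimal sketch of the state tracking mentioned in the note above about temporarily hidden PPE. The class and field names are hypothetical; it only illustrates the idea of remembering that a hat was seen and reporting an empty box while it is out of view.

// Hypothetical per-track PPE state. A detector would update lastHatBox whenever the
// hat is actually visible; when it is not, the track still reports hat presence
// using an "empty" box, as in the note above (e.g. "50,0,0,0").
class TrackState {
    boolean hasHat = false;
    String lastHatBox = null;   // last box where the hat was visible, "x,y,w,h" in percent

    String hatPart(String currentHatBox) {
        if (currentHatBox != null) {            // hat visible in this frame
            hasHat = true;
            lastHatBox = currentHatBox;
            return "hat:" + currentHatBox;
        }
        if (hasHat) {
            // Hat not visible now, but no evidence it was removed: report an empty box
            // anchored at the last known x position.
            String x = lastHatBox.split(",")[0];
            return "hat:" + x + ",0,0,0";
        }
        return null;                            // never saw a hat on this person
    }
}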

Constraints

  • A single file must contain detections for ALL videos in the test set.
  • It must have a .txt extension. Optionally the file may be zipped, in which case it must have a .zip extension and the archive must contain only a single .txt file.
  • The file must not be larger than 50MB and must not contain more than 5 million lines. (A quick self-check sketch follows this list.)
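The sketch below is an optional pre-submission self-check against these constraints. It assumes an unzipped .txt answer file with a hypothetical name; adjust the path and the exact megabyte definition to your own setup.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Quick self-check of file size and line count before submitting.
public class SubmissionCheck {
    public static void main(String[] args) throws IOException {
        Path answer = Paths.get("answer.txt");        // hypothetical file name
        long bytes = Files.size(answer);
        long lines;
        try (Stream<String> s = Files.lines(answer)) {
            lines = s.count();
        }
        System.out.printf("size = %.1f MB, lines = %d%n", bytes / 1e6, lines);
        if (bytes > 50_000_000L || lines > 5_000_000L) {
            System.out.println("WARNING: submission violates the output constraints");
        }
    }
}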

Functions

This match uses the result submission style, i.e. you will run your solution locally using the provided files as input, and produce a TXT or ZIP file that contains your answer. Although during the online submission phase only your results will be verified, keep in mind that in the final testing phase you will need to set up a working system that both demonstrates that you have a fully automated solution to the problem and meets certain requirements set by the contest stakeholders:

  • The eventual end system of the client will run in real time. This means that processing a video should not take longer than the running time of the video.
  • The end system will NOT contain a GPU, you must make sure that the target performance can be achieved using a CPU-only solution. (Note that it is allowed to use GPU to train your system, this requirement concerns only the system used for inference.)
  • Your solution must process the video sequentially and must not look ahead into future video frames. (A minimal sequential-processing sketch follows this list.)
  • Your tool will eventually be run in the client's MS Azure environment; you must make sure that this will be possible. This requires checking that none of the libraries you use has any known problems in Azure.
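As an illustration of the sequential, real-time requirement, the sketch below reads a video strictly frame by frame and compares processing wall time with the video duration. It assumes the OpenCV Java bindings are available; processFrame() is a hypothetical placeholder for your own detector and is not part of this problem statement.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.videoio.VideoCapture;
import org.opencv.videoio.Videoio;

// Sketch of a sequential processing loop that also checks the real-time budget.
public class SequentialRunner {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }   // load native OpenCV

    public static void run(String videoPath) {
        VideoCapture cap = new VideoCapture(videoPath);
        double fps = cap.get(Videoio.CAP_PROP_FPS);
        Mat frame = new Mat();
        long frames = 0;
        long start = System.nanoTime();
        while (cap.read(frame)) {      // frames are consumed strictly in order, no look-ahead
            processFrame(frame);
            frames++;
        }
        cap.release();
        double wallSeconds = (System.nanoTime() - start) / 1e9;
        double videoSeconds = frames / fps;
        System.out.printf("processed %.1f s of video in %.1f s of wall time%n",
                          videoSeconds, wallSeconds);
    }

    static void processFrame(Mat frame) { /* run person + PPE detection here */ }
}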

In order for your solution to be evaluated by Topcoder's marathon system, you must implement a class named ProtectiveEquipmentDetector, which implements a single function: getAnswerURL(). Your function will return a String corresponding to the URL of your submission file. You may upload your files to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file.

To create a direct sharing link in Dropbox, right-click the uploaded file and select Share. You should be able to copy a link to this specific file, which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function.
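For example (the shared link below is hypothetical):

// Turning a copied Dropbox share link into a direct link.
String shared = "https://www.dropbox.com/s/abc123/answer.txt?dl=0";   // link copied from Dropbox
String direct = shared.replace("?dl=0", "?dl=1");                     // use this in getAnswerURL()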

If you use Google Drive to share the link, then please use the following format: "https://drive.google.com/uc?export=download&id=" + id

Note that Google has a file size limit of 25MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file is not done.)

You can use any other way to share your result file, but make sure the link you provide opens the filestream directly, and is available for anyone with the link (not only the file owner), to allow the automated tester to download and evaluate it.

An example of the code you have to submit, using Java:

public class ProtectiveEquipmentDetector {
  public String getAnswerURL() {
    //Replace the returned String with your submission file's URL
    return "https://drive.google.com/uc?export=download&id=XYZ";
  }
}

Keep in mind that, if you achieve a score in the top 10, the complete code that generates these results will be verified at the end of the contest, as described later in the "Requirements to Win a Prize" section: participants will be required to provide fully automated, executable software to allow independent verification of the performance of your algorithm and the quality of the output data.

Scoring

A full submission will be processed by the Topcoder Marathon test system, which will download, validate and evaluate your submission file.

Any malformed or inaccessible file, or one that violates any of the constraints listed in the "Output File" section, will receive a zero score.

If your submission is valid, your solution will be scored as follows.

  • We calculate true positive, false positive and false negative event counts (TP, FP and FN) for each of the following 5 detection types: detection of people, hats, earmuffs, harnesses and lanyards.
  • These counts are collected separately for each video. In the following description we talk only about one video for simplicity.
  • Events are defined as 1-second long time windows in the video. For each second of the video we take the positions of all people your algorithm detects and also of all people present in the ground truth. We solve an assignment problem to match these two sets of positions, taking the Euclidean distance between points as weights. For any unmatched person in your detections FP_person is increased by one. For any unmatched person in the ground truth FN_person is increased by one. For matched persons TP_person is increased by one.
  • Next we adjust the TP, FP and FN counts for all equipment types. Here the process for hard hats is described, but it is similar for other types as well. (With a small exception for lanyard: in this case the annotations lanyard:used are treated the same way as lanyard:yes.)
    • For all matched persons
      • If the ground truth contains hat:? then we don't do anything.
      • If the ground truth contains hat:yes and you detected the presence of hat for this person then TP_hat is increased by one.
      • If the ground truth contains hat:yes and you did not detect the presence of hat for this person then FN_hat is increased by one.
      • If the ground truth contains hat:no and you detected the presence of hat for this person then FP_hat is increased by one.
    • For all unmatched persons in your detections, if you detected the presence of hat for this person then FP_hat is increased by one.
    • For all unmatched persons in the ground truth, if it contains hat:yes then FN_hat is increased by one.
  • Next the counts are summed up using all the videos in the test set (separately for each count type).
  • Precision, recall and f-score are calculated for each of the 5 detection types. E.g. for hard hats
    precision_hat = TP_hat / (TP_hat + FP_hat)
    recall_hat = TP_hat / (TP_hat + FN_hat)
    f_hat = 2 * precision_hat * recall_hat / (precision_hat + recall_hat)
    
  • We take the average of the 5 f-scores calculated above: f_avg = avg(f_person, f_hat, f_earmuff, f_harness, f_lanyard)
  • Your overall score is calculated as 1,000,000 * f_avg.

See the source code of the offline scorer tool for the detailed algorithm of scoring. For matching the detected people to the set of expected ones we use the Hungarian algorithm, and require that matching points are closer to each other than 50 (measured in screen percentage units).
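For local experimentation, the following sketch approximates the per-second person matching and the f-score calculation described above. The official scorer solves the assignment problem with the Hungarian algorithm; the greedy nearest-pair matching here is only a simplified stand-in, so treat the offline scorer as the single source of truth. All class and method names are illustrative.

import java.util.ArrayList;
import java.util.List;

// Simplified local approximation of the per-second person matching and the f-score.
public class ScoreSketch {
    static final double MAX_DIST = 50.0;   // maximum allowed match distance, in percent units

    // Match detected person centers to ground truth centers for one second of one video.
    // Returns { TP, FP, FN } for the person detection type.
    static int[] matchOneSecond(List<double[]> detected, List<double[]> truth) {
        List<double[]> det = new ArrayList<>(detected);
        List<double[]> gt = new ArrayList<>(truth);
        int tp = 0;
        while (!det.isEmpty() && !gt.isEmpty()) {
            double best = Double.MAX_VALUE;
            int bi = -1, bj = -1;
            for (int i = 0; i < det.size(); i++)
                for (int j = 0; j < gt.size(); j++) {
                    double d = Math.hypot(det.get(i)[0] - gt.get(j)[0],
                                          det.get(i)[1] - gt.get(j)[1]);
                    if (d < best) { best = d; bi = i; bj = j; }
                }
            if (best >= MAX_DIST) break;        // nothing left that is close enough
            det.remove(bi);
            gt.remove(bj);
            tp++;
        }
        int fp = det.size();                    // unmatched detections
        int fn = gt.size();                     // unmatched ground truth persons
        return new int[] { tp, fp, fn };
    }

    // f-score from accumulated counts, following the formulas above.
    static double fScore(long tp, long fp, long fn) {
        if (tp == 0) return 0.0;
        double precision = (double) tp / (tp + fp);
        double recall = (double) tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }
}

The overall score would then be 1,000,000 times the average of the five f-scores, exactly as described above.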

Example submissions can be used to verify that your chosen approach to uploading submissions works. The tester will verify that the returned String contains a valid URL and that its content is accessible, i.e. that the tester is able to download the file from the returned URL. If your file is valid, it will be evaluated, and detailed scores will be available in the test results. The example evaluation is based on a small subset of the training data containing 5 videos; the video files within the /training/sample-data.zip package are used for this purpose. Though recommended, it is not mandatory to create example submissions. The scores you achieve on example submissions have no effect on your provisional or final ranking. Example submissions can be created using the "Test Examples" button on TopCoder's submission uploader interface.

Full submissions must contain in a single file all the detections that your algorithm made in all test videos of the /testing folder of the contest's AWS bucket. Full submissions can be created using the "Submit" button on TopCoder's submission uploader interface.

Final Scoring

The top 10 competitors after the provisional testing phase will be invited to the final testing round. Within 5 days after the provisional testing phase you are required to submit a dockerized version of your code that we can use to test your system. The technical details of this process are described in a separate document.

Your solution will be subjected to three tests:

First, your solution will be validated, i.e. we will check whether it produces the same output file as your last submission, using the same input files used during provisional testing. Note that this means that your solution must not be improved further after the provisional submission phase ends. (We are aware that it is not always possible to reproduce the exact same results. E.g. if you do online training, differences in the training environments may result in a different number of iterations and therefore different models. You may also have no control over random number generation in certain 3rd-party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation of why the same result cannot be reproduced.)

Second, your solution will be tested against a set of new video files. The number and size of this new set of videos will be similar to the set you downloaded as testing data, and the scene content will also be similar.

Third, the resulting output from the steps above will be validated and scored. The final rankings will be based on this score alone.

Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.

Additional Resources

  • An offline scorer is available here that you can use to test your solution locally. It calculates detailed scores based on your output file and a file containing ground truth annotations.

General Notes

  • This match is rated.
  • Teaming is not allowed.
  • In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see "Requirements to Win a Prize" section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
  • Your code may use open source libraries as long as they are free to use by the client in a commercial deployment. Apache2, BSD and MIT licenses are permitted. GPL and LGPL are not permitted. All other libraries require permission. To request permission, post your request to the challenge Forum.
  • If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission. Include your licenses in a folder labeled "Licenses". Within the same folder, include a text file labeled README.txt that explains the purpose of each licensed software package as it is used in your solution.
  • External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:
    • The external data and pre-trained models are unencumbered with legal restrictions that conflict with its use in the competition.
    • The data source or data used to train the pre-trained models is defined in the submission description.
  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about possible solution techniques.

Requirements to win a prize

Final prizes

In order to receive a final prize, you must do all the following:

Achieve a score in the top five, according to final system test results. See the "Final Scoring" section above.

Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind their approach and the steps it takes. You will receive a template that helps you create your final report.

If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

 

Definition

    
Class: ProtectiveEquipmentDetector
Method: getAnswerURL
Parameters: (none)
Returns: String
Method signature: String getAnswerURL()
(be sure your method is public)
    
 

Examples

0)
    
"1"
Returns: "Test case 1"

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.