Challenge Overview

Your task in this challenge is to analyze videos featuring garbage collecting operations and to create a tool that will help the client to determine the level of hazard. Your tool must be able to determine whether the people in the image are in danger of being hit by the truck or by its bin lifter.

More specifically, we need a tool that:

Takes a video as input.
Creates a text file (events.txt) as output that contains timestamped events of the following kind:
- Change in truck direction. The direction can be one of {forward, backward, stopped}.
- Change in the bin lifter's moving state. The moving state can be one of {yes, no}.
- Change in the presence of people in the video and the location of people. For each person present, you have to report their approximate x/y coordinates and their distance measured from the back of the truck. Coordinates are measured in percents of image width and height, so (0,0) is the top left corner of the video, (100,0) is the top right, (100,100) is the bottom right corner. One person's location is represented as a single x/y pair that corresponds to the center of the person's body. A special location identifier is "off", use this when the person is exiting from view. Distance is measured in meters. Two more {yes,no}-valued parameters that are needed: whether the person is an employee of the garbage collecting company (as opposed to a passers-by), and whether the person stands close (within 0.5 meter) to the left or right side at the back of the truck. (This latter is important only when the bin lifter is moving.)
Creates another text file (hazard.txt) as output that contains the hazard level calculated for each second in the video. See later for definition of hazard level and the required format of this file.

In events.txt you have to report all events when the truck movement direction or bin lifter's state parameters change. In case of people's state you have to report an event when a person enters or leaves the viewing area or when his location changes significantly compared to a previous report. There is no exact definition of 'significantly', but keep in mind that we want to track the movement of people in the video, so reports must allow to do that even if there are multiple people present. Reports need not be more frequent than once every second, and no new report is needed if a person stays more or less at the same place.

Data files

Very Important: Participants are supposed to delete the Video contents from their personal device after the challenge is over and this data is not allowed to be shared with public as well as with any other person or systems.

Raw video files and corresponding annotations are available from the challenge forum after registration. Some videos feature normal daily operations of trucks, others show simulated scenes where people follow prescribed movement patterns. Also there are unannotated videos available that you may use in whatever way you see fit.

Please study the annotation format carefully, it's expected that your tool creates the events.txt output file in the exact same format:

A TAB-separated text file.
First element is time in seconds from the beginning of the video.
The rest of the elements are either:
- direction: {forward,backward,stopped}
- bin moving: {yes,no}
- employee: {yes,no} <TAB> location: {x,y or off} <TAB> distance: {z} <TAB> by-side: {yes,no}

Hazard level

Based on these low level parameters described above a hazard level will be calculated. The function that maps these parameters to hazard level is shown below as a decision tree. (The "CLOSE' value of the 'distance' parameter means 0.5m or smaller. The "FAR" value means 3m or more.)

We convert the HIGH, LOW, etc values to numeric values: NO=0, LOW=0.25, MEDIUM=0.5, HIGH=1.0. Then for each pair of branches that differ only in the output value and the continuous 'distance' parameter we use linear interpolation between the end points based on distance. As an example, this pair of branches

|-----distance: FAR => MEDIUM

\-----distance: CLOSE => HIGH

will be converted as:

h(d) = 0.5, if d > 3,

h(d) = 1.0, if d < 0.5,

else h(d) = 1.0 - 0.5*(d-0.5)/(3-0.5),

where d is the distance, h(d) is the hazard likelihood.

If there are multiple people present then the hazard value is calculated for all of them and the highest value is taken.

Note that the x,y values of the person's location are not used in the hazard calculation, they are used only to help tracking multiple people in the video.

The hazard.txt file must contain lines formatted as

where <time> is time in seconds from the beginning of the video, <hazard> is the calculated hazard level, in the [0...1] range. The file must contain at least one line for each second of the video, but it may contain more data for periods when the hazard level changes rapidly.

Evaluation

Your submitted tools will be evaluated by running them on new videos. The hazard levels reported by your tool will be compared against hazard levels calculated from hand made annotations.

Final Submission Guidelines

The submission package should include the following items.

A document in PDF / Word format to describe your approach.
- It should be minimum of 2 pages.
- It should be written in English. You’re not being judged on your facility with English. We won’t deduct points for spelling mistakes or grammatical issues.
- Leveraging charts, diagrams or tables to explain your ideas is encouraged to complement your proposal.
The output (2 txt files) your tool generates on the following video: ch02_20180613065827_20180613071701-part2.mp4
Your implementation and deployment guide.
- We need a step-by-step instruction about how to run your code, including description of all dependencies.
- Your code may be implemented in any programming language, however the stakeholders of this contest have a strong preference to Python or R.
- It should be possible to run your code on a Windows or Mac machine.
Your source code and build scripts.
All dependent 3rd party libraries and tools. (Alternatively pointers to them if their size is large.)
If your solution includes prebuilt models (like neural network weights) then make sure we can run your tool without having to run training first. So either include or link to your hosted model files.
Code requirements.
- Your code may use open source libraries as long as they are free to use for the client.
- The eventual end system of the client will run in real time. This means that processing a video should not take longer than the running time of the video.
- Your solution must process the video sequentially and must not look ahead into future video frames.

Titan Eye - Proximity Hazard Detection Code Sprint

Key Information

Challenge Overview

Data files

Hazard level

Evaluation

Final Submission Guidelines

LEARN:

ELIGIBLE EVENTS:

REVIEW STYLE:

Final Review:

Approval:

CHALLENGE LINKS:

TOOLBOX:

SHARE:

ID: 30067224