Quartz Energy Mud Log Image Tag Collision

Key Information

The challenge is finished.

Challenge Overview

We have a couple of projects in progress that produce bounding boxes on a set of images. We need to determine whether any of the boxes overlap and, where required, remove the overlapping ones. Your job in this challenge is to write a simple script that does the following:
  • get the list of images (the distinct list is available in the IMAGE_OCR.NAME field)
  • query for the list of phrases/rectangles within each image
  • check whether any of the rectangles overlap each other. The coordinates of the bounding boxes/marks are in the IMAGE_OCR_PHRASE.X1, IMAGE_OCR_PHRASE.Y1, IMAGE_OCR_PHRASE.X2, and IMAGE_OCR_PHRASE.Y2 fields.
  • remove overlapping rectangles: where there is a clash, keep the rectangle with the smaller area.
  • save the kept records to a new database table - IMAGE_OCR_PHRASE_KEEP.  IMAGE_OCR_PHRASE_KEEP should have exactly the same structure as IMAGE_OCR_PHRASE.
  • save the reject records to a new database table - IMAGE_OCR_PHRASE_REJECT.  IMAGE_OCR_PHRASE_REJECT should have exactly the same structure as IMAGE_OCR_PHRASE.
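A minimal sketch of the overlap check and the keep/reject split, in Python 3 as the submission guidelines require. It assumes each phrase row has already been reduced to a hypothetical `Box` record (a row id, the image name, and X1/Y1/X2/Y2). Boxes are compared only within the same image, rectangles that merely share an edge are not treated as overlapping, and chains of clashes are resolved greedily by keeping the smallest boxes first. The challenge text does not say how chains (A overlaps B, B overlaps C) should be handled, so this greedy rule is one interpretation, not the required one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Box(NamedTuple):
    """One IMAGE_OCR_PHRASE row reduced to what the overlap check needs (hypothetical shape)."""
    row_id: int   # assumed primary-key value of the phrase row
    image: str    # image the phrase belongs to, e.g. IMAGE_OCR.NAME
    x1: float
    y1: float
    x2: float
    y2: float

    def norm(self) -> Tuple[float, float, float, float]:
        # Order the corners so (left, top) <= (right, bottom) whatever the input order.
        return (min(self.x1, self.x2), min(self.y1, self.y2),
                max(self.x1, self.x2), max(self.y1, self.y2))

    def area(self) -> float:
        left, top, right, bottom = self.norm()
        return (right - left) * (bottom - top)


def overlaps(a: Box, b: Box) -> bool:
    """True if the rectangles share interior area; touching edges do not count."""
    al, at, ar, ab = a.norm()
    bl, bt, br, bb = b.norm()
    return al < br and bl < ar and at < bb and bt < ab


def split_keep_reject(boxes: Iterable[Box]) -> Tuple[List[Box], List[Box]]:
    """Per image, visit boxes smallest first and reject any box that overlaps one already kept."""
    by_image: Dict[str, List[Box]] = defaultdict(list)
    for box in boxes:
        by_image[box.image].append(box)

    keep: List[Box] = []
    reject: List[Box] = []
    for image_boxes in by_image.values():
        kept_here: List[Box] = []
        # Sort by area, then by row id, so equal-area clashes are resolved deterministically.
        for box in sorted(image_boxes, key=lambda b: (b.area(), b.row_id)):
            if any(overlaps(box, k) for k in kept_here):
                reject.append(box)
            else:
                kept_here.append(box)
        keep.extend(kept_here)
    return keep, reject
```

For example, with boxes 1 at (0, 0, 10, 10), 2 at (5, 5, 8, 8) and 3 at (20, 20, 30, 30) on the same image, box 2 is kept because it is smaller, box 1 is rejected because it overlaps box 2, and box 3 is kept because it clashes with nothing. The kept and rejected row ids then decide which IMAGE_OCR_PHRASE rows are copied into which output table.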
Attached in the Document Forum is a MySQL dump file containing a set of images and the database structure. You will have to manually insert a few overlapping bounding boxes into the dataset to validate your output.
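To make that manual validation predictable, each inserted box can be derived from an existing phrase so the expected outcome is known in advance. The helpers below (`enclosing_box`, `inner_box`) are hypothetical and only compute coordinates; writing the new rows, with whatever other IMAGE_OCR_PHRASE columns the dump defines, is left to the tester. A box grown around an existing one overlaps it with a larger area and should end up in IMAGE_OCR_PHRASE_REJECT; a box shrunk inside it has a smaller area and should be kept, pushing the original into the reject table.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2), as in IMAGE_OCR_PHRASE


def _ordered(rect: Rect) -> Rect:
    # Normalise so that left <= right and top <= bottom.
    x1, y1, x2, y2 = rect
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def enclosing_box(rect: Rect, margin: float) -> Rect:
    """Grow `rect` by `margin` on every side: overlaps it, larger area, so it should be rejected.

    Near the image border this can produce negative coordinates; pick a margin that fits.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    left, top, right, bottom = _ordered(rect)
    return (left - margin, top - margin, right + margin, bottom + margin)


def inner_box(rect: Rect, margin: float) -> Rect:
    """Shrink `rect` by `margin` on every side: overlaps it, smaller area, so it should be kept."""
    left, top, right, bottom = _ordered(rect)
    if margin <= 0 or 2 * margin >= min(right - left, bottom - top):
        raise ValueError("margin must be positive and less than half the box's width and height")
    return (left + margin, top + margin, right - margin, bottom - margin)
```

For an existing phrase at (100, 200, 300, 240) with a margin of 5, `enclosing_box` gives (95, 195, 305, 245) and `inner_box` gives (105, 205, 295, 235). If both are inserted, only the inner box should survive in the keep table.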

Final Submission Guidelines

  • Python 3 script
  • Deployment instructions and the required installation modules (pip install ...).
  • The mysqldump file of your output after running your app, including the manually inserted bounding boxes and the structure and data of the IMAGE_OCR_PHRASE_KEEP and IMAGE_OCR_PHRASE_REJECT tables.
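Because IMAGE_OCR_PHRASE_KEEP and IMAGE_OCR_PHRASE_REJECT must have exactly the same structure as IMAGE_OCR_PHRASE, one option in MySQL is `CREATE TABLE ... LIKE`, which copies the column definitions and indexes (though not foreign key definitions), followed by `INSERT ... SELECT` for the chosen rows. The sketch below is an assumption-laden outline, not a reference implementation: it builds the statements separately so they can be checked without a server, then runs them on a connection such as one from PyMySQL (`pip install PyMySQL`). The primary-key column name `ID` is hypothetical; check the real column in the supplied dump.

```python
from typing import Any, List, Sequence, Tuple

SOURCE_TABLE = "IMAGE_OCR_PHRASE"
KEEP_TABLE = "IMAGE_OCR_PHRASE_KEEP"
REJECT_TABLE = "IMAGE_OCR_PHRASE_REJECT"

Statement = Tuple[str, Tuple[Any, ...]]


def build_statements(keep_ids: Sequence[int], reject_ids: Sequence[int],
                     pk: str = "ID") -> List[Statement]:
    """Return (sql, params) pairs that recreate and fill the KEEP and REJECT tables.

    `pk` is a hypothetical primary-key column name. Placeholders use the %s
    style that PyMySQL expects.
    """
    statements: List[Statement] = []
    for table, ids in ((KEEP_TABLE, keep_ids), (REJECT_TABLE, reject_ids)):
        statements.append((f"DROP TABLE IF EXISTS `{table}`", ()))
        # CREATE TABLE ... LIKE gives an empty table with the source's columns and indexes.
        statements.append((f"CREATE TABLE `{table}` LIKE `{SOURCE_TABLE}`", ()))
        if ids:
            placeholders = ", ".join(["%s"] * len(ids))
            statements.append((
                f"INSERT INTO `{table}` SELECT * FROM `{SOURCE_TABLE}` "
                f"WHERE `{pk}` IN ({placeholders})",
                tuple(ids),
            ))
    return statements


def apply_statements(conn: Any, statements: List[Statement]) -> None:
    """Execute the statements on a connection whose cursor supports `with` (PyMySQL's does)."""
    with conn.cursor() as cur:
        for sql, params in statements:
            # Pass None for parameterless statements so no %-formatting is attempted.
            cur.execute(sql, params or None)
    conn.commit()
```

A run might look like `apply_statements(pymysql.connect(host="localhost", user="...", password="...", database="..."), build_statements(keep_ids, reject_ids))`, after which `mysqldump -u <user> -p <database> > output.sql` exports the database, including both new tables, for submission.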

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30059510