
    Quartz Energy Mud Log Image Tagging Challenge

    PRIZES

    1st

    $2,000

    The challenge is finished.

    Challenge Overview

    According to Wikipedia, “Mud logging is the creation of a detailed record (well log) of a borehole by examining the cuttings of rock brought to the surface by the circulating drilling medium (most commonly drilling mud).”  Quartz Energy has provided Topcoder with a set of mud logs, and we’re planning to develop an application to extract meaning from these records.  The documents are very interesting -- they are even oil-well shaped!  You can read more details about them here.  We have a set of 600 mud log images that we need to mark in order to validate the efficacy of our OCR extraction processes.

    Here is what needs to be done.

    1. Download the Mud Log Image Tagging application from here.

    2. Follow the README.md instructions to deploy the tagging application locally.  You will need to install a local MySQL 5.7.x database.  The commands to create the database are listed in the README.md.
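    The README.md is the authoritative source for the setup commands.  As a rough sketch, and assuming a default local install, creating the database on MySQL 5.7.x usually looks like the following (the database name image_tagging matches the mysqldump command later on this page; the schema file name below is a placeholder, not the real file):

```shell
# Create the database the tagging application expects (requires a running
# local MySQL 5.7.x server; you will be prompted for the root password).
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS image_tagging;"

# Then load the application's schema using the script shipped with the app --
# the file name here is a placeholder; use the one named in the README.md:
# mysql -u root -p image_tagging < schema.sql
```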

    3. Here is a list of images to mark.  Claim no more than 20 images at a time by putting your Topcoder handle into the “Assigned Member” column of the Mud Log document.  After you’ve finished marking those 20 images and submitted your results, you may claim the next 20.  The images can be found in the following S3 folder:  https://s3.amazonaws.com/mud-log-images/

    4. Review the image that you are planning to mark.  We are looking for certain phrases which indicate that hydrocarbons might be present in a drilling operation.  There are four types of phrases that we’re looking for:  Show, Stain, Trace, and Negative.  Here is a document which outlines the exact phrases to find.  Some documents have hundreds of phrases; others have none.

    5. Using the tool, put a bounding box around each phrase that you identify.  Double-clicking on the bounding box will bring up a dialog box:

     
    6. Update the OCR Phrase Type to the appropriate type and enter the OCR Phrase.  Then click “Apply”.  Your entry will be recorded in the database.

    7. The simplest way to send us an export of your database is to use MySQL’s mysqldump command.  Run it from the command line, not from inside a mysql prompt.  Appending your handle to the mysqldump output file name will help us keep the files straight:

    $ mysqldump -u root -p --databases image_tagging > image_tagging_<your_tc_handle>.sql
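    Before uploading a dump, it is worth a quick sanity check that it actually contains your annotation rows.  The sketch below stands in a tiny fake dump file for real mysqldump output (the table name ocr_phrase and the file name are made up for illustration) and simply counts INSERT statements:

```shell
# Stand-in for a real mysqldump output file -- two fake INSERT statements.
printf 'INSERT INTO ocr_phrase VALUES (1);\nINSERT INTO ocr_phrase VALUES (2);\n' > image_tagging_example.sql

# Rough proxy for "did my annotations make it into the dump?"
grep -c 'INSERT INTO' image_tagging_example.sql    # prints: 2
```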

    Additional Information

    • We’ll pay $0.10 (ten cents) for each accepted phrase found.  We estimate that there are about 15,000-20,000 phrases in this data set.  There is really no first prize for this contest -- the challenge will continue until all of our images have been processed.  You'll be paid whether you mark one image or 100 images.  Some of the files have hundreds of phrases and some files don't have any.

    • Don’t mark phrases that break across lines.  For example, if the phrase “No Stn” occurs across two lines (“No” on the first line and “Stn” on the second), just mark the second word of the phrase, “Stn”, which is a keyword on its own.

    • The phrases are case-insensitive.  Either case is fine.  (e.g., “SHOW” is the same as “show” or “Show”.)

    • If you can’t see a phrase clearly because the text is blurry, don’t mark it.

    • Please make the bounding boxes as tight as possible to the words without obscuring them.

    • Please try to mark all the visible phrases on a particular image.  If no phrases are found in a particular document, just enter a zero in the phrase count column.  There is also a column where you can add comments about file quality if it is extremely poor.
    • We have already identified phrases in some of the images.  Here is a document which lists a number of phrases in the images which have already been identified.  It only covers 100 of the 600 images, but you can use this resource to find phrases in that subset of the documents.  The phrases in it are identified by general depth (which is more approximate) rather than by X, Y coordinates, as in our new system.  Warning: the phrase listings have changed slightly -- we’ve added phrases like “No Cut or Flor” to the Negative phrasing list.
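    As a quick illustration of the case-insensitive matching rule above, the sketch below runs grep -i over a few made-up OCR lines (both the sample text and the short keyword list are illustrative only -- the official phrase list is in the linked document):

```shell
# A few made-up OCR lines; the real phrases come from the phrase document.
printf 'Good SHOW of oil\nNo Stn\ndull yellow stain\ntrace cut flor\n' > sample_ocr.txt

# -i = case-insensitive, -n = print line numbers, -E = extended regex.
grep -inE 'show|stain|stn|trace' sample_ocr.txt
```

    All four sample lines match regardless of letter case.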

    Final Submission Guidelines

    1.  Update the Mud Log Image List Google Sheet with the images you are planning to mark.
    2.  Once you've marked a set of images, upload your mysqldump output file to Dropbox or Google Drive and add the link to the folder.
    3.  Please update the phrase count column and file quality notes in the Google Sheet for the files you have processed.  This will help us make payments more easily.
    4.  Challenge administration will validate that you've highlighted legitimate phrases in the documents.
    5.  If you mark any images, submit a blank text file in your submission.zip for this challenge.  This also makes payments more straightforward.

    Reliability Rating and Bonus

    For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.
    Read more.

    ELIGIBLE EVENTS:

    2018 TopCoder(R) Open

    REVIEW STYLE:

    Final Review:

    Community Review Board

    Approval:

    User Sign-Off

    CHALLENGE LINKS:

    Review Scorecard