
Challenge Overview

What are we doing here?
Our client needs to analyze and process a set of LAS files on a daily and weekly basis.
 
What is a LAS File?
A LAS file is a fixed-width text dataset file with multiple sections.  It contains header data about the well in question, the operator, the logging company, and many other attributes, as well as information about the types of logging data recorded in the file.  There is also an instrumentation data section of the document, which lists depth-registered instrumentation data.
 
A much better and more detailed explanation is available here -
http://www.cwls.org/las/
 
Where do I need to look in the LAS file for this project?
As part of this project we will be looking at three parts of the LAS file:
  • Sections
  • Attributes
  • Metadata & Value
 
To explain a bit more, here are some details -
 
Section
Anything starting with “~” is a section name, and the subsequent data is part of that section.

Attribute Name
It appears at the extreme left of each and every field.
e.g. LNAM is an attribute.


Metadata & Value
For each attribute, the corresponding data appears on the right-hand side together with a metadata name.
For example:
            Metadata:         NAME
            Value:            AIT/HILT/BHC
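
To make this layout concrete, here is a small, illustrative Python sketch (not part of the provided materials) that pulls section names, attribute mnemonics, values, and descriptions out of an LAS header.  The sample text and the MNEM.UNIT VALUE : DESCRIPTION line layout follow the CWLS LAS 2.0 convention; real files vary, so treat this only as a starting point.

SAMPLE = """\
~Well Information Block
 SRVC.            SCHLUMBERGER    : SERVICE COMPANY
 LNAM.            AIT/HILT/BHC    : NAME
~Curve Information Block
 DEPT.M                           : DEPTH
 GR  .GAPI                        : GAMMA RAY
"""

def parse_las_header(text):
    """Return {section name: {mnemonic: (unit, value, description)}}."""
    sections, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                                   # skip blank and comment lines
        if stripped.startswith("~"):                   # "~" marks the start of a section
            current = stripped[1:].strip()
            if current.upper().startswith("A"):        # ~ASCII data section: header ends here
                break
            sections[current] = {}
            continue
        if current is None or "." not in line:
            continue
        mnem, _, rest = line.partition(".")            # attribute mnemonic sits at the far left
        unit_value, _, desc = rest.rpartition(":")     # description follows the last colon
        if not unit_value or unit_value[:1].isspace(): # no unit directly after the dot
            unit, value = "", unit_value.strip()
        else:
            unit, _, value = unit_value.partition(" ")
        sections[current][mnem.strip()] = (unit, value.strip(), desc.strip())
    return sections

print(parse_las_header(SAMPLE)["Well Information Block"]["LNAM"])
# -> ('', 'AIT/HILT/BHC', 'NAME')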


 
Relevant attributes as part of this project -
There are two sets of attributes we are interested in predicting using machine learning:
 
Standard attributes (typically found in a standard LAS file from a vendor)
  • SRVC (service company)
  • SVCO (also service company)
Non-standard attributes (attributes we would like to define to make searching and querying our LAS database easier for end users)
  • LNAM – Log name
  • LACT – Log activity
  • DSRC – Digit source
  • PLVL – Processing Level
  • FTOL – Full Toolstring
  • CASE – Casedhole Flag
  • GTOL – Generic Tool String 
The existing non-standard attributes have been populated in and exported from an existing internal database.  The purpose of adding them to the LAS files was to simplify population of the metadata in a new database.  If this information is populated in an LAS file, it should be considered part of the training set. 
 
The attributes to be predicted will most likely be determined by associations of attributes in the various sections, especially the ~C section. 
                       
Which Attributes are part of this challenge?
SRVC and LNAM
 
What is the SRVC attribute?
The SRVC attribute value is typically the company that collected the data and generated the LAS data file.  Many of the larger service companies use unique curve attributes in their files which can be predictive of both the vendor and the tool that was used to collect the data.  In other cases the curves may not be predictive but other attributes in the header information might point to certain vendors working in a certain area during a certain time period.
 
Things to consider in the SRVC attribute
 
SRVC may have the following issues:
  1. It can be missing from the LAS file.
  2. It can have a different kind of value, e.g. just a number (when it is a number, it is a defined number from a standards group).
  3. It can contain spelling mistakes.
How will you know whether the SRVC mentioned is the right SRVC or not?

There are a few steps to the process.
 
Spelling Mistakes & Other Values
A set of Logging Contractor Aliases mapping alternate spellings to the corresponding companies is provided (see Logging Contractor Alias.xlsx in the Challenge Input).  Just remember this is not an exhaustive list; the values found in files can differ much more.  The algorithm should be able to identify the company based on the data provided in the file.  In this way you should be able to make a clear judgment on the right SRVC.
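
As an illustration only, here is one way such alias handling could look in Python, using the standard-library difflib module for fuzzy matching.  The alias_map contents below are hypothetical examples, not the contents of the provided Logging Contractor Alias.xlsx.

import difflib

# Hypothetical alias table; in practice this would be loaded from
# Logging Contractor Alias.xlsx.
alias_map = {
    "SCHLUMBERGER": "SCHLUMBERGER",
    "SLB": "SCHLUMBERGER",
    "HALLIBURTON": "HALLIBURTON",
    "HALIBURTON": "HALLIBURTON",        # common misspelling
}

def normalize_srvc(raw, cutoff=0.8):
    """Return a canonical service-company name, or None if no confident match."""
    if raw is None or not raw.strip():
        return None                      # SRVC missing from the LAS file
    key = raw.strip().upper()
    if key in alias_map:
        return alias_map[key]
    # fall back to fuzzy matching to absorb spelling mistakes
    close = difflib.get_close_matches(key, alias_map.keys(), n=1, cutoff=cutoff)
    return alias_map[close[0]] if close else None

print(normalize_srvc("Schlumbergr"))     # -> SCHLUMBERGER (fuzzy match on a misspelling)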

Non-standard attributes
The non-standard attributes are not typically found in LAS files.  These attributes have been created and populated in an existing database using traditional logic statements in order to mass populate them.  These generated attributes are then inserted into an LAS file export in order to load the attributes into a new database.  These additional attributes will be used to simplify querying the new database using advanced search capabilities.  We would like to use a machine learning algorithm to predict these same attributes as new data is delivered into the database and improve existing metadata within the database. 
  
LNAM – “Log Name” is created by determining the combination of tools used to collect the data.  Tools are currently determined by knowing the SVCO and certain combinations of mnemonics found in the ~C Curve Information section.  Reference tables are available for some vendors.
 
What are we predicting for LNAM and why?
The LNAM attribute is a non-standard field in an LAS file, but for downstream processing the client needs it present in every file.  The client's current system is not able to generate it with good accuracy.  If the LNAM value is already present in a file you can ignore it; the values in the training file should override these values.  The client is hoping to automate the process of generating the LNAM values for each file, replacing the current manual process.  Ultimately this application will be processing thousands of files.
  
How to predict LNAM?
The LNAM value depends on the curves data and the SRVC attribute.  Each logging vendor uses a different set of LNAM labels for its LAS files.  The curves data is found under the header “~Curve Information Block”.  The algorithm should parse the curve information and learn from the LNAM values present in the training set, so that it can predict the LNAM attribute in files where it is missing.  A rough sketch of one possible approach follows.
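
For illustration only (not a required design): treat the curve mnemonics from the ~Curve Information Block, plus the normalized SRVC, as a bag of tokens and learn the LNAM labels from the training files with scikit-learn.  This sketch reuses the parse_las_header and normalize_srvc snippets shown earlier, and the toy training rows exist only to make it runnable.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def file_to_tokens(sections):
    """Turn one parsed LAS header into a token string for the vectorizer.
    Assumes the "~Well Information Block" / "~Curve Information Block" section names."""
    curves = sections.get("Curve Information Block", {})
    srvc = sections.get("Well Information Block", {}).get("SRVC", ("", "", ""))[1]
    return " ".join(list(curves) + ["SRVC_" + (normalize_srvc(srvc) or "UNKNOWN")])

# Toy, made-up rows purely to make the sketch runnable; the real inputs are the
# token strings for the training LAS files and the LNAM labels from training.csv.
train_docs = ["DEPT GR NPHI RHOB SRVC_SCHLUMBERGER", "DEPT DT CALI SRVC_HALLIBURTON"]
train_lnam = ["AIT/HILT/BHC", "BHC"]

model = make_pipeline(CountVectorizer(token_pattern=r"[^ ]+"), MultinomialNB())
model.fit(train_docs, train_lnam)
print(model.predict(["DEPT GR RHOB SRVC_SCHLUMBERGER"]))   # -> ['AIT/HILT/BHC']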
 

Challenge Input
You'll be provided the following data sources, which can be found in the Code Documents section of the forums:
  1.  Training set of LAS files.
  2.  training.csv, which lists UWI numbers (well identifiers) and the corresponding LNAM values (a small loading sketch follows this list).
  3.  Testing set of LAS files.
  4.  Logging Contractor Alias.xlsx.
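
As a small, hedged example, training.csv could be loaded as below, assuming it follows the two-column UWI / LNAM Value layout described under the prediction format section.

import csv

def load_training_labels(path="training.csv"):
    """Return {UWI: LNAM value} from the provided training.csv."""
    labels = {}
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                         # skip the header row
        for row in reader:
            if len(row) >= 2:
                labels[row[0].strip()] = row[1].strip()
    return labels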


Final Submission Guidelines

Technology Stack
Python 3
(Any Open Source Machine Learning framework can be used)

Submission Guidelines
You should provide the following in your submission .zip file:
  1.  Your source code
  2.  Dependency management and build scripts (pip install, etc.)
  3.  Documentation - README.md 
  4.  testing.csv - a file with your predictions for the LNAM assignments in the testing set of LAS files.

What should be the prediction format?
As part of this challenge you should provide a testing.csv file as output that matches the format of training.csv.  Please include the following header in your file:
UWI,  LNAM Value
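
For example, the output could be written with the standard csv module; the UWI and LNAM values below are hypothetical placeholders.

import csv

def write_predictions(rows, path="testing.csv"):
    """rows: iterable of (uwi, lnam) pairs, written in the required format."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["UWI", "LNAM Value"])   # header matching training.csv
        writer.writerows(rows)

write_predictions([("100123456789", "AIT/HILT/BHC")])   # hypothetical UWI/LNAM pair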

Evaluation Guidelines
  1.  The solutions will be ranked in order of the accuracy of their predictions on the testing set of LAS files.  We'll publish our test harness in the forums so you can use it to evaluate your own solution against the training data.
  2.  Your solution must be flexible enough to accommodate new training and testing files.  The client will ultimately be using this solution against a broader data set and will need to retrain from time to time.

 

 

ELIGIBLE EVENTS: Topcoder Open 2019

REVIEW STYLE:
Final Review: Community Review Board
Approval: User Sign-Off
