Topcoder Challenge | Topcoder Community

Challenge Overview

Welcome to the Fault Detection in a 3D Seismic Volume Marathon Match! Our client has interests in geophysics research, and it approached Topcoder with the following problem related to automated analysis and interpretation of 3D Seismic data.

Reflection seismology allows us to image the inner structure of Earth’s surface layer by sending down test seismic waves, and recording their “echo”, i.e. their reflection from heterogeneities inside the ground. One important type of heterogeneity is called a fault, which is defined as a planar fracture or discontinuity in a volume of rock across which there has been significant displacement as a result of rock-mass movement. The location of planar faults is an important topic in geophysics and geology because faults with high stress within them are the cause of most earthquakes. The successful interpretation (extraction) of these faults is a critical step in subsurface exploration. Typically the faults planes are extracted via a manual process where a human interpreter “draws” fault sticks on a series of slices visualized in the volume. These sticks are then images together as a triangulated plane (see later images). This is an extremely time-consuming & manually intensive process.

Today there are various methods to process the input seismic volume & generate an output that highlights the faults making them easier to visualize. One such process generates a 3D Fault Likelihood volume. In a nutshell it is a 3D grid, with two coordinates corresponding to the geographical coordinates on the Earth surface, and the third coordinate corresponding to the depth inside the ground. The value associated with each grid node specifies the estimated likelihood (probability) that the node lies on a geological fault, or it is located close to it. The problem our client looks to solve is to develop an algorithm able to, given a 3D Fault Likelihood volume, automatically extract fault planes contained within that volume. See the following images to better understand the problem.

Figure 1. The raw 3D seismic amplitude volume displays how well seismic waves are reflected inside different points of an Earth’s surface and images its structure.

Figure 2. A 3D Fault Likelihood volume calculated from the raw 3D seismic data displays at each point the likelihood that the point lies at a fault. Bright colors (most of the volume) represent low probability, and the dark colors represent high probability of a fault. Seen together these points image fault planes found in the original data. Fault likelihood volumes are the input to the problem we want solved.
As you see from the illustration, most of the dark points are arranged into planes cutting through the surface slab, and possibly intersecting each other. The goal of this project is to come up with an algorithm that automatically recognizes fault planes.

Figure 3. Manual extraction of fault planes. An expert visually examines cross-sections of 3D Seismic (as in this illustration) or a Fault Likelihood volume, and marks by “fault sticks” the lines where a fault crosses the cross-section. The fault plane is then triangulated based on the positions of corresponding sticks in a series of cross sections.

Recently we ran the ideation challenge about this problem. In this MM forum you will find three winning solutions from that challenge, which you may use as the starting point of your work (this is not obligatory though).

Also in the match forum you will find the local scorer code, training inputs and ground truths, and an example submission (more details below). In addition you may find useful the following resources:

https://github.com/equinor/segyio - C library for manipulation of SEG-Y files (the format used for 3D Likelihood Volumes, i.e. inputs, in this match), with language bindings for Python and Matlab.
SEG-Y format reference
https://cultpenguin.gitbooks.io/segymat/content/ - Matlab and Octave library for SEG-Y manipulation.
https://www.dgbes.com/index.php/download - Free software for interpretation of seismic data.
Also https://github.com/dhale/ipf contains sample programs for manipulation and processing of seismic data; and two articles covering existing approaches to the problem are shared in the challenge forum.

Inputs and Outputs

The input of your algorithm will be a set of SEG-Y files, each containing a single 3D Fault Likelihood volume. For each input SEG-Y file you will create a folder of the same name and output there a number of 3D meshes describing fault planes detected in the corresponding SEG-Y file, one mesh per output file.

The mesh file format is a text document with a list of vertex coordinates, followed by a list of triangular facets. Vertex and facet indices are 1-based, and corresponding lines are prefixed by VRTX n and TRGL labels, e.g. the following mesh describes a unit cube in XY plane, formed by two triangular facets:

VRTX 1 0.0 0.0 0.0
VRTX 2 1.0 0.0 0.0
VRTX 3 1.0 1.0 0.0
VRTX 4 0.0 1.0 0.0
TRGL 1 2 3
TRGL 1 3 4

The mesh files may contain any other lines (the training ground truth files do), and for the purposes of this match any lines beside vertex and triangle coordinates are ignored.

All input files have been generated randomly. Within some reasonable limits the generator selected the size of 3D volume, the number of “planes” to drop into the volume, the size and the curvature of each plane. It dropped these planes into the volume, and then transformed the resulting structure into an artificial 3D Fault Likelihood volume assigning probabilities ~1.0 to the grid points located on and near the planes, and ~0.0 probabilities to all other grid points. It also output the original planes as the ground truth meshes.

The important point is, although we call the fault planes as planes, they don’t have to be planar on the global scale! They are planar locally, but may bend on a larger scale. See images for examples. Another important point is that planes may, and will often intersect each other, and your algorithm is expected to nevertheless trace and segregate individual planes, and output them as separate mesh files! The following images show a randomly generated dataset as a combination of generated fault planes, and also as a cross-section of the resulting SEG-Y file:

Figure 4. (left) a random test case, containing 16 fault planes with intersections between each other; (right) a cross section through the corresponding 3D Fault Likelihood volume (SEG-Y file), with red color corresponding to ~0.0 probability, and black-yellowish colors corresponding to ~1.0 probability.��

Submission Format

You will submit a dockerized version of your code, i.e. the source code along with Dockerfile, placed inside a folder named code, and ZIPed so that the code folder is located in the root of the resulting archive (example submission is provided in the challenge forum). Topcoder system will build it and run the resulting Docker container with input and output folders mounted as local paths inside the container, and their local path names passed in as two command line arguments, i.e. the container will be run like:

docker run <commands to mount the folders> image-name /input /output

The specified input folder inside the container will contain a set of input SEG-Y files, e.g.

/input
test01.sgy
test02.sgy
test03.sgy

For each input file you’ll create a folder with the same name inside the specified output path, and write into it detected fault plane meshes, i.e.:

/output
/test01
mesh01.ts
mesh02.ts
/test02
a-mesh.ts
b-mesh.ts
/test03
c-mesh.ts

It does not matter, how you call individual mesh files: the scorer will read all files in each folder, and assume that each of them contains a single mesh.

A few additional points to consider:

Online testing will be done on t2.medium EC2 AWS machines, with 50 Gb volume, and no GPU support.
Solution runtime will be limited by 60 minutes. We are open to increase it by a reasonable amount, if it is found beneficial. In general, this is the problem where accuracy of the results is more important than the computational time, thus the calculation speed is not a part of the scoring. However, we still need to restrict the runtime to keep the scoring practically feasible.

Scoring

Given a set of folders with meshes output by your solution, and the corresponding ground truth data, the scorer will evaluate your solution in the following way.

Overall Score. The overall solution score will be the average of scores from individual test cases (a test case means a separate test_case_name.sgy input file, the corresponding test_case_name folder with meshes traced by your solution, and the corresponding set of ground truth meshes), multiplied by 100 to be a number within [0; 100] range.
Test Case Score. Assume for any two meshes A, and B the similarity score s(A,B) is defined in such way that 0.0 <= s(A,B) <= 1.0, it equals 1.0 if, and only if, A and B describe exactly the same surface (even if its triangulation is different), and it becomes smaller when the corresponding surfaces, or their parts, are moved away of each other.

Assume as well that according to the ground truth there are N meshes {G₁, G_2, …, G_N} in the test case, and your solution found M >= N meshes {A₁, A₂, ... , A_M} (if M < N, we’ll switch the sets, as the scoring metric is indifferent to which of them is the ground truth, and which is the solution).

The scorer will select among all possible combinations N mesh pairs {A_i, G_j}, such that any mesh is present in at most a single pair, and the selection maximizes the average mean square score

Notice, we divide by M rather than by N, effectively saying that remaining M - N unpaired meshes, if any, contribute with zero scores into the average.
Surface Similarity Score. For a given pair of surfaces A and B let L_AB(r) be the Euclidean distance from the point r of surface A to the closest point of surface B. We define the similarity score as

where the integrals are taken over the surfaces A and B, and S_A, S_B are surface areas of A and B.
To compute the score defined above, the scorer numerically evaluates integrals inside s(A,B) by sampling random points on meshed surfaces. Sample points are distributed uniformly on mesh facets, thus the calculations are agnostic to the exact surface triangulation. The number of points on each surface is selected to achieve a target average area per point: a = 10^-3 * S_T for provisional testing during the match, and a = 10^-5 * S_T for the final testing. S_T is the total surface area of all meshes in the ground truth. Thus, score accuracy will be higher in the final testing, and we may further increase the final testing accuracy for selected solutions should we see that ranking may be affected by insufficient accuracy.
Also, by definition each mesh should describe a planar surface. The scorer will check for each mesh that: (i) any edge belongs to at most two facets (triangles); (ii) any facet shares at least one edge with another facet of the mesh. If any mesh in outputs from your solution fails this check the scorer will fail your solution as a whole.

Additional Match Rules

Once your solution is functional at TC platform (i.e. it does not crash right away, and runs successfully to get a valid score), please do not submit more often than once a day, to ensure high availability of scorer for everyone.
The prize eligibility threshold score is 25.0 out of 100.0.
To claim a prize, beside ranking at a prize placement in the final scoring, you are expected to provide within a week since the results announcement a detailed report explaining your solution, its implementation, and any other details relevant to use and further development.
Any 3-rd party components used in your solution should be available under permissive open-source licenses similar to MIT license. Should you wish to use something covered by more restrictive licenses, please contact the copilot for an approval.

Fault Detection in a 3D Seismic Volume

Challenge Overview

Inputs and Outputs

Submission Format

Scoring

Additional Match Rules

Learn

Challenge links

Toolbox

ID: 30120970