
    Challenge Overview

    The sponsor of this challenge is seeking to explore the capabilities of ML.net on a specific use case, explained below. While other approaches may be more effective for the use case itself, the purpose of this challenge is to solve it using the capabilities of ML.net. Native capabilities are more interesting to the Sponsor than custom approaches, so if you must replace a native feature with a custom one in order to meet the performance objectives, you’ll need to explain the analysis behind your choice.

    The use case for this contest is to predict data about buildings, based on what data is available for each building and on similar data available for other buildings in similar areas — all of it, of course, using ML.net.

    In particular, we seek to accurately predict the following four values:

    • Year Built
    • Construction
    • Number of Stories
    • Building Value

    Several features are available in the provided data, and may have varying levels of usefulness in making the final predictions. Note that the dataset may have unknown/missing values, and the predictors must be able to function (as well as possible) even when some data values are unknown.
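One way to satisfy the unknown-values requirement is to impute missing numeric fields before training. In ML.net itself this is covered natively by the ReplaceMissingValues transform; the sketch below only illustrates the idea (mean-imputation) in Python, and the field names are assumptions, not the actual dataset schema.

```python
# Illustrative sketch (not ML.net): mean-impute missing numeric fields so a
# predictor can still run when some values are unknown. ML.net provides this
# natively via its ReplaceMissingValues transform.

def impute_means(rows, numeric_keys):
    """Replace None in the listed numeric fields with the column mean."""
    means = {}
    for key in numeric_keys:
        known = [r[key] for r in rows if r[key] is not None]
        means[key] = sum(known) / len(known) if known else 0.0
    for r in rows:
        for key in numeric_keys:
            if r[key] is None:
                r[key] = means[key]
    return rows

# Hypothetical rows using field names from this brief:
data = [
    {"FootprintSqMtrs": 120.0, "NumberOfStories": 2},
    {"FootprintSqMtrs": None,  "NumberOfStories": 3},
    {"FootprintSqMtrs": 180.0, "NumberOfStories": None},
]
impute_means(data, ["FootprintSqMtrs", "NumberOfStories"])
print(data[1]["FootprintSqMtrs"])  # -> 150.0 (mean of 120 and 180)
```

Whatever imputation strategy is chosen, the write-up should note it, since it affects how the predictors behave on the hidden scoring dataset.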

    Feature Tips

    • InfrastructureId: Reference only; should not be used in the prediction.
    • Location Name: For context purposes only.
    • FootprintSqMtrs: The building footprint (i.e., the area of the building) in square metres.

    Feature Engineering Tips

    • Lat;Long: latitude and longitude, truncated to 3 decimal places and concatenated, can be used to group structures into a location area.
    • Occupancy: a single logical feature (i.e., OccupancyScheme, OccupancyCd, and OccupancyDesc together) describing the type of business.
    • Construction: a single logical feature (i.e., ConstructionScheme, ConstructionCode, and ConstructionDesc together) describing what the building is made of.
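The Lat;Long tip above can be sketched as a small helper that truncates each coordinate to 3 decimal places and joins them into a single grouping key. The function name and key format are illustrative assumptions; any consistent format works.

```python
# Illustrative sketch: coarse location key built by truncating latitude and
# longitude to 3 decimal places and concatenating them, per the tip above.

def location_key(lat, lon):
    """Truncate (not round) to 3 decimals and join into one grouping key."""
    trunc = lambda x: int(x * 1000) / 1000.0
    return f"{trunc(lat):.3f};{trunc(lon):.3f}"

print(location_key(45.42153, -75.69719))  # -> "45.421;-75.697"
```

Structures sharing the same key can then be treated as belonging to the same location area, e.g. for grouped statistics used as engineered features.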


    We are providing a data set you can use for testing your solution, training your models, and putting together your submission. Actual scoring of the accuracy will be based upon a different dataset not revealed during the contest. Download here.


    Final Submission Guidelines


    While getting good prediction results is important, it is not the sole focus of this contest. Overall ranking will be based 60% on accuracy and 40% on reviewer/client evaluation of the submitted solution and its accompanying documents explaining how to use it and how it works.

    Documentation must include the following, plus any additional information you feel may be useful:

    • How to run the code, with simple examples for clarity.
    • What models are used and, in rough terms, how they work, e.g., which features the model considers and any weighting schemes used.
    • Any approaches to the problem that were considered but ultimately abandoned.
    • Possible future approaches to consider as alternatives.

    Scoring for each of the numerical predictors will be based on calculating the RMSE (root mean squared error) of predicted values as compared against the ground truth for a single predictor. That RMSE will be compared against a baseline RMSE, which is calculated by “predicting” each data point as the average (i.e. mean) value based on the whole dataset.

    For the non-numerical categorical predictors (i.e. construction type), the equivalent RMSE values will be calculated by assuming a correct prediction has an error of 0, and an incorrect prediction an error of 1. For example, if a building is concrete, and the predictor returns “concrete”, that is an error of 0. If the predictor returns “steel”, that is an error of 1.

    The overall score from accuracy will weight the score for each of the predictors equally.



    To qualify for a prize, final deliverables must include all source code and documentation as described above. The predictors must also include at least a minimal interface or API for accessing, using, and testing them; whether this is implemented as an API or as a web interface is a choice left to the submitter.

    Reliability Rating and Bonus

    For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.


    Final Review:

    Community Review Board


    User Sign-Off