The sponsor of this challenge wants to explore the capabilities of ML.NET on a specific use case, explained below. There may be more effective ways to solve the use case itself, but the purpose of this challenge is to solve it using the capabilities of ML.NET. Native capabilities are more interesting to the sponsor than custom approaches, so if you must replace a native feature with a custom one in order to meet the performance objectives, you’ll need to explain the analysis behind your choice.
The use case for this contest is to predict data about buildings, based on whatever data is available for each building and on similar data available for other buildings in similar areas. The predictions must be made with ML.NET.
In particular, we seek to accurately predict the following values:
- Year Built
- Number of Stories
- Building Value
Several features are available in the provided data, and may have varying levels of usefulness in making the final predictions. Note that the dataset may have unknown/missing values, and the predictors must be able to function (as well as possible) even when some data values are unknown.
- InfrastructureId: Reference only. Should not be used in the Prediction.
- Location Name: For context purposes only.
- FootprintSqMtrs: The building footprint (i.e., the area of the building) in square metres.
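The requirement that predictors keep working when values are unknown can be illustrated with a small, language-neutral sketch. It is written in Python purely for illustration; an actual submission would do the equivalent with ML.NET, which provides its own missing-value transforms. The helper name `impute_with_indicator`, the choice of mean imputation, and the sample footprint values are all assumptions made for this example, not part of the provided data or of any required approach.

```python
import math


def is_missing(value):
    """Treat None and NaN as unknown values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_with_indicator(values):
    """Fill unknown entries with the mean of the known ones.

    Returns the filled list plus a parallel list of flags, so a model can
    still learn from the fact that a value was originally missing.
    """
    known = [v for v in values if not is_missing(v)]
    fill = sum(known) / len(known) if known else 0.0
    filled = [fill if is_missing(v) else v for v in values]
    flags = [is_missing(v) for v in values]
    return filled, flags


# Hypothetical FootprintSqMtrs column with two unknown entries.
footprints = [120.0, None, 300.0, float("nan"), 180.0]
filled, flags = impute_with_indicator(footprints)
```

Keeping the flags alongside the filled values is one possible design choice: it lets the predictor distinguish a genuine average-sized footprint from one that was simply unknown.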
Feature Engineering Tips
- Lat;Long: The two values, each limited to 3 decimal places and then concatenated, could be used to group structures into a location area.
- Occupancy: A single feature (i.e., OccupancyScheme, OccupancyCd and OccupancyDesc taken together) that describes the type of business.
- Construction: A single feature (i.e., ConstructionScheme, ConstructionCode and ConstructionDesc taken together) that describes what the building is made of.
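As a rough sketch of the tips above, the following shows one way to build a location key from Lat and Long and to fold a scheme/code pair into one categorical value. It is written in Python only to illustrate the idea; a submission would express the same steps in ML.NET. The function names, the dictionary field names `Lat` and `Long`, the `"unknown"` placeholder, the use of rounding (rather than truncation) to 3 decimals, and every sample value are assumptions made for this example.

```python
from collections import defaultdict


def location_key(lat, lon, decimals=3):
    """Round both coordinates and join them into one grouping key."""
    if lat is None or lon is None:
        return "unknown"
    return f"{lat:.{decimals}f};{lon:.{decimals}f}"


def combined_feature(scheme, code):
    """Merge a scheme and a code into a single categorical value."""
    if scheme is None or code is None:
        return "unknown"
    return f"{scheme}:{code}"


# Made-up rows; the scheme and code values are placeholders.
buildings = [
    {"Lat": 43.65321, "Long": -79.38318, "OccupancyScheme": "SCHEME_A", "OccupancyCd": "10"},
    {"Lat": 43.65349, "Long": -79.38342, "OccupancyScheme": "SCHEME_A", "OccupancyCd": "12"},
    {"Lat": 45.50170, "Long": -73.56730, "OccupancyScheme": None, "OccupancyCd": None},
]

# Group row indices by location key so nearby structures share a bucket.
groups = defaultdict(list)
for index, row in enumerate(buildings):
    groups[location_key(row["Lat"], row["Long"])].append(index)

occupancy = [
    combined_feature(row["OccupancyScheme"], row["OccupancyCd"]) for row in buildings
]
```

In this sketch the first two rows fall into the same location bucket because their coordinates agree to 3 decimals, which is the grouping effect the Lat;Long tip describes. The same `combined_feature` helper could be applied to the Construction fields.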
We are providing a data set you can use for testing your solution, training your models, and putting together your submission. Actual scoring of the accuracy will be based upon a different dataset not revealed during the contest. Download here.