Challenge Overview
Problem Statement
Overview
A common feature of online products and services is the ability to leave a review. These reviews are commonly used by future (human) visitors as a means of assessing the expected quality. Of course, as reviews themselves are also made (presumably) by humans, there is some degree, possibly substantial, of subjectivity and personal preference in any kind of evaluation. Reviews typically encompass both a quantitative evaluation (e.g. a rating score or a number of stars) and a more freeform subjective portion.
In this challenge, we will attempt to perform sentiment analysis on the review comments which customers have left, and to determine how they correlate with the quantitative review score for a given seller. The means by which competitors attempt to make such correlations is left completely open.
For our data set, we will be evaluating a service that allows individuals to rent part of their property for short-term, temporary residence by visitors. Several data fields about each listing are provided, including textual descriptions of the offering. Also, importantly, the reviews which have previously been left are provided for each listing. For this challenge, there is an overall rating score for each listing (scored up to 100), as well as six sub-scores (scored up to 10) for the following categories: accuracy, cleanliness, checkin, communication, location, and value.
Special Requirements
For this challenge, competitors are required to use IBM Cloud / Watson Studio for part of their solution. The exact ways in which you use it are left to you. However, in order to be eligible for a prize, you should be prepared to explain in your write-up how those services were a part of your solution.
Data
There are three main files of concern:
Apart from the data provided as part of this competition, no external data sources should be used. While it is certainly possible that external sources of data could provide additional insights beyond what is provided, the goal here is to analyze based only upon the information which would be immediately and readily available to a potential customer looking to make a rental, which is what is provided in the data files.
Possible Approaches
Although by no means an exhaustive list of possible avenues for investigation, the following are areas where one may find some data for correlation:
Scoring
For each of the seven categories, the root mean squared error (RMSE) will be calculated. The seven RMSE values will be summed into a grand total. Because the primary rating spans a greater range of values, it will contribute the most to the overall total. Your score will then be computed as MAX(20 - TotalRMSE, 0), and scaled to 1000000. That is, only submissions with a total RMSE of less than 20 will get a positive score. (Note that this is not overly hard to achieve with a naive solution that makes the same prediction for all listings.)
Submission Requirements
During the course of the contest, it will only be necessary to submit a CSV as described in the scoring section. The "stub" Java code you submit will have a single method that returns the URL from which your CSV can be downloaded by the tester. (You can use the linked example CSV to confirm what a valid submission should look like. It is the naive approach described above.)
    public class ProductReviews {
        public String getUrl() {
            return "http://timk1980-001-site1.ctempurl.com/average_test.csv";
        }
    }
Following the competition, the top 5 submissions will be invited for final testing. The top submissions as a result of final testing will then need to set up an IBM Cloud VM (provided) with a working implementation of their code that is capable of producing the same results as were previously provided.
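As a rough illustration of the rule given under Scoring, the sketch below computes a per-category RMSE, sums the values across categories, and applies MAX(20 - TotalRMSE, 0). The class name ScoreSketch, the method names, and the array layout (one row per category, one column per listing) are hypothetical and chosen only for this example. The statement does not spell out what "scaled to 1000000" means; the sketch assumes a linear scaling in which a total RMSE of 0 maps to 1000000, and the official tester may differ.

```java
// Hypothetical sketch of the scoring rule; not the official tester.
class ScoreSketch {

    // Root mean squared error between predictions and true values for one category.
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / predicted.length);
    }

    // Sum of the per-category RMSE values.
    // Assumed layout: predicted[c][i] is the prediction for category c, listing i.
    static double totalRmse(double[][] predicted, double[][] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("category counts differ");
        }
        double total = 0.0;
        for (int c = 0; c < predicted.length; c++) {
            total += rmse(predicted[c], actual[c]);
        }
        return total;
    }

    // MAX(20 - TotalRMSE, 0), with an ASSUMED linear scaling so that 20 maps to 1000000.
    static double score(double totalRmse) {
        return Math.max(20.0 - totalRmse, 0.0) * (1_000_000.0 / 20.0);
    }

    public static void main(String[] args) {
        // Two categories, two listings each; a constant prediction of 2 everywhere.
        double[][] predicted = { {2.0, 2.0}, {2.0, 2.0} };
        double[][] actual = { {0.0, 4.0}, {2.0, 2.0} };
        double total = totalRmse(predicted, actual);
        System.out.println("Total RMSE: " + total + ", score: " + score(total));
    }
}
```

Under this reading, a submission whose RMSE values sum to 20 or more scores zero, which is why even a constant-prediction baseline is worth checking against the threshold before submitting.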
The top 5 winning submissions will be required to submit a write-up of their solution, documenting how the code works, any considerations that should be known to a future user wishing to run it, and some explanation of the overall approach and methodology. Keep in mind that using IBM Cloud / Watson for at least some portion of the solution remains an important requirement here.
Special Note
This match is valid towards the TCO18 Cognitive trip. You will be awarded points according to the criteria on the page: https://tco18.topcoder.com/win-a-trip/cognitive-community/
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.