Challenge Overview

Project Overview

We are launching a series of fun challenges intended to help members learn new technologies and the Topcoder platform. This week's challenge is about data science.

Important:

This is a fun and learning challenge. No prizes will be awarded for completing the challenge.

Overview

A large amount of data has been, and continues to be, collected from the vehicles that ABC Corp uses to deliver cargo to customers. The data includes On-board Computer (OBC) alarms that record exceptional driving events such as excessive speed, speed changes, and tractor stability during operation. Existing OBC data is correlated with other information such as time of day, driver status, route details, cargo, and weather conditions to provide a broad spectrum of data related to ABC Corp deliveries.

Since accidents are extremely rare and since the ideal objective would be to prevent all accidents, OBC alarms (occurring on fewer than 5% of all trips) are considered an important factor in managing safety. Preparation for the current match assumes that the ability to use correlated data to anticipate and thus reduce OBC events will further increase the safety of trips.

Problem Statement

The current challenge will be successful when the community provides algorithmic solutions that, when run, can identify the top 500 trips in a data set that are most likely to involve alarms from on-board computers. These algorithms will ultimately provide input to the logistical planning system used by ABC Corp.

Overview of Data

source		pilot
dist		pilot2
cycles		pilot_exp
complexity		pilot_visits_prev
cargo		pilot_hours_prev
stops		pilot_duty_hrs_prev
start_month		pilot_dist_prev
start_day_of_month		route_risk_1
start_day_of_week		route_risk_2
start_time		weather
days		visibility
pilot		Risk_involved

The target variable Risk_involved is the aggregation of all OBC events. In the training data set it has the levels “n” and “r”, which mean not risky and risky respectively. The training data set contains around 80K records, and the test data set has around 42K records. You need to find the top 500 trips that are most likely to be risky, i.e. your submission file should have 500 records. Your output should be in the following format.

Here is one sample output record that is expected:

source	dist	cycles	complexity	cargo	stops	start_month	start_day_of_month
L04	267	1	14	5	2	10	20

start_day_of_week	start_time	days	pilot	pilot2	pilot_exp	pilot_visits_prev
7	1632	0.33	17355	0	3	1

pilot_hours_prev	pilot_duty_hrs_prev	pilot_dist_prev	route_risk_1	route_risk_2
17.6	13.1	942.9	97	209

weather	visibility	Risk_involved	Prob
2	8.466666667	r	0.52
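As an illustration of how the 500 records could be chosen, here is a minimal sketch. It assumes each test trip has already been scored by some model and is held as a dictionary whose Prob field is the predicted probability of “r”. The function name select_top_trips is hypothetical, and labelling every selected trip “r” is an assumption, not a stated rule of the challenge.

```python
def select_top_trips(rows, k=500, prob_field="Prob"):
    """Return the k trips with the highest predicted probability of being risky.

    rows is a list of dicts, one per test trip, each holding a predicted
    probability under prob_field (assumed to come from your own model).
    """
    # Sort by probability, highest first; ties keep their original order.
    ranked = sorted(rows, key=lambda row: float(row[prob_field]), reverse=True)
    # Copy each selected row and mark it risky (assumption: submitted trips
    # are the ones predicted as "r"); the input rows are left unchanged.
    return [dict(row, Risk_involved="r") for row in ranked[:k]]
```

The returned rows could then be written out with csv.DictWriter from the standard library, using the column names listed in the data overview plus Prob.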

Submissions will be evaluated on the score 100 * precision over the 500 submitted trips, where precision is the fraction of those trips that are actually risky.
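The score can be estimated locally on a held-out part of the training data. The sketch below is not the official scorer; it assumes you have the true Risk_involved labels of the trips you would submit, and the function name precision_score is hypothetical.

```python
def precision_score(true_labels, k=500, positive="r"):
    """Return 100 * precision over the first k submitted trips.

    true_labels holds the actual labels ("r" or "n") of the submitted
    trips, in submission order.
    """
    submitted = true_labels[:k]
    if not submitted:
        return 0.0
    # Count how many submitted trips really were risky.
    hits = sum(1 for label in submitted if label == positive)
    return 100.0 * hits / len(submitted)
```

For example, if 3 of 4 submitted trips are truly risky, the function returns 75.0.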

Final Submission Guidelines

Your output should be the test.csv file with the values for the Risk_involved column along with their probabilities, which would be used to generate the AUC from ROC curves.
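To check probabilities before submitting, the area under the ROC curve can be computed on validation data. The sketch below uses the standard rank-based (Mann-Whitney) formulation with average ranks for ties; it is an illustration, not the challenge's own scoring code, and the function name roc_auc is hypothetical.

```python
def roc_auc(labels, probs, positive="r"):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    i = 0
    while i < len(order):
        # Find the run of tied probabilities starting at position i.
        j = i
        while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        # Every member of a tied run gets the average 1-based rank.
        avg_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, label in zip(ranks, labels) if label == positive)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A value of 0.5 corresponds to random ranking and 1.0 to every risky trip being ranked above every non-risky trip.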

Upload the output CSV file to submit for this challenge.

Fun Series - Data Science - Trip Safety

ID: 30051002