Challenge Overview
A large amount of data has been, and continues to be, collected from the vehicles that ABC Corp uses to deliver cargo to customers. The data includes On-board Computer (OBC) alarms that record exceptional driving events such as excessive speed, speed changes, and tractor stability during operation. Existing OBC data is correlated with other information such as time of day, driver status, route details, cargo, and weather conditions to provide a broad spectrum of data related to ABC Corp deliveries.
Since accidents are extremely rare and since the ideal objective would be to prevent all accidents, OBC alarms (occurring on fewer than 5% of all trips) are considered an important factor in managing safety. Preparation for the current match assumes that the ability to use correlated data to anticipate and thus reduce OBC events will further increase the safety of trips.
Problem Statement
The current challenge will be successful when the community provides algorithmic solutions that, when run, can identify which trips in a dataset are most likely to involve alarms from on-board computers. These algorithms will ultimately provide input to the logistical planning system used by ABC Corp.
Overview of Data
source | pilot
dist | pilot2
cycles | pilot_exp
complexity | pilot_visits_prev
cargo | pilot_hours_prev
stops | pilot_duty_hrs_prev
start_month | pilot_dist_prev
start_day_of_month | route_risk_1
start_day_of_week | route_risk_2
start_time | weather
days | visibility
pilot | Risk_involved
The target variable, Risk_involved, is the aggregation of all OBC events. In the training data set it has the levels "n" and "r", which mean not risky and risky respectively. The training data set contains around 80K records. The test data set has around 42K records. You need to predict the value of the Risk_involved column in the test data file. Your output should be test.csv with the values for the Risk_involved column along with their probabilities, which will be used to compute the AUC from ROC curves.
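To make the scoring concrete, here is a minimal sketch of how an AUC can be computed from predicted probabilities using the rank-sum (Mann-Whitney) identity. It is not the organizers' scoring code. The function name `roc_auc` is hypothetical, and treating "r" as the positive class with the probability referring to "r" is an assumption that the challenge text does not state explicitly.

```python
def roc_auc(labels, probs, positive="r"):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Assumption: `probs` holds the predicted probability of the `positive`
    class ("r" = risky) for each record.
    """
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of records that share the same probability (ties).
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, lab in pairs if lab == positive)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one record of each class")

    pos_rank_sum = sum(r for r, (_, lab) in zip(ranks, pairs) if lab == positive)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example with made-up values, not challenge data.
print(roc_auc(["n", "n", "r", "r"], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the AUC depends only on how the records are ranked, the quality of the probability ordering matters more than the exact "n"/"r" cut-off. With alarms on fewer than 5% of trips, this makes it a more informative measure than plain accuracy.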
Here is one sample output record that is expected:
source | dist | cycles | complexity | cargo | stops | start_month | start_day_of_month
L04 | 267 | 1 | 14 | 5 | 2 | 10 | 20

start_day_of_week | start_time | days | pilot | pilot2 | pilot_exp | pilot_visits_prev
7 | 1632 | 0.33 | 17355 | 0 | 3 | 1

pilot_hours_prev | pilot_duty_hrs_prev | pilot_dist_prev | route_risk_1 | route_risk_2
17.6 | 13.1 | 942.9 | 97 | 209

weather | visibility | Risk_involved | Prob
2 | 8.466666667 | r | 0.52
Final Submission Guidelines
Your output should be test.csv with the values for the Risk_involved column along with their probabilities, which will be used to compute the AUC from ROC curves.
Upload the output CSV file to submit for this challenge.
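As a rough illustration of the expected file layout, the sketch below appends the Risk_involved and Prob columns to each test record and writes the result as CSV. The helper name `write_submission` is hypothetical. The 0.5 threshold and the reading of Prob as the probability of "r" are assumptions: they are consistent with the sample record (r, 0.52) but are not stated in the challenge.

```python
import csv
import io


def write_submission(test_rows, probs, out_file, threshold=0.5):
    """Write test records with Risk_involved and Prob columns appended.

    Assumptions: `probs[i]` is the predicted probability that record i is
    risky ("r"), and records at or above `threshold` are labelled "r".
    """
    # Keep the original columns, dropping any placeholder target columns.
    base = [k for k in test_rows[0] if k not in ("Risk_involved", "Prob")]
    fieldnames = base + ["Risk_involved", "Prob"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, p in zip(test_rows, probs):
        label = "r" if p >= threshold else "n"
        writer.writerow({**row, "Risk_involved": label, "Prob": p})


# Demonstration on an in-memory buffer with two abbreviated, made-up records.
buf = io.StringIO()
write_submission(
    [{"source": "L04", "dist": "267"}, {"source": "L05", "dist": "10"}],
    [0.52, 0.1],
    buf,
)
print(buf.getvalue())
```

For a real submission, the same function could be given a file opened with `open("test.csv", "w", newline="")` and the full set of test columns.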