Fun Series - Data Science - Trip Safety

Key Information

The challenge is finished.

Challenge Overview

Overview

A large amount of data has been, and continues to be, collected from the vehicles that ABC Corp uses to deliver cargo to customers. The data includes On-board Computer (OBC) alarms that record exceptional driving events such as excessive speed, speed changes, and tractor stability during operation. Existing OBC data is correlated with other information such as time of day, driver status, route details, cargo, and weather conditions to provide a broad spectrum of data related to ABC Corp deliveries.

Since accidents are extremely rare, and since the ideal objective would be to prevent all accidents, OBC alarms (occurring on fewer than 5% of all trips) are considered an important factor in managing safety. This challenge assumes that using correlated data to anticipate, and thus reduce, OBC events will further increase the safety of trips.

Problem Statement

The current challenge will be successful when the community provides algorithmic solutions that, when run, can identify which trips in a dataset are most likely to involve alarms from on-board computers. These algorithms will ultimately provide input to the logistical planning system used by ABC Corp.

Overview of Data

source                 pilot2
dist                   pilot_exp
cycles                 pilot_visits_prev
complexity             pilot_hours_prev
cargo                  pilot_duty_hrs_prev
stops                  pilot_dist_prev
start_month            route_risk_1
start_day_of_month     route_risk_2
start_day_of_week      weather
start_time             visibility
days                   Risk_involved
pilot

The target variable, Risk_involved, is the aggregation of all OBC events. In the training data set it has the levels "n" and "r", which mean not risky and risky respectively. The training data set contains around 80K records. The test data set has around 42K records. You need to predict the value of the Risk_involved column in the test data file. Your output should be test.csv with the values for the Risk_involved column along with their probabilities, which will be used to compute the AUC from the ROC curve.
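To make the scoring idea concrete, here is a minimal sketch of how an AUC value can be computed from labels and predicted probabilities. It uses the rank interpretation of AUC: the chance that a randomly chosen risky trip receives a higher probability than a randomly chosen non-risky trip, with ties counted as half. The function name roc_auc and the five toy records are illustrative assumptions, as is the reading that the submitted probability is the probability of the "r" class. The challenge's actual scoring code is not described here.

```python
def roc_auc(labels, probs, positive="r"):
    """Rank-based AUC: fraction of (risky, non-risky) pairs ordered correctly."""
    pos = [p for y, p in zip(labels, probs) if y == positive]
    neg = [p for y, p in zip(labels, probs) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one trip of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # risky trip ranked above non-risky trip
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only (not taken from the challenge files).
labels = ["r", "n", "n", "r", "n"]
probs = [0.52, 0.10, 0.60, 0.80, 0.30]
auc = roc_auc(labels, probs)
print(round(auc, 4))
```

The pairwise loop is easy to read but quadratic in the number of records. For a file of around 42K rows, a sort-based computation or an existing library routine would likely be a better fit.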

Here is one sample output record that is expected:

source  dist  cycles  complexity  cargo  stops  start_month  start_day_of_month
L04     267   1       14          5      2      10           20

start_day_of_week  start_time  days  pilot  pilot2  pilot_exp  pilot_visits_prev
7                  1632        0.33  17355  0       3          1

pilot_hours_prev  pilot_duty_hrs_prev  pilot_dist_prev  route_risk_1  route_risk_2
17.6              13.1                 942.9            97            209

weather  visibility   Risk_involved  Prob
2        8.466666667  r              0.52



Final Submission Guidelines

Your output should be test.csv with the values for the Risk_involved column along with their probabilities, which will be used to compute the AUC from the ROC curve.

Upload the output CSV file to submit for this challenge.
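As a rough sketch of producing such a file, the snippet below copies the input columns and appends Risk_involved and Prob for each row. Several parts are assumptions rather than requirements stated by the challenge: the helper name write_submission, the 0.5 cut-off for labelling a trip "r", the four-decimal formatting, and the placeholder toy_model, which stands in for a real trained model.

```python
import csv
import io


def write_submission(src, dst, predict_prob, threshold=0.5):
    """Copy rows from src to dst, appending Risk_involved and Prob columns."""
    reader = csv.DictReader(src)
    # Drop any existing (possibly empty) target columns before re-adding them.
    fields = [f for f in reader.fieldnames if f not in ("Risk_involved", "Prob")]
    writer = csv.DictWriter(dst, fieldnames=fields + ["Risk_involved", "Prob"])
    writer.writeheader()
    for row in reader:
        p = predict_prob(row)
        out = {f: row[f] for f in fields}
        # Assumed rule: label "r" when the probability reaches the threshold.
        out["Risk_involved"] = "r" if p >= threshold else "n"
        out["Prob"] = f"{p:.4f}"
        writer.writerow(out)


def toy_model(row):
    # Hypothetical placeholder, not a real model: longer trips score higher.
    return min(float(row["dist"]) / 500.0, 1.0)


# In-memory demo with two made-up rows; real use would pass open file objects.
src = io.StringIO("source,dist\nL04,267\nL04,100\n")
dst = io.StringIO()
write_submission(src, dst, toy_model)
print(dst.getvalue())
```

With real files, the same function can be called with open("test.csv", newline="") as the source and a second file opened for writing with newline="" as the destination.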

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30050704