Fun Series - Data Science - Trip Safety

Key Information

The challenge is finished.

Challenge Overview

Project Overview

We are launching a series of fun challenges intended for learning new technologies and the Topcoder platform. This week's challenge is about data science.

Important:

This is a fun and learning challenge. No prizes will be awarded for completing the challenge.

Overview

A large amount of data has been, and continues to be, collected from the vehicles that ABC Corp uses to deliver cargo to customers. The data includes On-board Computer (OBC) alarms that record exceptional driving events such as excessive speed, speed changes, and tractor stability during operation. Existing OBC data is correlated with other information, such as time of day, driver status, route details, cargo, and weather conditions, to provide a broad spectrum of data related to ABC Corp deliveries.


Because accidents are extremely rare, and because the ideal objective is to prevent all accidents, OBC alarms (which occur on fewer than 5% of all trips) are considered an important factor in managing safety. This challenge assumes that using correlated data to anticipate, and thus reduce, OBC events will further increase the safety of trips.

Problem Statement

The challenge will be successful when the community provides algorithmic solutions that can identify the top 500 trips in a data set that are most likely to involve alarms from on-board computers. These algorithms will ultimately provide input to the logistical planning system used by ABC Corp.

Overview of Data

Each trip record has the following columns:

source
dist
cycles
complexity
cargo
stops
start_month
start_day_of_month
start_day_of_week
start_time
days
pilot
pilot2
pilot_exp
pilot_visits_prev
pilot_hours_prev
pilot_duty_hrs_prev
pilot_dist_prev
route_risk_1
route_risk_2
weather
visibility
Risk_involved
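As a starting point, the columns above can be read with Python's standard csv module, encoding the target levels (“n” and “r”, described below) as 0 and 1. This is a minimal sketch, assuming the data files are comma-separated with a header row using exactly these column names; load_trips, FEATURES and LABELS are hypothetical names. Values are kept as raw strings, so numeric conversion and encoding of source (e.g. "L04") are left to the modelling step.

```python
import csv
from typing import Dict, List, Optional, Tuple

# Column names as listed in "Overview of Data"; Risk_involved is the target.
FEATURES = [
    "source", "dist", "cycles", "complexity", "cargo", "stops",
    "start_month", "start_day_of_month", "start_day_of_week", "start_time",
    "days", "pilot", "pilot2", "pilot_exp", "pilot_visits_prev",
    "pilot_hours_prev", "pilot_duty_hrs_prev", "pilot_dist_prev",
    "route_risk_1", "route_risk_2", "weather", "visibility",
]
TARGET = "Risk_involved"
# Assumed encoding of the target levels: "r" (risky) -> 1, "n" (not risky) -> 0.
LABELS = {"n": 0, "r": 1}


def load_trips(path: str) -> Tuple[List[Dict[str, str]], List[Optional[int]]]:
    """Read a trips CSV and return (feature rows, labels).

    Labels are collected only if the file has a Risk_involved column;
    blank or unknown target values become None.
    """
    rows: List[Dict[str, str]] = []
    labels: List[Optional[int]] = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        has_target = TARGET in (reader.fieldnames or [])
        for record in reader:
            # Keep only the known feature columns, as raw strings.
            rows.append({name: record[name] for name in FEATURES})
            if has_target:
                labels.append(LABELS.get((record[TARGET] or "").strip()))
    return rows, labels
```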

The target variable Risk_involved is the aggregation of all OBC events. In the training data set it has the levels “n” and “r”, meaning not risky and risky respectively. The training data set contains around 80K records and the test data set around 42K records. You need to find the 500 trips that are most likely to be risky, i.e. your submission file should have 500 records, in the following format.

Here is one sample output record that is expected:

Column                 Value
source                 L04
dist                   267
cycles                 1
complexity             14
cargo                  5
stops                  2
start_month            10
start_day_of_month     20
start_day_of_week      7
start_time             1632
days                   0.33
pilot                  17355
pilot2                 0
pilot_exp              3
pilot_visits_prev      1
pilot_hours_prev       17.6
pilot_duty_hrs_prev    13.1
pilot_dist_prev        942.9
route_risk_1           97
route_risk_2           209
weather                2
visibility             8.466666667
Risk_involved          r
Prob                   0.52

Evaluation criterion: score = 100 * precision over the 500 submitted trips, where precision is the fraction of those 500 trips that are actually risky (“r”).
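To estimate this score before submitting, one option is to hold out part of the labelled training data and measure precision among the 500 highest-scored trips. A minimal sketch, assuming labels are encoded as 1 for “r” and 0 for “n”; precision_score_at_k is a hypothetical helper.

```python
from typing import Sequence


def precision_score_at_k(probs: Sequence[float], labels: Sequence[int],
                         k: int = 500) -> float:
    """Return 100 * precision among the k trips with the highest probability."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not 0 < k <= len(probs):
        raise ValueError("k must be between 1 and the number of trips")
    # Indices of the k highest-scored trips (ties keep input order).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Labels use 1 for a risky ("r") trip and 0 for a non-risky ("n") trip.
    hits = sum(labels[i] for i in top)
    return 100.0 * hits / k
```

For example, precision_score_at_k([0.9, 0.8, 0.1, 0.7], [1, 0, 1, 1], k=2) returns 50.0: of the two top-scored trips, one is risky.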


Final Submission Guidelines

Your output should be the test.csv file with predicted values in the Risk_involved column, along with their probabilities, which will be used to compute the AUC from the ROC curve.
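A dependency-free way to compute ROC AUC is the Mann-Whitney rank formulation: AUC equals the probability that a randomly chosen risky trip is scored above a randomly chosen non-risky one, with ties counting half. A minimal sketch, assuming 0/1 labels; roc_auc is a hypothetical helper, and library routines such as scikit-learn's roc_auc_score compute the same quantity.

```python
from typing import Sequence


def roc_auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied probabilities.
        while j + 1 < n and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one risky and one non-risky trip")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic for the positive class, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]) returns 0.75.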

Upload the output CSV file to submit to this challenge.

REVIEW STYLE:

Final Review: Community Review Board

Approval: User Sign-Off

ID: 30051002