Challenge Overview
The client for this challenge is a large telecom / mobile network operator.
In this challenge, we are reaching out to the Topcoder data science community for ideas that will help develop an algorithm that can analyze historical data and predict 5 separate key performance indicators (KPIs) on the radio access network (RAN), up to 2 hours in advance with 85% accuracy.
The eventual solution will be built in later phases, not in this challenge. In this challenge we seek your analysis of the problem, advice on how to solve it, and a proof of concept demonstrating the potential of your idea. Some data cleansing and preparation is necessary and is within the scope of this challenge.
Prediction target:
The target is the 5 KPIs below, two hours into the future. For example, to predict the KPIs at 11AM for a specific lncel_name, we can train using the data from 9AM and earlier, make the prediction, and then compare it to the actual values at 11AM. The client objective is 85% accuracy with the given data.
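The two-hour-ahead target described above can be sketched as a self-join on timestamps. This is only an illustration under assumptions: it assumes period_start_time has already been parsed to a datetime, that there is at most one row per cell and timestamp, and the sample values are made up. The helper name add_future_target is hypothetical.

```python
import pandas as pd

HORIZON = pd.Timedelta(hours=2)  # prediction horizon from the challenge text

def add_future_target(df, kpi_col, time_col="period_start_time", cell_col="lncel_name"):
    """Attach to each row the value of kpi_col for the same cell two hours later."""
    future = df[[cell_col, time_col, kpi_col]].copy()
    # Move each observation back by the horizon, so the value observed at 11:00
    # lines up with the feature row at 09:00.
    future[time_col] = future[time_col] - HORIZON
    future = future.rename(columns={kpi_col: f"{kpi_col}_t_plus_2h"})
    # Joining on timestamps (rather than shifting by a fixed number of rows)
    # leaves the target empty when the future interval is missing from the data.
    return df.merge(future, on=[cell_col, time_col], how="left")

# Tiny synthetic example (made-up values, hourly sampling assumed).
demo = pd.DataFrame({
    "lncel_name": ["cell_A"] * 4,
    "period_start_time": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 10:00",
        "2020-01-01 11:00", "2020-01-01 12:00",
    ]),
    "avg_act_ues_dl": [3.0, 4.0, 5.0, 6.0],
})
labelled = add_future_target(demo, "avg_act_ues_dl")
print(labelled)
```

Rows whose two-hour-ahead value is not in the data end up with a missing target and would normally be dropped before training.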
Background
Below are the 5 RAN KPIs that we want to predict:
- Downlink (DL) Physical Resource Block (PRB) Utilisation - This KPI shows the average PRB utilization per Transmission Time Interval (TTI) in the downlink direction. Utilization is defined as the ratio of used to available PRBs per TTI.
  - In the data, this is the e_utran_avg_prb_usage_per_tti_dl column
- DL Traffic Volume - This KPI is the Packet Data Convergence Protocol (PDCP) Service Data Unit (SDU) volume on the eUu interface per cell in the downlink direction.
  - In the data, this is the pdcp_sdu_volume__dl column
- DL Average End User Throughput - This KPI indicates the IP scheduled end-user throughput in DL for QCIx services, in Kbps.
  - In the data, this is the avg_pdcp_cell_thp_dl column
- Average number of users - This KPI shows the average number of User Equipment (UEs) having one PRB during the measurement period.
  - In the data, this is the avg_act_ues_dl column
- Spectrum Efficiency - Spectral efficiency is usually expressed as “bits per second per hertz,” or bits/s/Hz. In other words, it is the net data rate in bits per second (bps) divided by the bandwidth in hertz. Net data rate and symbol rate are related to the raw data rate, which includes the usable payload and all overhead.
  - In the data, the columns are dl_spectral_efficiency and ul_spectral_efficiency
Data description and Key data challenge:
Historical data is provided as a large CSV file with 256 columns and approximately 16,000,000 records. The data is sorted by date, using the period_start_time_date column. You don’t have to use all the data in the CSV file, but there should be plenty there to test correlations and models.
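With 256 columns and roughly 16 million records, it may help to read only the columns of interest, in chunks. The sketch below is one possible way to do that with pandas; the helper name load_columns, the chunk size, and the inline sample data are assumptions made for illustration, and the column names are taken from the challenge text.

```python
import io
import pandas as pd

KEY_COLS = ["period_start_time", "lncel_name", "avg_act_ues_dl"]

def load_columns(source, columns, cells=None, cell_col="lncel_name", chunksize=500_000):
    """Read only `columns` from a CSV in chunks, optionally keeping a subset of cells."""
    kept = []
    for chunk in pd.read_csv(source, usecols=columns, chunksize=chunksize):
        if cells is not None:
            chunk = chunk[chunk[cell_col].isin(cells)]
        kept.append(chunk)
    return pd.concat(kept, ignore_index=True)

# Stand-in for the real file: a small in-memory CSV with made-up values.
sample_csv = io.StringIO(
    "period_start_time,lncel_name,avg_act_ues_dl,unused_col\n"
    "2020-01-01 09:00,cell_A,3.0,x\n"
    "2020-01-01 09:00,cell_B,1.5,y\n"
    "2020-01-01 10:00,cell_A,4.0,z\n"
)
subset = load_columns(sample_csv, KEY_COLS, cells=["cell_A"], chunksize=2)
print(subset)
```

Filtering inside the loop keeps peak memory close to one chunk plus the rows retained so far, which matters more on the full file than on this toy input.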
We have very limited documentation on what the columns represent. You can ask about certain ones, but the only ones we know for sure are the KPI columns (detailed above), the period_start_time, mrbts_sbts_name, which is the tower ID, lnbts_name, which is the site ID, and lncel_name, which is the cell name.
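Because most columns are undocumented, a quick first pass over data quality can be useful. The following is a minimal sketch, not a prescribed method: the helper name quality_summary and the sample rows are hypothetical, and it assumes the key columns are named period_start_time and lncel_name as described above.

```python
import pandas as pd

def quality_summary(df, time_col="period_start_time", cell_col="lncel_name"):
    """Return basic quality indicators for a slice of the history data."""
    # Share of missing values per column, worst first.
    missing_share = df.isna().mean().sort_values(ascending=False)
    # Number of rows that repeat an earlier (cell, timestamp) pair.
    duplicate_keys = int(df.duplicated(subset=[cell_col, time_col]).sum())
    # Row count per cell, to spot cells with sparse history.
    rows_per_cell = df.groupby(cell_col).size()
    return missing_share, duplicate_keys, rows_per_cell

# Made-up rows: one duplicated key and one missing KPI value.
sample = pd.DataFrame({
    "lncel_name": ["cell_A", "cell_A", "cell_A", "cell_B"],
    "period_start_time": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 09:00",
        "2020-01-01 10:00", "2020-01-01 09:00",
    ]),
    "avg_act_ues_dl": [3.0, 3.0, None, 1.0],
})
missing_share, duplicate_keys, rows_per_cell = quality_summary(sample)
print(missing_share, duplicate_keys, rows_per_cell, sep="\n")
```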
Task Detail
In this Ideation challenge, we are looking for ideas on the following problems:
- Understand the current set of data gathered, and share your observations and ideas on data quality, in order to build an algorithm that can predict the 5 KPIs
- Your findings on useful features and your recommendations for developing a better algorithm
- What approaches / modelling techniques can be used to predict the KPIs? What are the risks or consequences of using your approach? (For example, for some hypothetical problem, SARIMAX may not perform as well as LSTM, but it is more explainable in terms of the cause and effect of an input variable changing.)
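When comparing modelling techniques such as those mentioned above, simple reference forecasts give a floor that any proposed model should beat. The sketch below shows two common ones, a persistence forecast (the last value known two hours before the target time) and a seasonal-naive forecast (the value at the same time on the previous day). These baselines are not part of the challenge text; the helper name add_lagged and the synthetic hourly series are assumptions for illustration.

```python
import pandas as pd

def add_lagged(df, kpi_col, lag, name, time_col="period_start_time", cell_col="lncel_name"):
    """Add a column holding kpi_col as observed `lag` earlier in the same cell."""
    past = df[[cell_col, time_col, kpi_col]].copy()
    past[time_col] = past[time_col] + lag  # an old observation becomes visible `lag` later
    past = past.rename(columns={kpi_col: name})
    return df.merge(past, on=[cell_col, time_col], how="left")

# Synthetic hourly series with a perfect daily cycle (value = hour of day).
times = pd.Timestamp("2020-01-01") + pd.to_timedelta(list(range(48)), unit="h")
series = pd.DataFrame({
    "lncel_name": "cell_A",
    "period_start_time": times,
    "avg_act_ues_dl": [float(t.hour) for t in times],
})

# Both forecasts only use values available at least two hours before the target time.
series = add_lagged(series, "avg_act_ues_dl", pd.Timedelta(hours=2), "persistence")
series = add_lagged(series, "avg_act_ues_dl", pd.Timedelta(hours=24), "seasonal_naive")

mae_persistence = (series["persistence"] - series["avg_act_ues_dl"]).abs().mean()
mae_seasonal = (series["seasonal_naive"] - series["avg_act_ues_dl"]).abs().mean()
print(mae_persistence, mae_seasonal)
```

On this artificial series the seasonal-naive forecast is exact by construction; on real RAN data neither baseline is expected to be, and the gap between them and a candidate model is what makes the comparison informative.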
The final deliverable is a report explaining the algorithm and a PoC (Python 3) that demonstrates your idea. More details are given in the Final Submission Guidelines section.
Submission Format Guide for the White Paper:
Your submission should include a text, .doc, PPT or PDF document with the following sections and descriptions:
- Overview: describe your approach in “layman’s terms”
- Methods: describe what you did to come up with this approach, e.g. literature search, experimental testing, etc.
- Materials: did your approach use a specific technology? Any libraries? List all tools and libraries you considered part of your proposal
- Discussion: Explain what you attempted, considered or reviewed that worked, and especially what didn’t work or what you rejected. For anything that didn’t work or was rejected, briefly explain the reasons (e.g. such-and-such needs more data than we have). If you are pointing to somebody else’s work (e.g. you’re citing a well-known implementation or literature), describe in detail how that work relates to this work, and what would have to be modified
- Data: Is the data described/provided enough to predict the 5 KPIs with 85% accuracy? If not, what could be achieved, and why? Are there any correlations in the data that may not be immediately obvious?
- Feasibility: Given the data and the approach you implemented, how feasible and realistic is it, and why? What level of prediction accuracy can be obtained using this approach, and what is the justification?
- Assumptions and Risks: What are the main risks of this approach, and what assumptions are you or the model making? What are the pitfalls of the data set and approach?
- Results: Did you implement your approach? How’d it perform?
- Other: Discuss any other issues or attributes that don’t fit neatly above that you’d also like to include
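The sections above refer to 85% accuracy, but the challenge text does not say how accuracy is computed for a numeric KPI. One possible reading, shown here purely as an assumption, is 100% minus the mean absolute percentage error (MAPE). The function name and the sample numbers are hypothetical; a submission should state whichever definition it actually uses.

```python
import numpy as np

def accuracy_from_mape(actual, predicted, eps=1e-9):
    """Accuracy in percent, defined here (as an assumption) as 100 * (1 - MAPE)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Percentage error is undefined where the actual value is (near) zero,
    # so those points are excluded rather than allowed to dominate the mean.
    mask = np.abs(actual) > eps
    if not mask.any():
        return float("nan")
    ape = np.abs(predicted[mask] - actual[mask]) / np.abs(actual[mask])
    return float(100.0 * (1.0 - ape.mean()))

# Made-up values: errors of 10%, 10% and 0% give a MAPE of about 6.67%.
score = accuracy_from_mape([100.0, 200.0, 50.0], [90.0, 220.0, 50.0])
print(round(score, 2))
```

KPIs that are often close to zero, such as traffic volume in quiet hours, can make percentage-based measures unstable, which is one reason to report the chosen definition alongside the result.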
Judging Criteria
You will be judged on the quality of your ideas, the quality of your description of the ideas, and how much benefit they can provide to the client. The winner will be chosen based on the most logical and convincing reasoning as to how and why the idea presented will meet the objective. The judging criteria below will largely be the basis for the judgement.
- Feasibility and Completeness
- Did you complete the sections as required above?
- What are the key insights we can get from your analysis to meet the client objectives?
- Data quality findings / Can your model predict 2 hours in advance?
- PoC code is presented
- Does your submission include enough detail for us to understand if this approach is feasible?
- Is your solution more likely feasible than other submissions to the challenge?
Final Submission Guidelines
- Documentation (in text, .doc, .pdf, or .md format)
- Code (in a Jupyter Notebook with readme)
- Please provide a single configuration variable to allow us to change the location of the data for reviewer systems. Don’t hard-code the path to all the files.
- Don’t require hardware, like CUDA cards, but if you want to optionally provide a way to target them, that’s fine.
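The single configuration variable requested above could look like the following sketch. The environment variable name RAN_DATA_DIR, the default directory, and the file name history.csv are all hypothetical placeholders, not names defined by the challenge.

```python
import os
from pathlib import Path

# Single place to change the data location. Reviewers can either edit the default
# below or set the (hypothetical) RAN_DATA_DIR environment variable.
DATA_DIR = Path(os.environ.get("RAN_DATA_DIR", "./data"))

def data_path(filename):
    """Build the full path of a data file from the one configurable directory."""
    return DATA_DIR / filename

HISTORY_CSV = data_path("history.csv")  # hypothetical file name
print(HISTORY_CSV)
```

Every other cell in the notebook would then refer to HISTORY_CSV or data_path(...) instead of spelling out a path.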