Covid-19 Data Sharing and Research - Differential Privacy Challenge

Key Information

The challenge is finished.

Challenge Overview

1. Context:

PROJECT CONTEXT
As part of its commitment to providing innovative tools to fight the Covid-19 pandemic, Topcoder is using the crowd to develop solutions that can assist people in these tough times.

CHALLENGE CONTEXT
In this challenge, you’re going to make Google’s C++ Differential Privacy library more usable for COVID research and data sharing.
Using this fork of Google’s C++ differential privacy library, create an implementation that uses the data schema provided in the specification to calculate common differentially private (DP) aggregate functions.

2. Expected Outcome:

The outcome of the challenge is an updated codebase that supports the data schema given in the requirements, together with updated test cases.

3. Challenge Details:

INDIVIDUAL REQUIREMENTS
1. Data Generation
Based on the following schema, generate synthetic data (CSV files) that the DP algorithms can use.
  • User
    • User ID
    • Name
    • Mobile
    • Email
    • Pincode
    • GPS Location (lat,lng) coordinates
    • Signup Date
    • Age
    • Gender
    • Marital Status
  • UserAddress
    • User ID
    • Address
    • GPS Location (lat,lng) coordinates
    • Type (HOME/WORK)
  • UserMedicalConditionHistory
    • User ID
    • Status (0/1)
    • Type (SARS / Cardiovascular / H1N1 / Covid-19 / Hypertension)
    • Start Date
    • End Date
    • Created At
    • Last Updated At
  • UserQuarantineStatus
    • User ID
    • Status (0/1)
    • Created At
    • Last Updated At
  • UserInfection
    • User ID
    • Status (0/1)
    • Type (SARS / Cardiovascular / H1N1 / Covid-19 / Hypertension)
    • Last Updated
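As an illustration of the data generation step, here is a minimal sketch for the User table only, using just the C++ standard library. The function name `MakeUserCsv`, the column names, the value ranges (pincodes, coordinates, dates), and the choice to split the GPS location into separate `lat` and `lng` columns are all assumptions made for this example; the challenge does not prescribe them.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <sstream>
#include <string>

// Builds CSV text for the User table: one header line plus `rows` records.
// Column names, value ranges, and formats are illustrative assumptions.
std::string MakeUserCsv(int rows, unsigned seed) {
  assert(rows >= 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> age_dist(1, 90);
  std::uniform_int_distribution<int> pincode_dist(110001, 110099);
  std::uniform_int_distribution<int> mobile_dist(0, 9999999);
  std::uniform_real_distribution<double> lat_dist(28.40, 28.90);
  std::uniform_real_distribution<double> lng_dist(76.80, 77.40);
  std::uniform_int_distribution<int> month_dist(1, 12);
  std::uniform_int_distribution<int> day_dist(1, 28);
  std::uniform_int_distribution<int> coin(0, 1);

  std::ostringstream out;
  out << "user_id,name,mobile,email,pincode,lat,lng,signup_date,age,gender,"
         "marital_status\n";
  for (int id = 1; id <= rows; ++id) {
    // Draw each value into a local first so the draw order is well defined.
    const int mobile_tail = mobile_dist(rng);
    const int pincode = pincode_dist(rng);
    const double lat = lat_dist(rng);
    const double lng = lng_dist(rng);
    const int month = month_dist(rng);
    const int day = day_dist(rng);
    const int age = age_dist(rng);
    const char* gender = coin(rng) ? "M" : "F";
    const char* marital = coin(rng) ? "MARRIED" : "SINGLE";

    char mobile[32];
    std::snprintf(mobile, sizeof(mobile), "900%07d", mobile_tail);
    char coords[64];
    std::snprintf(coords, sizeof(coords), "%.6f,%.6f", lat, lng);
    char date[32];
    std::snprintf(date, sizeof(date), "2020-%02d-%02d", month, day);

    out << id << ",user" << id << ',' << mobile << ",user" << id
        << "@example.com," << pincode << ',' << coords << ',' << date << ','
        << age << ',' << gender << ',' << marital << '\n';
  }
  return out.str();
}
```

Writing the returned string to a file such as `users.csv` with `std::ofstream` gives one input file; the other four tables could be generated the same way, reusing the same User IDs so the rows can be joined.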
2. Differential Privacy Implementation
The git fork includes sample code that follows the example and creates a UserAgeReporter providing the following aggregate functions:
  • Mean
  • Count above a given age
  • Max age in the given sample
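To make the difference between a true and a DP aggregate concrete, the sketch below computes a count and a mean both ways using the textbook Laplace mechanism and only the C++ standard library. It does not use the API of Google's library, and the library's own algorithms may work differently; every function name here is hypothetical. It assumes each user contributes exactly one age, so a count has sensitivity 1. A DP max needs a different technique and is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Exact number of ages strictly above `threshold`.
int TrueCountAbove(const std::vector<int>& ages, int threshold) {
  return static_cast<int>(std::count_if(
      ages.begin(), ages.end(), [threshold](int a) { return a > threshold; }));
}

// Exact mean; returns 0 for an empty input.
double TrueMean(const std::vector<int>& ages) {
  if (ages.empty()) return 0.0;
  double sum = 0.0;
  for (int a : ages) sum += a;
  return sum / static_cast<double>(ages.size());
}

// Draws Laplace(0, scale) noise as the difference of two exponential samples.
double LaplaceNoise(double scale, std::mt19937& rng) {
  assert(scale > 0.0);
  std::exponential_distribution<double> exp_dist(1.0 / scale);
  const double x = exp_dist(rng);
  const double y = exp_dist(rng);
  return x - y;
}

// Noisy count: sensitivity is 1 because each user contributes one row
// (an assumption of this sketch).
double DpCountAbove(const std::vector<int>& ages, int threshold,
                    double epsilon, std::mt19937& rng) {
  assert(epsilon > 0.0);
  return TrueCountAbove(ages, threshold) + LaplaceNoise(1.0 / epsilon, rng);
}

// Noisy mean: clamp each age to [lower, upper], then divide a noisy sum by a
// noisy count, spending half of epsilon on each.
double DpMean(const std::vector<int>& ages, double lower, double upper,
              double epsilon, std::mt19937& rng) {
  assert(epsilon > 0.0 && lower < upper);
  double clamped_sum = 0.0;
  for (int a : ages) {
    clamped_sum += std::min(upper, std::max(lower, static_cast<double>(a)));
  }
  const double half = epsilon / 2.0;
  const double sum_sensitivity = std::max(std::fabs(lower), std::fabs(upper));
  const double noisy_sum =
      clamped_sum + LaplaceNoise(sum_sensitivity / half, rng);
  const double noisy_count =
      static_cast<double>(ages.size()) + LaplaceNoise(1.0 / half, rng);
  if (noisy_count < 1.0) return (lower + upper) / 2.0;  // too little data
  // Clamping the result is post-processing and keeps the DP guarantee.
  return std::min(upper, std::max(lower, noisy_sum / noisy_count));
}
```

A reporter could print `TrueCountAbove(ages, 35)` next to `DpCountAbove(ages, 35, epsilon, rng)`: a smaller epsilon adds more noise and gives stronger privacy, and the two values converge as epsilon grows.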
Similarly, implement the true and DP versions of the common aggregate functions (sum, mean, count, max, min) over the given schema for the following combinations:
  • Users infected with a given disease within a given pincode
  • Users infected with a given disease above a particular age limit
  • Users who have recovered from a given disease
  • Users who were infected with a given disease between two dates
  • Users who are infected with a given disease within a certain distance (say 5 km) of the given GPS coordinates
Write a reporter main program that prints the true and DP values, similar to the example.
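The distance-based combination needs a way to select users near a point before any aggregate is computed. The sketch below shows one possible approach with the haversine formula and the C++ standard library. The struct `InfectedUser` and both function names are hypothetical, and the code assumes the User and UserInfection tables have already been joined on User ID so that each row carries a GPS location.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One row after joining User (GPS location) with UserInfection.
// The field names are hypothetical.
struct InfectedUser {
  int user_id;
  std::string disease;  // e.g. "Covid-19"
  bool infected;        // Status (0/1) from UserInfection
  double lat;
  double lng;
};

// Great-circle distance in kilometres, using the haversine formula with a
// mean Earth radius of 6371 km.
double HaversineKm(double lat1, double lng1, double lat2, double lng2) {
  constexpr double kPi = 3.14159265358979323846;
  constexpr double kEarthRadiusKm = 6371.0;
  const double to_rad = kPi / 180.0;
  const double dlat = (lat2 - lat1) * to_rad;
  const double dlng = (lng2 - lng1) * to_rad;
  const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                   std::cos(lat1 * to_rad) * std::cos(lat2 * to_rad) *
                       std::sin(dlng / 2) * std::sin(dlng / 2);
  // min() guards against rounding pushing sqrt(a) slightly above 1.
  return 2.0 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}

// True (non-private) count of users currently infected with `disease`
// within `radius_km` of the point (lat, lng).
int TrueCountInfectedNear(const std::vector<InfectedUser>& rows,
                          const std::string& disease, double lat, double lng,
                          double radius_km) {
  assert(radius_km >= 0.0);
  return static_cast<int>(
      std::count_if(rows.begin(), rows.end(), [&](const InfectedUser& r) {
        return r.infected && r.disease == disease &&
               HaversineKm(lat, lng, r.lat, r.lng) <= radius_km;
      }));
}
```

The same filter-then-aggregate shape applies to the pincode, age limit, and date range combinations: only the predicate changes. The true count is what the reporter prints as the true value, and the filtered rows are what a DP algorithm would receive as input.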
 
TECHNOLOGY STACK

Final Deliverables

  • Updated codebase including tests
  • README.md documenting how to review and test your submission

4. Scoring Aid:

Refer to the scorecard used for code review.
 

Final Submission Guidelines

Please see above

ELIGIBLE EVENTS:

2020 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30128240