$100 - Excel data loader for COVID-19 forecast project

The challenge is finished.

Challenge Overview

Project Context

COVID tracking data comes from many sources, in many different formats. In this contest, you will parse one of those sources, an Excel workbook of data for Brazil, into a standardized format.

 

Challenge Context

You will be adding to this starter code:

https://gitlab.com/tc-covid/f2f-brazil

 

In the repository you will find three things:

  1. A Python script, ingest.py, to which you will add new features. It is already set up to handle some sample data for the US. Your code will follow a similar structure, but for data from Brazil.

  2. Raw data, in data/raw_data/. You will add new code to handle HIST_PAINEL_COVIDBR_25mai2020.xlsx. There are also 3 raw data samples for the US, which the script can already handle.

  3. Processed data, with one directory per country and one file per area within that country. You will be working with data/processed/Brazil/, but there are also processed samples available for the US.
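The "one file per area" layout can be sketched roughly as follows. This is only an illustration under stated assumptions: the area column name (`estado`), the CSV output format, and the `<area>.csv` file naming are all hypothetical, and the real layout must be taken from the processed samples in the repository. Loading the workbook itself (for example with `pandas.read_excel`) is left out so the sketch needs only the standard library.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_per_area(rows, out_dir, area_key="estado"):
    """Group row dicts by area and write one CSV file per area.

    `area_key`, the CSV format, and the file naming are assumptions made
    for illustration; they are not taken from the starter repository.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect rows per area, preserving their input order within each area.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[area_key]].append(row)

    written = []
    for area in sorted(grouped):  # sorted for a deterministic write order
        area_rows = grouped[area]
        path = out_dir / f"{area}.csv"
        # Mode "w" replaces any existing file; a fixed line terminator keeps
        # the bytes identical across platforms, which matters for diffs.
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle,
                fieldnames=list(area_rows[0].keys()),
                lineterminator="\n",
            )
            writer.writeheader()
            writer.writerows(area_rows)
        written.append(path)
    return written
```

Calling `write_per_area(rows, "data/processed/Brazil")` with rows loaded from the workbook would then produce one file per distinct value of the area column, assuming that column exists under the name passed as `area_key`.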

 

Challenge Details

You do not need to implement the full set of actions from the US sample code. You only need to implement the equivalent of “overwrite”, which will produce new outputs from the inputs without trying to protect any existing outputs.

 

There is already a placeholder brazil-format subcommand in place.
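For orientation, a subcommand of this kind is commonly wired up with argparse sub-parsers. The sketch below is not the actual structure of ingest.py, which may differ; the handler name `brazil_format` and the `--input` option are hypothetical and exist only to show the shape.

```python
import argparse

# Path taken from the challenge text; used here only as a default value.
DEFAULT_INPUT = "data/raw_data/HIST_PAINEL_COVIDBR_25mai2020.xlsx"


def brazil_format(args):
    """Hypothetical handler: a real one would parse args.input and write outputs."""
    return f"would process {args.input}"


def build_parser():
    """Build a parser with a single 'brazil-format' subcommand."""
    parser = argparse.ArgumentParser(prog="ingest.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    brazil = subparsers.add_parser(
        "brazil-format", help="regenerate the processed Brazil outputs"
    )
    # '--input' is an assumed option, not one known to exist in ingest.py.
    brazil.add_argument("--input", default=DEFAULT_INPUT)
    brazil.set_defaults(func=brazil_format)
    return parser
```

With this shape, `build_parser().parse_args(["brazil-format"])` yields a namespace whose `func` attribute is the handler, so the script's entry point can simply call `args.func(args)`. Giving the input path a default keeps the plain `./ingest.py brazil-format` invocation working while still allowing other files of the same format.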

 

Expected Outcome

You must implement the brazil-format subcommand such that the processed outputs can be reproduced without causing any diffs. That is, assuming all relevant libraries are installed, it must be possible to run this:

 

rm data/processed/Brazil/*

./ingest.py brazil-format

git diff

 

After running these commands, git diff must show no changes to the files under the Brazil/ subdirectory. The code must also work for any input file of the same format, and must not rely on hard-coding the processed sample data.
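The git diff check above is the acceptance test. While developing, a byte-level comparison of the output directory before and after a run can help locate which files differ, for instance because of row ordering or line endings. The helpers below are a hypothetical aid under that assumption, not part of the starter code.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's relative path to the SHA-256 digest of its bytes."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def changed_files(before, after):
    """Return the sorted paths that were added, removed, or modified."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

One possible workflow: take `snapshot("data/processed/Brazil")` before deleting the files, regenerate them, take a second snapshot, and inspect `changed_files(before, after)`. An empty list means every file was reproduced byte for byte.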

 


Final Submission Guidelines

Please submit a git patch file with your changes.

 

Review style

Final Review: Community Review Board

Approval: User Sign-Off

ID: 30128048