Challenge Overview
Project Context
COVID tracking data comes from many sources in many different formats. In this contest, you will parse one of those sources into a standardized format.
Challenge Context
You will be adding to this starter code:
https://gitlab.com/tc-covid/f2f-brazil
In the repository you will find 3 things.
- A Python script, ingest.py, to which you will be adding new features. It is already set up to handle some sample data for the US. Your code will follow a similar structure, but for data from Brazil.
- Raw data is in data/raw_data/. You will be adding new code to handle HIST_PAINEL_COVIDBR_25mai2020.xlsx. There are also 3 raw data samples for the US, which the script can already handle.
- Processed data, with one directory per country and one file per area within that country. You will be working with data/processed/Brazil/, but there are also processed samples available for the US.
Challenge Details
You do not need to implement the full set of actions from the US sample code. You only need to implement the equivalent of “overwrite”, which will produce new outputs from the inputs without trying to protect any existing outputs.
There is already a placeholder brazil-format subcommand in place.
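How ingest.py wires up its subcommands is not described here, so the following is only a sketch of one common pattern, assuming argparse sub-parsers. The names build_parser, brazil_format, and main, and the --raw and --out options, are hypothetical and may not match the starter code; only the subcommand name and the default paths come from the description above.

```python
import argparse


def brazil_format(raw_path, out_dir):
    """Hypothetical handler: a real one would parse raw_path and write files into out_dir."""
    return f"would convert {raw_path} into {out_dir}"


def build_parser():
    parser = argparse.ArgumentParser(prog="ingest.py")
    # Require a subcommand so that a bare "./ingest.py" fails with a usage message.
    subparsers = parser.add_subparsers(dest="command", required=True)

    brazil = subparsers.add_parser(
        "brazil-format",
        help="regenerate the processed Brazil files, overwriting existing ones",
    )
    # Assumed options; the defaults let "./ingest.py brazil-format" run with no arguments.
    brazil.add_argument(
        "--raw", default="data/raw_data/HIST_PAINEL_COVIDBR_25mai2020.xlsx"
    )
    brazil.add_argument("--out", default="data/processed/Brazil")
    brazil.set_defaults(handler=brazil_format)
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return args.handler(args.raw, args.out)
```

Keeping the input path as an option with a default, rather than a constant inside the handler, is one way to let the same code run against other input files of the same format.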
Expected Outcome
You must implement the brazil-format subcommand such that the processed outputs can be reproduced without causing any diffs. That is, assuming all relevant libraries are installed, it must be possible to run this:
rm data/processed/Brazil/*
./ingest.py brazil-format
git diff
The final git diff must show no changes to the files under the Brazil/ subdirectory. The code must also work for any input file of the same format, and must not rely on hard-coding the processed sample data.
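One way to keep regenerated files free of diffs is to make the writer deterministic: group rows by area, sort within each group, and always open output files in overwrite mode. The sketch below assumes the spreadsheet rows have already been parsed into dictionaries. The column names area, date, and cases, the .csv extension, and the function name write_area_files are placeholders, since the real schema and file format are defined by the processed samples in the repository.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Placeholder output columns; the real ones must match the processed samples.
FIELDS = ["date", "cases"]


def write_area_files(rows, out_dir):
    """Write one file per area, replacing any file that already exists."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by area so each area ends up in its own file.
    by_area = defaultdict(list)
    for row in rows:
        by_area[row["area"]].append(row)

    written = []
    for area in sorted(by_area):
        path = out_dir / f"{area}.csv"
        # Mode "w" truncates existing files, which gives the "overwrite" behaviour.
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle,
                fieldnames=FIELDS,
                extrasaction="ignore",  # skip keys such as "area" that are not columns
                lineterminator="\n",  # fixed line endings on every platform
            )
            writer.writeheader()
            # Sorting makes the output independent of the input row order.
            for row in sorted(by_area[area], key=lambda r: r["date"]):
                writer.writerow(row)
        written.append(path)
    return written
```

Because the output depends only on the set of input rows, running the function twice, or on the same rows in a different order, should produce byte-identical files.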
Final Submission Guidelines
Please submit a git patch file with your changes.