Here is the list of issues the client has with the current code version, to be addressed in this challenge:
- Current PoC does not support multiple bills provided in a single image. We should extend the processing flow with an additional algorithm that detects the individual receipts in an image containing several of them, so that each one can be handled as a separate receipt in the rest of the flow. This operation should be the first step of the entire processing flow.
- Current PoC does not perform well when resized images are submitted. In some cases it does detect resized versions of the same image as duplicates, but it does so inconsistently, and it does not work well when the size difference is large.
- Current PoC does not work well with content-tampered bills, i.e. when the applicant edits some part of an old receipt and submits it again, for example modifying the date and resubmitting the bill the next month. We want to add algorithms that can detect receipt manipulation without comparing against other images: this part of the processing flow should take a single receipt image as input and report whether the image is authentic or shows traces of manipulation of the receipt or of the image itself. This operation should be the second step of the entire processing flow, and if the receipt is suspected to be tampered with, the code should report it right away, without proceeding to the duplicate comparison stage.
- Current PoC does not work well with skewed bills.
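
The stage ordering required above (receipt detection first, tampering check second, duplicate comparison only for receipts that pass) can be sketched as follows. This is a minimal illustration, not the PoC's actual code: `process_submission`, `ReceiptResult`, and the three stage callables are hypothetical names, and the sketch assumes that the early report applies per receipt, so a tampered receipt skips the duplicate stage while the other receipts from the same image are still processed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class ReceiptResult:
    index: int   # position of the receipt within the submitted image
    status: str  # "tampered", "duplicate", or "ok"


def process_submission(
    image: Any,
    split_receipts: Callable[[Any], Sequence[Any]],  # hypothetical stage 1
    is_tampered: Callable[[Any], bool],              # hypothetical stage 2
    is_duplicate: Callable[[Any], bool],             # hypothetical stage 3
) -> List[ReceiptResult]:
    """Run the three stages in the order the requirements describe."""
    results = []
    # Stage 1: split the submitted image into individual receipts.
    for index, receipt in enumerate(split_receipts(image)):
        # Stage 2: single-image tampering check; report right away and
        # skip the duplicate comparison for this receipt (assumption:
        # the early exit is per receipt, not per submission).
        if is_tampered(receipt):
            results.append(ReceiptResult(index, "tampered"))
            continue
        # Stage 3: duplicate comparison, reached only by untampered receipts.
        status = "duplicate" if is_duplicate(receipt) else "ok"
        results.append(ReceiptResult(index, status))
    return results


# Stub stages for demonstration only: strings stand in for images.
seen = {"receipt-A"}
dup_calls = []


def fake_split(image):
    return image.split("|")


def fake_tampered(receipt):
    return receipt.endswith("(edited)")


def fake_duplicate(receipt):
    dup_calls.append(receipt)  # record which receipts reach stage 3
    return receipt in seen


results = process_submission(
    "receipt-A|receipt-B (edited)|receipt-C",
    fake_split, fake_tampered, fake_duplicate,
)
for r in results:
    print(r.index, r.status)
```

Passing the stages in as callables keeps the ordering logic separate from the detection algorithms themselves, so each algorithm can be replaced or tested on its own.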