Challenge Overview

Project Overview

GE processes thousands of patent applications a year in the US alone. It also has applications from other countries that contain applicable information called "prior art." This prior art sometimes needs to be transposed from one application to another, a process that is done manually today and can take anywhere from five minutes to an hour. Repeated across thousands of applications, that manual effort is where a tool like this can provide real value.

We are running a series of POC challenges to build a tool that can automate this process.

The final tool we want to build will have the following workflow:

  • A user opens a web page that offers two functions:

    • Upload a patent application that contains the ‘prior art’ information.

    • Upload an optional file of key/value pairs representing additional information that replaces the default values used in building the final output.

  • A PHP backend handles the upload and delegates the processing to a Java command-line utility (packaged as a JAR file), passing the uploaded files to the utility.

  • The utility parses its input arguments and determines the type of each of the two input files.

  • For the patent application file, the utility extracts the ‘prior art’ information and constructs an XML file.

  • For the optional file (let’s call it the ‘extra info’ file), the utility converts the file to XML.

  • The utility uses a mapping file and the converted XML files to create a PDF/XLS/CSV file, stores it locally, and returns the full path of that file to the PHP backend.

  • The PHP backend reads the file and sends it back to the user.
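
As an illustration of the step where the utility decides the type of each input file, here is a minimal sketch that classifies a file by its extension. The class name InputTypeDetector, the InputType values, and the set of supported extensions are assumptions made for this example; the real utility may support other types or inspect the file content instead.

```java
import java.util.Locale;

/** Hypothetical helper: guesses an input file's type from its extension. */
final class InputTypeDetector {

    /** Assumed set of types; the real utility may accept others. */
    enum InputType { PDF, XML, CSV, PROPERTIES, UNKNOWN }

    static InputType detect(String fileName) {
        if (fileName == null) {
            return InputType.UNKNOWN;
        }
        int dot = fileName.lastIndexOf('.');
        // No dot, or a trailing dot, means there is no usable extension.
        if (dot < 0 || dot == fileName.length() - 1) {
            return InputType.UNKNOWN;
        }
        // Compare case-insensitively so "SCAN.PDF" and "scan.pdf" match.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":
                return InputType.PDF;
            case "xml":
                return InputType.XML;
            case "csv":
                return InputType.CSV;
            case "properties":
                return InputType.PROPERTIES;
            default:
                return InputType.UNKNOWN;
        }
    }
}
```

For example, InputTypeDetector.detect("application.PDF") returns InputType.PDF, and a name without an extension returns InputType.UNKNOWN, so the caller can reject it with a clear error message.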

Challenge Requirements

You have built two command-line utilities. The first one (OCR) converts a PDF that contains images into an XML file. The second one converts the XML file from the first utility into CSV format.

1st utility: OCR utility
2nd utility: XML to CSV utility

The goal of this challenge is to build a Java/JSP web application:

  • The web page will include the following:
    • An input field to upload a single PDF file.
    • An optional key/value form that shows a single row by default, with the ability to add more entries.
    • A 'submit' button that submits the input for processing.
    • Preferably, a progress indicator icon that is shown while the backend process is running.
    • The output, displayed as a link with the text "Success! click here to download the output", or the error message if an error happened.
    • Use Bootstrap for the UI.
  • The backend will perform the following:
    • Use the OCR utility to convert the PDF to an XML file. The output file must be specified so that it can be used in the next step.
    • Convert the key/value form into XML (in the format accepted by the 2nd utility), and pass it along with the XML from the OCR utility to the 2nd utility.
    • Render the output file path as a web-accessible link.
  • For the 2nd utility, you will need to update the mapping file if there are any differences between the default mapping in the submission and the output XML from the OCR utility.
  • Use Maven to build and deploy the solution. Maven should handle setting up the web server; the solution is simple, so we don't think the client should have to take any additional steps to set up a web server.
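
For the backend step that converts the key/value form into XML, a minimal sketch could look like the following. The element names extraInfo and entry and the key attribute are placeholders chosen for this example; the actual structure must follow whatever format the 2nd utility accepts. The class name KeyValueXmlWriter is also hypothetical.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: serializes submitted key/value rows as XML. */
final class KeyValueXmlWriter {

    static String toXml(Map<String, String> entries) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            // "extraInfo" and "entry" are assumed names, not the real schema.
            Element root = doc.createElement("extraInfo");
            doc.appendChild(root);
            for (Map.Entry<String, String> row : entries.entrySet()) {
                String key = row.getKey() == null ? "" : row.getKey().trim();
                if (key.isEmpty()) {
                    continue; // skip rows the user left blank
                }
                Element entry = doc.createElement("entry");
                entry.setAttribute("key", key);
                // The DOM serializer escapes special characters such as '&'.
                entry.setTextContent(row.getValue() == null ? "" : row.getValue());
                root.appendChild(entry);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build key/value XML", e);
        }
    }
}
```

Building the document through the DOM API rather than string concatenation means that user-supplied values containing characters such as & or < are escaped by the serializer, so the form input cannot produce malformed XML.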

Documents

The provided solutions include sample files. Use them for testing.

Deployment Environment

  • Target OS is Linux (CentOS or Ubuntu).
  • Use configuration for any environment-specific setting.
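
One way to keep environment-specific settings out of the code is to read them from a properties file and build the utility command line from those values. The sketch below assumes hypothetical keys java.bin and ocr.jar, and it assumes the OCR utility takes the input PDF followed by the output XML path; check the utility's own readme for its real arguments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical holder for environment-specific settings. */
final class UtilityConfig {

    private final Properties props = new Properties();

    UtilityConfig(Reader source) {
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a required setting, failing fast when it is missing. */
    String require(String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing configuration key: " + key);
        }
        return value.trim();
    }

    /**
     * Builds the command for the OCR utility.
     * The argument order (input PDF, then output XML) is an assumption.
     */
    List<String> ocrCommand(String inputPdf, String outputXml) {
        return Arrays.asList(
                require("java.bin"), "-jar", require("ocr.jar"),
                inputPdf, outputXml);
    }
}
```

The returned list can be passed to new ProcessBuilder(command), followed by start() and waitFor(), so that moving between CentOS and Ubuntu hosts only requires editing the properties file.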

Deliverable

  • A fully implemented solution that addresses the requirements above.
  • A detailed readme that explains how to set up and run the solution.


Final Submission Guidelines

.

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30048765