ico-arrow-big-left

C code documentation reverse engineering

Key Information

Register
Submit
The challenge is finished.
Show Deadlines

Challenge Overview

We have a large manufacturer of telecommunication equipment that writes all their firmware in ANSI C.   Over the years that have amassed large volumes of C code, some of which is used and some of which is not, yet it still contains useful code snippets.   The problem they are facing is that we new developers come on board they a facing a sea of firmware that does not contain any documentation other than comments.   They are looking for a tool that can generate reverse engineering design documents from a given C source code. The design document should contain various modules along with their interactions, finite state machines and other high level design details.

 

In a previous challenge we have developed a tool to parse the project, traverse the call tree and generate all kinds of call graphs. The call graphs generated are not too useful as they contain extremly large number of nodes and we didn't apply any useful layout algorithms to make them look better.

In this challenge we'll refine the call graphs to get some insight into the code structure. Specifically the goal is to get the following:

  • Functional view for each of the modules

  • Modules interaction view

  • External interfaces list

  • Common data structures

We’ll keep the assumptions about the project code structure from the previous challenge. Top level directory contains just the build/configuration files and has no c code. It also contains a number of directories that represent modules of the project and they contain the actual .h and .c files. There are two opensource projects we can use as examples in this challenge: Quagga and Openswitch

 

Functional view

With the code from the previous challenge we can get a call graph for a module with './call_graph.py ./quagga-master/bgpd quagga-bgpd.svg' and the resulting graph looks like this

 

In order to keep just the most useful data in the graph any try to get the overall structure, we'll use PageRank algorithm to rank the functions(nodes in the graph) and produce the output graph containing only the high ranked nodes. The threshold for keeping/discarding graph nodes should not be fixed because number of functions and their interactions varies a lot between different modules so a threshold that's good for one module might be way off for a different one. Implement a simple heuristic to determine the appropriate threshold and verify it on quagga and ovs modules.

 

Modules interaction view

We'll create 2 graphs here:

- module dependency graph, where nodes are individual modules and if there is a function from modula a that calls a function from module b than there is a directed edge between nodes a and b

- inter-module call graph where we'll only show functions that call functions from other modules. For example if module A has a function f1 and module B has a function f2 that is called by f1 the graph should contain nodes f1 and f2 with an edge from f1 to f2

 

To construct the above graphs you'll need to traverse the parsed AST and select the appropriate functions. Take a look at call_graph.py for an example selecting only functions from a single module.

 

External interfaces view

Once we have the inter-module call graph, we can define the interface of one module as all the methods in a module that are called by other modules. More specifically those are just parts of the interface that are currently used but there might be more useful methods in the module that are meant to be called by other modules. To cover those cases, we'll consider all functions in a file as interface functions if at least one method in the file is called by any other module.

 

Common data structures

Similar to how we defined module interfaces above, we can define common data structures as structures (enum/struct) defined in one module but used by other modules. For example if module A has a definition for structure S, it will be considered a common data structure if at least one other module creates an instance of it or it's used as a method parameter or return value from an interface function.

You can check 'example_data_types.py' as an example for finding data structures in a module, but it won't provide any info about where that structure is used (that part should be developed in this contest)

 

Presentation

Create a command line tool that will accept just one argument: path to the project folder. The tool should create a HTML file containing the above views. Use a simple bootstrap based layout with placeholder header and footer. The page content should have 2 tabs: 'High level overview' and 'Module details'

High level overview tab should contain just the module dependency graph and the inter module call graph (the two graphs defined in Modules interaction view)

Module details view will be more like an api docs page and will have the following:

  • An index with this structure:

    • module name

      • Interface

        • List of all the interface methods for that module

      • Data Structures

        • List of all common data structures for that module

Each of the items in the index should link to the specific details view in the page

  • Detail section for each of the modules that will contain
    • module name

    • functional view graph

    • list of the interface methods (return type, name, parameters)

    • list of common data structures (name and contents)

Please use some templating engine for generating the page contents.

 

The code from the previous challenge is attached in the forums. You don't have to use it as base code, but you should reuse the parts that parse the project code and traverse the AST.


Improve code docs and generate doxygen documentation
In the first ideation challenge in this series we got a suggestion for using Swummary module for generating code docs for the project. It will basically go through the project files and fill in the missing comments based on menthod/variable names. You should just run the module on the sample projects and generate doxygen documentation based on the output from the swummay module. Make sure to cover this in detail in the deployment guide.
 

 

Final Submission Guidelines

Submit all the code in a zip archive.
Submit a build and verification guide.
Submit the output your tool produces on quagga and ovs projects.

Reliability Rating and Bonus

For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.
Read more.

ELIGIBLE EVENTS:

2017 TopCoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board
?

Approval:

User Sign-Off
?

CHALLENGE LINKS:

Review Scorecard

?