C code parsing and documentation

Key Information

The challenge is finished.
Show Deadlines

Challenge Overview

We have a large manufacturer of telecommunication equipment that writes all their firmware in ANSI C.   Over the years that have amassed large volumes of C code, some of which is used and some of which is not, yet it still contains useful code snippets.   The problem they are facing is that we new developers come on board they a facing a sea of firmware that does not contain any documentation other than comments.   They are looking for a tool that can generate reverse engineering design documents from a given C source code. The design document should contain various modules along with their interactions, finite state machines and other high level design details.


In a previous challenge we have gathered a few ideas that will help us reverse engineed the code documents.

In this challenge we’ll do two things

  • Parse C code and generate syntax tree

  • Create sample call graphs for from the parsed tree


Parsing C code and generating syntax tree

In order to gain any useful information about the code we need to be able to parse it correctly. To that end, a primary objective of this challenge is to generate a global syntax tree we can then traverse to look for interesting features like dependencies between different project modules (function calls between modules), data structures, enumerations, state machines etc.


We’ll make some assumptions about the project code structure. Top level directory contains just the build/configuration files and has no c code. It also contains a number of directories that represent modules of the project and they contain the actual .h and .c files. There are two opensource projects we can use as examples in this challenge: Quagga and Openswitch


We need to parse all the code in the project (all the modules) but in a way that would enable us to traverse the parse tree for each function within a file, entire file, each module and entire project.


You can use readily available parsers to parse the c code. We suggest using pycparser but are open to suggestions on this front.


Create sample call graphs for from the parsed tree

You need to build a tool that will parse the code in two example projects and output the following to demonstrate parse tree handling:

  • Global call graph as a png image. The code should traverse the entire parse tree, visit only function call nodes and build the call graph (if a function a calls function b, create a directed edge from a to b in the call graph). Make sure the graph contains only functions from the project, not external functions calls like stdlib or 3rd party library calls.
  • Call graph for each of the modules (calls to the functions outside the module should be filtered out)

  • Call graph for each of the files (calls to the functions outside the file should be filtered out)

  • List of defined data types in each of the modules (structs, enums)

  • List of variables of type 2d struct array where the struct has 2 members: function pointer, and an int or uchar (for example, it should list the FSM struct in /quagga/bgpd/bgp_fsm.c)


The output format for the above lists isn’t too important as it will be used just to demonstrate you can traverse the parsed tree correctly and filter the required data.

For generating the call graphs, you can use pygraphwiz.


Final Submission Guidelines

Submit all the code in a zip archive.
Submit a build and verification guide.

Reliability Rating and Bonus

For challenges that have a reliability bonus, the bonus depends on the reliability rating at the moment of registration for that project. A participant with no previous projects is considered to have no reliability rating, and therefore gets no bonus. Reliability bonus does not apply to Digital Run winnings. Since reliability rating is based on the past 15 projects, it can only have 15 discrete values.
Read more.


2017 TopCoder(R) Open


Final Review:

Community Review Board


User Sign-Off


Review Scorecard