We have a large manufacture of telecommunication equipment that writes all their firmware in ANSI C. Over the years that have amassed large volumes of C code, some of which is used and some of which is not, yet it still contains useful code snippets. The problem they are facing is that we new developers come on board they a facing a sea of firmware that does not contain any documentation other than comments. They are looking to Topcoder and its team to build a tool to reverse engineer some code documentation.
In previous contests we have successfully parsed sample c code projects, created sample module dependency graphs and module functional overviews.
Sample outputs of the tool are currently deployed to
This challenge will focus on looking up finite state machines in the project code. The goal is to identify as much state machines as possible. For each state machine we need to detect possible states and transitions between the states. There are several quite standard approaches to encoding state machines in C:
Switch statements - states and events are defined as enumerations, and the transitions are defined using a double nested switch statement where outer switch statement branches on current state, and inner switch statements branch on current event. See the wikipedia entry for an example.
2D state transition arrays - where a 2d array of num_states x num_events format is used and each element is a struct with 2 elements: function pointer (what to execute on the transition) and next state (what is the next state after the transition function is executed). See this page for a sample.
1D state function pointers and 2D transition matrix - where state array contains functions executed in each of the states (as oposed to the case of 2d arrays where transitions are the functions) and 2d transition matrix defines transitions based on return values of the state functions. See this gist for an example
There are ofcourse many more variations but in this challenge we'll focus on finding those three types of state machines. You should expand our tool for code documentation with state machine detection. The project code is already parsed (using libclang) and there is a number of examples for traversing the parse tree in the current code. There is also an example of looking up 2D arrays of structs where the struct contains a function pointer and an int - exactly what we are looking for 2d state transition encoded state machines. For other types of state machines you will need to traverse the parse tree looking for different structures - nested switch statements, 1D state function pointers and 2D transition matrices.
It's important to try and detect state and event names for all the state machines. These can be defined in the code as enumerations, defined constants, or just integers. For example, take a look at an example state machine in the quagga project, defined at bgpd/bgp_fsm.c lines 995-1166. It is a state machine encoded as a 2d transition array, where states and events are defined constants in bgpd/bgpd.h#736-753
#define Idle 1
#define Connect 2
#define BGP_Start 1
#define BGP_Stop 2
In this case it might not be possible to get event and state names because define statements are preprocessed before code parsing so they might not be available in the parse tree. There are however options in clang to not expand preprocessor statements and you should explore those to get event and transition names. If it's not possible, please explain the reasoning in the deployment guide.
If the states and transitions are defined as enumerations it should be straightforward to get the names from the parse tree, while in the case raw integers are used for states and transitions we have no other option but to keep those as state and transition names.
Entry point of the current tool is parser.py file and it will produce a json output containing data for the generated docs. You should expand the output to include the generated state machines. for each of those the output should include a list of events, list of states and transitions and a state machine graph. You should use pygraphviz to generate the graph in svg format (there are examples in the current code). Make sure you use a sane layout where states won't overlap in the output image. Node labels should be state names, and edge labels should be event names. Also, if states are ancoded as function pointers (1D array case) add function name to the node label, or if transitions are encoded as function pointers (2D case, switch case) add function names to edge labels.
You should update the document_viewer module to display the detected state machines. Add a separate tab names 'Finite state machines' where you will list all the state machines with all the available details. Use a similar view layout to module-details/module_name pageWe’ll keep the assumptions about the project code structure from the previous challenge. Top level directory contains just the build/configuration files and has no c code. It also contains a number of directories that represent modules of the project and they contain the actual .h and .c files. There are two opensource projects we can use as examples in this challenge: Quagga and Openswitch
It should be fairly easy to find plenty of state machines in the sample projects (at least 10+). As a hint, take a look at bgpd, ospf6d, ospfd, isisd, zebra modules in quagga project and datapath and vswitchd in openswitch project.
Final Submission Guidelines
Submit all the code in a zip archive.
Submit a build and verification guide. Document the steps for detecting state machines.
Submit the output your tool produces on quagga and ovs projects.