This week, let’s analyze a problem that featured in Division 1 of SRM 705, sponsored by Blizzard, and also explore a common algorithm belonging to Graph Theory.
The summary of the problem statement is as follows:
Given a list of words, determine if there exists a possible permutation of the english alphabet, such that the characters in each word are sorted lexicographically. If yes, return “POSSIBLE”, else return “IMPOSSIBLE”
For example, if the words were [“ab”, “ad”, “db”], then all permutations of the english alphabet where “a” appears before “b” and “d”, and “d” appears before “b”, are valid.
Now, look at our task from a slightly different perspective and see if that helps us solve it easily.
Let’s consider the same problem, but this time, let’s analogize it with graph theory.
Let each letter, represent a node. Every two consecutive letters, would then represents two nodes, connected by a directed edge (as the first letter leads to the second letter)
The reasoning behind connecting them with a directed edge, is to store the order of letters. This is important, as our answer depends on the order of letters, and thus storing them in some representation is necessary – in this case, as directed edges.
The key rule our final permutation of the english alphabet must obey, is that whenever a letter appears in the permutation, all letters preceding this letter in all the input words, have already appeared before hand.
Now, we must generate some permutation of the english alphabet, that obeys our key rule.
In other words, from a graph theory perspective, we must generate some ordering of our vertices, in which no vertex appears before any of its ancestors.
This can be solved, by “Topological Sort”, an algorithm that sorts the vertices in an order where all vertices appear after their ancestors.
There’s one catch, though. It can fail, if there exists a cycle in our graph. And that is the case we must inspect, to determine whether it is “IMPOSSIBLE” to generate a valid permutation of the alphabet.
The idea of topological sort is very intuitive:
- Iterate over all vertices
- If a vertex has not yet been visited, visit it
- Mark the vertex as visited.
- If any of its neighbours haven’t been visited, recursively visit them until you reach a vertex whose neighbours have all been visited.
- Add the vertex to a global stack.
This stack, after the recursive method is over, will store the topological order of the vertices.
However in the case of this question, since the constraints are small and we’re creating a graph from the english language and we know the maximum number of vertices we can have is limited to 26, it may be faster to implement the solution with adjacency graphs.
In fact, that’s exactly what the fastest scorer in Java did – 2rf and the benefit of implementing this as an adjacency matrix, is that checking if a cycle exists in the graph is also extremely simple. This implementation can be found here.
The fastest scorer, anta had another intuitive approach, which was to calculate for every vertex how many ancestors it had, i.e. how many letters appeared right before this vertex.
Then after iterating over all vertices with zero ancestors and removing their contribution to their children by simulating their removal from the graph, anta summed up the number of vertices who had zero ancestors and if this number matched the initial number of vertices, then the graph was successfully constructed and deconstructed, implying a perfect topological sort existed.
This implementation can be found here.
An interesting exercise for the reader would be to now consider how to generate a valid permutation of the english alphabet, given a stream of words.