TCO18 Round 4 Editorial


In this problem we are considering two-dimensional arrangements of nonnegative integers into a pyramid-like shape in which each number is the sum of the two numbers that are diagonally below its place:

   12 13
 7   5 8

The task is to calculate the number of pyramids of L levels and top number T.

Observe that the whole pyramid is uniquely determined by the numbers in the bottom row. So let’s see how to calculate top number T based only on these numbers. If we denote the j-th number (0-based) on the r-th level from the top by A[r,j], we see that

T = A[0,0] = A[1,0] + A[1,1] = (A[2,0] + A[2,1]) + (A[2,1] + A[2,2]) = A[2,0] + 2*A[2,1] + A[2,2]
   = A[3,0] + 3*A[3,1] + 3*A[3,2] + A[3,3] = …

So it’s easy to observe (and prove) the pattern that T is the sum of numbers from the r-th row multiplied by binomial coefficients:

T = \sum_{j=0}^r A[r,j] * binom(r,j).

That allows us to come up with a simple dynamic programming approach. Let D[i,t] will be the number of pyramids in which the partial sum starting from the i-th number in the last row is equal to t, i.e.

t = \sum_{j=i}^{L-1} A[L-1,j] * binom(L-1,j).

The answer to the problem is, of course, D[0,T]. To calculate D[i,t] we can iterate over all possible values of A[L-1,i]:

D[i,t] = \sum_{v=0}^t D[i+1, t – v * binom(L-1,i)].

Of course we stop iterating when t – v*binom(L-1,i) becomes negative. The size of array D is O(L*T), since we have L levels and each partial sum is bounded by the value of top number T. The calculation of each cell is done in O(T), so the total complexity of the algorithm is O(L*T^2).

With L and T bounded by 1000 this could be too much. But observe that binomial coefficients grow exponentially, and for big L the values binom(L-1,i) in the middle of the last row will be larger than T, thus the corresponding values A[L-1,i] must be zeros, and the loop for these values will be done in constant time.

Therefore the time complexity is much better, and closer analysis can show that it is in fact O(L*T).

Here is a sample implementation in C++ using recurrence with memoization:

 const int N = 1010, MOD = 1000000007;
  int cache[N][N],binom[N][N];
  bool incache[N][N];

  int rek(int i, int t) {
    if (i == L) { return t == 0; }
    if (incache[i][t]) { return cache[i][t]; }
    int cnt = 0, val = t;
    while (val >= 0) {
      cnt += rek(i+1, val);
      cnt %= MOD;
      val -= binom[L-1][i];
    incache[i][t] = true;
    return cache[i][t] = cnt;

  binom[0][0] = 1;
  for (int i = 1; i < L; ++i) {
    binom[i][0] = binom[i][i] = 1;
    for (int j = 1; j < i; ++j) {
      // note that for binom[i][j] > T we do not need exact value
      binom[i][j] = min(T+1, binom[i-1][j-1] + binom[i-1][j]);

  return rek(0, T);

Polyline Areas

In this problem we are given a rectilinear polyline in the plane and we need to find the area of each finite region that this polyline divides the plane into.

Parsing the program

The polyline is given as a simple program for the robot that moves on the plane and draws the polyline along its path. The robot understands three basic instructions: F (move 1 unit forward), L (turn 90 degrees left) and R (turn 90 degrees right). The program is constructed from these instructions and cycles (possibly nested). Moreover, we can assume that during execution of the program the robot will make at most 250,000 instructions, therefore it is feasible to construct the whole path.

We can do it recursively as follows:

 string parse_program(int& i) {
    string s;
    while (i < polyline.size() && polyline[i] != ']') {
      int num = 1;
      if (isdigit(polyline[i])) {
        // parse number of repeats
        num = 0;
        while (isdigit(polyline[i])) {
          num = 10*num + polyline[i++] - '0';
      string comm;
      if (polyline[i] == '[') {
        // parse subprogram
        comm = parse_program(i);
      } else {
        comm = string(1, polyline[i++]);
      for (int j = 0; j < num; ++j) {
        s += comm;
    return s;

  int i = 0;
  string full_polyline = parse_program(i);

The function parse_program reads subsequent commands until it reaches character ] (in case it was called recursively on a subprogram) or the end of the string (at the end of the program). If the command starts with a digit, it is a cycle, thus the function parses number of repeats, otherwise it sets number of repeats to 1. Then it parses an instruction, and if the next character is [ it recursively parses a subprogram. Then it repeats the generated subprogram specified number of times.

In the above program we parse the subprogram exactly once, and then we just append it in a loop. We could say that since the output string has a limited length, we could instead just parse the subprogram in a loop. But there is a catch here: the subprogram could result in an empty output string, and in this case it is easy to construct a test case which requires exponential time for such solution, e.g.: 10[10[10[10[10[…10[]]]]]].

Constructing the planar graph

We got a string consisting just of basic instructions, which allows us to simulate the path of the robot. This path may cross and overlap itself arbitrarily, so we need to take care of it. Suppose that the robot starts in point (0,0) and heads to the right. During the simulation we will maintain its current position pos and direction dir. We will also store in map edges all the positions it visited together with information in which directions from these positions we have parts of the polyline. Since there are only four directions (right, up, left, down), we can number them from 0 to 3 and in each position store them in a bitmask:

typedef pair<int,int> pii;
 const int DX[] = {1,0,-1,0}, DY[] = {0,1,0,-1};

 map<pii, int> edges;
 pii pos;
 int dir = 0;
 for (char c : full_polyline) {
   if (c == ‘F’) {
     edges[pos] |= 1 << dir;                                // remember outgoing edge
     pos = pii(pos.first + DX[dir], pos.second + DY[dir]);  // move forward
     edges[pos] |= 1 << (dir^2);                            // remember ingoing edge
   } else if (c == ‘R’) {
     dir = (dir+3) % 4;  // turn right
   } else {
     dir = (dir+1) % 4;  // turn left

We can treat the structure we have just built as a planar graph. All visited positions are its vertices and for each vertex we maintain edges (at most 4) which connect it to the neighbour vertices.

Calculating the areas of the faces

Now we would like to calculate the area of each finite face of this planar graph. Observe that to do this it is enough to traverse the boundary of a face and use any algorithm for calculating the area of a polygon. How could we traverse a face if we draw this graph on a paper? We could start from any edge along this face and then go along the boundary counter-clockwise. Since we need also to track the direction we are moving, we will maintain a directed edge (each edge of the planar graph could be directed in two ways). In each vertex we must change the edge, remembering that we cannot cross the polyline. This can be done by selecting the next edge in the clockwise direction from this vertex. We repeat the process until we hit the edge we started from. On the following picture there is a planar graph built from a polyline encoded by a program F2[FRFR3FRFL]RFF and traversal of one of its faces:

Note that the boundary of a face is a polygon, but not necessarily simple (it can touch and overlap itself). But this should not be a problem for standard algorithms for calculating the area.

In order to do the above process for all faces, we could just keep track of visited directed edges and iteratively pick up non-visited edge, and go around the border. Note that this way we also calculate the area of the infinite face of the planar graph. But we can easily remove it by noting that this area will be the largest one, or if we calculate the signed area, this area will have non-positive sign.

  vector<long long> ans;
  set<pair<pii,int> > vis;

  for (auto i : edges) {
    for (int d = 0; d < 4; ++d) if (i.second & 1 << d) {
      pii pos = i.first;
      int dir = d;
      if (vis.find(make_pair(pos, dir)) != vis.end()) { continue; }

      // handle non-visited directed edge (pos, dir)
      long long area = 0;
      while (vis.find(make_pair(pos, dir)) == vis.end()) {
        vis.insert(make_pair(pos, dir));
        pii npos = pii(pos.first + DX[dir], pos.second + DY[dir]); // move forward
        area += (pos.first - npos.first) * pos.second;             // use Green's formula to update signed area
        for (int j = 0; j < 4; ++j) {
          // find next edge in clockwise direction along npos
          int ndir = ((dir^2)+4-1-j) % 4;
          if (edges[npos] & 1 << ndir) { dir = ndir; break; }
        pos = npos;
      if (area > 0) {


  sort(ans.begin(), ans.end());
  if (ans.size() > 200) {
    ans.erase(ans.begin()+100, ans.end()-100);
  return ans;

The time complexity of the algorithm is O(L log L), where L denotes the length of the full polyline. It could be speeded up to O(L) by using hash tables for edges and vis.

The above algorithm could be generalized to calculate areas of faces in any planar graph. However, a special form of graph in this problem made this task much easier. In the general case we would have to sort all edges along the vertices. And also if the graph is disconnected (which here cannot happen, since it is built from a single polyline), we would have to handle a case of a face inside another face.


In this problem we start from a sequence A of length N generated by a linear pseudorandom generator, and we perform a series of steps on it. A single step removes the first B elements of the sequence and appends the sum of these elements to the end of the sequence. We would like to know the first and the last element of the sequence after S steps.

Let’s look at an example for N=5 and B=2. We denote the subsequent elements appended during the evolution of the sequence by E[N], E[N+1], and so on:

The first observation is that each appended element will be a sum of consecutive elements of the original sequence (treated as a cycle). To be more formal: if we unwind the original sequence A, that is for indices i >= N we define A[i] = A[i mod N], then each appended element E[i] can be represented as a pair (s, k), which denotes that this element is the sum of k elements from the original sequence A, starting from the s-th element, i.e. E[i] = A[s] + A[s+1] + … + A[s+k-1].

Therefore solution to the problem consists of two parts:

  • finding pairs (s, k) for the first and the last element of the final sequence,
  • calculating the sums A[s] + A[s+1] + … + A[s+k-1] for these pairs.

Linear recurrence and fast matrix exponentiation

The second part of the solution is more standard, so we begin with it. The sequence A is generated using linear recurrence, and this can be rewritten using matrix notation:

If we unwind this recurrence, we get:

This allows us to calculate the value of the i-th element in the original sequence A in time O(log i) by using fast matrix exponentiation.
But the same technique can be applied to calculate a sum of any consecutive elements of the sequence. First observe that each sum can be obtained using only prefix sums. If we denote P[i] = A[0] + A[1] + … + A[i-1], then we have

A[s] + A[s+1] + … + A[s+k-1] = P[s+k] – P[s-1]

And calculating prefixes P[i] can be also done with a slightly bigger matrix:

The following C++ contains full implementation of the above ideas. Later we will use function sum_elems which works in time O(log N):

 typedef long long ll;
  const int MOD = 1000000007, M=4;

  struct mat_t {
    int m[M][M];

  mat_t mat_mult(const mat_t& A, const mat_t& B) {
    mat_t C;
    REP(i,M) REP(j,M) C.m[i][j] = 0;
    REP(i,M) REP(k,M) REP(j,M) {
      C.m[i][j] = (C.m[i][j] + ll(A.m[i][k]) * B.m[k][j]) % MOD;
    return C;

  mat_t mat_pow(mat_t A, ll n) {
    mat_t ans;
    REP(i,M) REP(j,M) ans.m[i][j] = i==j;
    while (n) {
      if (n&1) ans = mat_mult(ans, A);
      A = mat_mult(A, A);
      n /= 2;
    return ans;

  mat_t MAT;

  void init_mat() {
    REP(i,M) REP(j,M) MAT.m[i][j] = 0;
    MAT.m[0][0] = c1;
    MAT.m[0][1] = c2;
    MAT.m[0][2] = add;
    MAT.m[1][0] = MAT.m[2][2] = MAT.m[3][1] = MAT.m[3][3] = 1;

  int sum(ll k) {
    // sum_{i=0..k-1} A[i]  (for k <= N)
    mat_t mat = mat_pow(MAT, k);
    int ans = ( ll(mat.m[3][0]) * a1 + ll(mat.m[3][1]) * a0 + mat.m[3][2] ) % MOD;
    return ans;

  int sum_cycle(ll k) {
    // sum_{i=0..k-1} A[i % N]
    return (ll(sum(N)) * (k/N) + sum(k % N)) % MOD;

  int sum_elems(ll s, ll k) {
    // sum_{i=s..s+k-1} A[i % N]
    return (sum_cycle(s+k) + MOD - sum_cycle(s)) % MOD;

Compressed simulation of the process

The first part, i.e. finding the indices of the elements is quite nontrivial if we want to do this fast. In a naive solution we could just simply simulate the process, maintaining pairs (s, k) for each element E[i].

Initially we put pairs (i, 1) for 0 <= i < N on a deque of pairs elems. Then we simulate S steps:

  struct elem_t { ll s, k; };
  deque<elem_t> elems;

  for (int i = 0; i < N; ++i) {
    elems.push_back(elem_t{ i, 1 });

  while (S-- > 0) {
    int s = elems.front().s, k = elems.front().k;
    for (int i = 1; i < B && !elems.empty(); ++i) {
      k += elems.front().k;
    elems.push_back(elem_t{ s, k });

  elem_t ef = elems.front(), eb = elems.back();
  return vector<int>{ sum_elems(ef.s, ef.k), sum_elems(eb.s, eb.k) };

We use here an invariant that for two consecutive pairs (s1, k1) and (s2, k2) we have (s1 + k1) mod N = s2.

Unfortunately this results in a solution of time complexity O(S*B) which is not acceptable. To speed it up we observe that many consecutive pairs will have the same value of k, and thus we can compress them and handle them together.

We will gather such elements into blocks. Each block is a triple (n, s, k) which denotes that it consists of n consecutive elements of a sequence E described previously by pairs (s, k), (s+k, k), (s+2*k, k), …, (s+(n-1)*k, k).

Now to perform a step we must consider three cases:

  • There is exactly one block, and n=1 in this block. Then there is only one element in the sequence and each following step will not change it; we can stop.
  • The first block encodes at least B elements. Then we can perform floor(n/B) steps at the same time. We remove floor(n/B)*B elements from the first block, and we add a block (floor(n/B), s, k*B).
  • Else we simulate one step by constructing a block with one element.

After that we get block representation of the final sequence and we can easily recover indices for the first and the last element.

  struct block_t { ll n, s, k; };
  deque<block_t> blocks;
  blocks.push_back(block_t{ N, 0, 1 });
  while (S > 0) {
    block_t b = blocks.front();
    if (blocks.size() == 1 && b.n == 1) { break; }  // Case 1

    if (b.n >= B) {  // Case 2

      ll cnt = min(S, b.n/B);
      if (b.n - cnt*B) {
        blocks.push_front(block_t{ b.n - cnt*B, b.s + cnt*B * b.k, b.k });
      blocks.push_back(block_t{ cnt, b.s, b.k*B });
      S -= cnt;

    } else {  // Case 3

      block_t c = block_t{ 1, b.s, b.n*b.k };
      ll x = b.n;
      while (x < B && !blocks.empty()) {
        b = blocks.front();
        ll cnt = min(B - x, b.n);
        c.k += cnt * b.k;
        if (cnt != b.n) {
          blocks.push_front(block_t{ b.n - cnt, b.s + cnt * b.k, b.k });
        x += cnt;


  block_t bf = blocks.front(), bb = blocks.back();
  return vector<int>{ sum_elems(bf.s, bf.k), sum_elems(bb.s + (bb.n-1)*bb.k, bb.k) };

How fast is the above algorithm? In case 3 we produce one block with n=1. In case 2 we remove a block of some n=x, and produce a new block with n <= floor(x/B) and possibly leave smaller n in the first block. But after that we must go to case 3, which removes the first block and produces a block with n=1. So we can say that cases 2 and 3 combined together remove block with some n=x and produce block with n <= floor(x/B) and possibly block with n=1.

Thus during the algorithm we will have exactly one big block (initially with n=N) possibly followed by a block with n=1. As long as B >= 2 the n of this big block will decrease exponentially during every 2+3 case, so there could be at most O(log N) such decreases.

Therefore this part works in O(log N) time, and this is the time complexity of the whole algorithm.

There is one tricky case though. We assumed B >= 2 for exponential decrease, and in fact for B=1 the above algorithm works in O(S/N). But in this case we just S times perform operation of moving the first element from the sequence to the end. So it can be easily solved separately by just calculating the indices after S such moves:

  if (B == 1) {
    // special case
    return vector<int>{ sum_elems(S, 1), sum_elems(S+N-1, 1) };