TopCoder Feature Articles

Immutable Objects
Thursday, July 10, 2003

By vorthys
TopCoder Member

Introduction
How many times have you been bitten by the following bugs? You pass an object to an innocent-looking method and it makes changes to the object that you weren't expecting. Or you update an object in one thread and some other thread makes incompatible changes to the same object. Or you place an object in a hash table and then somebody makes a change to the object that alters its hash code, so the object is now in the wrong place in the hash table. All of these bugs—and a horde of similar bugs—can be avoided by using objects that are immutable.

An immutable object is read-only—once it is created, it can never be changed. All of the bugs mentioned above involve undisciplined or incompatible changes to an existing object. Such bugs simply go away when you switch to using immutable objects.

Of course, we have to be able to update our data structures. For example, when implementing collections, we obviously want to be able to add and remove elements. The key is that when we update an immutable object, we get back a new object that reflects the update—we do not change the existing object. Java programmers will be familiar with this behavior from the String class. In Java, strings are immutable and concatenating two strings, for example, produces a third string but does not alter either of the original strings.

Besides reducing bugs, immutable objects have several other advantages. Perhaps most importantly, threads that manipulate immutable objects have drastically reduced synchronization requirements. Two threads can safely access or even update the same object without any synchronization whatsoever. See schveiguy's recent article on Threads and Atomicity for a discussion of the kinds of headaches caused by threads when working with ordinary, mutable objects.

Immutable objects are particularly handy for implementing certain common idioms such as undo/redo and abortable transactions. Take undo for example. A common technique for implementing undo is to keep a stack of objects that somehow know how to run each command in reverse (the so-called "Command Pattern"). However, figuring out how to run a command in reverse can be tricky. A simpler technique is to maintain a stack of immutable objects representing the state of the system between successive commands. Then, to undo a command, you simply revert back to the previous system state (and probably store the current state on the redo stack).

As an extended example of the kind of idiom supported by immutable objects, consider an application that processes the nodes of a tree top-down, where each node is processed in an environment that contains all of the node's ancestors. This pattern is extremely common in interpreters and compilers. Now, as you move down the tree, you obviously add information about each node to the environment, but what about when you move back up the tree? Using ordinary, mutable objects, you have to explicitly remove nodes from the environment, as in the following code:

   void visit(TreeNode t, MutableEnvironment env) {
     ...do something at t involving env...

     env.add(t);    // moving down the tree, add to environment
     if (t.left != null) visit(t.left, env);
     if (t.right != null) visit(t.right, env);
     env.remove(t); // moving up the tree, remove from environment
   }

On the other hand, using immutable objects, you simply throw away the new environment and go back to using an earlier environment.

   void visit(TreeNode t, ImmutableEnvironment env) {
      ...do something at t involving env...

      ImmutableEnvironment newEnv = env.add(t);
      if (t.left != null) visit(t.left, newEnv);
      if (t.right != null) visit(t.right, newEnv);
      // discard newEnv, no need to remove t!
   }

Notice that the recursive visit to the left child will update its environment, but when that recursive visit returns, those updates will be discarded. The recursive visit to the right child will revert back to newEnv, which was unaffected by any changes made while processing the left child.

So far, the differences between the two versions of visit may seem minor, but now suppose you want to modify visit to process both children of the node concurrently. In the version using immutable environments, this modification would be trivial, but in the version using mutable environments, the changes would be all-encompassing. You would essentially have to rewrite the code from scratch!

I been have extolling the virtues of immutable objects, but they have their downsides as well. Immutable objects are commonly less efficient that their mutable counterparts, in both time and space. The efficiency problems are particularly severe for flat objects like strings, making it most unfortunate that many programmers' first introduction to immutable objects is the String type in Java. However, the efficiency differences are typically much smaller for structured data like trees. The space issue is exacerbated by the fact that immutable objects are frequently shared, making it unclear who should be responsible for their deallocation. Garbage collection is a big help here. On the other hand, the increased sharing possible with immutable objects can make immutable objects use less space for some applications, such as those using the so-called "Flyweight Pattern".

The most serious downside of immutable objects is that they require a different mindset from mutable objects, one that many programmers initially find foreign. For example, it is often easier to process immutable objects using a recursive, functional style (see radeye's article on Functional Programming) than the loops and assignments many programmers are most comfortable with. But if you can get past this initial discomfort, you will find that immutable objects are often just the right tool for writing reliable, thread-safe code.

Would you like to write a feature?