TopCoder Feature Articles

Java for C++ coders, and vice versa
Tuesday, June 2, 2004

By dplass
TopCoder Member

Introduction
Three out of the four languages used in TopCoder competitions derive from a common language, C. But there are still differences (and similarities) between C++, Java, and C#. In this feature, I will point out the more significant, and interesting, features that distinguish these three languages. I'll assume that you know at least one of these languages well; I myself have used C++ and Java for the last 8 years but have only competed in Java thus far on TopCoder.

Primitives
C# and C++ have both signed and unsigned integer types ranging from 8 to 64 bits. In Java, all integer types are signed, with the exception of char, which is an unsigned 16-bit type that represents a Unicode character. Traditionally (i.e., in C), the char primitive was an 8-bit data type (i.e., a byte); Java provides a separate signed 8-bit byte type alongside its 16-bit char.

In addition, primitives in Java are not objects, and they cannot stand in for one. This differs from C++, where templates such as vector<int> hold primitives directly, and from C#, where every primitive is also an object. You cannot extend a primitive data type in Java, nor can you call methods on a primitive. As a result, in Java you cannot pass a primitive to a method that accepts an Object (or any class that extends Object). This becomes a problem in Java collections, which only store Objects (this will change in Java 1.5 with the addition of generics and autoboxing; see below).

To get around these limitations, Java has "wrapper" classes with similar names to the primitives that they wrap. For example, the Integer class wraps int; you can cheaply make an Integer from the int that it wraps. The wrapper classes in Java also provide utility methods and constants that are associated with the primitive itself. For example, Integer.MAX_VALUE is the largest int value (2³¹-1). Other very useful methods include Integer.parseInt, which parses a String into an int (aside: why doesn't it parse it into an Integer?), and Integer.toBinaryString.
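As a minimal sketch of the wrapper utilities mentioned above, the following converts decimal text to binary text. The class name WrapperDemo and the method decimalToBinary are made up for illustration, and Integer.valueOf(int) assumes Java 5 or later (older code would use new Integer(42)).

```java
final class WrapperDemo {
    // Parses decimal text into a primitive int, then renders it in binary.
    static String decimalToBinary(String text) {
        int value = Integer.parseInt(text);     // String -> int
        return Integer.toBinaryString(value);   // int -> String of 0s and 1s
    }

    public static void main(String[] args) {
        Integer boxed = Integer.valueOf(42);    // wrap a primitive
        int unboxed = boxed.intValue();         // unwrap it again
        System.out.println(unboxed);                 // prints 42
        System.out.println(decimalToBinary("42"));   // prints 101010
        System.out.println(Integer.MAX_VALUE);       // prints 2147483647
    }
}
```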

Strings
Java Strings are immutable; another class, StringBuffer, represents a mutable sequence of characters. Unfortunately, because Strings are immutable, every string concatenation creates a new object, a classic "memory hog" in Java. Happily, near-universal support for the standard toString method allows you to easily convert many standard datatypes to strings for printing, debugging or display.

In C++, strings are mutable and support the [] syntax for both get and set. This is one of my pet peeves about the Java String and StringBuffer classes; you must use the awkward charAt(int) method to read an individual character and, on StringBuffer only, setCharAt(int, char) to change one. Because C++ supports operator overloading (see below), the string class defines operator[] to support this syntax.
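To make the charAt/setCharAt complaint concrete, here is a small sketch of what "s[0] = 'b'" from C++ looks like in Java. StringEditDemo and replaceCharAt are hypothetical names chosen for this example.

```java
final class StringEditDemo {
    // Returns a copy of text with the character at index replaced.
    // The original String is never modified, because Strings are immutable.
    static String replaceCharAt(String text, int index, char replacement) {
        StringBuffer buffer = new StringBuffer(text);  // mutable copy
        buffer.setCharAt(index, replacement);
        return buffer.toString();
    }

    public static void main(String[] args) {
        String word = "cat";
        System.out.println(word.charAt(0));               // prints c
        System.out.println(replaceCharAt(word, 0, 'b'));  // prints bat
        System.out.println(word);                         // prints cat
    }
}
```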

Classes and inheritance
When extending classes in C++, you must specify public, protected or private inheritance. This allows you to inherit the implementation of a class without inheriting the interface it provides. There is no analogue in Java, as all class inheritance is public. In Java, you can prevent a class from being subclassed by declaring it final.

An interesting feature of Java lets you declare a class as abstract even if it has no abstract methods. This will prevent that particular class from being instantiated. C++ has no analogous concept; by definition, a C++ abstract class is one that declares at least one pure virtual function (the C++ term for an abstract method).
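A short sketch of an abstract class with no abstract methods follows. Shape, Circle and AbstractDemo are hypothetical names used only for illustration.

```java
// Abstract, yet every method has a body.
abstract class Shape {
    String describe() { return "generic shape"; }
}

// A concrete subclass inherits describe() unchanged.
final class Circle extends Shape { }

final class AbstractDemo {
    static String describeCircle() {
        // Shape s = new Shape();   // would not compile: Shape is abstract
        Shape s = new Circle();     // fine: Circle is concrete
        return s.describe();
    }

    public static void main(String[] args) {
        System.out.println(describeCircle());   // prints generic shape
    }
}
```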

Interfaces
An interface in C# and Java is a description of a set of methods that a class that implements it must define. It is similar to an abstract class in C++ that has only pure virtual methods. C# and Java both allow a class to implement multiple interfaces. Two common uses of interfaces are for "Mix-ins" and "tagging".

"Mix-ins" are typically single-method interfaces which allow you to use the class in a certain way. For example, the Comparable interface in Java allows you to define the compareTo method, which will allow Java utility methods and classes to sort your objects. In this case, it is equivalent to defining the operator< method in C++.

"Tagging" interfaces are used to signify to the application server that it should do something special with the class. For example, in Java, the Serializable interface signifies to the container that it can write out the bytes of the object and read it back later with the same exact state. The developer does not need to actually implement anything when a class implements Serializable.

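The two uses of interfaces described in this section can be sketched with one class that implements both Comparable and Serializable. The names Score and InterfaceDemo are invented for this example; the generic form Comparable<Score> assumes Java 5 or later, and Integer.compare assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// One class, two interfaces: a "mix-in" and a "tag".
final class Score implements Comparable<Score>, Serializable {
    private static final long serialVersionUID = 1L;
    final int points;

    Score(int points) { this.points = points; }

    // The mix-in: lets Arrays.sort order Score objects.
    public int compareTo(Score other) {
        return Integer.compare(points, other.points);
    }
    // Serializable is a tag: there is nothing to implement for it.
}

final class InterfaceDemo {
    // Sorts the array in place via compareTo and returns the lowest score.
    static Score lowest(Score[] scores) {
        Arrays.sort(scores);
        return scores[0];
    }

    // Writes a Score to a byte array and reads a copy back.
    static Score roundTrip(Score original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            Score copy = (Score) in.readObject();
            in.close();
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling InterfaceDemo.lowest on scores of 30, 10 and 20 returns the Score holding 10, and roundTrip returns a distinct object with the same points value.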
Virtual methods
Java programmers might be confused by the term 'virtual method,' since in Java all instance methods are virtual by default (the exceptions are static, private and final methods, which cannot be overridden). This means that at runtime, no matter what type an object is referenced as (i.e., the class itself or a parent class), the "right" method will be called. Example:

public class Parent
{
   void method() { System.out.println("parent"); }
}

public class Child extends Parent
{
   void method() { System.out.println("child"); }
}

   Parent c = new Child();
   c.method();   // outputs "child"

In C++, you must explicitly tag Parent::method as virtual for this to work. Example:

#include <cstdio>

class Parent
{
public:
   virtual ~Parent() {}   // virtual destructor, so deleting through a Parent* is safe
   virtual void method() { printf("parent\n"); }
   void method2() { printf("parent 2\n"); }   // note not virtual
};

class Child: public Parent
{
public:
   void method() { printf("child\n"); }
   void method2() { printf("child 2\n"); }
};

int main()
{
   Parent *c = new Child();
   c->method();      // outputs "child"
   c->method2();     // outputs "parent 2"
   delete c;
}

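The behavior of method2 is a classic C++ pitfall. Because method2 is not virtual, the call is bound at compile time to the declared type of the pointer (Parent), not to the actual type of the object (Child), so Child's version merely hides the parent's instead of overriding it. (This is often confused with the "Slicing Problem," which is a related but distinct hazard: copying a Child by value into a Parent variable discards the Child part of the object.) Non-virtual hiding can wreak havoc on systems. It makes it hard to track down the behavior of a child class when the problem isn't in the child class! If the child class redefines a method that the parent declares as non-virtual, you have to change the parent class to make it virtual. Java doesn't have this problem: since instance methods are virtual by default, an overriding method in the child class is always the one that is called.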

Operator overloading
As one of my favorite features of C++, operator overloading allows for natural arithmetic and other expressions when building custom classes. The classic example is the 'complex' class. Example:

   class Complex
   {
   public:
      double real, imag;

      // Returns a new Complex; neither operand is modified.
      Complex operator+(const Complex &that) const
      {
         Complex sum;
         sum.real = this->real + that.real;
         sum.imag = this->imag + that.imag;
         return sum;
      }
   };

Then you can write very natural code with Complex objects, e.g.,

   Complex a;
   a.real = 1.0; a.imag = 2;
   Complex b;
   b.real = 3.0; b.imag = 4.0;
   Complex c = a + b;   // very natural, and does what you think

This facility is absolutely not supported in Java. All I can say is, "Why, oh why not!?" It makes C++ so much more readable, especially when it comes to custom classes that represent mathematical entities (as in the Complex example above), or array-ish classes. For example, the vector template class defines the operator[] method, which allows you to write array-like code for accessing and modifying members of the vector.
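For comparison, here is a sketch of how the same addition is usually spelled in Java, where a named method has to stand in for the operator. ComplexNumber, ComplexDemo and the add method are hypothetical names for this illustration.

```java
final class ComplexNumber {
    final double real, imag;

    ComplexNumber(double real, double imag) {
        this.real = real;
        this.imag = imag;
    }

    // Plays the role of operator+: returns a new value, changes neither operand.
    ComplexNumber add(ComplexNumber that) {
        return new ComplexNumber(real + that.real, imag + that.imag);
    }
}

final class ComplexDemo {
    public static void main(String[] args) {
        ComplexNumber a = new ComplexNumber(1.0, 2.0);
        ComplexNumber b = new ComplexNumber(3.0, 4.0);
        ComplexNumber c = a.add(b);    // instead of a + b
        System.out.println(c.real);    // prints 4.0
        System.out.println(c.imag);    // prints 6.0
    }
}
```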

Standard libraries
All three languages provide standard libraries which define many similar constructs:

Collections: lists, arrays, maps, sets, iterators
Algorithms: sort, binary search
Dates

The big difference is that Java does not support templatized data types (at least until the next version of the language, to which Sun has announced it will add generics). So, when you define a Vector in Java, it is only a Vector of Objects. In C++ you specifically declare vector<int>, and you can only put ints into the vector. When you retrieve an object from a Java Vector, you must cast it to its actual type. Get it wrong, and you get a run-time exception, a ClassCastException (which, of course, you can catch).
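The cast-and-hope style described above can be sketched as follows. CastDemo and readFirstAsInteger are hypothetical names; the @SuppressWarnings annotation only quiets the warnings that newer compilers emit for a raw (non-generic) Vector.

```java
import java.util.Vector;

final class CastDemo {
    // Stores one element in an untyped Vector, then tries to read it as an Integer.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String readFirstAsInteger(Object element) {
        Vector items = new Vector();     // a Vector of plain Objects
        items.add(element);              // anything at all may go in
        try {
            Integer value = (Integer) items.get(0);   // cast required on the way out
            return "int " + value;
        } catch (ClassCastException e) {
            return "not an Integer";     // wrong guess, detected only at run time
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstAsInteger(Integer.valueOf(7)));   // prints int 7
        System.out.println(readFirstAsInteger("seven"));              // prints not an Integer
    }
}
```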

As mentioned above, the C++ vector template class defines operator[], allowing you to treat vectors as if they were arrays. Even better, the C++ map class also defines operator[]. As a result, you can write efficient and understandable code like this:

   map<string, int> wordCount;   // allocated on the stack
   wordCount["the"] = 1;

The equivalent in Java would be:

   HashMap wordCount = new HashMap(); // no stack allocation in Java
   wordCount.put("the", new Integer(1));
   // remember, no primitives in collections
   // [until Java 1.5 is released -- in 2004?]
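For a fuller picture, here is a sketch of a complete word counter written the way the announced generics would allow. It assumes Java 5 or later (generics, autoboxing and the for-each loop), and the names WordCountDemo and countWords are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class WordCountDemo {
    // Counts how often each space-separated word occurs in text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : text.split(" ")) {
            Integer seen = counts.get(word);   // null if the word is new
            counts.put(word, seen == null ? 1 : seen + 1);   // int is boxed automatically
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the hat");
        System.out.println(counts.get("the"));   // prints 2
        System.out.println(counts.get("cat"));   // prints 1
    }
}
```

Even with generics, there is still no counts["the"]++ in Java; the get-then-put pair is the price of having no operator[].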

Memory allocation
There are vast differences in how Java, C++ and C# manage memory. This has been the subject of more religious wars than I care to go into. I'll point out some of the differences, advantages and disadvantages of each here.

Java and C# support garbage collection of unreachable memory. For example:

   public void JavaAllocate()
   {
      Object reclaimable = new Object();
      reclaimable.toString();   // use the object while it is still reachable
   }

After the method exits, reclaimable will not be reachable from any other object in memory. Eventually, the Java garbage collector, when it needs memory, or at some other time, will find this object on the heap and reclaim its memory. Garbage collection is the only way memory is reclaimed in Java. The advantage of this system is that the burden is not put onto the developer to remember to deallocate memory. The disadvantage is that you have no control over when the garbage collector will run, and it consumes system resources (i.e., CPU cycles). [Aside: real-time extensions to Java have addressed this problem.]

In contrast, C++ supports explicit deallocation of unwanted memory. For example:

   void cpp_cleanup()
   {
      Thing *thing = new Thing();
      thing->do_something();
      delete thing;
   }

If you forget to deallocate thing, it will remain on the heap forever (or at least until your program terminates). Example:

   void cpp_hanging()
   {
      Thing *thing = new Thing();
      thing->do_something();
   }

In this example, once the method returns, the only pointer to the allocated memory is gone, so you will never be able to deallocate it. The advantage of C++'s memory allocation model is that you are guaranteed that as soon as you deallocate the memory, it is available (in contrast, you might have to wait for Java's garbage collector to run). Of course, the disadvantage is that you have to be extremely careful to remember to deallocate things once you're done with them. Many companies have made a good deal of money providing tools to detect, debug, and eliminate these kinds of memory problems in both C++ and Java.

Pointers, references and handles
C# and C++ both support raw pointers (in C#, only within code marked unsafe), which are in reality just numbers (often 32- or 64-bit integers, depending on the implementation).

Java, on the other hand, only uses handles to memory. This is due to the desire of the Java designers to allow the same class files (see below) to support any number of physical implementations (i.e. CPUs). Thus they could not expose something so close to the hardware as a pointer. Instead, the specific JVM that the code is running on is responsible for mapping handles to memory.

The following snippet (in a method) defines a handle in Java, but doesn't actually allocate any memory:

    Myclass object;

In C++ the above code would actually allocate an instance of Myclass on the stack. But in Java nothing is allocated. Only a slot, named object, in the stack frame has been defined. The value in this slot is undefined, and the compiler will report an error if you attempt to use object before assigning it a value (for example, a newly allocated instance).

This is similar to references in C++, which can refer to an object on the stack or on the heap. The big difference is that a C++ reference must always refer to an object, whereas a handle in Java can be null, or 'undefined' as described above.
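The two states a Java handle can be in besides "pointing at an object" can be sketched like this. HandleDemo and its methods are hypothetical names for illustration.

```java
final class HandleDemo {
    // A handle may legitimately be null, so careful code checks before using it.
    static String describe(String handle) {
        if (handle == null) {
            return "null handle";
        }
        return "length " + handle.length();
    }

    // Using a null handle is caught at run time, not at compile time.
    static boolean dereferenceFails() {
        String handle = null;
        try {
            handle.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // String unassigned;
        // unassigned.length();   // would not compile: variable might not be initialized
        System.out.println(describe("abc"));      // prints length 3
        System.out.println(describe(null));       // prints null handle
        System.out.println(dereferenceFails());   // prints true
    }
}
```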

Because you can allocate on the stack, and define a pointer to any allocated object, the "dangling reference" problem plagues C++. (This cannot occur in Java, because you cannot allocate objects on the stack). Example:

   Thing *cpp_dangling()
   {
      Thing stackThing;    // calls no-arg constructor
      return &stackThing;   // valid syntax!
   }

   int main()
   {
      Thing *bad = cpp_dangling();
      bad->do_something();   // likely will CRASH
   }

When cpp_dangling terminates, the stack memory where stackThing was allocated has been deallocated. As a result, the pointer "bad" points to a place in memory which has been deallocated (that stack frame), and possibly already overwritten by some other object. This usually results in a fatal runtime error (a.k.a. a CRASH.)

Compilation and run-time environment
Put simply, C++ is compiled down to the native binary level. The binary that you get can only run on the CPU for which it is targeted. The output is generally an executable, or library.

C#, like Java, runs within a virtual machine. This is one of the selling points of Java - "Write once, run anywhere". The .class files that a Java compiler produces can (theoretically) be run on any operating system that has a JVM. Both C# and Java support further compilation (usually at run-time) to native code to improve performance.

The runtime environments for both Java and C# also provide run-time checks for other potential errors, such as array index out of bounds (or string access out of bounds), which C++ does not perform for built-in arrays. You can catch the resulting exception in Java, but in C++ it is much harder to trap this kind of operating-system-level error.
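As a small sketch of catching an out-of-bounds access in Java, consider the following. BoundsDemo and getOrDefault are made-up names, and in real code an explicit index check is usually preferable to catching the exception.

```java
final class BoundsDemo {
    // Returns values[index], or fallback if the index is outside the array.
    static int getOrDefault(int[] values, int index, int fallback) {
        try {
            return values[index];   // the JVM checks the index on every access
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;        // the bad access is an ordinary, catchable exception
        }
    }

    public static void main(String[] args) {
        int[] values = { 10, 20, 30 };
        System.out.println(getOrDefault(values, 1, -1));   // prints 20
        System.out.println(getOrDefault(values, 5, -1));   // prints -1
    }
}
```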

Conclusion
This feature has touched upon only a few of the features of C++, Java and C#. Hopefully it gave you a better idea of how the "other half lives", and maybe will inspire some of you to investigate these interesting languages further. I myself intend to try to write and submit SRM solutions in C++ when it will give me an advantage over Java (sometimes simply for the use of vector<int>).

References and further information

Design and Evolution of C++, Bjarne Stroustrup

STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library

Sun Microsystems' Java website; specifically, I rely heavily on the Java API

The Java Programming Language

A Comparative Overview of C#

C# Tutorial

C++ for Java programmers; slides are available online
