Development

Start Typ-ing!

charles.loboNovember 10, 2017

Recently there has been an increasing interest in languages like TypeScript, Elm, PureScript, and Flow. These are early signs but it looks like the era of static type checking has finally arrived as more and more developers discover that types are actually a Good Thing™ and working with untyped code is Unproductive and Sinful™. Even Python - the granddaddy of "get your types off my lawn!" has bent to introduce type annotations so there’s clearly something going on.
So what is going on? Why the resurgence of types? I believe that, as a community, we have reached a stage of maturity where we realize that - like unit tests - types provide a heap of benefits that outweigh the cost of using them. Types are similar to unit tests in that you get feedback early and often that your code is "correct" and

They don't require writing and maintaining external test code
They check every possible path that reaches that code

With this in mind, I want to spend this post describing just what a type is. By the time you've read this post you will understand:

What a type is and why it's useful
What are static vs dynamic types
What are manifest vs inferred types
What are nominal, structural, and duck types
What is strong vs weak typing
What are ADT's, GADT’s, and other esoteric types

Phew! That's a lot to get into so let's jump into it.

What is a Type

A type is a way of making explicit a programmer's visualization of the constructs in her program. For example, "this is a 'string'", "this is a 'function taking a User and returning a boolean'", "this is an 'array of accounts'".
As you can see, types can be attached to almost any programming construct and help make the programmer's intent clearer. The main purpose of a type system is to reduce the possibilities of bugs in our programs by defining interfaces between the different parts, and then checking that all the parts have been wired up correctly.
So a type systems main purpose - we have learned - is to prevent bugs in our programs. Type systems can have other purposes - expressing business rules, enabling compiler optimizations, providing a form of documentation, and so on - but their main utility is in type checking for bug reduction.
A good description of the importance of types can be found in the paper Practical Type Inference Based on Success Typings

For programmers already experienced in developing programs [without types], programming is a tranquil and relatively uneventful activity, at least initially. The occasional frustrations of having to convince the type system that one really knows what she is doing are avoided. Also, since type declarations and annotations need not be typed (in), program development can progress more rapidly. Unfortunately, this freedom of expression comes with a price. Significantly less typos and other such mundane programming errors are caught by the compiler. More importantly, the freedom of not stating one’s intentions explicitly, considerably obstructs program maintenance...

Static vs Dynamic Types

Given type information, a compiler has two options:

check the types before allowing the program to run of
check the types as the program is running

The first method of type checking is called Static type checking and the second is called Dynamic type checking. For dynamic type checking the program must embed type information into the running system so that it can be checked at run time. This generally makes programs bulkier and slower. Dynamic type checking does not inform the programmer of errors - it brings them up while running and is therefore far less useful than static type checking.
A static type checker will quickly detect type errors even in rarely used code paths. Without static type checking, even code coverage tests with 100% coverage will not be able to find all errors because they may depend on a specific combination of values.

Manifest vs Inferred Types

A "manifest" type is one that the programmer must explicitly declare while an "inferred" type is one that the compiler figures out by itself.
For example (in C):

contains manifest types while (in ML):

the compiler infers the type of s (as string) and x (as number) and will also raise an error for s+x. This makes it even easier to get all the benefits of type checking without needing to write down all the boilerplate types.

Nominal vs Structural vs Duck Typing

A Nominal type system considers a type to be based on its declared name. For instance in C:

and

Are two distinct types (you cannot use struct x where the compiler is expecting struct y). However in a structural type system (such as Elm):

and

are considered the same type. You can use type x in all locations where the compiler is expecting type y.
Duck typing takes it a step further - the compiler only checks what is being used at a given point rather than the definition. For example (in Python):

and

are considered the same type in the expression
obj['a']
because only ‘a’ is accessed.
A nominal typing system is strongest and best for type checking. Structural type systems are weaker and duck typing weakest of all.

Strong vs Weak Type checking

Strongly typed languages are stricter than weakly typed languages. Languages that are strongly typed are likely to produce type errors where weakly typed languages will perform implicit type conversions or return unexpected results. Who has not encountered the magic of Javascript:
16 == [16] → true 16 == [1,6] → false "1,6" == [1,6] → true

Strongly typed languages FTW!

ADT's, GADT’s, and other esoteric terms

An algebraic data type (ADT) is a type formed by combining other types. There are two such types:

Product Types: Tuples and Records
Sum Types: Union or Variant Types

A Product Type is what we would understand as a record (or struct in C):

while a Sum Type generally can take one or more variants. For example (a binary tree in Haskell):
Tree = Empty | Leaf Int | Node Tree Tree

(The binary tree is either empty, has a single leaf, or has a node with two attached trees). What is happening is that we are able to define a datatype which can be one of several types of things. This is quite useful in many situations.
Type Variables are types that act as "placeholders" for other types. For example:
Tree a = Empty | Leaf a | Node Tree Tree

declares the same binary tree but this time, it can take any type of value rather than only being limited to Int's. The Type Variable ‘a’ acts as a placeholder for the actual type of elements that can be held in the tree.
A Type Class is similar to defining an interface - it defines a set of operations that must be present on a type to belong to a class. For example, let us take a look at the type class Eq in Haskell:
class Eq a where (==) :: a - a -Bool (/=) :: a - a -Bool

What this means is any type you define that has the operators (==) and (/=) defined on them belongs to the typeclass Eq.
A GADT - Generalized Algebraic Data Type - allows us more control over how the type is constructed. If you ever use Haskell or Ocaml - you'll probably get to use ADT's/GADT's/Type Classes and so on. Otherwise let’s just be content with our basic understanding of these esoteric constructs for now.

Conclusion

The wikipedia page on types says:

Software testing is an empirical method for finding errors that the type checker cannot detect.

I look at this statement and it clearly tells me that types, used correctly, are our first line of defense against bugs. Looking ahead, I fully expect the trend of more and more static typing to continue well into our careers.