Basic Python: Lists, Tuples, Sets, Dictionaries

Johnny Hopkins, September 12, 2019

We have come to the end of our basic Python posts and will now examine lists, tuples, sets, and dictionaries. In our last post, I defined the idea of an array and explained that Python's core language has no array type, so arrays usually come from libraries such as NumPy and SciPy. Now, let's look at each of these data structures and then examine two sample programs that illustrate these concepts.

Before we talk about these data structures, let us define an important term: immutability. An immutable object cannot be changed after it is created; any "change" produces a new object instead. A mutable object, by contrast, can be modified in place. Of the four structures in this post, tuples are immutable, while lists, sets, and dictionaries are mutable.

The first data structure we will look at is a list. A list is an ordered, mutable collection of objects. A list is defined like this:

listA = [1, 2, 3, 4]

listA has four elements, with indexes starting at zero and ending at three, because a sequence of n elements is indexed from 0 to n-1. listA is mutable because I can change its elements in place, as in a NumPy program calculating statistics. It is ordered because elements keep their positions, and I can reorder them when needed. The downside to lists is that this flexibility carries some overhead: a list generally uses more memory than a tuple of the same length, and operations such as inserting at the front get slower as the list grows.
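To make "mutable" and "ordered" concrete, here is a minimal sketch. The list name scores and its values are made up for illustration and do not come from the earlier posts.

```python
# Hypothetical list used only for illustration.
scores = [70, 85, 90, 65]

first = scores[0]                 # indexing starts at 0
last = scores[len(scores) - 1]    # the last valid index is n - 1

scores[1] = 88       # mutable: replace an element in place
scores.append(100)   # mutable: grow the list
scores.sort()        # reorder the elements in place

print(first, last)
print(scores)
```

Note that sort() and append() change the existing list rather than returning a new one, which is exactly what mutability means here.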

This leads to the next data structure in our post, tuples. Tuples are collections of Python objects as shown here:

tupleA = (1, 2, 3, 4)

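Tuples are similar to lists, except they are immutable. Elements are separated by commas and enclosed in parentheses, as shown above. Because a tuple cannot change after creation, Python can store it more compactly than a list, and creating one is typically a bit faster, though the difference rarely decides how fast a whole program runs.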
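The following sketch shows what immutability looks like in practice. The names point and longer are hypothetical; the idea is simply that assigning to a tuple element fails, while building a new tuple from an old one works.

```python
# Hypothetical tuple used only for illustration.
point = (1, 2, 3, 4)

try:
    point[0] = 99          # tuples do not allow item assignment
except TypeError as err:
    error_message = str(err)
    print("Cannot modify a tuple:", error_message)

# "Changing" a tuple really means creating a new one.
longer = point + (5,)
print(point)
print(longer)
```

The original tuple is untouched after the concatenation; longer is a separate object.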

Next, we look at sets. set() is a built-in function in core Python that creates a set. A set is unordered, mutable, and contains no duplicates. It resembles the mathematical structure of a set. Computer scientists learn about sets in discrete math and quantitative literacy courses. Here is a sample call to set():

a = set([4,5,6,7])

b = set([7,8,9,10])

These two variables, a and b, declare two sets with the common element 7. Mathematically, we represent this with two overlapping circles (a Venn diagram) with 7 in the overlap, which is the intersection of a and b.
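As a short sketch, the same two sets can be combined with Python's set operators. The variable names common, combined, only_in_a, and deduped are assumptions added for illustration.

```python
a = set([4, 5, 6, 7])
b = set([7, 8, 9, 10])

common = a & b       # intersection: elements in both sets
combined = a | b     # union: elements in either set
only_in_a = a - b    # difference: elements in a but not in b

# Duplicates are dropped when a set is built.
deduped = set([1, 1, 2, 2, 3])

print(common)
print(sorted(combined))
print(sorted(only_in_a))
print(sorted(deduped))
```

The results are passed through sorted() before printing because a set has no guaranteed display order.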

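Finally, we look at dictionaries in Python. As mentioned in the post on decision statements, case structures can be simulated using dictionaries. Dictionaries are mutable collections of key-value pairs, indexed by key rather than by position. Since Python 3.7 they also preserve the order in which keys were inserted. They are declared as follows: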

my_dict = {"a": 5, "b": 9}

The variable my_dict holds a dictionary with two key-value pairs enclosed in curly braces. A dictionary could also represent a case structure, such as a console-based menu, by mapping each choice to an action. Keys and values can be added, changed, and removed. Looking up a key is fast on average even in a large dictionary, although a dictionary uses more memory than a list holding the same values. Algorithmic time measures are beyond the scope of the series, but can be looked at in an algorithms textbook.
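Here is a minimal sketch of both ideas: basic dictionary operations on my_dict, and a dictionary standing in for a case structure. The functions say_hello, say_goodbye, and run_choice, and the menu keys, are hypothetical names invented for this example.

```python
my_dict = {"a": 5, "b": 9}

value = my_dict["a"]   # look up a value by its key
my_dict["b"] = 10      # change an existing value
my_dict["c"] = 12      # add a new key-value pair

# Hypothetical menu actions for a console-style menu.
def say_hello():
    return "Hello!"

def say_goodbye():
    return "Goodbye!"

menu = {"1": say_hello, "2": say_goodbye}

def run_choice(choice):
    # dict.get supplies a fallback, playing the role of a "default" case.
    action = menu.get(choice, lambda: "Unknown option")
    return action()

print(value, my_dict)
print(run_choice("1"))
print(run_choice("9"))
```

Storing functions as values keeps the menu logic in one place; adding an option means adding one entry to the dictionary rather than another elif branch.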

Let us look at some code examples that use a couple of these structures. These programs are small but powerful. Here is a program that declares a list and a tuple:

Figure 1: structure.py

Figure 2: Output from structure.py

We declare a list, test_scores, and fill it with five elements at indexes zero through four. We then call count(), which returns how many times a given value appears in the list (index() is the method that returns a value's position). We call reverse() to reverse the order of the list in place. Next, we declare a tuple, my_tuple, populate it with four values, and print its elements to standard output.
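For readers following along without the screenshots, here is a rough sketch of a program along these lines. It is not the original structure.py: the values are invented, and only the names test_scores and my_tuple come from the description above. It also shows the difference between count() and index(), which are easy to mix up.

```python
# Invented values; the data in the original listing is not reproduced here.
test_scores = [88, 92, 75, 92, 60]

times_92 = test_scores.count(92)      # how many times 92 appears
position_75 = test_scores.index(75)   # index of the first 75

print("92 appears", times_92, "times")
print("75 is at index", position_75)

test_scores.reverse()                 # reverses the list in place
print(test_scores)

my_tuple = (10, 20, 30, 40)
for item in my_tuple:
    print(item)
```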

Finally, we return to NumPy and SciPy for our next code example. You may recall from our earlier post that we used both libraries in tandem to calculate descriptive statistics on a list. We will create a new Python script with a new data set, and this time we will return to Jupyter. Jupyter is a notebook interface built on IPython, an advanced interactive shell for running Python code. Here is our notebook:

Figure 3: The Jupyter Notebook, structurestats

Figure 4: The Jupyter Notebook, structurestats (cont)

The Jupyter notebook shows how we call NumPy's statistical functions on our list. What do you think will happen if we use a tuple instead of a list? The output does not change: whether we pass a list or a tuple, NumPy and SciPy convert the data to an array before performing the calculations.
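The same list-versus-tuple experiment can be sketched without any extra installs. This example deliberately uses the standard library's statistics module as a stand-in for NumPy, so it is not the notebook's code, and the data values are invented. The point it illustrates is the same: the statistics depend on the values, not on whether they arrive in a list or a tuple.

```python
import statistics

# Invented data set, stored once as a list and once as a tuple.
data_list = [2, 4, 4, 4, 5, 5, 7, 9]
data_tuple = tuple(data_list)

list_mean = statistics.mean(data_list)
tuple_mean = statistics.mean(data_tuple)

# pstdev is the population standard deviation.
list_stdev = statistics.pstdev(data_list)
tuple_stdev = statistics.pstdev(data_tuple)

print(list_mean, tuple_mean)
print(list_stdev, tuple_stdev)
print(list_mean == tuple_mean and list_stdev == tuple_stdev)
```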

This concludes our post on data structures in Python. The final post will be a comprehensive example using all of our data science libraries on a moderately sized dataset.

Note: Most of the coding and screenshots came from the PyDroid app, which gives Android users the ability to code on smartphones, tablets, and Chromebooks. I did one or two programs on my Acer netbook, but everything else was done on the app.