The basic structure of Python is great for a lot of purposes: it is super simple, and the underlying Python system handles many of the more complicated data management operations more or less transparently. There's no need to tell Python to allocate or free memory, to declare the data types of variables, or to spell out how to complete many complex operations. Thus we can do things like creating a list of values or initializing a variable simply by writing:
[1]:
x = list(range(5000))
y = 0
However, this simplicity comes at a cost: Python handles all these tasks for you, but depending on the workload you create, it may end up being extremely slow. For example, consider the following commands to square the numbers in our list:
[2]:
for i in x:
    y = y + i**2
y
[2]:
41654167500
Underneath the hood, Python is actually executing low-level commands roughly equivalent to this monologue:

- Take the first value from x and store it in i.
- Check that i is an integer.
- Find the value of i somewhere in the computer's memory.
- Take the value of i and raise it to the power of 2.
- Check that y is an integer.
- Confirm that the + operator means 'add' when applied to integers.
- Find the value of y somewhere in the computer's memory.
- Add y to the integer result of the power operation above.
- Create a new value for y, and write the value to somewhere in the computer's memory.
- Note that the new value of y is now referenced in one additional place.
- Note that the old value of y is now referenced in one fewer place.
- If the number of references to the old value of y is now zero, flag this memory to be freed up.
- Take the next value from x and store it in i.
- Note that the old value of i is now referenced in one fewer place.
- If the number of references to the old value of i is now zero, flag this memory to be freed up.
- Check that i is an integer.
- Take the value of i and raise it to the power of 2.
- Check that y is an integer.
- Confirm that the + operator means 'add' when applied to integers.
- Add y to the integer result of the power operation above.
- Create a new value for y.
- Note that the new value of y is now referenced in one additional place.
- Note that the old value of y is now referenced in one fewer place.
- If the number of references to the old value of y is now zero, flag this memory to be freed up.
- Take the next value from x and store it in i.
- ...and so on, around the loop again for every remaining item in the list.

Fortunately, our computer can run through these instructions quite fast, so for the five-thousand-item list this rather long list of instructions appears to run instantaneously. Nevertheless, it is obviously extraordinarily repetitive, and when a loop like this runs over millions of items, or is nested inside other loops, the sheer number of instructions slows the process down to the point where executing code becomes painfully slow.
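You can even peek at a bit of this reference-counting bookkeeping yourself. The following is a minimal illustration using the standard library's sys.getrefcount function; note that the reported count includes the temporary reference created by the function call itself, and the exact numbers can vary across Python versions:

import sys

w = 123456789              # create a new integer object, bound to one name
print(sys.getrefcount(w))  # typically 2: the name w, plus the function's own argument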
To speed things up, it is necessary to get rid of a lot of the excessively repetitive instructions. This is called "vectorization" of Python code. The two key features of vectorization that help speed things up are:

- All of the values are stored together in one contiguous block of the computer's memory, instead of scattered around with each location tracked separately.
- Checks on data types, and the selection of the correct low-level operation, are done once for the whole array instead of once per item, with the repetitive looping handed off to fast pre-compiled routines.
The numpy package includes the tools necessary to do this efficiently. For example, we can vectorize the example code above like this:
[3]:
import numpy
z = numpy.arange(5000)
(z**2).sum()
[3]:
41654167500
In comparison to the instructions above, this code is executed in a manner loosely equivalent to this monologue:
- Allocate memory for a result the same size as z, somewhere in a contiguous block of the computer's memory.
- Take each value of z, square it, and write that to the memory allocated above.
- Confirm that the sum method means 'add these up' when applied to integers, and add them up.

It should be clear that the code above is wildly more efficient. For the small set of operations described above, the difference in time that it takes to compute the result either way is insignificant from a human perspective: does it really matter to you if the work is completed in a millisecond or a microsecond?
[4]:
%%timeit
y = 0
for i in x:
    y = y + i**2
2.05 ms ± 287 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[5]:
%%timeit
(z**2).sum()
8.41 µs ± 578 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
But when scaled up to larger data sets and more complex operations, these differences are extremely important: make each of these tasks a million times longer, and we're looking at the difference between about eight seconds and more than half an hour.
Many of the tools and libraries described in the rest of these tutorials already take advantage of the speed benefit of vectorization as much as is reasonable, but it is still important for a user to know about these differences. The simplest way to take advantage of these speed-ups in coding your own Python tools is to avoid using "for" loops as much as possible in any section of your code that runs slowly, especially in places where "for" loops are nested inside other "for" loops. For example, writing code that loops over each destination TAZ (traffic analysis zone) inside a loop over each origin TAZ is generally a bad idea, as sketched below.
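To make that concrete, here is a minimal sketch using made-up zone coordinates (the array names and sizes are hypothetical, not from any real network) to build an origin-destination distance matrix two ways. Both versions compute the same values, but the broadcasting version hands the doubly-nested loop over to numpy's pre-compiled routines:

import numpy

rng = numpy.random.default_rng(0)
n_taz = 500
x_coord = rng.random(n_taz)  # hypothetical zone x coordinates
y_coord = rng.random(n_taz)  # hypothetical zone y coordinates

# Slow: a loop over destinations nested inside a loop over origins
dist = numpy.empty((n_taz, n_taz))
for o in range(n_taz):
    for d in range(n_taz):
        dist[o, d] = (
            (x_coord[o] - x_coord[d]) ** 2 + (y_coord[o] - y_coord[d]) ** 2
        ) ** 0.5

# Fast: broadcasting computes every origin-destination pair at once
dist_v = numpy.hypot(
    x_coord[:, None] - x_coord[None, :],
    y_coord[:, None] - y_coord[None, :],
)

assert numpy.allclose(dist, dist_v)

On typical hardware the broadcast version runs orders of magnitude faster than the nested loops, for exactly the reasons described above.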
Conversely, there's no reason to fear using "for" loops in other places. As we observed above, a single loop that iterates over a few thousand items in a list can complete in just a couple of milliseconds. It's only necessary to vectorize code when the number of loop iterations and the complexity of the calculation inside the loop are both high.
The arange function in numpy creates an array of evenly spaced values, analogous to the range function in Python itself.
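For instance, like range, the arange function accepts optional start and step arguments (a quick sketch of the analogy):

numpy.arange(2, 10, 2)  # array([2, 4, 6, 8])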
Arrays can also be created out of lists:
[6]:
a = numpy.asarray([11,22,33,44])
a
[6]:
array([11, 22, 33, 44])
Importantly, the values in an array all share the same data type. You can inspect the data type using the dtype attribute:
[7]:
a.dtype
[7]:
dtype('int64')
You cannot create a simple array of mixed data types; instead, the data will be up-cast as necessary to create a common data type. For example, if you mix integers and floating point values, the resulting array will be all floating point values.
[8]:
b = numpy.asarray([11,22,33,4.4])
b.dtype
[8]:
dtype('float64')
[9]:
b
[9]:
array([11. , 22. , 33. , 4.4])
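If this up-casting is not what you want, you can request a particular data type explicitly when creating the array; asarray accepts an optional dtype argument (a quick sketch):

numpy.asarray([11, 22, 33, 44], dtype=numpy.float64)  # array([11., 22., 33., 44.])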