Tuesday 11 September 2007

Because Speed matters: 11 ways to optimise your python code

In a post that is both longer and more technical normal I am going to cover 11 ways in which you can speed up python code, and in the name of speed, let's get started:

1) Use the profiler

Much of optimising your python code is counter-intuitive (as you will see) - so it is vital to really know how your changes affect the speed of your code. If you have not already done so, run your code through the python profiler.

2) Lookups aren't free
Python is a very dynamic language and, unlike in compiled C, variable look-ups take time. With this in mind it is a good idea to make local references to any global variables you are using regualarly.

In Python 2.2+, variables are looked up in the following order:
The function's local scope
Any lexically enclosing functions, from inner to outer
The Global scope
Built-in Functions

So, it is especially important to define local references to Global variables. For example the (rather contrived) function foo(), which modifies a globaly scoped list a:

def foo():
for i in range(len(a)):
a[i] = a[i] * 2


would be better written:

def foo():
loca = a
for i in range(len(loca)):
a[i] = a[i] * 2

3) Use the -O flag
By running the python interpreter using the -O flag the interpreter generates optimised byte-code, yeilding a minor performance increase. If you are running your code directly rather than from the command line, you can add this to your shebang line. e.g.

#!/usr/local/bin/python -O

If you don't require docstrings in your compiled code you can use the -OO flag to generate byte-code without docstrings.

4) Use List Comprehensions
In general, list comprehensions are fast. For example, on my machine:

for i in range(100000):
b = [i for i in a if i%2 == 0]

runs 25 percent faster than:

for i in range(100000):
b = []
for j in a:
if j%2 == 0:
b.append(j)

5)Use as much C as possible:
Any available C functions are probably faster than what you would write in python. For example, if you are doing any linear algebra, it is highly recommended that you try using Numpy / SciPy if you have the option. If you have a function that needs to run really quickly, think about writing a C extension module. (unless you want to keep your code pure python for portability)

6)Use .join for string concatination.

For an example, for a tupple of six strings

endstr = "".join(list(strings))

runs 10-20 percent faster on my machine than

endstr = ""
for s in strings:
endstr = endstr + s


and

endstr = "".join(["a","b","c","d","e","f"])

runs 20 percent faster than

endstr = "a"
endstr = endstr + "b"
endstr = endstr + "c"
endstr = endstr + "d"
endstr = endstr + "e"
endstr = endstr + "f"

7) Store references to functions

Just as variable lookup is not free in python, neither is looking up functions. As an important example that you are probably using while you optimise your code, defining

loctime = time.time

allows you to call loctime() instead of time.time(), with an increase in speed of between 10 and 50 percent. This is especially important while optimising, even if you are only calling time a couple of times, since the varying look-up time can make a significant difference to your tests.

As previously mentioned, the Built-in functions are in the final scope searched, so this is useful for commonly used built-in functions.

8) Inline your functions
If possible, try to inline your commonly called functions. This saves a great deal of look-up time, and could make a significant difference. Here the code above takes up to 2 times as long to run as the code below.

slow:

def foo(a):
return a*2

for i in range(1000000):
b = foo(i)

faster:

for i in range(1000000):
b = 2*i

9) Go Psyco
Try using Psyco - a JIT compiler for python - if it is available on your system. Even if it is not, it's worth trying to import the module if your code is going to be run on other machines that may have it. Psyco can speed up code by between 2 and 1000 times.

10) Go fourth and multiply
General good advice no matter what language you are coding in (from ASM to javascript), try to avoid floating point division where possible. This is because floating point multiplication can be done in a single operation in the processor, where division is much slower. Many optimising compilers will try to do this for you, but it is better to take control if you can. e.g.

b = i/2.

takes twice as much time to run (on my system) as

b = i * 0.5

11 ) Don't cast if you don't have to.

If i is an int, remember (for example) that division by a float will automatically cast the result to a float. This can be important, here

b = i/2.

runs 3 times faster than

b = float(i)/2.

due to the overhead of looking up and calling float().

I hope these have been useful, feel free to add any more in the comments.