Some things I have learnt about optimising Python

I've been enjoying writing my research code in Python over the past couple of years. I'd never needed to put much effort into optimising it, so I never bothered, but just recently I've been working on a graph-search algorithm which can get quite heavy - one script I ran took about a week.

So I've been learning how to optimise my Python code. I've got it running roughly 10 to 20 times faster than before, which is definitely worth it. Here are some things I've learnt:

  • The golden rule is to profile your code before optimising it; don't waste your effort. The cProfile module is surprisingly easy to use, and it shows you where your code is spending the most CPU time. Here's an example of what I do on the command line:

    # run your heavy code for a good while, with the cProfile module logging it ('streammodels.py' is the script I'm profiling):
    python -m cProfile -o scriptprof streammodels.py
    # then to analyse it, start an interactive Python session:
    python
    import pstats
    p = pstats.Stats('scriptprof')
    # top 30 functions by cumulative time (the function plus everything it calls):
    p.strip_dirs().sort_stats('cumulative').print_stats(30)
    # top 30 functions by time spent inside the function itself:
    p.strip_dirs().sort_stats('time').print_stats(30)

  • There are some lovely features in Python which make for nice code, but once you want your code to go fast you sometimes need to avoid them :( - boo hoo. It's a bit of a shame that you can't tell Python to act as an "optimising compiler" and automatically do this stuff for you. But here are two key things to avoid:
    • list-comprehensions are nice and pythonic, but they can be BAD for fast code because they always build the whole list, even if you only want to loop over the results once. If you just need to iterate, use a generator expression (round brackets instead of square), or the lazy itertools versions such as imap() and ifilter(), which hand you one item at a time without constructing a list. (Also use "xrange" rather than "range" if you just want to iterate over a range rather than keep the resulting list.)
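
      A rough sketch of what I mean (the names here are invented just for illustration, and it's Python 2 since I'm using xrange/imap):

      import itertools, operator

      values = xrange(1000000)   # something big that we only want to loop over once

      total = sum([x * x for x in values])                       # list comprehension: builds the whole list first
      total = sum(x * x for x in values)                         # generator expression: yields items lazily, no list
      total = sum(itertools.imap(operator.mul, values, values))  # itertools.imap: the lazy cousin of map()
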
    • lambdas and locally-defined functions (by which I mean something like "def myfunc():" as a local thing inside a function or method) are lovely for flexible programming, but when your code is ready to run fast and solid, you will often need to replace these with ordinary module-level functions. The reason is that a lambda or local "def" builds a fresh function object every time the enclosing function runs; you want it constructed once and then just used.
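
      A minimal sketch of the difference (the function names are invented for the example):

      # slower: score() gets rebuilt as a fresh function object on every call
      def simulate_slow(data):
          def score(x):
              return x * x + 1.0
          return map(score, data)

      # faster: define score() once at module level, then just use it
      def score(x):
          return x * x + 1.0

      def simulate_fast(data):
          return map(score, data)
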
  • Shock for scientists and other ex-matlabbers: using numpy isn't necessarily a good idea. For example, I lazily used numpy's "exp" and "log" on plain scalar values, when I could have used the math module and avoided dragging in the heavy array-processing machinery I didn't need. After I changed my code to not use numpy at all (I wasn't really doing array/matrix maths in this particular code), it went much faster.
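
    A sketch of the kind of swap I mean (assuming the values really are plain scalars, not arrays):

    import numpy, math

    x = 0.5
    y = numpy.exp(x) * numpy.log(x + 1.0)   # works, but drags in numpy's array machinery for a single number
    y = math.exp(x) * math.log(x + 1.0)     # same result for a plain scalar, with much less overhead
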
  • Cython is easy to use and speeds up your Python code by translating it into C and compiling it for you - who could refuse? You can also add static type declarations to speed things up even more, but then it's no longer pure Python code, so ignore that until you need it.
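
    One low-effort way in, as a sketch (pyximport ships with Cython; "mymodule" and its run() function are made-up names): rename your module to mymodule.pyx and let pyximport compile it when it's imported:

    import pyximport
    pyximport.install()    # from now on, importing a .pyx file compiles it via Cython automatically

    import mymodule        # this import triggers the Cython -> C -> compiled-extension step
    mymodule.run()         # then use it exactly as before
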
  • Name lookups are apparently expensive in Python (though I don't think the profiler really shows this effect, so I can't tell how important it is). There's no harm in storing something you use a lot in a local variable -- even a function, e.g. "detfunc = scipy.linalg.det".
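
    The kind of thing I mean, as a sketch ("matrices" is just an invented example):

    import scipy.linalg

    def sum_of_dets(matrices):
        detfunc = scipy.linalg.det    # look the function up once, outside the loop
        total = 0.0
        for m in matrices:
            total += detfunc(m)       # now it's a cheap local-variable lookup each time round
        return total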

So now I know these things, my Python runs much faster. I'm sure there are many more tricks of the trade. For me as a researcher, I need to balance the time saved by optimising against the flexibility to change the code on a whim and to be able to hack around with it, so I don't want to go too far down the optimisation rabbit-hole. The benefit of Python is its readability and hackability. It's particularly handy, for example, that Cython can speed up my code without me having to make any weird changes to it.

Any other top tips, please feel free to let me know...

Originally posted on Dan's personal blog.