Blog
Can I make Python as fast as Julia? Nope.
TL;DR - I tried to make a fractal generating Python program run as fast as the same program written in Julia. I couldn't. But I got close.
So I've been trying to get to grips with the programming language Julia. Julia is a pretty new language in the world of scientific computing and data science. It's been built from the ground up for the purpose of running numerically intensive programs using a clean and relatively simple syntax. Julia makes this happen with a combination of some careful language design decisions and a just-in-time LLVM-based compiler. The result is an interesting compromise between a fully compiled language like C and an interpreted language like Python.
Julia's generated a lot of excitement, but it's also very new so the ecosystem around it is still in the process of maturing. The 1.0 version of the language was only released in 2018 so there's a ways to go until it's as established as the other popular "number crunching" languages like Python, R and MATLAB. But with that said, I think there are big things on the horizon for Julia, particularly in the world of ML.
Julia vs Python
I was curious to see just how fast Julia code could run so I decided to benchmark it against Python which is my usual number crunching language. The task I settled on for this comparison was the calculation of fractal patterns. Computing these patterns involves some intensive calculating so it seemed like a nice way to demonstrate what Julia can do. Also the fractals are very pretty. One particular type of fractal pattern arises from calculating a numerical set in mathematics called the Julia set. So, Julia code making Julia sets!
Here's an example of one of the fractals I generated.
A nice explanation of what the Julia set is can be found here. Specifically, I looked at Julia sets of the function f(z) = z^2 + c, where z and c are complex numbers. Different values of c produce different sets and different patterns. The x and y positions in the image correspond to the real and imaginary parts of z (z = x + iy). The image above is an example of a Julia set created with the setting c = -0.5 + 0.61i.
These patterns are slow to calculate because, for each pixel in the image, you have to apply a function repeatedly (in series) to assess its convergence/divergence behaviour. However, each pixel's calculation is independent of the others, so in principle they could all be carried out in parallel.
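To make the escape-time idea concrete, here's a minimal Python sketch of the per-pixel calculation (the helper name escape_count is my own choice, mirroring the bailout values used in the programs below):

```python
# Escape-time iteration for a single starting point z0: count how many
# applications of f(z) = z**2 + c it takes before |z| exceeds the bailout
# radius, capped at max_it. Points in the Julia set never escape.
def escape_count(z0, c, max_it=300, max_z=100.0):
    z, it = z0, 0
    while abs(z) < max_z and it < max_it:
        z = z * z + c
        it += 1
    return it

c = complex(-0.5, 0.61)
# a point far from the origin blows up after only a few iterations
print(escape_count(complex(2.0, 2.0), c))
# a point closer in takes slightly longer to escape
print(escape_count(complex(1.0, 1.0), c))
```

The full programs below simply repeat this per-pixel count over a grid and store the (normalised) iteration count as the pixel value.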
I wrote out the program in both languages in a kind of obvious way with no particular attempt to consider efficiency. Have a look at the Python code and the Julia code.
Python Code:
import numpy as np
import pickle
import time

def julia_func(z):
    c = complex(-0.5, 0.61)
    z = z**2 + c
    return z

def gen_julia():
    im_width, im_height = 5000, 5000
    max_it = 300
    max_z = 100
    min_x, max_x = -1, 1
    min_y, max_y = -1, 1
    julia_set = np.zeros((im_width, im_height))
    for ix in range(im_width):
        for iy in range(im_height):
            real_part = ((max_x - min_x)/im_width)*ix + min_x
            im_part = ((max_y - min_y)/im_height)*iy + min_y
            z = complex(real_part, im_part)
            it = 0
            while abs(z) < max_z and it < max_it:
                z = julia_func(z)
                it += 1
            ratio = it/max_it
            julia_set[ix, iy] = ratio
    return julia_set

def main():
    t1 = time.time()
    julia_set = gen_julia()
    t2 = time.time()
    print("Time to complete julia set: {:.2f} secs".format(t2 - t1))
    with open("julia_data", "wb") as f:
        pickle.dump(julia_set, f)

main()
Julia Code:
using Serialization

function julia_func(z)
    c = complex(-0.5, 0.61)
    z = z^2 + c
    return z
end

function gen_julia()
    im_width, im_height = 5000, 5000
    max_it = 300
    max_z2 = 100^2  # abs2 returns |z|^2, so compare against the bailout radius squared
    min_x, max_x = -1, 1
    min_y, max_y = -1, 1
    julia_set = zeros(im_width, im_height)
    for ix in 1:im_width
        for iy in 1:im_height
            real_part = ((max_x - min_x)/im_width)*ix + min_x
            im_part = ((max_y - min_y)/im_height)*iy + min_y
            z = complex(real_part, im_part)
            it = 0
            while abs2(z) < max_z2 && it < max_it
                z = julia_func(z)
                it += 1
            end
            ratio = it/max_it
            julia_set[ix, iy] = ratio
        end
    end
    return julia_set
end

function main()
    @time julia_set = gen_julia()
    serialize("julia_data", julia_set)
end

main()
I think it's interesting to compare them - notice how similar Julia's syntax is to Python's: both have very clean, readable styles. Both programs generate the fractal pattern above, but the Julia program was a lot faster.
- Julia: 1.8 seconds
- Python: 300 seconds
This roughly 165x speed up of Julia over Python is impressive. Even though I never manually annotated any types, the variables and functions in the Julia program deal with unambiguous, stable types. The crafty Julia compiler can clearly infer these stable types and then compile a lot of the program down to efficient machine code.
Speeding Python up with Numpy Vectorization
This comparison might be considered a little unfair. Python simply isn't designed to run efficiently when written in this way. The program is explicitly looping through every pixel in the image which in raw Python is very inefficient. Usually when you want something like this to run efficiently in Python you would use numpy and vectorized operations as much as possible. So I spent some time re-writing the Python program that way.
Have a look at this vectorized python code.
import numpy as np
import pickle
import time

def gen_julia():
    im_width, im_height = 5000, 5000
    max_it = 300
    max_z = 100
    min_x, max_x = -1, 1
    min_y, max_y = -1, 1
    c = complex(-0.5, 0.61)
    ix_array, iy_array = np.meshgrid(range(im_width), range(im_height))
    real_part = ((max_x - min_x)/im_width)*ix_array + min_x
    im_part = ((max_y - min_y)/im_height)*iy_array + min_y
    z = real_part + (1j)*im_part
    # Boolean mask recording which pixels have not finished being computed
    not_done = np.ones((im_width, im_height), dtype=bool)
    it = np.zeros((im_width, im_height))
    while np.any(not_done):
        not_done = np.logical_and(np.abs(z) < max_z, it < max_it)
        z[not_done] = z[not_done]**2 + c
        it[not_done] += 1
    julia_set = it/max_it
    return julia_set

def main():
    t1 = time.time()
    julia_set = gen_julia()
    t2 = time.time()
    print("Time to complete julia set: {:.2f} secs".format(t2 - t1))
    with open("julia_data", "wb") as f:
        pickle.dump(julia_set, f)

main()
As you can see I've used numpy to vectorize as much of the program as possible. In particular I'm no longer explicitly looping through each pixel in the image. The optimized Python code definitely runs faster:
- Python + vectorized numpy ops: 103 seconds
That's an improvement from the naive Python implementation but it's still nowhere near as fast as Julia.
The cognitive overhead required to re-write the program in this more efficient way wasn't trivial. Also, I feel like the code has become a little harder to read and understand. I like self-documenting code and I think re-writing everything this way makes that more difficult to achieve with Python + numpy.
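One reassuring check you can do when rewriting this way is to verify that the vectorized version computes exactly the same result as the naive loop. Here's a small self-contained sketch of that check (shrunk to a 40x40 grid and 50 iterations so it runs instantly; meshgrid's indexing="ij" option keeps the axis order identical to the loop version):

```python
import numpy as np

def julia_loop(n, c, max_it=50, max_z=100.0):
    # naive per-pixel escape-time loop over an n-by-n grid on [-1, 1]^2
    out = np.zeros((n, n))
    for ix in range(n):
        for iy in range(n):
            z = complex(2 * ix / n - 1, 2 * iy / n - 1)
            it = 0
            while abs(z) < max_z and it < max_it:
                z = z * z + c
                it += 1
            out[ix, iy] = it / max_it
    return out

def julia_vec(n, c, max_it=50, max_z=100.0):
    # the same computation with whole-array numpy operations
    ix, iy = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    z = (2 * ix / n - 1) + 1j * (2 * iy / n - 1)
    it = np.zeros((n, n))
    active = np.ones((n, n), dtype=bool)
    while active.any():
        active = (np.abs(z) < max_z) & (it < max_it)
        z[active] = z[active] ** 2 + c
        it[active] += 1
    return it / max_it

c = complex(-0.5, 0.61)
print(np.allclose(julia_loop(40, c), julia_vec(40, c)))  # → True
```

The two versions agree pixel for pixel; only how the work is expressed (and where the looping happens) changes.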
Speeding up Python with Numba, a Python JIT!
Next, I tried to use Numba to improve the efficiency of the Python code. Numba describes itself on its homepage as "an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code". Basically it allows you to manually annotate parts of your program which you would like Numba to pre-compile down to machine code before it gets run.
Adapting the Python code to use Numba was easy. I simply wrapped the two number crunching functions with the Numba jit decorator:
from numba import jit

@jit(nopython=True)
def julia_func(z):
    ...

@jit(nopython=True)
def gen_julia():
    ...
With just these small changes the code ran two orders of magnitude faster:
- Python + Numba: 3.5 seconds
That's pretty amazing. Numba pre-compiles the slow parts of the Python program and allows the program to execute almost as fast as Julia! In this case it also proved possible to do this without changing the clear implementation of the program which was kind of lost in the numpy vectorized version of the code.
It's worth mentioning that this was a pretty simple program to speed up with Numba. Indeed, the incorporation of Numba was almost trivial here. However, in a more complex program this incorporation could get more complicated. You would probably need to use a profiler and think carefully about which parts of the program should be pre-compiled and which parts shouldn't. It's nice that Julia essentially does all this for you!
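For instance, Python's built-in cProfile module is a quick way to find the hot functions worth decorating. A minimal sketch (the function names here are toy stand-ins, not parts of the fractal program):

```python
import cProfile
import io
import pstats

def slow_part(n):
    # deliberately heavy pure-Python loop, standing in for a hot function
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_part(n):
    return sum(range(n))

def pipeline():
    slow_part(200_000)
    fast_part(200_000)

# profile one run of the pipeline and print the top entries by cumulative time
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report ranks functions by time spent, which tells you where a @jit decorator (or other optimization effort) would actually pay off.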
Final scoreboard:
- Julia: 1.8 seconds
- Python: 300 seconds
- Python + vectorized numpy ops: 103 seconds
- Python + Numba: 3.5 seconds
All benchmarks were run on my MacBook Pro 2014 with a 2.5 GHz Quad-Core Intel Core i7.
So circling back to the original question: Can I make Python as fast as Julia? The answer seems to be no, but nearly! I'm really impressed by Numba, and will definitely add it to my Python toolkit for the future.
My Takeaways
I think this has demonstrated two major strengths of Julia:
- You're able to transfer a lot of the cognitive load of finding run-time efficiencies from your own mind to the Julia JIT.
- You can write Julia code which runs efficiently but is also easy to read and understand.
Also, I think Numba is definitely worth checking out if you're looking for a quick way to speed up some number crunching Python.
It's worth noting that Julia isn't a completely free lunch: there are some important and interesting Julia gotchas that can cause major run-time inefficiencies if you're not careful. However, as long as you avoid these, you'll likely be running self-documenting code in a very efficient way. I'm excited about the future of Julia and look forward to seeing what people build with it.
If you'd like to try out the code or generate some fractal images yourself, you can find everything in my GitHub repo here. Try changing the value of the c parameter and see how the pattern changes. Hint: the pattern is very sensitive to this value, so only tiny changes are needed!
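To see that sensitivity without rendering a full 5000x5000 image, here's a scaled-down sketch (my own miniature re-implementation of gen_julia, not the repo code) showing that even a nudge of 0.02 to c's imaginary part changes the computed pattern:

```python
import numpy as np

def julia_grid(c, n=60, max_it=60, max_z=100.0):
    # tiny version of gen_julia: normalised escape-time counts over [-1, 1]^2
    out = np.zeros((n, n))
    for ix in range(n):
        for iy in range(n):
            z = complex(2 * ix / n - 1, 2 * iy / n - 1)
            it = 0
            while abs(z) < max_z and it < max_it:
                z = z * z + c
                it += 1
            out[ix, iy] = it / max_it
    return out

# two nearby values of c give measurably different escape-count grids
a = julia_grid(complex(-0.5, 0.61))
b = julia_grid(complex(-0.5, 0.63))
print(np.abs(a - b).max() > 0)  # the patterns differ
```

Plotting each grid (e.g. with matplotlib's imshow) makes the difference in the fractal's shape immediately visible.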