How can loops be made faster in Cython?

Question

I have started a project in Python that consists mostly of loops. A few days ago I read about Cython, which can produce faster code through static typing. I wrote these two functions to compare performance (one in Python and the other in Cython):

import numpy as np
from time import perf_counter as clock  # time.clock was removed in Python 3.8

size = 11
board = np.random.randint(2, size=(size, size))

def py_playout(board, N):
    black_rave = []
    white_rave = []
    for i in range(N):
        for x in range(board.shape[0]):
            for y in range(board.shape[1]):
                if board[(x,y)] == 0:
                    black_rave.append((x,y))
                else:
                    white_rave.append((x,y))
    return black_rave, white_rave

cdef cy_playout(board, int N):
    cdef list white_rave = [], black_rave = []
    cdef int M = board.shape[0], L = board.shape[1]
    cdef int i=0, x=0, y=0
    for i in range(N):
        for x in range(M):
            for y in range(L):
                if board[(x,y)] == 0:
                    black_rave.append((x,y))
                else:
                    white_rave.append((x,y))
    return black_rave, white_rave

I then used the code below to test the performance:

t1 = clock()
a = py_playout(board, 1000)
t2 = clock()
b = cy_playout(board, 1000)
t3 = clock()

py = t2 - t1
cy = t3 - t2
print('cy is %.2f times faster than py' % (py / cy))

However, I didn't find any noticeable improvement. I haven't used typed memoryviews yet. Can anybody suggest a way to improve the speed, or help me rewrite the code using typed memoryviews?

In order by difficulty and maybe performance, numpy.vectorize docs.scipy.org/doc/numpy/reference/generated/…, numba.jit numba.pydata.org/numba-doc/0.15.1/examples.html, "x86 intrinsics" software.intel.com/en-us/articles/thread-parallelism-in-cython — J'e, Commented Aug 1, 2017 at 19:49

MSeifert · Accepted Answer · 2017-08-01 19:42:35Z


You're right: without adding a type to the board parameter in the Cython function, the speedup is modest:

%timeit py_playout(board, 1000)
# 321 ms ± 19.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit cy_playout(board, 1000)
# 186 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But it's still almost a factor of two faster. By adding a type, e.g.

cdef cy_playout(int[:, :] board, int N):
    # ...

# or if you want explicit types:
# cimport numpy as np
# cdef cy_playout(np.int64_t[:, :] board, int N):  # or np.int32_t

It becomes much faster (roughly 8 times faster than the pure Python version):

%timeit cy_playout(board, 1000)
# 38.7 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
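
One caveat with the int[:, :] signature: Cython's int is a C int, while np.random.randint often returns 64-bit integers (this depends on the platform and NumPy version), in which case the call fails with a buffer dtype mismatch. A minimal sketch of converting the board first, assuming np.intc is the NumPy dtype that matches C int (board_c is just a name made up for this example):

```python
import numpy as np

size = 11
board = np.random.randint(2, size=(size, size))

# np.intc is NumPy's alias for the C `int` type, which is what a
# Cython `int[:, :]` memoryview expects.
board_c = board.astype(np.intc)

print(board.dtype, "->", board_c.dtype)
# cy_playout(board_c, 1000)  # pass the converted array to the compiled function
```

Alternatively, the explicit np.int64_t[:, :] signature mentioned above avoids the conversion when the board is already 64-bit.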

I also used timeit (okay, the IPython magic %timeit) to get more accurate timings.
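
Outside IPython, the standard-library timeit module gives comparable measurements. A rough sketch, reusing the py_playout function from the question (the repeat, number and N values here are arbitrary choices for illustration):

```python
import timeit

import numpy as np


def py_playout(board, N):
    black_rave = []
    white_rave = []
    for i in range(N):
        for x in range(board.shape[0]):
            for y in range(board.shape[1]):
                if board[(x, y)] == 0:
                    black_rave.append((x, y))
                else:
                    white_rave.append((x, y))
    return black_rave, white_rave


board = np.random.randint(2, size=(11, 11))

# Take three separate measurements of a single call each and keep the
# smallest one, which is the measurement least disturbed by other processes.
times = timeit.repeat(lambda: py_playout(board, 100), repeat=3, number=1)
best = min(times)
print(f"best of 3: {best * 1e3:.1f} ms per call")
```

Taking the minimum of several repeats is the usual recommendation, since slower runs mostly reflect interference from the rest of the system.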

Note that you can also use numba to achieve great speedups without any additional static typing:

import numba as nb

nb_playout = nb.njit(py_playout)  # Just decorated your python function

%timeit nb_playout(board, 1000)
# 37.5 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

edited Aug 1, 2017 at 19:42

answered Aug 1, 2017 at 19:30

MSeifert


Wow! You are 20 times faster in answering my question than me writing the questions!! ;-)
– masouduut94
Commented Aug 1, 2017 at 19:41
Hehe, no problem. I'm glad it helped you :)
– MSeifert
Commented Aug 1, 2017 at 19:44
You can also shave off a few more ms by typing the output (cdef tuple playout1) and by declaring that we always append a tuple to the lists: cdef tuple foo; foo = (x, y); black_rave.append(foo)
– jeremycg
Commented Aug 1, 2017 at 19:58
@jeremycg That didn't provide any (significant) additional speedups on my machine. But thank you for the additional tips :)
– MSeifert
Commented Aug 1, 2017 at 20:05
Might be just noise, but on mine it does help when measured with %timeit :) The largest speedup for sure is typing the board.
– jeremycg
Commented Aug 1, 2017 at 20:11


masouduut94 · Accepted Answer · 2017-08-03 13:22:11Z

I implemented a function which runs even faster. I simply declared black_rave and white_rave as memoryviews and put them in the return value:

cdef tuple cy_playout1(int[:, :] board, int N):
    cdef int M = board.shape[0], L = board.shape[1]
    # Allocate one row per cell so neither buffer can overflow.
    cdef int[:, :] black_rave = np.empty((M * L, 2), dtype=np.intc)
    cdef int[:, :] white_rave = np.empty((M * L, 2), dtype=np.intc)
    cdef int i = 0, j = 0, x, y, h

    for h in range(N):
        # Reset the write positions at the start of each pass, so the
        # counts from the last pass are still available after the loop.
        i = 0
        j = 0
        for x in range(M):
            for y in range(L):
                if board[x, y] == 0:
                    black_rave[i, 0] = x
                    black_rave[i, 1] = y
                    i += 1
                elif board[x, y] == 1:
                    white_rave[j, 0] = x
                    white_rave[j, 1] = y
                    j += 1

    return black_rave[:i], white_rave[:j]

These are the speed test results:

%timeit py_playout(board, 1000)
%timeit cy_playout(board, 1000)
%timeit cy_playout1(board, 1000)
# 1 loop, best of 3: 200 ms per loop
# 100 loops, best of 3: 9.26 ms per loop
# 100 loops, best of 3: 4.88 ms per loop
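
Since the board does not change between the N passes, the same coordinate arrays can also be obtained in one step with plain NumPy. This is a different technique from the typed-memoryview loop, shown only as a sketch: np_playout is a hypothetical name, and it assumes the board contains only 0 and 1.

```python
import numpy as np


def np_playout(board):
    """Return (black, white) coordinate arrays, each of shape (k, 2)."""
    # np.argwhere lists the indices where the mask is True in row-major
    # order, which is the same order the nested x/y loops visit the cells.
    black = np.argwhere(board == 0)
    white = np.argwhere(board == 1)
    return black, white


board = np.random.randint(2, size=(11, 11))
black, white = np_playout(board)
print(black.shape, white.shape)
```

Whether this beats the compiled loop depends on the board size and on what the real playout does per cell, so it is worth timing both.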
