I have started a project in Python that consists mostly of loops. A few days ago I read about Cython, which can produce faster code through static typing. I wrote these two functions to compare performance (one in Python, the other in Cython):
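import numpy as np
# time.clock was removed in Python 3.8; perf_counter is aliased to keep the calls below unchanged
from time import perf_counter as clock

size = 11
board = np.random.randint(2, size=(size, size))
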
def py_playout(board, N):
    black_rave = []
    white_rave = []
    for i in range(N):
        for x in range(board.shape[0]):
            for y in range(board.shape[1]):
                if board[(x,y)] == 0:
                    black_rave.append((x,y))
                else:
                    white_rave.append((x,y))
    return black_rave, white_rave

cdef cy_playout(board, int N):
    cdef list white_rave = [], black_rave = []
    cdef int M = board.shape[0], L = board.shape[1]
    cdef int i=0, x=0, y=0
    for i in range(N):
        for x in range(M):
            for y in range(L):
                if board[(x,y)] == 0:
                    black_rave.append((x,y))
                else:
                    white_rave.append((x,y))
    return black_rave, white_rave

I then used the code below to compare their performance:
t1 = clock()
a = py_playout(board, 1000)   # pure-Python version
t2 = clock()
b = cy_playout(board, 1000)   # Cython version
t3 = clock()
py = t2 - t1
cy = t3 - t2
print('cy is %.2f times faster than py' % (py / cy))

However, I didn't see any noticeable improvement. I haven't used typed memoryviews yet. Can anybody suggest a way to improve the speed, or help me rewrite the code using typed memoryviews?
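A side note on the measurement itself: a single pair of timestamps can be noisy, so it may be worth repeating each run and keeping the best time. Below is a minimal sketch using the standard-library timeit module. The helper name best_time and the repeat counts are assumptions made for illustration, and only the pure-Python function is timed here because the Cython one needs to be compiled first.

```python
import timeit

import numpy as np


def py_playout(board, N):
    """Pure-Python reference: the same nested loops as in the question."""
    black_rave, white_rave = [], []
    for _ in range(N):
        for x in range(board.shape[0]):
            for y in range(board.shape[1]):
                if board[x, y] == 0:
                    black_rave.append((x, y))
                else:
                    white_rave.append((x, y))
    return black_rave, white_rave


def best_time(func, *args, repeat=3, number=1):
    """Hypothetical helper: best per-call time over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


board = np.random.randint(2, size=(11, 11))
py_time = best_time(py_playout, board, 100)
print('py_playout: %.4f s per call' % py_time)
```

The same helper could be pointed at the compiled Cython function, and the ratio of the two best times would then replace the single py / cy figure.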
numpy.vectorize (docs.scipy.org/doc/numpy/reference/generated/…), numba.jit (numba.pydata.org/numba-doc/0.15.1/examples.html), "x86 intrinsics" (software.intel.com/en-us/articles/thread-parallelism-in-cython)
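As a complement to the links above, here is a rough sketch of a NumPy-only variant that avoids the Python-level loops altogether. It does not use numpy.vectorize, which the NumPy docs describe as a convenience wrapper around a loop rather than a speed-up. Instead it assumes that only the two coordinate lists matter and builds them with np.argwhere. The name np_playout is made up for this example, and no speed-up figure is claimed; it would need to be timed against the functions in the question.

```python
import numpy as np


def np_playout(board, N):
    """Sketch: same output lists as py_playout, built without nested loops."""
    # np.argwhere yields coordinates in row-major order, which matches the
    # x-then-y iteration order of the original nested loops.
    black_once = [tuple(p) for p in np.argwhere(board == 0).tolist()]
    white_once = [tuple(p) for p in np.argwhere(board != 0).tolist()]
    # The original outer loop repeats the same scan N times, so each
    # per-scan list is simply repeated N times.
    return black_once * N, white_once * N


demo_board = np.array([[0, 1],
                       [1, 0]])
black_rave, white_rave = np_playout(demo_board, 2)
print(black_rave)
print(white_rave)
```

This only works because the board does not change between iterations in the posted code; if the real playout mutates the board on each pass, the loop structure has to stay and typed memoryviews or numba.jit become the more relevant options.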