## Optimize code - Python

When I compare the following two pieces of code, assume `a_array` is a 1000×3 array.

```python
ni, nj = 100, 100
grid_ = []
for i in np.linspace(-5., 5., ni):
    for j in np.linspace(-3.2, 3.2, nj):
        mul = 0
        for k in a_array:
            if k[0] >= i and k[0] < i + 10.0/(ni-1) and k[1] >= j and k[1] < j + 6.4/(nj-1):
                mul += 1
        grid_.append(mul)
grid_ = np.array(grid_)
grid_ = grid_.reshape(ni, nj).T
```

That is roughly equivalent to:

```python
np.histogram2d(a_array[:,0], a_array[:,1], [100, 100])
```

The first version runs much slower than the second.
I want to modify the first version slightly, replacing `mul += 1` with `mul += k[2]`:

```python
ni, nj = 100, 100
grid_ = []
for i in np.linspace(-5., 5., ni):
    for j in np.linspace(-3.2, 3.2, nj):
        mul = 0
        for k in a_array:
            if k[0] >= i and k[0] < i + 10.0/(ni-1) and k[1] >= j and k[1] < j + 6.4/(nj-1):
                mul += k[2]
        grid_.append(mul)
grid_ = np.array(grid_)
grid_ = grid_.reshape(ni, nj).T
```

I am going to run this on a large array, so speed matters. Can anyone tell me how to optimize this last version?
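For the weighted version, `np.histogram2d` should already cover this: it accepts a `weights` argument, so summing `k[2]` per cell is just a weighted histogram. A minimal sketch (the random data and the bin `range` are assumptions chosen to mirror the loop bounds in the question):

```python
import numpy as np

# Hypothetical data shaped like the question's a_array (1000 x 3).
rng = np.random.default_rng(0)
a_array = np.column_stack([
    rng.uniform(-5.0, 5.0, 1000),    # x coordinates
    rng.uniform(-3.2, 3.2, 1000),    # y coordinates
    rng.uniform(0.0, 1.0, 1000),     # per-point weights (the k[2] column)
])

grid, xedges, yedges = np.histogram2d(
    a_array[:, 0], a_array[:, 1],
    bins=[100, 100],
    range=[[-5.0, 5.0], [-3.2, 3.2]],
    weights=a_array[:, 2],   # sum k[2] per cell instead of counting points
)
```

Since every point falls inside the given `range`, `grid.sum()` equals `a_array[:, 2].sum()`, which is a quick sanity check against the loop version.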

## Related

### Growing a numpy array

I have a ruleset of 3x256 rules. Each rule maps to a 3x3 grid of values, which in turn are themselves rules. Example rules:

```
0 -> [[0,0,0],[0,1,0],[0,0,0]]
1 -> [[1,1,1],[0,0,0],[1,1,1]]
```

Seed: `[[0]]`

After 1 iteration:

```
[[0,0,0],
 [0,1,0],
 [0,0,0]]
```

After 2 iterations:

```
[[0,0,0,0,0,0,0,0,0],
 [0,1,0,0,1,0,0,1,0],
 [0,0,0,0,0,0,0,0,0],
 [0,0,0,1,1,1,0,0,0],
 [0,1,0,0,0,0,0,1,0],
 [0,0,0,1,1,1,0,0,0],
 [0,0,0,0,0,0,0,0,0],
 [0,1,0,0,1,0,0,1,0],
 [0,0,0,0,0,0,0,0,0]]
```

Now I have a working implementation; however, it's the slowest function in my script. I'm wondering if there is a more Pythonic and more efficient way to rewrite this function.

```python
def decode(rules, fractal_iterations, seed):
    final_seed_matrix = np.zeros((3, 3**fractal_iterations, 3**fractal_iterations))
    for i in range(dimensions):
        seed_matrix = np.array([[seed]])
        for j in range(fractal_iterations):
            size_y = seed_matrix.shape[0]
            size_x = seed_matrix.shape[1]
            new_matrix = np.zeros((size_y*rule_size_sqrt, size_x*rule_size_sqrt))
            for y in range(size_y):
                for x in range(size_x):
                    seed_value = seed_matrix[y, x]
                    new_matrix[y*rule_size_sqrt : y*rule_size_sqrt+rule_size_sqrt,
                               x*rule_size_sqrt : x*rule_size_sqrt+rule_size_sqrt] = rules[int(seed_value), i]
            seed_matrix = new_matrix
        final_seed_matrix[i] = seed_matrix
    return np.moveaxis(final_seed_matrix, 0, -1)
```

Here is an optimized version that uses advanced indexing to select and patch together all the rules in one indexing step. This creates a 4D array with the appropriate rule at the position of the pixel it replaces. Flattening that to 2D is then a matter of swapping the middle axes and reshaping. It appears to give the same result as yours, but significantly faster (only tested for integer rules so far):

```
results equal: True
OP       : 24.883304461836815 ms
optimized:  1.093490980565548 ms
```

Code:

```python
import numpy as np

dimensions = 3
rule_size_sqrt = 3

def decode(rules, fractal_iterations, seed):
    final_seed_matrix = np.zeros((3, 3**fractal_iterations, 3**fractal_iterations))
    for i in range(dimensions):
        seed_matrix = np.array([[seed]])
        for j in range(fractal_iterations):
            size_y = seed_matrix.shape[0]
            size_x = seed_matrix.shape[1]
            new_matrix = np.zeros((size_y*rule_size_sqrt, size_x*rule_size_sqrt))
            for y in range(size_y):
                for x in range(size_x):
                    seed_value = seed_matrix[y, x]
                    new_matrix[y*rule_size_sqrt : y*rule_size_sqrt+rule_size_sqrt,
                               x*rule_size_sqrt : x*rule_size_sqrt+rule_size_sqrt] = rules[int(seed_value), i]
            seed_matrix = new_matrix
        final_seed_matrix[i] = seed_matrix
    return np.moveaxis(final_seed_matrix, 0, -1)

def decode_fast(rules, fractal_iterations, seed):
    rules_int = rules.astype(int)
    seed = np.array([[seed]], dtype=int)
    res = np.empty((3**fractal_iterations, 3**fractal_iterations, dimensions),
                   dtype=rules.dtype)
    for i in range(dimensions):
        grow = seed
        for j in range(1, fractal_iterations):
            grow = rules_int[grow, i].swapaxes(1, 2).reshape(3**j, -1)
        grow = rules[grow, i].swapaxes(1, 2).reshape(3**fractal_iterations, -1)
        res[..., i] = grow
    return res

rules = np.random.randint(0, 4, (4, dimensions, 3, 3))
seed = 1
fractal_iterations = 5

print('results equal:', np.all(decode(rules, fractal_iterations, seed)
                               == decode_fast(rules, fractal_iterations, seed)))

from timeit import repeat
print('OP       :', min(repeat('decode(rules, fractal_iterations, seed)',
                               globals=globals(), number=50))*20, 'ms')
print('optimized:', min(repeat('decode_fast(rules, fractal_iterations, seed)',
                               globals=globals(), number=50))*20, 'ms')
```
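To see the indexing trick in isolation, here is a toy sketch (using assumed 2x2 rule blocks instead of 3x3) of how advanced indexing followed by `swapaxes` and `reshape` tiles one block per cell:

```python
import numpy as np

# Two hypothetical rules, each a 2x2 block.
rules = np.array([[[0, 0], [0, 0]],
                  [[1, 1], [1, 1]]])
seed = np.array([[0, 1],
                 [1, 0]])

blocks = rules[seed]                         # shape (2, 2, 2, 2): one block per cell
tiled = blocks.swapaxes(1, 2).reshape(4, 4)  # interleave block rows with grid rows
# tiled:
# [[0 0 1 1]
#  [0 0 1 1]
#  [1 1 0 0]
#  [1 1 0 0]]
```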

### Pythonic way to access numpy array in reverse order

I'm trying to rewrite some code that looks like it was written by a FORTRAN programmer, to make it more Pythonic/readable. Below is the code snippet of interest. The overall behavior of the code is to store the first three elements of `Z` into `Zstart` so long as the values are less than 1, and also to store the last three values of `Z` into `Zend` as long as they are less than 1 as well.

```python
import numpy as np

nbpoints = 3
Z = np.linspace(0, 1.0, 10)
Zstart = np.ones(nbpoints)
Zend = np.ones(nbpoints)
Zpts = np.size(Z)

for j in range(nbpoints):
    if Z[j] < Zstart[j]:
        Zstart[j] = Z[j]
    if Z[Zpts - 1 - j] < Zend[nbpoints - 1 - j]:
        Zend[nbpoints - 1 - j] = Z[Zpts - 1 - j]
```

The counter-moving access of `Zstart` and `Zend` from both ends is tripping me up a bit. My current solution is the following.

```python
import numpy as np

nbpoints = 3
Z = np.linspace(0, 1.0, 10)
Zstart = np.ones(nbpoints)
Zend = np.ones(nbpoints)
Zpts = np.size(Z)

for j in range(nbpoints):
    if Z[j] < Zstart[j]:
        Zstart[j] = Z[j]
    if Z[-(j+1)] < Zend[-(j+1)]:
        Zend[-(j+1)] = Z[-(j+1)]
```

Sample output from this code is:

```
Z      = [ 0.0  0.11111111  0.22222222  0.33333333  0.44444444  0.55555556
           0.66666667  0.77777778  0.88888889  1.0 ]
Zstart = [ 0.0  0.11111111  0.22222222 ]
Zend   = [ 0.77777778  0.88888889  1.0 ]
```

My solution feels like I'm still just rewriting poorly written code, e.g. rearranging chairs on the deck of the Titanic. Is there a more Pythonic way to perform this operation?

You don't need to instantiate `Zstart` and `Zend` with `np.ones`. Just build them directly:

```python
nbpoints = 3
Z = np.linspace(0, 1.0, 10)
Zstart = Z[:nbpoints][Z[:nbpoints] < 1]
Zend = Z[-nbpoints:][Z[-nbpoints:] < 1]
print(Zstart)
print(Zend)
# [ 0.          0.11111111  0.22222222]
# [ 0.77777778  0.88888889]
```

Notice that `Zend` has only 2 elements here, because the final element in `Z` is not less than 1.

This code gives the same results without a moving counter:

```python
nbpoints = 3
Z = np.linspace(0, 1., 10)  # the number of samples must be an integer, not 10.
Zstart = np.ones(nbpoints)
Zend = np.ones(nbpoints)
Zstart[Z[:nbpoints] < 1] = Z[:nbpoints][Z[:nbpoints] < 1]
Zend[Z[-nbpoints:] < 1] = Z[-nbpoints:][Z[-nbpoints:] < 1]
```
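If you want to keep the fixed-length `(nbpoints,)` arrays of the original (with ones wherever the condition fails), note that "take `Z` where it is below 1, else keep 1" is exactly `np.minimum` against the ones-initialised arrays. A sketch of that reading of the code:

```python
import numpy as np

nbpoints = 3
Z = np.linspace(0, 1.0, 10)

# Elementwise minimum replaces both conditional loops at once.
Zstart = np.minimum(Z[:nbpoints], 1.0)
Zend = np.minimum(Z[-nbpoints:], 1.0)
```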

### What does “while (1<<n) < self.top:” mean?

I saw this statement in the source code of Scapy, in volatile.py. What is the meaning of `(1 << n)` in the `while` condition?

```python
def __init__(self, inf, sup, seed=None, forever=1, renewkeys=0):
    self.forever = forever
    self.renewkeys = renewkeys
    self.inf = inf
    self.rnd = random.Random(seed)
    self.sbox_size = 256
    self.top = sup - inf + 1
    n = 0
    while (1 << n) < self.top:
        n += 1
    self.n = n
    self.fs = min(3, (n+1)/2)
    self.fsmask = 2**self.fs - 1
    self.rounds = max(self.n, 3)
    self.turns = 0
    self.i = 0
```

`<<` is the bitwise left-shift operator: `x << y` returns `x` with its bits shifted to the left by `y` places (the new bits on the right-hand side are zeros). This is the same as multiplying `x` by `2**y`. So in the code you provided, the loop keeps incrementing `n` while `2**n` is still less than `self.top`; it stops at the smallest `n` with `2**n >= self.top`. If we assume `self.top` is 100, the largest power of two printed inside the loop is 64 (the loop then exits with `n = 7`, since `128 >= 100`); if it were 200, the printing would go up through 128.

```python
In [17]: n = 0
In [18]: top = 100
In [19]: while (1 << n) < top:
    ...:     print(1 << n)
    ...:     n += 1
    ...:
1
2
4
8
16
32
64
```
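In other words, the loop computes the number of bits needed to cover `top` values. The same number is available directly via the `int.bit_length` method (a sketch; `top = 100` is just a stand-in value):

```python
top = 100
n = 0
while (1 << n) < top:
    n += 1

# Equivalent closed form for top >= 1: smallest n with 2**n >= top.
assert n == (top - 1).bit_length()  # 7 here, since 2**6 = 64 < 100 <= 128 = 2**7
```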

### Eliminate for loops in numpy implementation

I have the following dataset in numpy:

```
indices | real data (X)    | targets (y)
--------+------------------+------------
0 0     | 43.25 665.32 ... |  2.4   } 1st block
0 0     | 11.234           | -4.5   }
0 1     | ...              |  ...   } 2nd block
0 1     |                  |        }
0 2     |                  |        } 3rd block
0 2     |                  |        }
1 0     |                  |        } 4th block
1 0     |                  |        }
1 0     |                  |        }
1 1     | ...              |  ...
1 1     |
1 2     |
1 2     |
2 0     |
2 0     |
2 1     |
2 1     |
2 1     |
...
```

These are my variables:

```python
idx1 = data[:,0]
idx2 = data[:,1]
X = data[:,2:-1]
y = data[:,-1]
```

I also have a variable `W`, which is a 3D array. What I need to do is loop through all the blocks in the dataset, return a scalar for each block after some computation, then sum up all the scalars and store the result in a variable called `cost`. The problem is that the looping implementation is very slow, so I'm trying to vectorize it if possible. This is my current code (note: the inner loop should iterate over `idx2s`, not `idx2`). Is it possible to do this without for loops in numpy?

```python
IDX1 = 0
IDX2 = 1

# get unique indices
idx1s = np.arange(len(np.unique(data[:,IDX1])))
idx2s = np.arange(len(np.unique(data[:,IDX2])))

# initialize global sum variable to 0
cost = 0
for i1 in idx1s:
    for i2 in idx2s:
        # for each block in the dataset
        mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))

        # get variables for that block
        curr_X = X[mask,:]
        curr_y = y[mask]
        curr_W = W[:,i2,i1]

        # calculate a scalar
        pred = np.dot(curr_X, curr_W)
        sigm = 1.0 / (1.0 + np.exp(-pred))
        loss = np.sum((sigm - 0.5) * curr_y)

        # add result to global cost
        cost += loss
```

Here is some sample data:

```python
data = np.array([[0,0,5,5,7],
                 [0,0,5,5,7],
                 [0,1,5,5,7],
                 [0,1,5,5,7],
                 [1,0,5,5,7],
                 [1,1,5,5,7]])
W = np.zeros((2,2,2))
idx1 = data[:,0]
idx2 = data[:,1]
X = data[:,2:-1]
y = data[:,-1]
```

That `W` was tricky... Actually, your blocks are pretty irrelevant, apart from getting the right slice of `W` to `np.dot` with the corresponding `X`, so I went the easy route of creating an `aligned_W` array as follows:

```python
aligned_W = W[:, idx2, idx1]
```

This is an array of shape `(2, rows)`, where `rows` is the number of rows of your data set. You can now do your whole calculation without any for loops:

```python
from numpy.core.umath_tests import inner1d

pred = inner1d(X, aligned_W.T)
sigm = 1.0 / (1.0 + np.exp(-pred))
loss = (sigm - 0.5) * y  # note: y, not curr_y -- there is no loop here
cost = np.sum(loss)
```
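Note that `numpy.core.umath_tests` is a private module (removed in recent NumPy releases); the same row-wise dot product can be expressed with `np.einsum`. A sketch on the question's sample data (where `W` is all zeros, so the cost comes out to exactly 0):

```python
import numpy as np

data = np.array([[0,0,5,5,7],
                 [0,0,5,5,7],
                 [0,1,5,5,7],
                 [0,1,5,5,7],
                 [1,0,5,5,7],
                 [1,1,5,5,7]])
W = np.zeros((2, 2, 2))
idx1, idx2 = data[:, 0], data[:, 1]
X, y = data[:, 2:-1], data[:, -1]

aligned_W = W[:, idx2, idx1]                 # shape (2, rows)
pred = np.einsum('ij,ji->i', X, aligned_W)   # one dot product per row
sigm = 1.0 / (1.0 + np.exp(-pred))
cost = np.sum((sigm - 0.5) * y)
```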

My guess is that the major reason your code is slow is the following line:

```python
mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
```

because you repeatedly scan your input arrays for a small number of rows of interest. Instead, sort once up front:

```python
ni1 = len(np.unique(data[:,IDX1]))
ni2 = len(np.unique(data[:,IDX2]))
idx1s = np.arange(ni1)
idx2s = np.arange(ni2)
key = data[:,IDX1] * ni2 + data[:,IDX2]  # 1D key to the rows
sortids = np.argsort(key)                # indices to the sorted key
```

Then inside the loop, instead of `mask = np.nonzero(...)`, do:

```python
curid = i1 * ni2 + i2
left = np.searchsorted(key, curid, 'left', sorter=sortids)
right = np.searchsorted(key, curid, 'right', sorter=sortids)
mask = sortids[left:right]
```
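A minimal runnable illustration of the sort-once scheme, using the question's sample data (an assumption, just to make it self-contained):

```python
import numpy as np

data = np.array([[0,0,5,5,7],
                 [0,0,5,5,7],
                 [0,1,5,5,7],
                 [0,1,5,5,7],
                 [1,0,5,5,7],
                 [1,1,5,5,7]])
ni2 = len(np.unique(data[:, 1]))
key = data[:, 0] * ni2 + data[:, 1]   # 1D block key per row
sortids = np.argsort(key)

# Rows belonging to block (i1, i2) = (0, 1):
curid = 0 * ni2 + 1
left = np.searchsorted(key, curid, 'left', sorter=sortids)
right = np.searchsorted(key, curid, 'right', sorter=sortids)
mask = sortids[left:right]   # rows 2 and 3
```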

I don't think there is a way to compare numpy arrays of different sizes without using for loops. It would be hard to decide the meaning and shape of the output of something like `[0,1,2,3,4] == [3,4,2]`. The only suggestion I can give you is to get rid of one of the for loops using `itertools.product`:

```python
import itertools as it
[...]
idx1s = np.unique(data[:,IDX1])
idx2s = np.unique(data[:,IDX2])

# initialize global sum variable to 0
cost = 0
for i1, i2 in it.product(idx1s, idx2s):
    # for each block in the dataset
    mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
    # get variables for that block
    curr_X = X[mask,:]
    curr_y = y[mask]
    [...]
```

You can also keep `mask` as a bool array:

```python
mask = (data[:,IDX1] == i1) & (data[:,IDX2] == i2)
```

The output is the same, and the memory for the bool array has to be allocated anyway; doing it this way saves you some memory and a function evaluation.

EDIT

If you know that the indices have no holes, or only a few, it might be worth removing the part where you define `idx1s` and `idx2s` and changing the for loop to:

```python
max1, max2 = data[:,[IDX1, IDX2]].max(axis=0)
for i1, i2 in it.product(xrange(max1), xrange(max2)):
    [...]
```

Both `xrange` and `it.product` are iterators, so they create `i1` and `i2` only when you need them.

ps: if you are on python3.x, use `range` instead of `xrange`.

### Assistance with taking matrix to power of constant

Below I have provided all my code for this program I am trying to develop. It takes as input an N x 3 file; a sample of what I'm using (just 5x3) is below. Each row represents the coordinate of a pixel in an image, which has been scaled to some XYZ coordinate using multidimensional scaling. The purpose of this program is to go from XYZ coordinates to Lab color, which can then be translated into sRGB. The code below (the second portion) shows the transformation from XYZ to Lab, and the upper portion (labelled "Fast XYZ -> RGB") is a shortcut I found to go from XYZ to RGB directly, cutting out the Lab step.

The problem resides in the Fast XYZ -> RGB step. What I am trying to do is compute

```
sRGBmat = (1 + val) * RGBLin ^ (1/2.4) - val
```

The problem I keep running into is that `RGBLin` can sometimes be negative, which means I have to use `cmath` or something else. I tried using `cmath`, but it gave me incorrect values. In MATLAB this gives me a proper number (well, a real + imaginary part) which I can still use.

The file `xyztest.txt` contains a 5x3 matrix with the following values:

```
.2345 .9817 .7612
.5465 .7897 .3514
.7796 .6765 .5645
.1221 .6376 .8790
.5432 .5853 .4652
```

The output should (with a few more computations) result in an N x 3 matrix, where each row represents the RGB values of a pixel. Any help would be greatly appreciated!
```python
import numpy as np
import cmath

d = open('xyztest.txt', 'r')
a = []
count = 0
b = []
XoX = []
YoY = []
ZoZ = []

while 1:
    line = d.readline()
    a.append(line.split())
    count = count + 1
    if not line:
        break
# a now contains all of the line elements as lists

XYZM = []
illuminant = [94.9423, 100.0000, 108.7201]  # or is it [.9424, 1.000, .8249], as in MATLAB?

for i in range(count - 1):
    b = a[i:(i + 1)]
    c = b[0]
    x = c[0]
    y = c[1]
    z = c[2]
    XoverXn = round(float(x) / illuminant[0], 10)
    YoverYn = round(float(y) / illuminant[1], 10)
    ZoverZn = round(float(z) / illuminant[2], 10)
    XoX.append(XoverXn)
    YoY.append(YoverYn)
    ZoZ.append(ZoverZn)
    mmaker = (float("".join(x)), float("".join(y)), float("".join(z)))
    XYZM.append(mmaker)

L = []
a = []
b = []
fXoX = []
fYoY = []
fZoZ = []

## Fast XYZ -> RGB
## A is the matrix we want to multiply by: [R,G,B]' = A * [X,Y,Z]'
## Values must be in the range 0-1
A = np.matrix('3.2410, -1.5374, -0.4986; -.9692, 1.8760, 0.0416; .0556, -.2040, 1.0570')

print "XYZ"
print XYZM
xyzt = np.transpose(np.matrix(XYZM))
RGBLin = np.transpose(A * xyzt)
val = 0.555
temp = (RGBLin <= 0.00304)
print "RGB"

## Do the power element-wise, because numpy doesn't want
## to work on the non-square matrix directly
for i in range(len(RGBLin)):
    for j in range(1):
        rgbline = RGBLin[i].tolist()
        for item in rgbline:
            for i in range(3):
                print item[i]
                item[i] = 1.055 + item[i+1] ** (1/2.4)
                print item[i]
            print item
# te[i][j] = pow(RGBLin[i][j], (1./2.4))
```

The problem resides in this step: I am trying to take the matrix to the power of (1/2.4), but some values of the matrix are negative. How do I get Python to give me a value?
```python
# te = pow(RGBLin, (1./2.4))

## XYZ -> Lab
for i in range(len(XoX)):
    xyz = []
    test = float(pow(YoY[i], (1. / 3)))

    if YoY[i] > 0.008856:
        L.append((116 * (YoY[i] ** (1. / 3))) - 16)
    else:
        L.append(903.3 * YoY[i])

    if XoX[i] > 0.008856:
        fXoX.append(pow(XoX[i], (1. / 3)))
    else:
        fXoX.append((7.787 * XoX[i]) + (16 / 116))

    if YoY[i] > 0.008856:
        fYoY.append(pow(YoY[i], (1. / 3)))
    else:
        fYoY.append((7.787 * YoY[i]) + (16 / 116))

    if ZoZ[i] > 0.008856:
        fZoZ.append(pow(ZoZ[i], (1. / 3)))
    else:
        fZoZ.append((7.787 * ZoZ[i]) + (16 / 116))

    a.append(500 * (fXoX[i] - fYoY[i]))
    b.append(500 * (fYoY[i] - fZoZ[i]))
    xyz.append((L[i], a[i], b[i]))

## NOW we must go from Lab to RGB, where xyz holds the Lab co-ordinates
```

Tell numpy that your numbers are complex:

```python
In [1]: import numpy as np

In [2]: r = np.array([-5, 2, 8, -1])

In [3]: r ** (1/2.4)
/usr/local/share/python3/ipython3:1: RuntimeWarning: invalid value encountered in power
Out[3]: array([        nan,  1.33483985,  2.37841423,         nan])

In [4]: c = r.astype(complex)

In [5]: c ** (1/2.4)
Out[5]: array([ 0.50609696+1.88877958j,  1.33483985+0.j        ,
                2.37841423+0.j        ,  0.25881905+0.96592583j])
```

There's some discussion of this on scipy.org.
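If the negative entries are just slightly out-of-gamut values rather than meaningful data, an alternative (an assumption about intent, not part of the complex-number approach above) is to keep everything real by applying the power to the magnitude and restoring the sign:

```python
import numpy as np

r = np.array([-5.0, 2.0, 8.0, -1.0])

# Signed power: |x|**(1/2.4) with the original sign put back,
# which avoids NaNs without switching to a complex dtype.
signed = np.sign(r) * np.abs(r) ** (1 / 2.4)
```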