## Optimize a code - python

### Growing a numpy array

```I have a ruleset of 3x256 rules. Each rule maps to a 3x3 grid of values, which in turn themselves are rules.
Example rules:
0 -> [[0,0,0],[0,1,0],[0,0,0]]
1 -> [[1,1,1],[0,0,0],[1,1,1]]
Seed:
[[0]]
After 1 iteration:
[[0,0,0],
[0,1,0],
[0,0,0]]
After 2 iterations:
[[0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,0,0,1,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,1,1,1,0,0,0],
[0,1,0,0,0,0,0,1,0],
[0,0,0,1,1,1,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,0,0,1,0],
[0,0,0,0,0,0,0,0,0]]
Now I have a working implementation, however, it's the slowest function in my script. I'm wondering if there is a more pythonic and more efficient way to rewrite this function.
def decode(rules,fractal_iterations,seed):
    final_seed_matrix = np.zeros((3,3**fractal_iterations,3**fractal_iterations))
    for i in range(dimensions):
        seed_matrix = np.array([[seed]])
        for j in range(fractal_iterations):
            size_y = seed_matrix.shape[0]
            size_x = seed_matrix.shape[1]
            new_matrix = np.zeros((size_y*rule_size_sqrt,size_x*rule_size_sqrt))
            for y in range(size_y):
                for x in range(size_x):
                    seed_value = seed_matrix[y,x]
                    new_matrix[y*rule_size_sqrt : y*rule_size_sqrt+rule_size_sqrt, x*rule_size_sqrt : x*rule_size_sqrt+rule_size_sqrt] = rules[int(seed_value),i]
            seed_matrix = new_matrix
        final_seed_matrix[i] = seed_matrix
    return np.moveaxis(final_seed_matrix,0,-1)
```
```Here is an optimized version that uses advanced indexing to select and patch together all rules in one indexing step. This creates a 4D array with the appropriate rule at the position of the pixel it replaces. Flattening that to 2D is then a matter of swapping the middle axes and reshaping. It appears to give the same result as yours, but significantly faster (only tested for integer rules so far):
results equal: True
OP : 24.883304461836815 ms
optimized: 1.093490980565548 ms
Code:
import numpy as np
dimensions = 3
rule_size_sqrt = 3
def decode(rules,fractal_iterations,seed):
    final_seed_matrix = np.zeros((3,3**fractal_iterations,3**fractal_iterations))
    for i in range(dimensions):
        seed_matrix = np.array([[seed]])
        for j in range(fractal_iterations):
            size_y = seed_matrix.shape[0]
            size_x = seed_matrix.shape[1]
            new_matrix = np.zeros((size_y*rule_size_sqrt,size_x*rule_size_sqrt))
            for y in range(size_y):
                for x in range(size_x):
                    seed_value = seed_matrix[y,x]
                    new_matrix[y*rule_size_sqrt : y*rule_size_sqrt+rule_size_sqrt, x*rule_size_sqrt : x*rule_size_sqrt+rule_size_sqrt] = rules[int(seed_value),i]
            seed_matrix = new_matrix
        final_seed_matrix[i] = seed_matrix
    return np.moveaxis(final_seed_matrix,0,-1)
def decode_fast(rules, fractal_iterations, seed):
    # integer copy of the rules, used as indices while growing
    rules_int = rules.astype(int)
    seed = np.array([[seed]], dtype=int)
    res = np.empty((3**fractal_iterations, 3**fractal_iterations, dimensions),
                   dtype=rules.dtype)
    for i in range(dimensions):
        grow = seed
        for j in range(1, fractal_iterations):
            # look up all rules at once (4D), then flatten back to 2D
            grow = rules_int[grow, i].swapaxes(1, 2).reshape(3**j, -1)
        # last step uses the original rules so their dtype is preserved
        grow = rules[grow, i].swapaxes(1, 2).reshape(3**fractal_iterations, -1)
        res[..., i] = grow
    return res
rules = np.random.randint(0, 4, (4, dimensions, 3, 3))
seed = 1
fractal_iterations = 5
print('results equal:', np.all(decode(rules, fractal_iterations, seed) == decode_fast(rules, fractal_iterations, seed)))
from timeit import repeat
print('OP :', min(repeat('decode(rules, fractal_iterations, seed)', globals=globals(), number=50))*20, 'ms')
print('optimized:', min(repeat('decode_fast(rules, fractal_iterations, seed)', globals=globals(), number=50))*20, 'ms')```
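
To see why swapping the middle axes and then reshaping flattens the 4D array correctly, here is a minimal sketch on a tiny case. The names `toy_rules` and `grid` and their values are made up for illustration and are not from the posts above. It compares the one-step indexing against pasting the blocks together explicitly with `np.block`.

```python
import numpy as np

# Two hypothetical 3x3 rules, indexed by rule number (illustrative values).
toy_rules = np.array([
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[1, 1, 1], [0, 0, 0], [1, 1, 1]],
])

# A 2x2 grid of rule numbers to expand.
grid = np.array([[0, 1],
                 [1, 0]])

# Advanced indexing gives shape (2, 2, 3, 3):
# (grid row, grid col, rule row, rule col).
patches = toy_rules[grid]

# Put the two row axes next to each other, then merge the axes pairwise.
expanded = patches.swapaxes(1, 2).reshape(6, 6)

# Reference result: assemble the same blocks explicitly.
reference = np.block([[toy_rules[v] for v in row] for row in grid])

print(np.array_equal(expanded, reference))
```

After the swap the axes are ordered (grid row, rule row, grid col, rule col), so the reshape maps them to row `3*a + c` and column `3*b + d`, which is exactly where the explicit block assembly puts each value.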

### Pythonic way to access numpy array in reverse order

```I'm trying to re-write some code that looks like it was written by a FORTRAN programmer to make it more Pythonic/readable. Below is the code snippet of interest. The overall behavior of the code is to store the first three elements of Z into Zstart so long as the values are less than 1, and also store the last three values of Z into Zend as long as they are less than 1 as well.
import numpy as np
nbpoints = 3
Z = np.linspace(0,1.0,10)
Zstart = np.ones(nbpoints)
Zend = np.ones(nbpoints)
Zpts = np.size(Z)
for j in range(nbpoints):
    if Z[j] < Zstart[j]:
        Zstart[j] = Z[j]
    if Z[Zpts - 1 - j] < Zend[nbpoints - 1 - j]:
        Zend[nbpoints - 1 - j] = Z[Zpts - 1 - j]
The counter moving access of Zstart and Zend from both ends is tripping me up a bit. My current solution is the following.
import numpy as np
nbpoints = 3
Z = np.linspace(0,1.0,10)
Zstart = np.ones(nbpoints)
Zend = np.ones(nbpoints)
Zpts = np.size(Z)
for j in range(nbpoints):
    if Z[j] < Zstart[j]:
        Zstart[j] = Z[j]
    if Z[-(j+1)] < Zend[-(j+1)]:
        Zend[-(j+1)] = Z[-(j+1)]
Sample output from this code is:
Z = [ 0.0 0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
0.66666667 0.77777778 0.88888889 1.0 ]
Zstart = [ 0.0 0.11111111 0.22222222]
Zend = [ 0.77777778 0.88888889 1.0 ]
My solution feels like I'm still just rewriting poorly written code, i.e. rearranging deck chairs on the Titanic. Is there a more Pythonic way to perform this operation?
```
```You don't need to instantiate Zstart and Zend with np.ones. Just build them directly:
nbpoints = 3
Z = np.linspace(0,1.0,10)
Zstart = Z[:nbpoints][Z[:nbpoints] < 1]
Zend = Z[-nbpoints:][Z[-nbpoints:] < 1]
print(Zstart)
print(Zend)
# [ 0. 0.11111111 0.22222222]
# [ 0.77777778 0.88888889]
Notice that Zend has only 2 elements here, because the final element in Z is not less than 1.
```
```This code gives the same results without a moving counter
nbpoints = 3
Z = np.linspace(0, 1., 10)  # the number of points must be an integer
Zstart = np.ones(nbpoints)
Zend = np.ones(nbpoints)
Zstart[Z[:nbpoints] < 1] = Z[:nbpoints][Z[:nbpoints] < 1]
Zend[Z[-nbpoints:] < 1] = Z[-nbpoints:][Z[-nbpoints:] < 1]```
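
If the goal is to reproduce the original loop exactly (keep the default of 1 wherever a value of `Z` is not below 1), one possible shortcut is `np.minimum`. This is a sketch of an alternative, not something either answer above proposes.

```python
import numpy as np

nbpoints = 3
Z = np.linspace(0, 1.0, 10)

# The element-wise minimum against 1 mirrors
# "start at 1, overwrite only if Z is smaller".
Zstart = np.minimum(Z[:nbpoints], 1.0)
Zend = np.minimum(Z[-nbpoints:], 1.0)

print(Zstart)
print(Zend)
```

Unlike the boolean-mask version, both results always keep length `nbpoints`, which matches the sample output in the question.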

### What does “while (1<<n) < self.top:” mean?

```I saw this statement in the source code of Scapy in volatile.py.
What is the meaning of (1 << n) in the while condition?
def __init__(self, inf, sup, seed=None, forever=1, renewkeys=0):
    self.forever = forever
    self.renewkeys = renewkeys
    self.inf = inf
    self.rnd = random.Random(seed)
    self.sbox_size = 256
    self.top = sup-inf+1
    n=0
    while (1<<n) < self.top:
        n += 1
    self.n =n
    self.fs = min(3,(n+1)/2)
    self.rounds = max(self.n,3)
    self.turns = 0
    self.i = 0
```
```<< is a bitwise operator, as explained here:
x << y Returns x with the bits shifted to the left by y places (and
new bits on the right-hand-side are zeros). This is the same as
multiplying x by 2**y.
So in the code you provided, the while loop keeps incrementing n for as long as 2**n is less than self.top. If we assume self.top is 100, the largest power of two that is less than 100 is 64 (n = 6), so the loop exits with n = 7, the smallest n for which 2**n >= self.top. If it were 200, the largest power of two below it would be 128 and n would end at 8.
In : n = 0
In : top = 100
In : while ( 1 << n ) < top:
...:     print ( 1<<n)
...:     n+=1
...:
1
2
4
8
16
32
64```
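
As a small sketch of what the loop computes: it finds the smallest `n` with `2**n >= top`, which is the number of bits needed to index `top` distinct values. The helper name `bits_needed` is made up for this example; the comparison with `int.bit_length` is an observation added here, not something taken from the Scapy source.

```python
def bits_needed(top):
    """Smallest n such that 2**n >= top, found the same way as the loop above."""
    n = 0
    while (1 << n) < top:
        n += 1
    return n

for top in (1, 64, 100, 200):
    # For positive integers, (top - 1).bit_length() yields the same n.
    print(top, bits_needed(top), (top - 1).bit_length())
```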

### Eliminate for loops in numpy implementation

```I have the following dataset in numpy
indices | real data (X) |targets (y)
| |
0 0 | 43.25 665.32 ... |2.4 } 1st block
0 0 | 11.234 |-4.5 }
0 1 ... ... } 2nd block
0 1 }
0 2 } 3rd block
0 2 }
1 0 } 4th block
1 0 }
1 0 }
1 1 ...
1 1
1 2
1 2
2 0
2 0
2 1
2 1
2 1
...
These are my variables:
idx1 = data[:,0]
idx2 = data[:,1]
X = data[:,2:-1]
y = data[:,-1]
I also have a variable W which is a 3D array.
What I need to do in the code is loop through all the blocks in the dataset and return a scalar number for each block after some computation, then sum up all the scalars, and store it in a variable called cost. Problem is that the looping implementation is very slow, so I'm trying to do it vectorized if possible. This is my current code. Is it possible to do this without for loops in numpy?
IDX1 = 0
IDX2 = 1
# get unique indices
idx1s = np.arange(len(np.unique(data[:,IDX1])))
idx2s = np.arange(len(np.unique(data[:,IDX2])))
# initialize global sum variable to 0
cost = 0
for i1 in idx1s:
    for i2 in idx2s:
        # for each block in the dataset
        mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
        # get variables for that block
        curr_X = X[mask]
        curr_y = y[mask]
        curr_W = W[:,i2,i1]
        # calculate a scalar
        pred = np.dot(curr_X,curr_W)
        sigm = 1.0 / (1.0 + np.exp(-pred))
        loss = np.sum((sigm- (0.5)) * curr_y)
        # add result to global cost
        cost += loss
Here is some sample data
data = np.array([[0,0,5,5,7],
[0,0,5,5,7],
[0,1,5,5,7],
[0,1,5,5,7],
[1,0,5,5,7],
[1,1,5,5,7]])
W = np.zeros((2,2,2))
idx1 = data[:,0]
idx2 = data[:,1]
X = data[:,2:-1]
y = data[:,-1]
```
```That W was tricky... Actually, your blocks are pretty irrelevant, apart from getting the right slice of W to do the np.dot with the corresponding X, so I went the easy route of creating an aligned_W array as follows:
aligned_W = W[:, idx2, idx1]
This is an array of shape (2, rows) where rows is the number of rows of your data set. You can now proceed to do your whole calculation without any for loops as:
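# row-wise dot product; np.einsum is used here in place of inner1d, which lives in
# an internal NumPy test module (numpy.core.umath_tests) not meant to be imported
pred = np.einsum('ij,ij->i', X, aligned_W.T)
sigm = 1.0 / (1.0 + np.exp(-pred))
loss = (sigm - 0.5) * y
cost = np.sum(loss)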
```
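
The claim that the blocks only matter for picking the right slice of `W` can be checked on the sample data from the question. The sketch below assumes small random weights instead of the all-zero `W` from the question (so the comparison is not trivial) and uses `np.einsum` for the row-wise dot product; both choices are assumptions made for this example.

```python
import numpy as np

data = np.array([[0, 0, 5, 5, 7],
                 [0, 0, 5, 5, 7],
                 [0, 1, 5, 5, 7],
                 [0, 1, 5, 5, 7],
                 [1, 0, 5, 5, 7],
                 [1, 1, 5, 5, 7]])
idx1 = data[:, 0]
idx2 = data[:, 1]
X = data[:, 2:-1]
y = data[:, -1]

# Assumed weights: small random values rather than zeros.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 2, 2))

# Loop version: one block per (i1, i2) pair.
cost_loop = 0.0
for i1 in np.unique(idx1):
    for i2 in np.unique(idx2):
        mask = (idx1 == i1) & (idx2 == i2)
        pred = X[mask] @ W[:, i2, i1]
        sigm = 1.0 / (1.0 + np.exp(-pred))
        cost_loop += np.sum((sigm - 0.5) * y[mask])

# Vectorized version: pick one weight column per row, then dot row by row.
aligned_W = W[:, idx2, idx1]                 # shape (features, rows)
pred = np.einsum('ij,ji->i', X, aligned_W)   # row-wise dot products
sigm = 1.0 / (1.0 + np.exp(-pred))
cost_vec = np.sum((sigm - 0.5) * y)

print(np.isclose(cost_loop, cost_vec))
```

Note that `idx1` and `idx2` must be integer arrays for the fancy indexing into `W` to work; if `data` is stored as floats, they would need an `astype(int)` first.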
```My guess is the major reason your code is slow is the following line:
mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
It rescans the whole input array on every iteration just to find a small number of rows of interest. Instead, build a sorted key once, before the loop:
ni1 = len(np.unique(data[:,IDX1]))
ni2 = len(np.unique(data[:,IDX2]))
idx1s = np.arange(ni1)
idx2s = np.arange(ni2)
key = data[:,IDX1] * ni2 + data[:,IDX2] # 1D key to the rows
sortids = np.argsort(key) #indices to the sorted key
Then inside the loop, instead of
mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
you need to do
curid = i1 * ni2 + i2
left = np.searchsorted(key, curid, 'left', sorter=sortids)
right=np.searchsorted(key, curid, 'right', sorter=sortids)
```
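
A minimal sketch of how the sorted-key lookup could be completed. The last step, slicing `sortids[left:right]` to get the row indices of a block, is an assumption about how `left` and `right` are meant to be used, and `block_rows` is a hypothetical helper name. The sample rows are deliberately shuffled to show that the input does not need to be sorted.

```python
import numpy as np

data = np.array([[0, 0, 5, 5, 7],
                 [1, 1, 5, 5, 7],
                 [0, 1, 5, 5, 7],
                 [0, 0, 5, 5, 7],
                 [1, 0, 5, 5, 7],
                 [0, 1, 5, 5, 7]])
IDX1, IDX2 = 0, 1

ni2 = len(np.unique(data[:, IDX2]))
key = data[:, IDX1] * ni2 + data[:, IDX2]   # one integer key per row
sortids = np.argsort(key, kind='stable')    # row indices in key order

def block_rows(i1, i2):
    """Row indices of block (i1, i2), located by binary search."""
    curid = i1 * ni2 + i2
    left = np.searchsorted(key, curid, 'left', sorter=sortids)
    right = np.searchsorted(key, curid, 'right', sorter=sortids)
    return sortids[left:right]

print(block_rows(0, 1))
```

The sort is paid for once; each block lookup is then two binary searches instead of a full scan of `data`.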
```I don't think there is a way to compare numpy arrays of different sizes without using for loops. It would be hard to decide what the meaning and shape of the output should be for something like
[0,1,2,3,4] == [3,4,2]
The only suggestion I can give you is to get rid of one of the for loops using itertools.product:
import itertools as it
[...]
idx1s = np.unique(data[:,IDX1])
idx2s = np.unique(data[:,IDX2])
# initialize global sum variable to 0
cost = 0
for i1, i2 in it.product(idx1s, idx2s):
    # for each block in the dataset
    mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
    # get variables for that block
    [...]
You can also keep mask as a bool array
mask = (data[:,IDX1] == i1) & (data[:,IDX2] == i2)
The result is the same. The bool array has to be created anyway, so skipping np.nonzero saves you some memory and a function call.
EDIT
If you know that the indices have no holes, or only a few, it might be worth removing the part where you define idx1s and idx2s and changing the for loop to
max1, max2 = data[:,[IDX1, IDX2]].max(axis=0)
# + 1 so that the largest index is included
for i1, i2 in it.product(xrange(max1 + 1), xrange(max2 + 1)):
    [...]
Both xrange and it.product are lazy, so each i1, i2 pair is only created when you need it.
ps: if you are on python3.x use range instead of xrange```

### Assistance with taking matrix to power of constant

```Below I have provided all the code for the program I am trying to develop. It takes an N x 3 file as input; a sample of what I'm using is given below (it's just a 5x3 sample). Each row represents the coordinate of a pixel in an image, which has been scaled to some XYZ coordinate using multidimensional scaling. The purpose of the program is to go from XYZ coordinates to Lab color, which can then be translated into sRGB. The second portion of the code below shows the transformation from XYZ to Lab, and the upper portion (labelled Fast XYZ > RGB) is a shortcut I found to go from XYZ to RGB, cutting out the Lab step. The problem resides in the Fast XYZ > RGB step.
What I am trying to do is compute sRGBmat = (1 + val) * RGBLin ^ (1/2.4) - val
The problem I keep running into is that RGBLin can sometimes be negative, which means I have to use cmath or something else. I tried using cmath, but it gave me incorrect values.
In MATLAB, it gives me a proper number (well, a real plus an imaginary part), which I can still use.
The file xyztest.txt contains a 5x3 matrix with the following values:
.2345 .9817 .7612
.5465 .7897 .3514
.7796 .6765 .5645
.1221 .6376 .8790
.5432 .5853 .4652
The output should (with a few more computations) result in an N x 3 matrix, where each row is representative of the RGB values at pixel 1-n of row 1 (for the first n values), then row 2 for the next n+1 values-
Any help would be greatly appreciated!
import numpy as np
d=open('xyztest.txt', 'r')
import cmath
a=[]
count = 0
b = []
AoverAn = []
XoX = []
YoY = []
ZoZ = []
aova=[]
c = 0
while 1:
    line = d.readline()
    a.append(line.split())
    count = count + 1
    if not line:
        break
#print a #contains all of the line elements in a list
t=[]
XYZM = []
illuminant = [94.9423, 100.0000, 108.7201]
##or is it [ .9424, 1.000, .8249] which is in matlab-
#print count
for i in range(count-1):
    b = a[i:(i+1)]
    #print "this is", b
    c = b[0]
    x = c[0]
    y = c[1]
    z = c[2]
    XoverXn = round(float(x) /illuminant[0] , 10)
    YoverYn = round(float(y) / illuminant[1] , 10)
    ZoverZn = round(float(z) / illuminant[2] , 10)
    XoX.append(XoverXn)
    YoY.append(YoverYn)
    ZoZ.append(ZoverZn)
    x.replace('\'', '')
    mmaker = (float("".join(x)), float("".join(y)), float("".join(z)))
    XYZM.append(mmaker)
L = []
a = []
b = []
fXoX = []
fYoY = []
fZoZ = []
Lab = []
##Get an XYZ matrix so i can use fast XYZ to RGB
Fast XYZ > RGB
##A is the thing we want to multiply
A= np.matrix('3.2410, -1.5374, -0.4986 ;-.9692, 1.8760, 0.0416 ; .0556, -.2040, 1.0570')
##we get [R,G,B]' = A * [X,Y,Z]'
##Must be in the range 0-1
RGBLin=[]
##XYZM = float(XYZM)
print "XYZ"
print XYZM
xyzt = np.transpose(np.matrix(XYZM))
RGBLin = np.transpose(A * xyzt)
val = 0.555
temp = (RGBLin <= 0.00304)
#print temp
print "RGB"
##print RGBLin
## Do power multiplcation because numpy doesnt want to work for non square mat
for i in range(len(RGBLin)):
    for j in range(1):
        rgbline = RGBLin[i].tolist()
        for item in rgbline:
            for i in range(3):
                print item[i]
                item[i] = 1.055 + item[i+1]**(1/2.4)
                print item[i]
            print item
        #print rgbline
        #te[i][j] = pow(RGBLin[i][j] , (1./2.4))
#print te
-> The problem resides in this step, i am trying to take the matrix to the power of (1/2.4), but some values of the matrix are negative- How do i get python to give me a value??!
#te = pow(RGBLin, (1./2.4))
XYZ -> LAB
for i in range(len(XoX)):
    #print YoY[i]
    xyz = []
    test = float(pow(YoY[i],(1./3)))
    #print test
    if (YoY[i] > 0.008856):
        L.append((116 * (YoY[i] **(1./3))) - 16)
        #L1 = (116 * (YoY[i] **(1./3))) - 16
    else:
        L.append(903.3* YoY[i])
        #L1 = 903.3* YoY[i]
    ##
    # 16./116 rather than 16/116, which is integer division (0) in Python 2
    if (XoX[i] > 0.008856):
        fXoX.append(pow(XoX[i], (1./3)))
        #A1 = pow(XoX[i], (1./3))
    else:
        fXoX.append((7.787 * XoX[i])+(16./116))
        #A1 = (7.787 * XoX[i])+(16./116)
    ##
    if (YoY[i] > 0.008856):
        fYoY.append(pow(YoY[i], (1./3)))
        #B1 = pow(YoY[i], (1./3))
    else:
        fYoY.append((7.787 * YoY[i])+(16./116))
        #B1 = (7.787 * YoY[i])+(16./116)
    ##
    if (ZoZ[i] > 0.008856):
        fZoZ.append(pow(ZoZ[i], (1./3)))
        #Z1 = pow(ZoZ[i], (1./3))
    else:
        fZoZ.append((7.787 * ZoZ[i])+(16./116))
        #Z1 = (7.787 * ZoZ[i])+(16./116)
    ##
    a.append(500*(fXoX[i]-fYoY[i]))
    # CIELAB b* uses a factor of 200, not 500
    b.append(200*(fYoY[i]-fZoZ[i]))
    xyz.append((L[i], a[i], b[i]))
##print xyz
######### NOW we must go from Lab to RGB, where XYZ is the LaB co-ordinates######
```
```Tell numpy that your numbers are complex.
In : import numpy as np
In : r = np.array([-5, 2, 8, -1])
In : r ** (1/2.4)
/usr/local/share/python3/ipython3:1: RuntimeWarning: invalid value encountered in power
#!/usr/local/Cellar/python3/3.2.2/bin/python3.2
Out: array([ nan, 1.33483985, 2.37841423, nan])
In : c = r.astype(complex)
In : c ** (1/2.4)
Out:
array([ 0.50609696+1.88877958j, 1.33483985+0.j ,
2.37841423+0.j , 0.25881905+0.96592583j])
There's some discussion of this on scipy.org.```
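
Applied to the question's data, the complex cast could look like the sketch below. It assumes plain `np.array` objects instead of `np.matrix` (with `np.matrix`, `**` means matrix power, not an element-wise power) and uses the first two rows of the sample file; the variable names are illustrative.

```python
import numpy as np

# XYZ -> linear RGB matrix from the question.
A = np.array([[3.2410, -1.5374, -0.4986],
              [-0.9692, 1.8760, 0.0416],
              [0.0556, -0.2040, 1.0570]])

# First two rows of the sample file, one XYZ triple per row.
xyz = np.array([[0.2345, 0.9817, 0.7612],
                [0.5465, 0.7897, 0.3514]])

# Row-wise version of [R, G, B]' = A * [X, Y, Z]'.
rgb_lin = xyz @ A.T

# Cast to complex first so negative entries get a complex root instead of nan.
powered = rgb_lin.astype(complex) ** (1 / 2.4)

print(rgb_lin)
print(powered)
```

Entries that were non-negative come back with a zero imaginary part, so `powered.real` can be used directly for those; what to do with the genuinely complex entries depends on the rest of the color pipeline.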