Error in if elif loop - python

I have 3 columns of values. If, in any pair of columns, one value is twice the other, the row's index should be appended to a list.
0 1 2 3
A 1 2 3
G 2 3 4
K 1 1 2
T 1 1 1
The result should be [A,G,K]
I am getting a KeyError in the loop below:
percentage = pd.concat([percent1, percent2, percent3], axis=1, join='inner')
percentage = percentage.reset_index()
AA = []
for i in range(0, len(percentage)):
    if percentage[1][i] == 2*percentage[2][i]:
        AA.append(percentage['index'][i])
    elif percentage[2][i] == 2*percentage[3][i]:
        AA.append(percentage['index'][i])
    elif percentage[1][i] == 2*percentage[3][i]:
        AA.append(percentage['index'][i])

The cause of your actual error is that you're using columns 1, 2, and 3 on the right side of your comparisons, but you don't have 4 columns, only 3. So, this line:
elif percentage[1][i] == 2*percentage[3][i]:
… will raise a KeyError(3) on that percentage[3] part of the expression.
Unfortunately, I have no idea how to fix that, because I can't figure out what your code is intended to do, and you haven't explained it.
The most obvious guess would be that you wanted to use columns 0, 1, and 2 on the right. But that will just give you an empty list, not… a list of three rows, or whatever it is you wanted. And, given that you're using columns 0, 1, and 1 on the left rather than 0, 1, and 2, I'm not sure how obvious the guess was in the first place.

The error was due to the column names. The concat function keeps the column names of its inputs unchanged, and the KeyError came from those retained names not matching the keys used in the loop.
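A minimal sketch of that explanation, assuming percent1, percent2, and percent3 are pandas Series that all share one name. The name pct and the sample values (taken from the table in the question) are assumptions for illustration. Passing keys to concat is one way to get columns that really are named 1, 2, and 3; the check below also tests each pair in both directions, which is what the expected result [A, G, K] seems to require.

```python
from itertools import combinations

import pandas as pd

idx = ["A", "G", "K", "T"]
# Hypothetical inputs: three Series that all carry the same name.
percent1 = pd.Series([1, 2, 1, 1], index=idx, name="pct")
percent2 = pd.Series([2, 3, 1, 1], index=idx, name="pct")
percent3 = pd.Series([3, 4, 2, 1], index=idx, name="pct")

# concat keeps the Series names, so no column is called 1, 2, or 3.
plain = pd.concat([percent1, percent2, percent3], axis=1, join="inner")
print(list(plain.columns))

# keys= assigns explicit column labels, so percentage[1] etc. now exist.
percentage = pd.concat(
    [percent1, percent2, percent3], axis=1, join="inner", keys=[1, 2, 3]
)

# Keep a row label if any pair of its values has one value twice the other.
AA = [
    label
    for label, row in percentage.iterrows()
    if any(a == 2 * b or b == 2 * a for a, b in combinations(row, 2))
]
print(AA)
```

Iterating over rows is fine for a small table like this one; for large frames a vectorized comparison of the columns would likely be faster.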

Related

python select lines based on maximum value of a column

I'm not very familiar with Python, but there's something I need to do. I have an ASCII file (space-separated) of several columns. In the first column, some values are duplicates. From these duplicate values, I need to select the lines which have a larger value in the 3rd column, for example, and return an array back.
I'd like something like this:
#col1 col2 col3 col4 col5
1 1 2 3 4
1 2 1 5 3
2 2 5 2 1
would return lines 1 and 3.
Here's what I have so far: I defined an auxiliary function to detect the indexes of the duplicates (all second entries)
def list_duplicates(seq):
    seen = set()
    seen_add = seen.add
    return [idx for idx,item in enumerate(seq) if item in seen or seen_add(item)]
and then try to use it to read the list (that I loaded from a file with np.genfromtxt naming each column)
# dup_col is the column where the duplicates are,
# sel_col is the column where we select the larger value
def select_high(ndarray, dup_col, sel_col):
    result = []
    dup = list_duplicates(ndarray[dup_col])
    dupdup = [x-1 for x in dup]
    for i in range(len(ndarray[sel_col])):
        if i in dup:
            mid = []
            maxi = max(ndarray[sel_col][i], ndarray[sel_col][i-1])
            maxi_index = np.where(ndarray[sel_col] == maxi)[0][0]
            for name in ndarray.dtype.names:
                mid.append(ndarray[name][maxi_index])
            result.append(mid)
        else:
            mid = []
            if i not in dupdup:
                for name in ndarray.dtype.names:
                    mid.append(ndarray[name][i])
                result.append(mid)
    return np.asarray(result)
but what's happening is that whenever there are duplicates I have to remove the else part or it gives me an error, and whenever there are no duplicates I have to put it back.
Any help is appreciated, sorry for the long post and I hope I made myself clear
I think you are lost in the details (and so am I). Here is a version that does what you want, but is simpler:
m = [[1, 2, 1, 5, 3], [1, 1, 2, 3, 4], [2, 2, 5, 2, 1]]
s = sorted(m, key=lambda r:(r[0], -r[2]))
print(s)
seen = set()
print( [r for r in s if r[0] not in seen and not seen.add(r[0])])
The first line defines m as the list of rows you get from the file.
The second line sorts those rows on the value in the first column (r[0]), then on the value in the third column, but from the larger to the smaller value (-r[2]):
s=[[1, 1, 2, 3, 4], [1, 2, 1, 5, 3], [2, 2, 5, 2, 1]]
Now you need to skip the rows when you have seen the value in the first column at least once. We use a set seen to store the r[0] values we have already seen. If r[0] is not in seen, we should keep the row and put r[0] in seen, so that we discard any later row with the same r[0]. That's a little tricky:
if r[0] not in seen and not seen.add(r[0])
Note that not seen.add(r[0]) is always true, because seen.add returns None. Thus:
if r[0] is not in seen, we put r[0] in seen and keep the row
if r[0] is in seen, the condition is false and we discard the row.
You could express it like that too:
if not (r[0] in seen or seen.add(r[0]))
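The same selection can also be done without sorting. The following is only a sketch of an alternative, not the approach used in the answer above: the function name keep_largest and its parameters are hypothetical, and it assumes the rows are plain lists as in the example.

```python
def keep_largest(rows, dup_col=0, sel_col=2):
    """For each distinct value in dup_col, keep the row with the largest sel_col."""
    best = {}
    for row in rows:
        key = row[dup_col]
        # First row for this key, or a row with a larger selection value.
        if key not in best or row[sel_col] > best[key][sel_col]:
            best[key] = row
    # Dicts preserve insertion order, so keys appear in first-seen order.
    return list(best.values())


rows = [[1, 2, 1, 5, 3], [1, 1, 2, 3, 4], [2, 2, 5, 2, 1]]
selected = keep_largest(rows)
print(selected)
```

This makes a single pass over the rows and avoids the side-effect trick with seen.add, at the cost of a few more lines.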

get nth element of a list

I'm new to Python and have the following problem:
>>> choice = [1,0,1,1,]
>>> choice = [1,0,1,1]
>>> print(choice)
[1, 0, 1, 1]
>>> print(choice[2])
1
Why does it print 1 rather than 0?
Python uses zero-based indexing, which means that the first element in a list is referred to as element number 0, not 1.
It prints 1 because list indices start from zero and not from one. Thus:
choice[0] is 1
choice[1] is 0
choice[2] is 1
choice[3] is 1
Python, like many languages, starts its list indices at 0 rather than 1: an index is the offset from the start of the list. It seems weird at first, but it has practical advantages, even though the choice is largely a convention.
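A small sketch that prints each element next to its index, using the list from the question. The variable names second and third are just for illustration.

```python
choice = [1, 0, 1, 1]

# enumerate pairs each element with its zero-based index.
for index, value in enumerate(choice):
    print(index, value)

second = choice[1]  # the second element lives at index 1
third = choice[2]   # the third element lives at index 2
```

So to get the 0 the question expected, the index to use is 1, not 2.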

What is the value of xs[100] after this code?

Not sure if this belongs here but it was a question from my CS exam.
What is the value of xs[100] after this code?
xs = [-1, 0, 1]
for k in range(98):
    xs.append(xs[k])
It causes a compile-time error.
It causes a run-time error.
-1
0
1
So far I figured that xs.append(xs[k]) keeps appending -1, 0, 1 in that order to the list xs. How would I go about solving this?
After this loop the list contains 98 + 3 = 101 elements, so xs[100] is the 101st (last) element. The values repeat with period 3, and 101 % 3 = 2, so it is the second element of -1, 0, 1, which is 0.
Note that the append method of list changes the list in place, so by the time the loop reads xs[k] the list has already grown past index k. That is why it doesn't raise an IndexError.
This is not something you need to ask; you can just run it in Python and see:
pax$ cat qq.py
xs = [-1, 0, 1]
for k in range(98):
    xs.append(xs[k])
print(xs[100])
pax$ python qq.py
0
If you can't run it for some reason, you just have to realise that you're appending those elements in order 98 times. That's -1, 0, 1 32 times (for a total of 96), then -1 and 0 as the final two.
And, since you have 3 + 98 = 101 elements, index 100 (the 101st) is the final element, which is 0.
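The reasoning above can be checked with a short sketch. The names pattern and periodic are introduced here only for illustration; the check confirms that every element equals the starting pattern at position i % 3.

```python
xs = [-1, 0, 1]
for k in range(98):
    # At iteration k the list already has 3 + k elements, so xs[k] exists.
    xs.append(xs[k])

pattern = [-1, 0, 1]
# True if the list is the starting pattern repeated with period 3.
periodic = all(xs[i] == pattern[i % 3] for i in range(len(xs)))

print(len(xs), xs[100], periodic)
```

Since 100 % 3 is 1, xs[100] equals pattern[1], which is 0.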

find permutations of items in a python list, with an added complexity

Please bear with me while I struggle to explain this; my math is rusty and I just started computer programming, sorry!
Say I have a list of 3 items. I want to find all possible arrangements of the items in this list where each arrangement consists of 3 items.
Next, still using my original list, I want to find all the possible arrangements of the items of the list, except I only want the arrangements to consist of two items.
Finally, I want to do the same thing again, except arrangements only consist of one item.
So I expect 3! + 3!/1! + 3!/2!, or 15 total arrangements.
Just to be really explicit about what I want, if my list were [1, 2, 3], then the code should produce:
1, 2, 3
1, 3, 2
2, 1, 3
2, 3, 1
3, 1, 2
3, 2, 1
1, 2
1, 3
2, 1
2, 3
3, 1
3, 2
1
2
3
The code I have written below can do what I have written above, but only for lists of length 3. I could modify the code to handle lists of greater length by adding extra 'for' loops and 'elif' statements, but I feel like there has to be a way to generalize the pattern. What should I do so that I can get permutations of the kind described above for lists of any length?
I think my exhaustive enumeration method might be making this more complicated than it needs to be... will try to think about other methods and update if solution found.
def helperFunction(itemsList):
    fullPermutationsOutputList = []
    def fullPermutations(itemsList, iterations):
        for item1 in itemsList:
            if iterations == 2:
                if len([item1]) == len(set([item1])):
                    fullPermutationsOutputList.append((item1,))
            else:
                for item2 in itemsList:
                    if iterations == 1:
                        if len([item1, item2]) == len(set([item1, item2])):
                            fullPermutationsOutputList.append((item1, item2))
                    else:
                        for item3 in itemsList:
                            if iterations == 0:
                                if len([item1, item2, item3]) == len(set([item1, item2, item3])):
                                    fullPermutationsOutputList.append((item1, item2, item3))
        if iterations == 0:
            fullPermutations(itemsList, iterations + 1)
        elif iterations == 1:
            fullPermutations(itemsList, iterations + 1)
    fullPermutations(itemsList, 0)
    return fullPermutationsOutputList
Just use itertools.permutations. You can inspect its source if you want the exact algorithm.
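A minimal sketch of how that might look for the question's case. The helper name all_arrangements is hypothetical; itertools.permutations(iterable, r) yields the ordered arrangements of length r, so looping r from the full length down to 1 covers every size the question asks for.

```python
from itertools import permutations


def all_arrangements(items):
    """Collect arrangements of every length, longest first."""
    result = []
    for r in range(len(items), 0, -1):
        result.extend(permutations(items, r))
    return result


arrangements = all_arrangements([1, 2, 3])
for arrangement in arrangements:
    print(arrangement)
print(len(arrangements))
```

For a list of 3 distinct items this gives 6 + 6 + 3 = 15 tuples, matching the count in the question, and it works unchanged for longer lists.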
This will do what you want: https://stackoverflow.com/a/10784693/1419494
def perm(list_to_perm, perm_l, items, out):
    # perm_l is the partial arrangement built so far
    if len(perm_l) == items:
        out += [perm_l]
    else:
        for i in list_to_perm:
            if i not in perm_l:
                perm(list_to_perm, perm_l + [i], items, out)

a = [1, 2, 3]
for i in range(1, len(a) + 1):
    out = []
    perm(a, [], i, out)
    print(out)

Best way to keep track of results from a Python loop

I have a fairly big loop that needs to run 500 times, and I'm new to the programming language and to doing this type of simulation.
I need to record the result of each run, namely whether the list (table1) contains all 0's, all 1's or a mix of both.
I was wondering what method would be the fastest to find out what proportion of the 500 simulations resulted in a list that contained all 0's, all 1's or a mix, and whether append would slow it down too much.
for x in range(0, 500):
    times = 300
    gamma_val = 2
    table1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    total = 0.0
    while total < times:
        table = [0 for i in range (21)]
        for d1 in range(21):
            if table1[d1]:
                table[d1] = -(1/gamma_val)*math.log((random.random()))
            else:
                table[d1] = -(math.log(random.random()))
        #### Goes on to make new table1 with changes, either all 1's, all 0's or a mix of 0s and 1s
    files1 = 0
    files01 = 0
    files0 = 0
    if "1" "0" in table1 == True:
        files01 += 1
    elif 1 in table == True:
        files1 += 1
    elif 0 in table1 == true:
        files0 += 1
For choosing where to append, create 2 boolean vars (Has1s and Has0s) before the while loop, each initialized to False. Set Has1s to True whenever you get a 1 & Has0s when you get a 0 -- then you avoid the (up to) 3 searches at the end.
What exactly do you need at the end?
If you just want to know the proportion of all 1's, all 0's, or mixes, it seems more intuitive (to me at least) to just increment variable values, rather than generate lists. If you set up the code something like:
...
files1 = 0
files01 = 0
files0 = 0
if 1 in table1 and 0 in table1:
    files01 += 1
elif 1 in table1:
    files1 += 1
elif 0 in table1:
    files0 += 1
...
then you don't have to do a len(files01) at the end to know how many had a mix of 1's and 0's.
There is no speed lost to append(), compared to the speed lost to scanning over things multiple times. And there is very little time lost to memory operations compared to the computations. So don't worry about it. I might not keep the counts, if I can get them from the lengths of the lists you want to accumulate anyway. It is more readable to do everything once. Make the stuff, and then count.
I trust the standard containers when making decisions about the speed of algorithms. So I would cast the row to a set and compare it to set([0]), set([1]), and set([0, 1]). I assume 'in' will do a double scan of the row, whereas set() will make one pass.
BOTH = set([0, 1])
results = {'0': [], '1': [], '01': []}
# .... make the list ....
elements = set(table)
if elements == BOTH:
    results['01'].append(table)
elif 1 in elements:
    results['1'].append(table)
else:
    results['0'].append(table)
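Here is a self-contained sketch of the set-based idea, assuming only the proportions are needed at the end. The classify function, the sample rows, and the use of collections.Counter are assumptions for illustration; in the real simulation each row would be the final table1 of one run.

```python
from collections import Counter


def classify(row):
    """Return '01' for a mix, '1' for all ones, '0' for all zeros."""
    elements = set(row)  # one pass over the row
    if elements == {0, 1}:
        return "01"
    elif 1 in elements:
        return "1"
    else:
        return "0"


# Hypothetical stand-ins for the final table1 of four simulation runs.
rows = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
counts = Counter(classify(row) for row in rows)

# Proportion of runs in each category.
proportions = {kind: counts[kind] / len(rows) for kind in ("0", "1", "01")}
print(proportions)
```

Counting labels this way keeps the classification in one place and leaves the simulation loop free of bookkeeping variables.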
And I will try to keep my nitpicking to style, not outright errors:
Use the final else, don't exhaust all the conditions; it really is more readable. Exhausting them all separately makes readers wonder what case you imagine might be left. It breeds paranoia.
In general, actually comparing things to True or False is non-pythonic. Checking equality to the constant True is the least reliable way to get what you want out of a condition in Python. There are a lot of Python idioms that rely on the truth of a statement being represented by the existence of an object, or the non-emptiness of a list. So you will face programmers who return something other than True or False from helper functions to represent Boolean determinations. Get used to not being finicky about it.
Also, painful as it seems, in, when mixed with other comparison operators, chains as though it were <: 1 in table1 == True is evaluated as (1 in table1) and (table1 == True). That is so unintuitive for most readers that you should never do it.
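The chaining behaviour is easy to see in a short sketch. The variable names chained, expanded, and intended are introduced here for illustration only.

```python
table1 = [1, 0, 1]

# Comparison operators chain, and `in` is one of them, so this is
# evaluated as (1 in table1) and (table1 == True).
chained = 1 in table1 == True

# The same expression written out explicitly.
expanded = (1 in table1) and (table1 == True)

# What was almost certainly meant: a plain membership test.
intended = 1 in table1

print(chained, expanded, intended)
```

The list does contain 1, yet the chained form is False because a list never equals True. Dropping the comparison with True gives the intended answer.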

Resources