Python pandas, can I display value_counts() in reverse order?

If I call
print(data[5].value_counts().nlargest(5))
And I get the top five results:
A 8
C 6
D 5
B 3
E 1
What could I do to change the order of the results so it looks like this instead?
8 A
6 C
5 D
3 B
1 E
Thanks!

Swap the index and values with the Series constructor:
s = data[5].value_counts().nlargest(5)
s = pd.Series(s.index, index=s.values)
print(s)
8 A
6 C
5 D
3 B
1 E
dtype: object
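Here is the same swap as a self-contained sketch; the sample data below is reconstructed so that the counts match the question (the actual column contents are an assumption):

```python
import pandas as pd

# Sample data reconstructed to match the counts in the question
data = pd.DataFrame({5: list("AAAAAAAACCCCCCDDDDDBBBE")})
s = data[5].value_counts().nlargest(5)

# Swap: the counts become the index, the labels become the values
swapped = pd.Series(s.index, index=s.values)
print(swapped)
```

The order of the original `value_counts()` result is preserved; only which side is the index changes.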

Related

Need to find missing column value by comparing from 2 dataframes

DF1: A B
1 a
2 b
3 c
4 d
5 e
DF2: A B
1 a
2 b
3 c
Expect Result: A B
4 d
5 e
I tried join and map but did not get the desired result.
Please guide me to a solution.
This also works:
df1.loc[~df1['A'].isin(df2['A'])]
Out[34]:
A B
3 4 d
4 5 e
concat and drop_duplicates (keep=False drops every duplicated row, so this assumes all of df2's rows also appear in df1):
pd.concat([df1,df2]).drop_duplicates(keep=False)
Out[789]:
A B
3 4 d
4 5 e
Try this one:
df1[~df1.isin(df2)].dropna()
Outputs:
A B
3 4 d
4 5 e
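Another option worth mentioning is a left merge with `indicator=True`, which works even when the two frames only partially overlap; a minimal sketch with the sample frames above:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': list('abcde')})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': list('abc')})

# indicator=True adds a '_merge' column recording each row's origin;
# 'left_only' rows are those of df1 that have no match in df2
merged = df1.merge(df2, on=['A', 'B'], how='left', indicator=True)
result = merged.loc[merged['_merge'] == 'left_only', ['A', 'B']]
print(result)
```

Unlike the `isin` approach, this compares whole rows rather than relying on index alignment.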

python looping and creating new dataframe for each value of a column

I want to create a new dataframe for each unique value of station.
I tried the code below, but it only keeps the last station's data in the dataframe tai_new.i.
tai['station'].unique() has 500 values.
for i in tai['station'].unique():
    tai_new.i = tai[tai['station'] == i]
Another approach is to create a separate list of stations:
tai_stations = tai['station'].unique()
and then loop over it, but I do not want to type 500 if conditions.
You can create a dict of DataFrames by converting the groupby object to tuples and then to a dict:
dfs = dict(tuple(tai.groupby('station')))
Sample:
tai = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'station':list('aabbcc')})
print(tai)
A B C D E station
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 b
3 d 5 4 7 9 b
4 e 5 2 1 2 c
5 f 4 3 0 4 c
dfs = dict(tuple(tai.groupby('station')))
#select each DataFrame by key - name of station
print(dfs['a'])
A B C D E station
0 a 4 7 1 5 a
1 b 5 8 3 3 a
print(type(dfs['a']))
<class 'pandas.core.frame.DataFrame'>
Please use this:
for i in tai['station'].unique():
    tai_new[i] = tai[tai['station'] == i]
assuming tai_new is a dict.
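Putting the dict-of-DataFrames pattern together as a runnable sketch (reusing the sample frame from the accepted answer):

```python
import pandas as pd

tai = pd.DataFrame({'A': list('abcdef'),
                    'B': [4, 5, 4, 5, 5, 4],
                    'station': list('aabbcc')})

# groupby yields (key, sub-DataFrame) pairs; dict() keys them by station
dfs = dict(tuple(tai.groupby('station')))

# Iterate over all per-station frames instead of 500 named variables
for station, group in dfs.items():
    print(station, len(group))
```

This scales to any number of stations without generating variable names dynamically.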

how to do such calculation task in pandas?

I have a dataframe like this
disc val
a -1.140502 1
b -0.916104 2
c -0.828460 3
d -2.828460 4
e -2.238450 5
I would like to get a result dataframe like this:
disc val final
a -1.140502 1 1
b -0.916104 2 2
c -0.828460 3 6
d -2.828460 4 24
e -2.238450 5 120
try this:
In [17]: df.val.cumprod()
Out[17]:
a 1
b 2
c 6
d 24
e 120
Name: val, dtype: int64
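To attach that running product as the final column from the question, a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'disc': [-1.140502, -0.916104, -0.828460, -2.828460, -2.238450],
                   'val': [1, 2, 3, 4, 5]},
                  index=list('abcde'))

# cumprod: each entry is the running product of 'val' down the column
df['final'] = df['val'].cumprod()
print(df)
```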

Plotting pandas time and category

I really can't figure this out. Here is my table, where grade can be A, B, or C:
doc_id, grade, timestamp
1, A, 27/01/15
2, A, 27/01/15
3, B, 27/01/15
...
My aim is to show a graph with three lines, showing how many A, B and C I got through time.
I can only think of this:
docs[docs['grade']== 'A'].groupby('time').count()
docs[docs['grade']== 'B'].groupby('time').count()
docs[docs['grade']== 'C'].groupby('time').count()
and combine them somehow, but it is already clear I am on the wrong track.
Any hint?
try this:
df2 = df.groupby(['timestamp', 'grade']).grade.size().unstack().cumsum().ffill().fillna(0)
It basically pivots by date and grade, rolling forward the cumulative sum.
>>> df2
grade A B C
timestamp
4/1/15 0 1 0
4/11/15 4 1 2
4/3/15 4 4 2
4/4/15 4 5 3
4/5/15 4 6 3
4/6/15 7 6 6
4/8/15 9 6 8
4/9/15 13 7 12
If you don't want a cumulative sum, you can just use:
df2 = df.groupby(['timestamp', 'grade']).grade.size().unstack().fillna(0)
Let the input_data be
grade timestamp
doc_id
1 A 27/01/15
2 A 27/01/15
3 B 27/01/15
4 C 27/01/15
5 A 27/01/16
6 A 27/01/16
7 A 27/01/16
8 B 27/01/16
9 B 27/01/16
10 C 27/01/16
11 A 27/01/16
12 B 27/01/16
13 C 27/01/16
14 C 27/01/16
So to show a graph with three lines, showing how many A, B and C you got through time, you can use
result = input_data.groupby(['timestamp']).apply(lambda x: x.grade.value_counts())
The output would be something like this
A B C
timestamp
27/01/15 2 1 1
27/01/16 4 3 3
You can plot the data using result.plot().
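The same per-timestamp count table can also be built with pd.crosstab, which avoids the apply; a sketch using the sample input above (the inline literals reproduce that table):

```python
import pandas as pd

input_data = pd.DataFrame({
    'grade': list('AABC' + 'AAABBCABCC'),
    'timestamp': ['27/01/15'] * 4 + ['27/01/16'] * 10,
})

# crosstab tabulates grade counts per timestamp; each grade becomes a
# column, so plotting the result would draw one line per grade
result = pd.crosstab(input_data['timestamp'], input_data['grade'])
print(result)
```

As before, `result.plot()` then draws the three lines (requires matplotlib).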

Pandas: Nested iteration of DataFrame

I have a dataframe which contains two columns: 'rep_id' and 'values'. The second column is a list of numbers.
I would like to compute all possible pairs of 'rep_id' and then find the values that are common to the pairs.
The final Dataframe would look like: 'rep_id1', 'rep_id2', 'values' where 'values' represents a list of common elements between the lists of 'rep_id1' and 'rep_id2'.
One way to do this is to create a nested loop and go through pairs of rep_ids but I can't seem to find a way to do it.
http://en.wikipedia.org/wiki/Cartesian_product
import pandas as pd
df = pd.DataFrame(list(zip(['A','B','C','D','E','A','B','C','A'],
                           [1,2,3,4,5,2,3,5,5])), columns=['rep_id','val'])
df = df.sort_values('val')
cartesian_product = pd.merge(df, df,on='val')
cartesian_product = cartesian_product[cartesian_product['rep_id_x'] != cartesian_product['rep_id_y']]
cartesian_product[['val','rep_id_x','rep_id_y']]
df:
rep_id val
0 A 1
1 B 2
5 A 2
2 C 3
6 B 3
3 D 4
4 E 5
7 C 5
8 A 5
cartesian_product:
val rep_id_x rep_id_y
2 2 B A
3 2 A B
6 3 C B
7 3 B C
11 5 E C
12 5 E A
13 5 C E
15 5 C A
16 5 A E
17 5 A C
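Building on that merge, the shared values can be collected into one list per pair, which is the final shape the question asks for; a sketch (using `<` to keep each unordered pair once):

```python
import pandas as pd

df = pd.DataFrame({'rep_id': list('ABCDEABCA'),
                   'val': [1, 2, 3, 4, 5, 2, 3, 5, 5]})

# Self-merge on 'val' pairs up rep_ids that share a value; keeping
# rep_id_x < rep_id_y drops self-pairs and mirror-image duplicates
pairs = pd.merge(df, df, on='val')
pairs = pairs[pairs['rep_id_x'] < pairs['rep_id_y']]

# Collect the shared values into one list per pair
common = pairs.groupby(['rep_id_x', 'rep_id_y'])['val'].apply(list)
print(common)
```

The result is a Series indexed by (rep_id1, rep_id2) whose values are the lists of common elements.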
