How to sum a list of dicts - python

What is the most Pythonic way to take a list of dicts and sum up all the values for matching keys from every row in the list?
I did this but I suspect a comprehension is more Pythonic:
from collections import defaultdict
demandresult = defaultdict(int) # new blank dict to store results
for d in demandlist:
    for k,v in d.items():
        demandresult[k] = demandresult[k] + v
In "Python - sum values in dictionary", the question involved the same key every time, but in my case the key in each row might be a new key never encountered before.

I think that your method is quite pythonic. Comprehensions are nice but they shouldn't really be overdone, and they can lead to really messy one-liners, like the one below :).
If you insist on a dict comp:
demand_list = [{u'2018-04-29': 1, u'2018-04-30': 1, u'2018-05-01': 1},
{u'2018-04-21': 1},
{u'2018-04-18': 1, u'2018-04-19': 1, u'2018-04-17' : 1}]
d = {key:sum(i[key] for i in demand_list if key in i)
for key in set(a for l in demand_list for a in l.keys())}
print(d)
# {'2018-04-21': 1, '2018-04-17': 1, '2018-04-29': 1, '2018-04-30': 1, '2018-04-19': 1, '2018-04-18': 1, '2018-05-01': 1}  (key order may vary)

Here is another one-liner (ab-)using collections.ChainMap to get the combined keys:
>>> from collections import ChainMap
>>> {k: sum(d.get(k, 0) for d in demand_list) for k in ChainMap(*demand_list)}
{'2018-04-17': 1, '2018-04-21': 1, '2018-05-01': 1, '2018-04-30': 1, '2018-04-19': 1, '2018-04-29': 1, '2018-04-18': 1}
This is easily the slowest of the methods proposed here: for every distinct key it calls get on every dict in the list.
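A variant not taken from the answers here: collections.Counter.update adds incoming values to the existing counts (unlike dict.update, which overwrites them), so it folds a list of dicts in a single pass. A minimal sketch, assuming all values are numbers; one repeated key has been added to the sample data purely for illustration:

```python
from collections import Counter

demand_list = [
    {'2018-04-29': 1, '2018-04-30': 1, '2018-05-01': 1},
    {'2018-04-21': 1},
    {'2018-04-18': 1, '2018-04-19': 1, '2018-04-17': 1, '2018-04-29': 2},
]

totals = Counter()
for row in demand_list:
    # Counter.update adds values for keys already present instead of replacing them
    totals.update(row)

print(dict(totals))
```

Note that Counter.update keeps zero and negative results; only the binary + operator drops them.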

The only thing that seemed unclear in your code was the double for loop. It may be clearer to collapse demandlist into a flat iterable; the loop then presents the logic as simply as possible. Consider:
demandlist = [{
u'2018-04-29': 1,
u'2018-04-30': 1,
u'2018-05-01': 1
}, {
u'2018-04-21': 1
}, {
u'2018-04-18': 1,
u'2018-04-19': 1,
u'2018-04-17': 1
}]
import itertools as it
from collections import defaultdict
demandresult = defaultdict(int)
for k, v in it.chain.from_iterable(map(lambda d: d.items(), demandlist)):
    demandresult[k] = demandresult[k] + v
(With this, print(demandresult) prints defaultdict(<class 'int'>, {'2018-04-29': 1, '2018-04-30': 1, '2018-05-01': 1, '2018-04-21': 1, '2018-04-18': 1, '2018-04-19': 1, '2018-04-17': 1}).)
Imagining myself reading this for the first time (or a few months later), I can see myself thinking, "Ok, I'm collapsing demandlist into a key-val iterable, I don't particularly care how, and then summing values of matching keys."
It's unfortunate that I need that map there to ensure the final iterable has key-val pairs… it.chain.from_iterable(demandlist) is a key-only iterable, so I need to call items on each dict.
Note that unlike many of the answers proposed, this implementation (like yours!) scans the data only once, which is a performance win (and I try to pick up as many easy performance wins as I can).
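To make the keys-versus-pairs point concrete, here is a small check on a made-up toy list; it also shows map(dict.items, ...) as a lambda-free spelling of the same map:

```python
import itertools as it

demandlist = [{'a': 1, 'b': 2}, {'b': 3}]

# Chaining the dicts directly iterates over their keys only
keys_only = list(it.chain.from_iterable(demandlist))

# Chaining each dict's items() view yields (key, value) pairs;
# map(dict.items, ...) calls dict.items(d) for each d, like the lambda does
pairs = list(it.chain.from_iterable(map(dict.items, demandlist)))

print(keys_only)  # ['a', 'b', 'b']
print(pairs)      # [('a', 1), ('b', 2), ('b', 3)]
```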

I suppose you want to return a list of summed values of each dictionary.
list_of_dict = [
{'a':1, 'b':2, 'c':3},
{'d':4, 'e':5, 'f':6}
]
sum_of_each_row = [sum(d.values()) for d in list_of_dict] # [6, 15]
If you want the total sum, just wrap "sum_of_each_row" in sum().
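A short sketch of the total, using the same sample data, plus a single-pass alternative via itertools.chain that skips the intermediate list:

```python
from itertools import chain

list_of_dict = [
    {'a': 1, 'b': 2, 'c': 3},
    {'d': 4, 'e': 5, 'f': 6},
]

sum_of_each_row = [sum(d.values()) for d in list_of_dict]  # [6, 15]
total = sum(sum_of_each_row)                                # 21

# Same total, iterating over all values once without building a list of row sums
total_flat = sum(chain.from_iterable(d.values() for d in list_of_dict))
```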
EDIT:
The main problem is that you don't have a default value for each of the keys, so you can make use of the method dict.setdefault() to set the default value when there's a new key.
list_of_dict = [
{'a':1, 'b':1},
{'b':1, 'c':1},
{'a':2}
]
d = {}
for row in list_of_dict:
    for k, v in row.items():
        # setdefault inserts 0 for a new key; the row's value is then added on top
        d[k] = d.setdefault(k, 0) + v
# {'a': 3, 'b': 2, 'c': 1}

Related

list of dicts to/from dict of lists

I am looking to change back and forth between a dictionary of lists (all of the same length):
DL={'a':[0,1],'b':[2,3]}
and a list of dictionaries:
LD=[{'a':0,'b':2},{'a':1,'b':3}]
I am looking for the cleanest way to switch between the two forms.
Perhaps consider using numpy:
import numpy as np
arr=np.array([(0,2),(1,3)],dtype=[('a',int),('b',int)])
print(arr)
# [(0, 2) (1, 3)]
Here we access columns indexed by names, e.g. 'a', or 'b' (sort of like DL):
print(arr['a'])
# [0 1]
Here we access rows by integer index (sort of like LD):
print(arr[0])
# (0, 2)
Each value in the row can be accessed by column name (sort of like LD):
print(arr[0]['b'])
# 2
For those of you that enjoy clever/hacky one-liners.
Here is DL to LD:
v = [dict(zip(DL,t)) for t in zip(*DL.values())]
print(v)
and LD to DL:
v = dict(zip(LD[0],zip(*[d.values() for d in LD])))
print(v)
LD to DL is a little hackier since it assumes that every dict has the same keys, in the same order, and it produces tuples rather than lists. Also, please note that I do not condone the use of such code in any kind of real system.
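A slightly less hacky sketch of both directions, assuming every dict in LD has the same keys and every list in DL has the same length. It returns lists rather than tuples and looks keys up by name, so key order inside each dict does not matter. The names dl_to_ld and ld_to_dl are just illustrative:

```python
def dl_to_ld(dl):
    # zip(*dl.values()) yields one tuple per "row": (dl['a'][i], dl['b'][i], ...)
    return [dict(zip(dl, row)) for row in zip(*dl.values())]

def ld_to_dl(ld):
    if not ld:
        return {}
    # Take the keys from the first dict and fetch each one by name
    return {k: [d[k] for d in ld] for k in ld[0]}

DL = {'a': [0, 1], 'b': [2, 3]}
LD = dl_to_ld(DL)
print(LD)            # [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
print(ld_to_dl(LD))  # {'a': [0, 1], 'b': [2, 3]}
```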
To go from the list of dictionaries, it is straightforward:
You can use this form:
DL={'a':[0,1],'b':[2,3], 'c':[4,5]}
LD=[{'a':0,'b':2, 'c':4},{'a':1,'b':3, 'c':5}]
nd={}
for d in LD:
    for k,v in d.items():
        try:
            nd[k].append(v)
        except KeyError:
            nd[k]=[v]
print(nd)
#{'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Or use defaultdict:
import collections as cl
nd=cl.defaultdict(list)
for d in LD:
    for key,val in d.items():
        nd[key].append(val)
print(dict(nd))
#{'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Going the other way is problematic. You need some information about the order in which the keys should be inserted into the list. Recall that, before Python 3.7, the order of keys in a dict was not guaranteed to match the original insertion order.
For giggles, assume the insertion order is based on sorted keys. You can then do it this way:
nl=[]
nl_index=[]
for k in sorted(DL.keys()):
    nl.append({k:[]})
    nl_index.append(k)
for key,l in DL.items():
    for item in l:
        nl[nl_index.index(key)][key].append(item)
print(nl)
#[{'a': [0, 1]}, {'b': [2, 3]}, {'c': [4, 5]}]
If your question was based on curiosity, there is your answer. If you have a real-world problem, let me suggest you rethink your data structures. Neither of these seems to be a very scalable solution.
If you're allowed to use outside packages, Pandas works great for this:
import pandas as pd
pd.DataFrame(DL).to_dict('records')
Which outputs:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
Here are the one-line solutions (spread out over multiple lines for readability) that I came up with:
If dl is your original dict of lists:
dl = {"a":[0,1],"b":[2,3]}
Then here's how to convert it to a list of dicts:
ld = [{key: value[index] for key, value in dl.items() if index < len(value)}
      for index in range(max(map(len, dl.values())))]
If you assume that all your lists are the same length, you can simplify this, and gain some performance, with:
ld = [{key: value[index] for key, value in dl.items()}
      for index in range(len(next(iter(dl.values()))))]
and here's how to convert that back into a dict of lists:
import functools
dl2 = {key:[item[key] for item in ld if key in item]
       for key in list(functools.reduce(
           lambda x, y: x.union(y),
           (set(dicts.keys()) for dicts in ld)
       ))
}
If you're using Python 2 instead of Python 3, you can just use reduce instead of functools.reduce there.
You can simplify this if you assume that all the dicts in your list will have the same keys:
dl2 = {key:[item[key] for item in ld] for key in ld[0].keys() }
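The functools.reduce call above can also be replaced by set().union(*ld), since set.union accepts any number of iterables and iterating a dict yields its keys. A sketch on made-up dicts with differing keys, skipping keys a given dict lacks (sorted() is only there to make the output order predictable):

```python
ld = [{'a': 0, 'b': 2}, {'a': 1, 'c': 5}]

# Union of the key sets of every dict in the list
all_keys = set().union(*ld)

# Keys missing from a dict are skipped, so lists can end up with different lengths
dl2 = {key: [item[key] for item in ld if key in item] for key in sorted(all_keys)}
print(dl2)  # {'a': [0, 1], 'b': [2], 'c': [5]}
```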
Here is my small script:
a = {'a': [0, 1], 'b': [2, 3]}
elem = {}
result = []
for i in range(len(a['a'])): # (1)
    for key, value in a.items():
        elem[key] = value[i]
    result.append(elem)
    elem = {}
print(result)
I'm not sure it's the most elegant way.
(1) This assumes that all the lists have the same length.
The cleanest way I can think of on a summer Friday. As a bonus, it supports lists of different lengths (but in that case, DLtoLD(LDtoDL(l)) is no longer the identity).
From list to dict
Actually less clean than @dwerk's defaultdict version.
def LDtoDL (l) :
    result = {}
    for d in l :
        for k, v in d.items() :
            result[k] = result.get(k,[]) + [v] #inefficient
    return result
From dict to list
def DLtoLD (d) :
    if not d :
        return []
    #reserve as many *distinct* dicts as the longest sequence
    result = [{} for i in range(max (map (len, d.values())))]
    #fill each dict, one key at a time
    for k, seq in d.items() :
        for oneDict, oneValue in zip(result, seq) :
            oneDict[k] = oneValue
    return result
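To see why the round trip stops being the identity when the dicts have different keys, here is a self-contained run of the two functions on made-up data. LDtoDL is restated with setdefault(...).append in place of the list concatenation marked as inefficient; that change is this sketch's, not the original answer's:

```python
def LDtoDL(l):
    result = {}
    for d in l:
        for k, v in d.items():
            # append in place instead of building a new list on every step
            result.setdefault(k, []).append(v)
    return result

def DLtoLD(d):
    if not d:
        return []
    # one distinct dict per position of the longest list
    result = [{} for _ in range(max(map(len, d.values())))]
    for k, seq in d.items():
        for one_dict, one_value in zip(result, seq):
            one_dict[k] = one_value
    return result

ld = [{'a': 0}, {'b': 1}]
dl = LDtoDL(ld)    # {'a': [0], 'b': [1]}
back = DLtoLD(dl)  # [{'a': 0, 'b': 1}]: both values land in the first dict
print(dl, back)
```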
If you don't mind a generator, you can use something like
def f(dl):
    l = [(k, iter(v)) for k, v in dl.items()]
    while True:
        try:
            d = {k: next(i) for k, i in l}
        except StopIteration:  # one of the lists is exhausted
            return
        if not d:  # dl itself was empty
            return
        yield d
It's not as "clean" as it could be for Technical Reasons: My original implementation did yield dict(...), but this ends up being the empty dictionary because (in Python 2.5) a for b in c does not distinguish between a StopIteration exception when iterating over c and a StopIteration exception when evaluating a.
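An alternative sketch that lets zip detect exhaustion, so no StopIteration handling is needed at all. Like zip, it stops at the shortest list, which is an assumption about the desired behaviour; the name iter_rows is illustrative:

```python
def iter_rows(dl):
    keys = list(dl)
    # zip(*lists) yields one tuple per position and stops at the shortest list
    for values in zip(*dl.values()):
        yield dict(zip(keys, values))

rows = list(iter_rows({'a': [0, 1], 'b': [2, 3]}))
print(rows)  # [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
```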
On the other hand, I can't work out what you're actually trying to do; it might be more sensible to design a data structure that meets your requirements instead of trying to shoehorn it into the existing data structures. (For example, a list of dicts is a poor way to represent the result of a database query.)

Is there any simple way to swap python dictionary key values where values are list

My dictionary looks like the following:
d={1:[1,2,3,4],2:[2,3,4],5:[1,3,6,7]}
I want to generate a dictionary from it like the following:
a={1:[1,5],2:[1,2],3:[1,2,5],4:[1,2],6:[5],7:[5]}
I tried reversed:
dict(map(reversed, d.items()))
It doesn't iterate over the items in each list to create keys; instead it raises TypeError: unhashable type: 'list'.
Is there an inline method available for achieving this?
This will work:
def revdict(d):
    r = {}
    for k in d:
        for v in d[k]:
            if v not in r:
                r[v] = [k]
            else:
                r[v].append(k)
    return r
Then you can do:
d={1:[1,2,3,4],2:[2,3,4],5:[1,3,6,7]}
a = revdict(d)
print(a)
If you want to avoid having to check for new keys, you can use a defaultdict, then always append.
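Besides defaultdict, dict.setdefault gives the same "always append" shape without an import, since it inserts the empty list only when the key is new. A sketch; the name revdict_setdefault is illustrative:

```python
def revdict_setdefault(d):
    r = {}
    for k, values in d.items():
        for v in values:
            # setdefault returns the existing list, or inserts [] and returns it
            r.setdefault(v, []).append(k)
    return r

d = {1: [1, 2, 3, 4], 2: [2, 3, 4], 5: [1, 3, 6, 7]}
print(revdict_setdefault(d))
# {1: [1, 5], 2: [1, 2], 3: [1, 2, 5], 4: [1, 2], 6: [5], 7: [5]}
```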
Your approach does not work, since here you aim to construct a dictionary:
{[1,2,3,4]: 1, [2,3,4]: 2, [1,3,6,7]: 5}
but since lists are unhashable, these cannot be used as keys (and furthermore it is not what you intend to construct anyway).
You probably better use a defaultdict for this:
from collections import defaultdict
result = defaultdict(list)
for k, vs in d.items():
    for v in vs:
        result[v].append(k)
After this operation, result is a defaultdict (a subclass of the vanilla dict) that maps each item in the lists of values to the keys whose lists contained it. Like:
>>> result
defaultdict(<class 'list'>, {1: [1, 5], 2: [1, 2], 3: [1, 2, 5], 4: [1, 2], 6: [5], 7: [5]})
You can optionally use:
result = dict(result)
to create a new dictionary with these values (and thus drop the defaultdict).
Mind that:
before Python 3.7, dictionaries were not guaranteed to preserve insertion order (it was definitely not a hard assumption you could make), so the order of the elements in the lists might differ; and
the items in the lists in your dictionary d should be hashable.
This isn't really suitable for a one-liner; you are oversimplifying what you need to do by calling it a simple reversal.
A real reversal would simply map values to keys instead of keys to values, which you can do (with the slight but necessary change of lists to tuples):
>>> print(dict((tuple(d[k]), k) for k in d))
{(1, 2, 3, 4): 1, (2, 3, 4): 2, (1, 3, 6, 7): 5}
What you want is far more complicated, and is commonly called a transposition of the dict.
from operator import itemgetter as ig
from itertools import groupby
transposed_dict = dict((k, list(map(ig(1), v)))
                       for k, v in groupby(
                           sorted((nk, k) for k in d for nk in d[k]),
                           key=ig(0)))
There's nothing particularly simple about this, although conceptually it isn't too bad:
1. (nk, k) for k in d for nk in d[k] creates an expanded association list from the dictionary, with the keys and values inverted; sorting it puts tuples with the same first element next to each other.
2. groupby collects consecutive tuples with a common first element.
3. (k, map(ig(1), v)) collects the tuples with a common first element into a single tuple: (1,1), (1,2) => (1, [1,2]).
4. The tuples from step 3 are used to build the new dictionary.
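The intermediate values of these steps can be inspected directly. A sketch on the question's d, written as a dict comprehension; list() is needed on Python 3 because map() returns a lazy iterator there:

```python
from itertools import groupby
from operator import itemgetter as ig

d = {1: [1, 2, 3, 4], 2: [2, 3, 4], 5: [1, 3, 6, 7]}

# Step 1: inverted association list, sorted so equal new keys are adjacent
pairs = sorted((nk, k) for k in d for nk in d[k])
print(pairs[:4])  # [(1, 1), (1, 5), (2, 1), (2, 2)]

# Steps 2-4: group consecutive pairs by new key and keep the old keys
transposed = {k: list(map(ig(1), grp)) for k, grp in groupby(pairs, key=ig(0))}
print(transposed)
# {1: [1, 5], 2: [1, 2], 3: [1, 2, 5], 4: [1, 2], 6: [5], 7: [5]}
```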
You are far better off, however, using a simple 3-line for loop as shown by Willem Van Onsem; the essence of that loop is in the generator expression used by sorted; everything else is just dealing with the attempt to avoid mutable variables. Not everything should (or can) be reduced to a simple one-liner.

Combine 2 dictionaries by key-value in python

I have not found a solution to my question, yet I hope it's trivial.
I have two dictionaries:
dictA:
maps the position of each word in a text (key) to the word itself (value)
e.g.
{0:'Roses',1:'are',2:'red'...12:'blue'}
dictB:
maps each of those words to its count in the text
e.g.
{'Roses':2,'are':4,'blue':1}
I want to replace each value in dictA with that word's count from dictB, using 0 for words that are missing from dictB.
So output should look like:
{0:2,1:4,2:0...12:1}
Is there a way to do this, preferably without writing my own functions?
Use a dictionary comprehension and apply the get method of dict B to return 0 for items that are not found in B:
>>> A = {0:'Roses',1:'are',2:'red', 12:'blue'}
>>> B = {'Roses':2,'are':4,'blue':1}
>>> {k: B.get(v, 0) for k, v in A.items()}
{0: 2, 1: 4, 2: 0, 12: 1}

Python: how to slice a dictionary based on the values of its keys?

Say I have a dictionary built like this:
d={0:1, 1:2, 2:3, 10:4, 11:5, 12:6, 100:7, 101:8, 102:9, 200:10, 201:11, 202:12}
and I want to create a subdictionary d1 by slicing d in such a way that d1 contains the following keys: 0, 1, 2, 100, 101, 102. The final output should be:
d1={0:1, 1:2, 2:3, 100:7, 101:8, 102:9}
Is there an efficient Pythonic way of doing this, given that my real dictionary contains over 2,000,000 items?
I think this question applies to all cases where keys are integers, when the slicing needs to follow certain inequality rules, and when the final result needs to be a bunch of slices put together in the same dictionary.
You could use a dictionary comprehension:
d = {0:1, 1:2, 2:3, 10:4, 11:5, 12:6, 100:7, 101:8, 102:9, 200:10, 201:11, 202:12}
keys = (0, 1, 2, 100, 101, 102)
d1 = {k: d[k] for k in keys}
In Python 2.7 you can also compute the keys with a filter (in Python 3.x, replace it.ifilter(...) with filter(...)):
import itertools as it
d = {0:1, 1:2, 2:3, 10:4, 11:5, 12:6, 100:7, 101:8, 102:9, 200:10, 201:11, 202:12}
d1 = {k: d[k] for k in it.ifilter(lambda x: x <= 2 or 100 <= x <= 102, d.keys())}
One succinct way of creating the sub-dictionary is to use operator.itemgetter. Given several keys, it returns a new callable that looks up each of those keys in its argument and returns the results as a tuple.
from operator import itemgetter as ig
k = [0, 1, 2, 100, 101, 102]
# ig(0,1,2,100,101,102) == lambda d : (d[0], d[1], d[2], d[100], d[101], d[102])
d1 = dict(zip(k, ig(*k)(d)))
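For a dictionary with millions of items, a sketch that avoids scanning every key: if the wanted keys can be described as integer ranges, generate the candidate keys from those ranges and keep only the ones present. The ranges below are chosen to reproduce d1 from the question; this assumes the ranges are small relative to the dictionary:

```python
d = {0: 1, 1: 2, 2: 3, 10: 4, 11: 5, 12: 6,
     100: 7, 101: 8, 102: 9, 200: 10, 201: 11, 202: 12}

# Each range is half-open, like a slice: range(0, 3) covers 0, 1, 2
wanted = (range(0, 3), range(100, 103))

# Cost grows with the size of the ranges, not the size of d;
# "k in d" skips keys inside a range that d does not contain
d1 = {k: d[k] for r in wanted for k in r if k in d}
print(d1)  # {0: 1, 1: 2, 2: 3, 100: 7, 101: 8, 102: 9}
```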

Python Collections Counter for a List of Dictionaries

I have a dynamically growing list of dictionaries, and I would like to add together the values of matching keys. Here's an example:
{"something" : [{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"} ... ]}
I'd like to be able to add together the dictionaries inside the list. So, in this case for the key "something", the result would be:
{"one": 300, "three": 400, "two": 800}
or something to that effect. I'm familiar with Python's collections.Counter, but since the "something" list contains dicts, it will not work (unless I'm missing something). The dict is also being dynamically created, so I can't build the list without the dicts. E.g.:
Counter({'b':3, 'c':4, 'd':5, 'b':2})
Would normally work, but as soon as I try to add an element, the previous value will be overwritten. I've noticed other questions such as these:
Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
Python count of items in a dictionary of lists
But again, the objects within the list are dicts.
I think this does what you want, but I'm not sure because I don't know what "The dict is also being dynamically created, so I can't build the list without the dicts" means. Still:
from collections import Counter
input = {
    "something" : [{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"}],
    "foo" : [{"a" : 100, "b" : 200}, {"a" : 300, "b": 400}],
}
def counterize(x):
    return Counter({k : int(v) for k, v in x.items()})
counts = {
    k : sum((counterize(x) for x in v), Counter())
    for k, v in input.items()
}
Result:
{
'foo': Counter({'b': 600, 'a': 400}),
'something': Counter({'two': 800, 'three': 400, 'one': 300})
}
I expect using sum with Counter is inefficient (in the same way that using sum with strings is so inefficient that Guido banned it), but I might be wrong. Anyway, if you have performance problems, you could write a function that creates a Counter and repeatedly calls += or update on it:
def makeints(x):
    return {k : int(v) for k, v in x.items()}
def total(seq):
    result = Counter()
    for s in seq:
        result.update(s)
    return result
counts = {k : total(makeints(x) for x in v) for k, v in input.items()}
One way would be to do as follows:
from collections import defaultdict
d = {"something" :
     [{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"}]}
dd = defaultdict(list)
# first, get and group values from the original data structure
# and change strings to ints
for inner_dict in d['something']:
    for k,v in inner_dict.items():
        dd[k].append(int(v))
# second, create the output dictionary by summing the grouped elements
# from the first step.
out_dict = {k:sum(v) for k,v in dd.items()}
print(out_dict)
# {'one': 300, 'three': 400, 'two': 800}
Here I don't use Counter but defaultdict. It's a two-step approach.
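Since the question asks about Counter specifically: once the string values are converted with int(), a Counter works as a one-step accumulator for the same data. A minimal sketch, assuming every value is an integer string:

```python
from collections import Counter

d = {"something": [{"one": "200"}, {"three": "400"}, {"one": "100"}, {"two": "800"}]}

totals = Counter()
for inner_dict in d["something"]:
    for k, v in inner_dict.items():
        # a missing key reads as 0 in a Counter, so += works for new keys too
        totals[k] += int(v)

print(dict(totals))  # {'one': 300, 'three': 400, 'two': 800}
```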

Resources