logging a dict with unicode causing UnicodeDecodeError - python

Using Python 2.6 (don't judge me ;-) ) with the logging module, I'm finding that trying to log an entire dict with unicode bits is causing problems.
log.debug("mydict is %s", mydict)
This is causing UnicodeDecodeError exceptions when run in a test runner, but not when I'm in a simple python shell.
>>> d
{'foo': u'\u2615'}
>>> d['bar'] = d['foo'].encode('utf8')
>>> d
{'foo': u'\u2615', 'bar': '\xe2\x98\x95'}
>>> print d['foo']
>>> print d['bar']
>>> d
{'foo': u'\u2615', 'bar': '\xe2\x98\x95'}
>>> print "%s" % d
{'foo': u'\u2615', 'bar': '\xe2\x98\x95'}
>>> import logging
>>> logging.basicConfig()
>>> log = logging.getLogger('')
>>> log.setLevel(logging.DEBUG)
>>> log.debug("%s", d)
DEBUG:root:{'foo': u'\u2615', 'bar': '\xe2\x98\x95'}
So in a terminal it's all good, but in a test runner I'm getting exceptions as python is apparently trying to decode the repr string of the dict using an ascii codec.
So, I'm wondering why the inconsistency? And what is a good, safe way to send a dict to the logger for debug purposes?


How to convert Python dict to JSON when some keys are not strings?

The JSON equivalent of a Python dict is a JSON object. However, its keys must be strings; that's a well-known limitation.
I also need to support boolean and numeric keys. I could make a simple one-to-one translation between Python values and JSON strings:
False <--> "bool:False"
42 <--> "int:42"
"Foo" <--> "str:Foo"
But I'd like to ask whether there is an existing recommendation or some kind of standard for this, or simply anything worth being compatible with.
JSON isn't able to do that and I don't know of any widely-used extensions of JSON that allow you to do this. You'd have to write the serializer and deserializer yourself, which probably wouldn't be that difficult if you subclass json.JSONEncoder and json.JSONDecoder.
If you're able to switch protocols, there are JSON-ish protocols that support non-string keys. MessagePack is one:
>>> import msgpack
>>> msgpack.loads(msgpack.dumps({'1': 12, False: 3, 2: 8}))
{False: 3, 2: 8, '1': 12}
This would achieve what you want:
>>> import json
>>> data = {False: 'this data', 42: 'answer', 'foo': 'bar'}
>>> json.dumps({"%s:%s" % (type(k).__name__, k): v for k, v in data.items()})
'{"int:42": "answer", "str:foo": "bar", "bool:False": "this data"}'
You'd then have to de-serialize this, and it would only work for basic types.
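For the de-serializing half, here is a minimal sketch of a round trip using the same "type:value" key format. The helper names (dumps_typed_keys, loads_typed_keys) and the parser table are hypothetical, the code is written for Python 3, and it only handles top-level keys of type bool, int, float and str:

```python
import json

# Hypothetical parser table: maps the type tag in a key back to a Python value.
_PARSERS = {
    "bool": lambda s: s == "True",
    "int": int,
    "float": float,
    "str": str,
}

def dumps_typed_keys(data):
    # Prefix every top-level key with its type name, e.g. 42 -> "int:42".
    tagged = {"%s:%s" % (type(k).__name__, k): v for k, v in data.items()}
    return json.dumps(tagged)

def loads_typed_keys(text):
    result = {}
    for tagged_key, value in json.loads(text).items():
        # Split on the first colon only, so str keys may contain colons.
        type_name, _, raw = tagged_key.partition(":")
        result[_PARSERS[type_name](raw)] = value
    return result

data = {False: "this data", 42: "answer", "foo": "bar"}
restored = loads_typed_keys(dumps_typed_keys(data))
```

Nested dicts with non-string keys would need the same treatment applied recursively, which this sketch does not attempt.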
Since you cannot emit standard JSON anyway, why not use str to serialize and ast.literal_eval (from the standard library) to deserialize?
>>> import ast
>>> d = {'a': 33, 23:(4,5), False: 10, 4: "foo"}
>>> a = str(d) # convert the python structure to a string
>>> d2 = ast.literal_eval(a) # evaluate the literal back to a python structure
>>> print(d2)
{False: 10, 4: 'foo', 'a': 33, 23: (4, 5)}
>>> d2 == d # are those the same before & after evaluation ?
True

Force YAML values to be strings

Look at this code, under Python 2.7:
>>> import yaml
>>> yaml.load('string: 01')
{'string': 1}
>>> :(
Is it possible to obtain the string 01 without modifying the yaml file? I didn't find anything in the docs.
>>> import yaml
>>> yaml.load('string: 01', Loader=yaml.loader.BaseLoader)
{u'string': u'01'}

how to retrieve list when stored in keyring in python

I am storing a list in the Python keyring, but when I retrieve it, it comes back as a unicode string.
import keyring
print c
d= keyring.get_password(a,"internal")
print d[0]
d=unicode: ['harish', 'ravi', 'kisan']
c=['harish', 'ravi', 'kisan']
The value of d[0] is "[", not "harish".
Similarly, d[1] is "'", not "ravi".
I want d to be a list, like c.
Use ast.literal_eval. It evaluates a string containing a Python literal (strings, numbers, tuples, lists, dicts, booleans, None) without executing arbitrary code.
>>> import ast
>>> l = ast.literal_eval("['hello', 'goodbye']")
>>> l
['hello', 'goodbye']
>>> type(l)
<type 'list'>
If the string you get can't be interpreted as a valid Python literal, you will get a ValueError (or a SyntaxError if it isn't valid Python syntax at all). If that's the case, you'll need to show us what your output looks like in order to determine a correct solution.
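A small sketch of how the error cases could be handled. The function name parse_stored_list is made up for illustration, and returning None on failure is just one possible choice:

```python
import ast

def parse_stored_list(text):
    """Return the list encoded in text, or None if it is not a list literal."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid syntax but not a plain literal (e.g. a bare name).
        # SyntaxError: not valid Python syntax at all.
        return None
    return value if isinstance(value, list) else None

names = parse_stored_list("['harish', 'ravi', 'kisan']")
```

With the string from the question, names[0] is then 'harish' rather than '['.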
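Use json to parse the output. Note that this only works if the list was stored as JSON (for example with json.dumps), because JSON requires double-quoted strings: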
import json
import keyring
print c
d= json.loads(keyring.get_password(a,"internal"))
print d[0]
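To illustrate why the stored format matters, here is a sketch that leaves keyring out and works on plain strings. The variable names are illustrative; stored stands in for whatever would be passed to keyring.set_password and read back with keyring.get_password:

```python
import json

names = ['harish', 'ravi', 'kisan']

# Serialize explicitly before storing; this is the string to put in the keyring.
stored = json.dumps(names)

# After retrieving the string, json.loads turns it back into a list.
restored = json.loads(stored)

# The str()/repr() form uses single quotes, which is not valid JSON.
try:
    json.loads(str(names))
    repr_parses = True
except ValueError:
    repr_parses = False
```

So json.loads is the right tool when the writer used json.dumps, and ast.literal_eval when the writer stored the repr of the list.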

Output difference between ipython and python

It was my understanding that python will print the repr of the output, but this is apparently not always the case. For example:
In ipython:
In [1]: type([])
Out[1]: list
In [2]: set([3,1,2])
Out[2]: {1, 2, 3}
In python:
>>> type([])
<type 'list'>
>>> set([3,1,2])
set([1, 2, 3])
What transformation does ipython apply on the output?
Instead of repr or the standard pprint module, IPython uses the IPython.lib.pretty.RepresentationPrinter.pretty method to print the output.
The IPython.lib.pretty module provides two functions that use RepresentationPrinter.pretty behind the scenes.
The IPython.lib.pretty.pretty function returns the string representation of an object:
>>> from IPython.lib.pretty import pretty
>>> pretty(type([]))
The IPython.lib.pretty.pprint function prints the representation of an object:
>>> from IPython.lib.pretty import pprint
>>> pprint(type([]))
IPython uses its own pretty printer because the standard Python pprint module "does not allow developers to provide their own pretty print callbacks."

Decode bracket-encoded dictionary in query string [closed]

I have the following query string '?attr[foo]=1&attr[bar]=2'
Is there an easy way to decode that into a dictionary like {'attr': {'foo': 1, 'bar': 2}} using existing libraries?
I know I can easily implement a decoder for this; I'm just asking whether one already exists.
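import re
import urlparse

s = '?attr[foo]=1&attr[bar]=2'

def parse(s):
    d = {}
    # parse_qs does not strip a leading '?', so remove it first
    for key, value in urlparse.parse_qs(s.lstrip('?')).items():
        # split 'attr[foo]' into the outer key 'attr' and the inner key 'foo'
        match = re.match(r'(?P<key>[^\[]+)\[(?P<value>[^\]]+)\]', key)
        gd = match.groupdict()
        d.setdefault(gd['key'], {})[gd['value']] = value[0]
    return d

>>> parse(s)
{'attr': {'foo': '1', 'bar': '2'}}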
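For Python 3, where urlparse became urllib.parse, a comparable sketch might look like the following. The name parse_bracketed is hypothetical, and passing keys without brackets through unchanged is an assumption about the desired behaviour; only one level of brackets is handled:

```python
import re
from urllib.parse import parse_qs

# Matches keys of the form name[sub], e.g. attr[foo].
_BRACKET_KEY = re.compile(r'^(?P<name>[^\[\]]+)\[(?P<sub>[^\[\]]+)\]$')

def parse_bracketed(query):
    result = {}
    # parse_qs keeps a leading '?' as part of the first key, so strip it.
    for key, values in parse_qs(query.lstrip('?')).items():
        match = _BRACKET_KEY.match(key)
        if match:
            inner = result.setdefault(match.group('name'), {})
            inner[match.group('sub')] = values[0]
        else:
            # Plain key without brackets: keep its first value as-is.
            result[key] = values[0]
    return result

parsed = parse_bracketed('?attr[foo]=1&attr[bar]=2&page=3')
```

As in the version above, the values stay strings ('1', not 1) and repeated parameters keep only their first value.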
Here is an alternative approach, which answers the question you asked in the comments: "Is there a standard way to encode a dictionary/hash on a URL?"
You can use JSON, with json.dumps() and urllib.parse.quote() for the encoding:
>>> import json
>>> from urllib.parse import quote, unquote
>>> quote(json.dumps({'attr': {'foo': 1, 'bar': 2}}))
'%7B%22attr%22%3A%20%7B%22foo%22%3A%201%2C%20%22bar%22%3A%202%7D%7D'
And urllib.parse.unquote() and json.loads() for the decoding:
>>> json.loads(unquote('%7B%22attr%22%3A%20%7B%22foo%22%3A%201%2C%20%22bar%22%3A%202%7D%7D'))
{'attr': {'foo': 1, 'bar': 2}}
Note that on Python 2.x the urllib.parse module does not exist; use urllib.quote() and urllib.unquote() instead.
The URL becomes a bit ugly because of the escaping but the encoding and decoding is trivial, so this is a good approach as long as you don't need people to be able to easily type or understand the encoded URL.