I have looked all around and only found solutions for Python 2.6 and earlier, NOTHING on how to do this in Python 3.x. (I only have access to a Win7 box.)
I HAVE to be able to do this in 3.1, preferably without external libraries. Currently, I have httplib2 installed and access to command-prompt curl (that's how I'm getting the source code for pages). Unfortunately, as far as I know, curl does not decode HTML entities; I couldn't find an option for it in the documentation.
YES, I've tried to get Beautiful Soup to work, MANY TIMES, without success in 3.x. If you could provide EXPLICIT instructions on how to get it to work in Python 3 in an MS Windows environment, I would be very grateful.
So, to be clear, I need to turn strings like this: "Suzy &amp; John"
into a string like this: "Suzy & John".
You can use xml.sax.saxutils.unescape for this purpose. This module is included in the Python standard library, and is portable between Python 2.x and Python 3.x. Python 3.x has html.entities too.
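As a minimal sketch of what that call does (the sample string is made up for illustration): with no extra arguments, xml.sax.saxutils.unescape only handles &amp;amp;, &amp;lt; and &amp;gt;, so other entities such as &amp;quot; pass through unchanged.

```python
from xml.sax.saxutils import unescape

# Illustrative input; only &amp;, &lt; and &gt; are handled by default.
text = "Suzy &amp; John &lt;3 &quot;Python&quot;"
result = unescape(text)
print(result)  # Suzy & John <3 &quot;Python&quot;
```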
Apparently I don't have a high enough reputation to do anything but post this. unutbu's answer does not unescape quotation marks. The only thing I found that did was this function:
Which I got from this page.
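A minimal sketch of a function of this kind (not necessarily the one linked above) could look like the following. It assumes only the standard library's html.entities.name2codepoint table, available in Python 3.x; the name unescape_entities is hypothetical. It handles named entities such as &amp;quot; as well as decimal and hexadecimal numeric references, and leaves anything it does not recognise untouched.

```python
import re
from html.entities import name2codepoint

# Matches &#xHH; (hex), &#DDD; (decimal) and &name; references.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|\w+);")


def unescape_entities(text):
    """Replace HTML entity references in text with their characters."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2].lower() == "#x":
                return chr(int(body[2:], 16))
            if body.startswith("#"):
                return chr(int(body[1:]))
            return chr(name2codepoint[body])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range code point: keep the original text.
            return match.group(0)

    return _ENTITY_RE.sub(replace, text)


print(unescape_entities("&quot;Suzy &amp; John&quot;"))
```

Because the substitution runs in a single pass, a doubly escaped input such as "&amp;amp;lt;" is only unescaped one level.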
In my case I have an HTML string escaped with the AS3 escape function. After an hour of googling I hadn't found anything useful, so I wrote this recursive function to serve my needs. Here it is:
Edit 1: Added functionality to handle Unicode characters.
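For illustration only, here is a rough sketch of that kind of decoder. It is not the function described above: it uses a regular expression instead of recursion, and it assumes that the AS3 escape function produces %XX sequences for single-byte characters and %uXXXX sequences for other Unicode characters. The name unescape_percent is hypothetical.

```python
import re

# Assumed escape format: %uXXXX for Unicode code units, %XX otherwise.
_ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_percent(text):
    """Decode %XX and %uXXXX sequences; leave anything else untouched."""

    def replace(match):
        code = match.group(1) or match.group(2)
        return chr(int(code, 16))

    return _ESCAPE_RE.sub(replace, text)


print(unescape_percent("Suzy%20%26%20John"))  # Suzy & John
```

Characters outside the Basic Multilingual Plane would arrive as two %uXXXX surrogate halves, which this sketch does not recombine.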
You could use the function html.unescape:
In Python 3.4+ (thanks to J.F. Sebastian for the update):
In Python 3.3 or older:
In Python 2:
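One way to cover the three cases above in a single snippet is an import fallback chain. This is a sketch, assuming the unescape method that HTMLParser instances carried in Python 2 and in Python 3 releases before 3.4; on a current interpreter only the first import runs.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    # Older versions expose unescape as a method on HTMLParser instances.
    unescape = HTMLParser().unescape

print(unescape("Suzy &amp; John"))  # Suzy & John
```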
I am not sure whether this is a built-in library, but it looks like what you need and supports 3.1.
From: http://docs.python.org/3.1/library/xml.sax.utils.html?highlight=html%20unescape
xml.sax.saxutils.unescape(data, entities={})
Unescape '&amp;', '&lt;', and '&gt;' in a string of data.
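The optional entities argument in that signature is a dictionary mapping additional entity strings to their replacements. A small sketch, with an illustrative mapping for the two quote entities (the variable name extra is made up):

```python
from xml.sax.saxutils import unescape

# Extra entities to decode on top of the default &amp;, &lt; and &gt;.
extra = {"&quot;": '"', "&apos;": "'"}

result = unescape("&quot;Suzy &amp; John&apos;s&quot;", extra)
print(result)  # "Suzy & John's"
```

Anything not listed in the dictionary, including numeric references such as &amp;#39;, is still left as-is.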