I have written a pretty simple code to get the first result for any term on urbandictionary.com. I started by writing a simple thing to see how their code is formatted.
def parseudtest(searchurl):
url = 'http://www.urbandictionary.com/define.php?term=%s' %searchurl
url_info = urllib.urlopen(url)
for lines in url_info:
print lines
For a test, I searched for 'cats', and used that as the variable searchurl
. The output I receive is of course a gigantic page, but here is the part I care about:
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='He set us up the bomb. Also took all our base.' property='og:description' />
<meta content='cats' property='og:title' />
<meta content="http://static3.urbandictionary.com/rel-1e0b481/images/og_image.png" property="og:image" />
<meta content='Urban Dictionary' property='og:site_name' />
As you can see, the first time the element "meta content" appears on the site, it is the first definition for the search term. So I wrote this code to retrieve it:
def parseud(searchurl):
url = 'http://www.urbandictionary.com/define.php?term=%s' %searchurl
url_info = urllib.urlopen(url)
if (url_info):
xmldoc = minidom.parse(url_info)
if (xmldoc):
definition = xmldoc.getElementsByTagName('meta content')[0].firstChild.data
print definition
For some reason the parsing doesn't seem to be working and invariably encounters an error every time. It is especially confusing since the site appears to use basically the same format as other sites I have successfully retrieved specific data from. If anyone could help me figure out what I am messing up here, it would be greatly appreciated.