I am trying to read in the following URL using urllib2, http://frcwest.com/, and then search the data for the meta redirect.
It reads the following data in:
<!--?xml version="1.0" encoding="UTF-8"?--><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title><meta content="0;url= Home.html" http-equiv="refresh"/></head><body></body></html>
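The read-and-parse step is nothing fancy; roughly this (a minimal sketch, bs4-style imports assumed):
import urllib2
from bs4 import BeautifulSoup

# Fetch the raw markup and hand it straight to BeautifulSoup (sketch; default parser).
html = urllib2.urlopen('http://frcwest.com/').read()
soup = BeautifulSoup(html)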
Reading it into BeautifulSoup works fine. However, for some reason none of the functionality works for this specific scenario, and I don't understand why. BeautifulSoup has worked great for me in all other scenarios. However, simply trying:
soup.findAll('meta')
produces no results.
My eventual goal is to run:
soup.find("meta",attrs={"http-equiv":"refresh"})
But if:
soup.findAll('meta')
isn't even working, then I'm stuck. Any insight into this mystery would be appreciated, thanks!
It's the leading comment (the XML declaration mangled into <!--?xml ... ?-->) and the doctype that throw the parser off here, and subsequently BeautifulSoup.
Even the HTML tag seems 'gone'; searching for it by name turns up nothing.
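For example, something like this interpreter session (illustrative, assuming the soup built from the fetched page):
>>> soup.find('html') is None
True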
Yet it is still there in the .contents iterable. You can find things again by pulling the <html> element out of the contents and searching from that tag instead.
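A sketch of that approach (assuming bs4 and Python 2's urllib2; the variable names and the re-use of the soup name are just illustrative choices):
import urllib2
from bs4 import BeautifulSoup

html = urllib2.urlopen('http://frcwest.com/').read()
soup = BeautifulSoup(html)

# The <html> element is still a direct child of the soup object even though
# soup.find('html') turns up nothing, so fish it out of .contents by name.
for elem in soup.contents:
    if getattr(elem, 'name', None) == 'html':
        soup = elem  # search from the <html> tag itself from here on
        break

meta = soup.find('meta', attrs={'http-equiv': 'refresh'})
print(meta)
# expected: <meta content="0;url= Home.html" http-equiv="refresh"/>

# The redirect target can then be sliced out of the content attribute:
if meta is not None:
    print(meta['content'].partition('url=')[2].strip())  # Home.html
If installing another parser is an option, passing it explicitly, e.g. BeautifulSoup(html, 'lxml') or BeautifulSoup(html, 'html5lib'), is usually a more robust way to cope with an odd comment/doctype prologue like this one.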