Is there any way to extract HTML tag attributes using urllib, urllib2, or BeautifulSoup?
For example:
<a href="xyz" title="xyz">xyz</a>
should give href=xyz, title=xyz
There is another thread that talks about using regular expressions.
Thanks
You could use BeautifulSoup to parse the HTML and, for each <a> tag, read its attributes from tag.attrs.

Why don't you try the HTMLParser module?
Something like this:
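A minimal sketch of the HTMLParser approach, assuming Python 3, where the module lives at html.parser (in Python 2 it was named HTMLParser). The names AttrCollector and extract_attrs are made up for illustration, and the sketch only looks at <a> tags:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the attributes of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []  # one dict of attributes per <a> tag seen

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            self.found.append(dict(attrs))


def extract_attrs(html):
    """Return a list of attribute dicts, one per <a> tag in the input."""
    parser = AttrCollector()
    parser.feed(html)
    parser.close()
    return parser.found


print(extract_attrs('<a href="xyz" title="xyz">xyz</a>'))
# prints [{'href': 'xyz', 'title': 'xyz'}]
```

To handle other tags, change or drop the tag == "a" check. The page itself could be fetched with urllib first and its decoded text passed to extract_attrs.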