I need a Python regex to extract URLs from HTML. Example HTML:
<a href=""http://a0c5e.site.it/r"" target=_blank><font color=#808080>MailUp</font></a>
<a href=""http://www.site.it/prodottiLLPP.php?id=1"" class=""txtBlueGeorgia16"">Prodotti</a>
<a href=""http://www.site.it/terremoto.php"" target=""blank"" class=""txtGrigioScuroGeorgia12"">Terremoto</a>
<a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>
I need to extract only:
http://a0c5e.site.it/r
http://www.site.it/prodottiLLPP.php?id=1
http://www.site.it/terremoto.php
http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse
Observe
Might want to add % so you can catch other escapes.
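The pattern this comment refers to is not shown here, so the following is only a minimal sketch of one way to pull href values out with `re`. The pattern and the names `HREF_RE`, `html`, and `urls` are assumptions for illustration, not the original answer's code. It uses a negated character class, so `%` escapes are accepted without being listed, and it tolerates the doubled, single, or missing quotes in the question's snippet.

```python
import re

# Sample markup copied from the question, doubled quotes included.
html = '''
<a href=""http://a0c5e.site.it/r"" target=_blank><font color=#808080>MailUp</font></a>
<a href=""http://www.site.it/prodottiLLPP.php?id=1"" class=""txtBlueGeorgia16"">Prodotti</a>
<a href=""http://www.site.it/terremoto.php"" target=""blank"" class=""txtGrigioScuroGeorgia12"">Terremoto</a>
<a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>
'''

# Assumed pattern: "href=", any run of quote characters (possibly none),
# then capture everything up to the next quote, whitespace, or '>'.
HREF_RE = re.compile(r'''href\s*=\s*["']*([^"'\s>]+)''', re.IGNORECASE)

urls = HREF_RE.findall(html)
for url in urls:
    print(url)
# http://a0c5e.site.it/r
# http://www.site.it/prodottiLLPP.php?id=1
# http://www.site.it/terremoto.php
# http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse
```

This works for the four sample links, but it will also match `href=` text inside comments or scripts, which is one reason the parser-based suggestions are worth considering.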
A regex might solve your problem, but consider using BeautifulSoup.
From Jon Clements
On a different note, the href quotation in your HTML snippet is incorrect.
You can use the BeautifulSoup library to manipulate and extract information from HTML.
I don't recommend using regular expressions to parse HTML. HTML is not a regular language; its grammar is context-free. When the structure of a link changes, the HTML can still be valid while your regex no longer matches, and you will have to rewrite the expression. Using BeautifulSoup is a decent way to extract the information.
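As a rough sketch of the BeautifulSoup approach, assuming the `bs4` package is installed and the markup uses conventional quoting (the sample below is two of the question's links re-quoted, which is my assumption, not the original snippet), the href values can be collected like this. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Two links from the question, re-quoted with ordinary single/double quotes.
html = """
<a href="http://www.site.it/terremoto.php" target="blank" class="txtGrigioScuroGeorgia12">Terremoto</a>
<a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>
"""

# "html.parser" is the builder from the standard library, so no extra
# parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

for link in links:
    print(link)
```

Because the parser works on tags and attributes rather than raw text, attribute order and quote style do not matter here.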