I need to get the text inside the two elements into a string:
source_code = """<span class="UserName"><a href="#">Martin Elias</a></span>"""
>>> text
'Martin Elias'
How could I achieve this?
You can also try using html5lib and XPath. There is a good question about it here; that answer has an important detail to remember (namespaceHTMLElements) to make html5lib behave as expected. I wasted a lot of time trying to get it to work because I overlooked that I needed to change it.
Install Beautiful Soup and you can do it like this:
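A minimal sketch with Beautiful Soup 4, assuming the package is installed (`pip install beautifulsoup4`) and using the standard library's `html.parser` backend. The variable names `soup` and `span` are only illustrative:

```python
from bs4 import BeautifulSoup

source_code = """<span class="UserName"><a href="#">Martin Elias</a></span>"""

# Parse with the built-in backend so no extra parser package is needed.
soup = BeautifulSoup(source_code, "html.parser")

# Find the outer <span class="UserName">; find() returns None if it is absent.
span = soup.find("span", class_="UserName")

# get_text() concatenates all text nested inside the element.
text = span.get_text() if span is not None else ""
print(text)  # Martin Elias
```

Checking for `None` before calling `get_text()` keeps the snippet from raising on pages where the element is missing.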
I recommend using the Python Beautiful Soup 4 library.
It makes HTML parsing really easy.
I searched "python parse html" and this was the first result: https://docs.python.org/2/library/htmlparser.html
This code is taken from the Python docs:
Here is the result:
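For reference, the example in the HTMLParser documentation is along these lines. It is shown here with the Python 3 import (`html.parser`) and `print()` calls, which is an adaptation, since the linked page covers Python 2. The trailing comments show what it prints:

```python
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    # Each handler is called by feed() as the matching piece of markup is seen.
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)

    def handle_data(self, data):
        print("Encountered some data  :", data)


parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')

# Output:
# Encountered a start tag: html
# Encountered a start tag: head
# Encountered a start tag: title
# Encountered some data  : Test
# Encountered an end tag : title
# Encountered an end tag : head
# Encountered a start tag: body
# Encountered a start tag: h1
# Encountered some data  : Parse me!
# Encountered an end tag : h1
# Encountered an end tag : body
# Encountered an end tag : html
```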
Using this, and by looking at the HTMLParser source code, I came up with this:
You can use it like this:
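A minimal sketch of such a parser and its use, assuming Python 3 (where the module is `html.parser`). The class name `ListCollector` and its `start_tags` and `data` attributes are hypothetical and not necessarily what the original answer used:

```python
from html.parser import HTMLParser


class ListCollector(HTMLParser):
    """Collects start tags and text chunks into lists (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.start_tags = []  # (tag, attrs) tuples in document order
        self.data = []        # non-whitespace text chunks in document order

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))

    def handle_data(self, data):
        # Skip chunks that are only whitespace between tags.
        if data.strip():
            self.data.append(data)


source_code = """<span class="UserName"><a href="#">Martin Elias</a></span>"""

parser = ListCollector()
parser.feed(source_code)
parser.close()  # flush any text still buffered by the parser

text = "".join(parser.data)
print(text)  # Martin Elias
```

Keeping tags and text in separate lists needs only the standard library, at the cost of losing which text belonged to which tag. For this one-element snippet that does not matter.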
Now you should be able to extract your data from those lists easily. I hope this helped!