I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc., but can't get the text inside the box.
import urllib.request
from bs4 import BeautifulSoup

url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"
# Fetch the page and parse the returned HTML
text = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
print(text)
Can anyone suggest a way of returning the BibTeX citation as a string (or whatever) in Python?
You don't need BeautifulSoup here. The page sends an additional XHR request to the server to fill out the BibTeX citation; simulate that request, for example with requests, and print the response to get the citation.
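A minimal sketch of simulating that XHR request, assuming the page fetches the citation from an endpoint at `http://www.doi2bib.org/doi2bib` that takes the DOI as an `id` query parameter. Both the endpoint path and the parameter name are assumptions here, so check the actual request in your browser's network tab before relying on them. The function name `fetch_bibtex` is made up for illustration.

```python
# Assumed XHR endpoint; confirm it in the browser's network tab.
XHR_URL = "http://www.doi2bib.org/doi2bib"


def fetch_bibtex(doi, session=None, xhr_url=XHR_URL, timeout=10):
    """Return the citation text for a DOI by calling the XHR endpoint directly."""
    if session is None:
        # Third-party dependency, only needed when no session is passed in.
        import requests

        session = requests.Session()

    # The "id" parameter name is an assumption about the endpoint.
    response = session.get(xhr_url, params={"id": doi}, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text.strip()
```

Calling `print(fetch_bibtex("10.1007/s00425-007-0544-9"))` would then print whatever the server returns for that DOI, as a plain string.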
You can also solve it with selenium. The key trick here is to use an Explicit Wait to wait for the citation to become visible. This prints the same as the above solution.
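A rough sketch of the selenium approach with an Explicit Wait. The CSS selector for the citation box (`pre`) is an assumption, as are the helper names `citation_page_url` and `fetch_bibtex_with_selenium`; inspect the page to find the real element. It also assumes Firefox and its driver are installed.

```python
BASE_URL = "http://www.doi2bib.org/#/doi/"
# Assumed selector for the citation box; inspect the page to confirm it.
CITATION_SELECTOR = "pre"


def citation_page_url(doi):
    """Build the predictable page URL for a DOI."""
    return BASE_URL + doi


def fetch_bibtex_with_selenium(doi, timeout=10):
    """Open the page in a browser and return the citation text once visible."""
    # Imported here so citation_page_url works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(citation_page_url(doi))
        # Explicit Wait: block until the element is visible or the timeout hits.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, CITATION_SELECTOR))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Without the wait, the element lookup can run before the page's JavaScript has filled in the citation, which is why reading the page source straight away gives no text.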