In this answer, I was told to not use BeautifulSoup(xmlData, 'html.parser') for XML parsing but to use BeautifulSoup(xmlData, 'xml'). That parser, however, does not come with BeautifulSoup.
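The 'xml' feature of BeautifulSoup is backed by lxml, so it only works when lxml can be imported. A minimal sketch for checking that before parsing, using only the standard library (`has_module` and `xml_feature_available` are hypothetical helper names, not part of bs4):

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def xml_feature_available():
    """BeautifulSoup(..., 'xml') needs both bs4 and lxml to be installed."""
    return has_module("bs4") and has_module("lxml")


if __name__ == "__main__":
    print("bs4 installed: ", has_module("bs4"))
    print("lxml installed:", has_module("lxml"))
    print("'xml' feature usable:", xml_feature_available())
```

If the last line reports False while bs4 is installed, lxml is the missing piece.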
As per one of the comments, I tried:
python -m pip install lxml
But got:
Collecting lxml
Using cached lxml-3.6.4.tar.gz
Installing collected packages: lxml
Running setup.py install for lxml ... error
Complete output from command D:\SOFT\Python3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\myuser\\AppData\\Local\\Temp\\pip-build-hl9fxzny\\lxml\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\myuser\AppData\Local\Temp\pip-ivemv19a-record\install-record.txt --single-version-externally-managed --compile:
Building lxml version 3.6.4.
Building without Cython.
ERROR: b"'xslt-config' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n"
** make sure the development packages of libxml2 and libxslt are installed **
Using build configuration of libxslt
running install
running build
running build_py
creating build
creating build\lib.win32-3.5
creating build\lib.win32-3.5\lxml
copying src\lxml\builder.py -> build\lib.win32-3.5\lxml
copying src\lxml\cssselect.py -> build\lib.win32-3.5\lxml
copying src\lxml\doctestcompare.py -> build\lib.win32-3.5\lxml
copying src\lxml\ElementInclude.py -> build\lib.win32-3.5\lxml
copying src\lxml\pyclasslookup.py -> build\lib.win32-3.5\lxml
copying src\lxml\sax.py -> build\lib.win32-3.5\lxml
copying src\lxml\usedoctest.py -> build\lib.win32-3.5\lxml
copying src\lxml\_elementpath.py -> build\lib.win32-3.5\lxml
copying src\lxml\__init__.py -> build\lib.win32-3.5\lxml
creating build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\__init__.py -> build\lib.win32-3.5\lxml\includes
creating build\lib.win32-3.5\lxml\html
copying src\lxml\html\builder.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\clean.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\defs.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\diff.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\ElementSoup.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\formfill.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\html5parser.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\soupparser.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\usedoctest.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\_diffcommand.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\_html5builder.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\_setmixin.py -> build\lib.win32-3.5\lxml\html
copying src\lxml\html\__init__.py -> build\lib.win32-3.5\lxml\html
creating build\lib.win32-3.5\lxml\isoschematron
copying src\lxml\isoschematron\__init__.py -> build\lib.win32-3.5\lxml\isoschematron
copying src\lxml\lxml.etree.h -> build\lib.win32-3.5\lxml
copying src\lxml\lxml.etree_api.h -> build\lib.win32-3.5\lxml
copying src\lxml\includes\c14n.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\config.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\dtdvalid.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\etreepublic.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\htmlparser.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\relaxng.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\schematron.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\tree.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\uri.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\xinclude.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\xmlerror.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\xmlparser.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\xmlschema.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\xpath.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\xslt.pxd -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\etree_defs.h -> build\lib.win32-3.5\lxml\includes
copying src\lxml\includes\lxml-version.h -> build\lib.win32-3.5\lxml\includes
creating build\lib.win32-3.5\lxml\isoschematron\resources
creating build\lib.win32-3.5\lxml\isoschematron\resources\rng
copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.win32-3.5\lxml\isoschematron\resources\rng
creating build\lib.win32-3.5\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl
creating build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract_expand.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_include.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematron_message.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematron_skeleton_for_xslt1.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for_xslt1.xsl -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -> build\lib.win32-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
error: Unable to find vcvarsall.bat
----------------------------------------
Command "D:\SOFT\Python3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\myuser\\AppData\\Local\\Temp\\pip-build-hl9fxzny\\lxml\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\myuser\AppData\Local\Temp\pip-ivemv19a-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\myuser\AppData\Local\Temp\pip-build-hl9fxzny\lxml\
I am using Python 3.5.2 and would like something that will work right out of pip, meaning it won't need to be compiled separately.
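You would need a C compiler on Windows to install lxml from source through pip. The "Unable to find vcvarsall.bat" error in your log means that no Visual C++ compiler was found.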
Some unofficial builds are available here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml
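A wheel only installs if the tags in its filename match the interpreter, so it helps to know which tags to look for. The sketch below derives them for CPython on 32-bit or 64-bit x86 Windows (an assumption; other platforms are not handled). `wheel_tags` and `wheel_matches` are hypothetical helper names, and the filename in the usage example is illustrative only, built from the standard wheel naming pattern rather than copied from that page:

```python
import struct
import sys


def wheel_tags(version_info=None, pointer_bits=None):
    """Return (python_tag, platform_tag) for CPython on x86 Windows.

    Assumption: only 32-bit and 64-bit x86 Windows builds are considered.
    """
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits for the running interpreter: 32 or 64.
        pointer_bits = struct.calcsize("P") * 8
    python_tag = "cp{}{}".format(version_info[0], version_info[1])
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


def wheel_matches(filename, python_tag, platform_tag):
    """Check a wheel filename of the form
    name-version[-build]-python-abi-platform.whl against both tags."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        return False
    # A tag field may hold several dot-separated alternatives.
    return (python_tag in parts[-3].split(".")
            and platform_tag in parts[-1].split("."))


if __name__ == "__main__":
    py_tag, plat_tag = wheel_tags()
    print("Look for a wheel tagged:", py_tag, plat_tag)
    # Illustrative filename, not taken from the download page.
    print(wheel_matches("lxml-3.6.4-cp35-cp35m-win32.whl", py_tag, plat_tag))
```

The build\lib.win32-3.5 paths in the log above suggest a 32-bit Python 3.5, so the wheel to look for there would be tagged cp35 and win32 even on 64-bit Windows.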
Find the URL of the wheel that matches your Python version and architecture; then this should work: