If I do this:
import lxml
in python, lxml.html
is not imported. For instance, I cannot call the lxml.html.parse()
function. Why is this so?
If I do this:
import lxml
in python, lxml.html
is not imported. For instance, I cannot call the lxml.html.parse()
function. Why is this so?
lxml
is a package, not a module. A package is a collection of modules. As it happens, you can also import the package directly, but that doesn't automatically import all of its submodules.As to why this is, well, that's a question for the BDFL. I think it's probably because packages are generally quite large, and importing all the submodules would be an excessive performance penalty.
Importing a module or package in Python is a conceptually simple operation:
Find the .py file corresponding to the import. This involves the Python path and some other machinery, but will result in a specific .py file being found.
For every directory level in the import (
import foo.bar.baz
has two levels), find the corresponding__init__.py
file, and execute it. Executing it simply means running all the top-level statements in the file.Finally, the .py file itself (
foo/bar/baz.py
in this case) is executed, meaning all the top-level statements are executed. All the globals created as a result of that execution are bundled into a module object, and that module object is the result of the import.If none of those steps imported sub-packages, then those sub-packages aren't available. If they did import sub-packages, then they are available. Package authors can do as they wish.
It's by design. The package has the option to import the nested package in its
__init__.py
, then, you would be able to access the nested package without problems. It's a matter of choice for the package writer, and the intent is to minimize the amount of code that you probably won't use.lxml
is called a package in Python, which is a hierachical collections of modules. Packages can be huge, so they are allowed to be selective about what is pulled in when they are imported. Otherwise everybody would have to import the full hierarchy, which would be quite a waste of resources.This is to allow only the minimum amount of code to have to be loaded for multi-part libraries that you may not use the entirety of. For instance, you might not be using the
html
part oflxml
, and thus not want to have to deal with loading its code.