How to read a (static) file from inside a Python p

Published 2019-01-08 08:30

Question:

How can I read a file that is inside my Python package?

My situation

A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such a file?

Imagine I want to read a file from:

package\templates\temp_file

Some kind of path manipulation? Package base path tracking?

Answer 1:

[Added 2016-06-15: apparently this doesn't work in all situations. Please refer to the other answers.]


import os, mypackage
template = os.path.join(mypackage.__path__[0], 'templates', 'temp_file')
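
Because __path__ is a list and may contain more than one directory, relying on its first entry can miss the file. Below is a minimal sketch of a more defensive lookup. find_in_package is a hypothetical helper, not part of any library, and it still assumes the package is installed as ordinary files on disk (not inside a zip or egg).

```python
import os


def find_in_package(package, *parts):
    """Return the first existing path to `parts` under any entry of package.__path__.

    Hypothetical helper for illustration only.
    """
    # __path__ is a list; check every entry instead of trusting index 0.
    for base in package.__path__:
        candidate = os.path.join(base, *parts)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(os.path.join(*parts))
```

With the names from the snippet above, this would be called as find_in_package(mypackage, 'templates', 'temp_file').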


Answer 2:

Assuming your template is located inside your module's package at this path:

<your_package>/templates/temp_file

the correct way to read your template is to use the pkg_resources package from the setuptools distribution:

import pkg_resources

resource_package = __name__  # Could be any module/package name
resource_path = '/'.join(('templates', 'temp_file'))  # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)

Tip:
This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from Python 3.5 to create self-contained distributions.

According to the Setuptools/pkg_resources docs, do not use os.path.join:

Basic Resource Access

Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like "..". Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.
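
The rules in that note can be expressed as a small check. This is only a sketch: is_valid_resource_name is a hypothetical helper, not something pkg_resources provides, and rejecting every backslash is a deliberately conservative reading of "/-separated".

```python
def is_valid_resource_name(name):
    """Check a resource name against the rules quoted above (hypothetical helper)."""
    if not name or name.startswith('/'):
        return False  # must not be empty or absolute
    if '\\' in name:
        return False  # must be /-separated, not backslash-separated
    # No parent-directory components anywhere in the name.
    return '..' not in name.split('/')
```

For example, 'templates/temp_file' passes, while '/templates/temp_file' and '../temp_file' do not.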



Answer 3:

In case you have this structure

lidtk
├── bin
│   └── lidtk
├── lidtk
│   ├── analysis
│   │   ├── char_distribution.py
│   │   └── create_cm.py
│   ├── classifiers
│   │   ├── char_dist_metric_train_test.py
│   │   ├── char_features.py
│   │   ├── cld2
│   │   │   ├── cld2_preds.txt
│   │   │   └── cld2wili.py
│   │   ├── get_cld2.py
│   │   ├── text_cat
│   │   │   ├── __init__.py
│   │   │   ├── REAMDE.md   <---------- say you want to get this
│   │   │   └── textcat_ngram.py
│   │   └── tfidf_features.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── create_ml_dataset.py
│   │   ├── download_documents.py
│   │   ├── language_utils.py
│   │   ├── pickle_to_txt.py
│   │   └── wili.py
│   ├── __init__.py
│   ├── get_predictions.py
│   ├── languages.csv
│   └── utils.py
├── README.md
├── setup.cfg
└── setup.py

you need this code:

import pkg_resources

# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/REAMDE.md'  # always use slash
filepath = pkg_resources.resource_filename(__name__, path)

I'm not too sure about the "always use slash" part. It might come from setuptools:

Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time.

In case you wonder where the documentation is:

  • PEP 0365
  • https://packaging.python.org/guides/single-sourcing-package-version/


Answer 4:

Section 10.8, "Reading Datafiles Within a Package", of Python Cookbook, Third Edition by David Beazley and Brian K. Jones answers this.

I'll quote it here:

Suppose you have a package with files organized as follows:

mypackage/
    __init__.py
    somedata.dat
    spam.py

Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do it, use the following code:

import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')

The resulting variable data will be a byte string containing the raw contents of the file.

The first argument to get_data() is a string containing the package name. You can either supply it directly or use a special variable, such as __package__. The second argument is the relative name of the file within the package. If necessary, you can navigate into different directories using standard Unix filename conventions as long as the final directory is still located within the package.

This way, the package can be installed as a directory, .zip, or .egg.
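
To illustrate that last point, here is a small sketch that builds a throwaway package inside a zip archive and reads a data file from it with pkgutil.get_data(). The package name demo_zipped_pkg, the archive name bundle.zip, and the file contents are made up for the demonstration.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway package (hypothetical name) inside a zip archive.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'bundle.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('demo_zipped_pkg/__init__.py', '')
    zf.writestr('demo_zipped_pkg/templates/temp_file', 'Hello, $name!\n')

# Putting the archive on sys.path makes the package importable from the zip.
sys.path.insert(0, archive)

# The resource name is relative to the package and uses forward slashes.
data = pkgutil.get_data('demo_zipped_pkg', 'templates/temp_file')
text = data.decode('utf-8')  # get_data() returns bytes, so decode for text
```

The same get_data() call works unchanged if the package is unpacked into a regular directory on sys.path.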



Answer 5:

Every Python module in your package has a __file__ attribute.

You can use it as:

import os
import mypackage

templates_dir = os.path.join(os.path.dirname(mypackage.__file__), 'templates')
template_file = os.path.join(templates_dir, 'template.txt')
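
Going one step further and actually reading the template could look like the sketch below. read_template is a hypothetical helper; it assumes a templates directory next to the module's file and UTF-8 text, and it only works when the package is installed as ordinary files on disk.

```python
import os


def read_template(module, name, encoding='utf-8'):
    """Read templates/<name> located next to the given module's file.

    Hypothetical helper for illustration only.
    """
    templates_dir = os.path.join(os.path.dirname(module.__file__), 'templates')
    with open(os.path.join(templates_dir, name), encoding=encoding) as handle:
        return handle.read()
```

With the names used above, the call would be read_template(mypackage, 'template.txt').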

For egg resources see: http://peak.telecommunity.com/DevCenter/PythonEggs#accessing-package-resources



Answer 6:

Assuming you are using an egg file that is not extracted:

I "solved" this in a recent project by using a postinstall script that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (I don't recall the name, but I came across at least one library that added something in front of that list!).

Also, egg files are usually extracted on the fly to a temporary location called the "egg cache". You can change that location using an environment variable, either before starting your script or even later, e.g.:

os.environ['PYTHON_EGG_CACHE'] = path

However, pkg_resources might do the job properly.



Answer 7:

See: Finding a file in a Python module distribution



Answer 8:

You should be able to import portions of your package's namespace with something like:

from my_package import my_stuff

... you should not need to specify anything that looks like a filename if this is a properly constructed Python package (that's normally abstracted away).