Determining the location of distutils data files p

Posted 2019-03-15 15:27

Question:

I'm trying to include data files with distutils for my package and then refer to them using relative paths (following http://docs.python.org/distutils/setupscript.html#distutils-additional-files).

My dir structure is:

myproject/
  mycode.py
  data/
    file1.dat

mycode.py is actually a script in the package. It relies on accessing data/file1.dat and refers to it using that relative path. In setup.py, I have:

setup(
 ...
 scripts = ["myproject/mycode.py"],
 data_files = [('data', ['myproject/data/file1.dat'])]
)

Suppose the user now runs:

python setup.py install --prefix=/home/user/

Then mycode.py will appear in some place like /home/user/bin/. But the reference to data/file1.dat is now broken, since the script lives elsewhere from the data.

How can I find out, from mycode.py, the absolute path to myproject/data/file1.dat, so I can refer to it properly depending on where the user installed the package?

EDIT
When I install this with prefix=/home/user/, data/file1.dat is created in /home/user/, which is exactly what I want. The only missing piece is how to retrieve the absolute path to this file programmatically, given only a relative path and without knowing where the user installed the package. When I use package_data instead of data_files, it does not work: data/file1.dat is not created anywhere, even if I delete my MANIFEST file.

I've read all of the current discussions of this apparently very common problem. None of the proposed solutions, however, deals with the case I have above, where the code that needs to access data_files is a script whose location may change depending on the --prefix argument to setup.py. The only hack I can think of to resolve this is to add the data file to scripts= in setup(), as in:

setup(
  ...
  scripts = ["myproject/mycode.py", "myproject/data/file1.dat"]
)

This is a horrible hack, but it is the only way I can think of to ensure that file1.dat ends up in the same place as the scripts defined in scripts=. I cannot find any platform-independent, installation-aware API to recover the location of data_files after the user has run setup.py install (potentially with --prefix= args).

Answer 1:

I think the confusion arises from the usage of scripts. Scripts should refer to a runnable executable, perhaps a utility script related to your package or perhaps an entry point into functionality for your package. In either case, you should expect that any scripts will not be installed alongside the rest of your package. This expectation is due mainly to the convention that packages are considered libraries (and installed to lib directories) whereas scripts are considered executables (and installed to bin or Scripts directories). Furthermore, data files are neither executables nor libraries and are completely separate.

So from the script, you need to determine where the data files are located. According to the Python docs,

If directory is a relative path, it is interpreted relative to the installation prefix.

Therefore, you should write something like the following in the mycode script to locate the data file:

import sys
import os

def my_func():
    with open(os.path.join(sys.prefix, 'data', 'file1.dat')) as f:
        print(next(f))

if __name__ == '__main__':
    my_func()
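
One caveat: sys.prefix is the prefix of the running interpreter, so it matches the install location only when no custom --prefix was given. A minimal sketch of a fallback, assuming scripts land in <prefix>/bin (or <prefix>/Scripts) and data_files directories are relative to <prefix>, is to try the directory above the script first. The helper name find_data_file and the candidate order are my own and not part of distutils:

```python
import os
import sys


def find_data_file(relative_path, script_path=None):
    """Return the absolute path of an installed data file, or None.

    Hypothetical helper: assumes scripts are installed to <prefix>/bin
    (or <prefix>/Scripts) and data_files relative to <prefix>.
    """
    if script_path is None:
        script_path = sys.argv[0]
    script_dir = os.path.dirname(os.path.abspath(script_path))
    candidates = [
        os.path.dirname(script_dir),  # <prefix> when the script is in <prefix>/bin
        sys.prefix,                   # prefix of the running interpreter
    ]
    for prefix in candidates:
        candidate = os.path.join(prefix, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Inside the script this would be called as find_data_file(os.path.join('data', 'file1.dat')), and the caller should handle a None result when neither candidate exists.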

If you're not pleased with the way that your code and data are not bundled together (and I would not be), then I would restructure your package so that you have an actual Python package (and module) and use packages= and package_data= to inject the data into the package, and then create a simple script that calls into the module in the package.

I did that by creating this tree:

.
│   setup.py
│
├───myproject
│   │   mycode.py
│   │   __init__.py
│   │
│   └───data
│           file1.dat
│
└───scripts
        run-my-code.py

With setup.py:

from distutils.core import setup

setup(
    name='myproject',
    version='1.0',
    scripts=['scripts/run-my-code.py'],
    packages=['myproject'],
    package_data = {
        'myproject': ['data/file1.dat'],
    },
)

run-my-code.py is simply:

from myproject import mycode

mycode.my_func()

__init__.py is empty and mycode.py looks like:

import os

here = os.path.dirname(__file__)

def my_func():
    with open(os.path.join(here, 'data', 'file1.dat')) as f:
        print(next(f))

This latter approach keeps the data and code bundled together (in site-packages/myproject) and only installs the script in a different location (so it shows up in the $PATH).



Answer 2:

You should be able to use pkg_resources.resource_filename to get the filename of a file in your data_files.
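
As far as I know, pkg_resources resolves resources relative to an importable package, so it fits files shipped inside the package (the package_data layout) rather than files placed under the prefix by data_files. The sketch below illustrates the same lookup-by-package idea with the standard library's pkgutil.get_data instead of pkg_resources, a deliberate swap so that it runs without setuptools. The throwaway package demo_pkg is invented purely for illustration:

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk (hypothetical name: demo_pkg) that
# mimics the layout site-packages/demo_pkg/data/file1.dat.
root = tempfile.mkdtemp()
data_dir = os.path.join(root, 'demo_pkg', 'data')
os.makedirs(data_dir)
open(os.path.join(root, 'demo_pkg', '__init__.py'), 'w').close()
with open(os.path.join(data_dir, 'file1.dat'), 'w') as f:
    f.write('first line\n')

sys.path.insert(0, root)  # make demo_pkg importable for this demo

# Look the file up by package name; no install path is hard-coded.
content = pkgutil.get_data('demo_pkg', 'data/file1.dat')
print(content.decode('utf-8').splitlines()[0])
```

pkgutil.get_data returns the file contents as bytes rather than a filename, which is usually enough when the script only needs to read the data.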



Answer 3:

For a solution that works both inside and outside a virtualenv, on Windows and Linux, import pip and os, then compute the site-packages directory with:

os.path.split(os.path.split(pip.__file__)[0])[0]

Full example

from setuptools import setup, find_packages
from os import path
from functools import partial
from pip import __file__ as pip_loc


if __name__ == '__main__':
    package_name = 'gen'

    templates_join = partial(path.join, path.dirname(__file__),
                             package_name, 'templates')
    install_to = path.join(path.split(path.split(pip_loc)[0])[0],
                           package_name, 'templates')

    setup(
        name=package_name,
        version='0.0.1',
        test_suite=package_name + '.tests',
        packages=find_packages(),
        package_dir={package_name: package_name},
        data_files=[(install_to, [templates_join('.gitignore'),
                                  templates_join('logging.conf')])]
    )
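
If importing pip just to find site-packages feels fragile, the standard library's sysconfig module reports the interpreter's default install locations. This is a sketch of an alternative, not what the example above uses, and these are the defaults for the running interpreter, so they will not reflect a custom --prefix passed at install time:

```python
import sysconfig

# Default install locations for the interpreter running this code.
paths = sysconfig.get_paths()
site_packages = paths['purelib']  # where pure-Python packages are installed
scripts_dir = paths['scripts']    # where scripts are installed
data_prefix = paths['data']       # base directory for installed data files

print(site_packages)
print(scripts_dir)
print(data_prefix)
```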

Reference (my own): https://stackoverflow.com/a/29120636