I'm trying to include data files in my package with distutils and then refer to them using relative paths (following http://docs.python.org/distutils/setupscript.html#distutils-additional-files).
My dir structure is:
myproject/
    mycode.py
    data/
        file1.dat
The code in mycode.py is actually a script in the package. It relies on accessing data/file1.dat and refers to it using that relative path. In setup.py, I have:
setup(
    ...
    scripts=["myproject/mycode.py"],
    data_files=[('data', ['myproject/data/file1.dat'])]
)
Suppose the user now runs:
python setup.py install --prefix=/home/user/
Then mycode.py will appear in some place like /home/user/bin/. But the reference to data/file1.dat is now broken, since the script lives somewhere other than the data.
How can I find out, from mycode.py, the absolute path to myproject/data/file1.dat, so I can refer to it properly depending on where the user installed the package?
EDIT
When I install this with prefix=/home/user/, I get data/file1.dat created in /home/user/, which is exactly what I want. The only missing piece is how to retrieve the absolute path to this file programmatically, given only a relative path and not knowing where the user installed the package. When I try to use package_data instead of data_files, it does not work: I simply don't get data/file1.dat created anywhere, even if I delete my MANIFEST file.
I've read all of the current discussions of this apparently very common problem. However, none of the proposed solutions deals with the case I have above, where the code that needs to access data_files is a script and its location might change depending on the --prefix argument to setup.py. The only hack I can think of to resolve this is to add the data file to scripts= in setup(), as in:
setup(
    ...
    scripts=["myproject/mycode.py", "myproject/data/file1.dat"]
)
This is a horrible hack, but it is the only way I can think of to ensure that file1.dat will be in the same place as the scripts defined in scripts=, since I cannot find any platform-independent, installation-aware API to recover the location of data_files after the user has run setup.py install (potentially with --prefix= args).
I think the confusion arises from the usage of scripts. Scripts should refer to a runnable executable, perhaps a utility script related to your package or perhaps an entry point into functionality for your package. In either case, you should expect that any scripts will not be installed alongside the rest of your package. This expectation is due mainly to the convention that packages are considered libraries (and installed to lib directories) whereas scripts are considered executables (and installed to bin or Scripts directories). Furthermore, data files are neither executables nor libraries and are completely separate.
So from the script, you need to determine where the data files are located. According to the Python docs:
"If directory is a relative path, it is interpreted relative to the installation prefix."
Therefore, you should write something like the following in the mycode script to locate the data file:
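import os
import sys
def my_func():
    # sys.prefix is the running interpreter's prefix; it matches the data
    # location only if the package was installed into that same prefix.
    with open(os.path.join(sys.prefix, 'data', 'file1.dat')) as f:
        print(next(f))
if __name__ == '__main__':
    my_func()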
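If the package was installed with a custom --prefix, the data may not sit under sys.prefix. One possible workaround, sketched here under the assumption that the script lands in <prefix>/bin (or <prefix>/Scripts on Windows) and the data in <prefix>/data, is to try the prefix implied by the script's own location first. find_data_file is a hypothetical helper, not part of distutils.

```python
import os
import sys


def find_data_file(relative_path, script_file, extra_prefixes=()):
    """Return the first existing candidate for relative_path, or None.

    Assumption: the script is installed one level below the install
    prefix (<prefix>/bin or <prefix>/Scripts), so the parent of the
    script's directory is tried first, then sys.prefix, then any
    prefixes supplied by the caller.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    prefixes = [os.path.dirname(script_dir), sys.prefix]
    prefixes.extend(extra_prefixes)
    for prefix in prefixes:
        candidate = os.path.join(prefix, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

In mycode.py this would be called as find_data_file(os.path.join('data', 'file1.dat'), __file__), with a None result treated as "data not found".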
If you're not happy that your code and data are not bundled together (and I would not be), I would restructure the project so that you have an actual Python package (and module), use packages= and package_data= to put the data inside the package, and then create a simple script that calls into the module in the package.
I did that by creating this tree:
.
│   setup.py
│
├───myproject
│   │   mycode.py
│   │   __init__.py
│   │
│   └───data
│           file1.dat
│
└───scripts
        run-my-code.py
With setup.py:
from distutils.core import setup
setup(
    name='myproject',
    version='1.0',
    scripts=['scripts/run-my-code.py'],
    packages=['myproject'],
    package_data={
        'myproject': ['data/file1.dat'],
    },
)
run-my-code.py is simply:
from myproject import mycode
mycode.my_func()
__init__.py is empty and mycode.py looks like:
import os
# Directory that contains this module (and the data/ folder next to it)
here = os.path.dirname(__file__)
def my_func():
    with open(os.path.join(here, 'data', 'file1.dat')) as f:
        print(next(f))
This latter approach keeps the data and code bundled together (in site-packages/myproject) and only installs the script in a different location (so it shows up in the $PATH).
You should be able to use pkg_resources.resource_filename to get the filename of a file in your data_files.
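For the package layout described above, the call would be pkg_resources.resource_filename('myproject', 'data/file1.dat'). Note that it resolves files shipped inside a package (package_data), and that pkg_resources is deprecated in recent setuptools releases. A minimal sketch of the same lookup with the standard library's importlib.resources (Python 3.9+), swapped in here as an alternative; package_data_path is a hypothetical helper name:

```python
from importlib.resources import files


def package_data_path(package, *parts):
    """Return the location of a data file shipped inside `package`.

    `package` is an importable package name; `parts` are the path
    components below the package directory.
    """
    resource = files(package)
    for part in parts:
        resource = resource.joinpath(part)
    return resource
```

With the layout above, str(package_data_path('myproject', 'data', 'file1.dat')) would give the path of the installed data file, assuming the package is installed as a normal directory on disk.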
For a solution that works both inside and outside a virtualenv, on Windows and Linux, import pip and os, then run:
os.path.join(os.path.split(os.path.split(pip.__file__)[0])[0])
Full example:
from setuptools import setup, find_packages
from os import path
from functools import partial
from pip import __file__ as pip_loc
if __name__ == '__main__':
    package_name = 'gen'
    # Source location of the templates inside the project tree
    templates_join = partial(path.join, path.dirname(__file__),
                             package_name, 'templates')
    # Target directory: two levels up from pip/__init__.py, then the package
    install_to = path.join(path.split(path.split(pip_loc)[0])[0],
                           package_name, 'templates')
    setup(
        name=package_name,
        version='0.0.1',
        test_suite=package_name + '.tests',
        packages=find_packages(),
        package_dir={package_name: package_name},
        data_files=[(install_to, [templates_join('.gitignore'),
                                  templates_join('logging.conf')])]
    )
Reference (my own): https://stackoverflow.com/a/29120636
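The double os.path.split in this answer strips the file name and then the package directory from pip.__file__, leaving the directory that contains pip. As a sketch, the hypothetical helper containing_site_dir below makes that step explicit. sysconfig.get_paths()["purelib"] is shown as a standard-library way to ask the running interpreter for its site-packages directory without importing pip; it is an alternative, not what the answer uses.

```python
import os
import sysconfig


def containing_site_dir(module_file):
    """Strip '<package>/<file>' from a module path such as pip.__file__."""
    package_dir = os.path.split(module_file)[0]  # .../site-packages/pip
    return os.path.split(package_dir)[0]         # .../site-packages


def interpreter_site_packages():
    """Pure-Python site-packages directory of the running interpreter."""
    return sysconfig.get_paths()["purelib"]
```

The two results usually agree when pip is installed in the active environment, but that is an assumption about the environment rather than a guarantee.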