I have a Python project in which I am using many non-code files. Currently these are all images, but I might use other kinds of files in the future. What would be a good scheme for storing and referencing these files?
I considered just making a folder "resources" in the main directory, but there is a problem; Some images are used from within sub-packages of my project. Storing these images that way would lead to coupling, which is a disadvantage.
Also, I need a way to access these files which is independent on what my current directory is.
You may want to use pkg_resources
library that comes with setuptools
.
For example, I've made up a quick little package "proj"
to illustrate the resource organization scheme I'd use:
proj/setup.py
proj/proj/__init__.py
proj/proj/code.py
proj/proj/resources/__init__.py
proj/proj/resources/images/__init__.py
proj/proj/resources/images/pic1.png
proj/proj/resources/images/pic2.png
Notice how I keep all resources in a separate subpackage.
"code.py"
shows how pkg_resources
is used to refer to the resource objects:
from pkg_resources import resource_string, resource_listdir
# Itemize data files under proj/resources/images:
print resource_listdir('proj.resources.images', '')
# Get the data file bytes:
print resource_string('proj.resources.images', 'pic2.png').encode('base64')
If you run it, you get:
['__init__.py', '__init__.pyc', 'pic1.png', 'pic2.png']
iVBORw0KGgoAAAANSUhE ...
If you need to treat a resource as a fileobject, use resource_stream()
.
The code accessing the resources may be anywhere within the subpackage structure of your project, it just needs to refer to subpackage containing the images by full name: proj.resources.images
, in this case.
Here's "setup.py"
:
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='proj',
packages=find_packages(),
package_data={'': ['*.png']})
Caveat:
To test things "locally", that is w/o installing the package first, you'll have to invoke your test scripts from directory that has setup.py
. If you're in the same directory as code.py
, Python won't know about proj
package. So things like proj.resources
won't resolve.
You can always have a separate "resources" folder in each subpackage which needs it, and use os.path
functions to get to these from the __file__
values of your subpackages. To illustrate what I mean, I created the following __init__.py
file in three locations:
c:\temp\topp (top-level package)
c:\temp\topp\sub1 (subpackage 1)
c:\temp\topp\sub2 (subpackage 2)
Here's the __init__.py
file:
import os.path
resource_path = os.path.join(os.path.split(__file__)[0], "resources")
print resource_path
In c:\temp\work, I create an app, topapp.py, as follows:
import topp
import topp.sub1
import topp.sub2
This respresents the application using the topp
package and subpackages. Then I run it:
C:\temp\work>topapp
Traceback (most recent call last):
File "C:\temp\work\topapp.py", line 1, in
import topp
ImportError: No module named topp
That's as expected. We set the PYTHONPATH to simulate having our package on the path:
C:\temp\work>set PYTHONPATH=c:\temp
C:\temp\work>topapp
c:\temp\topp\resources
c:\temp\topp\sub1\resources
c:\temp\topp\sub2\resources
As you can see, the resource paths resolved correctly to the location of the actual (sub)packages on the path.
Update: Here's the relevant py2exe documentation.
@ pycon2009, there was a presentation on distutils and setuptools. You can find all of the videos here
Eggs and Buildout Deployment in Python - Part 1
Eggs and Buildout Deployment in Python - Part 2
Eggs and Buildout Deployment in Python - Part 3
In these videos, they describe how to include static resources in your package. I believe its in part 2.
With setuptools, you can define dependancies, this would allow you to have 2 packages that use resources from 3rd package.
Setuptools also gives you a standard way of accessing these resources and allows you to use relative paths inside of your packages, which eliminates the need to worry about where your packages are installed.