Load text file in python module after installation

Posted 2019-09-10 11:27

Question:

My goal is to make a program I've written easily accessible to potential employers/etc. in order to... showcase my skills.. or whatever. I am not a computer scientist, and I've never written a python module meant for installation before, so I'm new to this aspect.

I've written a machine learning algorithm, and fit parameters to data that I have locally. I would like to distribute the algorithm with "default" parameters, so that the downloader can use it "out of the box" for classification without having a training set. I've written methods which save the parameters to/load the parameters from text files, which I've confirmed work on my platform. I could simply ask users to download the files I've mentioned separately and use the loadParameters method I've created to manually load the parameters, but I would like to make the installation process as easy as possible for people who may be evaluating me.

What I'm not sure about is how to package the text files in such a way that they can automatically be loaded in the __init__ method of the object I have.

I have put the algorithm and files on GitHub, and written a setup.py script so that it can be installed from GitHub using pip like this:

pip install --upgrade https://github.com/NathanWycoff/SySE/tarball/master

However, this doesn't seem to install the text files containing the data I need, only the __init__.py python file containing my code.

So I guess the question boils down to: How do I force pip to download additional files aside from just the module in __init__.py? Or, is there a better way to load default parameters?

Answer 1:

Yes, there is a better way to distribute data files with a Python package.

First of all, read up on proper Python package structure. For instance, putting code in __init__.py files is not recommended. Their job is to mark a directory as a Python package, and you can put some import statements there. So it's better to move your SySE class to a file such as syse.py in that directory, and have __init__.py do: from .syse import SySE.

Now to the data files. By default, setuptools distributes only *.py files and a few special files (README, LICENSE and so on). However, you can tell setuptools to distribute other files with the package by using the package_data keyword argument of setup(); see the setuptools documentation on data files. Also don't forget to list all your data files in MANIFEST.in so that they end up in the source distribution.
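As a rough illustration of the package_data keyword, here is a minimal setup.py sketch. The distribution name syse, the version number, and the *.txt pattern are assumptions made for the example, not taken from the actual repository; adjust them to match the real package directory and file names.

```python
# setup.py: minimal sketch; name, version and file pattern are assumptions
from setuptools import setup

setup(
    name="syse",
    version="0.1.0",
    # The directory syse/ must contain an __init__.py to count as a package.
    packages=["syse"],
    # Map each package name to glob patterns, relative to that package's
    # directory, for the non-Python files that should be installed with it.
    package_data={"syse": ["*.txt"]},
)
```

A matching MANIFEST.in for this hypothetical layout would contain a line such as: include syse/*.txt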

If you do all of the above correctly, then you can use the pkg_resources package to locate your data files at runtime. pkg_resources handles the different ways a package can be distributed and installed: from a package index with pip, from a wheel, as an egg, and so on. See the pkg_resources documentation for details.
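The runtime lookup could look roughly like the sketch below. Several things here are assumptions made for illustration: the package name syse_demo, the file name default_params.txt, and its contents. The script builds a throwaway package in a temporary directory so that it can run on its own. It also uses the standard library's pkgutil.get_data instead of pkg_resources, so it runs without setuptools installed; with pkg_resources the comparable call would be pkg_resources.resource_string(__name__, "default_params.txt"). Note that newer setuptools releases mark pkg_resources as deprecated in favor of importlib.resources.

```python
import importlib
import pkgutil
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk that mirrors the suggested layout
# (all names below are hypothetical):
#   syse_demo/__init__.py         re-exports the class
#   syse_demo/syse.py             holds the actual code
#   syse_demo/default_params.txt  data file shipped with the package
root = Path(tempfile.mkdtemp())
pkg = root / "syse_demo"
pkg.mkdir()
(pkg / "__init__.py").write_text("from .syse import SySE\n")
(pkg / "default_params.txt").write_text("0.5 1.5 2.5\n")
(pkg / "syse.py").write_text(textwrap.dedent('''
    import pkgutil

    class SySE:
        def __init__(self):
            # Look the file up relative to the package, not the current
            # working directory, so it is found wherever pip installed it.
            raw = pkgutil.get_data(__package__, "default_params.txt")
            self.params = [float(x) for x in raw.decode("utf-8").split()]
'''))

# Make the temporary directory importable, then use the package as a
# downstream user would after installation.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from syse_demo import SySE

model = SySE()
print(model.params)
```

In a real package only the body of syse.py matters; the surrounding scaffolding exists just to make the example self-contained.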

Lastly, if your package is public, I recommend uploading it to PyPI (if it is not public, you can run your own package index). Register there and upload your package. You could then simply run pip install syse to install it from anywhere. That is most likely the best way to distribute your package.

It's quite a lot of work and reading, but I'm pretty sure you will benefit from it.

Hope this helps.