I'm trying to use a NamedTemporaryFile and pass this object to an external program to use, before collecting the output using Popen
. My hope was that this would be quicker than creating a real file on the hard disk and would avoid as much IO as possible. The temp files I am creating are small, on the order of a KB or so, and I am finding that creating a temp file to work with is actually slower than using a normal file for reading/writing. Is there a trick I am missing here? What is going on behind the scenes when I use a NamedTemporaryFile?
# Using named temp file
with tempfile.NamedTemporaryFile(mode="w", delete=False) as temp:  # delete=False keeps the file around for the process calls; mode="w" for text writes
    for idx, item in enumerate(r):
        temp.write(">{}\n{}\n".format(idx, item[1]))
>>> 8.435 ms
# Using normal file io
with open("test.fa", "w") as temp:
for idx, item in enumerate(r):
temp.write(">{}\n{}\n".format(idx, item[1]))
>>> 0.506 ms
#--------
# Read using temp file
[i for i in open(name, "r")]
>>> 1.167 ms
[i for i in open("test.fa", "r")]
>>> 0.765 ms
Doing a bit of profiling, it seems almost the entire time is spent creating the temp object: the tempfile.NamedTemporaryFile(delete=False)
call alone takes over 8 ms in this example.
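For reference, a minimal timeit harness along these lines reproduces the comparison (the file contents and iteration count here are placeholders, not the real data from `r`):

```python
import os
import tempfile
import timeit

def create_named_temp():
    # Mirrors the question: text mode, delete=False to keep the file around
    with tempfile.NamedTemporaryFile(mode="w", delete=False) as f:
        f.write(">0\nACGT\n")
    os.unlink(f.name)  # clean up so repeated runs don't crowd the temp dir

def create_plain_file():
    with open("test.fa", "w") as f:
        f.write(">0\nACGT\n")

n = 100
t_temp = timeit.timeit(create_named_temp, number=n) / n
t_plain = timeit.timeit(create_plain_file, number=n) / n
print("NamedTemporaryFile: {:.3f} ms per call".format(t_temp * 1000))
print("plain open():       {:.3f} ms per call".format(t_plain * 1000))
```

Absolute numbers will of course vary with the machine and with how full the temp directory is.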
I will try to answer your question, although I am not very experienced with Python runtime efficiency.
Drilling into the code of Python's tempfile.py, you can find a clue about what might take some time: the
_mkstemp_inner
function may open several files, raising an exception for each failed attempt. The more temp files your directory contains, the more file-name collisions you get, and the longer this takes. Try emptying your temp directory. Hope that helped.
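To illustrate the mechanism, here is a simplified sketch of what that collision-and-retry loop does. The real CPython code draws candidate names from an internal random-name sequence; `uuid4` and the function name `mkstemp_sketch` are just stand-ins for this example:

```python
import os
import uuid

def mkstemp_sketch(directory, tries=100):
    # O_EXCL makes os.open fail if the name already exists,
    # which is how tempfile detects a collision.
    flags = os.O_RDWR | os.O_CREAT | os.O_EXCL
    for _ in range(tries):
        path = os.path.join(directory, "tmp" + uuid.uuid4().hex[:8])
        try:
            fd = os.open(path, flags, 0o600)
            return fd, path
        except FileExistsError:
            continue  # name taken: another attempt, another syscall
    raise FileExistsError("No usable temporary file name found")
```

Each collision costs a failed syscall plus exception handling, so a crowded temp directory makes every NamedTemporaryFile creation slower.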