NamedTemporaryFile speed underwhelming

Posted 2019-09-06 06:25

I'm trying to use a NamedTemporaryFile, pass it to an external program, and then collect the output using Popen. My hope was that this would be quicker than creating a real file on the hard disk and would avoid as much IO as possible. The temp files I am creating are small, on the order of a KB or so, yet I am finding that creating a temp file is actually slower than using a normal file for reading/writing. Is there a trick I am missing here? What is going on behind the scenes when I use a NamedTemporaryFile?
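
For context, the overall pattern can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name run_on_temp_file is hypothetical, and a child Python interpreter that echoes the file stands in for the real external program, which is not shown in this post.

```python
import os
import subprocess
import sys
import tempfile


def run_on_temp_file(text):
    """Write text to a named temp file, let a child process read it, return its output."""
    # delete=False so the file survives being closed and can be opened by another process
    with tempfile.NamedTemporaryFile(mode="w", suffix=".fa", delete=False) as temp:
        temp.write(text)
        path = temp.name
    try:
        # Stand-in for the external program (assumption): a child Python
        # process that prints the contents of the file named in argv[1].
        child_code = "import sys; sys.stdout.write(open(sys.argv[1]).read())"
        proc = subprocess.Popen(
            [sys.executable, "-c", child_code, path],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        out, _ = proc.communicate()
        return out
    finally:
        os.remove(path)  # manual cleanup is needed because delete=False
```

Closing the file before starting the child process makes sure the data is flushed to disk, and it also avoids problems on platforms where a file cannot be opened a second time while it is still open.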

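# Using named temp file
import tempfile

# delete=False keeps the file on disk after the with block so its name can be passed to process calls
with tempfile.NamedTemporaryFile(mode="w", delete=False) as temp:  # mode="w" so str can be written on Python 3
    for idx, item in enumerate(r):
        temp.write(">{}\n{}\n".format(idx, item[1]))
name = temp.name  # path used for the read test below
>>> 8.435 ms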

# Using normal file io
with open("test.fa", "w") as temp:
    for idx, item in enumerate(r):
        temp.write(">{}\n{}\n".format(idx, item[1]))
>>> 0.506 ms

#--------

# Read using temp file
[i for i in open(name, "r")]
>>> 1.167 ms

# Read using normal file io
[i for i in open("test.fa", "r")]
>>> 0.765 ms

After a bit of profiling, it seems almost the entire time is spent creating the temp object. Using tempfile.NamedTemporaryFile(delete=False) takes over 8 ms in this example.
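
A rough sketch of how the creation cost could be isolated from the writes is shown below. The helper names (time_creation, make_named_temp, make_plain, compare) are hypothetical. The first NamedTemporaryFile call is reported separately because tempfile works out its default directory on first use and caches the result; whether that accounts for the 8 ms is an assumption to check, not something measured here.

```python
import os
import statistics
import tempfile
import time


def time_creation(make, repeats):
    """Return per-call wall-clock times in ms for make(), which must return a file path."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        path = make()
        times.append((time.perf_counter() - start) * 1000.0)
        os.remove(path)  # remove the file so every call starts from the same state
    return times


def make_named_temp():
    """Create and close an empty named temp file, returning its path."""
    with tempfile.NamedTemporaryFile(mode="w", delete=False) as temp:
        return temp.name


def make_plain(path):
    """Create and close an empty ordinary file at path, returning the path."""
    with open(path, "w"):
        return path


def compare(plain_path="test.fa", repeats=20):
    """Time file creation only (no writes); repeats must be at least 2."""
    temp_ms = time_creation(make_named_temp, repeats)
    plain_ms = time_creation(lambda: make_plain(plain_path), repeats)
    return {
        "temp_first_ms": temp_ms[0],
        "temp_rest_median_ms": statistics.median(temp_ms[1:]),
        "plain_median_ms": statistics.median(plain_ms),
    }


if __name__ == "__main__":
    print(compare())
```

If temp_first_ms is much larger than temp_rest_median_ms, the cost is mostly one-time setup rather than something paid on every temp file.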

1 answer
看我几分像从前
Reply #2 · 2019-09-06 06:50

I will try to answer your question although I am not very experienced with Python runtime efficiency.

Digging into the code of Python's tempfile.py, you can find a clue about what might take some time. The _mkstemp_inner function may try several candidate file names, catching an exception for each name that already exists. The more temp files your directory contains, the more file name collisions you may get, and the longer this takes. Try emptying your temp directory.

def _mkstemp_inner(dir, pre, suf, flags):
    """Code common to mkstemp, TemporaryFile, and NamedTemporaryFile."""

    names = _get_candidate_names()

    for seq in range(TMP_MAX):
        name = next(names)
        file = _os.path.join(dir, pre + name + suf)
        try:
            fd = _os.open(file, flags, 0o600)
            _set_cloexec(fd)
            return (fd, _os.path.abspath(file))
        except OSError as e:
            if e.errno == _errno.EEXIST:
                continue # try again
            raise

    raise IOError(_errno.EEXIST, "No usable temporary file name found")
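
One way to check this guess would be to time temp file creation in the default temp directory against a freshly created, empty directory. The following is only a rough sketch; the helper names time_mkstemp and compare_directories are made up for illustration, while tempfile.mkstemp and its dir argument are the real API.

```python
import os
import shutil
import tempfile
import time


def time_mkstemp(directory, repeats=50):
    """Average time in ms for tempfile.mkstemp() to create a file inside directory."""
    created = []
    start = time.perf_counter()
    for _ in range(repeats):
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        created.append(path)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    for path in created:
        os.remove(path)  # clean up outside the timed region
    return elapsed_ms / repeats


def compare_directories(repeats=50):
    """Compare mkstemp cost in the default temp dir and in a new, empty dir."""
    default_dir = tempfile.gettempdir()
    entries = len(os.listdir(default_dir))  # how crowded the default dir is
    empty_dir = tempfile.mkdtemp()  # freshly created, so it starts out empty
    try:
        return {
            "default_dir_entries": entries,
            "default_ms": time_mkstemp(default_dir, repeats),
            "empty_ms": time_mkstemp(empty_dir, repeats),
        }
    finally:
        shutil.rmtree(empty_dir)


if __name__ == "__main__":
    print(compare_directories())
```

If the two averages come out similar, name collisions in a crowded temp directory are probably not what makes the call slow, and the cause should be looked for elsewhere.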

Hope that helped.
