What is the difference among idioms for serving a

Posted 2020-06-25 04:16

Question:

A Web search turns up several simple (undocumented) examples of how to dynamically serve Matplotlib figures using Flask (and there are good answers here about it), but these examples have features, and differences among them, that puzzle me.

Some use low level IO and return tuples

io = StringIO.StringIO()
plt.savefig(io, format='png')
io.seek(0)
data = io.read()
return data, 200, {'Content-type': 'image/png'}

while several others use different IO APIs and return a Response

io = StringIO.StringIO()
canvas = FigureCanvas(fig)
canvas.print_png(io)
response = make_response(io.getvalue())
response.mimetype = 'image/png' # or response.headers['Content-Type'] = 'image/png'
return response

and yet others take a different approach to encoding and building the return value

io = StringIO.StringIO()
fig.savefig(io, format='png')
data = io.getvalue().encode('base64')
return html.format(data)

All of these seem to work; but I wonder if there are features of the approaches they share, or differences among them that have non-obvious consequences (e.g. for performance, or applicability to different scenarios).

First,

  • what is the role played by StringIO; is it the only way to prepare to serve an image (of any kind)?

In my sheltered Python life I've never seen it used before, and am unclear why it seems to be a required part of the process of serving a (binary?) file.

Second, I wonder about the different approaches these examples take to packaging their response; specifically

  • is there any significance to the use of seek plus read, vs. getvalue, or do these do essentially the same thing;
  • what governs the choice among approaches for what is returned: a tuple vs. html.format vs. a Response (with make_response); and, finally
  • why do some approaches set the Content-type explicitly, while others set the encoding (to 'base64')?

Is any one of these approaches considered the "best" or most current idiomatic (or at least Pythonic) approach?

Answer 1:

what is the role played by StringIO; is it the only way to prepare to serve an image (of any kind)?

First of all, no, it is not the only way. The "classical" way would be to involve the file system:

  1. Let matplotlib create a plot.
  2. Persistently save the corresponding image data to a file in the file system (that involves context switches to the kernel which invokes system calls like write()).
  3. Read the contents of this file again (which lets the kernel read out the file system for you, via read()).
  4. Serve the contents to the client, in an HTTP response with well-defined data encoding as well as properly set headers.

Steps (2) and (3) involve file system interaction. That is, the kernel actually talks to hardware components. This takes time (with classical hard drives, writing just a few bytes to disk might take a couple of milliseconds because of the long access times). Now, the question is: do you need to have the image data persisted to disk? If the answer is "no", then you can skip the entire interaction with the file system and save some time by keeping the image data within the memory of your web application process. That is what StringIO is good for:

StringIO is a very generic tool in Python that provides file-like objects, whereas the actual data is never delegated to the kernel for writing it to the file system or reading it from the file system. It is kept in memory. That is why StringIO objects are also called in-memory files.

The point is that plt.savefig() wants to have an object as first argument that looks like an object that actually represents a real file in the file system. StringIO provides such an object, but -- under the hood -- writes data to a buffer in the heap of the current process, and reads it from there again if requested.

Reading/writing small portions of data via StringIO takes nanoseconds or microseconds, whereas interaction with the file system is usually orders of magnitude slower.
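As a minimal sketch of what "file-like" means here, assuming Python 3 (where io.BytesIO takes the place of StringIO.StringIO for binary data): the hypothetical function write_image below stands in for plt.savefig and relies only on its argument having a write() method. The payload is a placeholder, not a real image. The same call works against an in-memory buffer and against a real file.

```python
import io
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8 bytes every PNG file starts with


def write_image(fileobj):
    """Hypothetical stand-in for plt.savefig(fileobj, format='png').

    It only relies on fileobj.write(), so any file-like object works.
    The payload is a placeholder, not a decodable image.
    """
    fileobj.write(PNG_SIGNATURE)
    fileobj.write(b"placeholder image payload")


# In memory: the bytes stay in a buffer owned by this process.
buffer = io.BytesIO()
write_image(buffer)
in_memory_bytes = buffer.getvalue()

# On disk: the same call, but the data goes through the file system.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "image.png")
    with open(path, "wb") as f:
        write_image(f)
    with open(path, "rb") as f:
        on_disk_bytes = f.read()

print(in_memory_bytes == on_disk_bytes)
```

Both routes produce the same bytes; the only difference is whether the data takes a detour through the file system.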

Now, don't get me wrong: usually, the file system is fast enough, and an operating system has its own techniques to make file system interaction as fast as possible. The real question is, as stated before: do you need the image data persisted? If you don't care about accessing this image data at some point later on, then do not involve the file system. This is what the creators of the three snippets you show decided.

Replacing real file system interaction with StringIO for performance reasons can be a perfectly valid decision. However, your web application surely has other bottlenecks. For instance, using StringIO may reduce the request-response latency by, let's say, 5 ms. But does this actually matter considering network latencies of 100 ms? Also, remember that a serious web application should not be burdened with sending large file contents. These are better served by a well-established web server, which can also make use of the sendfile() system call. In this case, it might again be better performance-wise to let matplotlib write the file to the file system and then tell your web server (via an X-Sendfile header) to do the rest. So performance is a complicated topic and might not be the strongest argument. But only you know your requirements!

is there any significance to the use of seek plus read, vs. getvalue, or do these do essentially the same thing

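Essentially the same thing. getvalue() returns the entire buffer regardless of the current position, whereas read() returns everything from the current position onward, which is why it is preceded by seek(0). There is no conceptual difference and no significant performance difference.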
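A short sketch of the two idioms side by side, assuming Python 3's io.BytesIO (the variable names are illustrative only):

```python
import io

buf = io.BytesIO()
buf.write(b"image bytes")

# After writing, the position is at the end, so read() returns nothing.
at_end = buf.read()

# getvalue() ignores the position and returns the whole buffer.
via_getvalue = buf.getvalue()

# seek(0) rewinds, so read() returns everything from the start.
buf.seek(0)
via_seek_read = buf.read()

print(at_end, via_getvalue == via_seek_read)
```

The only practical trap is forgetting seek(0) before read(), which yields an empty result.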

what governs the choice among approaches for what is returned: a tuple vs. html.format vs. a Response (with make_response); and, finally

No definite answer. There are many ways to get data to the client. There is no "correct" approach, just better or worse ones. Which approach is best depends strongly on the web framework. With Flask, make_response() is the canonical way to create a response object. html.format() might have some advantages I am not aware of; you need to read about this yourself! But read on: I think there is a method built into Flask which perfectly fits your scenario.
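To illustrate that the tuple and make_response() forms end up in the same place, here is a hedged sketch assuming a recent Flask on Python 3. The route names and the FAKE_PNG placeholder payload are invented for this example; a real view would return the bytes produced by savefig.

```python
from flask import Flask, make_response

app = Flask(__name__)

# Placeholder payload; in a real view this would come from the in-memory buffer.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"placeholder"


@app.route("/tuple.png")
def as_tuple():
    # Flask turns a (body, status, headers) tuple into a Response.
    return FAKE_PNG, 200, {"Content-Type": "image/png"}


@app.route("/response.png")
def as_response():
    # Build the Response explicitly, then adjust it.
    response = make_response(FAKE_PNG)
    response.mimetype = "image/png"
    return response


# Exercise both views without starting a server.
client = app.test_client()
tuple_result = client.get("/tuple.png")
response_result = client.get("/response.png")
print(tuple_result.mimetype, response_result.mimetype)
```

The tuple is shorthand; make_response() is useful when the response object needs further changes (extra headers, cookies) before it is returned.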

why do some approaches set the Content-type explicitly, while others set the encoding (to 'base64')?

There are proper and improper ways to send files to browsers via HTTP. Generally, an HTTP response should contain certain headers (also see What HTTP response headers are required). Just for your understanding, you might want to read about these details. Surely, binary data needs to be encoded with an encoding the client understands, and the encoding must be clarified in the response header. Also, a proper HTTP response should contain a MIME type (content type). The methods you have presented seem to not really take control of one or the other (no offense, quick & dirty examples often focus more on one thing than on the other).
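For the base64 variant in the question, the returned document is an HTML page rather than an image, and the image data is embedded in that page as text. The question does not show its html template, so the img tag below is an assumption about what such a template typically contains. This sketch assumes Python 3, where base64.b64encode is the counterpart of the Python 2 idiom data.encode('base64'), and uses a placeholder payload.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
image_bytes = PNG_SIGNATURE + b"placeholder image payload"

# Python 3 counterpart of the Python 2 idiom data.encode('base64').
encoded = base64.b64encode(image_bytes).decode("ascii")

# Assumed template: the image travels inside the HTML as a data URI.
html = '<img src="data:image/png;base64,{}">'.format(encoded)

# Decoding reverses the encoding and recovers the original bytes.
decoded = base64.b64decode(encoded)
print(decoded == image_bytes)
```

So the two idioms answer different needs: a Content-Type of image/png serves the raw image at its own URL, while base64 inlines the image into a page that is itself served as HTML.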

I think you really should use Flask's send_file function, which takes care of some important things for you. It accepts a couple of arguments. I would explicitly define the MIME type via mimetype. The first argument can be a file-like object, so a StringIO object works fine. However, in this case you need to call seek(0) first:

Make sure that the file pointer is positioned at the start of data to send before calling send_file().

The following two approaches are semantically elegant (in my opinion) and should take proper care of encoding the file contents and setting HTTP response headers:

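import StringIO  # Python 2 module; on Python 3, use io.BytesIO instead
import matplotlib.pyplot as plt
from flask import send_file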

1)

f = StringIO.StringIO()  # in-memory file
plt.savefig(f, format='png', dpi=300)
f.seek(0)  # rewind so send_file reads from the start
return send_file(f, mimetype='image/png')  # inside a view function

2)

plt.savefig('image.png', dpi=300)  # persist the image to disk
return send_file('image.png', mimetype='image/png')  # inside a view function

In the second case your webserver (e.g. nginx) can, if properly configured, transmit the file for you.
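For readers on Python 3, the first approach might look roughly like the following. This is a sketch that assumes Flask and Matplotlib are installed; the route name and the plotted data are made up for illustration. It builds the figure with matplotlib.figure.Figure rather than pyplot, which is a substitution relative to the snippets above.

```python
import io

from flask import Flask, send_file
from matplotlib.figure import Figure

app = Flask(__name__)


@app.route("/plot.png")
def plot_png():
    # Use Figure directly rather than pyplot, which keeps global
    # references to the figures it creates.
    fig = Figure()
    ax = fig.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])

    # Render the PNG into an in-memory buffer instead of a file on disk.
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)  # rewind so send_file reads from the start
    return send_file(buf, mimetype="image/png")


# Exercise the view without starting a server.
response = app.test_client().get("/plot.png")
print(response.status_code, response.mimetype)
```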