I have been googling for a while and couldn't figure out a way to do this. I have a simple Flask app which takes a CSV file, reads it into a Pandas dataframe, converts it, and outputs the result as a new CSV file. I have managed to upload and convert it successfully with the following HTML
<div class="container">
  <form method="POST" action="/convert" enctype="multipart/form-data">
    <div class="form-group">
      <br />
      <input type="file" name="file">
      <input type="submit" name="upload"/>
    </div>
  </form>
</div>
After I click submit, it runs the conversion in the background for a while and automatically triggers a download once it's done. The code that takes the result_df and triggers the download looks like this:
from flask import make_response, request
import pandas as pd

@app.route('/convert', methods=["POST"])
def convert():
    if request.method == 'POST':
        # Read uploaded file to df
        input_csv_f = request.files['file']
        input_df = pd.read_csv(input_csv_f)
        # TODO: Add progress bar for pd_convert
        result_df = pd_convert(input_df)
        if result_df is not None:
            # Send the converted dataframe back as a CSV attachment
            resp = make_response(result_df.to_csv())
            resp.headers["Content-Disposition"] = "attachment; filename=export.csv"
            resp.headers["Content-Type"] = "text/csv"
            return resp
I'd like to add a progress bar to pd_convert, which is essentially a pandas apply operation. I found that tqdm works with pandas now and it has a progress_apply method instead of apply. But I'm not sure if it is relevant for making a progress bar on a web page. I guess it should be since it works on Jupyter notebooks. How do I add a progress bar for pd_convert() here?
The ultimate result I want is:
1. User clicks upload and selects the CSV file from their filesystem
2. User clicks submit
3. The progress bar starts to run
4. Once the progress bar reaches 100%, a download is triggered
1 and 2 are done now. Then the next question is how to trigger the download. For now, my convert function triggers the download with no problem because the response is formed with a file. If I want to render the page I form a response with return render_template(...). Since I can only have one response, is it possible to have 3 and 4 with only one call to /convert?
Not a web developer, still learning about the basics. Thanks in advance!
====EDIT====
I tried the example here with some modifications. I get the progress from the row index in a for loop on the dataframe and put it in Redis. The client reads the progress from Redis through an event stream by requesting this new endpoint, /progress. Something like:
@app.route('/progress')
def progress():
    """Get percentage progress for the dataframe process"""
    r = redis.StrictRedis(
        host=redis_host, port=redis_port, password=redis_password, decode_responses=True)
    r.set("progress", str(0))
    # TODO: Problem, 2nd submit doesn't clear progress to 0%. How to make independent progress for each client and clear to 0% on each submit
    def get_progress():
        p = int(r.get("progress"))
        while p <= 100:
            p = int(r.get("progress"))
            p_msg = "data:" + str(p) + "\n\n"
            yield p_msg
            logging.info(p_msg)
            if p == 100:
                r.set("progress", str(0))
            time.sleep(1)
    return Response(get_progress(), mimetype='text/event-stream')
It is currently working but with some issues. The cause is definitely my lack of understanding of this solution.
Issues:
- I need the progress to be reset to 0 every time the submit button is pressed. I tried several places to reset it to 0 but haven't found a working version yet. It's definitely related to my lack of understanding of how the stream works. Now it only resets when I refresh the page.
- How to handle concurrent requests, aka the Redis race condition? If multiple users make requests at the same time, the progress should be independent for each of them. I'm thinking about giving a random job_id to each submit event and making it the key in Redis. Since I don't need the entry after each job is done, I will just delete the entry after it's done.
I feel my missing part is the understanding of text/event-stream. I feel I'm close to a working solution. Please share your opinion on what the "proper" way to do this is. I'm just guessing and trying to put together something that works with my very limited understanding.
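The job_id idea could be sketched roughly as follows. This is only an illustration under assumptions: InMemoryStore is a hypothetical stand-in for the Redis client so the snippet runs on its own, and progress_key, start_job and finish_job are made-up helper names, not part of Flask or redis-py.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a Redis client (same set/get/delete names)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def progress_key(job_id):
    # One key per job, so concurrent users never share a counter.
    return "progress:" + job_id


def start_job(store):
    """Create a fresh job id and start its progress at 0."""
    job_id = uuid.uuid4().hex
    store.set(progress_key(job_id), 0)
    return job_id


def finish_job(store, job_id):
    """Drop the entry once the job is done and no longer needed."""
    store.delete(progress_key(job_id))


store = InMemoryStore()
job_a = start_job(store)
job_b = start_job(store)
store.set(progress_key(job_a), 40)  # only job A advances
print(store.get(progress_key(job_a)), store.get(progress_key(job_b)))
finish_job(store, job_a)
print(store.get(progress_key(job_a)))
```

Because every submit would get its own key that starts at 0, a second submit no longer inherits the previous value, and two users uploading at the same time read separate counters. The job id would have to be handed to the client so it can ask for the matching progress.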
OK, I narrowed down the problems I was missing and figured it out. The concepts I needed include
Backend
- /progress for an event stream (HTML5)
- text/event-stream MIME type response
Frontend
The sample HTML
Sample backend Flask code
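As a rough illustration of the streaming part of such a backend (a sketch under assumptions, not a definitive implementation), the generator below produces text/event-stream messages and stops once the job reports 100%. The names format_sse and progress_events are hypothetical, and get_progress stands for whatever lookup returns the current percentage, for example reading the job's key from Redis.

```python
import time


def format_sse(data):
    """Format one server-sent event: a 'data:' line followed by a blank line."""
    return "data: {}\n\n".format(data)


def progress_events(get_progress, poll_seconds=1.0, sleep=time.sleep):
    """Yield progress events until the job reports 100, then stop.

    get_progress is a hypothetical callable returning the current
    percentage as an int or a numeric string.
    """
    while True:
        percent = int(get_progress() or 0)  # a missing value counts as 0
        yield format_sse(percent)
        if percent >= 100:
            break  # ending the generator ends the response body
        sleep(poll_seconds)


# Simulated job: these readings stand in for values written by the worker.
readings = iter([0, 35, 70, 100])
events = list(progress_events(lambda: next(readings), sleep=lambda s: None))
print(events)
```

In a Flask view, the generator could be returned as Response(progress_events(...), mimetype='text/event-stream'). Since a browser EventSource reconnects by default when the server closes the stream, the client would likely need to close it once it receives 100.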
The rest is the code for the Pandas for loop that writes the progress to Redis.
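A minimal sketch of such a loop, assuming the rows are available as a list and the progress store exposes set and get like a Redis client. DictStore and convert_rows are hypothetical names used only for illustration, and the transform is a toy example.

```python
class DictStore:
    """Hypothetical stand-in for a Redis client; only set/get are mimicked."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = str(value)

    def get(self, key):
        return self.data.get(key)


def convert_rows(rows, transform, store, key):
    """Apply transform to every row, recording percentage progress under key."""
    total = len(rows)
    results = []
    last_percent = 0
    store.set(key, 0)  # every job starts from 0%
    for done, row in enumerate(rows, start=1):
        results.append(transform(row))
        percent = done * 100 // total
        if percent != last_percent:  # skip writes that would not move the bar
            store.set(key, percent)
            last_percent = percent
    store.set(key, 100)  # also covers an empty input
    return results


rows = [{"price": 1.0}, {"price": 2.5}, {"price": 4.0}]
store = DictStore()
converted = convert_rows(
    rows, lambda row: {"cents": round(row["price"] * 100)}, store, "progress:demo"
)
print(converted, store.get("progress:demo"))
```

With a dataframe, the rows could come from input_df.to_dict("records") and the results could be turned back into a frame with pd.DataFrame(results). Writing only when the percentage changes keeps the number of Redis writes at about 100 per job, however many rows there are.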
I pieced together a lot of the results from hours of Googling so I feel it's best to document here for people who also need this basic feature: add a progress bar in a Flask web app for Pandas dataframe processing.
Some useful references
• https://medium.com/code-zen/python-generator-and-html-server-sent-events-3cdf14140e56
• https://codeburst.io/polling-vs-sse-vs-websocket-how-to-choose-the-right-one-1859e4e13bd9
• What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet?