First off, I am new to Python. It's irrelevant to the question, but I have to mention it.
I am creating a crawler as my first project to understand how things work in Python, and so far this is my major issue: understanding how to get multiple progress bars in the terminal while using requests and pathos.multiprocessing.
I managed to get everything working; I just want prettier output, so I decided to add progress bars. I am using tqdm because I like the look and it seems the easiest to implement.
Here's the method that downloads the file:
def download_lesson(self, lesson_data):
    if 'file' not in lesson_data:
        return print('=> Skipping... File {file_name} already exists.'.format(file_name=lesson_data['title']))

    response = requests.get(lesson_data['video_source'], stream=True)
    chunk_size = 1024

    with open(lesson_data['file'], 'wb') as file:
        progress = tqdm(
            total=int(response.headers['Content-Length']),
            unit='B',
            unit_scale=True
        )

        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                progress.update(len(chunk))
                file.write(chunk)

        progress.close()
        print('=> Success... File "{file_name}" has been downloaded.'.format(file_name=lesson_data['title']))
I run that method through Processing:
# c = instance of my crawling class
# cs = returns the `lesson_data` for `download_lesson` method
p = Pool(1)
p.map(c.download_lesson, cs)
Everything works great as long as I use processes=1 in the Pool. But when I run multiple processes, say processes=3, things get weird and the progress bars are printed one inside another.
I found in the tqdm documentation that there is a position parameter, whose description matches exactly what I need in this case:
position : int, optional Specify the line offset to print this bar (starting from 0) Automatic if unspecified. Useful to manage multiple bars at once (eg, from threads).
However, I have no clue how to set that position. I tried some odd things, such as adding a variable that is supposed to increment by one each time, but whenever download_lesson runs, no incrementing seems to happen. The variable is always 0, so position is always 0.
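One possible explanation for the counter staying at 0, sketched with the standard library only: if every task in a process pool works on its own copy of the crawler instance, an increment made inside one task is never seen by the next. The Crawler class and next_position method below are hypothetical stand-ins, and copy.deepcopy only imitates the copy a pool would make; whether pathos behaves exactly like this is an assumption, not something confirmed here.

```python
import copy


class Crawler:
    """Hypothetical stand-in for the crawling class."""

    def __init__(self):
        self.position = 0

    def next_position(self):
        # Return the current value, then increment it on *this* object only.
        current = self.position
        self.position += 1
        return current


crawler = Crawler()

# Imitate a process pool: each task gets a fresh copy of the instance
# (deepcopy stands in for the serialisation a real pool would perform).
seen = []
for _ in range(3):
    worker_copy = copy.deepcopy(crawler)
    seen.append(worker_copy.next_position())

print(seen)              # [0, 0, 0]
print(crawler.position)  # 0, the original object was never modified
```

If this is what happens, the position has to reach the method as an argument rather than through state stored on the instance.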
So it seems I don't understand much here. Any tips, hints or complete solutions are welcome. Thank you!
UPDATE #1:
I found out that I can pass another argument to map as well, so I am passing a range over the number of processes that were configured (e.g. processes=2):
p = Pool(config['threads'])
p.map(c.download_lesson, cs, range(config['threads']))
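How a second iterable gets paired with the lessons can be sketched with Python's built-in map. This assumes the pool's map pairs its arguments the same way, which is not verified here for pathos. The describe function and the lesson titles are hypothetical stand-ins for download_lesson and cs.

```python
def describe(lesson, position):
    """Hypothetical stand-in for download_lesson(lesson_data, progress_position)."""
    return '{} -> position {}'.format(lesson, position)


lessons = ['intro', 'setup', 'loops', 'classes']  # hypothetical lesson titles
threads = 2

# The built-in map pairs the iterables element by element and stops
# as soon as the shortest one is exhausted.
paired = list(map(describe, lessons, range(threads)))
print(paired)  # ['intro -> position 0', 'setup -> position 1']

# Cycling through the positions gives every lesson a value in 0..threads-1.
cycled = [describe(lesson, i % threads) for i, lesson in enumerate(lessons)]
print(cycled)
```

Under that assumption, range(config['threads']) would only cover the first few lessons, so a list of positions as long as cs (for example i % threads for each index i) may be closer to what is intended.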
In my method I printed that argument and I do indeed get 0 and 1, as I am running 2 processes in the example.
But this does not seem to change anything:
progress = tqdm(
    total=int(response.headers['Content-Length']),
    unit='B',
    unit_scale=True,
    position=progress_position
)
I still get the same overlapping progress bars. When I manually set position to a fixed value (e.g. 10), the output jumps in the terminal, so position does have an effect; the bars still overlap, of course, because both are now set to 10. But when it is set dynamically, it does not seem to work either. I don't understand what my issue is here. It's as if, when map runs this method twice, both progress bars end up with the last position that was set. What the heck am I doing wrong?