How to detect memory leak in python code?

Posted 2020-06-23 08:10

Question:

I'm new to both machine learning and Python! I want my code to predict the object in a picture, which in my case is usually a car. The script starts smoothly, but after 20 or so pictures it hangs my system because of a memory leak. I need it to run over my whole database, which holds far more than 20 pictures.

I have tried pympler's SummaryTracker to track which objects are taking the most memory.

This is the code I'm trying to run to predict the objects in the picture:

from imageai.Prediction import ImagePrediction
import os
import urllib.request
import mysql.connector
from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

mydb = mysql.connector.connect(
  host="localhost",
  user="phpmyadmin",
  passwd="anshu",
  database="python_test"
)
counter = 0
mycursor = mydb.cursor()

sql = "SELECT id, image_url FROM `used_cars` " \
      "WHERE is_processed = '0' AND image_url IS NOT NULL LIMIT 1"
mycursor.execute(sql)
result = mycursor.fetchall()



def dl_img(url, filepath, filename):
    fullpath = filepath + filename
    urllib.request.urlretrieve(url,fullpath)

for eachfile in result:
    id = eachfile[0]
    print(id)
    filename = "image.jpg"
    url = eachfile[1]
    filepath = "/home/priyanshu/PycharmProjects/untitled/images/"
    print(filename)
    print(url)
    print(filepath)
    dl_img(url, filepath, filename)

    execution_path = "/home/priyanshu/PycharmProjects/untitled/images/"

    prediction = ImagePrediction()
    prediction.setModelTypeAsResNet()
    prediction.setModelPath(os.path.join(execution_path, "/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
    prediction.loadModel()

    predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "image.jpg"), result_count=1)
    for eachPrediction, eachProbability in zip(predictions, probabilities):
        per = 0.00
        label = ""
        print(eachPrediction, " : ", eachProbability)
        label = eachPrediction
        per = eachProbability

    print("Label: " + label)
    print("Per:" + str(per))
    counter = counter + 1
    print("Picture Number: " + str(counter))

    sql1 = "UPDATE used_cars SET is_processed = '1' WHERE id = '%s'" % id
    sql2 = "INSERT into label (used_car_image_id, object_label, percentage) " \
           "VALUE ('%s', '%s', '%s') " % (id, label, per)
    print("done")

    mycursor.execute(sql1)
    mycursor.execute(sql2)

    mydb.commit()
    tracker.print_diff()

This is the result I'm getting from a single picture and it is consuming whole RAM after some iterations. What change should I do to stop the leaking?

seat_belt  :  12.617655098438263
Label: seat_belt
Per:12.617655098438263
Picture Number: 1
done
                                                                  types |   objects |   total size
                                                          <class 'tuple |    130920 |     11.98 MB
                                                           <class 'dict |     24002 |      6.82 MB
                                                           <class 'list |     56597 |      5.75 MB
                                                            <class 'int |    175920 |      4.70 MB
                                                            <class 'str |     26047 |      1.92 MB
                                                            <class 'set |       740 |    464.38 KB
                         <class 'tensorflow.python.framework.ops.Tensor |      6515 |    356.29 KB
              <class 'tensorflow.python.framework.ops.Operation._InputList |      6097 |    333.43 KB
                      <class 'tensorflow.python.framework.ops.Operation |      6097 |    333.43 KB
                                                   <class 'SwigPyObject |      6098 |    285.84 KB
           <class 'tensorflow.python.pywrap_tensorflow_internal.TF_Output |      4656 |    254.62 KB
       <class 'tensorflow.python.framework.traceable_stack.TraceableObject |      3309 |    180.96 KB
              <class 'tensorflow.python.framework.tensor_shape.Dimension |      1767 |     96.63 KB
          <class 'tensorflow.python.framework.tensor_shape.TensorShapeV1 |      1298 |     70.98 KB
                                                        <class 'weakref |       807 |     63.05 KB

Answer 1:

Have a look at this article: Tracing python memory leaks

Also note that the garbage collection module can have debug flags set; look at the set_debug function. Additionally, look at this code by Gnibbler for determining the types of objects that have been created after a call.
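For illustration, here is a minimal sketch of the set_debug approach this answer points to. The Node class and the reference cycle are made up purely so the collector has a "leak" to catch; in real debugging you would run one iteration of the suspect loop instead:

```python
import gc
from collections import Counter

# Keep every unreachable object in gc.garbage instead of freeing it,
# so leaked objects can be inspected after collection.
gc.set_debug(gc.DEBUG_SAVEALL)

# Simulate a leak: a reference cycle that plain refcounting cannot free.
class Node:
    def __init__(self):
        self.ref = self

leaky = Node()
del leaky  # still unreachable only via the cycle

gc.collect()
print(len(gc.garbage), "unreachable objects were saved")

# Count live objects by type, similar to pympler's summary table.
counts = Counter(type(o).__name__ for o in gc.get_objects())
for name, n in counts.most_common(5):
    print(name, n)

gc.set_debug(0)  # restore normal collection
```

Running this between iterations of the prediction loop would show whether objects pile up from one pass to the next.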



Answer 2:

In this case the model is loaded inside the for loop, once per image. Move the model setup outside the loop so it is loaded only once; each iteration then reuses it instead of allocating fresh model state every time, which is what eats the memory. The code should work this way:

execution_path = "/home/priyanshu/PycharmProjects/untitled/images/"

# Load the model once, before the loop, instead of on every iteration.
prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath("/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5")
prediction.loadModel()

for eachfile in result:
    id = eachfile[0]
    print(id)
    filename = "image.jpg"
    url = eachfile[1]
    filepath = "/home/priyanshu/PycharmProjects/untitled/images/"
    print(filename)
    print(url)
    print(filepath)
    dl_img(url, filepath, filename)

    predictions, probabilities = prediction.predictImage(
        os.path.join(execution_path, "image.jpg"), result_count=1)
    for eachPrediction, eachProbability in zip(predictions, probabilities):
        print(eachPrediction, " : ", eachProbability)
        label = eachPrediction
        per = eachProbability

    print("Label: " + label)
    print("Per:" + str(per))
    counter = counter + 1
    print("Picture Number: " + str(counter))

    sql1 = "UPDATE used_cars SET is_processed = '1' WHERE id = '%s'" % id
    sql2 = "INSERT into label (used_car_image_id, object_label, percentage) " \
           "VALUE ('%s', '%s', '%s') " % (id, label, per)
    print("done")

    mycursor.execute(sql1)
    mycursor.execute(sql2)

    mydb.commit()
    tracker.print_diff()
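Whichever fix is applied, the standard library's tracemalloc module is a convenient way to verify it: take a snapshot before and after an iteration and diff them; once the loop stops leaking, the per-iteration growth should flatten out. A minimal sketch, where the list of bytes objects is only a stand-in for one loop iteration's allocations:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for one iteration of the prediction loop.
retained = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
top = after.compare_to(before, "lineno")

# The biggest positive size_diff points at the line that kept the memory.
for stat in top[:3]:
    print(stat)

tracemalloc.stop()
```

If the top entries keep growing on every pass even with the model loaded outside the loop, something else in the loop body is retaining references.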